Hi @vchung @trberg The status of my docker submission (submission ID: syn26166876) is invalid due to the time limit. The error message says, "time limit of 390s reached for case 00001 - stopping submission". I ran inference on a Tesla V100-DGXS machine, where a case takes 60-70 seconds to complete. Is the inference time difference between a K80 and a V100 really that large? I doubt it. Could you kindly look into what went wrong and suggest what to do? Thanks

Created by Saruar Alam saruarlive
Thanks, @vchung et al. for considering the issue. The docker now runs smoothly and I received scores for the five subjects.
Thank you @vchung for considering our remarks. In my case it now runs smoothly. Best regards,
@saruarlive , @Alxaline , We apologize for the troubles and inconveniences caused by the K80. After some discussion, we have updated the infrastructure hardware to a V100 to better match the specs used by the participants. Please feel free to resubmit this model for evaluation with the updated GPU. For reference, you will have access to 1 GPU with 16 GiB of vRAM (GPU memory), 8 vCPUs, and 61 GiB of RAM.
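As a side note, a container entrypoint can log the resources it actually sees and compare them against the advertised 8 vCPUs / 61 GiB. This is a minimal stdlib sketch (the function name is ours, not part of the challenge tooling; the memory check reads `/proc/meminfo`, so it is Linux-only):

```python
import os

def visible_resources():
    """Return (cpu_count, total_memory_gib) as seen inside the
    container. Memory is read from /proc/meminfo (Linux-only);
    on other hosts the memory value is None."""
    cpus = os.cpu_count() or 1
    mem_gib = None
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    # value is reported in kB, e.g. "MemTotal: 65807088 kB"
                    mem_gib = int(line.split()[1]) / (1024 ** 2)
                    break
    except OSError:
        pass
    return cpus, mem_gib

print(visible_resources())
```

Printing this once at startup makes it obvious in the submission logs whether the container was scheduled with the expected hardware.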
Hello @vchung, I tried with num_workers = 0, and it changed nothing. How is it possible to run within the limit if the K80 is ~18x slower than a V100? I also ran on a 1080 Ti and a Quadro P5000, and it runs smoothly within the time constraint...
@Alxaline , Thank you for your patience. We locally ran your model of submission 9715853 on a machine with the exact specs as the submission system and received the following stats:

```sh
$ docker container inspect alxaline
[
    {
        "Id": "0b6b443ec477c1ca1bbbc9273f3e49e136191221d338d5bcfd9d616601438675",
        "Created": "2021-09-16T05:39:24.518936838Z",
        "Path": "conda",
        "Args": [
            "run",
            "--no-capture-output",
            "-n",
            "BraTS21",
            "python",
            ...
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-09-16T05:39:28.082664307Z",
            "FinishedAt": "2021-09-16T05:53:07.418964013Z"
        },
        ...
    }
]
```

Using the `StartedAt` and `FinishedAt` values, we calculated the model's execution time for one case at ~819 seconds:

```python
>>> from dateutil import parser
>>> start = parser.isoparse("2021-09-16T05:39:28.082664307Z").timestamp()
>>> end = parser.isoparse("2021-09-16T05:53:07.418964013Z").timestamp()
>>> end - start
819.3362998962402
```

While running your container, we also kept watch of the GPU device to ensure that it was being utilized:

```sh
$ nvidia-smi -q -g 0 -d UTILIZATION -l

==============NVSMI LOG==============

Timestamp                       : Thu Sep 16 05:39:43 2021
Driver Version                  : 470.57.02
CUDA Version                    : 11.4

Attached GPUs                   : 1
GPU 00000000:00:1E.0
    Utilization
        Gpu                     : 100 %
        Memory                  : 32 %
        Encoder                 : 0 %
        Decoder                 : 0 %
    GPU Utilization Samples
        Duration                : 11.56 sec
        Number of Samples       : 70
        Max                     : 99 %
        Min                     : 0 %
        Avg                     : 46 %
    Memory Utilization Samples
        Duration                : 11.56 sec
        Number of Samples       : 70
        Max                     : 49 %
        Min                     : 0 %
        Avg                     : 4 %
    ENC Utilization Samples
        Duration                : 11.56 sec
        Number of Samples       : 70
        Max                     : 0 %
        Min                     : 0 %
        Avg                     : 0 %
    DEC Utilization Samples
        Duration                : 11.56 sec
        Number of Samples       : 70
        Max                     : 0 %
        Min                     : 0 %
        Avg                     : 0 %
```

One thing we noted while running your container was the following warning -- perhaps that's something that could help speed up execution time?
```sh
UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
```

Hope this can provide further insights!
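One common way to address that warning is to cap the `DataLoader` worker count at the number of CPUs the container actually sees, rather than hard-coding a value tuned for the local machine. A minimal sketch (the helper name is ours; the `DataLoader` call in the comment is illustrative):

```python
import os

def safe_num_workers(requested: int) -> int:
    """Cap a requested DataLoader worker count at the CPUs
    visible inside the container. The submission VMs expose
    8 vCPUs, but Docker may restrict the container further,
    so os.cpu_count() is only an upper-bound heuristic."""
    available = os.cpu_count() or 1
    return min(requested, available)

# Pass the capped value instead of a fixed one, e.g.:
#   torch.utils.data.DataLoader(dataset,
#                               num_workers=safe_num_workers(12), ...)
```

Oversubscribing workers on a small VM can cause exactly the slowdown the warning describes, since the worker processes compete with the main process for the few available cores.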
@saruarlive , Thank you for your patience. We locally ran your model of submission 9715816 on a machine with the exact specs as the submission system and received the following stats:

```sh
$ docker container inspect mmiv
[
    {
        "Id": "e76f948657d9df4fcc0defeaa20bb157b4fc5a6be128f57f955bad37d10095b7",
        "Created": "2021-09-16T03:43:37.54117289Z",
        "Path": "python",
        "Args": [
            "/usr/local/bin/run_model.py"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-09-16T03:43:41.178154916Z",
            "FinishedAt": "2021-09-16T03:58:06.870371964Z"
        },
        ...
    }
]
```

Using the `StartedAt` and `FinishedAt` values, we calculated the model's execution time for one case at ~865 seconds:

```python
>>> from dateutil import parser
>>> start = parser.isoparse("2021-09-16T03:43:41.178154916Z").timestamp()
>>> end = parser.isoparse("2021-09-16T03:58:06.870371964Z").timestamp()
>>> end - start
865.6922171115875
```

While running your container, we also kept watch of the GPU device to ensure that it was being utilized:

```sh
$ nvidia-smi -q -g 0 -d UTILIZATION -l

==============NVSMI LOG==============

Timestamp                       : Thu Sep 16 03:44:39 2021
Driver Version                  : 470.57.02
CUDA Version                    : 11.4

Attached GPUs                   : 1
GPU 00000000:00:1E.0
    Utilization
        Gpu                     : 100 %
        Memory                  : 23 %
        Encoder                 : 0 %
        Decoder                 : 0 %
    GPU Utilization Samples
        Duration                : 16.58 sec
        Number of Samples       : 99
        Max                     : 99 %
        Min                     : 99 %
        Avg                     : 99 %
    Memory Utilization Samples
        Duration                : 16.58 sec
        Number of Samples       : 99
        Max                     : 37 %
        Min                     : 4 %
        Avg                     : 16 %
    ENC Utilization Samples
        Duration                : 16.58 sec
        Number of Samples       : 99
        Max                     : 0 %
        Min                     : 0 %
        Avg                     : 0 %
    DEC Utilization Samples
        Duration                : 16.58 sec
        Number of Samples       : 99
        Max                     : 0 %
        Min                     : 0 %
        Avg                     : 0 %
```

Hope this can provide further insights!
Hello @vchung, Same issue as @saruarlive: "Time limit of 390s reached for case 00001 - stopping submission"... I used a V100 for the validation phase and it takes ~45s per case. Best regards,
Thank you for the feedback, @saruarlive . We are looking into the issue now.

Time limit of 390s reached for case 00001 - stopping submission