Hi Team, I submitted 3 submissions yesterday, but 2 of them failed and the log file doesn't show any errors. The only difference is that my previous submissions had approximately 8 models in the Docker container, whereas I now have 12. The Docker image is only 3.4 GB, though, against the 8 GB limit. My submission IDs are 9704113 and 9704112. I tested these Docker containers before uploading and there weren't any issues. Could you please let me know what went wrong?

Created by Srinivas Chilukuri R2D2
Done!
Thank you very much! The running submission you mentioned (9704121) did fail because of a timeout. Could you help us restart it under the longer time limit as well, since that one is also CPU-based? We really appreciate the help!
@arielis Sorry to hear that!! I've certainly been there myself...
@R2D2 Yep, thanks for following up. We worked with UAB to get a 48 h time limit. Unfortunately, if submissions take longer than that, we will not be able to run them. I restarted submissions 9704113 and 9704112. You also have another submission that is currently running. I expect it will also hit the old time limit, but I can restart it under the longer time limit.
Hi, I noticed there are other discussions regarding the time limit too. Would it be possible to increase the time limit for these submissions, since our submissions are CPU-based?
Thanks and waiting for your reply :)
My container also failed, unfortunately, but I can trace the reason to a Python syntax incompatibility in the last line of the script, which sums the results...
To clarify, the limit is 12 hours currently.
If your container is not configured to use the GPU, it could definitely take longer, but the submission jobs also have 8 CPUs and 64 GB RAM, so they have a reasonable amount of CPU processing power as well. Given that your submissions used to take about 7 hours, I'm betting that increasing the number of models by 50% might have pushed you over the time limit. I'll look into this a little more and get back to you.
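As a rough check on that guess, here is a minimal back-of-the-envelope sketch. It assumes runtime scales linearly with the number of models, which is only an assumption; the function name is hypothetical, and the figures (about 7 hours for 8 models, 12 models now, a 12-hour limit) are the ones quoted in this thread.

```python
def estimate_runtime_hours(baseline_hours, baseline_models, new_models):
    """Linear extrapolation: assumes every model costs about the same wall time."""
    return baseline_hours * new_models / baseline_models


TIME_LIMIT_HOURS = 12  # limit quoted in this thread

# ~7 hours for 8 models previously, 12 models now
estimate = estimate_runtime_hours(7, 8, 12)
headroom = TIME_LIMIT_HOURS - estimate

print(f"estimated runtime: {estimate:.1f} h")
print(f"headroom under the limit: {headroom:.1f} h")
```

Under strictly linear scaling the estimate stays below the limit, but with little margin, so slower CPUs on the cluster or any per-model overhead could plausibly be enough to tip the job over.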
Thanks a lot for your reply. Previous submissions used to take approximately 7 hours. One question: my Docker container is CPU-based, since I don't have a machine with a GPU to build the container. I am uploading a non-trainable Docker container, though; could that be the issue behind this?
I'm wondering if the addition of the extra 4 models pushed you over the limit.
Hi, your job hit a time limit: ```slurmstepd: error: *** JOB 4786414 ON c0099 CANCELLED AT 2020-05-15T08:22:07 DUE TO TIME LIMIT ***``` How long did your container take to run when you tested it locally?
