Dear Contest Organizers,
It appears that roughly every other time I submit a preprocessing job to the Training Queue, the job stalls. That is, rather than making progress and writing to the log file as normal, the preprocessing job apparently makes no progress at all. The first time this happened (log file: syn8063198), the job failed to make progress for two days, and since that time was being counted against me, I canceled the job and resubmitted the same Docker image. The preprocessing task then completed in under an hour.
This stalling happened again today. I submitted the same preprocessing Docker image again (log file: syn8102970), and it ran for two hours without making any progress, so I've decided to cancel the job and resubmit. Could anyone help me understand why this inconsistent stalling is happening?
Thanks!
Created by DREAMer

Dear Thomas,
Thanks for your reply! I've responded in another thread, but in summary, the stalling is definitely occurring during execution of the preprocessing script. The training script never begins execution, because the preprocessing script never completes!

Dear DREAMer/TamThuc,
Apologies for the delay in responding. The log files are definitely updated every 30 minutes, as shown in the Synapse file history, but if nothing has been added to a log file since the last update, it will appear not to have been updated. A log file only appears updated when new lines have been added to it.
DREAMer:
I looked at your submissions, and the one that completed really quickly had only a preprocessing job. The other submissions have a training job attached. It could be that the training takes much longer and you don't output anything during the training phase. Have you tried submitting to the express lanes to check?
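As an illustration of that point: if a long phase prints nothing, the 30-minute log sync has nothing new to show, so a busy job and a stalled job look the same. A minimal sketch, assuming the log file is populated from the container's stdout (an assumption, not something confirmed in this thread), is a background heartbeat thread that prints a timestamped line at a fixed interval while the real work runs. The `heartbeat` and `run_with_heartbeat` helpers, the 600-second default, and the dummy task are all hypothetical.

```python
import threading
import time
from datetime import datetime


def heartbeat(interval_s: float, stop: threading.Event) -> None:
    """Print a timestamped line every interval_s seconds until stop is set."""
    # Event.wait returns True as soon as stop is set, so the loop exits promptly.
    while not stop.wait(interval_s):
        # flush=True so the line is not held in stdout's buffer.
        # Assumption: the challenge log file is built from stdout.
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"[heartbeat] {stamp} still running", flush=True)


def run_with_heartbeat(task, interval_s: float = 600.0):
    """Run task() while a daemon thread emits periodic heartbeat lines."""
    stop = threading.Event()
    worker = threading.Thread(target=heartbeat, args=(interval_s, stop), daemon=True)
    worker.start()
    try:
        return task()
    finally:
        # Stop the heartbeat whether task() returns normally or raises.
        stop.set()
        worker.join()


if __name__ == "__main__":
    # Hypothetical stand-in for the real preprocessing step.
    result = run_with_heartbeat(lambda: time.sleep(1.2) or "done", interval_s=0.5)
    print(result)
```

Passing `flush=True` matters because Python block-buffers stdout when it is not attached to a terminal, so unflushed lines might not appear until the process exits. An interval well below 30 minutes should make a silent but busy job distinguishable from a genuinely stalled one.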
Best,
Thomas

Could any of the contest organizers help me understand why one submission of my preprocessing script (log file: https://www.synapse.org/#!Synapse:syn8063198) executed partially but then consumed nearly two days of computation without making any log file progress (leading me to cancel the submission), while a second submission of the same preprocessing script executed fully in a very short amount of time (log file: https://www.synapse.org/#!Synapse:syn8077593)?
This is a blocking issue for my team, as our latest preprocessing submissions (like https://www.synapse.org/#!Synapse:syn8104123) seem to be stalling midway through execution, just as the earlier one did (log file: https://www.synapse.org/#!Synapse:syn8063198). I also tried cancelling and resubmitting, but the resubmitted job also stalled and stopped updating the log.

I just cancelled the processing job and submitted a new one. Let's hope the cancel request doesn't get stuck and nothing goes wrong with the new job.

Dear Contest Organizers,
As above, I submitted a processing job with ID 8102870, and it updated the log file every 30 minutes, but somehow it stopped updating the log file a couple of hours ago. Could anyone help me check the processing job, so that I can resubmit it if something went wrong?
Thanks a lot!