Dear organizers,
My training submission 8432501 has been running since about Sunday evening (March 12)
and is consuming CPU hours. However, it does not appear to be updating the log file (the log
folder ID is syn8435294). The same code produces frequent log messages on my desktop and on
the Express Queue. I don't have any other way to find out the progress of the run. Is there any
information you could provide? At this point, I don't know whether it is hanging. It should not
be, given that it is burning CPU hours. But then, could you explain why there isn't a log?
Thanks
Ljubomir
Created by Ljubomir Buturovic ljubomir_buturovic
Closing this thread. The problem was caused by a Docker mistake on my part. Sorry for the confusion.
Regards
Ljubomir
This seems to be caused by the GPU not being used. Investigating... no action needed
at this point. I'll let you know if further assistance is needed. Thank you.
I think I know what happened. The systems running the training submissions seem to be
surprisingly slow. On my desktop, an equivalently sized public dataset took 5 minutes to train.
On the training queue, it took one hour, using identical code and parameters.
Therefore the logs that my program was writing were not missing but delayed, about 10-fold
(compared with the expectations based on desktop performance). I canceled the submission
because at this pace it would not complete on time.
Do you have an explanation and/or remedy for the poor system performance? Obviously it
makes an already enormously challenging computational environment that much harder.
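For anyone comparing timings the same way, here is a minimal sketch of the arithmetic in the post above (5 minutes on the desktop versus one hour on the training queue). The function names are hypothetical and only illustrate the calculation; they are not part of the challenge infrastructure.

```python
def slowdown_factor(reference_minutes: float, observed_minutes: float) -> float:
    """Ratio of observed to reference wall-clock time for the same job."""
    if reference_minutes <= 0:
        raise ValueError("reference_minutes must be positive")
    return observed_minutes / reference_minutes


def projected_minutes(expected_minutes: float, factor: float) -> float:
    """Scale a desktop-based runtime estimate by the measured slowdown."""
    return expected_minutes * factor


# Figures quoted in the post: 5 minutes on the desktop, 60 minutes on the queue.
factor = slowdown_factor(5, 60)
print(factor)  # 12.0, consistent with the "about 10-fold" estimate
```

Under this estimate, any log message expected after N minutes on the desktop would show up after roughly 12 × N minutes on the queue.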
Dear Ljubomir,
Your log file is in fact updated every 30 minutes. If there are no logs being written, then there will be no change in the log file.
Best,
Tom
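Because the log file is only refreshed every 30 minutes, timestamped progress lines can help tell a slow run from a hung one. The following is a minimal sketch, assuming the training code is Python and that whatever it writes to standard output ends up in the log file (the thread does not say how logs are captured). The logger name and helper functions are hypothetical.

```python
import logging
import sys
import time


def make_progress_logger(stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes timestamped lines to the given stream."""
    logger = logging.getLogger("train_progress")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_progress(logger, step, total_steps, started_at, now=None):
    """Log elapsed time and a naive ETA; return the ETA in seconds."""
    now = time.time() if now is None else now
    elapsed = now - started_at
    rate = step / elapsed if elapsed > 0 else 0.0
    remaining = (total_steps - step) / rate if rate > 0 else float("inf")
    logger.info("step %d/%d elapsed=%.0fs eta=%.0fs",
                step, total_steps, elapsed, remaining)
    # StreamHandler flushes after each record; the explicit flush is a
    # harmless safeguard in case a buffering handler is swapped in later.
    for handler in logger.handlers:
        handler.flush()
    return remaining
```

Calling `log_progress` once per epoch or every few hundred steps should be enough for each 30-minute snapshot to contain at least one fresh line, even when the run is roughly ten times slower than on a desktop.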