I understand that a few people are waiting for job submissions to go through because of system load. Is it realistically possible to participate this late in the game?
Created by Clinton Mielke subcosmos

@thomas.yu
Would you mind taking a look at 8103285? I tried canceling that process 6 hours ago and it is still in progress.
Or, if there is any way to let my next submission (8103565), which is still in received, start validation before the previous job has fully stopped, that would be great too.
Thanks.
EDIT: Fixed now, disregard. Thanks.

Are you running out of system memory or GPU memory?
If you're running out of RAM, try explicitly deleting your objects when you're done with them (e.g. del data, along with anything else that references it). My guess is that you're trying to load the complete image dataset into memory multiple times. I think there are a few memory profiler packages for Python you could use to troubleshoot if that doesn't work.
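A minimal sketch of both suggestions, assuming Python 3 (the standard-library tracemalloc module is not available in Python 2). The load_batch helper and the traced_sizes function are hypothetical stand-ins for loading one pickled image; they are not part of anyone's submission code. It records how much memory is traced while the data is alive and again after the reference is deleted.

```python
import gc
import tracemalloc


def load_batch(n_bytes):
    """Hypothetical stand-in for loading one pickled image into memory."""
    return bytearray(n_bytes)


def traced_sizes(n_bytes=10_000_000):
    """Return traced memory (bytes) while the data is alive and after deleting it."""
    tracemalloc.start()
    data = load_batch(n_bytes)
    while_loaded, _ = tracemalloc.get_traced_memory()
    del data        # drop the only reference so the buffer can be freed
    gc.collect()    # also reclaim anything kept alive by reference cycles
    after_delete, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return while_loaded, after_delete


if __name__ == "__main__":
    loaded, freed = traced_sizes()
    print("while loaded: %d bytes, after del: %d bytes" % (loaded, freed))
```

If the second number does not drop in your own loop, something else is probably still holding a reference to the data (a list, a cache, a closure).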
If you're running out of GPU memory, reduce the batch size, image size, or network depth/number of trainable parameters. For some reason TensorFlow always looks like it is using all of your GPU memory, so out-of-memory errors are harder to predict. (If anyone has any tips on that I'm all ears.)
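On the TensorFlow point: by default it reserves nearly all the memory on every visible GPU when it initializes, which is why usage looks full regardless of model size. A rough sketch of one way to pick a GPU and ask for on-demand allocation through environment variables follows. CUDA_VISIBLE_DEVICES is the standard CUDA variable; TF_FORCE_GPU_ALLOW_GROWTH is read by recent TensorFlow releases, and older versions may need allow_growth set in the session config instead. The configure_gpu helper is a hypothetical name used only for illustration.

```python
import os


def configure_gpu(device_ids="0", allow_growth=True):
    """Set GPU-related environment variables; call before importing TensorFlow."""
    # Standard CUDA variable: only the listed GPU indices are visible to the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    if allow_growth:
        # Read by recent TensorFlow releases: allocate GPU memory on demand
        # instead of reserving nearly all of it at startup.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    names = ("CUDA_VISIBLE_DEVICES", "TF_FORCE_GPU_ALLOW_GROWTH")
    return {name: os.environ[name] for name in names if name in os.environ}


configure_gpu("0")
# import tensorflow as tf  # import the framework only after the variables are set
```

The variables only take effect if they are set before the framework initializes the GPU, so this belongs at the very top of the training script.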
Also be careful with Python's built-in iterators, e.g. itertools.cycle: that one saves all the data in memory.
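As a sketch of an alternative to cycle, assuming the data source can be re-created for each pass (for example by re-opening the files): the generator below calls a factory again on every pass instead of caching items. Both repeat_passes and image_stream are hypothetical names for illustration.

```python
from itertools import islice


def repeat_passes(make_iterable):
    """Loop forever over a re-creatable iterable without caching its items.

    itertools.cycle keeps a copy of every item from the first pass; this
    calls make_iterable() again for each pass, so nothing is held between
    passes.
    """
    while True:
        produced = False
        for item in make_iterable():
            produced = True
            yield item
        if not produced:
            return  # empty source: stop instead of spinning forever


def image_stream():
    """Hypothetical loader; stands in for reading image files one by one."""
    for index in range(3):
        yield "image-%d" % index


first_seven = list(islice(repeat_passes(image_stream), 7))
print(first_seven)
```

The trade-off is that every pass re-reads the data from its source, so this costs disk I/O in exchange for a flat memory footprint.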
Good luck.

@davecg
Actually, there is no error in the express lane. Could you please help us a little more? How do we designate which GPU to use?
Here is the code that reads the images:
import pickle

for i in xrange(epochs):
    for j in xrange(num_of_images):
        # reload one pickled image from disk on every iteration
        with open(fname, 'rb') as f:
            data = pickle.load(f)
It seems the memory is not released after each image is read.
Thanks a lot!

Try running your Docker image on a P2 AWS instance with the pilot data to troubleshoot.

My job ID is 8090462. I do not know why the Docker container runs out of memory when it reads the images iteratively. Thanks a lot!

Dear Wentao,
Your job has been processed.
Best,
Thomas

My job has been in the validated state for 2 days, the whole weekend. Could you help me check it? Thanks a lot!

Why has my job been in the validated state for 1 day? The job ID is 8074997.
Thanks,
Wentao

My jobs have also been stuck for a night and half a day: submission IDs 8076318 and 8074997.

Noticed one of my submissions (8074105) has been stuck in validated for several hours now.
Just wanted to check if other people are having that problem too.

Dear Clinton,
The system is currently just backed up. We thank you for your participation and patience.
Best,
Thomas

To clarify, I asked this because it seems that even fastlane submissions are stalling for 13 hours. Is the system down, or just backed up?