I submitted the TensorFlow example as per the instructions on the Synapse webpage. Each of us is allocated a Tesla K80, but the system shows that one of its two GPUs is already running and in use. After waiting a couple of days, my submission failed with a "ran out of memory" error.
A basic question: do we need to modify the code to identify free resources? Please let me know, as this will be helpful in testing our approaches.
```
STDERR: I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
STDERR: name: Tesla K80
STDERR: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
STDERR: pciBusID 0000:87:00.0
STDERR: Total memory: 12.00GiB
STDERR: Free memory: 485.29MiB
STDERR: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x6a07e50
STDERR: I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
STDERR: name: Tesla K80
STDERR: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
STDERR: pciBusID 0000:88:00.0
STDERR: Total memory: 12.00GiB
STDERR: Free memory: 11.81GiB
...
STDERR: W tensorflow/core/common_runtime/bfc_allocator.cc:270] *******************************************************************************************xxxxxxxxx
STDERR: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 9.66MiB. See logs for memory state.
STDERR: W tensorflow/core/framework/op_kernel.cc:909] Resource exhausted: OOM when allocating tensor with shape[50,225,225,1]
```
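The log shows the asymmetry: device 0 has only ~485 MiB free while device 1 has ~11.8 GiB free, so an allocation on device 0 fails. One common workaround (a sketch, not an officially supported approach for this challenge) is to hide the busy GPU from the CUDA runtime before TensorFlow is imported:

```python
import os

# Sketch: expose only GPU 1 (the device the log shows with ~11.8 GiB free)
# to the CUDA runtime. This must happen before TensorFlow is imported,
# because device enumeration occurs at import/session-creation time.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # TensorFlow now sees device 1 as "/gpu:0"
```

Alternatively, TensorFlow releases of this era accept a session config that allocates GPU memory lazily instead of grabbing it all up front, e.g. `tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))` passed to `tf.Session`.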
Created by Chetak Kandaswamy

> Do we need to modify the code to identify free resources?

The submission processing infrastructure is designed to give your submission exclusive use of one Tesla K80 (two GPUs with 12 GiB of video memory each). I verified that nothing was running on the server that processed your submission which could have used the dedicated GPUs. I don't have a ready answer for this issue but will report back as we find out more.

Thanks, Bruce. I will reattempt this time.
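If you can run a diagnostic command on the worker, one way to confirm per-GPU free memory before training starts (a sketch, assuming `nvidia-smi` is available on the submission host) is:

```shell
# Query per-GPU free memory; the index order matches the pciBusID order
# reported in the TensorFlow startup log above.
nvidia-smi --query-gpu=index,memory.free --format=csv
```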