I just got a strange error. In the log file, it says:
```
STDERR: /train.sh: line 54: 6 Killed my_script.py
```
This corresponds to submission ID 8203282. The same program passed the test on the express training lane. Any clue?
I don't have experience doing memory profiling in Python, but I wonder if you could "instrument" your code with a tool like this: http://guppy-pe.sourceforge.net/#Heapy to see where memory is being used. @brucehoff
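For what it's worth, here is a minimal sketch of what that instrumentation could look like, assuming the guppy package installs inside your container (the placement of the calls is only illustrative):
```
# Minimal sketch of heap profiling with guppy/heapy; assumes the guppy
# package is available inside the container.
from guppy import hpy

h = hpy()
h.setrelheap()   # baseline: only report allocations made after this point

# ... run the memory-heavy part of the pipeline here ...

print(h.heap())  # per-type breakdown of live objects and their sizes
```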
I see. By my calculation, my job will use a little over 100GB, so I don't know exactly why it exceeded the memory limit.

> So the memory limit for the container is indeed less than what is physically available, i.e. 200GB, right?
The servers used to run the containers have in excess of 500GB of physical memory. We run only two jobs at a time, entirely isolated from each other. Each job is allocated 200GB of memory via the `--memory` option of the `docker run` command (https://docs.docker.com/engine/reference/run/#/user-memory-constraints). So our expectation is that your submission has unrestricted use of 200GB of memory. This applies both to the express lane and the leaderboard, and to preprocessing, training, and inference submissions.

I see. That's exactly what I suspected. @brucehoff
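As a sanity check, that cap can also be read from inside the container itself; a minimal sketch, assuming a cgroup-v1 host (the usual setup for Docker at that time):
```
# Minimal sketch, assuming a cgroup-v1 host: read the memory cap that
# `docker run --memory=...` imposes on this container.
def container_memory_limit_gib(path="/sys/fs/cgroup/memory/memory.limit_in_bytes"):
    with open(path) as f:
        return int(f.read().strip()) / 1024.0 ** 3

print("container memory limit: %.0f GiB" % container_memory_limit_gib())
```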
So the memory limit for the container is indeed less than what is physically available, i.e. 200GB, right?

Submission 8212274 terminated with a return (exit) code of 137. The meaning of this code is discussed here:
https://bobcares.com/blog/error-137-docker/
"Error 137 in Docker denotes that the container was ?KILL?ed by ?oom-killer? (Out of Memory). This happens when there isn?t enough memory in the container for running the process." @brucehoff
===
Sorry, I referred to the wrong submission ID. The one that was killed is: 8212274. The log for submission 8203282 is below. (The file is [8203282_dm-ls-train-dl_training_logs.zip](https://www.synapse.org/#!Synapse:syn8204586).)
The line you quote is NOT in the file. The file seems to contain a Python stack trace. Could it be that there is a bug in your code?
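For reference, the `AttributeError` in the traceback below is typical of an OpenCV version mismatch: `SimpleBlobDetector_create` exists from OpenCV 3.0 onward, while OpenCV 2.4 exposes the `SimpleBlobDetector` constructor instead. A minimal, version-tolerant sketch:
```
import cv2

# Sketch of a version-tolerant call, assuming the container ships a different
# OpenCV major version than the code was written against.
params = cv2.SimpleBlobDetector_Params()

if hasattr(cv2, "SimpleBlobDetector_create"):      # OpenCV 3.x and later
    detector = cv2.SimpleBlobDetector_create(params)
else:                                              # OpenCV 2.4.x
    detector = cv2.SimpleBlobDetector(params)
```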
```
STDOUT: >>> Fit image generator <<<
STDOUT: There are 0 cancer cases and 178 normal cases.
STDERR: Traceback (most recent call last):
STDERR: File "dm_candidROI_train.py", line 329, in
STDERR: run(args.img_folder, **run_opts)
STDERR: File "dm_candidROI_train.py", line 85, in run
STDERR: roi_clf=None)
STDERR: File "/dm_image.py", line 820, in flow_from_candid_roi
STDERR: verbose=verbose)
STDERR: File "/dm_image.py", line 602, in __init__
STDERR: self.blob_detector = cv2.SimpleBlobDetector_create(params)
STDERR: AttributeError: 'module' object has no attribute 'SimpleBlobDetector_create'
```

Dear Li/Dan,
Thanks in advance for your patience. We are looking into this issue.
Best,
Thomas

Same with me (running ID 8212317):
```
"/train.sh: line 30: 17 Killed python run_cnn_k_mil_new.py"
```
I posted mine in
[thread1673](https://www.synapse.org/#!Synapse:syn4224222/discussion/threadId=1673).
I have submitted the same code (run ID 8110350) and it worked. Really weird.
Liangjian

Could this be due to my job exceeding the memory limit? It was loading something like >100GB into RAM...
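One low-effort way to check is to log peak resident memory at a few checkpoints using the standard library; a minimal sketch (the tag string is only illustrative):
```
import resource

def log_peak_rss(tag):
    # On Linux, ru_maxrss is reported in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("[%s] peak RSS so far: %.1f GiB" % (tag, peak_kb / 1024.0 ** 2))

log_peak_rss("after loading data")
```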