I recently submitted the same job twice by accident, so the two submissions are exactly identical.
I canceled one of them within 10 minutes to avoid wasting time quota.
However, I found something interesting when comparing the two log files.
One submission reported an IOError during a Python import, but the other did not. (BTW, this IOError does not seem to be fatal: both submissions went on to run my program.)
Since the two submissions are identical, I am wondering how this can happen.
The only factor that might be different is the underlying server.
I would appreciate it if someone could share some ideas on this problem. @thomas.yu @tschaffter @brucehoff
I have attached the first several lines of both log files:
Log file1: **with IOError**
STDOUT: Fri Jan 20 18:52:46 UTC 2017
STDOUT: copy val.lst to /scratch folder
STDOUT: done
STDOUT: total 116K
STDOUT: -rw-r--r--. 1 root root 37K Jan 20 18:52 image_list.txt
STDOUT: -rw-r--r--. 1 root root 73K Jan 20 18:52 val.lst
STDOUT: total 120K
STDOUT: -rw-r--r--. 1 root root 37K Jan 20 18:52 image_list.txt
STDOUT: drwxr-xr-x. 2 root root 4.0K Jan 20 18:52 images_crop
STDOUT: -rw-r--r--. 1 root root 73K Jan 20 18:52 val.lst
STDOUT: Fri Jan 20 18:52:46 UTC 2017
STDOUT: start to preprocess the .dcm to png
STDERR: parallel: Warning: $SHELL not set. Using /bin/sh.
STDERR: Traceback (most recent call last):
STDERR: File "main_dcm_crop_png.py", line 6, in <module>
STDERR: from pylab import *
STDERR: File "/usr/lib/pymodules/python2.7/pylab.py", line 1, in <module>
STDERR: from matplotlib.pylab import *
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/pylab.py", line 226, in <module>
STDERR: import matplotlib.finance
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/finance.py", line 23, in <module>
STDERR: from matplotlib.collections import LineCollection, PolyCollection
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 23, in <module>
STDERR: import matplotlib.backend_bases as backend_bases
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 50, in <module>
STDERR: import matplotlib.textpath as textpath
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/textpath.py", line 11, in <module>
STDERR: import matplotlib.font_manager as font_manager
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1356, in <module>
STDERR: _rebuild()
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1343, in _rebuild
STDERR: pickle_dump(fontManager, _fmcache)
STDERR: File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 939, in pickle_dump
STDERR: with open(filename, 'wb') as fh:
**STDERR: IOError: [Errno 2] No such file or directory: '/tmp/matplotlib-root/fontList.cache'**
STDOUT: Convert /trainingData/495822.dcm to /scratch/images_crop/495822.png
STDOUT: AcqTime: ; Manu: HOLOGIC, Inc.; Reso: ['0.0700', '0.0700']; Rows: 3328; Cols: 2560
STDERR: libdc1394 error: Failed to initialize libdc1394
STDOUT: Convert /trainingData/494052.dcm to /scratch/images_crop/494052.png
STDOUT: AcqTime: ; Manu: HOLOGIC, Inc.; Reso: ['0.0700', '0.0700']; Rows: 3328; Cols: 2560
Log file2: **No IOError**
STDOUT: Fri Jan 20 18:54:16 UTC 2017
STDOUT: copy val.lst to /scratch folder
STDOUT: done
STDOUT: total 116K
STDOUT: -rw-r--r--. 1 root root 37K Jan 20 18:54 image_list.txt
STDOUT: -rw-r--r--. 1 root root 73K Jan 20 18:54 val.lst
STDOUT: total 120K
STDOUT: -rw-r--r--. 1 root root 37K Jan 20 18:54 image_list.txt
STDOUT: drwxr-xr-x. 2 root root 4.0K Jan 20 18:54 images_crop
STDOUT: -rw-r--r--. 1 root root 73K Jan 20 18:54 val.lst
STDOUT: Fri Jan 20 18:54:16 UTC 2017
STDOUT: start to preprocess the .dcm to png
STDERR: parallel: Warning: $SHELL not set. Using /bin/sh.
STDOUT: Convert /trainingData/495847.dcm to /scratch/images_crop/495847.png
STDOUT: AcqTime: ; Manu: HOLOGIC, Inc.; Reso: ['0.0700', '0.0700']; Rows: 3328; Cols: 2560
STDERR: libdc1394 error: Failed to initialize libdc1394
STDOUT: Convert /trainingData/495818.dcm to /scratch/images_crop/495818.png
STDOUT: AcqTime: ; Manu: HOLOGIC, Inc.; Reso: ['0.0700', '0.0700']; Rows: 3328; Cols: 2560
STDERR: libdc1394 error: Failed to initialize libdc1394
Created by Bibo Shi darrylbobo
> The only factor that might be different is the underlying server.
Not true. You could have a race condition in your code.
https://en.wikipedia.org/wiki/Race_condition
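
To illustrate how a race could produce the IOError in the first log: the job starts its Python workers through `parallel`, and each worker imports matplotlib, which tries to rebuild the font cache at the same path, `/tmp/matplotlib-root/fontList.cache`. If the workers interfere with each other on that shared path, only some of them would fail, depending on timing. This is a guess, not something the logs prove. Below is a minimal sketch of one possible workaround, assuming your matplotlib version honors the `MPLCONFIGDIR` environment variable; `use_private_mpl_cache` and its prefix are hypothetical names made up for this example.

```python
import os
import tempfile


def use_private_mpl_cache(prefix="mplcache-"):
    """Give this process its own matplotlib config/cache directory.

    Hypothetical helper: the function name and prefix are illustrative only.
    """
    # mkdtemp creates a uniquely named directory, so concurrent workers
    # never share (or compete for) the same cache path.
    cache_dir = tempfile.mkdtemp(prefix=prefix)
    # Assumption: matplotlib reads MPLCONFIGDIR when it is first imported,
    # so this has to run before any matplotlib or pylab import.
    os.environ["MPLCONFIGDIR"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    print(use_private_mpl_cache())
    # Import pylab/matplotlib only after the call above.
```

Calling the helper at the very top of `main_dcm_crop_png.py`, before `from pylab import *`, would keep each worker away from the shared `/tmp/matplotlib-root` directory. If the error still shows up after that, the race explanation is probably wrong.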