My submission has run, which is great! I've got the environment, and can see that I'm in the right place, because: `GPUS=/dev/nvidiactl;/dev/nvidia-uvm;/dev/nvidia2;/dev/nvidia3`

However, none of these directories appear to be there: `/modelState`, `/trainingData`, `/preprocessedData`, `/metadata`.

Should we have a 'MOUNT' command in `/preprocess.sh`? Or is this another problem?

Created by Peter Brooks fustbariclation
> Should we be providing VOLUME statements in our Dockerfiles for the externally mounted volumes?

That is a good question. We have not found the VOLUME statement to be required.

To check whether the volumes are mounted correctly I ran the following train.sh script as a submission to the challenge:

```
#!/bin/sh
echo "this is the content of the root directory"
ls -al /
echo "this is the content of the metadata folder"
ls -al /metadata
echo "(attempt to) count the files in the /preprocessedData folder"
cd /preprocessedData && ls -1 | wc -l
echo "(attempt to) count the files in the /trainingData folder"
cd /trainingData && ls -1 | wc -l
echo "write some content into the /modelState folder"
echo "this content represents some model state" > /modelState/state.txt
```

Below is the output I got back, which is as expected, i.e. the /metadata, /trainingData and /modelState folders are found. The /preprocessedData folder is **not** found, since I did not include a preprocessing step. The /modelState folder is *writable.*

```
STDOUT: this is the content of the root directory
STDOUT: total 8992
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:00 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:00 ..
STDOUT: -rwxr-xr-x. 1 root root 0 Oct 28 21:00 .dockerenv
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 23 21:00 bin
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 boot
STDOUT: drwxr-xr-x. 5 root root 440 Oct 28 21:00 dev
STDOUT: drwxr-xr-x. 42 root root 4096 Oct 28 21:00 etc
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 home
STDOUT: drwxr-xr-x. 8 root root 90 Sep 13 2015 lib
STDOUT: drwxr-xr-x. 2 root root 33 Sep 23 21:00 lib64
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 media
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:00 metadata
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 mnt
STDOUT: drwxrwx---. 2 root 1000 4096 Oct 28 12:22 modelState
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 opt
STDOUT: -rwxrwxrwx. 1 root root 529 Oct 28 20:57 preprocess.sh
STDOUT: dr-xr-xr-x. 671 root root 0 Oct 28 21:00 proc
STDOUT: drwx------. 2 root root 35 Sep 23 21:00 root
STDOUT: drwxr-xr-x. 5 root root 54 Sep 26 21:26 run
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 26 21:26 sbin
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 srv
STDOUT: dr-xr-xr-x. 13 root root 0 Sep 5 19:08 sys
STDOUT: drwxrwxrwt. 2 root root 6 Sep 23 21:00 tmp
STDOUT: -rwxrwxrwx. 1 root root 466 Oct 28 20:54 train.sh
STDOUT: drwxr-xr-x. 2 root root 9166848 Sep 5 17:37 trainingData
STDOUT: drwxr-xr-x. 10 root root 97 Sep 26 21:26 usr
STDOUT: drwxr-xr-x. 11 root root 4096 Sep 26 21:26 var
STDOUT: this is the content of the metadata folder
STDOUT: total 14220
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:00 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:00 ..
STDOUT: -rw-r--r--. 1 root root 4029132 Sep 7 03:26 exams_metadata.tsv
STDOUT: -rw-r--r--. 1 root root 10524986 Sep 7 03:26 images_crosswalk.tsv
STDOUT: (attempt to) count the files in the /preprocessedData folder
STDOUT: (attempt to) count the files in the /trainingData folder
STDERR: /train.sh: 7: cd: can't cd to /preprocessedData
STDOUT: 313847
STDOUT: write some content into the /modelState folder
```

I then submitted a container that includes the aforementioned /train.sh file as well as this /preprocess.sh file:

```
#!/bin/sh
echo "this script is /preprocess.sh"
echo "this is the content of the root directory"
ls -al /
echo "this is the content of the metadata folder"
ls -al /metadata
echo "(attempt to) count the files in the /preprocessedData folder"
cd /preprocessedData && ls -1 | wc -l
echo "(attempt to) count the files in the /trainingData folder"
cd /trainingData && ls -1 | wc -l
echo "write some content into the /trainingData folder"
echo "this content represents some preprocessed data" > /preprocessedData/preprocessed_data.txt
```

I created a submission like so:

```
preprocessing=docker.synapse.org/syn4224222/dm-volume-check@sha256:fdb71f06b78934358f2aa264fbb6b3d00ef3159c2104276b77aac2b3c34ed903
training=docker.synapse.org/syn4224222/dm-volume-check@sha256:fdb71f06b78934358f2aa264fbb6b3d00ef3159c2104276b77aac2b3c34ed903
```

That is, this submission has both a preprocessing and a training step. The output for the preprocessing step is:

```
STDOUT: this script is /preprocess.sh
STDOUT: this is the content of the root directory
STDOUT: total 8880
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:31 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:31 ..
STDOUT: -rwxr-xr-x. 1 root root 0 Oct 28 21:31 .dockerenv
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 23 21:00 bin
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 boot
STDOUT: drwxr-xr-x. 5 root root 440 Oct 28 21:31 dev
STDOUT: drwxr-xr-x. 42 root root 4096 Oct 28 21:31 etc
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 home
STDOUT: drwxr-xr-x. 8 root root 90 Sep 13 2015 lib
STDOUT: drwxr-xr-x. 2 root root 33 Sep 23 21:00 lib64
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 media
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:31 metadata
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 mnt
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 opt
STDOUT: -rwxrwxrwx. 1 root root 529 Oct 28 20:57 preprocess.sh
STDOUT: drwxrwx---. 2 root 1000 4096 Oct 4 00:37 preprocessedData
STDOUT: dr-xr-xr-x. 657 root root 0 Oct 28 21:31 proc
STDOUT: drwx------. 2 root root 35 Sep 23 21:00 root
STDOUT: drwxr-xr-x. 5 root root 54 Sep 26 21:26 run
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 26 21:26 sbin
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 srv
STDOUT: dr-xr-xr-x. 13 root root 0 Sep 5 19:08 sys
STDOUT: drwxrwxrwt. 2 root root 6 Sep 23 21:00 tmp
STDOUT: -rwxrwxrwx. 1 root root 466 Oct 28 20:54 train.sh
STDOUT: drwxr-xr-x. 2 root root 9056256 Sep 5 17:53 trainingData
STDOUT: drwxr-xr-x. 10 root root 97 Sep 26 21:26 usr
STDOUT: drwxr-xr-x. 11 root root 4096 Sep 26 21:26 var
STDOUT: this is the content of the metadata folder
STDOUT: total 14220
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:31 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:31 ..
STDOUT: -rw-r--r--. 1 root root 4029132 Sep 7 03:24 exams_metadata.tsv
STDOUT: -rw-r--r--. 1 root root 10524986 Sep 7 03:25 images_crosswalk.tsv
STDOUT: (attempt to) count the files in the /preprocessedData folder
STDOUT: 0
STDOUT: (attempt to) count the files in the /trainingData folder
STDOUT: 313847
STDOUT: write some content into the /trainingData folder
```

The output for the training step is:

```
STDOUT: this is the content of the root directory
STDOUT: total 40
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:37 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:37 ..
STDOUT: -rwxr-xr-x. 1 root root 0 Oct 28 21:36 .dockerenv
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 23 21:00 bin
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 boot
STDOUT: drwxr-xr-x. 5 root root 440 Oct 28 21:37 dev
STDOUT: drwxr-xr-x. 42 root root 4096 Oct 28 21:36 etc
STDOUT: drwxr-xr-x. 2 root root 6 Apr 12 2016 home
STDOUT: drwxr-xr-x. 8 root root 90 Sep 13 2015 lib
STDOUT: drwxr-xr-x. 2 root root 33 Sep 23 21:00 lib64
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 media
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:37 metadata
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 mnt
STDOUT: drwxrwx---. 2 root 1000 4096 Oct 28 09:12 modelState
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 opt
STDOUT: -rwxrwxrwx. 1 root root 529 Oct 28 20:57 preprocess.sh
STDOUT: drwxrwx---. 2 root 1000 4096 Oct 28 21:31 preprocessedData
STDOUT: dr-xr-xr-x. 661 root root 0 Oct 28 21:36 proc
STDOUT: drwx------. 2 root root 35 Sep 23 21:00 root
STDOUT: drwxr-xr-x. 5 root root 54 Sep 26 21:26 run
STDOUT: drwxr-xr-x. 2 root root 4096 Sep 26 21:26 sbin
STDOUT: drwxr-xr-x. 2 root root 6 Sep 23 21:00 srv
STDOUT: dr-xr-xr-x. 13 root root 0 Sep 5 19:08 sys
STDOUT: drwxrwxrwt. 2 root root 6 Sep 23 21:00 tmp
STDOUT: -rwxrwxrwx. 1 root root 466 Oct 28 20:54 train.sh
STDOUT: drwxr-xr-x. 10 root root 97 Sep 26 21:26 usr
STDOUT: drwxr-xr-x. 11 root root 4096 Sep 26 21:26 var
STDOUT: this is the content of the metadata folder
STDOUT: total 14220
STDOUT: drwxr-xr-x. 2 root root 58 Oct 28 21:37 .
STDOUT: drwxr-xr-x. 24 root root 4096 Oct 28 21:37 ..
STDOUT: -rw-r--r--. 1 root root 4029132 Sep 7 03:24 exams_metadata.tsv
STDOUT: -rw-r--r--. 1 root root 10524986 Sep 7 03:25 images_crosswalk.tsv
STDOUT: (attempt to) count the files in the /preprocessedData folder
STDOUT: 1
STDOUT: (attempt to) count the files in the /trainingData folder
STDOUT: write some content into the /modelState folder
STDERR: /train.sh: 9: cd: can't cd to /trainingData
```

Again, the output is as expected for our system design.

The source code for this example is here: https://github.com/Sage-Bionetworks/dm-volume-check

The Docker container is here: https://www.synapse.org/#!Synapse:syn7459883

And the submitted preprocessing/training file is here: https://www.synapse.org/#!Synapse:syn7459893

In summary, my (admittedly limited) testing suggests everything is OK. I do notice that your comments on this thread were *before* the Oct. 21 substantial software update. So my question is: Do you continue to have problems or do you now find that the mounted volumes are correct? Thank you.
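Going by the logs in this post, /trainingData is present in the training step only when there is no preprocessing step, and /preprocessedData only when there is one. A train.sh that has to cope with both cases could choose its input directory at run time. This is just a minimal sketch: the helper name `pick_input_dir` and the variable `INPUT_DIR` are made up for illustration and are not part of the challenge infrastructure.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an existing directory.
pick_input_dir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer preprocessed data when that volume exists; otherwise fall back to raw data.
if INPUT_DIR=$(pick_input_dir /preprocessedData /trainingData); then
    echo "reading input from $INPUT_DIR"
else
    echo "no input directory found" >&2
fi
```

The order of the arguments sets the preference, so swapping them would make the raw data win whenever both directories exist.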
Should we be providing VOLUME statements in our Dockerfiles for the externally mounted volumes? I would normally have thought so, but the Dockerfile I inherited from a colleague managed without them, and we've been running successfully in the startup phase without them. Could this relate to the issue we are having?
Dear @brucehoff,

At the top of each of our submissions we have been logging the `du -sh` of the contents of the mounted directories, such as /trainingData, simply to understand what is mounted, and when.

For the past several days I've been running tests by opening the Docker image, using the dropdown to Submit to Challenge, selecting the commit, clicking Next, and choosing the Digital Mammography Model Training. Our train.sh log shows 6.3TB of files in /trainingData/ under these circumstances.

When the same image is submitted via the two-line text file (@dnouri has been trying this method as well as the one I described above), only the metadata volume is mounted. This is completely repeatable, and consistent with the earlier post by @fustbariclation.

Would you check the way you run submissions from the formal submission queue, and compare it with submitting directly from the Docker image page? I think you are not mounting the volumes that we are expecting.

For the avoidance of doubt, bearing in mind that other users have confused build-time and in-container run-time commands (e.g. RUN ls /xyx), here are some output lines from the two cases, showing that these are run-time results (STDOUT).

Run from the Docker image page:

```
STDOUT: Number of train images:
STDOUT: 313847 313847 4080011
STDOUT: Size of train images:
STDOUT: 6.3T /trainingData/
```

Run via the two-line submission file:

```
STDOUT: Number of train images:
STDERR: ls: cannot access /trainingData/: No such file or directory
STDOUT: 0 0 0
```

For forthcoming runs, I will tag our logs with the HOSTNAME from the ENV. If you felt inclined to add an ENV variable to distinguish the method used to submit a run, I'd paste some trivial bash code here that others could use to document their logs, to help diagnose future issues.

Thanks for helping run the challenge.

Regards, --r
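For anyone who wants to tag their own logs in the meantime, something along these lines at the top of train.sh or preprocess.sh would do. It is only a sketch: the function name `log_header` is invented here, it assumes HOSTNAME is set in the container environment (falling back to "unknown" if not), and it counts directory entries rather than running `du -sh`, which can take a long time over a volume of this size.

```shell
#!/bin/sh
# Hypothetical helper: print the host name, then one line per directory argument.
log_header() {
    echo "host: ${HOSTNAME:-unknown}"
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Count top-level entries; tr strips the padding some wc versions add.
            count=$(ls -1 "$dir" | wc -l | tr -d ' ')
            echo "found: $dir ($count entries)"
        else
            echo "not found: $dir"
        fi
    done
}

log_header /metadata /trainingData /preprocessedData /modelState
```

Because the header goes to STDOUT at run time, it ends up in the same log as the rest of the submission output.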
I guess it's a different issue for us then, but our `ls` command was run from within a script that's called by `/train.sh`, and it can't access `/trainingData/`. It also cannot access `/modelState/`, `/scoreData`, or `/testData`. On the other hand, `/metadata/` seems to exist! The ID of the log file is 10bb24916ad4654e78053836b384d47cb4c177a7fd7d613d5a9f57d4a28f9a653098021116783466643. Let me know if I can provide any more details.
I can't overemphasize that the following is **NOT** the way to interrogate the state of the system at the time a Docker container is run:

```
RUN /bin/df -H
RUN ls -lt
```

These commands are run on your **local system** (or, more accurately, in the environment of the Docker daemon which you are using to build images) at the time you create the container image for submission.
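By contrast, a check placed inside /train.sh or /preprocess.sh runs when the container itself runs. As a rough sketch, assuming a Linux container where /proc/mounts is readable and lists the externally mounted volumes by their mount point, something like the following could be used. The function name `is_mounted` and its optional second argument are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory in $1 appears as a mount point
# in the mounts table ($2, defaulting to /proc/mounts on Linux).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}" 2>/dev/null
}

# Executed at container run time, so the output reflects the actual mounts.
for dir in /metadata /trainingData /preprocessedData /modelState; do
    if is_mounted "$dir"; then
        echo "mounted: $dir"
    else
        echo "not mounted: $dir"
    fi
done
```

If the mounts table cannot be read at all, every directory is reported as not mounted, so a plain `ls -al /` next to it remains a useful cross-check.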
I'm pushing everything again (it takes a few days), and I'm really hoping that this problem won't still be there! I have these two lines in the Dockerfile:

```
RUN /bin/df -H
RUN ls -lt
```

Also, in preprocess.sh, I make execution conditional on the directory modelState being there:

```
if [ -e /modelState ]
then
```

Any news of progress with this?
I ran into the same issue just now. From my log:

```
STDERR: ls: cannot access /trainingData/: No such file or directory
```
Yes, it's simple, really. At the top of preprocess.sh, I do a:

```
df -H
ls -lt
```

which shows that the directories are not mounted.
No, you do not need to mount anything. Can you share the evidence you have that the directories are not mounted?

Directories missing - do we have to mount them?