some questions:
1. i found i cannot replace the docker file (same as when it was in the file tab). but when it was in the file tab, at least it could be manually deleted and replaced. now i cannot even delete it. so will i need to create a new project EVERY time i edit one line?
2. is it possible to terminate the training by force? if i write a dead loop, then i will lose all my compute hours?
3. model state is not readable? then eventually how can we choose which one to submit?
thanks,
yuanfang
can you please take a look at my recent submissions? around 16:00 at your time. it was the same as half a month ago, zero lines changed, but it cannot find the gpu anymore.
thanks a ton.
> when i run the image locally, it cannot find gpu device
To access NVIDIA GPUs from within a docker container you should run with nvidia-docker rather than Docker. You will have to install the tool, and there is some guidance here:
https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/
Please scroll down to "Installing Docker and NVIDIA Docker".
yes. you taught me that before.
when i run the image locally, it cannot find the gpu device, while it can find it on your machine. but i am sure the installation on my machine is correct as well, since i am also using it for a couple of other projects and they run just fine.
would be really nice to get a tutorial.
> but the second time when i run a new pre-processing image, the previous one is gone, (when i ls, nothing is produced).
Yes. If you decide to change how you preprocess data, you indicate this to the challenge by changing the "preprocessing" line in your submitted file. The challenge will comply with your request by clearing your 10TB share and then running the *new* preprocessing image.
> that means, we will have to do pre-processing correctly in one batch,
If I understand, the answer is yes. The important thing for me to state is that the processing of submissions is **not** incremental or cumulative. Your training result should be entirely reproducible from just the single submission that generated it.
> and every time we run, it is consuming a new section 10TB?
Per the above, **no**. Every time you preprocess, the entire 10TB is cleared for the new/updated preprocessing algorithm you have submitted.
> for example, if i run cp *dcm /preprocessedData three times, if each time consumes 1 TB, after I submit 3 times, 3TB is gone, and there is no way we can remove? is that true?
The result of the previous run is cleared automatically.
> "The challenge will note that the preprocessing step is unchanged, skip it, and go straight to your new training algorithm."
> how is that achieved?
We keep a record of the Docker image (including the digest) that was run to produce the <=10TB output. If you submit several times, indicating the same preprocessing image each time, we know to skip the preprocessing because the 10TB share already contains its output. Note: If your preprocessing algorithm encounters an error we do not preserve the content of the 10TB share. We assume the content is invalid.
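If you want to confirm locally which digest a pushed image has (so you can check it against the "preprocessing" line in your submission file), something like the following should work after you have pushed; the repository name here is just the example repository used elsewhere on this page:
```
docker images --digests docker.synapse.org/syn7221819/dm-python-example
```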
Yuanfang: Your unspoken question (I think) is "How can I experiment with and check my preprocessing algorithm prior to running on the entire training data set?" The answer is that you can run on your local computer using the ~500 image pilot data set, e.g. something like:
```
docker run --rm -it \
  -v /path/to/pilot/data:/trainingData \
  -v /path/to/an/empty/folder:/preprocessedData \
  docker.synapse.org/syn7221819/dm-python-example /preprocess.sh
```
The `docker run` command is a little more complicated if you are using NVIDIA GPUs to preprocess. My hope is to compile a tutorial to help participants test locally, as time permits, but I hope this gets you started.
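For reference, a GPU-enabled variant might look roughly like this (just a sketch, assuming you have installed nvidia-docker as discussed earlier; the paths are placeholders):
```
nvidia-docker run --rm -it \
  -v /path/to/pilot/data:/trainingData \
  -v /path/to/an/empty/folder:/preprocessedData \
  docker.synapse.org/syn7221819/dm-python-example /preprocess.sh
```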
"The challenge will note that the preprocessing step is unchanged, skip it, and go straight to your new training algorithm."
how is that achieved?
for example, if i run cp *dcm /preprocessedData three times, if each time consumes 1 TB, after I submit 3 times, 3TB is gone, and there is no way we can remove?
is that true?
another question:
if i run cp *dcm the first time
cp *dc* the second time
cp *d* the third time.
i think it is impossible to detect... does that consume 3TB, or does it consume 1TB?
thanks a ton.
> can you please just give an example submission on github, just the three lines i wrote, so that i can just start from there?
Yes: Please create a file with the following content:
```
preprocessing=docker.synapse.org/syn7221819/dm-python-example@sha256:e22879e90806d32e3c6f91b2099123489fb72c287bd39ff644e6f86bc228926a
training=docker.synapse.org/syn7221819/dm-python-example@sha256:e22879e90806d32e3c6f91b2099123489fb72c287bd39ff644e6f86bc228926a
```
then upload the file to Synapse and submit it to the challenge. This tells the challenge to (1) run the Docker image with the /preprocess.sh entry point, then (2) run it a second time with the /train.sh entry point. Note that the Docker image reference takes the form <repository name>@<digest>. The digest comes from your repository's page:
https://www.synapse.org/#!Synapse:syn7221845
Now you might ask why this is so complicated. Why do you have to make a file in which you have to list the image twice and why do you have to give the long digest string? The reason is so that you can *change* your training algorithm and submit again, without rerunning the preprocessing step which, our collaborators inform us, can take days to complete. When you change your model, simply push to Synapse:
```
docker push docker.synapse.org/syn7221819/dm-python-example
```
while noting the new digest string that is printed to your command line (or displayed on the repository's Synapse page). Update the 'training' line of the two-line file above, upload the modified file to Synapse, and submit. The challenge will note that the preprocessing step is unchanged, skip it, and go straight to your new training algorithm. Admittedly this is a little clunky, but we hope that avoiding a days-long rerun of the preprocessing step is worth it to you and the other participants.
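For illustration only, the updated file might look like this (the preprocessing line is unchanged from the example above; the training digest below is a placeholder for whatever new digest your push produces):
```
preprocessing=docker.synapse.org/syn7221819/dm-python-example@sha256:e22879e90806d32e3c6f91b2099123489fb72c287bd39ff644e6f86bc228926a
training=docker.synapse.org/syn7221819/dm-python-example@sha256:<new digest from your latest push>
```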
As always, we appreciate your questions and feedback.
thanks, bruce, i have reviewed it at least 10 times, but i didn't get it at all.
i read the description, in the example docker file it has this line:
COPY preprocess.sh /preprocess.sh
i really cannot understand that
1) i already copied this line, why do i need a separate docker image for pre-processing
2) if i need a separate image, why do i copy this line in this docker image.
3) if i need a separate image, how do i do that exactly? how do i submit two simultaneously?
is it some information i need to get from the webinar? that webinar is usually hours long.....
or can you please just give an example submission on github, just the three lines i wrote, so that i can just start from there?
thanks so much.
yuanfang
-------
update: never mind. i think i know what you mean now. thanks
-----
update to update:
but the second time when i run a new pre-processing image, the previous one is gone (when i ls, nothing is produced). that means we will have to do pre-processing correctly in one batch, and every time we run, it consumes a new 10TB section?
for example, if i run cp *dcm /preprocessedData three times, if each time consumes 1 TB, after I submit 3 times, 3TB is gone, and there is no way we can remove?
is that true?
According to our records you created submission 7254998 this morning; it ran to completion without error and, as you say, it did not create any log file. One reason no log file would be created is if your container created no output, but if your code does what you claim then there should be *some* output. So I dug in further.
You submitted to the training submission queue the following image:
docker.synapse.org/syn7221819/dm-python-example
As per the challenge instructions (see the section "How to Submit" on the page https://www.synapse.org/#!Synapse:syn4224222/wiki/401759), if you are skipping preprocessing you can simply submit your container image, but submitting with preprocessing involves submitting a file listing the two container images constituting your submission (one for preprocessing and one for training). Since you submitted a Docker image, the system assumes you are skipping preprocessing and will run your "/train.sh" entry point. To look at your /train.sh entry point I ran the following:
```
docker run --rm -it docker.synapse.org/syn7221819/dm-python-example:latest bash -c "cat /train.sh"
```
The shell script I read produces no output so it makes sense that the system did not produce a log file.
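For what it is worth, even a minimal /train.sh that just echoes something would produce a log file. A hypothetical sketch, assuming /trainingData is mounted for training as it is in the preprocessing examples (not a required script, just an illustration):
```
#!/bin/bash
# hypothetical minimal /train.sh: print something so the submission produces a log file
echo "train.sh started"
ls /trainingData | head
echo "train.sh finished"
```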
It seems to me that the root cause might be my explanation on how to invoke the extra preprocessing step. Will you review the challenge instructions and tell me if you feel they are clear or how they might be clarified (e.g. to help other participants)?
Thank you.
hi, bruce,
i made a submission but there is no log file. i am absolutely sure i didn't exceed size limit, because i only have three lines in total:
ls /processedData/
mkdir /processedData/XXX
ls /processedData/
i just want to see how preprocess works, and see if i can make a dir and if this dir still exists when i re-enter the system.
> how can i call preprocess.sh to run?
Kindly see the section "Training with Preprocessing", here:
https://www.synapse.org/#!Synapse:syn4224222/wiki/401759
If the instructions are at all unclear, please do let us know so we can clarify them.
> is there any mechanism to know that i used up the space, e.g. daily root report?
If you use up the space your container will stop running and return an error.
> (i don't think i really need 10TB, but just in case a dead loop is written, then i will fill up the space, right?)
Correct. The feedback you get is:
- how long your model has been running;
- your log files, uploaded every few minutes;
- an error message if your code encounters an error and stops. One possible error would be writing more than 10TB to your preprocessing area.
This is a learning experience for us as much as for you. If you and the other participants find that you need greater visibility into your running model, we can pursue the issue.
Thank you.
thanks bruce.
how can i call preprocess.sh to run? is there any mechanism to know that i used up the space, e.g. daily root report? (i don't think i really need 10TB, but just in case a dead loop is written, then i will fill up the space, right?)
"thanks bruce, so synapse captures all past images?"
Yes. This is particularly important to allow you to continue working while your submission is in the queue: You may modify your repository after submitting but when your submission runs, the system will use the specific version of your repository at the time you submitted. This would not be possible if we did not capture each and every version that you "push" to Synapse.
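As an aside, you can also pull a specific captured version by its digest if you want to inspect exactly what was submitted; for example, using the digest shown earlier (just a sketch, and access to the Synapse registry may depend on your login/permissions):
```
docker pull docker.synapse.org/syn7221819/dm-python-example@sha256:e22879e90806d32e3c6f91b2099123489fb72c287bd39ff644e6f86bc228926a
```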
"i tend to test when even write one line"
That's fine. We support that style of work if you prefer to do so. Below I make some suggestions that might help you speed up your test/debug cycle.
"would that burst synapse platform, my image contains a large library ~1.5G."
No, it should not: Docker images are structured in terms of layers. When you push a new version, the back end "registry" stores those layers, but it only stores each layer once. So if many images have most of their content in common (only differing by one small layer) then the large, common layers should not be stored repeatedly.
On a related note, when you are running "docker build" repeatedly there are some things you can do to speed things up on your machine. In particular, be sure that in your Dockerfile the items that do not change are closer to the top and the items that change frequently (e.g. ADD or COPY commands that pull in frequently edited scripts) are lower/later in your Dockerfile. Doing this lets the docker system on your machine avoid repeating the early steps which have not changed when it builds a new image. You may wish to read this article on [best practices in writing Dockerfiles](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/). Again, this has nothing to do with the Synapse system or the challenge infrastructure; it just helps your local copy of Docker work efficiently.
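As a rough sketch of that ordering idea (the base image and libraries below are assumptions for illustration, not a required setup):
```
# Slow-changing steps first, so their layers stay cached across rebuilds
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python python-pip
RUN pip install numpy

# Frequently edited scripts last, so only these thin layers are rebuilt
COPY preprocess.sh /preprocess.sh
COPY train.sh /train.sh
```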
Regarding testing, you are free to run your Docker image on your local machine to make sure it runs. Let's say you have a data preprocessing strategy. The docker run command would look something like this:
```
docker run -it --rm \
-v path-to-image-data:/trainingData:ro \
-v path-to-empty-folder:/preprocessedData \
docker.synapse.org/syn12345/my-preprocessing-repo /preprocess.sh
```
where
- path-to-image-data is a folder containing DICOM files. (You can use the ones from the pilot data set.)
- path-to-empty-folder is an empty folder where your script should write its output.
(If your preprocessing algorithm requires the NVIDIA GPUs to run (and you have such hardware on your development machine), then the run command requires some additional parameters. Please let us know if you need guidance for doing this.)
By testing locally you may find that you can correct small problems more quickly than were you to send your image to us to run.
I hope these comments are useful. Please let us know if we can help further.
thanks bruce, so synapse captures all past images? i tend to test even when i write one line, because i am not a professional programmer; i have to do this to ensure nothing stupid happens. would that burst the synapse platform? my image contains a large library, ~1.5G.
Thanks for asking these questions.
1. You say docker "file" but I think you mean docker "repository". Each repository is a _series_ of images. You can think of an image as a snapshot of a virtual machine containing your model with all dependencies (scripts, libraries, etc.) When you modify the content of your repository (e.g. "edit one line" in one of your source files) then "docker build" and "docker push", you will update the existing repository with a new version of the image. Rather than "deleting and replacing" you simply update the repository with a new version, generated when you run "docker push". You certainly do not need to create a new project or even a new repository when you make a change.
Two other things to note: (1) When you click on a repository on the Docker tab in Synapse you will be taken to the page for the repository. At the right hand side you will see a list of "tags", the default being "latest". To the right is a digest (SHA 256) of the image. When you edit your source, build, and push you will see that this digest has changed. This demonstrates that a repository is a series of images. (2) You can use "tags" to have multiple series of images under a single repository. The syntax looks like this:
```
docker build -t docker.synapse.org/syn123456/my-digital-mammo-model:v2 .
```
where "v2" is a tag of your choice. In addition to specifying a tag when you build you can apply a repository name (with optional tag) with the docker tag command. You can learn more about the docker build command [here](https://docs.docker.com/v1.8/reference/commandline/build/) and the docker tag command [here](https://docs.docker.com/v1.8/reference/commandline/tag/). Regardless of how you create a tagged name, when you "docker push" the new name will appear in Synapse.
2. Yes. We have a mechanism to let you terminate a running submission, which we will expose soon. We will also require that each participant manage their allotted compute hours. If you write an infinite loop then you will consume all your allotted time for the given round.
3. As training progresses we capture and return your log files. By "log files" we mean any command line output your model produces. We hope you can use this mechanism to assess how well your model converges, informing your submission choice.
Thanks again for these questions. Let us continue this discussion. We'd like your experience to be as smooth as possible.
how can i remove an existing docker from synapse?