Hi.
I received an email that said my model had finished training:
*Your Submission to the Digital Mammography challenge (submission ID 7435897) has completed its training phase. . Please direct any questions to the challenge forum, https://www.synapse.org/#!Synapse:syn4224222/discussion .*
but it does not direct me to log files. How can I find my log files?
Thank you.
Seung
Created by Seung Wook Kim swkim @mobileroaming:
I reran the two submitted entities syn7358071 and syn7477495. Here are the entities, their submission IDs and their log folders:
```
syn7358071 7542259 https://www.synapse.org/#!Synapse:syn7542296
syn7477495 7542260 ** no submission folder **
```
The folder syn7542296 has only preprocessing logs, and the file is 1MB, which means it was almost certainly truncated. I retrieved the logs from the preprocessing container and verified that the full log is 11.6MB (uncompressed), so explicit truncation was certainly at work here (and working as designed).
The next question is why the syn7542296 folder has no *training* logs. I looked at the logs in the original container, which we retained and, as expected, there is no output. So the reason there is no training log returned is that your model generated no output. Looking inside your container I see that your "/train.sh" has a line:
```
$TOOLS/caffe train --solver=$MODEL 2>&1 | tee $LOG &
```
This line would appear to shunt all log output into a file, which would explain why no logs were captured.
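If the goal is to keep a copy of the training log on disk while still leaving the output where the challenge system can capture it, one option is to run the pipeline in the foreground instead of backgrounding it with `&`. The following is only a sketch, assuming that logs are collected from the container's stdout/stderr. `train_cmd` is a hypothetical stand-in for the real `$TOOLS/caffe train --solver=$MODEL` command, and the default `LOG` path is likewise made up for illustration.

```shell
# Hypothetical stand-in for "$TOOLS/caffe train --solver=$MODEL".
# Replace the body with the real training command.
train_cmd() {
  echo "Iteration 0, loss = 1.0"
  echo "warning on stderr" >&2
}

# Hypothetical default log path; the real script sets its own $LOG.
LOG="${LOG:-/tmp/train.log}"

run_training() {
  # Foreground pipeline (no trailing "&"): stderr is merged into stdout,
  # tee writes each line to $LOG and also passes it on to stdout,
  # and the function returns only after the training command has finished.
  train_cmd 2>&1 | tee "$LOG"
}
```

Calling `run_training` as the last step of `/train.sh` would then both print the training output and save it to `$LOG`. If the job really has to run in the background, following the backgrounded pipeline with `wait` keeps the script alive until it finishes.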
Turning to submission 7542260 (file syn7477495) we first note that it uses the same preprocessing container as did the previous submission:
```
docker.synapse.org/syn7357809/preprocessing@sha256:9ffd51a10fd4b90fa1da2f85d18d4c87a26dad304f416731862d0f25d6c69212
```
Since the preprocessing step was the same it was not repeated. Since it was not repeated there were no output logs to capture.
Turning to the training step of submission 7542260: We do have an archived container for this model. Once again we see that it has no logs.
Since there were neither preprocessing nor training logs for submission 7542260 we did not create a log folder.
In short, the system appears to be working as designed.
Hope you find this feedback helpful.
Thank you for rerunning the model, Bruce. It appears that preprocessing completed here and that the training step produced no output.
Could you rerun file syn7358071? This is the example for preprocessing and training Dockers using Caffe, unchanged. It appears the training Docker here does not produce outputs either.
> Is it that TrainingV1 and TrainingV2 Dockers are not producing any outputs, though completing without error? Thus receiving a "Model Training Completed" message?
Yes, that's a possibility. To check whether this is the case I took the liberty of rerunning your model, i.e. sending the file syn7477495 to the submission queue. The submission ID is 7488063. Log files are going to https://www.synapse.org/#!Synapse:syn7488072. (I shared the folder with you so you can see the output.) I already see that preprocessing logs are being collected. Will update this post when it completes.
Thank you for the description of the resolution to the issue of training logs.
It would seem I continue to have the same issue as before, however:
Resubmitting the updated training Docker (submission ID 7484049), first submitted yesterday as submission ID 7477496, again immediately produced an email notifying me of the submission's completion, but without a link to any logs.
As a test, I resubmitted the original training Docker (submission ID 7487095), and it produced the same result.
Admittedly, I cannot be sure of the results of the original submission of this training Docker (submission ID 7373553). Though a log was produced from this submission, it hit the log size limit, and I was unable to determine the outcome of the training.
Please excuse me if the description above is hard to follow. Here is the same information as a timeline:
10/22 21:30 PreprocessingV1-TrainingV1 (submission ID 7373553) - Was unable to see log of training steps
10/31 17:30 PreprocessingV1-TrainingV2 (submission ID 7477496) - No log
11/01 00:00 PreprocessingV1-TrainingV2 (submission ID 7484049) - No log
11/01 12:00 PreprocessingV1-TrainingV1 (submission ID 7487095) - No log
Is it that TrainingV1 and TrainingV2 Dockers are not producing any outputs, though completing without error? Thus receiving a "Model Training Completed" message?
All: We identified and have resolved (as of today, 10/31) a problem in which, after running a preprocessing+training submission, you receive just the training log file back. For such a submission you should now receive a folder with *two* .zip files, one for preprocessing and one for training. Please note we only return log files if your container actually produces output. If there is no output then we create no archive. Also note, we do not run your preprocessing step if it is the same as the preprocessing step in the previous submission. We simply reuse the cached result. When preprocessing is skipped there is no preprocessing log file.
I've just run into the same problem after pushing an updated training Docker and uploading a submission file pointing to this new training Docker as well as the original preprocessing Docker. The email message contains no link to a log, though the one for the original submission did:
Dear (mobileroaming):
Your Submission to the Digital Mammography challenge (submission ID 7477496) has completed its training phase. . Please direct any questions to the challenge forum, https://www.synapse.org/#!Synapse:syn4224222/discussion .
Sincerely,
Challenge Administration
@tschaffter @brucehoff
This problem still persists.
I submitted a new Docker that runs preprocessing again and then training.
I got emails saying both phases had completed, but the two log links (preprocessing and training) point to the same location:
------------------------------------------
The data preprocessing phase of your submission (submission ID 7437511) to the Digital Mammography challenge is in progress. Log files produced while your model is **pre-processing** the input data will be periodically uploaded here: **https://www.synapse.org/#!Synapse:syn7442500**.
------------------------------------------
Your Submission to the Digital Mammography challenge (submission ID 7437511) has completed its **training** phase. Your logs are available here: **https://www.synapse.org/#!Synapse:syn7442500**. Please direct any questions to the challenge forum, https://www.synapse.org/#!Synapse:syn4224222/discussion .
-------------------------------------------
Hence, I still cannot see what the training Docker has produced.
Could you please take a look at that?
Thanks.
Hi. I'm pretty sure, because the same Docker file has printed to the log file before.
The submission ID was 7435897.
Thanks.
Hi Kim,
I experienced similar issues a few days ago when testing the upcoming Express Lane. Are you sure that you are printing something to stdout or stderr? What was the submission ID?
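One quick way to check this locally is to count how many bytes the training entry point writes to stdout and stderr combined. The helper below is only a sketch: `output_bytes`, `quiet_train` and `noisy_train` are hypothetical names, and the two functions merely stand in for a real entry point such as `/train.sh`.

```shell
# Hypothetical helper: run a command and print the number of bytes it
# writes to stdout and stderr combined.
output_bytes() {
  # "tr" strips the padding that some wc implementations add.
  "$@" 2>&1 | wc -c | tr -d ' '
}

# Stand-ins for a training entry point (replace with the real script).
quiet_train() { echo "loss = 0.5" > /dev/null; }  # all output redirected away
noisy_train() { echo "loss = 0.5"; }              # output left on stdout
```

Here `output_bytes quiet_train` reports 0, the case in which there would be nothing to capture, while `output_bytes noisy_train` reports a non-zero count.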