Apologies if this is answered elsewhere...
The steps I took to submit are the following:
1. Pushed preprocessing and training docker images to my project (syn7458985)
2. Uploaded submissions file specifying the two images
3. Shared all of the above with dmchallengeit
4. Submitted the submission file to the Digital Mammography Model Training Express Lane
I have had no feedback, either by email or (as far as I can tell) on Synapse, the dashboard (I don't know where to find this), or anywhere else.
Is this normal, or am I missing something in the steps I took?
Created by jcb

> Can you please open the log file just for scoring on the EXPRESS lane?
Returning log files when scoring on the Express Lane is a good idea, as it's informative and doesn't risk leaking scoring data. We will work towards implementing this. In the meantime, I have contacted you privately to return the logs of your scored Express Lane submissions.
Hello Thomas,
It says I didn't make predictions for a chunk of patients, but I have no access to the log, so I have no idea where I could have missed them. Can you please open the log file just for scoring on the EXPRESS lane?

Dear Yuanfang,
I just rescored your submissions and you should have gotten emails about it now.
Best,
Thomas

> Bruce, can you please delete the content of the previous post? (The one where you listed all my jobs.)
Done.
> I meant the fast lane of SC1 scoring. I received no feedback for the two most recent submissions: no email, nothing whatsoever.

Our records show that you submitted five times and that the last two were scored. You should have received a notification. I will investigate why you have not.

Bruce, can you please delete the content of the previous post? (The one where you listed all my jobs.)

I meant the fast lane of SC1 scoring. I received no feedback for the two most recent submissions: no email, nothing whatsoever.

> I believe we have addressed your question this morning in another discussion thread.
Hello Bruce,
Yes, but you didn't address my other question: why I received no feedback from the test fast lane. By now, if it really runs on only 1% of the patients, it should already have been scored 8 times...
I hope to make a submission today to see some feedback before making a final submission on the 23rd.
Thank you
Dear Bruce,
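Thanks for investigating. I can confirm that Caffe writes its output to /modelState when I run my training image on my own system.

@yuanfang.guan I believe we have addressed your question this morning in another discussion thread.

Can you please take a look at this submission?

| Submission ID | Status | Status Detail | Cancel | Last Updated | Submitted Repository or File | File Version | Log Folder | Submitting User or Team |
|---|---|---|---|---|---|---|---|---|
| 7839042 | EVALUATION_IN_PROGRESS | IN PROGRESS | Cancel Requested | 12/13/2016 03:01:45AM | syn7359254 | 1 | syn7839055 | 3319559 |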
It has been hanging for an overly long time since yesterday, so I had to request a cancel, and it doesn't even respond.
Also, can you please credit back the dozen hours this job used? I am absolutely sure I didn't do anything wrong or use a large amount of memory. The machine just hung.

@jcb Your submissions to the training queue are below. The latest one is 7837226. I see the point you are raising: (1) there is no training log for the training phase of this submission, and (2) there is no 'model state'. There are two reasons this might happen: (1) (as you suggest) the training phase did not run, or (2) it ran but your code produced no output. I checked the server that ran your submission and the container is there. It ran (very briefly) 8 hours ago, and it produced no log output. This could be explained by your `/train.sh`, which appears to shunt STDOUT/STDERR to a file:
```
#!/usr/bin/env sh
##############################################
########## RUN THE TRAINING ROUTINE ##########
##############################################
# /modelState (writable) volume has been mounted
# /preprocessedData (writable) already mounted
# /trainingData (read-only) already mounted
# /metadata (read-only) already mounted
# Train an AlexNet model using preprocessed data (available in /preprocessedData)
# and pipe training output to a log file
export TOOLS=/opt/caffe/.build_release/tools
export LOG=/modelState/train.log
export MODEL=/solver.prototxt
$TOOLS/caffe train --solver=$MODEL 2>&1 | tee $LOG &
```
I'm not sure why your container writes nothing to /modelState. Can you try running your image locally and investigate further?
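One plausible cause worth checking (an assumption, not confirmed in this thread): the last line of the script above ends with `&`, which runs the `caffe | tee` pipeline in the background. `sh` then reaches the end of the script and exits right away. Assuming the platform runs `/train.sh` as the container's main command, Docker stops the container as soon as that process exits, and any background processes go with it. That would be consistent with a container that ran "very briefly" and left nothing in /modelState. Below is a minimal sketch of a revised `/train.sh` that keeps the pipeline in the foreground. The `run_training` function name, the environment-variable overrides, and the status-file workaround are illustrative choices, not part of the challenge template.

```shell
#!/usr/bin/env sh
# Hypothetical revision of /train.sh (a sketch, not the official challenge template).
# Key change: the caffe pipeline runs in the FOREGROUND (no trailing '&'),
# so the script, and therefore the container, stays alive until training ends.

# Same defaults as the original script; they can be overridden from the
# environment, e.g. when testing the image locally.
TOOLS="${TOOLS:-/opt/caffe/.build_release/tools}"
LOG="${LOG:-/modelState/train.log}"
MODEL="${MODEL:-/solver.prototxt}"

run_training() {
    # Fail early, with a message on STDERR, if the log directory
    # (normally the mounted /modelState volume) is missing or not writable.
    log_dir=$(dirname "$LOG")
    if [ ! -d "$log_dir" ] || [ ! -w "$log_dir" ]; then
        echo "ERROR: $log_dir is missing or not writable" >&2
        return 1
    fi

    # In POSIX sh a pipeline's exit status is that of its LAST command (tee),
    # so caffe's own exit status is saved to a temporary file instead.
    status_file=$(mktemp)
    # tee copies caffe's combined STDOUT/STDERR both to STDOUT
    # (where container logs are collected) and to $LOG.
    { "$TOOLS/caffe" train --solver="$MODEL" 2>&1; echo "$?" > "$status_file"; } | tee "$LOG"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Last command of the script: its exit status becomes the script's exit status.
run_training
```

To mimic the challenge mounts locally, one could bind-mount local directories at the paths listed in the script's comments, e.g. `docker run --rm -v "$PWD/modelState:/modelState" -v "$PWD/preprocessedData:/preprocessedData" -v "$PWD/trainingData:/trainingData:ro" -v "$PWD/metadata:/metadata:ro" <your-image> sh /train.sh`, where the local directory names and `<your-image>` are placeholders. If `/modelState/train.log` then fills up as training runs, the trailing `&` was likely the culprit.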
(Dashboard widget: submissions of user 3346599 to evaluation queue 7213944, with columns Submission ID, Status, Status Detail, Cancel, Last Updated, Submitted Repository or File, File Version, Log Folder, Submitting User or Team, and Model State File.)

Dear All,
I am finding that my preprocessing + training jobs never proceed to training. Whenever I try, execution falls back to the cached preprocessing job and then terminates and sends the notification email.
I have tried uploading new training images and modifying the submission file, but without success. Has anyone else experienced this problem?

A problem in our system that began last night (Wed., Dec. 7) caused submissions to the Express Lane not to be processed. The backlog grew to about thirty submissions. We have now fixed the problem and verified that the backlog is being cleared.
There was no impact on the main, long-running submission queue nor was there any data loss.
We apologize for the inconvenience and appreciate your continued participation.

Hi duhao,
I know Synapse was experiencing some trouble this morning. @brucehoff should be able to tell you more.
Thanks!

Hi Thomas @tschaffter, I ran into a problem similar to jcb's. After submitting my submission file as well as the example submission file, I received no feedback, emails, or information on the dashboard. I'm following the same steps as jcb. I'd be grateful if you could let me know where I'm going wrong.

OK, after three hours I received an error which should have been reported within 2 seconds:
`STDERR: IOError: [Errno 2] No such file or directory: '/trainingData/146747.dcm'`
So is the trainingData directory no longer mounted?

We are extending the time quota of the participants during the first round, so you can continue to train without being concerned by the time limit for now. An email will be sent later this week with more information. I'll also answer your other question regarding the predictions soon.

OK, thanks... but this is their lunch time. After their lunch, my quota will be used up.

Hi Yuanfang,
Bruce is the one who makes sure that the system keeps running.
@brucehoff: Can you have a look at that?
Thanks!

Hi Thomas, all my jobs today are stuck without any logs. What happened?

Dear Thomas,
Thank you for the information. The job did in fact commence after a few hours, and I now have status information as well.
Thanks again!

Hi jcb,
The steps that you listed seem correct.
First, can you submit this [example submission file](https://www.synapse.org/#!Synapse:syn7501865) (Caffe example) and let us know whether you receive an email saying that the run has completed successfully? If this works, please compare the example to your setup and try submitting your job a second time.

The Wiki page [Submitting a model](https://www.synapse.org/#!Synapse:syn4224222/wiki/401759) includes a dashboard right after the sentence "Your pending, current and former jobs are shown here:". However, the dashboard is initially invisible if it has no entries (the header is not displayed in that case). This is an issue that Synapse should hopefully correct soon. The plan is also to place the dashboards for the training and scoring queues on their own sub-pages, as we have done for the [Model Training Express Lane](https://www.synapse.org/#!Synapse:syn4224222/wiki/401759).
Thanks!