I was running a long preprocessing job. I waited 3 days for it to start, and it then ran for another 2 days, after which I received the error below. Running a new job (like "ls -la") to check the status, or rerunning it, will take many days.
I have 4 questions:
* Can you tell me whether this error is related to it being a preprocessing job that doesn't do any training or testing, and that doesn't produce a "submission"? Or did the preprocessing fail?
* Can I get access to the log files? During preprocessing I wrote progress info to the console.
* My web-based docker console page doesn't seem to show any log file info. Is that expected? Where are log files normally located?
* Can you see whether the conversion of all dcm files to the preprocessing folder succeeded?
Kind regards,
Thijs
Below is the email I received:
```
Your Submission to the Digital Mammography challenge, docker.synapse.org/syn7327563/sitmo0@sha256:7b98b464a005e39eb1329272915cec9be76fe848551ef0089f5bd4defe36069a, has stopped before completion. The message is:
org.apache.http.conn.HttpHostConnectException: Connect to bm17-dreamchallenge.sl851865.sl.edst.ibm.com:2376 [bm17-dreamchallenge.sl851865.sl.edst.ibm.com/169.44.28.59] failed: Connection refused
```
Created by Thijs van den Berg (sitmo)

Hi Bruce,
Thank you for answering and restarting the job.
I've thought some more about this situation in the meantime, and the unpredictability of the available computing resources makes it impossible for me to participate going forward, so I've unfortunately decided to leave this competition.
The problem is that I have a full-time job and a family, and hence very little time to work on this competition. To be effective I need predictable computing resources and short queues. I might have a new idea and 2 hours to spare, and what I want is to be able to write some code, run a job, and get feedback, all within that 2-hour timeframe. The other 22 hours per day, other people can use the resources. The job queues are currently days long and jobs crash, so it's impossible to do decent research. It would be a waste of time and a lot of frustration not to be able to run all the experiments that come to mind.
I wouldn't know how to solve this other than by renting more servers. With 600 participants, shouldn't you have 100-300 servers? I expect it's more like 10. I understand that running this infrastructure can be challenging (although I expected IBM and Amazon to be able to handle it), but this is setting me up for failure.

Thijs:
The server running your submission, 7328151, encountered an error. We just reinitialized it and it will be enqueued for another server. To address your questions:
> Running a new job (like "ls -la") to figure out the status, or rerunning it will take many days.
To see the status, look at the table under "Quotas and Limits" at the bottom of this page: https://www.synapse.org/#!Synapse:syn4224222/wiki/401759
> Can you tell me if this error has to do with it being a preprocessing job that doesn't do any training, testing and which that doesn't do any "submission"? Or has the preprocessing failed?
The failure was caused by a problem with the server processing your submission, not by your submission itself.
> Can I get access to logfiles? During the preprocessing I wrote progress info to the console.
Yes. When your submission runs properly you will be emailed a link to the log files captured from your running submission.
> My webbased docker console page doesn't seen to show any logfile info. Is that as expected? Where are logfiles normally located?
Not sure what your "webbased docker console page" is. When we run your submission you have no direct access to the container. You can only examine the log files we return and/or cancel it. You will be emailed a link to the captured log files.
> Can you see if the conversion of all dcm file to the preprocessing folder succeeded?
Once again, we have requeued your job to run. Once it starts running the logs will be returned to you and, if your process prints out this information, the logs should answer your question.
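As a minimal sketch (assuming a Python preprocessing script; the directory layout, the `.png` output extension, the function name `report_conversion`, and matching inputs to outputs by file stem are all hypothetical), a summary like the following printed at the end of preprocessing would make the returned logs answer the "did every dcm file convert?" question directly:

```python
import sys
from pathlib import Path

def report_conversion(src_dir, dst_dir, src_ext=".dcm", dst_ext=".png"):
    """Hypothetical helper: count source files that have a converted output.

    Assumes each converted file keeps the source file's stem, e.g.
    123.dcm -> 123.png. Files sharing a stem in different subfolders
    would be conflated; adjust the matching for your own layout.
    """
    sources = sorted(Path(src_dir).rglob(f"*{src_ext}"))
    converted = {p.stem for p in Path(dst_dir).rglob(f"*{dst_ext}")}
    missing = [p for p in sources if p.stem not in converted]
    # flush=True so the line reaches the captured log even if the job dies later
    print(f"converted {len(sources) - len(missing)}/{len(sources)} files", flush=True)
    for p in missing:
        print(f"missing: {p}", file=sys.stderr, flush=True)
    return len(sources), len(missing)
```

Calling it once after the conversion loop, for example `report_conversion("/hypothetical/input", "/hypothetical/preprocessed")`, prints one summary line plus one line per missing file.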
Hope this helps.