Hello,
My inference submission for SC2 with ID 8505132 has failed. It successfully passed the Express lane, and the last few lines of the log file do not point to a bug in the code; it simply stopped in the middle of the inference phase. Could you please relaunch the submission if the failure is not my fault?
Thank you
Yaroslav
Created by Yaroslav Nikulin (Therapixel) ynikulin

@brucehoff thank you for checking. My comment was merely for information, not a request to restart. Apologies.

There is one server, recently added to process inference submissions, which ran three submissions, all of which failed. One of those submissions was 8505132. Clearly there is a problem with the machine. The server has been taken offline and all of its submissions requeued (as is our policy whenever we detect that a submission failed because of our server). We will also take offline and investigate the other machines that were added at the same time, to ensure they are all configured and functioning correctly.
@smorrell, I see no submission from your team that ran on a suspect server. If you believe your submission failed due to an infrastructure problem, please share the submission ID and we will investigate.

Hi Stephen,
> CancerL pilot includes a . while CancerR does not, so the field dType becomes str vs int when pandas sets the dtype automatically.
This is correct for the Pilot Data; however, it is documented. The [Challenge Dictionary](https://www.synapse.org/#!Synapse:syn7214004) specifies for the fields `cancerL` and `cancerR` that a dot "." is the symbol used to represent missing elements. We also make the following suggestion in the [Cheat Sheet](https://www.synapse.org/#!Synapse:syn4224222/wiki/409763):
> Specify that "." represents missing data when reading the exams metadata and images crosswalk files (e.g. using R or Python/pandas). For example, specifying the symbol for missing values allows a parser to read age values as Integer ("." value will be set to NA) instead of String.
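To sketch that suggestion in pandas (the file name and column layout here are illustrative, not the actual challenge files), passing `na_values="."` lets the parser treat the dot as missing data so the cancer columns come back numeric instead of as strings:

```python
import io
import pandas as pd

# Hypothetical excerpt of an exams metadata file; "." marks missing values,
# as documented in the Challenge Dictionary.
raw = "subjectId\tcancerL\tcancerR\n1\t.\t0\n2\t1\t.\n"

# Without na_values=".", cancerL and cancerR would be parsed as str columns.
df = pd.read_csv(io.StringIO(raw), sep="\t", na_values=".")

print(df.dtypes)  # cancerL and cancerR parse as float64, with NaN for "."
```

Note that the missing entries become NaN, so the columns are float64 rather than int; if true integers are needed, they can be converted afterwards with pandas' nullable `Int64` dtype, e.g. `df["cancerL"].astype("Int64")`.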
> Also image 000000.dcm could become .png during preprocessing after stripping '.dcm'.
I'm not sure we can help you with that.
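For what it's worth, a minimal sketch of one way to avoid that pitfall (the helper name is ours, not part of the challenge code): `os.path.splitext` removes only the trailing extension, so a name like `000000.dcm` keeps its leading zeros and a `.png` file is left untouched:

```python
import os

def strip_dcm(filename):
    """Remove a trailing .dcm extension without touching the rest of the name."""
    stem, ext = os.path.splitext(filename)
    return stem if ext.lower() == ".dcm" else filename

print(strip_dcm("000000.dcm"))  # 000000
print(strip_dcm("000000.png"))  # 000000.png (unchanged)
```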
> I may have missed it, but it would be helpful if the organisers could (have) publish(ed) alphanumeric masks of the field contents, or similar descriptions.
I think that the [Challenge Dictionary](https://www.synapse.org/#!Synapse:syn7214004) provides a detailed description of the tables, while the wiki ([Cheat Sheet](https://www.synapse.org/#!Synapse:syn4224222/wiki/409763)) provides best practices to help you manipulate the files, both in the context of the Challenge and in your own experiments. Please let us know if you find any description of the data unclear, or if you have best practices that we can add to the wiki.
Hope this helps!
Thomas

Hi Yaroslav and Stephen,
We are currently investigating the issue.
Thanks!

I had similar problems with SC2 running fine on Express then failing on Leaderboard.
Are there any qualitative differences in the data? We found that some data items differ between datasets, which causes code to break. For example, `cancerL` in the pilot data includes a "." while `cancerR` does not, so the dtype becomes str vs. int when pandas sets the dtype automatically. Also, image 000000.dcm could become .png during preprocessing after stripping '.dcm'. These types of errors caused our code to fail on SC1 but run OK on the training set, etc.
I may have missed it, but it would be helpful if the organisers could (have) publish(ed) alphanumeric masks of the field contents, or similar descriptions.
HTH.
Stephen