Hello @brucehoff, @tschaffter ,
I submitted a Docker image to test it on the scoring express lane and it returned ERROR_ENCOUNTERED_WHILE_SCORING; however, the provided log file contains no STDERR lines, and the STDOUT lines show that the code was executing correctly.
Notes:
- The submission ID is 8032730.
- The Docker image was tested locally on the pilot set and completes successfully.
- I submitted the Docker image again (submission ID: 8032830), and it also resulted in ERROR_ENCOUNTERED_WHILE_SCORING with no STDERR, but the logs provided were different.
Inquiries:
1- Is it possible to review the full logs?
2- If an error occurred, do you terminate the Docker container? If so, can we know the process for terminating containers? Is it possible to let the container run to completion even if it encountered a non-fatal error?
3- Is it possible that the error log was not flushed to the file? Can we know the method you use to redirect the logs and save them to the file (e.g., do you use a command such as `sc1_infer.sh &> logfile`)?
Created by Wissam Baddar (wbaddar)

We believe the 137 exit code is caused by your submission running over the 20 minute limit. The 'error' is that we returned ERROR_ENCOUNTERED_WHILE_SCORING rather than STOPPED_TIME_OUT. We will address this and apologize for any confusion it caused. The exit code for 8054233 is 137.

@brucehoff, you could instead investigate my submission. When something goes wrong, there can be a thousand possible reasons; but since mine ran correctly on other lanes both before and after, you can narrow the problem down to this machine.

Dear @brucehoff,
Please note that I have submitted the model again with ID 8054233 and the same problem occurred. I would highly appreciate it if you could let me know the exit code for the error, so that I can try to pinpoint whether there is a problem in the code.
Thanks in advance.
> is it ok to submit the docker to the express again
Yes, there is no restriction on the number of times you may submit to the express lane.
> can I ask if the docker image size plays a role in generating the termination error mentioned?
I cannot say for sure without deeper investigation, but I would not think that image size per se affects memory usage.

Dear @brucehoff, is it OK to submit the Docker image to the express lane again to check what error comes out? It seems that the Docker logs don't flush the last lines into the log file, hence there is no error message.
I am not sure this is a memory problem, because we already did memory profiling locally and there were no memory leaks (plus the code is written in Python, not C/C++). Moreover, we use a Python subprocess for every N subjects so that we don't accumulate memory.
Also, can I ask whether the Docker image size plays a role in generating the termination error mentioned?

137 can be many things. The first one was a bug in the code. The second one was terminated by some system error on the server (the same submission that I submitted successfully the previous round, and immediately after on the express lane in this round, without any change).

Update: thinking about it now, it should be a time-out, but it failed to report the time-out. Did you change your code around that? "**says exit code is 137, which implies a kill -9 scenario**" http://serverfault.com/questions/252758/process-that-sent-sigkill --- it must be your side that killed it, but it didn't send the time-out error.

@yuanfang.guan: Submissions 8048091 and 8047711 terminated with '137' errors, which (according to a web search) mean that the container ran out of memory. We allocate the same memory to the container on the express lane as on the leaderboard, so it's not yet clear what this means. Wissam's containers were cleaned up some time ago, so I can't check the error codes now, but they may have suffered the same fate.

Update: I confirm that submitting the same submission from round 1 on the real lane received the same score as round 1. But the exact same submission, without a single letter changed, failed on the fast lane without any specific error.
Thank you @brucehoff for your response,
Dear @tschaffter, after reviewing the logs carefully, it seems that the Docker container is being killed abruptly, and there are no STDERR messages in the log.
Please note that I have run the SC1 Docker container twice; it was stopped at different locations (different subjects) each time, there were no error messages, and the logs show that the code was running correctly. Moreover, the container has been tested locally on a large set (multiple replications of the pilot set) and it runs without any errors or even warnings.
May I add that the log seems to be cut off while writing a string; this could indicate either an abrupt stop of the process or the log buffer not being fully flushed to the log file.

No, I am not able to solve it.
I encountered the same error, and I am submitting the same thing as in round 1. I think there might be a problem on the fast lane, so I submitted directly to the real lane and we will see what happens.

Hi Wissam,
Have you been able to solve your issue?
Thanks!

> 1- Is it possible to review the full logs?
Yes. You were sent the following email message:
```
Dear IVYCAD-KAIST:
Your submission to the Digital Mammography challenge (submission ID 8032730) has failed to complete its inference phase. The message is:
Error encountered during prediction. Last few lines of logs are shown below.
Your logs are available here: https://www.synapse.org/#!Synapse:syn8032785. Please direct any questions to the challenge forum, https://www.synapse.org/#!Synapse:syn4224222/discussion.
Sincerely,
Challenge Administration
```
The link in the message, https://www.synapse.org/#!Synapse:syn8032785, will take you to the full log file for your submitted model.
> 2- If an error occurred, do you terminate the Docker container? If so, can we know the process for terminating containers? Is it possible to let the container run to completion even if it encountered a non-fatal error?
The message "Error encountered during prediction." means that your container stopped itself, returning a non-zero error code. Once that happens we simply return the log file to you. We only stop your container if the time limit is exceeded (in which case the message emailed to you is "Model exceeded allotted time to run.") or, for *training* submissions, if you cancel the submission.
> can we know the method you use to redirect the logs and save them into the file
We use the `docker logs` command: https://docs.docker.com/engine/reference/commandline/logs/
ERROR_ENCOUNTERED_WHILE_SCORING: but no STDERR in the provided logs