Dear Organizers: My submission to the Digital Mammography challenge (submission ID 8225859) failed to complete its inference phase. The error message is: "Model exceeded allotted time to run." The submission had been running for several days before it was killed. I am wondering whether this is due to a technical difficulty (perhaps the sysadmins were upgrading the system and had to kill and restart all runs?) or some other reason. During Round 2 there was some discussion (https://www.synapse.org/#!Synapse:syn4224222/discussion/threadId=1295) about how long inference should be allowed to run, and Thomas Schaffter agreed that the organizers would set up a guideline for Round 3 and would allow all Round 2 submissions to finish. So I am wondering why this run was stopped (the log shows it was still making steady progress). This is the only submission we submitted to SC2 in Round 2. Would you please allow this submission to rerun and wait for its completion, so that we can get a score from it? Thanks!

Created by jkjk69
Dear organizers, thanks for the scoring effort! I am also very concerned about the apparently large variation in running time during the inference runs. Using my own experience as a reference point: I had two inference runs executing exactly the same container for the same sub-challenge, yet the average time per image differed by about a factor of 3! If there is that much variation in virtual machine performance, it is hard to know when we should stop optimizing for speed for the final round. Given the limited human time left in the competition, we can either spend that time optimizing final inference speed or spend it optimizing for a better AUC score. It would be very helpful, for example, to limit each physical machine to one job at a time for the **inference runs**. That would greatly reduce the variation in job run time, given that we do have a hard limit on inference run time. Thanks a lot!
Before we get to the bottom of the problem, one easy workaround is to force every inference machine to take ONLY ONE inference job at a time. That would avoid contention and **actually increase whole-system throughput to about 2.5 times its current level**. It is clearly a win-win for all of us.
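For illustration, here is a minimal sketch of what "one inference job per machine" could look like on the worker side, assuming a Linux host and Python. The lock-file path and function names are hypothetical, and the real fix would of course live in the organizers' scheduler rather than in participant code:

```python
import fcntl
import subprocess

# Hypothetical host-level lock file shared by all jobs scheduled onto this machine.
HOST_LOCK_PATH = "/var/lock/dm_inference_host.lock"

def run_exclusively(cmd):
    """Run `cmd`, but only after acquiring an exclusive host-wide lock,
    so at most one inference job executes on this physical machine at a time."""
    with open(HOST_LOCK_PATH, "w") as lock_file:
        # Blocks until no other inference job on this host holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

if __name__ == "__main__":
    # Example: wrap the container's inference entry point (path is illustrative).
    run_exclusively(["/bin/sh", "/sc2_infer.sh"])
```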
Please note that in the logs I posted above, split and stitch are pure CPU in-memory operations, i.e. no GPUs, no IO, and no locks are involved. You can basically view the split time and stitch time as a micro-benchmark. Yet the difference between the slow run and the normal runs is about 10 times! The difference between the fast run (inference express lane) and the normal runs is more than 50 times! I think that until this problem is fixed, enforcing the 8-day hard-stop policy is very problematic for getting a fair competition result.
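To make the comparison concrete, here is a minimal sketch of the kind of pure CPU in-memory split/stitch micro-benchmark this refers to. The array shape and tile size are made-up placeholders, not the values used in my container:

```python
import time
import numpy as np

def split_stitch_benchmark(n_iters=20, shape=(3328, 2560), tile=256):
    """Time a split (tile the image) and stitch (reassemble it) round trip.
    Pure CPU, in-memory: no GPU, no disk IO, no locks."""
    img = np.random.rand(*shape).astype(np.float32)
    t0 = time.time()
    for _ in range(n_iters):
        # Split: cut the image into equal-height horizontal tiles.
        tiles = [img[i:i + tile] for i in range(0, shape[0], tile)]
        # Stitch: concatenate the tiles back into a full image.
        stitched = np.concatenate(tiles, axis=0)
    elapsed = (time.time() - t0) / n_iters
    assert stitched.shape == img.shape
    return elapsed

if __name__ == "__main__":
    print("avg split+stitch time per image: %.4f s" % split_stitch_benchmark())
```

Running the same snippet on the different machine types would show whether the gap really is host-side CPU/memory contention rather than anything in our pipeline.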
I think 8 days is pretty tight for 128,000 images; it essentially allows only 5.4 seconds per image (8 days ≈ 691,200 s ÷ 128,000 images).

The much bigger problem is that there appear to be two different kinds of machines, with a HUGE performance difference. The following logs were collected from two very similar container runs involving exactly the same code. As you can see, the latter one runs at less than half the speed of the former. I wonder whether this is the difference between two cloud providers. BTW, in all runs there are two threads to leverage the two GPUs, so the actual seconds per image is half of the shown value.

Given this, whether our inference runs out of time or finishes in time depends purely on whether our inference job lands on the faster type of machine (or a machine that handles contention better), i.e. pure luck. The slower machines cost us machine time in the training stage, and, much more importantly, they would be a complete disaster for us in the inference phase.

Dear Organizers, can you look into this issue? We can provide more logs if needed. 8444322 is still running. It is highly desirable to level the playing field so that sophisticated models do not get penalized unfairly. Thank you very much!

8437355_z4_preproccessing_logs.txt (**normal one**):
run_predict() ] total done batch_size=120. total=671.49 avg_cost **5.60** second (prep=0.11, split=0.07, pred=3.14, stitch=0.07, write=0.92)
run_predict() ] total done batch_size=120. total=489.94 avg_cost **4.08** second (prep=0.10, split=0.04, pred=2.67, stitch=0.04, write=0.44)
run_predict() ] total done batch_size=120. total=532.05 avg_cost **4.43** second (prep=0.10, split=0.07, pred=2.88, stitch=0.07, write=0.65)
run_predict() ] total done batch_size=120. total=719.92 avg_cost **6.00** second (prep=0.10, split=0.21, pred=3.36, stitch=0.21, write=1.23)
run_predict() ] total done batch_size=120. total=749.33 avg_cost **6.24** second (prep=0.13, split=0.32, pred=3.44, stitch=0.32, write=1.39)
run_predict() ] total done batch_size=120. total=765.91 avg_cost **6.38** second (prep=0.13, split=0.17, pred=3.75, stitch=0.17, write=1.15)
run_predict() ] total done batch_size=120. total=522.13 avg_cost **4.35** second (prep=0.14, split=0.05, pred=2.81, stitch=0.05, write=0.48)
run_predict() ] total done batch_size=120. total=420.41 avg_cost **3.50** second (prep=0.09, split=0.07, pred=2.31, stitch=0.07, write=0.50)

8444322_z4_preproccessing_logs.txt (**slow one**, still running):
run_predict() ] total done batch_size=120. total=2035.30 avg_cost **16.96** second (prep=0.66, split=1.72, pred=8.76, stitch=1.72, write=3.71)
run_predict() ] total done batch_size=120. total=1707.06 avg_cost **14.23** second (prep=0.27, split=1.44, pred=8.42, stitch=1.44, write=2.16)
run_predict() ] total done batch_size=120. total=1515.85 avg_cost **12.63** second (prep=0.32, split=1.16, pred=7.21, stitch=1.16, write=2.22)
run_predict() ] total done batch_size=120. total=2069.70 avg_cost **17.25** second (prep=0.48, split=1.59, pred=9.10, stitch=1.59, write=3.75)
run_predict() ] total done batch_size=120. total=1955.29 avg_cost **16.29** second (prep=0.45, split=1.43, pred=8.79, stitch=1.43, write=2.87)
run_predict() ] total done batch_size=120. total=1741.35 avg_cost **14.51** second (prep=0.56, split=1.20, pred=8.19, stitch=1.20, write=2.70)
run_predict() ] total done batch_size=120. total=1786.10 avg_cost **14.88** second (prep=0.52, split=1.30, pred=8.10, stitch=1.30, write=3.05)

The next log was collected from an express-lane inference run of the same code, and it is even faster. I guess that is because the express-lane machines are idle most of the time, i.e. no contention.

8449487_z4_inference_logs.txt (**fast one**):
run_predict() ] total done batch_size=97. total=181.10 avg_cost **1.87** second (prep=0.10, split=0.02, pred=1.61, stitch=0.02, write=0.08)
run_predict() ] total done batch_size=97. total=252.07 avg_cost **2.60** second (prep=0.16, split=0.03, pred=2.22, stitch=0.03, write=0.11)
run_predict() ] total done batch_size=96. total=131.59 avg_cost **1.37** second (prep=0.07, split=0.01, pred=1.19, stitch=0.01, write=0.05)
run_predict() ] total done batch_size=95. total=191.01 avg_cost **2.01** second (prep=0.12, split=0.02, pred=1.71, stitch=0.02, write=0.08)
run_predict() ] total done batch_size=95. total=197.83 avg_cost **2.08** second (prep=0.12, split=0.02, pred=1.79, stitch=0.02, write=0.08)
run_predict() ] total done batch_size=94. total=201.49 avg_cost **2.14** second (prep=0.13, split=0.03, pred=1.82, stitch=0.03, write=0.09)
run_predict() ] total done batch_size=96. total=176.27 avg_cost **1.84** second (prep=0.10, split=0.02, pred=1.58, stitch=0.02, write=0.07)
run_predict() ] total done batch_size=97. total=179.96 avg_cost **1.86** second (prep=0.11, split=0.02, pred=1.59, stitch=0.02, write=0.08)
run_predict() ] total done batch_size=99. total=162.81 avg_cost **1.64** second (prep=0.09, split=0.02, pred=1.42, stitch=0.02, write=0.06)
run_predict() ] total done batch_size=99. total=206.41 avg_cost **2.08** second (prep=0.12, split=0.02, pred=1.79, stitch=0.02, write=0.08)

Here is the log of a local test run on my local machine (GTX 1080 + i7-2600K). It is more consistent and even faster than the express lane:
run_predict() ] total done batch_size=120. total=144.17 avg_cost **1.20** second (prep=0.17, split=0.01, pred=0.71, stitch=0.01, write=0.15)
run_predict() ] total done batch_size=120. total=155.11 avg_cost **1.29** second (prep=0.15, split=0.03, pred=0.75, stitch=0.03, write=0.19)
run_predict() ] total done batch_size=120. total=146.34 avg_cost **1.22** second (prep=0.15, split=0.02, pred=0.69, stitch=0.02, write=0.19)
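For reference, the per-stage numbers above come from simple wall-clock timing around each stage of a batch. Below is a minimal sketch of how such a log line can be produced; the stage functions (`prep_batch`, `split_batch`, `predict_batch`, `stitch_batch`, `write_batch`) are hypothetical stand-ins, not my actual implementation:

```python
import time

# Hypothetical placeholder stages; the real pipeline does preprocessing,
# tiling ("split"), GPU prediction, reassembly ("stitch"), and result writing.
def prep_batch(x):    return x
def split_batch(x):   return x
def predict_batch(x): return x
def stitch_batch(x):  return x
def write_batch(x):   return x

def run_predict(images, batch_size=120):
    """Time each stage of one batch and emit a log line in the format shown above."""
    start = time.time()
    cost = {}

    def timed(name, fn, arg):
        t0 = time.time()
        out = fn(arg)
        cost[name] = (time.time() - t0) / batch_size  # per-image seconds
        return out

    x = timed("prep", prep_batch, images)
    x = timed("split", split_batch, x)
    x = timed("pred", predict_batch, x)
    x = timed("stitch", stitch_batch, x)
    timed("write", write_batch, x)

    total = time.time() - start
    print("run_predict() ] total done batch_size=%d. total=%.2f avg_cost %.2f second "
          "(prep=%.2f, split=%.2f, pred=%.2f, stitch=%.2f, write=%.2f)"
          % (batch_size, total, total / batch_size,
             cost["prep"], cost["split"], cost["pred"], cost["stitch"], cost["write"]))
```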
I am in a similar situation to Yuanfang's: I estimate my model would take around 5-6 days during inference for 128k scans. I also plan to speed it up for the final round though :)
Hi Bruce, I just submitted a version that is going to run for 9.5 days. Would that be a problem? I will try to speed it up in the next submission, but I have other obligations, so that is not likely to happen this month.
I was able to create a valid predictions file from your log output and pass it to the scoring process. Looks like you did quite well.

> do you know what happened to the "/output/predictions.tsv" file in that container?

No, it doesn't make any sense to me.

> I am wondering whether one possibility is that /output/ failed to be mounted into the container from the host, so the /output/ directory we wrote into was only visible inside the container.

That's a sensible hypothesis, but it's not clear how that might happen, nor have other participants reported such a problem.

Going forward, our intention is to limit inference submissions to 8 days. (We will increase the limit in the 'final round' since the final validation set is 50% bigger than the leaderboard set.) Hopefully you can work within this time limit.

Good luck!
BTW, just to clarify: the last line of /sc2_infer.sh, "cat /output/predictions.tsv", is the only place we print the result to the log. So if you can see the predictions in the log, it means the prediction numbers you see there came from /output/predictions.tsv. I am wondering whether one possibility is that /output/ failed to be mounted into the container from the host, so the /output/ directory we wrote into was only visible inside the container. Thanks!
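For anyone debugging a similar NO_PREDICTION_OUTPUT situation, here is a minimal sketch of a check the inference program could run before and after writing the file. The helper name, the illustrative column header, and the use of `os.path.ismount` are my own suggestions, not part of the actual submission:

```python
import csv
import os

OUTPUT_DIR = "/output"
PRED_PATH = os.path.join(OUTPUT_DIR, "predictions.tsv")

def write_predictions(rows):
    """Write predictions as the last step, and log whether /output looks like
    a host mount so a missing-volume problem shows up in the submission log."""
    # If /output is a bind-mounted host volume, it usually appears as a mount point;
    # if this prints False, the directory may only exist inside the container.
    print("is /output a mount point?", os.path.ismount(OUTPUT_DIR))

    with open(PRED_PATH, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        # Illustrative column names; use the exact header required by the sub-challenge.
        writer.writerow(["subjectId", "laterality", "confidence"])
        writer.writerows(rows)

    # Confirm the file exists and is non-empty from inside the container.
    print("wrote", PRED_PATH, "size =", os.path.getsize(PRED_PATH), "bytes")
```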
Dear Bruce, thank you so much for your accommodation! As its last step, our Python inference program generates the output and writes it to /output/predictions.tsv. Then, as the final step of /sc2_infer.sh, we run "cat /output/predictions.tsv", which prints the result again to the log. Thanks a lot for retrieving and scoring them! BTW, do you know what happened to the "/output/predictions.tsv" file in that container?
"NO_PREDICTION_OUTPUT" means your code did not write results to the `/output` folder.   I see the predictions in the log file. It might be possible to retrieve and score them.
Dear organizer, my scoring run has finished, but it says "NO_PREDICTION_OUTPUT" (the submission ID is 8225859, last updated 03/12/2017 12:01:15PM, and the submitted repository is syn8221394). This scoring run was originally submitted in Round 2 for SC2 and timed out at that time; then, thanks to your consideration, I got to have the submission rerun. It finally finished after many days, and it would be really helpful to get some information from this run given how many resources it took. I am wondering if you could kindly take a look at the log and see what triggered the "NO_PREDICTION_OUTPUT" error? I had run the code on the express lane, and that worked fine. Thanks a lot!!
Thanks a lot for giving me a chance to finish the scoring run! Really appreciate that.
Adding @gustavo, @Justin.Guinney, @tschaffter to the thread.

> There is absolutely no performance/time guideline for inference submissions stated anywhere. ... Without knowing the per-image or per-case budget, it is impossible to make a trade-off between speed and accuracy.

Yes, you are right. We value your participation and offer our apologies for this situation, which (as explained before) was driven by our need to finalize the Round 2 leaderboard and free up servers for Round 3. Going forward we will work to give an explicit time cut-off for inference submissions.

The Challenge Leadership has directed that we restart your submission (8225859), though the result will not be added to the leaderboard. We hope that seeing the results will help you continue to develop your model for the final validation round.

> Other than the organizers, no one knows how many images are in the inference phase

There are about 128,000 images in the inference phase. The inference data set is about 40% the size of the training data set.
Dear Bruce,

> The servers running these submissions needed to be freed up for maintenance and use in Round 3

I think this is not fair. There is absolutely **no performance/time guideline** for inference submissions stated anywhere. Other than the organizers, no one knows how many images are in the inference phase, and no one stated that the maximum time allowed in the inference phase is 8 days. Thomas Schaffter (tschaffter) once said: 'For this round, we will do our best to process all the submissions as we did in Round 1.' Without knowing the per-image or per-case budget, it is impossible to make a trade-off between speed and accuracy. This hugely favours simple models and is very discouraging for sophisticated models.
Several submissions to the Digital Mammography inference queues were terminated for running too long. While the average run time for a Round 2 inference submission was 31 hours for Sub-challenge 1 and 51 hours for Sub-challenge 2, the following submissions ran for over eight days:
```
8224308
8225859
8225716
8225688
8222677
8149138
```
The servers running these submissions needed to be freed up for maintenance and use in Round 3. We appreciate your understanding.
Dear Organizers: I am wondering if you can address this concern about the timeout issue with my last scoring run, so that I can get a score for the challenge. Thanks a lot!

Scoring run timed out?