Hi @trberg
My submission is repeatedly showing "EXCEEDED TIME QUOTA" even though it barely takes 11 minutes on my local machine with 6 cores at 2590 MHz.
With the hardware provided in the challenge, it should not exceed the time limit.
One more weird observation: my submission ID 969032 was significantly more computationally expensive than the recent submissions (9696123, 9696081, 9696079), yet surprisingly it was validated in Fast Lane while the rest are failing.
Could you please look into the issue?
Thanks,
Shikhar Omar
Created by Shikhar Omar shikhar-omar

Hi all,
If you submit your models prior to the end of Round 2 today, and you are still getting an "EXCEEDED TIME QUOTA", we will work with your team to get those models run all the way through (possibly by relaxing the runtime constraints), but we will still count those submissions toward Round 2. We will be working over the next couple days to get those run.
See the most [recent announcement](https://www.synapse.org/#!Synapse:syn18405991/discussion/threadId=6374) for more info.
Thank you for your patience,
Tim

I have the same issue. On a local CPU we train in one hour, but in fast_lane it takes more than 4 hours of calculated time.

Hi @trberg
Thanks for looking into this issue, but the problem still happens to me: the training log was returned in a decent time, but no infer log or error was reported after that. Even the previously validated model cannot pass the fast lane test.
Best,
Jifan

Hi all,
Try and submit your models again. We cleaned up the validation server so your models should run in a decent time.
Thanks,
Tim
I have a similar issue.
Best,
Ari

Hi @trberg,
Our team is again having problems with the runtime and, I guess, with your hardware infrastructure.
On our local machines (desktops with no more than 8 GB and 4 real cores), our model finished the fast lane data in 4 minutes, including optimization of the parameters.
Unfortunately, in the fast lane it takes longer than 1 hour.
We uploaded the same model on Saturday (7.12) between 2 pm and midnight (European time) and it finished the fast lane successfully, but now the same model is again taking too much time.
Is there a solution to this from your side, and is there a possibility of getting a submission deadline extension due to this issue?
Best regards,
Stefan
This also happened sometime in the previous month on Fast Lane. The script ran fine locally (15-20 minutes), but the server timed out about 3 times. The issue was, however, resolved the next day.

Hi @trberg
I believe the previous issue happened to me again. The training log was generated as normal but then it was stuck in the infer stage.
On my local machine, it took a similar time (~6 mins) to run the training stage and the infer stage. On the server, the training log was generated in 20 mins, but nothing was returned after that until the EXCEEDED TIME error occurred. I also submitted my previously validated model again, and the same problem happened: the training log was returned in 5 mins, but no infer log was generated in the next 30 mins before I stopped the process.
Even if the infer stage takes too long, can we have access to the infer log?
The most recent submission id is 9696469, and its last update is 12/08/2019 5:03 PM, CST.
Many thanks for looking into this issue,
Jifan

Hi @trberg
This is the submission id = 9695909, submitted on 11/27/2019 11:50 AM, scored on 11/27/2019 6:24 PM.
Thank you!
Anas

Hi @belouali,
Would you mind giving me the submission ID for that slowed-down submission? Also, if possible, the date and time that it ran? We are looking into a possible cause of these problems and this information would be very helpful.
Thanks,
Tim

@ivanbrugere True, but I meant that we ran the modelling on the full synthetic data (same size as the UW data) on a VM with the same specs, and our scripts finished in less than 5 hours, whereas they time out in the UW environment.
We also noticed that earlier code, which runs on our VM in an hour on the full synthetic data, took 5 hours to complete and get scored on the UW data.
I think there is a considerable slowdown in the UW environment.

@ivanbrugere We also had one good run this morning on Fast Lane, but unfortunately the slowdown was back again on the Challenge run.
All following submissions were also slowed down again.

@shikhar-omar Our model ran to completion yesterday morning on Fast Lane and then on the Main Challenge, and scoring completed.
@belouali, our model completed Fast Lane in 37 minutes, and it scales linearly with input size. Consider that your model may not scale linearly, so the runtime on Fast Lane may not reflect the runtime on UW.

Same issue for our team. We are able to run models locally in decent times, but they fail with the "EXCEEDED TIME" error on the UW data.

We have encountered the same problems over the last few days. While we noticed slowdowns by a factor of 50-100 compared to local runs on a standard PC in the worst cases, the slowdown currently seems to be only a factor of 10-20. So it got better, but it's still really slow and we are unable to run our models sufficiently. It would be good to have this solved before November 9th.
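On the point raised above that Fast Lane runtime may not reflect UW runtime when a model does not scale linearly, one rough way to check this locally is to time the pipeline on a few subsample sizes and fit a power law to the results. The sketch below is only an illustration: the function names are hypothetical, the numbers in the example are made up, and it assumes runtime roughly follows `seconds = c * size ** exponent`, which may not hold for every model.

```python
import math


def scaling_exponent(sizes, seconds):
    """Least-squares slope of log(seconds) against log(sizes).

    A result near 1.0 suggests linear scaling; near 2.0 suggests quadratic.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def extrapolate(size_small, seconds_small, size_large, exponent):
    """Project runtime at a larger input size under the power-law assumption."""
    return seconds_small * (size_large / size_small) ** exponent


if __name__ == "__main__":
    # Made-up timings for illustration only; replace with your own measurements.
    sizes = [1_000, 2_000, 4_000]
    seconds = [10.0, 21.0, 43.0]
    k = scaling_exponent(sizes, seconds)
    print(f"estimated exponent: {k:.2f}")
    print(f"projected seconds at 40,000 rows: {extrapolate(4_000, 43.0, 40_000, k):.0f}")
```

This only estimates how your own code scales; it says nothing about any additional slowdown on the server side.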
@trberg Thanks for looking into the issue

@ggggfan Still not working for me.
@ivanbrugere Is it working for you?

@shikhar-omar @ivanbrugere I think the issue is resolved now.

@trberg Many thanks!

Thanks @trberg

Hi all,
We're looking into this. I'll update when we get this resolved.
Thank you for your patience,
Tim

Yes, since training logs are not returned by the system, I am unable to get to the root of the issue. I really need help in resolving this.
@trberg, as this is a common concern among other participants too, could you please help us resolve it?
Regards,
Shikhar I am having the same issue. On my local machine, my model runs in 15 minutes. When I run on the Fast Lane submission, it times out, and the logs show it was executing for an hour. However, the training logs are not outputted so I can't see my own time logging of the underlying models. I believe I am also having the same issue. I got the training log in ten minutes but no infer log was generated until the "EXCEEDED TIME" error. I only changed one parameter from the previous validated model.