Hello,
Our most recent submission (9711100) has "status: Accepted" and "UW status: Invalid". Previously, I was able to resubmit when my model was "UW status: Invalid"; this time, however, the submission system tells me my team has reached its quota.
Am I going to have to wait until the quota resets this evening, even though our model wasn't scored?
-Ryan

Quick update: I added very granular print statements and identified that it fails on a to_sql command. Info says the size is 1.1 GB.
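
A minimal sketch, not the team's actual code, of one way to keep a large to_sql write from binding every row at once: pass chunksize so pandas issues the inserts in batches. The table name, column names, and database path below are placeholders assumed for illustration.
```
import sqlite3

import pandas as pd


def write_in_chunks(df: pd.DataFrame, con: sqlite3.Connection,
                    table: str = "measurement_features",  # placeholder table name
                    chunksize: int = 50_000) -> None:
    # chunksize makes pandas write the rows in batches of `chunksize`
    # instead of preparing the whole insert at once, which bounds the
    # peak memory of the write step.
    df.to_sql(table, con, if_exists="replace", index=False, chunksize=chunksize)


if __name__ == "__main__":
    con = sqlite3.connect("/tmp/scratch.db")  # placeholder path
    df = pd.DataFrame({"person_id": range(10), "value_as_number": [0.5] * 10})
    write_in_chunks(df, con)
    con.close()
```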

Hi @trberg,
Thanks for the reply. I have tried a number of times with variations of this model, still without success. On my local box, I can build the Docker image with the corresponding hardware limits (e.g. 10 GB of memory, 4 CPUs) and successfully train and infer on the synthetic data. I've added print statements which suggest we are failing while loading the measurement data, even though we are only loading 5 columns. Any insight into why this runs fine on the synthetic data locally under the same limits, but fails on the synthetic data in your system, would be greatly appreciated.
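
A minimal sketch of the kind of granular print statements described above, assuming a Linux container: logging peak resident memory after each stage makes the step that crosses the RAM cap visible in the shared logs. The stage names are illustrative, not the actual pipeline.
```
import resource
import sys


def log_mem(stage: str) -> None:
    # ru_maxrss is the peak resident set size; on Linux it is reported in kB.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"[mem] {stage}: peak RSS ~{peak_kb / 1024:.0f} MB",
          file=sys.stderr, flush=True)


log_mem("start")
# ... two week condition query ...
log_mem("after condition query")
# ... load measurement data (only the columns actually needed) ...
log_mem("after measurement load")
```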
Cheers,
Ryan

Hi @rgodwin,
So 9711216 doesn't look like it failed from OOM. It failed on the synthetic data run. From the logs, I'm seeing some 404 errors. Sometimes the system can bug out for unknown reasons. Try that submission again, and if it fails, let me know and I'll take a deeper look.
Thanks,
@trberg

Hi @trberg,
I'm also working to troubleshoot our team's (Ryan-Squared(2)) failing submissions. Can you confirm whether the failure for 9711216 is the same as that for 9711100?
In that attempt we did indeed limit the columns loaded with Pandas, but we are still seeing what looks like an OOM failure; it's difficult to diagnose from the shared log files. If that run does fail in the same way, do you have any other suggestions? We have capped our Docker container's RAM and the model works fine on all the data we have access to, so it's not clear why it would fail from RAM issues unless the UW data set is significantly larger than the synthetic data available for testing.
Any insight you can provide would be greatly appreciated.
Cheers,
Ryan

Hi @rmelvin,
I've looked into some of your more recent submissions and it looks like your model is running over the RAM limit when it tries to load the measurement table.
```
two week condition query
Load measurement data
/app/train.sh: line 2: 8 Killed python /app/train.py
```
If you're using pandas, you could do a batched query or load just a couple of columns instead of the whole data table.
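
A minimal sketch of that suggestion, assuming the measurement data is queried through pandas' SQL interface; the connection path and the column names (standard OMOP measurement columns) are placeholders. The same idea applies to read_csv via its usecols and chunksize arguments.
```
import sqlite3

import pandas as pd

# Select only the columns the model actually needs, rather than the whole table.
QUERY = """
    SELECT person_id, measurement_concept_id, value_as_number
    FROM measurement
"""


def iter_measurement_chunks(con, chunksize: int = 100_000):
    # With chunksize set, read_sql_query returns an iterator of DataFrames of
    # at most `chunksize` rows, so peak memory stays roughly one chunk's worth.
    yield from pd.read_sql_query(QUERY, con, chunksize=chunksize)


if __name__ == "__main__":
    # placeholder path; assumes a `measurement` table already exists there
    con = sqlite3.connect("/tmp/example.db")
    for chunk in iter_measurement_chunks(con):
        # aggregate or filter each chunk here, keeping only the reduced result
        pass
    con.close()
```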
Are you still running into the submission quota issue?
Let me know if you'd like more help,
Thank you,
@trberg