Can you please implement periodic updates of modelState, as is done for the logs? I tend to train a set of models in one round (since the initialization/pre-processing time is very long), and when I stop in the middle the finished models are also lost.
Created by Yuanfang Guan yuanfang.guan
> when i stop in the middle the finished models are also lost.
I'm sorry, but I don't follow this: By "stop in the middle" are you saying you are requesting a job cancellation? By "finished models are lost" are you saying that you don't get your model state back when you cancel a training job?
If your code has written out its model state and you cancel the job, then the model state will be returned to you (unless you exceed the 1GB limit, in which case you will be informed of the issue by email). Of course, if you cancel your job before any state is written then there is nothing to return to you.
Hi, Thomas,
You don't have to upload every 5 minutes; you can offer something to upload every hour.
I intend to save several dozen models in a round, because my bottleneck is read-in time. Yesterday I cancelled a job in the middle, and I know that about half was done, so I lost the 10 models I had trained.
Also, can you please offer a way to track time usage? With this mode of training I know I am depleting my quota quickly, but I am not sure how much time I have left, so that I can plan the remaining, shorter submissions.
Hi Yuanfang Guan,
The current alternative is to submit shorter jobs and restart from a trained model that you have retrieved. Nonetheless, I'll check with Bruce to see if it's something that we can offer.
Thanks!
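For anyone following the workaround above (write out state as each model finishes, then restart from what was retrieved), here is a minimal sketch of how that could look. It is only an illustration under assumptions: `state_dir` stands in for whatever directory the platform returns as model state, `train_round`, `save_atomically` and `train_one` are hypothetical names, and JSON is a placeholder for the real model format.

```python
import json
import os
import tempfile


def save_atomically(obj, path):
    """Write obj as JSON via a temp file plus rename, so a job cancelled
    mid-write never leaves a half-written model file behind."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(obj, handle)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def train_round(model_names, train_one, state_dir):
    """Train each model in turn and save it as soon as it finishes.

    Models whose file already exists in state_dir are skipped, so a
    restarted job only trains what is still missing.
    Returns the names of the models trained in this call.
    """
    os.makedirs(state_dir, exist_ok=True)
    trained = []
    for name in model_names:
        path = os.path.join(state_dir, name + ".json")
        if os.path.exists(path):
            continue  # finished in an earlier (possibly cancelled) job
        save_atomically(train_one(name), path)
        trained.append(name)
    return trained


if __name__ == "__main__":
    # Hypothetical stand-in for the real training routine.
    def toy_train(name):
        return {"name": name, "weights": [0.0, 1.0]}

    demo_dir = tempfile.mkdtemp()  # placeholder for the model state directory
    print(train_round(["m1", "m2"], toy_train, demo_dir))
    # A restarted job with a longer list only trains the missing model.
    print(train_round(["m1", "m2", "m3"], toy_train, demo_dir))
```

With this pattern, a cancellation halfway through a round would lose at most the model being trained at that moment, provided the saved files stay under the 1GB limit mentioned above.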