Hello,
In the past when I've used the same preprocessing docker key in an express lane training submission that I use for the actual training queue, it didn't do the preprocessing step (because I assume it just used the one from training). However, now it did the preprocessing step. This eventually caused an error in further downstream processes because it recomputed validation and training splits, etc. which led to an error, whereas it ran fine before. Has something changed? Will this affect my preprocessing in the regular training queue at all?
Thanks,
Bill
Created by Bill Lotter bill_lotter
Thanks for the response Bruce! That makes sense. It seems like it was a mix-up on my end then at some point, sorry about that.
> In the past when I've used the same preprocessing docker key in an express lane training submission that I use for the actual training queue, it didn't do the preprocessing step
The caching of preprocessed data is done separately for the express lane and the leaderboard. This is because the two submission queues have different data sets. (The express lane data set is far smaller.) So the output of a Docker preprocessing submission run on one set cannot be substituted for that run on another set. Further, in our implementation the express lane and leaderboard are on completely different machines, with preprocessed data cached on the local disk (to provide high I/O performance). In short, there is no effect of using the express lane on your leaderboard results or vice versa.
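To illustrate the separation described above, here is a minimal sketch of two independent caches, one per queue, each keyed by the preprocessing Docker key. This is not our actual implementation: the class `PreprocessingCache`, its method `get_or_run`, and the in-memory dictionary standing in for the local disk are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class PreprocessingCache:
    """Hypothetical per-queue cache; a sketch, not the real challenge code."""

    def __init__(self, queue: str) -> None:
        self.queue = queue                 # e.g. "express_lane" or "leaderboard"
        self._store: Dict[str, str] = {}   # stands in for the machine's local disk
        self.runs = 0                      # how many times preprocessing actually ran

    def get_or_run(self, docker_key: str, run: Callable[[], str]) -> str:
        """Return cached output for docker_key, running preprocessing only on a miss."""
        if docker_key not in self._store:
            self._store[docker_key] = run()
            self.runs += 1
        return self._store[docker_key]


# Each queue has its own cache, so a hit on one never satisfies the other.
express = PreprocessingCache("express_lane")
leaderboard = PreprocessingCache("leaderboard")
```

Under these assumptions, a key that is already cached on the leaderboard still triggers a fresh preprocessing run the first time it is submitted to the express lane, and is skipped only on later express lane submissions with the same key.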
> now it did the preprocessing step
This is the correct behavior. If earlier you saw your preprocessing step being skipped, it was because you (or someone on your submission team) had previously run another submission with the same preprocessing step on the same queue.
> This eventually caused an error in further downstream processes because it recomputed validation and training splits, etc. which led to an error, whereas it ran fine before.
Sorry, I don't understand. If your algorithm is deterministic then there should be no effect of using cached preprocessing output vs. recomputing from scratch, aside from the additional time spent redoing the computation.
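As an example of what "deterministic" could mean here, the sketch below produces the same training/validation split every time it runs on the same set of IDs, so a recomputed split matches a cached one. I am assuming the split is made inside your preprocessing step; the function name `split_train_validation`, the fixed seed, and the 20% validation fraction are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    ids: Sequence[str],
    validation_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (train_ids, validation_ids), identical on every run for the same IDs."""
    # Sort first so the result does not depend on the order the IDs arrive in
    # (for example, directory listing order can differ between machines).
    ordered = sorted(ids)
    # A private, seeded generator keeps the shuffle reproducible and avoids
    # touching the global random state.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_validation = int(len(ordered) * validation_fraction)
    return ordered[n_validation:], ordered[:n_validation]
```

With a split like this, rerunning preprocessing only costs time; downstream steps see the same training and validation sets as before.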
> Has something changed?
There are no changes in our caching strategy.
If there are further questions, please ask.