Hi,
I'm trying to submit the model to the challenge. The model runs successfully in the fastlane, but fails in the actual challenge queue.
Email message says "Error encountered while running your Docker container"
The id of the most recent submission is: 9725572.
Please let me know if there is more information I should provide
Thanks!
Created by Chengcheng Mou ChengchengMou @ChengchengMou ,
From @jgolob :
> **Is the structure of the test data folder identical to the training data? For example, the training data folder contains metadata_normalized.csv, does test data folder also include it?**
> _Yes. They should be structured identically._
>
> **Should the prediction file be designed in a specific way? For example, type of features. ((participant,was_preterm, probability)**
> _The specification of the prediction file is on the [wiki (under Output Files)](https://www.synapse.org/#!Synapse:syn26133771/wiki/612555) and has not varied from that._
>
> **Does fast lane use bootstrap from training data?**
> _Yes. More specifically, it is a subset (without replacement) of the training data._
As for a sample script, we provided a [Python notebook](https://github.com/jgolob/ptb_microbiome_samples/blob/main/Random%20Forest%201.4-ForChallenge.ipynb) (in addition to the R sample code) that you could use as reference.
Hopefully that helps! I wonder if you can provide some script that can assist us understand the basic logic? I'm quite sure that i didn't use 'was_preterm/was_early_preterm/was_term/delivery_wk' in the model, and I write the file just like your R sample model in github. It outputs the prediction file in the fast lane.
I believe there are still a lot of people being stuck with just submitting the file to the challenge lane, and that might be one of the reasons why participant number is not large compared to the registrants number. Hi @vchung:
Thanks again for your response.
I'm sure that I've removed all four variables in the script, otherwise status of the fast lane would not show that there is a prediction file.
At the very beginning, I deployed your sample model using R to submit and it passed. I've checked the structure of my predictions file, which is the same as your sample model.
The most recent submission id is : 9725675
In terms of that, I have several questions:
1. Is the structure of the test data folder identical to the training data?
For example, the training data folder contains metadata_normalized.csv, does test data folder also include it?
2. Should the prediction file be designed in a specific way? For example, type of features. (participant,was_preterm, probability)
3. Does fast lane use bootstrap from training data? Otherwise, I can't get a reason why fast lane can output the prediction file, whilst the model cannot pass in the challenge lane
Thanks! @ChengchengMou ,
For submission ID 9725608, the following error was received:
```
Error in predict.randomForest(model, predictor, type = "prob") :
variables in the training data missing in newdata
Calls: predict -> predict.randomForest
Execution halted
```
I'm not sure what the missing training data variables are, but just in case these were used in your script, the following columns are not available in test set's metadata.csv:
* was_preterm
* was_early_preterm
* was_term
* delivery_wk
Hope this helps! Hi @vchung:
Thanks for your response!
I've checked the Fast Lane again. After fixing the above issues, my Fast Lane showed a predictions file was generated.
However, after submitting to the Challenge Lane, there is still some error.
The most recent submission id is 9725608
Please let me know if there is more information I should provide!
Thanks! Hi @ChengchengMou ,
For submission ID 9725572, the following error was received:
```
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Error in library(tidyverse) : there is no package called 'tidyverse'
Execution halted
```
Looking at your latest submissions to the Fast Lane queue also showed the same error.
That being said, we understand where the confusion came from: the [Submission Dashboard](https://www.synapse.org/#!Synapse:syn26133770/wiki/618023) only showed the submission status, not whether it passed validation as well. We have since updated the dashboard so that it includes the validation status of your model. Under **Passed Validation?**, you will either see:
* `VALIDATED` : a predictions file was generated
* `INVALID`: a predictions file was not generated
Hope this helps!