Dear organisers,
We have a question about the scoring subset. When building the pipeline to run on the real train and test data, we noticed that the time and event variables appear in both subsets of the synthetic data. Our pipeline relies on these two variables in both sets: in the train set to fit the survival model, and in the test set to drop them before the model makes its predictions. Will the scoring dataset have the same characteristics as these two datasets?
We are worried that an error will occur when the code is executed on the scoring subset or the test subset.
Best regards
Created by Jose Liñares-Blanco jlinaresb
Hi @jlinaresb,
1. Yes, there will be a period during which you can choose your final model and modify your submitted image as needed. However, this time only 1 submission is permitted, so please make sure to fine-tune your model during the submission phase.
2. We will provide only Harrell's C and the Hosmer-Lemeshow p-value, which you can see in your dashboard once we have evaluated your submission.
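For readers unfamiliar with the first metric, here is a minimal sketch of how Harrell's C can be computed for right-censored data. It is an illustration only, not the organisers' scoring code: the function name `harrell_c` and the example values are hypothetical, and tied event times are simply skipped, which a production implementation may handle differently.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified).

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    The pair is concordant when subject i also has the higher predicted
    risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: the third subject is censored (event == 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c(times, events, risk_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of subjects by risk.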
Best regards,
Pande
Thank you for your response, @pande. Just for clarification:
1) Will we be able to modify the code between the submission and scoring phase? Or do we just have to re-upload the image we choose among the 5 images?
2) What feedback will we have in the submission phase? Only the C and the Hosmer values calculated from the output of the model?
Best regards
Hi @jlinaresb,
I need to clarify this answer again.
During the validation phase, the scoring dataset will not include event and event time information for participants (and this set, even in its synthetic version, will not be available to participants during the evaluation).
Thus, during this phase, we will allow participants to choose the final model that worked best in the submission phase, and then modify it so that it no longer needs event and event time information when making predictions.
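One way such a modification might look is sketched below, assuming the pipeline currently drops the outcome columns unconditionally before predicting. The column names `event` and `event_time`, the feature names, and the helper `strip_outcomes` are all hypothetical; the real names in the challenge data may differ. The idea is to remove the outcome columns only if they are present, so the same predict step runs on the test set and on the scoring set.

```python
# Hypothetical outcome column names; adjust to the actual data.
OUTCOME_COLUMNS = ("event", "event_time")


def strip_outcomes(rows, outcome_columns=OUTCOME_COLUMNS):
    """Return copies of the rows without any outcome columns.

    Missing outcome columns are ignored rather than raising an error,
    so the function works whether or not the dataset contains them.
    """
    return [
        {key: value for key, value in row.items() if key not in outcome_columns}
        for row in rows
    ]


# Test-like data (outcomes present) and scoring-like data (outcomes absent).
test_rows = [{"age": 61, "bmi": 27.4, "event": 1, "event_time": 5.2}]
scoring_rows = [{"age": 58, "bmi": 24.9}]

test_features = strip_outcomes(test_rows)
scoring_features = strip_outcomes(scoring_rows)
```

Both calls return rows containing only the feature columns, so the model's predict step receives the same input layout in either phase.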
Best regards,
Pande
Hi @jlinaresb,
The scoring dataset will have the same characteristics as the test dataset.
During the validation phase, we will replace the test dataset with the scoring dataset (i.e., add an additional folder of scoring data). Please note that we will not merge the train and test data into a larger training set.
So the steps will remain similar between the submission and validation phases.
The difference is that, for the final scoring, we will score your submission on the scoring dataset.
The main reason for splitting the data into 3 categories is to keep the scoring system fair for everyone and to prevent participants from overfitting their models.
Best regards,
Pande