Hi organizers,
We have an INVALID submission (id: 9731246). The error message seems to indicate that the script is having problems reading the data (train or scoring).
Our script finds its input files by grepping for keywords (i.e. ```pheno```, ```taxtable```, or ```readcounts```) within each directory (```train/``` and ```scoring/```). We replicated the directory structure described [here](https://www.synapse.org/#!Synapse:syn27130803/wiki/620471), ran our model locally, and it works!
Is it possible that the ```train/``` or ```scoring/``` directory contains more than the three files listed in the wiki? If not, more information about the error would be helpful.
We want to confirm this before making another submission.
Thanks in advance!
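
To illustrate why an extra file in ```train/``` or ```scoring/``` could break keyword-based matching, here is a minimal Python sketch (not the submitted script). The function name `find_input_file` and the file names in the usage example are hypothetical; the idea is to fail with a clear message whenever a keyword matches zero files or more than one file, instead of silently picking the wrong one.

```python
from pathlib import Path


def find_input_file(directory, keyword):
    """Return the single file in `directory` whose name contains `keyword`.

    Raises ValueError if no file or more than one file matches, so that an
    unexpected extra file is reported instead of being read by mistake.
    """
    matches = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and keyword in p.name
    )
    if len(matches) != 1:
        names = [p.name for p in matches]
        raise ValueError(
            f"expected exactly one '{keyword}' file in {directory}, found {names}"
        )
    return matches[0]


if __name__ == "__main__":
    import tempfile

    # Hypothetical file names, used only to exercise the function.
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("pheno_example.csv", "taxtable.csv", "readcounts_example.csv"):
            (Path(workdir) / name).write_text("placeholder")
        for keyword in ("pheno", "taxtable", "readcounts"):
            print(keyword, "->", find_input_file(workdir, keyword).name)
```

With a check like this, a second file whose name contains ```pheno``` would produce an explicit error naming both files.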
Created by Jose Liñares-Blanco jlinaresb

Hi @jlinaresb,
The train, test, and scoring sets all have Event_time values ranging from 0 to 15 years, so yes, they include 0.
Best regards,
Pande

Hi,
First, thank you so much for the report. It is highly appreciated!
Regarding the error you reported in the previous answer, we are not surprised: the script is written for scoring data that lacks the Event and Event_time variables, so an error is expected when it is run on the test set, which does have those variables.
We understand that the first error refers to the handling of negative times, but our script already accounts for that possibility.
The error could instead be caused by samples with Event_time = 0. Could you tell us whether the train data contain samples with Event_time = 0? If so, we could modify the script to handle those samples as well.
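
As a rough illustration of the Event_time = 0 case, the Python sketch below (not our actual script) separates samples into positive, zero, and negative times. It assumes each sample is a dictionary with an `Event_time` key; the function name `split_by_event_time` and the example values are hypothetical. Whether zero-time samples should be dropped, shifted, or kept depends on the fitting routine, which is not specified here.

```python
def split_by_event_time(rows, time_key="Event_time"):
    """Partition samples into (positive, zero, negative) lists by event time."""
    positive, zero, negative = [], [], []
    for row in rows:
        t = row[time_key]
        if t < 0:
            negative.append(row)
        elif t == 0:
            # A filter written as `t < 0` would let these samples through.
            zero.append(row)
        else:
            positive.append(row)
    return positive, zero, negative


if __name__ == "__main__":
    # Hypothetical samples, used only to exercise the function.
    samples = [
        {"id": "a", "Event_time": 3.2},
        {"id": "b", "Event_time": 0},
        {"id": "c", "Event_time": -1.5},
    ]
    positive, zero, negative = split_by_event_time(samples)
    print(len(positive), "positive,", len(zero), "zero,", len(negative), "negative")
```

Counting the three groups separately makes it easy to see whether a filter on negative times alone would still leave zero-time samples in the training data.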
Best regards, and thanks

Hi,
Yes, indeed: your earlier first-round submissions (ids 9730494 and 9729604) also seem to have failed validation on the training set with the same error message.
The score on the training set was computed using exactly the same steps that we use for the test dataset and, now, the scoring dataset.
In short:
We run your submitted algorithm with the same command on all datasets; we only change the first argument, which points to the respective "working_directory".
For validation on the training set, we copy the train set (so there are two train-set folders) and rename the copy from `train` to `test`, so that your algorithm still sees exactly the same directory structure. I am not sure where this error stems from, because your last submissions (ids 9729772 and 9730514) work for both test and training validation.
Your final submissions are validated only on the scoring dataset, so it seems that your model failed in scoring validation.
For further troubleshooting, I tried running your submission with training on the train dataset and prediction on the test dataset (which I renamed to scoring to preserve the directory structure for troubleshooting purposes).
This did not produce the same error, but the model still failed, with this error:
`Error: ncol(x_train) == ncol(x_test) is not TRUE
Execution halted`
I checked ncol for train and test and it is the same, but please note that in the scoring dataset we removed two columns (event and event_time).
I suspect the error might come from how you exclude subjects or from how your survival model deals with negative event_time, for example how a negative event time is defined (event_time <= 0 or event_time < 0).
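
To illustrate how the two removed columns can trip a column-count check, here is a minimal Python sketch (not the submitted code). It assumes the outcome columns are named `Event` and `Event_time`; the helper names and the feature names in the usage example are hypothetical. Selecting features by name rather than by position gives the same feature set whether or not the outcome columns are present.

```python
OUTCOME_COLUMNS = ("Event", "Event_time")  # assumed names of the removed columns


def feature_columns(columns, outcome_columns=OUTCOME_COLUMNS):
    """Return the column names that are not outcome columns, in order."""
    outcome = set(outcome_columns)
    return [c for c in columns if c not in outcome]


def check_same_features(train_columns, other_columns):
    """Return the shared feature list, or raise if the two sets differ."""
    train_features = feature_columns(train_columns)
    other_features = feature_columns(other_columns)
    missing = [c for c in train_features if c not in other_features]
    extra = [c for c in other_features if c not in train_features]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    return train_features


if __name__ == "__main__":
    # Hypothetical column names, used only to exercise the functions.
    train = ["Age", "Sex", "Event", "Event_time"]
    scoring = ["Age", "Sex"]           # outcome columns removed
    test_like = ["Age", "Sex", "Event", "Event_time"]
    print(check_same_features(train, scoring))
    print(check_same_features(train, test_like))
```

A check of this kind reports which columns differ by name, which is more informative than comparing column counts alone.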
Please let me know if I can help further.

Hi,
Thank you, now the data can be read correctly.
The problem is that I now get the same error as when you tested the model with the train data. How do you run this prediction? I find it strange that the model works with the test data but fails validation on the train data.
Also, is it possible that this time it worked with the scoring subset and failed again with the train validation?
Sorry for the inconvenience.

Hi,
Thanks for clarifying this.
We did indeed have another file, hidden from participants, inside the same directory for evaluation purposes.
However, we have moved it to a different directory, so your script should now work with the structure.
The directory structure now matches what we've shared in the wiki.