Hi, I noticed that many of the cytogenetic features are mostly or completely missing in the validation data. For example, the numbers of missing values, out of 279 samples, are:

variable                     missing
CYTO_predicted_feature_12    279
CYTO_predicted_feature_05    268
CYTO_predicted_feature_08    268
CYTO_predicted_feature_17    198
CYTO_predicted_feature_13    187
CYTO_predicted_feature_15    167

Can we get a list of the CYTO features that are actually available in validation? I would rather exclude the unavailable ones from training from the beginning; otherwise I will have to refit models in the docker after removing these variables. Imputation makes no sense with this level of missingness. Thanks, DA
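
For anyone who wants to apply the same filter on their side, here is a minimal sketch of counting missing values per CYTO feature and keeping only the features below a missingness cutoff. It assumes the data has already been read into a list of per-sample dictionaries with `None` for missing calls. The function names, the `CYTO_` prefix argument, and the 0.5 cutoff are illustrative assumptions, not part of the challenge tooling.

```python
from typing import Dict, List, Optional

# One sample: column name -> value, with None marking a missing call.
Row = Dict[str, Optional[float]]


def missing_counts(rows: List[Row], prefix: str = "CYTO_") -> Dict[str, int]:
    """Count missing values per column whose name starts with `prefix`."""
    # Collect every matching column first, so a column that is absent
    # from a given row is counted as missing for that row.
    columns = {name for row in rows for name in row if name.startswith(prefix)}
    return {
        name: sum(1 for row in rows if row.get(name) is None)
        for name in columns
    }


def usable_features(
    rows: List[Row],
    max_missing_fraction: float = 0.5,  # assumed cutoff, tune as needed
    prefix: str = "CYTO_",
) -> List[str]:
    """Return the sorted columns whose missing fraction is within the cutoff."""
    if not rows:
        return []
    n = len(rows)
    counts = missing_counts(rows, prefix)
    return sorted(
        name for name, count in counts.items()
        if count / n <= max_missing_fraction
    )


# Toy data only: three made-up samples, not real challenge values.
toy_rows: List[Row] = [
    {"CYTO_predicted_feature_12": None, "CYTO_predicted_feature_13": 1.0},
    {"CYTO_predicted_feature_12": None, "CYTO_predicted_feature_13": None},
    {"CYTO_predicted_feature_12": None, "CYTO_predicted_feature_13": 0.0},
]
print(usable_features(toy_rows))
```

Note that a value of 0.0 is treated as an observed call, not as missing. Only `None` or an absent column counts toward missingness.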

Created by exquirentibus veritatem exquirentibus
Sorry, I did not realize that this question was in regard to question 1. No, unfortunately you cannot use expression data for that.
Hi, I thought expression data was not available for the validation datasets in Challenge 1. Are you saying that we can use expression data to predict the missing cytogenetic features in validation for Challenge 1? Thanks, DA
Dear DA, I have asked some folks with more background on these data to address this. What I can say is that some studies simply did not have FISH assays, or other assays targeting certain IGH translocations, so those calls are not available, and that is reflected in what you are seeing. If participants like, they can use expression data or LOH in WES data (though I think those all have FISH calls for cytogenetics) to approximate the cytogenetic regions you may be interested in. I know the annotation files have blinded column names, so that may not directly address your issue. Regards, Mike

CYTOGENETIC missingness in validation