Are there any data aspects not captured in sc1_SimmulatedValidation_ClinAnnotations.csv? Thanks.

```
LoadingModelOrOptionData....
csvFile=/test-data/sc1_Validation_ClinAnnotations.csv
ProcessingVCFfiles...
== Processing WES_mutationFileMutect
== Parsing VCF files WES_mutationFileMutect ==
== Missing 0 of 276
== Processing WES_mutationFileStrelkaIndel
== Parsing VCF files WES_mutationFileStrelkaIndel ==
== Missing 0 of 276
== Processing WES_mutationFileStrelkaSNV
== Parsing VCF files WES_mutationFileStrelkaSNV ==
== Missing 0 of 276
ProcessingClinicalData...
FormattingDataMatrix...
training=FALSE
Error in cbind_all(x) : Argument 3 must be length 185, not 276
Calls: format_dataMatrix -> -> cbind_all -> .Call
Execution halted
```

Created by Charlie Xia xia.stanford
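The error says the third block passed to the bind step has 185 rows while the others have 276. A minimal defensive sketch, assuming the gene matrix has sample IDs as row names and the clinical table is in CSV order: reindex the gene matrix to the full list of sample IDs before binding, so every block has one row per sample. `align_rows` is a hypothetical helper, not something in run-mm-sc1.R, and the toy example uses base `cbind` rather than whatever `format_dataMatrix` calls internally.

```r
# Hypothetical helper (not from run-mm-sc1.R): reindex a sample x gene matrix
# so it has exactly one row per clinical sample, in the clinical order.
align_rows <- function(mat, sample_ids, fill = NA_real_) {
  missing <- setdiff(sample_ids, rownames(mat))
  if (length(missing) > 0) {
    message("align_rows: ", length(missing), " of ", length(sample_ids),
            " samples have no row in the gene matrix")
  }
  out <- matrix(fill, nrow = length(sample_ids), ncol = ncol(mat),
                dimnames = list(sample_ids, colnames(mat)))
  hit <- intersect(sample_ids, rownames(mat))
  # Copy rows for samples that do have mutation data, matched by sample ID
  out[hit, ] <- mat[hit, , drop = FALSE]
  out
}

# Toy example: 3 clinical samples, but only 2 have a row in the gene matrix
clin <- data.frame(D_Age = c(60, 70, 65), row.names = c("s1", "s2", "s3"))
genes <- matrix(c(1, 0, 2, 1), nrow = 2,
                dimnames = list(c("s1", "s3"), c("VCF_GENEID_1", "VCF_GENEID_2")))
combined <- cbind(clin, align_rows(genes, rownames(clin)))
print(dim(combined))  # 3 3
```

Filling with `NA` rather than 0 is deliberate: a sample with no WES VCF is not the same as a sample with no mutations, so it is probably safer to let the model or an imputation step handle those rows explicitly.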
The next round (the validation round) will have the same columns populated as this round for sc1.
Hi Mike, it got scored. Thank you very much for the help! Also thanks for the hint; I can't do much tonight but will keep it in mind. Are there any RNA-Seq-based files available in the training set? Also, just to make sure: in the final round, which of these VCF and expression columns will have values? "MA_probeLevelExpFile","MA_probeLevelExpFileSamplId","MA_geneLevelExpFile","MA_geneLevelExpFileSamplId","RNASeq_transLevelExpFile","RNASeq_transLevelExpFileSamplId","RNASeq_geneLevelExpFile","RNASeq_geneLevelExpFileSamplId","WES_mutationFileMutect","WES_mutationFileStrelkaIndel","WES_mutationFileStrelkaSNV","RNASeq_mutationFileMutect","RNASeq_mutationFileStrelkaIndel","RNASeq_mutationFileStrelkaSNV","RNASeq_FusionFile"
Nope, only the RNASeq_mutationFileMutect column (which is actually HaplotypeCaller output for the RNA-seq-based calls, but I digress). Please be aware that because it is RNA-seq based, it can have a very different distribution of mutations, so folks might want to be more conservative with these. Good luck!
Got it. I will modify my code to pick up those RNASeq_mutationFileMutect files. Is there anything in these columns: "RNASeq_mutationFileStrelkaIndel", "RNASeq_mutationFileStrelkaSNV", "RNASeq_FusionFile"? Thanks for the help.
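To check which file columns actually have values (and whether there are 185 or 276 entries), here is a minimal sketch that counts non-blank entries per column. It assumes blank cells come through `read.csv` as empty strings or `NA`; `count_populated` is a hypothetical helper, and the toy data frame stands in for the real annotation file.

```r
# Hypothetical helper: count non-blank entries per file column of a
# ClinAnnotations-style data frame. Columns absent from the file count as 0.
count_populated <- function(df, cols) {
  vapply(cols, function(col) {
    if (!col %in% names(df)) return(0L)
    x <- as.character(df[[col]])
    sum(!is.na(x) & nzchar(trimws(x)))
  }, integer(1))
}

# With the real data this would be something like:
# df <- read.csv("/test-data/sc1_Validation_ClinAnnotations.csv", stringsAsFactors = FALSE)
toy <- data.frame(
  WES_mutationFileMutect    = c("a.vcf", "", NA),
  RNASeq_mutationFileMutect = c("", "b.vcf", "c.vcf"),
  stringsAsFactors = FALSE
)
counts <- count_populated(toy, c("WES_mutationFileMutect",
                                 "RNASeq_mutationFileMutect",
                                 "RNASeq_FusionFile"))
print(counts)
```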
OK, I don't have an answer yet, but we are getting there. There are 276 samples in this leaderboard round for sc1, 91 of which are DFCI samples that do not have WES-based mutation data but do have RNA-seq-based calls. That leaves exactly 185 samples with WES mutation data. I have checked, and the "RNASeq_mutationFileMutect" files are indeed there for the DFCI data in the /test-data folder. Is your code supposed to grab them? From what I can tell, your "process_vcf2gene" function doesn't seem to be grabbing them.
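One way to pick up the DFCI samples is to choose a mutation file per sample, falling back to `RNASeq_mutationFileMutect` when the WES column is blank, and to record which source was used so RNA-seq-based calls can be treated more conservatively, as suggested above. A minimal sketch, assuming blank WES cells for those samples; `pick_mutation_file` is hypothetical and not the actual `process_vcf2gene` logic.

```r
# Hypothetical helper: per sample, use the WES Mutect VCF if present,
# otherwise fall back to the RNA-seq based VCF; record which one was used.
pick_mutation_file <- function(df,
                               primary  = "WES_mutationFileMutect",
                               fallback = "RNASeq_mutationFileMutect") {
  is_blank <- function(x) is.na(x) | !nzchar(trimws(x))
  p <- as.character(df[[primary]])
  f <- rep(NA_character_, nrow(df))
  if (fallback %in% names(df)) f <- as.character(df[[fallback]])
  use_p <- !is_blank(p)
  use_f <- !use_p & !is_blank(f)
  data.frame(
    file   = ifelse(use_p, p, ifelse(use_f, f, NA_character_)),
    source = ifelse(use_p, "WES", ifelse(use_f, "RNASeq", "none")),
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(
  Patient                   = c("P1", "P2", "P3"),
  WES_mutationFileMutect    = c("p1_wes.vcf", "", NA),
  RNASeq_mutationFileMutect = c("", "p2_rna.vcf", ""),
  stringsAsFactors = FALSE
)
picked <- pick_mutation_file(toy)
print(picked)
```

The `source` column can then be used downstream, for example to down-weight or threshold RNA-seq-derived features differently.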
Thanks. My intuition is that some of the VCFs were not loaded or are missing. Are there 185 VCFs instead of 276 in the express lane? BTW, is there a way to get around the log size limit in the express lane so that I can debug?

```
22872 22880 22939 22963 22994 23017 1 2 2 2 2 2 2
== dim(vcfGeneMatrix)=185,18005
ProcessingClinicalData...
FormattingDataMatrix...
training=FALSE
Error in cbind_all(x) : Argument 3 must be length 185, not 276
Calls: format_dataMatrix -> -> cbind_all -> .Call
Execution halted
Logs truncated because it exceeded size limit of 50kb
```
OK, great, that was indeed the section I was confused by. I am going to go back in and look some more. Another thing that has changed is that we now provide the filtered VCFs, which could affect the 18005 features. The unfiltered files are still there. I'll let you know what I find.
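If you read the unfiltered files instead of the new filtered ones, a rough way to approximate filtering is to keep only records whose FILTER column is `PASS`. A minimal sketch in base R, assuming plain-text (uncompressed) VCFs; `read_pass_records` is hypothetical, and it only looks at the FILTER field, not whatever other criteria were used to produce the provided filtered files.

```r
# Hypothetical helper: return the data lines of a VCF whose FILTER column
# (the 7th tab-separated field) is "PASS" or "." (filters not applied).
read_pass_records <- function(path) {
  lines <- readLines(path)
  body <- lines[!startsWith(lines, "#")]
  if (length(body) == 0) return(character(0))
  fields <- strsplit(body, "\t", fixed = TRUE)
  filter_col <- vapply(fields, function(f) f[7], character(1))
  body[filter_col %in% c("PASS", ".")]
}

# Toy VCF: one PASS record, one filtered record, one with no filter applied
vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\t.\tA\tG\t50\tPASS\t.",
  "1\t200\t.\tC\tT\t10\tLowQual\t.",
  "2\t300\t.\tG\tA\t.\t.\t."
), vcf)
kept <- read_pass_records(vcf)
print(length(kept))
```

Keeping `.` alongside `PASS` is a judgment call; drop it from the set to be stricter.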
Hi Mike,

1. Yes, my submission was scored in round 2 for both Challenge 1 and 2.
2. I'm not sure what you are asking. Usually I use your simulated validation files to do a sanity check and submit the script when it is OK. If you are asking which files I load, there are a few; here is my Dockerfile:

```
COPY model-state-metadata.Rd /model-state-metadata.Rd
COPY xlib_extract_common.r /xlib_extract_common.r
COPY xlib_utils.r /xlib_utils.r
COPY xlib_cv.r /xlib_cv.r
COPY score_sc1.sh /score_sc1.sh
COPY run-mm-sc1.R /run-mm-sc1.R
```

My .Rd file is 4.9M as far as I can see.

3. No, it is running a full prediction. Are you referring to this section? I used it for the format check, and it should already be commented out.

```
#Note any field maybe missing
# if(opts$mode == 0) { #exit point, just test if we can generate the output file
#   n = nrow(opts$csvData)
#   scores = rpois(n, lambda=1)
#   flags = scores > mean(scores)
#   pred = data.frame( list(
#     study = opts$csvData[["Study"]],
#     patient = opts$csvData[["Patient"]],
#     predictionscore = scores,
#     highriskflag = flags
#   ) )
#   #print(pred)
#   write.table(pred, file=opts$outputFile, sep="\t", row.names = F, quote = F)
#   message("syn7222203TestDriveDone")
#   quit()
# }
```

Thanks, much appreciated. I did some research of my own and found that some samples got lost in one of my steps, but I'm still not sure why. Here is my latest attempt: https://www.synapse.org/#!Synapse:syn11310101

```
-Info: invoking command:/usr/lib/R/bin/exec/R --slave --no-restore --file=./run-mm-sc1.R --args -m 1 -d /model-state-metadata.Rd -r /test-data -o /output/predictions.tsv sc1_Validation_ClinAnnotations.csv
LoadingModelOrOptionData....
csvFile=/test-data/sc1_Validation_ClinAnnotations.csv
ProcessingVCFfiles...
== Processing WES_mutationFileMutect
== Parsing VCF files WES_mutationFileMutect
== Missing 0 of 276
== length(vgs[[c]])=276
== Processing WES_mutationFileStrelkaIndel
== Parsing VCF files WES_mutationFileStrelkaIndel
== Missing 0 of 276
== length(vgs[[c]])=276
== Processing WES_mutationFileStrelkaSNV
== Parsing VCF files WES_mutationFileStrelkaSNV
== Missing 0 of 276
== length(vgs[[c]])=276
== length(vcfGenes)=276
== dim(vcfGeneMatrix)=185,18005
ProcessingClinicalData...
FormattingDataMatrix...
training=FALSE
dim(inputs_df)inputResponseMatrix,inputClinMatrix,inputGeneMatrix,inputExprMatrix= 276 ,1 , 276 ,21 , 185 ,18005 , ,
 y , D_Age ,D_Gender ,D_ISS ,CYTO_predicted_feature_01 ,CYTO_predicted_feature_02 ,CYTO_predicted_feature_03 ,
 VCF_GENEID_numeric(0) ,VCF_GENEID_1 ,VCF_GENEID_2 ,VCF_GENEID_6 ,VCF_GENEID_68 ,VCF_GENEID_70 , ,
Error in cbind_all(x) : Argument 3 must be length 185, not 276
Calls: format_dataMatrix -> -> cbind_all -> .Call
Execution halted
```

The key lines: `== length(vcfGenes)=276` but `== dim(vcfGeneMatrix)=185,18005`.
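Since the log shows `length(vcfGenes)=276` but only 185 rows in `vcfGeneMatrix`, one plausible cause is that samples whose gene list is empty get dropped when the list is turned into a matrix. The `VCF_GENEID_numeric(0)` column also hints that an empty vector was turned into a string somewhere (in R, `as.character(list(numeric(0)))` gives `"numeric(0)"`). A minimal sketch, assuming `vcfGenes` is a named list of gene IDs per sample; `gene_lists_to_matrix` is hypothetical. Fixing the factor levels to all sample names makes `table()` keep a zero row for every sample.

```r
# Hypothetical helper: build a sample x gene count matrix from a named list
# of per-sample gene IDs, keeping a (zero) row for samples with no genes.
gene_lists_to_matrix <- function(vcf_genes) {
  samples <- names(vcf_genes)
  long_sample <- rep(samples, times = lengths(vcf_genes))
  long_gene <- unlist(vcf_genes, use.names = FALSE)
  # Fixed levels => table() emits every sample, even those with no genes
  counts <- table(factor(long_sample, levels = samples),
                  factor(long_gene, levels = sort(unique(long_gene))))
  matrix(as.integer(counts), nrow = length(samples),
         dimnames = list(samples, colnames(counts)))
}

vcf_genes <- list(s1 = c(1, 2), s2 = numeric(0), s3 = c(2, 2, 6))
gm <- gene_lists_to_matrix(vcf_genes)
print(gm)
```

The columns can then get the prefix the script already uses, e.g. `colnames(gm) <- paste0("VCF_GENEID_", colnames(gm))`.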
Hi Charlie, I have pulled your docker image and am trying to go through your code to help you quickly, since the deadline is tonight. Some questions:

1. Have you ever successfully run your docker image on the express lane before?
2. There are many things printed to your log file that do not have calls in your run-mm-sc1.R file. I do not see a call that sources any other files, so I think I am missing something. Your .Rd file has almost nothing in it, so I don't think that is where things are supposed to be. Is there any other file you are sourcing where these printouts (like "== Processing ....") are coming from?
3. Is your docker image just running a formatting check? Your model looks like it is based on a simple Poisson, unless I am missing something.

Usually I would not go digging into someone's code, but I want to see if we can address this before tonight's deadline.
Thanks for the quick reply. Please see 9647798_log.txt (https://www.synapse.org/#!Synapse:syn11309118). Do you mean I am grabbing RNA-seq VCFs instead of the WES VCFs provided in the "WES_mutationFileMutect", "WES_mutationFileStrelkaIndel" and "WES_mutationFileStrelkaSNV" columns? That sounds unlikely, but I will double-check.
Just a quick guess from the 185 vs. 276 numbers: something is going on with your grabbing of RNA-seq-based VCFs vs. regular VCFs, but I'd have to look deeper. Can you supply the submission ID?

Script OK with sc1_SimmulatedValidation_ClinAnnotations.csv but failed in ExpressLane1