Dear moderators, We found many (but not all) "NA" values in our prediction file for the express lane submission. The cause was that our model used genes that either do not overlap between the training sets and the express lane validation sets, or overlap but have many missing values. We have since identified the overlapping genes that have no missing values in both the training sets and the express lane validation sets. Is there a guarantee that those genes are also free of missing values in the validation sets of the main queue (formal submission)? If not, this is a concern, and participants may need to change their algorithms as soon as possible. For example, if the clinical factor ISS is used for prediction and ISS has many missing values in the main queue (formal submission) validation sets, this can lead to many "NA" values in the prediction file and a failed formal submission. In that case, ISS may become useless as a predictor. Thanks, Bruce

Created by Wei-Quan Fang deleapoli
Hi Mike, Thanks, and I look forward to seeing it. Bruce
Hi Bruce, That is acceptable. FYI, we will be using the forum posts from the first leader board round to create a frequently asked questions section on the wiki that will put many of these topics in one place. I'll send out an e-mail mentioning this FAQ page when the leader board scores are populated.
Hi Mike, Thanks for your detailed explanation. I agree with your point about solving this through algorithm adjustment. Regarding the link you noted, would it also be acceptable for participants to print the ratio of "NA" values (the mean of a binary missing-value indicator) for each gene of the validation data? Kind Regards, Bruce
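For concreteness, the per-gene "NA" ratio asked about above could be computed as in the sketch below. This is only an illustration: the function name `na_ratio_per_gene` and the data layout (one dict per sample, with `None` or an absent key meaning a missing value) are assumptions, not part of the challenge infrastructure.

```python
def na_ratio_per_gene(samples):
    """Return a dict mapping each gene to the fraction of samples missing it.

    samples: list of dicts mapping gene name -> expression value.
             A value of None, or a gene absent from the dict, counts as missing.
             (This layout is an assumption made for the sketch.)
    """
    # Collect every gene seen in at least one sample.
    genes = set()
    for sample in samples:
        genes.update(sample)

    n = len(samples)
    # The mean of a 0/1 "is missing" indicator is the missing ratio.
    return {
        gene: sum(sample.get(gene) is None for sample in samples) / n
        for gene in sorted(genes)
    }
```

The result contains only gene names and missing-value fractions, not prediction scores.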
Please note that it is acceptable to print out which genes overlap in your log file, as noted [here](https://www.synapse.org/#!Synapse:syn6187098/discussion/threadId=2377).
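A minimal sketch of logging the gene overlap is shown below. The function names, the logger name `gene_overlap`, and the idea of passing gene lists as plain Python iterables are assumptions for illustration; only gene names and counts are written to the log, never prediction scores.

```python
import logging

def overlapping_genes(train_genes, validation_genes):
    """Return the sorted list of genes present in both gene collections."""
    return sorted(set(train_genes) & set(validation_genes))

def log_gene_overlap(train_genes, validation_genes, logger=None):
    """Log how many training genes appear in the validation data, and which."""
    logger = logger or logging.getLogger("gene_overlap")  # hypothetical logger name
    train_set = set(train_genes)
    shared = overlapping_genes(train_set, validation_genes)
    logger.info("%d of %d training genes found in validation data",
                len(shared), len(train_set))
    logger.info("Overlapping genes: %s", ", ".join(shared))
    return shared
```

Calling `logging.basicConfig(level=logging.INFO)` before `log_gene_overlap(...)` would make these messages appear in the run log.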
Dear Deleapoli,   The M2Gen validation set was run with an "Access kit" for consistency with other cohorts from that institute. This kit uses WES probes instead of poly-A to pull down transcripts, which may lead to a difference in coverage between it and other expression datasets. For more details, one can review the [webinar](https://www.synapse.org/#!Synapse:syn10288544) or the [data descriptions](https://www.synapse.org/#!Synapse:syn6187098/wiki/449432).   We can review whether gene coverage was an issue for many teams after this round closes, and we will consider providing statistics on coverage in cohorts and possibly an overlapping gene list. There are, of course, ways for a predictor to address this algorithmically, but a simple, immediate fix for now could be setting the prediction score for samples that are missing many genes to the median of the remaining prediction scores. Alternatively, the missing gene itself can be set to a default value.   Please remember it is a violation of challenge rules to output prediction scores in a log file.   Kind Regards, Mike
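The two immediate fixes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not challenge reference code: the function names, the `predict` callable, the `max_missing_frac` threshold of 0.5, the default value of 0.0, and the use of `None` for missing values are all hypothetical. The sketch also combines both suggestions (default-fill when few genes are missing, median fallback when many are), which is one possible design rather than a prescribed one.

```python
from statistics import median

def fill_missing_genes(sample, default=0.0):
    """Replace missing (None) gene values with a default value."""
    return {gene: (default if value is None else value)
            for gene, value in sample.items()}

def predict_with_median_fallback(samples, predict, max_missing_frac=0.5):
    """Score every sample, falling back to the median score where needed.

    samples: dict of sample id -> dict of gene -> value (None = missing)
    predict: hypothetical model function taking a gene -> value dict
    Samples missing more than max_missing_frac of their genes receive the
    median of the scores computed for the remaining samples.
    """
    scores, fallback_ids = {}, []
    for sample_id, genes in samples.items():
        n_missing = sum(value is None for value in genes.values())
        if genes and n_missing / len(genes) <= max_missing_frac:
            # Few genes missing: fill them with the default and predict.
            scores[sample_id] = predict(fill_missing_genes(genes))
        else:
            # Too many genes missing: defer to the median of the others.
            fallback_ids.append(sample_id)

    if fallback_ids:
        if not scores:
            raise ValueError("no sample has enough genes to compute a median")
        fallback = median(scores.values())
        for sample_id in fallback_ids:
            scores[sample_id] = fallback
    return scores
```

With this approach every sample receives a numeric score, so no "NA" entries reach the prediction file.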
