Thanks for posting the pre-leaderboard results. However, I don't find them very helpful unless you provide more information, such as:
1. What represents the best performance, a low rank number or a high one?
2. What metric is being used to generate the ranks? For isoforms, is it Pearson or Spearman, log or linear? For fusions, is it sensitivity, specificity, or F1?
3. If these are real tumors, then how do you know what the specificity is for fusions? Sensitivity can be measured, since it sounds like you spiked in fusions and can use those as the gold standard. But if a program predicts that some other fusion transcript exists, how can you evaluate the specificity of that prediction without doing a wet-bench test for it? (See the sketch after this list.)
4. Why are you providing ranks? It would seem to be more useful feedback to provide the actual statistics.
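To illustrate the asymmetry in point 3 with a toy example (hypothetical fusion names, not anyone's actual calls): sensitivity needs only the spike-in truth set, whereas specificity needs a verdict on every extra prediction.

```python
# Toy example with hypothetical fusion names -- not the challenge's evaluation code.
spiked_in = {"BCR--ABL1", "EML4--ALK", "TMPRSS2--ERG"}     # known truth (spike-ins)
predicted = {"BCR--ABL1", "EML4--ALK", "GENE_X--GENE_Y"}   # one novel, unvalidated call

tp = len(spiked_in & predicted)        # spike-ins that were detected
fn = len(spiked_in - predicted)        # spike-ins that were missed
sensitivity = tp / (tp + fn)           # 2/3 -- computable from the spike-in list alone
print(f"sensitivity = {sensitivity:.2f}")

# GENE_X--GENE_Y counts against specificity only if it is truly absent from the
# sample; without orthogonal validation its status is unknown, so specificity
# (or precision) cannot be computed from the spike-in list alone.
```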
I know this project has been a lot of work, and I greatly appreciate your efforts. I just would like to understand what is going on.
Created by genomehacker

Dear @creason,
Can you be more specific about the fusions outside the spike-in transcripts? I am asking, because it is not trivial to decide what qualifies as a fusion:
- On the one hand, there are many cis-spliced transcripts between neighboring genes (a.k.a. read-through fusions), which would certainly validate via long-read sequencing, but which are irrelevant to cancer, because they can be observed in healthy tissue (a naive filter for this case is sketched below). @ndaniel raised this question here: https://www.synapse.org/#!Synapse:syn2813589/discussion/threadId=1785&replyId=10757
- On the other hand, there are aberrant transcripts, which are relevant to cancer, but which do not join two genes together, such as translocations which fuse a tumor suppressor with some intergenic region, or fusions of the sense strand of one gene with the antisense strand of another gene.
There are many more examples one could think of. I already mentioned some here: https://www.synapse.org/#!Synapse:syn2813589/discussion/threadId=1781
How will you handle such predictions?
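To make the read-through case concrete, here is a minimal heuristic sketch (hypothetical gene coordinates and thresholds; a real filter would also check for intervening genes and recurrence in normal tissue):

```python
# Hypothetical annotation: gene -> (chromosome, strand, start, end). Illustration only.
genes = {
    "GENE_A": ("chr1", "+", 100_000, 120_000),
    "GENE_B": ("chr1", "+", 125_000, 140_000),  # close downstream neighbor of GENE_A
    "GENE_C": ("chr7", "-", 500_000, 520_000),
}

def looks_like_readthrough(five_prime, three_prime, max_gap=200_000):
    """Flag a candidate cis-spliced (read-through) fusion: both partners on the
    same chromosome and strand, with the 3' partner a short distance downstream
    of the 5' partner in the direction of transcription."""
    chrom1, strand1, start1, end1 = genes[five_prime]
    chrom2, strand2, start2, end2 = genes[three_prime]
    if (chrom1, strand1) != (chrom2, strand2):
        return False
    if strand1 == "+":
        return 0 <= start2 - end1 <= max_gap   # 3' partner just downstream
    return 0 <= start1 - end2 <= max_gap       # minus strand: downstream = lower coords

print(looks_like_readthrough("GENE_A", "GENE_B"))  # True  -> likely read-through
print(looks_like_readthrough("GENE_A", "GENE_C"))  # False -> genuine rearrangement candidate
```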
Regards,
Sebastian

The metric being used to generate the ranks is Spearman correlation for isoforms and the F1 score for fusions. Higher Spearman or F1 scores receive a lower rank (best performance = rank 1). The purpose of this leaderboard is to give a preview of how the methods are performing on a subset of the spike-in data and, in doing so, an opportunity to adjust and tweak methods. For the final leaderboard and spike-in dataset, long-read sequencing will be used to confirm the presence of fusions outside of the spiked-in transcripts. I hope that answers all of your questions, but please let me know if there's anything else :)
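For concreteness, here is a sketch of how such scores could translate into ranks (toy numbers, not actual leaderboard values, and not the challenge's real evaluation code):

```python
from scipy.stats import rankdata, spearmanr

# --- Isoform sub-challenge: Spearman correlation vs. spike-in ground truth ---
truth     = [10.0, 250.0, 3.5, 80.0, 0.5]   # toy abundances
estimated = [12.0, 190.0, 5.0, 60.0, 1.0]
rho, _ = spearmanr(truth, estimated)
# Spearman operates on ranks, so log- vs. linear-scale abundances give the same value.

# --- Fusion sub-challenge: F1 from validated true/false positives ---
tp, fp, fn = 8, 2, 2
precision = tp / (tp + fp)
recall    = tp / (tp + fn)                   # = sensitivity
f1 = 2 * precision * recall / (precision + recall)

# --- Scores -> ranks: higher score gets the lower (better) rank ---
fusion_scores = {"method_A": 0.91, "method_B": f1, "method_C": 0.75}
ranks = [int(r) for r in rankdata([-s for s in fusion_scores.values()], method="min")]
print(dict(zip(fusion_scores, ranks)))       # {'method_A': 1, 'method_B': 2, 'method_C': 3}
```

Incidentally, because Spearman is rank-based, the log-vs-linear question from point 2 is moot: any monotone transform of the abundances yields the same correlation.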