I am slightly worried that the scoring is not giving us a lot of information. In particular, only one type of information is available to us: the GWAS data. Is there any chance that other types of enrichment data (e.g. pathway annotation) could be made available? I understand this is perhaps impossible now that the challenge is underway, but it has been mentioned that other data (such as pathway enrichment) would be taken into account. As far as I know, however, there is no feedback to the participants.
Second question: it occurred to me that breaking up large clusters into smaller ones could be beneficial. By breaking a large significant cluster in two, say, I may gain two significant clusters. I assume it is beyond the power of the scoring method to recognise that two clusters of closely entwined nodes are both enriched for the same or similar traits. Any thoughts? (This is not something I have done.)
Created by Stijn van Dongen speedwell

Hi Stijn
(1) I see your point, but it won't be possible to include additional information such as pathway enrichment before the challenge closes. Using GWAS data to compare module identification methods is quite novel, and one goal of the challenge is to see how well that works and whether the results are similar to other metrics such as pathway enrichment. As teams have already noted, one limitation is that the number of modules that are hits (i.e. significant) is quite low. In our exploratory analysis we found that enrichment in pathway annotation has the opposite problem: most modules show enrichment in some pathway. It seems that getting significant hits is easy with pathways and very hard with GWAS (as expected). While the challenge focuses on GWAS at this stage, I think papers written later on by teams and by us will include additional analyses.
(2) This question keeps coming up :) As teams don't know which modules are significant, they can't selectively break up just those to gain an advantage. The results so far show no systematic bias towards smaller modules, so in general breaking up clusters is not a strategy that gives an advantage. It could also be that the larger cluster is borderline significant, and when it is broken up into two modules neither is significant anymore. Or the two smaller modules are significant in the leaderboard set but not in the final set. I think the limited number of submissions makes it basically impossible to find out which modules are significant and whether breaking them up further would lead to a higher score. In general, if larger modules can be broken up into smaller modules that still show enrichment, this is encouraged by the scoring. That makes sense, because smaller modules are typically more useful for gaining biological insight and are easier to follow up experimentally.
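To make the borderline case concrete, here is a toy sketch. It does not reproduce the challenge's GWAS-based scoring; as a stand-in it assumes a simple hypergeometric over-representation test, and every number (background size, number of trait-associated genes, module sizes, hit counts, threshold) is hypothetical.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when a module of n genes is drawn from N genes,
    K of which are trait-associated (hypergeometric upper tail)."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical background: 10000 genes, 200 of them trait-associated.
N, K = 10_000, 200
# Hypothetical threshold: 0.05 Bonferroni-corrected for 100 modules.
alpha = 0.05 / 100

# One large module: 40 genes containing 6 trait-associated genes.
p_large = hypergeom_tail(6, 40, K, N)

# Split evenly: each half has 20 genes and 3 trait-associated genes.
p_half = hypergeom_tail(3, 20, K, N)

# Split unevenly: all 6 trait-associated genes land in one half.
p_concentrated = hypergeom_tail(6, 20, K, N)

for label, p in [("large module", p_large),
                 ("even half", p_half),
                 ("concentrated half", p_concentrated)]:
    print(f"{label}: p = {p:.2e}, significant = {p < alpha}")
```

With these made-up numbers the large module passes the threshold, but neither half does when the hits are spread evenly, so splitting loses the one hit instead of gaining a second. Only when the signal is concentrated in one sub-module does the split still yield a significant module, which is the case the scoring is meant to reward.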
Best
Daniel