Hi, thanks for organizing this challenge. Reading the scoring section, I'm left unsure how to weight one big module versus several small ones, assuming they're all correlated with the disease condition according to the GWAS data. How to assess different sets of predicted modules is not a simple question, but I don't understand the answer you've chosen. Are we to submit a ranked list of modules, with our score being how many are correct before a certain number are in error? That would suggest a large coherent module of just under 100 elements would score higher if it were broken into a number of small but relevant 3-element modules; that doesn't seem right. I'm also unsure about the status of lower-confidence modules we find: are you saying it doesn't hurt to include them, as long as we rank them lower or label them with a low confidence score? Balancing precision versus recall is a nontrivial question; can you give us more guidance on how you're thinking about it? Thank you, Phil Rennert

Created by Phil Rennert philrennert
Daniel, thanks for your extended, thoughtful post. This is an education for me, and I hope for anyone else reading it. So, if I understand correctly, we're not talking about single-effect genes like BRCA1: in your experience, it's combinations of genes in a module that cause the significant correlations, and likely any module that contains one of these combinations will show significant enrichment, and only those. It's not clear to me that network-analysis tools alone will be able to discriminate the disease gene combinations from other pairs of genes with the same network degree and links of the same strength, but it's worth finding out. Part 1 is to get our feet wet; there will be more to work with in sub-challenge 2, which cross-correlates networks. (And the actual gene combinations that have the effect may not even be linked in the networks, or only in a few of them.) But however this comes out, you'll get a large collection of submitted modules, some of which correlate with disease and some of which don't. You can then scan the collection for pairs, triples, 4-tuples, etc. of genes/proteins that preferentially correlate, and get some good indications of which combinations are the "active ingredients" that make the successful modules succeed. So you answered my original question: large modules are more likely to contain significant combinations and therefore score, but ideally we should break them down far enough that each submodule contains a significant combination, but no farther. My thinking in terms of coherent clusters probably doesn't apply here: it's the needles that matter, not the hay in the stack. Thanks again; looking forward to your preliminary analysis whenever other priorities allow. Phil
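A minimal sketch of the scan described above, restricted to gene pairs, might look like this. It is a hypothetical illustration, not anything the organizers have proposed. It assumes each pooled module is a set of gene IDs paired with a flag saying whether that module scored as significant. The function name `rank_gene_pairs` and the `min_support` cutoff are illustrative. Triples or 4-tuples would use `combinations(..., 3)` and so on, at a much higher cost for large modules.

```python
from collections import Counter
from itertools import combinations


def rank_gene_pairs(modules, min_support=2):
    """Rank gene pairs by how often the modules containing them are significant.

    modules: list of (genes, is_significant) tuples; genes is an iterable of
    gene IDs. Hypothetical input format, for illustration only.
    min_support: ignore pairs that co-occur in fewer modules than this.
    """
    total = Counter()        # number of modules containing the pair
    significant = Counter()  # number of significant modules containing the pair
    for genes, is_significant in modules:
        # sorted() gives each unordered pair a single canonical key
        for pair in combinations(sorted(set(genes)), 2):
            total[pair] += 1
            if is_significant:
                significant[pair] += 1
    ranked = [
        (pair, significant[pair] / total[pair], total[pair])
        for pair in total
        if total[pair] >= min_support
    ]
    # Highest fraction of significant modules first; ties broken by support, then name
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked


if __name__ == "__main__":
    # Toy data with made-up gene names: (A, B) appears only in significant
    # modules, while (C, D) appears in both significant and non-significant ones.
    toy_modules = [
        ({"A", "B", "C"}, True),
        ({"A", "B", "D"}, True),
        ({"C", "D", "E"}, False),
        ({"C", "D", "A"}, True),
        ({"B", "E"}, False),
    ]
    for pair, fraction, support in rank_gene_pairs(toy_modules):
        print(pair, f"{fraction:.2f}", support)
```

The `min_support` cutoff matters in practice. Without it, any pair seen once in a single significant module would tie for the top rank.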
Dear Phil, thanks for your posts. It's indeed a nontrivial question, and I think there are different valid answers. I think choosing the scoring function for this challenge is so tricky because there is no biological "gold standard" for predicted modules. As you say, large modules can often be divided into smaller modules, and biologically it makes no sense to say that one or the other is correct. Smaller modules provide a more fine-grained grouping of genes, whereas larger modules provide a more high-level view; depending on the goal of the analysis, one or the other can be more useful. In this challenge, we opted to reward fine-grained modules, as smaller modules are often more useful than large ones for gaining biological insight and guiding experiments. Also, if we have fine-grained modules, we can still merge them to produce high-level modules. Your example is not a case that typically occurs in our experience. If you have a module of size 100 that shows significant enrichment for a GWAS trait and you split it into 33 modules, you will likely end up with zero significant modules. That's because for very small modules, it's difficult to achieve significant enrichment after correcting for multiple testing. In our experience, a more typical example would be a high-level module of size 100 that shows significant enrichment because it contains 10 disease genes (large modules often have very weak but highly significant enrichment for disease genes). Suppose the module can be subdivided into 10 modules of size 10. Two of these modules now show significant enrichment because they contain all or most of the disease genes, while the remaining 8 modules don't show enrichment. According to our scoring, this more fine-grained representation would be rewarded, as it "zooms in" on the most relevant parts of the network (we count the total number of significant modules).
But if you subdivide these 10 modules further, you may well end up with zero significant modules. The rationale of our scoring is that the more significant disease modules we discover, the better. That is, we reward more fine-grained, smaller modules, as long as they still show significant enrichment. In other words, we want to discover as many disease modules as possible and don't care about the other modules that don't show enrichment (they might be relevant for other processes that are not related to the traits and diseases in our GWAS set). During the leaderboard phase, teams will be able to explore how well different module sizes do, and we will also post some of our preliminary analysis soon (sorry for the delay). We will definitely do additional analyses beyond the scoring based on the number of significant modules, and we're very interested in participants' ideas and feedback. Looking forward to continuing the discussion. Best, Daniel
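The size effect described above can be illustrated with a toy enrichment test. This is a sketch under stated assumptions, not the challenge's scoring pipeline. It swaps in a one-sided hypergeometric test against a hypothetical list of 200 disease genes out of 20,000. It applies a Bonferroni correction with one test per module when the genome is partitioned at a given module size. The names `hypergeom_tail` and `is_significant` and all of the numbers are illustrative.

```python
from math import comb

# Hypothetical setting, for illustration only: 20,000 genes, of which 200 are
# "disease genes" for some trait. The challenge itself scores modules against
# GWAS summaries, not against a fixed disease-gene list.
N_GENOME = 20_000
N_DISEASE = 200
ALPHA = 0.05


def hypergeom_tail(k, module_size, n_genome=N_GENOME, n_disease=N_DISEASE):
    """P(X >= k) for a hypergeometric X: the chance that a random module of
    module_size genes contains at least k disease genes."""
    upper = min(module_size, n_disease)
    hits = sum(
        comb(n_disease, i) * comb(n_genome - n_disease, module_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_genome, module_size)


def is_significant(k, module_size, n_tests, alpha=ALPHA):
    """Bonferroni-corrected one-sided enrichment test (one test per module)."""
    return hypergeom_tail(k, module_size) < alpha / n_tests


if __name__ == "__main__":
    # Each scenario: (label, disease genes in the module, module size,
    # number of modules tested when the genome is split at that size)
    scenarios = [
        ("size 100, 10 disease genes", 10, 100, N_GENOME // 100),
        ("size 10, 5 disease genes", 5, 10, N_GENOME // 10),
        ("size 10, 0 disease genes", 0, 10, N_GENOME // 10),
        ("size 3, 2 disease genes", 2, 3, N_GENOME // 3),
        ("size 3, 1 disease gene", 1, 3, N_GENOME // 3),
    ]
    for label, k, size, n_tests in scenarios:
        p = hypergeom_tail(k, size)
        print(f"{label}: p = {p:.2e}, significant = {is_significant(k, size, n_tests)}")
```

Under these made-up numbers, the size-100 and size-10 modules pass, while 3-gene modules holding one or two disease genes do not. Smaller modules mean more tests, so the corrected threshold drops at the same time as the achievable enrichment shrinks. A 3-gene module passes only if all three of its genes are disease genes (p ≈ 9.9e-7 against a corrected threshold of about 7.5e-6). That requires the split to happen to group three disease genes together.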
Is the score the number of modules, regardless of size? So if I have a 99-element module, all with significant association with the GWAS summaries (say these proteins are overexpressed in the disease condition), and I break it down into 33 3-element modules, do I then score 33 instead of 1? If so, what's the scientific justification? Wouldn't it be more useful to identify coherent modules of whatever size, subject to your 100-element limit?
The discovered modules will all be evaluated in the same way for their association with a number of phenotypes. As mentioned in the wiki, the confidence score of modules is not used for scoring. The score of a method is the number of modules with significant association with the GWAS summaries. We do not penalize methods for false discoveries.
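The rule stated above can be sketched for a single trait: count the modules with significant association, ignore confidence, and apply no penalty for false discoveries. The multiple-testing step is an assumption. This sketch uses a Benjamini-Hochberg adjustment across all submitted modules at an FDR of 0.05 for illustration, which may not match the challenge's actual procedure. The `Module` record, its fields, and the p-values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    # Hypothetical record; field names are illustrative only.
    module_id: str
    confidence: float  # submitted confidence, which this score ignores
    p_value: float     # raw enrichment p-value for one GWAS trait


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def score(modules, fdr=0.05):
    """Number of modules whose adjusted p-value is below the FDR cutoff.

    Confidence is never read, and non-significant modules are not
    subtracted: they simply do not count.
    """
    q_values = benjamini_hochberg([mod.p_value for mod in modules])
    return sum(q < fdr for q in q_values)


if __name__ == "__main__":
    # Made-up p-values; m2 has low confidence but still counts
    mods = [
        Module("m1", 0.9, 0.001),
        Module("m2", 0.1, 0.004),
        Module("m3", 0.5, 0.035),
        Module("m4", 0.99, 0.8),
    ]
    print(score(mods))  # 3
    # One extra non-significant module tightens the correction for m3
    print(score(mods + [Module("m5", 0.2, 0.9)]))  # 2
```

One consequence is worth noting under this assumption. Non-significant modules are never subtracted, but each one adds a test. Padding a submission with low-confidence modules therefore makes the correction stricter for the rest, as the second call shows.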

Disease Module challenge - How many modules?