Suppose there are two modules that share 3 proteins/genes. Should we assign the shared genes based on a ranking or scoring scheme, submit separate modules for each case (i.e., the modules without the overlap, plus a module containing the overlap), submit the modules separately without the overlapping genes, or is figuring this out part of the challenge?
Created by Nathan Johnson bionerd
Hi Dmitry
I agree that (disease) genes often participate in different functions, and could thus belong to different modules. This is particularly true for hubs in a network, which may be difficult to assign to a single module. Consider the following example, where the hub gene clearly plays a role in both modules, but in a non-overlapping module prediction it has to be assigned to a single module (here the blue module).
${image?fileName=overlapping%5Fmodules%2Epng&align=None&responsive=false}
[Seems my image doesn't show, [click here to view](https://drive.google.com/open?id=0By0cfZQ-vaa8eHpLa18zQVNqNnc)]
Now assume these are two disease modules, and the hub is associated with both diseases (disease-associated genes are shown in red for two diseases A and B). In this case, the hub would only contribute to enrichment for the blue module in disease B. However, the orange module may still show significant enrichment for disease A if it comprises enough other disease-associated genes.
If we allowed overlapping modules, the hub could contribute to enrichment in both modules. This could make sense in this particular example, but there are other scenarios where it could lead to overcounting of the same signal.
I'm not sure I understand what you mean by "module assignment protocol"; we don't assign genes to modules. We simply test each module for enrichment in disease-associated genes. The way this is done is described on the Challenge Scoring wiki page.
The majority of clustering / network module identification methods used in practice give non-overlapping modules. Personally, I think we should promote methods for overlapping modules, hopefully in a future challenge. The difficulty is that overlapping modules complicate the scoring because of potential overcounting. I'd be happy to discuss if someone has ideas to address these issues. (E.g., imagine a participant added the 50 most densely connected genes of the network to all her modules simply because these hubs tend to be disease genes [I'm not saying that's the case, just a thought experiment]; this could lead to many disease modules where enrichment is always driven by the same 50 genes. We discussed different possibilities to address such issues but didn't come up with a satisfactory solution.)
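To make the thought experiment above concrete, here is a minimal Python sketch with toy data. It only counts raw overlaps between modules and a disease gene set; it is not the enrichment test used for scoring (that is described on the Challenge Scoring wiki page). All gene names, module names, and the helper functions `disease_hits` and `driven_only_by` are hypothetical.

```python
def disease_hits(modules, disease_genes):
    """Count how many disease-associated genes each module contains."""
    return {m: len(genes & disease_genes) for m, genes in modules.items()}


def driven_only_by(modules, disease_genes, shared):
    """Return the modules whose disease-gene hits all come from `shared`."""
    driven = []
    for m, genes in modules.items():
        hits = genes & disease_genes
        if hits and hits <= shared:
            driven.append(m)
    return sorted(driven)


# Toy data (assumed for illustration): three hubs that are all disease genes.
hubs = {"h1", "h2", "h3"}
disease_genes = {"h1", "h2", "h3", "d1"}

# Non-overlapping modules as a method might predict them.
base = {"m1": {"a", "b", "d1"}, "m2": {"c", "d"}, "m3": {"e", "f"}}

# The same modules after padding every one of them with the hubs.
padded = {m: genes | hubs for m, genes in base.items()}

base_hits = disease_hits(base, disease_genes)
padded_hits = disease_hits(padded, disease_genes)
hub_driven = driven_only_by(padded, disease_genes, hubs)

print(base_hits)
print(padded_hits)
print(hub_driven)
```

In this toy case only `m1` contains a disease gene before padding, whereas after padding every module does, and for `m2` and `m3` the signal comes entirely from the shared hubs.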
Best
Daniel
Hi Daniel,
Thanks for the clarification. Given that there are a few pleiotropic genes associated with a number of complex diseases, the question of how to assign such a gene is unavoidable (e.g., to which cancer types does one assign TP53?). However, in my view the assignment without genetic information becomes a rather arbitrary process, since from a biological point of view those genes are often essential in more than one disease module. So, essentially, one would be trying to "guess" the specific module assignment protocol that the organizers apply during the assessment. I wonder whether it would instead be useful to provide participants with the actual protocol the organizers use for their assessments, so that everyone uses the same protocol and can spend their submissions on figuring out the module definition rather than on pleiotropy reconciliation.
Cheers,
Dmitry
Hi
The modules in the submission must be non-overlapping. (This is to avoid potential over-counting of the signal of disease-associated genes that may be part of multiple modules).
For methods that produce overlapping modules, it is part of the challenge to find out how to best derive a non-overlapping set of modules. During the leaderboard phase, you could try different options. But the modules in your submission must always be strictly non-overlapping, i.e., any gene can appear at most once in the submission file.
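As one possible way to derive a non-overlapping set, the Python sketch below assigns each shared gene to the module that contains most of its network neighbours, then checks that no gene appears more than once. This is only an illustrative heuristic, not a rule prescribed by the challenge; the function names, the tie-breaking rule, and the toy network are assumptions.

```python
from collections import defaultdict


def make_non_overlapping(modules, edges):
    """Assign every gene to exactly one module.

    modules: dict mapping module id -> set of gene ids (may overlap)
    edges:   iterable of (gene_a, gene_b) pairs from the network
    A gene found in several modules goes to the one holding most of its
    neighbours; ties go to the smallest module id (an arbitrary choice).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    membership = defaultdict(list)
    for mod_id, genes in modules.items():
        for gene in genes:
            membership[gene].append(mod_id)

    result = {mod_id: set() for mod_id in modules}
    for gene, mod_ids in membership.items():
        # max() keeps the first maximum, so sorting makes ties deterministic.
        best = max(
            sorted(mod_ids),
            key=lambda m: len(neighbours[gene] & (modules[m] - {gene})),
        )
        result[best].add(gene)
    return result


def is_non_overlapping(modules):
    """True if every gene appears in at most one module."""
    seen = set()
    for genes in modules.values():
        if seen & genes:
            return False
        seen |= genes
    return True


# Toy network (assumed): a hub linked to three blue and two orange genes.
modules = {
    "blue": {"hub", "b1", "b2", "b3"},
    "orange": {"hub", "o1", "o2"},
}
edges = [
    ("hub", "b1"), ("hub", "b2"), ("hub", "b3"),
    ("hub", "o1"), ("hub", "o2"),
    ("b1", "b2"), ("o1", "o2"),
]

resolved = make_non_overlapping(modules, edges)
print(resolved)
print(is_non_overlapping(resolved))
```

Here the hub has three neighbours in the blue module and two in the orange one, so it ends up in the blue module only. Other criteria (e.g., edge weights or a method's own membership scores) could be tried during the leaderboard phase.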
Best, Daniel