i am thinking of letting a high-school kid play with this challenge, since it will be a good revision of middle-school math. i want to make sure that he can play as many times as he wants over the summer break.
thanks
y. g.
thanks so much dan for the timely reply. that is really informative. i wish you great success in organizing this challenge!!
> if A and B are close together, a good method should cluster them together
They can be clustered together, but if their gene scores are correlated due to LD, this has to be taken into account in the enrichment analysis. Otherwise, even random (simulated) GWAS will show significant enrichment for some modules, as we show in Fig. 3 of [Lamparter et al. 2016](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004714).
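As a toy illustration of this inflation (a simplified sketch, not the simulation from the paper): if gene scores are perfectly correlated within loci and a module is built from whole loci, a rank-sum enrichment test that assumes independent genes rejects far more often than the nominal 5%.

```python
# Toy illustration (not the paper's analysis): under a null "GWAS" where
# gene scores are perfectly correlated within LD loci, a naive enrichment
# test that treats genes as independent rejects far too often.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
n_loci, genes_per_locus = 2000, 5
module_loci = 10  # the module consists of 10 whole loci (50 genes)

n_sim, hits = 500, 0
for _ in range(n_sim):
    locus_scores = rng.normal(size=n_loci)
    # Perfect LD within a locus: every gene inherits the locus score.
    gene_scores = np.repeat(locus_scores, genes_per_locus)
    module = gene_scores[: module_loci * genes_per_locus]
    rest = gene_scores[module_loci * genes_per_locus :]
    p = mannwhitneyu(module, rest, alternative="greater").pvalue
    hits += p < 0.05

# With truly independent gene scores this would be ~0.05;
# with within-locus correlation it is much larger.
print(f"false-positive rate at alpha = 0.05: {hits / n_sim:.2f}")
```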
> do we get which ID corresponds to which gene ID/name?
No, the networks have to be anonymized because we include public GWAS data for evaluation. You're not expected to avoid having neighboring genes on the genome in the same module; the evaluation method just makes sure that the signal at a single locus is not over-counted, i.e., that module enrichment p-values are not inflated.
--daniel
thanks dan.
> In this case, we merge/fuse these module genes and compute a new score that takes the full LD structure of the locus into account. This happens more frequently than one might think because neighboring genes on the genome are often connected in the networks, e.g. because they're co-expressed.
but does that make sense? if A and B are close together, a good method should cluster them together. but in fact, since they are together, they are not scored twice, are they? sorry, we don't really understand the logic.
do we get which ID corresponds to which gene ID/name?
- A submission refers to all six networks.
- We do pre-compute the scores for individual genes. However, if some genes in a module are not independent (in LD), we cannot use the pre-computed scores. In this case, we merge/fuse these module genes and compute a new score that takes the full LD structure of the locus into account. This happens more frequently than one might think because neighboring genes on the genome are often connected in the networks, e.g. because they're co-expressed.
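A minimal sketch of this merge/fuse logic (not the actual evaluation code; `fused_locus_score` is a simple placeholder for the expensive LD-aware recomputation):

```python
# Sketch of the module scoring logic described above -- not the actual
# evaluation code. `fused_locus_score` is a placeholder: the real method
# recomputes a score from SNP-level data using the locus's LD structure.

def fused_locus_score(genes, precomputed):
    # Placeholder stand-in: collapse genes in LD into one score so the
    # locus is counted only once (the real computation is far costlier).
    return max(precomputed[g] for g in genes)

def module_scores(module, precomputed, locus_of):
    """One score per independent locus in `module`.

    module      -- list of gene IDs
    precomputed -- gene ID -> pre-computed gene score
    locus_of    -- gene ID -> LD locus ID
    """
    by_locus = {}
    for gene in module:
        by_locus.setdefault(locus_of[gene], []).append(gene)

    scores = []
    for genes in by_locus.values():
        if len(genes) == 1:
            # Independent gene: the cached score can be reused as-is.
            scores.append(precomputed[genes[0]])
        else:
            # Neighboring genes in LD: merge and recompute (the bottleneck).
            scores.append(fused_locus_score(genes, precomputed))
    return scores
```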
--daniel
Does 1 submission refer to only 1 network or all 6 networks?
thanks for the feedback. dan.
i don't see why the gene score needs to be recalculated every time. --oh, I see, THAT is the bottleneck..... that only needs to be calculated ONCE for each GWAS study, because it is the same independent of modules/submissions. output it into a text file, and then for all input modules you only need to read in an existing gene score list.
then you can finish in 1/10th of the time, right?
Thanks for the feedback, I agree with your points.
Computing resources for this challenge are generously provided by Vital-IT from the Swiss Institute of Bioinformatics, and the evaluation is parallelized. However, evaluating a single submission takes on the order of 10 hours of compute time (on a single node), and this is already heavily optimized. Compute time depends on the number of modules submitted, but for six networks there will generally be on the order of 1000 modules, tested against dozens of GWAS datasets. While the evaluation in some other challenges is trivial (some error function), we essentially perform a GWAS pathway analysis (see 2.4 - Challenge Scoring; this involves re-computing scores for genes in LD for a given module and GWAS dataset, as shown in Panel b of the Figure). So we will have to make some compromise regarding the number of submissions.
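To put those numbers together, a back-of-envelope estimate (assuming "dozens" means ~30 GWAS datasets):

```python
# Back-of-envelope estimate from the figures quoted above; "dozens" of
# GWAS datasets is assumed to mean ~30 here.
modules_per_submission = 1000          # ~1000 modules across six networks
gwas_datasets = 30                     # assumption for "dozens"
total_seconds = 10 * 3600              # ~10 hours per submission, one node

tests = modules_per_submission * gwas_datasets   # 30,000 module-GWAS tests
print(f"{tests} tests, ~{total_seconds / tests:.1f} s per test")  # ~1.2 s
```

So the cost per module-GWAS test is already around a second; the expensive part is per module, not once per GWAS, which is why caching gene scores alone does not remove the bottleneck.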
For students, I would recommend a setting where they can also do some offline evaluation, e.g. using a graph metric such as modularity, or using benchmark networks commonly used in the field.
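For example, using networkx (a minimal sketch; the karate club graph stands in here for a real challenge network):

```python
# Offline sanity check on a toy benchmark network; networkx's built-in
# karate club graph stands in for a challenge network.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()

# Candidate module partition (here from a standard heuristic; in the
# challenge this would be your own method's output).
modules = greedy_modularity_communities(G)

# Modularity in [-1/2, 1]: higher means denser within-module connectivity
# than expected at random -- a cheap offline proxy for module quality.
print(f"{len(modules)} modules, modularity = {modularity(G, modules):.3f}")
```

Modularity is only a proxy for the GWAS-based score, but it costs nothing to compute offline between submissions.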
The final evaluation will use independent GWAS datasets, but the same networks (we might add some additional networks in a second leaderboard phase).
--dan
i just remembered: a big problem with a small submission limit is that people do not dare to use up their quota. e.g. in the alzheimer's challenge, we were given like a dozen submissions, and in the end we didn't use even half of them, because many things we wanted to try we didn't dare to try.....
thanks, dan. i think the limit should be 300 to 500. that's only 10 submissions per day over the summer holiday. i think it is necessary for this challenge, since we cannot train off-line. for a high-school student, i think he will need more than 50 trials just to figure out the submission requirements.
in the RA challenge there was no limit; we submitted over 500. we wrote a small piece of code to do automatic submissions periodically, so it is quite convenient.
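For reference, such a periodic-submission loop might look like the sketch below (the evaluation queue ID, project ID, and file names are placeholders; `login`, `store`, and `submit` are standard synapseclient calls):

```python
# Hypothetical periodic-submission loop using the Synapse Python client;
# the evaluation queue ID, project ID, and file names are placeholders.
import time
import synapseclient
from synapseclient import File

syn = synapseclient.Synapse()
syn.login()  # assumes credentials are configured (e.g. ~/.synapseConfig)

EVALUATION_ID = "0000000"   # placeholder: the challenge's evaluation queue
PROJECT_ID = "syn0000000"   # placeholder: your own Synapse project

for path in ["modules_v1.txt", "modules_v2.txt"]:  # files to submit
    entity = syn.store(File(path, parent=PROJECT_ID))
    syn.submit(EVALUATION_ID, entity, name=path)
    time.sleep(24 * 60 * 60)  # wait a day between submissions
```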
i don't think there would be a problem of overfitting, because the same overfitting can happen offline. among our hundreds of submissions, we know clearly which ones are overfitting, and the model we submitted originally ranked number ~50 on the leaderboard.
i think synapse is run on the amazon cloud. they can easily parallelize the evaluation if they want.
so are the networks independent in the final test, or are the GWAS datasets independent in the final test?
thanks a bunch!!
Hi
It would be great to have high-school students participate, I think this challenge is indeed well suited for student projects. Submissions are in fact limited, but the limit should be enough to play with different solutions (we are still working on the scoring scripts and need to fix the number; it could be 20 or 50, for example). The main reason for limiting the number of submissions is that evaluation is computationally intensive in this challenge, as all the modules from each of the networks have to be evaluated against dozens of GWAS datasets. Another reason is to avoid excessive tuning of the methods to the data of the leaderboard phase (although we do have independent data for the final evaluation that would reveal this).
**What do participants think, what should be the limit for the number of submissions?**
Best
Daniel
The answer seems to be yes. See below.
2.6 - Challenge Stages & Timeline
The challenge consists of a leaderboard phase and the final evaluation. During the leaderboard phase, teams can make repeated submissions and see their performance on each network to iteratively improve their methods. For the final evaluation, teams make their final submissions, which are scored after the challenge closes to determine the top performers. The leaderboards and the final evaluation use different sets of GWAS datasets for evaluation.
does this challenge allow infinite submissions?