A team that does not intend to participate in Sub-challenge 2 could in principle misuse its leaderboard to score additional submissions for Sub-challenge 1. Although the networks are anonymized differently, one could apply a given method to a single network from Sub-challenge 2 and submit the prediction to Sub-challenge 2 to learn the score for that network, without any real intention to participate in Sub-challenge 2.
Of course, a single submission in Sub-challenge 2 would only allow for scoring 1 network in this way, while a single submission in Sub-challenge 1 gives the scores for all 6 networks. Nevertheless, this is forbidden. As stated in the rules, any tricks to try to make more submissions than allowed are considered cheating, and suspicious behavior will be followed up, especially for the best performers. Module predictions for individual networks must be submitted to Sub-challenge 1. **Submissions to Sub-challenge 2 must be from a method that integrates information across more than one network.**
We are confident that teams are working hard to improve their methods instead of trying to find ways to cheat the system, but it's always good to make sure that the rules are clear.
Best, Daniel
Created by Daniel Marbach ======
may i know what is the eventual number of gwas?
======
I remember from the original description that the full collection of GWAS data is over 200; however, the test set (if I recall correctly) consists of only 68 datasets. Is this correct? thanks, daniel. i found it is consistently lower, due to the larger background. i am only surprised that my random submission beat 70% of the teams who are seriously submitting.
and about my recent submission, i am going to share the code for aligning the networks when i am back at work, since i found it could be a barrier for some students.
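Since that alignment code is not posted in the thread, the following is only a rough, hypothetical sketch of what such a step could look like, assuming each network is a tab-separated edge list (gene_a, gene_b, weight) and that the networks already share one anonymized gene-id space; file names and format are assumptions, not her actual code.

```python
# Minimal sketch: re-index several edge lists onto the union of their nodes so
# that one clustering method can be run on a set of aligned adjacency matrices.
import numpy as np

def load_edges(path):
    """Read 'gene_a<TAB>gene_b<TAB>weight' lines into a list of (a, b, w) tuples."""
    edges = []
    with open(path) as fh:
        for line in fh:
            a, b, w = line.rstrip("\n").split("\t")
            edges.append((a, b, float(w)))
    return edges

def align_networks(edge_lists):
    """Return the shared node order and dense adjacency matrices on that node set."""
    nodes = sorted({g for edges in edge_lists for a, b, _ in edges for g in (a, b)})
    index = {g: i for i, g in enumerate(nodes)}
    mats = []
    for edges in edge_lists:
        m = np.zeros((len(nodes), len(nodes)))
        for a, b, w in edges:
            i, j = index[a], index[b]
            m[i, j] = m[j, i] = w
        mats.append(m)
    return nodes, mats

# Usage (paths are placeholders):
# edge_lists = [load_edges(f"network_{k}.txt") for k in range(1, 7)]
# nodes, mats = align_networks(edge_lists)
```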
may i know what is the eventual number of gwas?
because i pruned most of my trees off to deal with multiple testing issues, even if you had an infinite number of gwas, i can still have at most 89 NS, but someone who submitted 3000 modules could have 3000. > hope that explains most of the bizarre behaviors/inconsistency on the leaderboard
Hi Yuanfang, I agree the scores in SC2 looked surprisingly low, but I did some tests over the weekend and there is no bizarre behavior/inconsistency. The scoring script of SC2 is correct. While the scores between SC1 and SC2 cannot be directly compared, the scores were low because it seems the multi-network methods did not perform well so far. Indeed, I saw that one of your submissions already got a better score now.
Scoring of random predictions is running, I will likely have results tomorrow to share on the wiki.
Best, Daniel i know part of the reason now, and i think this piece of info is useful to everyone. i submitted the same thing i did in the bonus phase to the new leaderboard, and the score doubled, almost tripled. the bonus phase approximates the result of sc2. and depending on the network, at one extreme a score of 20 in sc1 could correspond to 3 in sc2/the old leaderboard, and at the other extreme 15 corresponds to 10.
hope that explains most the bizarre behaviors/inconsistency on the leaderboard.
now the only bizarre thing is my random prediction of 7 in sc2, which is approximately equivalent to one of the highest scores that could now exist for a single network in sc1, raising the question of what the random distribution is, e.g. is a current top score of 77 really statistically different from a score of 30 or 40? "but dmitry, don't you think it is insanely weird that any of your submissions is only 1 higher than my random submission?? while in sc1 you are among the top.
i wonder what will happen if you submit your entry that scored 23 there. i wish i had a high-scoring entry, so i could submit and test..."
Yuanfang, yes it is very bizarre! I do not have a good intuition of what is going on in Sub-challenge 2. We are all puzzled: apparently the clustering methods that work so well for the single networks do not work at all for the combined network. We are now comparing the modules from each submission to see if that can be explained by the module structure. But you do have a very strong point here. Weird. > I second Yuanfang's concern about a weighted combination of the networks that would really look like one network.
I agree it's a concern. It's linked to the baseline, because of course if the SC2 prediction outperformed the best SC1 predictions, we wouldn't care much how the networks are weighted. Next week we will: (1) double-check that the SC2 scoring is correct, (2) discuss the structure of the challenge (baseline etc.), and (3) see if scores improve in the 2nd round with more submissions (I think these first 5 submissions were too few to devise a good strategy for SC2, because teams have to address both the issue of granularity/module size as in SC1, plus how to weight/integrate the networks).
--daniel > i wish i had a high number entry, so i can submit and test...
Yuanfang, I will do that test and post the result.
--daniel > We would like to submit a single network to complete the test of this hypothesis
Dmitry, feel free to make your single network submission, as it is obviously part of your work on SC2. I'm not worried about one submission, but about someone who would start making multiple single-network submissions to gain an advantage in SC1.
--daniel > also, it is too naive to assume that only i (and dmitry now) would consider doing it, or doing a variant of it. as i have said like a thousand times to different dream organizers, the difference is that i do it, and i admit it. and some others do it, they just do it. for example, when i do alignment, i give one network a weight of 0.99 and the others 0.002 each, is that ok? if 0.99 is not ok, then how about 0.98, 0.97, 0.9, what cutoff is ok?
I second Yuanfang's concern about a weighted combination of the networks that would really look like one network. There are many ways to boost a single network's signal, so I don't know what the best way to control it is. A more advanced control would be a requirement for the set of output modules to be different from any module output by the top-scoring single-network method. But this is just a rough thought.
Dmitry
but dmitry, don't you think it is insanely weird that any of your submissions is only **1** higher than my random submission?? while in sc1 you are among the top.
i wonder what will happen if you submit your entry that scored 23 there. i wish i had a high-scoring entry, so i could submit and test... Hi Daniel, just to clarify:
"The submissions by Dmitry and Yuanfang (and maybe others) were not with the intention to cheat. Also, they provide a score for only one network and are unlikely to result in a real advanatage for sub-challenge 1 (SC1). However, if someone would make such submissions to SC2 systematically over the following rounds, it could result in an advantage for SC1. For consistency, we thus ask everyone not to make single-network submissions in SC2 anymore."
We never submitted a single network for sub-challenge 2. What we tried is an integration of
- 6 networks
- 4 networks
- 3 networks
- 2 networks
All of them provided nearly identical results, hence our hypothesis about the noise-to-signal ratio. We would like to submit a single network to complete the test of this hypothesis, but we didn't, following your new rule.
Best,
Dmitry =================================
I understand that you are curious how well random predictions do.... The organizers can do such analyses much more efficiently and systematically, and we will definitely compare the single-network predictions from SC1 to the multi-network predictions from SC2.
=================================
i really need to know how random predictions are distributed. to me my random prediction is as good as my real predictions. i really want to know whether it is a one-time thing, or whether systematically we are just predicting at random.
thanks Great discussion, thanks everybody for your input.
* The submissions by Dmitry and Yuanfang (and maybe others) were not made with the intention to cheat. Also, they provide a score for only one network and are unlikely to result in a real advantage for sub-challenge 1 (SC1). However, if someone were to make such submissions to SC2 systematically over the following rounds, it could result in an advantage for SC1. **For consistency, we thus ask everyone not to make single-network submissions in SC2 anymore**.
* If someone already made some single-network submissions, it's not a big deal. Invalidating them now would not make a difference because the teams already know the scores, and only the final submissions are relevant for the ranking and analysis of results of the challenge.
* I understand that you are curious how well random predictions do, or how well individual networks do in SC2. However, making a few submissions is not a systematic analysis and is an inefficient way to explore these questions. The organizers can do such analyses much more efficiently and systematically, and we will definitely compare the single-network predictions from SC1 to the multi-network predictions from SC2. If you have ideas for an experiment, or think some additional information would help in the development of methods, it's better to ask us on the forum whether we could do the experiment and then share the results with everybody.
* We have always planned to use predictions from single networks as the baseline for SC2, but maybe we haven't thought it completely through. On the one hand, if the methods cannot improve over single networks, there's no point in using multiple networks, and this would be an interesting negative result. On the other hand, using the best performer on any single network of SC1 as the baseline for SC2 would put the bar exceedingly high! At the core of any given multi-network method there usually lies a standard module identification approach, let's say for some team that's approach A. Now imagine their multi-network method actually has merit in that it improves over predictions of approach A on single networks. But the top performer in SC1 is likely some other approach B, which might be impossible to outperform with a multi-network method based on approach A. Basically, to outperform the baseline in SC2 you may have to be the top performer in SC1 yourself, or collaborate with the top performer of SC1. However, such a collaboration would then give this team an unfair advantage over other teams in SC2 (we do not want the challenge to be decided by a game of strategic allegiances; I'll write a post on that next week). **A better solution would then be to make the final module predictions from SC1 available for a second phase of SC2**. We will discuss these ideas next week.
* For now, we continue with SC2 as planned. **Please do not be discouraged to participate in SC2, teams who make a serious effort (final submissions with writeups and code _corresponding to a multi-network method_) will be recognized and included in the Consortium even if the result is negative.**
* The evaluation committee will decide whether a method qualifies as a multi-network method or not based on the write-up and code. **You cannot "win" SC2 with a single-network prediction. However, the prediction may be based on only a subset of the networks (even just two networks)**. It seems the two PPI networks and the co-expression network are the most informative; maybe focusing on just these networks, and potentially the signaling network, is a better strategy than including all six networks.
* We will increase the number of submissions allowed for SC2 in the following rounds.
=========================
That would be important to know. One solution might be that for the final evaluation, each team's subchallenge 1 submissions are automatically submitted to subchallenge 2 as well after mapping sc1 anonymous ids to sc2 anonymous ids. Then the best single network solutions would become part of the subchallenge 2 baseline, and teams could focus on trying to beat those baselines by combining networks.
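For illustration only, the mechanics of such an automatic remapping might look like the sketch below; the mapping-file format (sc1_id, sc2_id per line) and the submission layout (module id, score, then gene ids per line) are assumptions, not the organizers' actual pipeline.

```python
# Hypothetical sketch: remap an SC1 module prediction onto SC2 gene ids using a
# per-network mapping table 'sc1_id<TAB>sc2_id' (format is an assumption).
def load_id_map(path):
    mapping = {}
    with open(path) as fh:
        for line in fh:
            sc1_id, sc2_id = line.rstrip("\n").split("\t")
            mapping[sc1_id] = sc2_id
    return mapping

def remap_prediction(in_path, out_path, mapping):
    """Rewrite a module file (module_id, score, gene ids per tab-separated line),
    translating gene ids and dropping any gene with no SC2 counterpart."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            module_id, score, genes = fields[0], fields[1], fields[2:]
            mapped = [mapping[g] for g in genes if g in mapping]
            if mapped:
                fout.write("\t".join([module_id, score] + mapped) + "\n")
```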
------------------------------------------
multiple testing issue, but such a smart idea > I think that, by itself, the result that a single network is better than an integration of networks would be a very important "negative" result for the community.
========================================
"Submissions to Sub-challenge 2 must be from a method that integrates information across more than one network."
========================================
I AGREE with dmitry that this might not be something we want to do, because if 2 and 3 networks are ok, for what reason is 1 not ok? this is something that will for sure come up from the reviewers. if the facts indeed prove that a single network works better, then we need to admit that fact.
---------------------------------------------------
also, it is too naive to assume that only i (and dmitry now) would consider doing it, or doing a variant of it. as i have said like a thousand times to different dream organizers, the difference is that i do it, and i admit it. and some others do it, they just do it. for example, when i do alignment, i give one network a weight of 0.99 and the others 0.002 each, is that ok? if 0.99 is not ok, then how about 0.98, 0.97, 0.9, what cutoff is ok?
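To make the weight question concrete, here is a minimal sketch of a weighted combination, assuming the networks have already been aligned to the same node order as dense matrices (see the alignment sketch earlier in the thread); it is not any team's actual method. With weights like 0.99 vs. 0.002, the combined matrix is numerically almost identical to the dominant network, which is exactly the gray zone being debated.

```python
import numpy as np

def combine_networks(mats, weights):
    """Weighted sum of aligned adjacency matrices (same shape, same node order)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize so weights sum to 1
    combined = np.zeros_like(mats[0], dtype=float)
    for m, w in zip(mats, weights):
        combined += w * m
    return combined

# With weights like [0.99, 0.002, 0.002, 0.002, 0.002, 0.002] the result is
# effectively the first network, even though formally six networks went in.
```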
----------------------------------------------------
to me the only choice is to allow one network submission.
One more note: in order to gain any advantage for sub-challenge 1 from sub-challenge 2, a team first needs to align the two networks, i.e. to put the gene_ids into correspondence. But that was forbidden by the rules from the very start. Hi Daniel,
You've initiated a very interesting discussion here. So when you say that "Submissions to Sub-challenge 2 must be from a method that integrates information across more than one network.", does it mean that those submissions with a single network for Sub-challenge 2 that were submitted before this message will be invalidated? The reason I am asking is that it looks like (at least, at the moment) the noise-to-signal ratio when merging a network with another network is too high to provide any meaningful information. In other words, integration of 6 networks will provide less signal than integration of three networks, for instance. If this hypothesis is correct, then submitting a clustering of only one network for sub-challenge 2 would likely be a winner. As a matter of fact, we were about to submit a single-network clustering for this sub-challenge to test our hypothesis, but then saw your message and decided not to do that. But it doesn't mean other teams didn't hypothesize and try the same thing.
So, please clarify how strict your position on that is. I think that, by itself, the result that a single network is better than an integration of networks would be a very important "negative" result for the community.
Best,
Dmitry Dr. Marbach,
Thank you for clarifying the questions about the background set of genes and scoring.
Kind regards,
Suhas Scores in sub-challenge 2 are indeed quite low at the moment. The scores for some of the individual networks in sub-challenge 1 are higher, so it seems to be difficult to leverage information across networks to improve predictions. A possible explanation is that teams would need many more submissions to devise effective strategies, e.g. how to weight the different networks. We will consider increasing the number of submissions for sub-challenge 2 for the next round, because evaluating a single submission takes about one sixth of the time it takes in sub-challenge 1.
--daniel
> The challenge organizers would have to provide more details on Sub-challenge 2 scoring methods.
The scoring method is the same as in sub-challenge 1. The only difference is that the background set of genes is the union of all genes of the six networks. Please let us know if you have further questions.
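As a concrete reading of "the union of all genes of the six networks", a tiny sketch is below; the file names and tab-separated edge-list format are assumptions, not the actual scoring code.

```python
# Sketch: the SC2 background gene set as the union of all genes that appear in
# any of the six networks (file names and edge-list format are assumptions).
def background_genes(paths):
    genes = set()
    for path in paths:
        with open(path) as fh:
            for line in fh:
                fields = line.rstrip("\n").split("\t")
                genes.add(fields[0])
                genes.add(fields[1])
    return genes

# background = background_genes([f"network_{k}.txt" for k in range(1, 7)])
```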
--daniel
i was worried, daniel (thanks for the detailed explanation), that a random prediction got a score of 7.
even without comparing with my own results, clearly it is a strong submission compared to other teams in sub-challenge 2. it is so weird that i only tried once, and it is random, and yet it beat the majority of the teams. isn't that weird? out of 50 submissions, only 15 out-performed a completely random partition that must already have missed a good bunch of genes due to being from a single network with non-existing ids......... that is equivalent to saying everyone is submitting random results.
thanks for taking the time with us. Hi Yuanfang
* No problem that you made these test submissions in sub-challenge 2. **However, we would like to ask you not to make any further such "test" submissions**, because if everybody started doing that, it would clog the queue for teams who are making real submissions for sub-challenge 2. Moreover, insights gained from such submissions could be used to your advantage.
* The gene 9656 is a non-issue, in sub-challenge 2 this ID corresponds to a gene that was only present in network 6 and is one of three genes that were removed because they only had self-interactions (see [this post](syn6156761/discussion/threadId=473)). We didn't remove these three genes from the table that maps anonymized genes IDs to real gene names, that's why the script gave no error.
* So nicebug_1.2.zip is a module prediction for network 4 of sub-challenge 1, but submitted to sub-challenge 2. Since the anonymized gene IDs were assigned randomly in the two sub-challenges, I'm sure you figured out that this amounts to shuffling the genes of your module prediction using as background all genes of sub-challenge 2 (the union of the genes of the six networks). It's basically the experiment that we discussed in [this post](syn6156761/discussion/threadId=706) that allows us to understand the expected number of modules that turn up significant for a random module prediction of the same structure (number of modules and their sizes); a minimal sketch of such a shuffled baseline is included after this list.
* It is possible that a random prediction could have an NS score of 7, mainly because the multiple testing correction is done for each GWAS separately. This was a deliberate choice, I am about to reply to a question about this on another thread to explain our reasoning, so let's not enter into a discussion about this choice here. Again, as discussed in [this post](syn6156761/discussion/threadId=706), comparison to random predictions will be key to analyze results in the end. (However, note that the background is not correct in your case, it should only be the genes of network 4, and as we have discussed the background used can sometimes make a big difference).
* The fact that the scores in both sub-challenges are 7 seems to be a coincidence, because the result files are different. It seems your prediction for network 4 in sub-challenge 1 was not that good to begin with (other teams now have scores >20 for this network), making this coincidence not that unlikely given the low numbers. Just to be sure, I will double-check this tomorrow.
* You mention that precision for the random prediction is around 10%, which is very high compared to other teams. That's a great example of why precision is not a sensible scoring metric in this challenge, as we have discussed in other threads.
* I see our challenges as a collaborative, scientific experiment. We don't know yet what will come out, how good teams will perform, and how many disease modules can be discovered - we're very curious to find out together (and a bit nervous, because so many teams are working on the challenge, we feel a lot of responsibility to not make any mistakes and bring it to a success). We carried out an exploratory analysis with baseline methods and results were encouraging enough that we decided to launch the challenge, in particular results were substantially better than random and we hope that teams will do even better. But of course we don't know what the limit is given the networks, in particular networks 5 and 6 may not be very informative, as I previously mentioned. We'll do a systematic analysis with random predictions using the final scoring pipeline ASAP and share results.
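As referenced in the list above, here is a minimal sketch of that kind of shuffled baseline: keep the structure of a real prediction (number of modules and their sizes) but refill each module with genes drawn at random from the background set. The scoring itself is the organizers' pipeline and is not reproduced here; the module sizes and helper names are purely illustrative.

```python
import random

def shuffled_prediction(module_sizes, background, seed=0):
    """Random, non-overlapping modules with the given sizes, drawn from background."""
    rng = random.Random(seed)
    pool = list(background)
    assert sum(module_sizes) <= len(pool), "not enough background genes"
    rng.shuffle(pool)
    modules, start = [], 0
    for size in module_sizes:
        modules.append(pool[start:start + size])
        start += size
    return modules

# Example: 89 modules of 20 genes each, drawn from some background gene set
# (e.g. the union of network genes sketched earlier in the thread).
# null_modules = shuffled_prediction([20] * 89, background, seed=42)
```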
Best, Daniel Dr. Guan,
Thank you for clarifying my question. And also for testing out the scoring methods.
This certainly raises questions about the validation and also the background set of genes for Sub-challenge 2.
The challenge organizers would have to provide more details on Sub-challenge 2 scoring methods.
Kind regards,
Suhas Srinivasan Hi, Suhas,
I suspect daniel is busy checking this submission and its validation and scoring right now. -- i will explain step by step to the best of my knowledge, so we are all on the same page:
(and everyone else, please don't do this, since it will waste one of your submissions and might become grounds to disqualify a team, as per the first post of this thread).
1. Generate some partitions and submit to SC1, and this is my i_support_trump_1.1.zip. i was hoping this was not random, since it had a score of 42, which was not too bad.
2. Pick one of them, network 4, and submit it to the SC2 leaderboard (not the method, **the direct submission file**): nicebug_1.2.zip
Result: i got the same score for this submission in SC1 and SC2.
To be completely open with everyone, I wrote three test submissions to tease out the validation and scoring code. i often do such things, and i think i have annoyed many organizers by doing this... This was the first one and I am still checking if i did anything wrong. but so far i didn't find anything on my side, but on the other side:
1. minor problem: it contains IDs not existing in SC2, so this submission failed my own validation code (the sanity_check.tar file) that i have shared with this community. Yet, it went on to evaluation on the leaderboard -- which is good, as it saved me one submission quota and allowed me to test the next question i had in mind.
2. what is the real background gene set, and what is the background distribution of random networks? as the ids are completely randomized between SC1 and SC2, i had expected a score of zero here. Yet, I got exactly the same score as what I got in SC1.
then I have the following conclusions:
1. my team's submissions to SC1 and SC2 have all been random, but gained an advantage through size, because all our precision is around 10%. this submission was exactly in the middle of the leaderboard; BUT it is possibly a random submission, which raises the possibility that everyone here is making equally random submissions, distinguished only by the **SIZE** vs. **NUMBER** of modules they have submitted.
2. there is an alternative possibility, that the gene mappings of SC1 and SC2 are somehow mistakenly shared (then good news for everyone: if this was a common bug in both, everyone's score will double).
3. or, SC2 uses the SC1 gold standard with wrong gene ids throughout; then go back to conclusion 1.
4. the name mappings for SC1 and SC2 are not random, but follow some very deep biological meaning, which i seriously doubt.
my motivation for doing this, again, was really not cheating, but when i observed several submissions top the leaderboard with minimal partitions, i thought i would waste several submissions to test out the scoring. Dr. Guan,
This raises concerns about the scoring. Could you please clarify what you mean by "randomized ids"?
Did you randomize the given sub-challenge 2 ids in your submission?
Or was it a random network with ids that are not there in the sub-challenge 2 data?
Kind regards,
Suhas Srinivasan i submitted one with (supposedly) completely random ids, yet i got a score of 7, which is higher than most participants, and with an accuracy of 10%, which is the highest so far.... can you please check why? i am sure there is a bug somewhere crossed between sc1 and sc2....
(update: or does that mean a random partition at a very specific size can achieve very nice performance, without even looking at the networks?) hi, daniel,
i submitted a network from sub-challenge 1 to sub-challenge 2. HOWEVER, it was not part of any cheating process, as clearly this network should get a score of 0, because the networks are anonymized differently.
as most organizers know, some of my submissions are meant to detect scoring bugs.
and indeed i found one: the clusters i submitted contain ids that don't even exist in sub-challenge 2, e.g. gene 9656. yet, it passed validation and is going through evaluation right now. clearly all teams could make use of this to increase cluster number, or change the background distribution.
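For what it's worth, a check of this kind is easy to add to a validation step; a minimal sketch is below, where the submission layout (module id, score, then gene ids per tab-separated line) and the background-set helper are assumptions rather than the organizers' actual validation code.

```python
# Sketch of an id sanity check: report every submitted gene id that is not part
# of the sub-challenge 2 background set (submission layout is an assumption).
def invalid_ids(submission_path, background):
    bad = set()
    with open(submission_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            for gene in fields[2:]:
                if gene not in background:
                    bad.add(gene)
    return bad

# Example: would flag '9656' if that id is absent from the SC2 background set,
# e.g. the union-of-network-genes helper sketched earlier in the thread.
# bad = invalid_ids("modules.txt", background)
```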
you are not going to exclude me from the challenge, right?
yuanfang