> JUSTIN GUINNEY WROTE: Sub-challenge 2, ..., **must be statistically better than submissions when only the images of the latest exam are available.**
What if no team is able to be statistically better than the best performer of Sub-challenge 1? Does Sage Bionetworks reserve the money, or will the money be distributed to the Sub-challenge 1 best performers?
> JUSTIN GUINNEY then WROTE: Therefore, the code submitted to Sub-challenge 2 should be able to run for subjects with only the latest exam (in the absence of a metadata table, such as is the case in Sub-challenge 1)
But the format of the crosswalk file differs between Sub-challenges 1 and 2, so how can the same algorithm possibly run on both?
> Then he wrote: For Sub-challenge 2, ...must significantly improve predictions over use of only the latest images...
That's all compared to the best performer of Sub-challenge 1, right? Otherwise, one may submit something random to SC1 and then submit something non-random to SC2.
Also, does the SC2 vs. SC1 comparison use the same tie-breaking procedure?
> Then he wrote: Submissions will have to outperform a baseline algorithm developed by the organizers using off-the-shelf tools.
But all tools are off-the-shelf. Can you just name a number?
For example, **now you have all the Docker containers**, so you could just ensemble the top 3 containers, and then nobody could outperform your baseline. I am asking because obviously tuning on the organizers' side is far easier than tuning on our side.
**Also, it doesn't seem to be so off-the-shelf. Off-the-shelf should be something you implement in an afternoon at most. It has been 3 months now. It looks like you are trying even harder than we are to come up with a method (one which will not be revealed until after the end of the validation phase, i.e. with one additional month over all participants, and with the answers and all the other methods at hand).**
> Then, about ranking robustness:
A Bayes factor of 19? In some cases, that means 90% of the teams will be winners (even teams worse than random prediction, as long as their predictions are sufficiently orthogonal to the top one).
> Finally, about time:
Will we be provided with additional hours? 300 hours is far from enough. You have money; if you don't want to pay for additional hours, we are all willing to pay for our additional portion. But I think compute hours are essential for experiments.
Created by Yuanfang Guan (yuanfang.guan)
There are two things that would be helpful:
1. An express lane for Sub-challenge 2 on which one can test the new requirement, i.e. quickly check whether the algorithm works both in the normal case and in the case of only one examination per patient with fake metadata. Please make sure that the randomly generated values are of the same type as the real ones. For example, right now the training set has integers as subject IDs, whereas the express lane uses random strings; if the algorithm converts subject IDs to integers, it fails. There have also been cases where missing values were not set to dots (".") on the express lane, etc. Please make sure that the fake data is formatted identically to the real data!
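A minimal sketch of the kind of defensive parsing this mismatch forces on participants, given a tab-separated metadata file; the column name `subjectId`, the delimiter, and the missing-value marker are assumptions for illustration, not the official data dictionary:

```python
# Sketch only: tolerate subject IDs that may be integers (training set) or
# random strings (express lane), and "." used as the missing-value marker.
# "subjectId" and the tab-separated layout are assumed for illustration.
import pandas as pd

def load_metadata(path):
    # Read everything as strings so integer-only assumptions never crash,
    # and map "." to a proper missing value.
    df = pd.read_csv(path, sep="\t", dtype=str, na_values=["."])
    # Never call int() on subject IDs: int("a1b2c3") raises ValueError on the
    # fake express-lane data even though it works on the training data.
    df["subjectId"] = df["subjectId"].str.strip()
    return df
```

None of this special-casing would be needed if the fake data simply used the same types and the same missing-value convention as the real data.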
2. A time limit on the prediction express lane that scales up to the whole dataset, so that if an algorithm can complete the prediction on the express lane within the given 20-minute limit, it will also complete on the full dataset before being killed by your team for taking an excessive amount of time. At the moment it is difficult to judge how much time the prediction phase will need, since we do not know the number of images that have to be processed.
Hi, Thomas,
I don't understand some of the answers. You said that SC2 will be compared against a surrogate. That is fine. But does that mean that a poorly performing method (A) could win over a better performing method (B) because A performs better than its surrogate (i.e. A > A'), while B = B'? I think that makes no sense, because obviously everyone can build an age-only model, which will not be random.
I.e., will a clinical-only model win if it is the top one among those that outperform themselves on the surrogate dataset, even though it is much worse than the image-only models of other teams?
To me, that amounts to concluding that only the 50k + 20k will be distributed fairly.
The community phase prize ideally should also be laid out now. I am fine with it being given to 1 team or to 100 teams; it can be anything you want. But it has to be clearly outlined beforehand.
Not everything I got from the community phase was positive. I don't have confidence in the so-called community phase if things are not even made clear at the beginning.
The reason I ask for additional hours is that I believe many teams here who do not need additional hours clearly have access not just to external but to INTERNAL data for training, while I have been relying solely on your data. That's why they don't need hours. **I CANNOT IMAGINE ANYONE CAN GENERATE ANYTHING COMPETITIVE with just 300 hours of compute time.** So it is not fair at all.
If you say 'GO' to INTERNAL data, then obviously most people can get access to such data, because every clinic in the USA has a referral form for mammography, so the data is everywhere, and EVERY hospital in the United States has a repository of this data. Whoever chooses to use it would have a great advantage, training offline with similar data and essentially unlimited hours. Then I feel I am being significantly punished for being a good participant and sticking with your training data.
Hi Yuanfang,
Thanks for your patience!
> What if no team is able to be statistically better than the best-performer of sub-challenge 1?
We are not asking participants in Sub-challenge 2 to perform significantly better than the best-performer of Sub-challenge 1 (see below). The goal of Sub-challenge 2 is to encourage participants to find a way to use the longitudinal and/or clinical information to significantly improve the performance of methods compared to when only the images of the latest exam are available.
> Does Sage Bionetworks reserve the money, or will the money be distributed to the Sub-challenge 1 best performers?
The prize money associated with vacant places in the Competitive Phase will be added to the cash prize for the Collaborative Phase.
> JUSTIN GUINNEY then WROTE: Therefore, the code submitted to Sub-challenge 2 should be able to run for subjects with only the latest exam (in the absence of a metadata table, such as is the case in Sub-challenge 1)
> Yuanfang Guan: But the format of the crosswalk file differs between Sub-challenges 1 and 2, so how can the same algorithm possibly run on both?
We have decided to remove the requirement that containers submitted in SC2 must also run with the data from SC1 (see below).
> That's all compared to the best-performer of sub-challenge 1, right?
No, see below.
> Otherwise, one may submit something random to SC1 and then submit something non-random to SC2. Also, does the SC2 vs. SC1 comparison use the same tie-breaking procedure?
As already mentioned, we have decided to remove the requirement that containers submitted in SC2 must also run with the data from SC1. Instead, we will run the Docker container submitted to SC2 on the actual exams and on a surrogate dataset in which we have kept only the latest exam but randomized the metadata. A method is then eligible for the cash prize if its performance on the actual exams is statistically better than its performance on the surrogate. It is possible that the metadata and/or longitudinal data do not improve performance, in which case no group will win SC2.
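To illustrate what such an actual-vs-surrogate comparison could look like (the organizers have not published their exact test; this sketch assumes a paired bootstrap over subjects and a win/loss tally expressed as a Bayes factor, with function and variable names chosen for illustration and labels/scores given as NumPy arrays):

```python
# Sketch of a paired-bootstrap comparison between the same model's scores on
# the actual SC2 exams and on the surrogate (latest exam only, randomized
# metadata). This is an assumed procedure, not the official evaluation code.
import numpy as np
from sklearn.metrics import roc_auc_score

def actual_vs_surrogate_bf(y_true, scores_actual, scores_surrogate,
                           n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    wins = losses = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample subjects with replacement
        if len(np.unique(y_true[idx])) < 2:      # AUC needs both classes present
            continue
        auc_actual = roc_auc_score(y_true[idx], scores_actual[idx])
        auc_surrogate = roc_auc_score(y_true[idx], scores_surrogate[idx])
        if auc_actual > auc_surrogate:
            wins += 1
        else:
            losses += 1
    return wins / max(losses, 1)                 # e.g. BF > 19 ~ "statistically better"
```

Under a tally like this, being "statistically better" on the actual exams would mean beating the surrogate run in the large majority of resamples, rather than merely having a higher point estimate of the AUC.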
> But all tools are off-the-shelf. Can you just name a number?
The baseline method has been frozen. Its AUC applied to the Leaderboard dataset in SC1 is 0.585, which has already been beaten by a large number of submissions. The performance of the same baseline method applied to SC2 is also 0.585. Its performance in the real SC2 dataset is not statistically significantly better than in the surrogate SC2 dataset.
> also, it doesn't seem to be so off-the-shelf. off-the-shelf should be something that you implement over an afternoon at most.
We couldn't create the off-the-shelf method in one afternoon: it took us a little longer, and we did the best we could. But we are happy that the Challenge participants have already improved over it. However, we needed a baseline for the validation phase, and this will be it. We don't yet have the performance of the baseline on the validation dataset.
> A Bayes factor of 19? In some cases, that means 90% of the teams will be winners (even teams worse than random prediction, as long as their predictions are sufficiently orthogonal to the top one).
We used a BF of 19 because it is equivalent to a p-value of 0.05. We believe that, because the test datasets are sufficiently large, we should have enough statistical power to distinguish between two close contenders. The results of Round 1 show that this does not make 90% of the teams winners: there were only 2 best performers in SC1, and they were really close (BF ~ 2 in AUC) and very far from the third (BF 499). In SC2 there was only one best performer. On top of that, all predictions must beat the performance of the baseline method, which weeds out random predictions.
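One way to read the BF 19 ↔ p = 0.05 correspondence above, assuming the usual bootstrap win/loss tally (this reading is an assumption, not an official derivation):

BF = wins / losses = 0.95 / 0.05 = 19

i.e. a best performer has to beat the runner-up in at least 95% of bootstrap resamples, which mirrors a one-sided 5% threshold.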
> Will we be provided with additional hours?
We have significantly increased the capacity of the Cloud compared to Round 2 (e.g., there are five times more machines to evaluate inference submissions). While we would definitely have liked to increase your time quota, the upgraded Cloud capacity is mainly needed to process the additional submissions resulting from the large increase in participation in Round 2.
> If you don't want to pay for additional hours, we are all willing to pay for our additional portion.
This has already been discussed. One of the purposes of the Cloud is to give all participants the same chance of winning the Challenge, independently of the funding available to each team to buy servers. The best-performing teams will then be invited to the Collaborative Phase, where they will have access to significantly more computational resources to further develop their method(s).
Thanks again for your constructive questions! The changes mentioned in the above answers will be included in a newsletter that will be sent shortly.
The DM Challenge Organizers
OK... I am very patient. You cannot find a more patient person than me...
But my main problem now is that I do not even dare to run experiments, as I will probably use up all my hours within 5 days. Can you please quickly discuss this?
Dear Yuanfang,
I have forwarded your message to the other challenge organizers and they will answer soon. Thank you in advance for your patience.
Best,
Thomas