Hello,
I was just wondering when the community phase milestones will be posted. It would be helpful to know the milestones as we train our models, so that we know what to shoot for and how the incentives scale.
Thanks,
Bill
> Yeah I like the idea of having stratified incentives, but I think **the only way to be fair and transparent is to have a non-subjective metric to evaluate submissions.**
I absolutely agree with you, Bill.
That's the whole point of an unbiased challenge/competition.
Thank you so much for stating it so plainly for all of us.
But just to make sure: is there anyone here who disagrees on this point?
Obviously, some people are good at coming up with innovative ideas but also good at turning great ideas into crappy implementations. Others are good at copying code (as Li Shen mentioned) and turning mediocre ideas into winning algorithms. If the organizers choose to reward innovation, teamwork, hard work, or other things, we will have no choice but to accept that. But I think (and **I am sure EVERY participant would agree**) it is undesirable for this to be arbitrarily determined by one or two people, no matter who they are. If it is, those people should release, before the final stage, code that evaluates aspects such as innovation in an unbiased way.

Alexandre,
Thanks for the great post! I also agree; the organizing team has taken on a very daunting task, and being provided with this opportunity is valuable in and of itself. Your list of ideas also seems very interesting and useful. Hopefully we can all work together to help solve this problem, which could have a dramatic impact well beyond this competition. It's very exciting!
Best,
Bill

Hello,
I am a radiologist and a computer engineer. Part of my radiology practice is in breast imaging (mammo, ultrasound, MRI, biopsy) in Canada. I recently got attracted to deep learning because it will completely change the way I work as a radiologist (for as long as society still needs radiologists).
Just to let you know that I think this challenge has already proven its usefulness with the first-phase leaderboard. Reaching 40% specificity at 80% sensitivity for automated cancer detection in screening mammography is a radiologic breakthrough if it can be applied to the general population. With this specificity, the recall rate could potentially drop from 10% (radiologist) to about 1% (inference model). This means 90% of the workup tests usually done after the screening mammogram (extra mammography, US, MRI) could potentially be dropped. In the short term this will probably be done with radiologist validation (to carry the legal risk), but once the legal definition of an intelligent machine becomes clearer, complete automation will probably be possible. This is huge value for patients and for society. The potential intrinsic value of this application is *a lot* more than a 1 million $ prize; as a rough estimate for the US/Canada, it is probably in the billions of dollars annually. Where this value will flow, we can't predict.

Of course, this challenge is not unique; many companies already have access to this kind of data and have started the training/development process. Most of the challenge participants can probably reach more than 20% specificity with a decent model based on the demo code, and 20% specificity is already a game changer in real life, since you get about a 2% recall rate instead of the current 10%. In my opinion, the organizing team that implemented this unique hierarchical challenge structure is providing the most value: we (ordinary participants) could never train a model on this kind of confidential dataset under normal conditions without this challenge structure. The performance added on top of the demo code (not relative to the current clinical CAD baseline) is the real value participants bring.
Fortunately, one of the primary objectives of this great challenge is scientific communication (publishing the algorithms). Consequently, any entity with access either to (1) the model state of the best-performing algorithm or (2) a decent-sized validated mammography dataset will be able to emulate the process. I therefore think the participants in the community phase will need to decide (as a group or individually) whether or not they want to share their model states widely and publicly. The two directions have very distinct outcomes for patients, especially in terms of the cost and time needed to implement the clinical applications, and even more so for third-world countries.
The community phase could also look into potentially useful secondary outputs:
- Identifying a segmented area of potential cancer on a positive mammogram, to optimize the subsequent workup needed for potential breast-conserving surgery.
- Scoring the diagnostic quality (noise, complete inclusion of breast tissue, movement artifact) of a specific mammography image, along with radiation dose (usually available in the DICOM header), against the training dataset, for quality improvement of a mammography unit/technologist. This could be related to the distribution and variance of the predictions from a set of differently performing models (see the sketch after this list).
- Identifying pre-test cancer risk (before the next mammogram is done) based on the breast tissue type of the current mammogram, for insurance optimization.
- Distinguishing invasive cancer from DCIS is interesting but probably not that useful in practice at screening, since this is ultimately the pathologist's call from the biopsy. A biopsy will be done whether we suspect DCIS (often microcalcifications) or invasive cancer, and pathologically there is often both DCIS and invasive cancer in the same specimen. However, optimizing specificity for DCIS on the magnified mammography images usually done as follow-up workup (high resolution is needed for these tiny microcalcifications) could be interesting and useful for avoiding biopsies.
- A pathology-mammography correlation score, based on the trained dataset, to confirm the technical validity of a biopsy.
- I have many other ideas, but I need to prioritize!
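To make the quality-scoring idea above a bit more concrete, here is a minimal sketch of using ensemble disagreement as a quality flag. It is purely illustrative: the per-model predictions and the flagging threshold are made-up assumptions, not anything provided by the challenge.

```python
import numpy as np

# Hypothetical predicted cancer probabilities for the same set of images,
# one row per model in an ensemble of differently performing models.
ensemble_preds = np.array([
    [0.05, 0.80, 0.40],   # model A
    [0.07, 0.75, 0.10],   # model B
    [0.04, 0.85, 0.90],   # model C
])

# Per-image variance across models: high disagreement may hint at a
# low-quality or atypical image (noise, positioning, artifacts).
disagreement = ensemble_preds.var(axis=0)

# Assumed threshold, purely for illustration; it would have to be
# calibrated against images with known quality problems.
QUALITY_FLAG_THRESHOLD = 0.05
flagged = disagreement > QUALITY_FLAG_THRESHOLD

for i, (v, f) in enumerate(zip(disagreement, flagged)):
    print(f"image {i}: variance={v:.3f} flagged={f}")
```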
Have a nice day,
Alexandre

I thought that was your suggestion. If not, then never mind.
I just felt I won't get anything in the later rounds, so I'd try to get as much as possible in the earlier rounds....

> Then, to ensure at least some level of sanity of a competition, it is a good idea to reward the top performer at each stage with cash as soon as possible, maybe 10% every stage.
@yuanfang.guan, how did you come to that conclusion? Yeah, I like the idea of having stratified incentives, but I think the only way to be fair and transparent is to have a non-subjective metric to evaluate submissions. At some point it was determined that AUC or pAUC is the proper metric for the challenge, so it would make sense to me for that to remain the metric throughout, but potentially there's another metric that is more appropriate. Then it comes down to whether you award based on ranking or absolute value, or maybe some combination of both. There's never going to be a perfect way to decide it, and again, there's a bigger picture, but I think these things should be decided as soon as possible.

> On top of that, anyone who achieved top scores during each round, such as @yuanfang.guan, shall be rewarded.
I agree, and I think that is a very good idea.
To be frank, in the past, who eventually gets the award (e.g. authorship or money) can be very arbitrary, sometimes not even related to performance at all. Obviously there will be issues, as Li Shen said, such as people who contributed ideas and writing but didn't score highly, versus people who scored highly but, as you said, just used the example code or linear regression with crappy writing; all of which can be questionably determined by one or two people. So, to ensure at least some level of sanity in a competition, it is a good idea to reward the top performer at each stage with cash as soon as possible, maybe 10% every stage.
Looks like we all have a consensus now, so this is a really great start.

I'd vote for multiple criteria. The bottom line is that you must achieve a statistically significant result to be considered for cash rewards.
On top of that, anyone who achieved top scores during each round, such as @yuanfang.guan, shall be rewarded.
However, someone who comes up with a novel feature that may be useful for everyone, or who provides a novel solution, shall also be rewarded, even though they may not produce top-ranking scores. Other criteria shall include cleanliness of code and documentation, etc.
Does this make sense?

@thefaculty
I do not agree.
I think it should be allowed to copy example code. Setting up and copying example code correctly is not easy. For most challenges here I implement solutions on the fly during my machine learning class. This year I implemented every line of my first-stage solution within my machine learning class, and the Docker container in my journal club class. And my only homework for students is to copy down my lecture notes and my code as much as possible. Nobody in my class has even come close to copying a complete version.
As such, good students (and adults) should get rewarded for copying code correctly.

@bill_lotter
===
I see your point. One way to prevent this is to ask anyone who wants to claim a reward to submit their code as open source. Then we can evaluate the merits of their contribution and see whether it is simply a copy of the example code.
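To sketch what that check could look like in practice (purely illustrative; the file paths are hypothetical and this is not any official process), a rough similarity score against the example code can be computed with the Python standard library:

```python
import difflib
from pathlib import Path

def similarity_to_example(submission_path: str, example_path: str) -> float:
    """Return a rough 0-1 similarity ratio between two source files."""
    submission = Path(submission_path).read_text()
    example = Path(example_path).read_text()
    return difflib.SequenceMatcher(None, example, submission).ratio()

# Hypothetical file names, purely for illustration.
ratio = similarity_to_example("team_submission/train.py", "example_code/train.py")
print(f"similarity to example code: {ratio:.2f}")
```

A high ratio would not prove copying by itself, but it could flag submissions that deserve a closer manual look.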
@bill_lotter

Then you say, what's the cutoff? Obviously, anyone who ranks #n would like the cutoff to be at #n+1, LOL.
In the unlikely event that I win, I will set up a yearly cash award for the best algorithm in DREAM. It's a pity I don't know deep learning, so I won't be able to take the big chunk.

I'm all for awarding people for effort/results, but I think simply awarding people for statistically significant results wouldn't be a good idea. Anyone can submit one of the models that have been posted and get somewhere. It would also encourage a lot of people to jump in at the end without much effort simply to get some cash prize, which would clog up the servers and prevent people who have been working on it a while from improving their results.

I agree with @bill_lotter, it's better to have the milestones sooner rather than later. They will guide us in better distributing our effort.
I also agree with @yuanfang.guan: I think anyone who achieves a statistically significant result must have done a significant amount of work given the complexity of this data, and they should all be awarded some cash. This will also encourage people to contribute their intelligence and effort towards solving this difficult problem. As many of you know, "an ensemble (almost) always beats an individual in machine learning".
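As a rough sketch of how "statistically significant" could be checked for a submission's AUC (purely illustrative and not the challenge's official scoring procedure; the labels and predictions below are simulated):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated data standing in for held-out labels and a submission's scores.
y_true = rng.integers(0, 2, size=500)
y_score = y_true * 0.3 + rng.normal(size=500)  # weakly informative predictions

observed_auc = roc_auc_score(y_true, y_score)

# Permutation test: how often does a random shuffling of the labels
# reach an AUC at least as high as the observed one?
n_perm = 2000
perm_aucs = np.array([
    roc_auc_score(rng.permutation(y_true), y_score) for _ in range(n_perm)
])
p_value = (np.sum(perm_aucs >= observed_auc) + 1) / (n_perm + 1)

print(f"AUC = {observed_auc:.3f}, permutation p-value = {p_value:.4f}")
```

On the real data, the same idea would compare a team's submitted predictions against label permutations on the held-out evaluation set.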
Finally, I think it would be very useful to use machine learning to identify the locations of lesions that may lead to cancer. I know this is not a goal of this competition, but it is often mentioned in similar competitions, and I believe such information is extremely useful for radiologists. It is more difficult than predicting cancer versus normal, but we should at least try.
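One simple way to get a first approximation of lesion locations from a patch-level classifier, sketched below purely as an illustration (the `patch_model` callable and the image array are hypothetical placeholders, not anything from the challenge code), is to slide the classifier over the image and record its score as a coarse heatmap:

```python
import numpy as np

def coarse_heatmap(image: np.ndarray, patch_model, patch: int = 256, stride: int = 128) -> np.ndarray:
    """Slide a patch classifier over the image and record its cancer score per window."""
    h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = image[i * stride:i * stride + patch, j * stride:j * stride + patch]
            heatmap[i, j] = patch_model(window)  # hypothetical: returns a probability
    return heatmap

# Hypothetical usage: a fake "model" that just scores mean intensity, on a random image.
fake_model = lambda window: float(window.mean())
heatmap = coarse_heatmap(np.random.rand(1024, 1024), fake_model)
print(heatmap.shape)  # (7, 7)
```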
> If by the end of all phases there isn't a model that achieves this, then no one gets rewarded, but if 5 teams beat it, then somehow it is split.
One million dollars is a lot of money; they definitely will not give it to a single person. I think it will be split among all participants who submitted something non-random, hopefully not by head count. Otherwise I'd need to put all four of my kids on my team.

Hi Justin,
Thanks for the response. I wasn't sure if there was going to be some absolute metric, i.e. AUC > 0.7 => $100K, AUC > 0.8 => $500K, etc. Ultimately, the goal is obviously to build something useful that could hopefully save lives, so I guess a lot of it will depend on how you guys plan on using the model and what the community deems useful. If performing better than a typical radiologist ends up being what is useful, and if this can be measured with something like AUC or pAUC, then maybe that should be the milestone. Or if being able to differentiate between DCIS vs. Invasive (by, say, achieving X% accuracy) is the use case, then that could be a reasonable milestone as well (although this is different from how the competition is evaluated). I do certainly understand that there is a desire to see how the competitive phase progresses before setting the milestones, and that this challenge is unique and very demanding in terms of IT/engineering, but I also think it is fair and helpful to set the milestones ahead of time as much as possible.

For example, in the AUC milestone case, the "usefulness" level seems like it should be absolute and not relative to the submitted scores. If this level is deemed to be, say, 0.8 and by the end of all phases there isn't a model that achieves this, then no one gets rewarded, but if 5 teams beat it, then somehow it is split. This will also help teams gauge whether they're close to having a good model or whether they should try something completely different. Again, I don't think anyone is doing this solely for the cash incentives, but people are certainly working hard on it and there is an opportunity cost there. It's also unclear how people will be picked to participate in the competitive phase and how working together will be structured.
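Just to make the absolute-milestone idea concrete, here is a toy sketch of an absolute threshold with an equal split among everyone who beats it (the team names, AUC values, 0.8 threshold, prize amount, and equal split are all made up for illustration):

```python
# Hypothetical final AUC scores per team; the 0.8 milestone and the equal
# split are assumptions for illustration, not anything the organizers decided.
team_aucs = {"team_a": 0.83, "team_b": 0.79, "team_c": 0.81, "team_d": 0.86}
MILESTONE_AUC = 0.80
PRIZE_POOL = 500_000  # dollars, hypothetical

qualifiers = [team for team, auc in team_aucs.items() if auc >= MILESTONE_AUC]

if not qualifiers:
    print("No team reached the milestone; no one gets rewarded.")
else:
    share = PRIZE_POOL / len(qualifiers)
    for team in qualifiers:
        print(f"{team}: AUC {team_aucs[team]:.2f} -> ${share:,.0f}")
```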
Thanks,
Bill

Hi Bill,
Thanks for your question.
We have not finalized the community phase milestones although we are currently evaluating several options. Some of our choices will depend on how the competitive phase progresses, and will also depend on access to new data sets (which we are pursuing). One question that is promising - and important clinically - is differentiating between 'DCIS' and 'Invasive' states.
We take the 'community' aspect of the community phase to heart. What types of questions do you consider important and would want to see posed? I hope you and others will use this thread to share your ideas.
Best,
DM Challenge organizers