Dear Organizers,
Please give a final clarification on whether this is allowed. According to the other thread, it is allowed, but I just cannot believe it. I need a 'GO' to actually do it.
Could we retrieve breast cancer images from local hospitals?
How about remote sites?
As you know, all images come from patients, so there is a possibility of overlap between your datasets and these datasets.
As you know, being able to see the images will make a huge difference to the models.
Thank you for your clarification; I appreciate your quick response.
Yuanfang Guan
Created by Yuanfang Guan (yuanfang.guan)
@alalbiol
Do you call a competition fair when participants are not put under the same conditions?
So for you, being able to find external (public) data is fair, so that people can see images, have a consultant annotate the cancer regions, identify a padding around the tumours, and train on patches instead of trying to argue blindly? Is this fair compared with other people who were instead training models blindly on the provided training data without access to those images, where only the whole mammogram is labelled? Can you appreciate the huge difference between the two conditions?
If you think that this is a fair competition, I invite you to try to publish an algorithm showing that it outperforms the state-of-the-art but with the necessary condition of using additional training data, and we'll see if you can publish it.
Good luck with your modelling
@gustavo
I am sorry, but I don't agree with you. It would have been extremely easy to verify that only the provided data were used.
Once the Docker image is uploaded, you can easily run the code and check whether, using only the provided data, you get the same scores.
Also, if random numbers are generated, you could ask us to provide the random seed. That way everything would be deterministic and reproducible.
If the only pretrained models allowed were publicly available ones (ImageNet-like), asking where they were downloaded from would allow you to reproduce the same results.
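A minimal sketch of the kind of check I mean, under stated assumptions: `train_and_score`, `is_reproducible`, and `make_toy_data` are hypothetical names, the toy threshold classifier only stands in for a real training pipeline, and nothing here reflects the actual challenge infrastructure. The idea is simply that a pipeline whose randomness comes entirely from a supplied seed can be re-run on the provided data and compared against the reported score.

```python
import random


def train_and_score(train_data, seed):
    """Hypothetical stand-in for a participant's training pipeline.

    Picks a seeded random subsample, uses its mean feature value as a
    decision threshold, and returns accuracy on the full data set.
    """
    rng = random.Random(seed)  # all randomness flows from the supplied seed
    sample = rng.sample(train_data, k=len(train_data) // 2)
    threshold = sum(x for x, _ in sample) / len(sample)
    correct = sum((x > threshold) == (y == 1) for x, y in train_data)
    return correct / len(train_data)


def is_reproducible(train_data, seed, reported_score, tol=1e-9):
    """Re-run the pipeline on the provided data and compare the scores."""
    return abs(train_and_score(train_data, seed) - reported_score) <= tol


def make_toy_data(n, seed):
    """Generate toy (feature, label) pairs; purely illustrative data."""
    rng = random.Random(seed)
    return [(rng.gauss(i % 2, 1.0), i % 2) for i in range(n)]


provided = make_toy_data(200, seed=0)
reported = train_and_score(provided, seed=42)
print(is_reproducible(provided, 42, reported))
```

A matching re-run only shows that the reported score is consistent with the provided data and the declared seed; a real pipeline would also need its library and hardware sources of nondeterminism pinned down.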
That said, everything could have been fair, but you have chosen (in my opinion) the wrong option, and now it is only about having more data. So it is no longer about finding the best algorithm. The basis of every model comparison is that the models are applied to the same data.
Would you have accepted a result saying that LeNet outperformed the other methods if it had been trained using 10,000 more images than the others?
Can you imagine a paper where people argue that their algorithm performs better but using a different training set? What's the rationale for all this?
If some participants use features obtained from previously available feature extraction algorithms (well-tuned, hand-crafted feature extractors designed from their own mammograms, e.g. from some published MICCAI papers), would they be considered to be using external data, or using the DREAM challenge data only?
Gustavo,
It is well known that perhaps the key ingredient in the recent success of neural networks is the availability of tons of data. This also applies here.
I think that if your objective is to identify the best algorithms, you could miss better approaches just because someone does not have access to such data, and in the end you could miss an important contribution to the primary challenge goal: avoiding unnecessary recalls.
For this reason I still think that only data available to all should be used, i.e. public data or DREAM data.
You say:
"On the other hand, our ability to restrict the use of external information is difficult to enforce, as people can use private or private data without disclosing it. "
I do not agree with that. I think that at the end of the competitive phase it should be easy to check whether private data has been used, because the algorithm should be reproducible, including the pretraining (if any is done), and that would reveal which data has been used to pretrain the models.
At the beginning, we were happy to join this challenge because it eliminated the differences between groups related to computational power and data availability. People should just focus on the algorithms. I think this spirit should remain.
After all this discussion, I am happy that all the leading competitors (including us) have clearly stated that they are not using private data. So the fair competition continues.
Dear DM Challenge Participants
**It is allowed to use external data, public or private, to solve the Digital Mammography Challenge.**
As mentioned in another thread, this is a question that the Challenge organizers have put much thought into. At the end of the day, our first priority is to assess and identify the best algorithms that can help hundreds of thousands of women avoid an unnecessary recall when they have their screening mammograms. On the other hand, our ability to restrict the use of external information is difficult to enforce, as people can use private or private data without disclosing it. Because of these two issues, we elected not to restrict how people construct their models, and allow the inclusion of models that may be informed by external data.
We understand that there are divergent views on this, and we are sorry that we cannot satisfy everybody's preferences.
The DREAM Digital Mammography Challenge organizers
Hi, Alberto,
I just want a clarification on whether it is allowed or not.
Now I know it is not allowed, I will not proceed to acquire any.
Now I do not agree with you. "As for the other thread", I understand that it is fair to use PUBLIC databases such as those mentioned in the thread.
Why are you now so worried about using private data?
It is very obvious to me that the two leaders have clearly stated that they are NOT using any kind of private data. Perhaps it is time to acknowledge that their algorithms are better at this point, and that fact should motivate the rest of us to rack our brains to improve our approaches. I think that this is much more productive than complaining all the time.