Hi all. Thanks to the organizers for arranging this challenge.
I have a few questions about the data.
1 - It seems the real challenges allow people to use any data (not just the data suggested in the gwas file). If the aim is to compare different prediction tools, is it problematic that the tools are trained on different data (i.e., a "poorer" tool with a large dataset might outperform a "better" tool with a smaller dataset)?
2 - For CAD, the suggested summary statistics include the first release of UKBB data (the interim data). I imagine you have thought of this, but does the training data exclude these individuals?
3 - Are you able to provide details of how you selected the UKBB training individuals for the four diseases? Many of us have access to UKBB data, so we will train models ourselves rather than submit scripts for you to run on our behalf.
Thanks very much
Doug
Created by Doug Speed dougspeed82
I now see that the data provided are those used by Chun et al. (https://pubmed.ncbi.nlm.nih.gov/32470373/), and that the paper answers questions 2 and 3. However, question 1 is still outstanding.
Thanks, Doug