I tracked down and compiled the GEO datasets listed in Erika Bongen's paper (http://www.ncbi.nlm.nih.gov/pubmed/?term=26682989), which came up when she asked about whole blood and peripheral blood on the forum. I was going to use these 3000 samples to win the challenge!!
Now there is no point anymore. I don't want to win in this situation, nor do I want to enter any competition (or so-called collaborative phase) without a blind test (there is a data leak here) or, in some other cases, without a well-defined, objective scoring metric.
So I have left the challenge team and am leaving this dataset to the organizers and the other participants: http://guanlab.ccmb.med.umich.edu/yuanfang/external.tar
But I have chosen not to release the algorithm I planned for this problem, since there is no way to prove it is the best here. I will wait for another challenge.
For each GSE, I kept the original data I downloaded and the mapping to NCBI gene names. Under GSE*/data/, I have split each dataset by type of infection, time point, positive/negative virus shedding, severity of symptoms, or a combination of these labels. The code used for splitting is included.
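To get an overview of what is in each GSE*/data/ folder after unpacking, something like the following might help. This is only a rough sketch and not part of the archive: the function name `list_split_files` and the `root` argument (the directory where you extracted the tar) are made up for illustration, and it assumes nothing about the file names beyond the GSE*/data/ layout described above.

```python
from collections import defaultdict
from pathlib import Path


def list_split_files(root):
    """Map each GSE directory name to the sorted file names under its data/ folder.

    `root` is assumed to be the directory the archive was extracted into.
    """
    root = Path(root)
    files_by_gse = defaultdict(list)
    # Only the GSE*/data/ layout from the post is assumed here.
    for path in sorted(root.glob("GSE*/data/*")):
        if path.is_file():
            gse = path.relative_to(root).parts[0]  # e.g. the "GSE..." folder name
            files_by_gse[gse].append(path.name)
    return dict(files_by_gse)


if __name__ == "__main__":
    # "." is a placeholder; point it at wherever the archive was unpacked.
    for gse, names in list_split_files(".").items():
        print(gse, len(names), "files")
```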
Where the original data was not log-transformed, I added a corresponding GSE*/data/*log.txt. The archive also includes some vaccine data and post-vaccine response data, which I believe will be useful in solving this problem.
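For anyone who wants to redo the log transform on a raw file themselves, here is a minimal sketch of one common way to do it. It is not the code in the archive. The base (log2), the pseudocount of 1, and the file layout (tab-separated, a header row, gene name in the first column) are all assumptions on my part, and `log2_transform` is just an illustrative name, so check them against the actual files before relying on it.

```python
import math


def log2_transform(lines, pseudocount=1.0):
    """Return tab-separated lines with every value replaced by log2(value + pseudocount).

    Assumed layout (not confirmed by the archive): first line is a header,
    first column of every other line is the gene name.
    Values that cannot be parsed, or that are not positive after adding the
    pseudocount, are written as "NA".
    """
    out = []
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if i == 0:
            out.append("\t".join(fields))  # keep the header unchanged
            continue
        gene, values = fields[0], fields[1:]
        transformed = []
        for v in values:
            try:
                x = float(v) + pseudocount
            except ValueError:
                transformed.append("NA")
                continue
            transformed.append(f"{math.log2(x):.4f}" if x > 0 else "NA")
        out.append("\t".join([gene] + transformed))
    return out
```

The input is a list of lines rather than a path so it can be tried on a few rows first, e.g. `log2_transform(open(path).read().splitlines())` for some `path` of your choosing.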
(It's a big file and will take a while to download.)