Hi, the COVID-19 EHR Challenge paper states that "For both the Q1 Challenge Dataset and Q2 Challenge Dataset, EHR data after each patient's COVID-19 test were removed." However, one of the earlier discussion topics states that "For Q2, all patients' COVID measurement dates exist in the data we provided, it's possible some patients might have multiple measurement dates for Covid test." According to that answer, a patient can have multiple measurement dates, yet the paper says that EHR data after each patient's COVID-19 test were removed. Is the only data remaining in the dataset the measurement dates of each patient's COVID-19 tests, or how should we reconcile these two statements? Thanks.

Created by Ege Arikan egearikan
Hi @egearikan, Thank you for your questions. The data curation process is complicated, and we didn't have the bandwidth to cover all the details in the paper. I have listed what I know about the challenge data curation for your reference; I hope it is helpful. For the Q1 challenge dataset, if a patient has multiple COVID tests, the first one is used as the reference point for prediction, and all data after it (including the COVID test itself) are removed from the challenge dataset. For the Q2 challenge dataset, if a patient has multiple COVID tests, the most recent one is used as the reference point for prediction, and all data after it (excluding the COVID test itself) are removed from the challenge dataset. Thanks, Yao
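
To make the two rules above concrete, here is a minimal Python sketch of how they could be applied to a toy record list. This is only an illustration of the description in this reply, not the actual curation code. The function name `curate`, the record keys `patient_id`, `date`, and `is_covid_test`, and the example data are all hypothetical. It also assumes that "after" means a strictly later date, that other records dated on the reference day are kept, and that patients with no COVID test are left untouched; none of this is confirmed by the organizers.

```python
from datetime import date


def curate(records, question):
    """Apply the Q1/Q2 cutoff rule described above to a list of records.

    Each record is a dict with hypothetical keys:
    'patient_id', 'date' (datetime.date), 'is_covid_test' (bool).
    """
    if question not in ("Q1", "Q2"):
        raise ValueError("question must be 'Q1' or 'Q2'")

    # Collect every COVID test date per patient.
    test_dates = {}
    for r in records:
        if r["is_covid_test"]:
            test_dates.setdefault(r["patient_id"], []).append(r["date"])

    kept = []
    for r in records:
        dates = test_dates.get(r["patient_id"])
        if not dates:
            kept.append(r)  # assumption: patients without a test are untouched
            continue
        # Q1: first test is the reference point; Q2: most recent test.
        ref = min(dates) if question == "Q1" else max(dates)
        if r["date"] > ref:
            continue  # everything after the reference test is removed
        if question == "Q1" and r["is_covid_test"] and r["date"] == ref:
            continue  # Q1 also removes the reference test itself
        kept.append(r)
    return kept


# Toy patient with two COVID tests (made-up data).
example = [
    {"patient_id": 1, "date": date(2020, 3, 1), "is_covid_test": False},
    {"patient_id": 1, "date": date(2020, 3, 5), "is_covid_test": True},
    {"patient_id": 1, "date": date(2020, 3, 7), "is_covid_test": False},
    {"patient_id": 1, "date": date(2020, 3, 10), "is_covid_test": True},
    {"patient_id": 1, "date": date(2020, 3, 12), "is_covid_test": False},
]

q1 = curate(example, "Q1")
q2 = curate(example, "Q2")
print(len(q1), len(q2))
```

Under these assumptions, the Q1 output keeps only the record from before the first test, while the Q2 output keeps both tests and everything up to the most recent one. That is one way a Q2 patient can end up with multiple COVID measurement dates even though later data were removed.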
