So my question is: at what time point are we predicting whether the person is COVID positive? I didn't find anything about this on the data pages. Is it something the competition host doesn't want to reveal?
Best,
Endi
Created by Henry Niu (niudd)
Some time has elapsed since the last update on this. Is it still the case, for both Q1 and Q2, that the actual test time is unavailable? It doesn't seem to be included in the synthetic data, except for a small subset of it. Given how Question 2 in particular is framed, it seems odd not to have that information (although subsequent hospitalization data would, of course, need to be withheld).
@Bcragin You're correct: the data has been cut off at the datetime when the COVID-19 test results were returned.
@niudd I think the best "prediction time" would be the MAX of each patient's record datetimes, as that gives you the most recent data in each individual's records prior to their test result.
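As a rough illustration of the MAX approach, here is a minimal sketch that takes the latest record datetime per patient as that patient's prediction time. It assumes the records have already been flattened into hypothetical `(person_id, datetime)` pairs; the real files will need their own parsing.

```python
from datetime import datetime


def prediction_times(records):
    """Return {person_id: latest record datetime} for each patient.

    `records` is an iterable of (person_id, datetime) pairs; this pair
    layout is a hypothetical simplification of the challenge data files.
    """
    latest = {}
    for person_id, ts in records:
        # Keep the most recent datetime seen so far for this patient.
        if person_id not in latest or ts > latest[person_id]:
            latest[person_id] = ts
    return latest


# Hypothetical example records for two patients
records = [
    (1, datetime(2020, 3, 1, 9, 30)),
    (1, datetime(2020, 3, 14, 16, 0)),
    (2, datetime(2020, 4, 2, 8, 15)),
]
times = prediction_times(records)
print(times[1])  # 2020-03-14 16:00:00
```

Since the data is cut off at the time the results were returned, this MAX is at best an estimate of the test time, not the test time itself.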
Good point that having the COVID-19 test date would be helpful. For the next release of the data, I'll look into how we could best include that information.
Thanks!
@trberg Great question, Endi. My team's evaluation-only model dealt with that issue by assuming that there were no entries in the /data/ files with (start or end) dates after the COVID-19 test date. Although that model seemed to be quite successful (8 weeks at the top of the leaderboard), it's possible that it was effectively over-trained as a result of the large number of trial submissions made. The assumption about (estimated) test dates was made early in the trial process, though, and seemed to help a lot, so I think it's probably valid. I agree that it could be very helpful to know the actual COVID-19 test date, if that can safely be released.
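Under the assumption that no entry has a start or end date after the test date, one could estimate a patient's test date as the latest start or end date across all of that patient's rows. The sketch below assumes the rows from the various /data/ files have been pooled into hypothetical `(start_date, end_date)` tuples, with `end_date` possibly missing. It is one way to apply the assumption, not a description of the model above.

```python
from datetime import date


def estimated_test_date(rows):
    """Latest start or end date across all of one patient's rows.

    Each row is a hypothetical (start_date, end_date) tuple pooled from
    several data files; end_date may be None when no end is recorded.
    """
    dates = []
    for start, end in rows:
        dates.append(start)
        if end is not None:  # skip missing end dates
            dates.append(end)
    return max(dates)


# Hypothetical rows for one patient, pooled from different files
rows = [
    (date(2020, 2, 20), date(2020, 2, 25)),  # e.g. a visit
    (date(2020, 3, 1), None),                # e.g. an open-ended condition
    (date(2020, 2, 28), date(2020, 3, 5)),   # e.g. a drug exposure
]
print(estimated_test_date(rows))  # 2020-03-05
```

Including end dates matters here: using start dates alone would give 2020-03-01 for this patient and would ignore the later drug-exposure end date.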
Bruce