Hi @trberg, The design of Task 2 allows covid_index not only to fall before hospitalization_date (the usual case) but also up to 14 days after hospitalization_date (though the challenge instructions say 7). This appears to be true in both task_2_goldstandard and Task_2_Testing_Blinded_Gold_Standard. Can you confirm that the full validation set, Task_2_Testing_Blinded_Gold_Standard, will continue to look into the future and provide dates that are up to 14 days after the hospitalization_date? Note that the only other way we could calculate a covid_index in the "future" relative to hospitalization_date is if we had patient data after that date (which is not supposed to be the case). Currently, my model counts on you to "cheat" and provide the covid_index in Task_2_Testing_Blinded_Gold_Standard, using a crystal ball to say when covid_index occurs if it is in the "future". Thanks, @christophe.lambert

Created by Christophe Lambert (christophe.lambert)
Hi @Bcragin, you're not being dense; I just wasn't clear. There won't be a data leak. I'll make sure that COVID index dates that fall after hospitalization_date do not appear as after the hospitalization_date. Thank you, Tim
Hi @trberg, Forgive me if I'm being somewhat dense here, but I'm having trouble understanding your reply above. In the first paragraph you seem to be saying that the validation data will leak data from the future, while in the next-to-last sentence of the second paragraph you seem to be saying that it won't. Can you please provide clear assurance that the leak will be eliminated? I would hate to be accused of making use of a data leak that I was not aware of when I created the model and that wasn't made public until the day before the challenge ends. @Bcragin
Hi @christophe.lambert, Yes, the full validation set, Task_2_Testing_Blinded_Gold_Standard, will continue to look into the future and provide dates that are up to 14 days after the hospitalization_date. It's normally 7 days after hospitalization, but there were some cases where the patient was still hospitalized when the COVID test came back, having been in the hospital for longer than 7 days. If the patient is still in the hospital when they get their COVID test back, I'm including them in the gold standard as long as the hospitalization_date was no more than 14 days prior.

If you zero the time deltas that give future information, does your model still perform substantially better? That seems like it would be the best solution. If a COVID index falls after hospitalization_date, just change it to hospitalization_date? This is probably what I'll do for the validation data in order to not leak future information. Thank you for bringing this to our attention. @trberg
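The fix suggested above (replace any covid_index that falls after hospitalization_date with hospitalization_date) amounts to taking the earlier of the two dates. A minimal sketch, assuming each record is a simple (person_id, covid_index, hospitalization_date) tuple; `clip_covid_index`, `rows`, and the dates are hypothetical illustrations, not the challenge's actual file layout or data:

```python
from datetime import date


def clip_covid_index(covid_index: date, hospitalization_date: date) -> date:
    """Return covid_index, but never later than hospitalization_date.

    A covid_index after hospitalization_date reveals that the patient was
    still hospitalized when the test came back, so it is replaced by
    hospitalization_date itself.
    """
    return min(covid_index, hospitalization_date)


# Hypothetical rows: (person_id, covid_index, hospitalization_date)
rows = [
    (1, date(2020, 3, 1), date(2020, 3, 5)),   # index before admission: unchanged
    (2, date(2020, 3, 12), date(2020, 3, 5)),  # index 7 days after admission: clipped
]

clipped = [(pid, clip_covid_index(ci, hd), hd) for pid, ci, hd in rows]
for pid, ci, hd in clipped:
    print(pid, ci.isoformat(), hd.isoformat())
```

Clipping the date this way is equivalent to zeroing the negative time deltas on the modeling side.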
The more I think about this, the more I think it is problematic from a modeling perspective to use covid_index values that are in the future. I built models both with and without the time difference between hospitalization_date and covid_index. There are 2447 patients in the training set where covid_index falls after hospitalization_date. The model performs substantially better with that variable included. The problem is that the magnitude of that difference gives you a lower bound on how long somebody will be hospitalized (you can't have a positive test 14 days later unless you are still hospitalized). The longer you are hospitalized, the greater the chance of a poor outcome, hence the better performance. So I have a dilemma: use that variable to get a better score, or leave it out to produce a model that is sensible in the sense of not cheating by looking into the future. A third alternative is to set to zero any time deltas that give future information, but even then, the lack of a positive COVID-19 test prior to hospitalization_date tells us that there will be a positive one later. This is an immortal time bias: those people are only included in the cohort because the organizers looked into the future. Please advise! Thanks, @christophe.lambert
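To make the "third alternative" concrete, here is a rough sketch of the time-delta feature with and without zeroing, plus an audit that counts records whose covid_index falls after hospitalization_date. It assumes plain (covid_index, hospitalization_date) pairs; `index_delta_days`, `count_future_index`, `pairs`, and the dates are hypothetical and not taken from the challenge data:

```python
from datetime import date


def index_delta_days(covid_index: date, hospitalization_date: date,
                     zero_future: bool = True) -> int:
    """Days from covid_index to hospitalization_date.

    Positive: index before admission (information available at admission).
    Negative: index after admission (future information). With
    zero_future=True, negative values are set to 0.
    """
    delta = (hospitalization_date - covid_index).days
    if zero_future and delta < 0:
        return 0
    return delta


def count_future_index(pairs):
    """Return (number of rows with covid_index after hospitalization_date,
    largest such gap in days)."""
    gaps = [(ci - hd).days for ci, hd in pairs if ci > hd]
    return len(gaps), max(gaps, default=0)


# Hypothetical (covid_index, hospitalization_date) pairs
pairs = [
    (date(2020, 3, 1), date(2020, 3, 5)),   # tested 4 days before admission
    (date(2020, 3, 5), date(2020, 3, 5)),   # tested on the admission date
    (date(2020, 3, 19), date(2020, 3, 5)),  # positive result 14 days after admission
]

print(count_future_index(pairs))                                         # (1, 14)
print([index_delta_days(ci, hd) for ci, hd in pairs])                    # [4, 0, 0]
print([index_delta_days(ci, hd, zero_future=False) for ci, hd in pairs])  # [4, 0, -14]
```

After zeroing, the same-day and 14-days-later patients get the same value, so the lower bound on length of stay disappears from this feature. As noted above, though, this does not remove the immortal time bias: the cohort membership itself still depends on a test result that arrived after admission.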

Task 2 covid_index can be in the future relative to hospitalization date. How does this work for validation?