(1) My understanding is that all 211K patients included in the challenge data are COVID positive. From these, we need to select patients whose positive COVID test date is on or before the outpatient visit date, and then, among those, select patients who had an inpatient hospitalization within 35 days of the outpatient visit date (but not on the same day). Is that correct? (2) Are the gold standard patients the same as the challenge set patients, or a different set or subset of them?

Created by Sandeep Tripathi sandeep
Hi @trberg , Yes, I want to confirm that we will be given the new set of patients per #1 for training, even though we will only be assessed on #3 for testing purposes. Also, please clarify whether you will provide all patients in a single table without differentiating between #1, #2, and #3 -- I recommend doing it this way as it would simulate a live deployment environment where your data is not filtered based on the "future". Also, can you clarify whether or not all the past patients we have been using up until now will be part of the same table, or whether we should expect to expand our code workbooks to read 2 sets of every OMOP table? It would be easiest if we did not have to do that, and we could just read from one set of person, condition, etc. tables and you just withhold events past the outpatient visit (except for the original training set). It would be helpful if you can let us know the expected table names and schema ASAP so we can structure our pipelines to be prepared for incorporating testing into the model pipelines. Thank you, @christophe.lambert
Hi @christophe.lambert, Just to clarify, are you asking what will be made available for you to train your model when I update with the prospective data? If that's the question, then you will be given #1 as the dataset. If your question relates only to what will be used as a test dataset, you will be given #3, and we will be withholding data from day 1 onwards. Thank you, @trberg
Hi @trberg , Can you further clarify what will be provided as the test cohort? I see 3 options, each with potentially large implications for model building: 1. You include all patients with new COVID-19 diagnoses, analogous to how you fetched the original 211K cohort. 2. You give us a subset of 1 where each person meets the criteria of having a positive COVID-19 test then an outpatient visit within a week, and include day 0 hospitalization people. 3. You give us only the subset of 2 that has no day 0 hospitalizations. I strongly urge you to provide us #1 or at least #2 to keep our models consistent with real-world deployment where the future is unknown, and also so we can take advantage of that extra information, even though you will only use subset 3 for scoring. If you give only #3 it will be very difficult to use the extra information from #1 and #2 from the training set and calibrate properly in the test set. Also, can you clarify what info, if any, will be withheld about Day 0 in the test set? I would suggest nothing be withheld -- only withhold day 1 and onwards, as you will only score subset 3. Thank you, @christophe.lambert
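For readers following along, here is a minimal sketch of how the three nested options described above could be expressed as flags on a single patient table. It is only an illustration: the `Patient` class, its field names, and the `cohort_flags` helper are hypothetical and are not the challenge schema, and the 7-day gap is taken from the "within a week" wording in the post.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical fields for illustration; not the actual OMOP/challenge columns.
    person_id: int
    covid_positive_date: date
    outpatient_visit_date: Optional[date] = None
    inpatient_dates: List[date] = field(default_factory=list)


def cohort_flags(p: Patient, max_gap_days: int = 7) -> dict:
    """Return membership flags for the three nested options in the post."""
    # Option 1: every newly COVID-positive patient.
    in_1 = True

    # Option 2: positive test followed by an outpatient visit within a week
    # (assumed here to mean 0..max_gap_days days after the positive test).
    in_2 = False
    if p.outpatient_visit_date is not None:
        gap = (p.outpatient_visit_date - p.covid_positive_date).days
        in_2 = 0 <= gap <= max_gap_days

    # Option 3: option 2 minus patients hospitalized on day 0 (the visit date).
    day0_hosp = p.outpatient_visit_date in p.inpatient_dates
    in_3 = in_2 and not day0_hosp

    return {"option_1": in_1, "option_2": in_2, "option_3": in_3}
```

Keeping all patients in one table and carrying the flags alongside would let a pipeline train on option 1 or 2 while scoring only rows where `option_3` is true.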
Hi @sandeep, 1) Yes, that is how we defined the gold standard. You can find the Task 1 gold standard code I used to generate that file in the `Resources` folder. 2) The gold standard patients are a subset of them. You can use any of the 211K patients to train your models, but we will only be evaluating the models on that gold standard subset. During the evaluation, I'll use my script to build a new gold standard file from the prospectively collected data that we'll use to evaluate models. Thank you, @trberg
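As a rough illustration of the gold standard criteria confirmed above, the sketch below labels one patient from their dates. It is not the script in the `Resources` folder: the function name, arguments, and return convention are assumptions made for this example. Same-day admissions are simply not counted as a positive outcome here; whether such patients are dropped from the evaluated cohort altogether is not modeled.

```python
from datetime import date
from typing import Iterable, Optional


def gold_standard_label(
    covid_positive_date: date,
    outpatient_visit_date: date,
    inpatient_dates: Iterable[date],
    window_days: int = 35,
) -> Optional[bool]:
    """Hypothetical helper: None if the patient is not eligible,
    otherwise True/False for hospitalization inside the window."""
    # Eligibility: the positive test must be on or before the outpatient visit.
    if covid_positive_date > outpatient_visit_date:
        return None

    # Outcome: any inpatient admission 1..window_days days after the visit.
    # Day 0 (same day as the visit) does not count.
    for admitted in inpatient_dates:
        offset = (admitted - outpatient_visit_date).days
        if 1 <= offset <= window_days:
            return True
    return False
```

For example, a patient who tested positive on 2021-01-01, had an outpatient visit on 2021-01-02, and was admitted on 2021-01-03 would be labeled `True` under these assumptions.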
