Hi, I understand that patients who have a known date of death in the training set will have that date in the death.csv file. However, for patients who do not have a recorded date of death, until when do we have a guarantee that they were alive? I found two possible answers, but I am not sure which is correct:

- Feb 1st 2019, or
- the last end date of that patient in the observation_period table.

Which of the two is the correct answer? Also, what is that guarantee based on in the source?

Created by Anand Avati avati
Hi @trberg , Thanks for the clarification! We will adjust our pipeline accordingly. Cheers, :Peter
Hi @peter.banda, We used the visit_start_date to define the `dead` flag for the reasons you mentioned. A missing visit_end_date could mean a variety of things like it just wasn't recorded, it was a shorter visit, or it's just a data quality issue. We'd recommend using visit_start_date for these reasons. Thanks! Tim
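For anyone adjusting their pipeline, here is a minimal sketch of taking the last visit from `visit_start_date` only. The rows are made-up stand-ins for `visit_occurrence` (an empty string stands in for a null `visit_end_date`), and the function names are hypothetical, not part of any challenge code.

```python
from datetime import date

# Hypothetical sample rows standing in for visit_occurrence.
# An empty visit_end_date represents a null value.
visits = [
    {"person_id": 1, "visit_start_date": "2018-03-01", "visit_end_date": "2018-03-04"},
    {"person_id": 1, "visit_start_date": "2018-09-10", "visit_end_date": ""},
    {"person_id": 2, "visit_start_date": "2017-12-24", "visit_end_date": ""},
]

def last_visit_by_person(rows):
    """Return {person_id: latest visit_start_date}, ignoring visit_end_date."""
    last = {}
    for row in rows:
        start = date.fromisoformat(row["visit_start_date"])
        pid = row["person_id"]
        if pid not in last or start > last[pid]:
            last[pid] = start
    return last

def missing_end_fraction(rows):
    """Fraction of visits with no visit_end_date recorded."""
    return sum(1 for row in rows if not row["visit_end_date"]) / len(rows)

last_visit = last_visit_by_person(visits)
missing = missing_end_fraction(visits)
```

Because `visit_start_date` is always defined, this avoids having to decide what a missing end date means.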
Hi there, I have a follow-up question related to this thread. Is the last visit date, which is central to the definition of the target `dead` flag, determined by the column `visit_end_date` or `visit_start_date` of the table `visit_occurrence`? For round 1 we used `visit_end_date`, but in the latest version of the synthetic data (v3) we've noticed a lot of undefined/null values (~25% in the training set). That makes us think we should use `visit_start_date` (always defined) instead. Also, we are not sure what it means if a visit has a start date but no end date. Could you please clarify that? Many thanks, :Peter
Hi @avati,

> But the last visit date is also the prediction date, right?

Yes, that is correct.

Tim
Hi @strucka, That's only going to happen in the synthetic data. The real data won't have those types of problems. I'll try and correct those issues for our next version of the synthetic data. Tim
observation_period_end_date is later than the last visit_end_date in most cases in the synthetic data; however, the observation_period_end_date is sometimes later than the death_date. Should we expect discrepancies like this in the real data?
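One rough way to count this kind of discrepancy is sketched below. The rows are invented stand-ins for the `observation_period` and `death` tables, and `observed_after_death` is a hypothetical helper name.

```python
from datetime import date

# Hypothetical sample rows standing in for observation_period and death.
observation_periods = [
    {"person_id": 1, "observation_period_end_date": "2018-06-30"},
    {"person_id": 2, "observation_period_end_date": "2018-11-15"},
    {"person_id": 3, "observation_period_end_date": "2018-01-10"},
]
deaths = [
    {"person_id": 1, "death_date": "2018-05-01"},
    {"person_id": 3, "death_date": "2018-01-10"},
]

def observed_after_death(obs_rows, death_rows):
    """Return sorted person_ids whose observation period ends after death_date."""
    death_by_person = {
        row["person_id"]: date.fromisoformat(row["death_date"]) for row in death_rows
    }
    flagged = set()
    for row in obs_rows:
        died = death_by_person.get(row["person_id"])
        end = date.fromisoformat(row["observation_period_end_date"])
        # Patients without a death record are skipped; equal dates are not flagged.
        if died is not None and end > died:
            flagged.add(row["person_id"])
    return sorted(flagged)

discrepancies = observed_after_death(observation_periods, deaths)
```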
But the last visit date is also the prediction date, right?
> Would using the last date of the patient in observation_period give a conservative estimate of the last known date of being alive for patients without a date of death? Or is that table not populated in the data? What can I expect in the end-date field of that table? Would it also be truncated to the End of Data date?

The observation_period table is populated, but a better conservative estimate would be the last visit in the patient's record. While this last visit date is often the same as the observation_period end date, it is not consistently the same.

> Is there an estimate of what fraction of the population in the DB might have moved out of Washington state and hence likely to not have a date of death?

Unfortunately, we don't have any good estimates of this.
That was very helpful, thank you. A couple of follow-ups:

- Would using the last date of the patient in observation_period give a conservative estimate of the last known date of being alive for patients without a date of death? Or is that table not populated in the data? What can I expect in the end-date field of that table? Would it also be truncated to the End of Data date?
- Is there an estimate of what fraction of the population in the DB might have moved out of Washington state and hence likely to not have a date of death?
Hi @avati, The correct assumption for when a patient was alive would be the largest date available in the death table (which I believe in the synthetic data is Feb 1st, 2019). We cut off all data after that last death record, so you can assume that patients who are present in the dataset and do not have a death record had not passed away as of that latest death record. There is a caveat. The death table is composed of both patients who passed away while at a UW clinic and patients who passed away in the State of Washington outside of a UW clinic (we link state death records to patients). If a patient came to UW but then moved out of state and passed away, we have no way of linking their death records with their UW clinical records. Hopefully this clarifies things. Thank you, Tim
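The rule described above could be sketched roughly as follows. The rows and patient ids are invented stand-ins for death.csv and the patient list, and the function names are hypothetical. The out-of-state caveat still applies: a patient with no death record may have died without the record being linked.

```python
from datetime import date

# Hypothetical sample rows standing in for death.csv.
death_rows = [
    {"person_id": 1, "death_date": "2017-05-03"},
    {"person_id": 3, "death_date": "2019-02-01"},
]
# Hypothetical list of every patient present in the dataset.
person_ids = [1, 2, 3, 4]

def data_cutoff(rows):
    """Largest death_date in the death table, used as the data cutoff."""
    return max(date.fromisoformat(row["death_date"]) for row in rows)

def assumed_alive_as_of(all_person_ids, rows):
    """Map each patient without a death record to the cutoff date.

    Assumption: no record means alive at the cutoff, which can be wrong
    for patients who moved out of state before passing away.
    """
    cutoff = data_cutoff(rows)
    deceased = {row["person_id"] for row in rows}
    return {pid: cutoff for pid in all_person_ids if pid not in deceased}

cutoff = data_cutoff(death_rows)
alive = assumed_alive_as_of(person_ids, death_rows)
```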
