Hello @EHRChallengeParticipants,

We have finished Round 1 and have begun Round 2! The final round results and leaderboard for Round 1 will be announced soon. We have two announcements: (1) the release of version 3 of the fast lane synthetic data and (2) the release of a list of concepts that are available in the UW Challenge Data.

###Synthetic Data
You can download the [synthetic data here](https://www.synapse.org/#!Synapse:syn20685954). This version of the synthetic data will be used on the Fast Lane to validate submissions. Version 3 of the synthetic data corrects some previous discrepancies between version 2 and the UW data.

- We have added null values to any columns for which the UW data also has null values. The distribution of the null values is similar to that of the UW data. If a column is 90% null in the synthetic data, you can assume that the column in the UW data is approximately that empty as well.
- We have corrected the datetime/date issue where those two columns did not contain the same date in many of the tables.
- We have also made sure that all datetimes have a time associated with them, with the notable exception of the *procedure occurrence* table. UW procedures don't currently have times associated with them, so the procedure_datetime column contains only dates. The synthetic data reflects this characteristic of the UW data.
- We have removed some extra columns, such as "Unnamed: 0" and "observation_concept_i", that found their way into the previous versions.

###Concept lists
We have also released a list of all concept_ids that occur in at least 100 patients in the UW Challenge data. Each clinical table has its own list of concepts, concept descriptions, and vocabulary_id. You can download the [zip file of the lists and accompanying README here](https://www.synapse.org/#!Synapse:syn21116682) and look up each of the concepts at http://athena.ohdsi.org/ for more information.

If you have questions or concerns, or you notice some problems with either of these resources, feel free to leave a comment in this discussion thread. Thank you very much and good luck, EHR DREAM Admin Team
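
For anyone who wants to check how empty a column is in the synthetic data, here is a minimal sketch using only the Python standard library. It assumes that a null is written as an empty cell in the CSV, which may not match how the files are actually encoded; the sample rows and the `null_fractions` helper are made up for illustration and are not part of the challenge tooling.

```python
import csv
import io


def null_fractions(csv_file):
    """Return {column: fraction of rows with an empty value} for a CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            # Assumption: a null is stored as an empty (or whitespace-only) cell.
            if (row[name] or "").strip() == "":
                counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in counts}


# Made-up rows standing in for one synthetic-data table.
sample = io.StringIO(
    "person_id,procedure_date,procedure_datetime\n"
    "1,2010-01-01,2010-01-01\n"
    "2,2010-02-03,\n"
    "3,,\n"
    "4,2011-05-06,2011-05-06\n"
)
fractions = null_fractions(sample)
for column, fraction in fractions.items():
    print(f"{column}: {fraction:.0%} null")
```

To run it on a real table, pass an open file handle instead of the in-memory sample, for example `open("procedure_occurrence.csv", newline="")` (the file name here is a guess).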

Created by Timothy Bergquist trberg
Which one is the target/label in the data and in which csv file? Thanks!
Is there a WebAPI on Athena or another site where I can send an OMOP concept and get the description back? Thank you.
Hi @paulperry, Are you looking at the synthetic data for those 100 significant features? The distribution of concepts in the synthetic data is not the same as the UW data. I created that file based on the UW data. If we saw a concept in more than 100 patients, we included it in the list. If you are looking for a comprehensive vocabulary list, you can download the SNOMED, RxNorm, and LOINC vocabularies from athena. Thanks, Tim
The concept_id list is not the full list (concepts with < 100 patients are dropped), and more than 100 significant features (concepts) for the training set don't have IDs in the concept descriptions. Where can we get the full list of concepts in the CSV format provided? I did not see them on athena.ohdsi.org. Was there some processing done to produce these files? Thx
Hi @Kkadri, By validation data do you mean the data that we will be testing models on after the leaderboard phase? Most of the concepts should be in that set, but you shouldn't assume all will be there. Thanks, Tim
Hello! Can we assume that the concepts in the concept lists are also present in the validation data?
Yes, the concept_id lists are provided as csv files. If you download the linked zip file, you'll find the lists are in csv files with the columns concept_id, concept description, and vocabulary_id. These lists are separated into multiple files for each table as well as one master list for all tables. Thank you, Tim
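
As a rough illustration of using one of these lists offline, the sketch below loads a concept list into a dictionary and looks up a description by concept_id. The header names are taken from the column names mentioned above and may not match the files exactly; the sample rows, concept IDs, and helper names are invented for the example.

```python
import csv
import io


def load_concepts(csv_file):
    """Map concept_id -> (description, vocabulary_id) from a concept-list CSV."""
    reader = csv.DictReader(csv_file)
    return {
        int(row["concept_id"]): (row["concept description"], row["vocabulary_id"])
        for row in reader
    }


# Made-up rows; the real lists come from the zip file linked in the post.
sample = io.StringIO(
    "concept_id,concept description,vocabulary_id\n"
    "111,Example condition,SNOMED\n"
    "222,Example drug,RxNorm\n"
)
concepts = load_concepts(sample)


def describe(concept_id):
    """Return the description, or a placeholder if the concept is not in the list."""
    return concepts.get(concept_id, ("<not in list>", None))[0]


print(describe(111))
print(describe(999))
```

The placeholder return matters because the lists only cover concepts seen in at least 100 patients, so a lookup can legitimately miss.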
Will the concept_id list be provided as a csv file like the training and the evaluation files?
