Based on OHDSI/CommonDataModel (5.3.1 ca1b2376c48c6d72aa581a82ceb5b008ffdc218e), the following fields were missing or null:

* vocabulary.vocabulary_version
* person.race_concept_id
* drug_exposure.drug_exposure_end_date

The format of the tables in full_synthetic_training_data (syn21039808) differs significantly from fast_lane_synthetic_training_data (syn21038372):

* Several, but not all, tables have a leading unnamed column, assumed to be a row count.
* The observation file has an unknown column, `observation_source_concept_i`, alongside a valid `observation_source_concept_id` column.
* The column order differs between full_synthetic_training_data and fast_lane_synthetic_training_data.
* Timestamp fields are null.
* Datetime fields may or may not include a time.

How will the final data differ from the synthetic data? Will field names and the column order in the CSV files be consistent?

Created by bwalsh
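The discrepancies listed above can be checked and worked around on the consumer side with pandas. This is a minimal sketch, not tooling from either dataset: the function names `normalize_omop_table` and `all_null_columns` are hypothetical, and the expected column list is an assumption the caller must take from the CDM v5.3.1 table definitions rather than something hard-coded here.

```python
import pandas as pd


def normalize_omop_table(df: pd.DataFrame, expected_columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: align a synthetic OMOP table with an expected CDM column list."""
    # pandas names a header-less leading column "Unnamed: 0" when reading a CSV
    # that was written with its index; assume it is a row counter and drop it.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed:")]
    # Drop the truncated duplicate column only if the correctly named one exists.
    if {"observation_source_concept_i", "observation_source_concept_id"} <= set(df.columns):
        df = df.drop(columns=["observation_source_concept_i"])
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing CDM columns: {missing}")
    # Reorder to the expected (assumed CDM) order; any extra columns are dropped.
    return df[expected_columns]


def all_null_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns that contain no non-null values."""
    return [c for c in df.columns if df[c].isna().all()]
```

Reordering by an explicit column list, instead of trusting file order, keeps downstream code working whichever of the two datasets (or the final data) is loaded.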
Thank you. Much appreciated.
Hi @bwalsh,

Thanks for bringing these to our attention. I'll work on correcting these issues and release a third version.

The unnamed column is an artifact of using pandas to generate the data; I mistakenly left that column in. It won't be present in the UW data.

The `observation_source_concept_i` column can be ignored in favor of the correct `observation_source_concept_id` column. I'll be sure to remove it in the next version.

As for column order, it will be consistent with the OMOP standard. The columns were jumbled during the synthetic data generation process; I'll correct that in the next version.

Thank you for your feedback!
Tim
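One common way a pandas pipeline produces that leading unnamed column is `DataFrame.to_csv`, which writes the row index by default (`index=True`). Whether this is exactly how the synthetic files were written is an assumption; the sketch below, with made-up values, only shows the mechanism and the `index=False` fix.

```python
import io

import pandas as pd

# Made-up example rows; the column names follow the OMOP person table.
df = pd.DataFrame({"person_id": [10, 11], "race_concept_id": [0, 0]})

with_index = df.to_csv()                # default index=True also writes row labels
without_index = df.to_csv(index=False)  # writes only the data columns

print(with_index.splitlines()[0])     # ,person_id,race_concept_id
print(without_index.splitlines()[0])  # person_id,race_concept_id

# Reading the indexed file back yields an extra "Unnamed: 0" column.
print(list(pd.read_csv(io.StringIO(with_index)).columns))
```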

Discrepancies with OHDSI/CommonDataModel