Quite a few CDEs have DATE types annotated as Enumerated (in addition to others being NonEnumerated as expected).
In addition, some of the enumerated values themselves do not conform to the Display Format.
How should these be reported in our output? Do we need to mix both conforming entries and enumerated value matches?
A few examples are here:
Enumerated |DATE| MM/DD/YYYY| "99999999\Patient deceased, unknown date of death\ \|00000000\Patient still alive\ "
Enumerated |DATE| MM/DD/YYYY| 99999999\Unknown Date\ncit:C17998\|00000000\No date\
Enumerated |DATE| YYYYMMDD| 99999999\Unknown Date\ncit:C17998\|00000000\No date\
Also, I'm not sure whether this affects the gold standard annotations, but the caDSR seems to have some invalid annotations which come up in the same search:
Enumerated| DATE| "1\YES, PLANNED\|2\Yes, due to toxicity\ \|Male\MALE\ncit:C20197\|female\Female Breast\ncit:C12851"
Enumerated| DATE | 99999999\Unknown Date\ncit:C17998\|88888888\Information on multiple tumors not collected/not applicable for this site\ \|00000000\Single Neoplasm\ncit:C48440 ncit:C3262
Enumerated| DATE | Complicated cysts\Complicated bilateral cysts\ \|Mobile internal echoes\Mobile internal echoes\ \|Fluid-Debris level\Fluid-Debris level\ \|Homogeneous\Homogeneous low-level echoes\
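For anyone comparing data values against these lists, here is a minimal parsing sketch. It assumes, based only on the examples above, that entries are separated by a backslash followed by a pipe, and that the fields inside an entry (value, meaning, concept codes) are separated by a backslash. The names `PermissibleValue` and `parse_permissible_values` are made up for illustration and are not part of the challenge materials.

```python
from typing import NamedTuple


class PermissibleValue(NamedTuple):
    value: str
    meaning: str
    concepts: tuple[str, ...]


def parse_permissible_values(raw: str) -> list[PermissibleValue]:
    """Split a raw permissible-values string into entries.

    Assumed layout (inferred from the examples in this thread only):
    value\\meaning\\concepts, with entries joined by a backslash and a pipe.
    """
    entries = []
    # Drop surrounding whitespace and the optional enclosing double quotes.
    for chunk in raw.strip().strip('"').split("\\|"):
        fields = chunk.split("\\")
        value = fields[0].strip()
        if not value:
            continue  # skip empty fragments
        meaning = fields[1].strip() if len(fields) > 1 else ""
        # Concept codes are space-separated; a blank field yields no codes.
        concepts = tuple(fields[2].split()) if len(fields) > 2 else ()
        entries.append(PermissibleValue(value, meaning, concepts))
    return entries


example = "99999999\\Unknown Date\\ncit:C17998\\|00000000\\No date\\"
for pv in parse_permissible_values(example):
    print(pv)
```

With this layout, the "invalid" lists above (for example the one mixing YES/Male/female under DATE) still parse cleanly; they are just unlikely to match real date columns.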
Created by Jeremy Jay jjay
Hi Jeremy,
To clarify, the short answer is NO, a single column of data annotations cannot be annotated with a mix of matches to both Enumerated and NonEnumerated CDEs. A column's annotations are against one CDE, and one CDE is either Enumerated OR NonEnumerated. Since we are asking for the top 3 best matching CDEs for each column, those 3 may include both Enumerated and NonEnumerated CDEs.
Further clarification. There could be instances of CDEs in the dump file with the same "CDE_ID", but they are different versions of the CDE. These different versions will have different Value Domains. A Value Domain **cannot** be both Enumerated and NonEnumerated. This follows the rules specified by ISO/IEC 11179.
The CDE_ID is **not** the unique identifier of a CDE as was previously posted in the documentation; the documentation has been fixed. The uniqueness of a CDE is determined by its CDE Public ID and Version, but we did not include the CDE version number in the dump because it is not needed in the annotations.
A unique CDE is associated with only one DEC and one Value Domain. Since a Value Domain cannot be both Enumerated and NonEnumerated, a unique CDE (represented by a row in the dump file) cannot be both Enumerated and NonEnumerated. A column's annotations are based on only one CDE.
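One practical consequence for anyone loading the dump: a dictionary keyed by CDE_ID alone would silently overwrite versions. A small sketch of keeping every row instead is below; the column names `CDE_ID` and `VALUE_DOMAIN_TYPE` and the sample rows are hypothetical placeholders, not the actual dump headers.

```python
from collections import defaultdict

# Hypothetical rows: two versions share CDE_ID "1001" but have different
# Value Domains, so each row must be kept as its own candidate CDE.
dump_rows = [
    {"CDE_ID": "1001", "VALUE_DOMAIN_TYPE": "Enumerated"},
    {"CDE_ID": "1001", "VALUE_DOMAIN_TYPE": "NonEnumerated"},
    {"CDE_ID": "1002", "VALUE_DOMAIN_TYPE": "NonEnumerated"},
]


def group_by_cde_id(rows):
    """Map each CDE_ID to the list of all dump rows (versions) that carry it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["CDE_ID"]].append(row)
    return dict(grouped)


candidates = group_by_cde_id(dump_rows)
for cde_id, versions in candidates.items():
    types = [v["VALUE_DOMAIN_TYPE"] for v in versions]
    print(cde_id, types)
```

Each row in a group is then scored against the column on its own, since each row is either Enumerated or NonEnumerated, never both.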
The most important feature is whether the column header matches a CDE name or definition. This is reflected in the documentation of the Manual Annotation Workflow through the "weights" placed on matching different aspects of a CDE to a column. If no possibly matching CDE is found based on name and other semantic details, the gold standard annotates the header as "NOMATCH" and the data values are not annotated, because there is no CDE to compare them to.
If the column header is found to match a CDE based on its name or definition details, and the CDE Value Domain is Enumerated, each of the column's data values will be annotated with the details of the matched permissible value or, if no permissible value matches, with "NOMATCH", irrespective of the other attributes of the Value Domain. For example, if a column heading matches the semantics of a CDE well, and the CDE's Value Domain is Enumerated, and a data value did not match a permissible value, that data value should be annotated with "NOMATCH" irrespective of the other Value Domain attributes such as datatype. It is also possible to match and annotate a column header while all of the data values are annotated with "NOMATCH", if the CDE was the best match for the column header.
If the column header is found to match a CDE based on its name or definition details, and the CDE Value Domain is NonEnumerated, the data value annotations will be either "CONFORMING" or "NONCONFORMING" based on the Value Domain's attributes, the most important of which is the DATATYPE.
Sorry for the long explanation and so many edits, but I hope that helps.
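The two branches described above could be sketched roughly as follows. This is only an illustration: `annotate_value` and `conforms_to_date` are hypothetical names, exact string equality is assumed for permissible-value matching, and the date check stands in for whatever datatype validation a real solution would use. The actual submission format for annotations is not shown here.

```python
from datetime import datetime


def conforms_to_date(value: str, fmt: str = "%m/%d/%Y") -> bool:
    """Assumed datatype check for a NonEnumerated DATE Value Domain."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def annotate_value(value, vd_type, permissible_values=(), conforms=conforms_to_date):
    """Annotate one data value against the single CDE matched to its column."""
    if vd_type == "Enumerated":
        # Only the permissible values matter; datatype and display format
        # are ignored when deciding between a match and NOMATCH.
        return value if value in permissible_values else "NOMATCH"
    # NonEnumerated: judge the value against the Value Domain attributes.
    return "CONFORMING" if conforms(value) else "NONCONFORMING"


pvs = {"99999999", "00000000"}
for v in ["99999999", "01/02/2020"]:
    print(v, annotate_value(v, "Enumerated", pvs), annotate_value(v, "NonEnumerated"))
```

Note that a well-formed date in an Enumerated DATE column still comes out as "NOMATCH" under this reading, because it is not in the permissible values list.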
OK Great! Thanks for that information Denise, we can definitely simplify our code then.
So to clarify, test data during the "Leaderboard" phases will be similar to the example data thus far, i.e. there will not be columns with mixed Enumerated and NonEnumerated values?
For CDEs with Enumerated Value Domains, the attributes of datatype, unit of measure, minimum/maximum length and display format should be considered informational only. Solutions should focus on the entire CDE Permissible Values list.
When a Value Domain is created as enumerated in the caDSR, our software does not currently use the other Value Domain attributes, if present, to validate the Permissible Values list, so inconsistencies between the Permissible Values and the other Value Domain attributes will exist.
The "gold standard" was picked based on the best overall match of the combination of column header and its unique set of data values to the CDE including its Value Domain. If no enumerated CDE matches, then we look at the other attributes of the Value Domain to assess CONFORMING or NONCONFORMING.
An annotation of CONFORMING should only be inserted if the CDE's Value Domain type is "Nonenumerated".
There can be multiple versions of the same CDE_ID in the dump file. This is because CDEs are unique by their Public ID + Version, and we did not include the Version number in the dump as it is not needed in the annotation.
For the specific CDEs with an Enumerated Value Domain and Datatype=DATE in the dump, some applications use these additional versions of the CDE to capture reasons for missing Dates.
**The bottom line is that when a CDE has a Value Domain that is Enumerated, the data values should be compared to the Permissible Values, and the other attributes of Value Domain should be considered informational and could be used in order to break a tie.**
Hi Jeremy,
Thanks for your questions. I will check with the team who created the annotations and respond back to you asap. Please continue to ask any more questions you may have so we can be sure to address them on the webinar on the 30th.
Best,
Eric