Hi GLASS team,

**Edit: for anyone reading this thread going forward, the Clinical_Surgeries table is likely correct as is. The confusion here came from the forward-looking nature of the table (see the thread below for what this means). I would have misinterpreted the data without knowing that.**

I am looking at the Clinical_Surgeries file, syn31121219. Row ID 356, for patient GLSS-CU-R001, indicates that sample GLSS-CU-R001-TP did not receive TMZ therapy, but also indicates that TMZ therapy was concurrent with radiotherapy, which is of course logically impossible. Row ID 357, for GLSS-CU-R005, shows the same pattern. Am I missing something? To me, the only two possibilities are that either the TMZ concurrence value needs to be changed to 0, or TMZ therapy was received and the TMZ column should have a 1. Thoughts?

Created by Vincent Laufer LauferVA
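A quick way to list every row with the pattern described in the first post (`treatment_tmz` coded "0" while `treatment_concurrent_tmz` is coded "1") is sketched here in Python, using only the standard library. It assumes a CSV export of the table with a `sample_barcode` column; that column name and the example rows are hypothetical, not taken from syn31121219. Rows it returns are candidates for review, not confirmed errors, since whether the pattern is contradictory depends on how the table defines `treatment_tmz`.

```python
import csv
import io

# Hypothetical CSV excerpt: the sample_barcode column name and these values
# are illustrative assumptions, not real GLASS data.
EXAMPLE_CSV = """sample_barcode,treatment_tmz,treatment_concurrent_tmz
SAMPLE-A-TP,0,1
SAMPLE-A-R1,1,1
SAMPLE-B-TP,1,1
SAMPLE-C-TP,NA,0
"""


def flag_concurrent_without_tmz(rows):
    """Return barcodes of rows coded treatment_tmz == "0" but
    treatment_concurrent_tmz == "1" (values compared as strings)."""
    return [
        row["sample_barcode"]
        for row in rows
        if row["treatment_tmz"] == "0" and row["treatment_concurrent_tmz"] == "1"
    ]


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print(flag_concurrent_without_tmz(rows))  # ['SAMPLE-A-TP']
```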
Hi @LauferVA,

Thank you for reaching out and providing your feedback. I understand your concerns regarding the interpretation of the variable `treatment_tmz` and its alignment with the information provided in the Data Dictionary.

In the Stupp et al. NEJM paper, the treatment regimen is given as "radiotherapy plus continuous daily temozolomide (75 mg per square meter of body-surface area per day, 7 days per week from the first to the last day of radiotherapy), followed by six cycles of adjuvant temozolomide (150 to 200 mg per square meter for 5 days during each 28-day cycle)." We created separate `treatment_concurrent_tmz` and `treatment_tmz` variables to capture instances where patients received concurrent therapy but did not continue with adjuvant chemotherapy, although this is less common in the GLASS dataset.

In the Data Dictionary, `treatment_concurrent_tmz` explicitly addresses concurrent administration of temozolomide with radiotherapy. The `treatment_tmz` variable, on the other hand, indicates whether a subject received temozolomide as adjuvant therapy or otherwise separately from concurrent radiotherapy and temozolomide. We have updated the Data Dictionary to reflect this clarification. For your analysis, I suggest creating an "either/or" TMZ variable by combining the `treatment_concurrent_tmz` and `treatment_tmz` columns.

Lastly, I would like to emphasize that a small team within our lab collaborated with researchers and clinicians from hospitals worldwide to curate this metadata. While we endeavor to maintain accuracy, some issues or inconsistencies may have gone undetected.

-Kevin
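The "either/or" TMZ variable suggested in the reply above could be derived along these lines. This is a minimal sketch assuming both columns use the Data Dictionary codes "1", "0" and "NA" as strings; the function name `tmz_either` and the rule for missing values (return "NA" unless one column is "1") are illustrative choices, not an official GLASS definition.

```python
def tmz_either(tmz, concurrent_tmz):
    """Combine treatment_tmz and treatment_concurrent_tmz into one
    "any TMZ" flag (hypothetical helper, not part of GLASS).

    Returns:
      "1"  if either column is "1"
      "0"  only if both columns are "0"
      "NA" otherwise (no "1" present and at least one value is "NA"
           or any other unrecognized value, treated as missing)
    """
    values = (tmz, concurrent_tmz)
    if "1" in values:
        return "1"
    if all(v == "0" for v in values):
        return "0"
    return "NA"


print(tmz_either("0", "1"))   # 1
print(tmz_either("0", "0"))   # 0
print(tmz_either("NA", "0"))  # NA
```

Returning "NA" when one column is missing and the other is "0" is deliberately conservative: a "0" in only one column does not rule out TMZ recorded in the other.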
Also, thanks very much for letting me know that the Clinical_Surgeries table is forward-looking; that is super helpful. Are any of the other tables set up that way as well?
@kcjohnson, thanks very much for taking the time to reply. A couple of items:

1) Sufficient detail is provided in the original post to identify the individual rows I believe are in error.

2) Are you absolutely certain your interpretation of `treatment_tmz` is correct? If so, the Data Dictionary is fairly misleading, as it states:

```
**treatment_tmz: Indicates whether a subject received temozolomide (tmz) for each surgical sample. "1" = received treatment, "0" = did not receive treatment and "NA".**
treatment_tmz_cycles: Number of temozolomide treatment cycles.
treatment_tmz_cycles_6: A categorical variable indicating whether a subject received at least six cycles of temozolomide. "1" = received at least 6 cycles, "0" = did not receive at least 6 cycles.
treatment_tmz_cycles_notes: Clinical notes with regards to temozolomide therapy.
treatment_concurrent_tmz: To indicate whether temozolomide was administered concurrently with radiotherapy. "1" = received treatment, "0" = did not receive treatment, and "NA".
treatment_radiotherapy: Indicates whether a subject received radiotherapy. "1" = received treatment, "0" = did not receive treatment.
treatment_radiation_dose_gy: The dosage of radiation a subject received, reported in Gray units.
treatment_radiation_fractions: The number of smaller number of radiation doses matched with the values listed in "treatment_radiation_dose_gy".
```

The Data Dictionary makes no mention of any temporal relationship between TMZ therapy and radiotherapy, although the `treatment_concurrent_tmz` column of course does. If you are positive that the statement "TMZ is given post-radiotherapy and concomitant chemotherapy" is accurate, I'd recommend putting that phrasing into the Data Dictionary. Otherwise, `treatment_tmz` reads like an ever/never column, a type of variable that is extremely common in biological research. Isn't the typical standard of care biopsy --> (if GBM) resection --> TMZ + radiotherapy directly after resection?
Thus, the only way to construe such a column would be that the patient received neoadjuvant TMZ and radiotherapy before the first surgery. That does happen sometimes, but even in those cases you would be saying that you had also not recorded the amount of TMZ given prior to surgery 1. This seems less likely to me than a simple data entry error...
Hi Vincent,

Thank you for carefully reviewing the data. The `treatment_concurrent_tmz` variable is meant to reflect when TMZ is given during radiotherapy, and `treatment_tmz` is meant to reflect when TMZ is given post-radiotherapy and concomitant chemotherapy. So it is possible that these data are accurate as is: a patient may simply not have received additional TMZ following the initial RT+TMZ therapy. It's also possible that errors were introduced when translating the original clinical information into our GLASS format. If you could share which samples you think might be incorrect, we could take a closer look, provided we still have the data. Note that some of these data were collected years ago, and it may be difficult to resolve everything.

I am not sure I understand your proposed source of the error; could you elaborate?

The clinical.surgeries table is forward-looking: the treatment information on each row reflects treatment received following that surgical resection, but before the subsequent surgery.

-Kevin
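Under the forward-looking reading described in Kevin's reply, the treatment a given tumor sample could have been exposed to is recorded on the same patient's *previous* surgery row, not on the sample's own row. The sketch below shifts treatment values accordingly. The keys `case_barcode`, `surgery_number`, `sample_barcode` and the helper name are hypothetical assumptions about how the rows are loaded, not the confirmed GLASS schema. Nothing in this layout describes treatment before a patient's first surgery, so the sketch returns `None` for those samples rather than guessing "0".

```python
from collections import defaultdict


def exposure_before_each_surgery(rows, column="treatment_tmz"):
    """Map each sample_barcode to the value of `column` on the previous
    surgery row for the same case (hypothetical helper).

    Assumes each row is a dict with keys case_barcode, surgery_number (int),
    sample_barcode and `column`, and that each row describes treatment given
    after that surgery and before the next one.
    """
    by_case = defaultdict(list)
    for row in rows:
        by_case[row["case_barcode"]].append(row)

    exposure = {}
    for case_rows in by_case.values():
        # Order each patient's surgeries chronologically by surgery number.
        case_rows.sort(key=lambda r: r["surgery_number"])
        previous = None
        for row in case_rows:
            # First surgery: no earlier row, so pre-surgery treatment is unknown.
            exposure[row["sample_barcode"]] = (
                previous[column] if previous is not None else None
            )
            previous = row
    return exposure


example_rows = [
    {"case_barcode": "CASE-A", "surgery_number": 2,
     "sample_barcode": "CASE-A-R1", "treatment_tmz": "0"},
    {"case_barcode": "CASE-A", "surgery_number": 1,
     "sample_barcode": "CASE-A-TP", "treatment_tmz": "1"},
]
print(exposure_before_each_surgery(example_rows))
```

Here the TMZ value recorded on the TP row ends up attached to the R1 sample, which is the sample that treatment could actually have affected.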
I've been looking at this a little more. Because these samples are the first surgery for each patient (there are a few others like this in the table as well), I think it is most likely that some of the metadata meant for the corresponding R1 sample's row was entered on the TP line instead. Otherwise, I am not sure how to construe the data.

Likely Covariate Labeling error in syn31121219