Hi there,

we have a suspicion that some labels in the CIS-PD training dataset are corrupted. As an example, the labels of the first 59 recordings of `subject_id 1004` all seem to be identical across the three classes, which makes little clinical sense for the **non-ON cases** where

_on_off_ ≠ 0, _dyskinesia_ ≠ 0, _tremor_ ≠ 0

Just as an example: a subject who consistently shows a correlation of R = 1.0 between the classes on_off, dyskinesia and tremor is hard to imagine from a clinical standpoint.

Could you please double-check and clarify?

Cheers, #243IDA

------------------

See the labels here:

measurement\_id & subject\_id & on\_off & dyskinesia & tremor \\
cc7b822c-e310-46f0-a8ea-98c95fdb67a1 & 1004 & 1 & 1 & 1 \\
5163afe8-a6b0-4ea4-b2ba-9b4501dd5912 & 1004 & 0 & 0 & 0 \\
5cf68c8e-0b7a-4b73-ad4f-015c7a20fb5a & 1004 & 1 & 1 & 1 \\
fb188ae2-2173-4137-9236-19a137a402c2 & 1004 & 3 & 3 & 3 \\
19a3e9ea-fce1-40b7-9457-2618970beb7b & 1004 & 1 & 1 & 1 \\
e2973da8-1250-4a7c-98d5-b165570a8aeb & 1004 & 1 & 1 & 1 \\
8548d34c-4771-4ca4-bee4-d47bde435bdc & 1004 & 2 & 2 & 2 \\
0c579a72-bac5-46a2-8671-1a50620723bf & 1004 & 2 & 2 & 2 \\
bb59d008-25fe-43cc-bf05-6bd6b874eea3 & 1004 & 3 & 3 & 3 \\
4a1ca52c-2895-4094-bade-246fd474762f & 1004 & 3 & 3 & 3 \\
f53cfd9b-8c52-4d22-a35c-504542170ed3 & 1004 & 3 & 3 & 3 \\
dc90dc36-b4e5-43ec-b3e8-47c39c763c71 & 1004 & 1 & 1 & 1 \\
e31db4f8-f9a5-4273-a874-4bdbc6fcae2c & 1004 & 1 & 1 & 1 \\
d1a9294c-05ad-4eac-9915-7052c2ad98a3 & 1004 & 2 & 2 & 2 \\
cc0d147f-94ea-4637-91d7-d4ceceaf1728 & 1004 & 1 & 1 & 1 \\
20f1dbcd-0954-4bfd-ad92-9bac1b15beb0 & 1004 & 2 & 2 & 2 \\
c05991ea-ed30-45ee-96a2-8a44d6ac0916 & 1004 & 3 & 3 & 3 \\
3cf49c01-0499-4bad-9167-67691711204a & 1004 & 3 & 3 & 3 \\
ac449a51-1819-4944-b5c3-ef42be404541 & 1004 & 3 & 3 & 3 \\
8b7abdf9-5aad-4edc-9bc4-078e29f134d6 & 1004 & 3 & 3 & 3 \\
6110744d-3f5c-4f2e-9586-f2722352606f & 1004 & 1 & 1 & 1 \\
68bf2103-4211-45e9-82d7-8b4e713b2e3b & 1004 & 2 & 2 & 2 \\
e93b52ca-83af-46fb-baad-46c934ab4edf & 1004 & 3 & 3 & 3 \\
0b5f2f06-e73c-4838-9f4c-d68b909a9356 & 1004 & 1 & 1 & 1 \\
278a1441-2e3a-467d-81c5-143e0298454b & 1004 & 3 & 3 & 3 \\
18cdf618-e263-4843-9640-f41ad8ff4bde & 1004 & 1 & 1 & 1 \\
610face1-43e9-4a7c-b1f2-20deba03d587 & 1004 & 1 & 1 & 1 \\
f59c374f-b39d-48d6-aef9-2f42ca3a67e4 & 1004 & 1 & 1 & 1 \\
66a44bdc-b216-4be1-90aa-d5d05f64ba01 & 1004 & 1 & 1 & 1 \\
dde97977-d155-4f07-8a47-6a318bd530eb & 1004 & 3 & 3 & 3 \\
979c5c53-30c7-4e9c-87f0-261ea0d79ffe & 1004 & 1 & 1 & 1 \\
e630e9fd-6518-43c0-9312-3254bcf9a8a0 & 1004 & 2 & 2 & 2 \\
85fee6b9-b3d9-4506-833b-ce5ca3a8d94f & 1004 & 1 & 1 & 1 \\
29103438-8d6c-41f8-a3b9-89fff8074b6a & 1004 & 2 & 2 & 2 \\
9230106e-5dcf-4034-bd54-c016f49294d8 & 1004 & 1 & 1 & 1 \\
73dee9e5-9bec-4dd2-967b-e1bd59b0629d & 1004 & 2 & 2 & 2 \\
4b269cc2-8f0c-4816-adbf-10c0069b8833 & 1004 & 2 & 2 & 2 \\
1a90e8a3-dd3f-440a-9161-7886737e9d87 & 1004 & 2 & 2 & 2 \\
f4728921-8a52-468b-b6af-523108a1285f & 1004 & 3 & 3 & 3 \\
daf11494-e6fa-4376-a78a-86c683885764 & 1004 & 4 & 4 & 4 \\
3444e818-0ee3-4a2b-953a-f4dbc43b5d13 & 1004 & 3 & 3 & 3 \\
f76830fe-e0b0-463a-9162-63b00478067e & 1004 & 3 & 3 & 3 \\
5f9347a1-bf84-48ee-b3cc-c357401780cf & 1004 & 1 & 1 & 0 \\
9152519b-4b57-43be-963c-dd7218495001 & 1004 & 0 & 0 & 0 \\
c7312d73-cb34-4025-b8b8-5299b4033e2f & 1004 & 0 & 0 & 0 \\
cc730391-146b-420f-9255-c3185061f178 & 1004 & 0 & 0 & 0 \\
7fa0d4ab-c159-4335-ad91-6dc3ec812686 & 1004 & 0 & 0 & 0 \\
50fd9915-06d1-4871-9103-ed125ea75764 & 1004 & 0 & 0 & 0 \\
11dfbcf2-cd03-4b10-83b4-ad428153b200 & 1004 & 0 & 0 & 0 \\
476d6522-cd73-43e9-81c6-66980c575453 & 1004 & 0 & 0 & 0 \\
4e996f6e-4979-4ffb-a017-112100675eed & 1004 & 0 & 0 & 0 \\
e5f4f2d3-9842-462f-8b91-6ccaa6c30a33 & 1004 & 0 & 0 & 0 \\
ef4b3a31-2744-4bec-996f-5c1861478c30 & 1004 & 0 & 0 & 0 \\
631b2ad6-1b00-46f5-8d74-c0b47a2419f0 & 1004 & 0 & 0 & 0 \\
1c3dda9b-984c-43b2-9686-0316e2254393 & 1004 & 0 & 0 & 0 \\
ef0da5f7-79e5-45bc-bd12-d70137054762 & 1004 & 0 & 0 & 0 \\
e49db734-9ccf-4581-ae15-cbda39262bbf & 1004 & 0 & 0 & 0 \\
cf15d497-b4d7-4bd4-9ba4-ae3bf0ed38b7 & 1004 & 0 & 0 & 0
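For anyone who wants to reproduce the correlation check, here is a minimal sketch in plain Python. It parses rows in the `&`-separated form used in this post and computes the Pearson correlation between the label columns. The helper names (`parse_rows`, `pearson`) are made up for illustration, and only seven of the rows from this post are included, so the numbers it produces describe that excerpt and not the full dataset.

```python
from math import sqrt

# A small excerpt of the rows from this post (not the full list).
ROWS = r"""
cc7b822c-e310-46f0-a8ea-98c95fdb67a1 & 1004 & 1 & 1 & 1 \\
5163afe8-a6b0-4ea4-b2ba-9b4501dd5912 & 1004 & 0 & 0 & 0 \\
5cf68c8e-0b7a-4b73-ad4f-015c7a20fb5a & 1004 & 1 & 1 & 1 \\
fb188ae2-2173-4137-9236-19a137a402c2 & 1004 & 3 & 3 & 3 \\
8548d34c-4771-4ca4-bee4-d47bde435bdc & 1004 & 2 & 2 & 2 \\
daf11494-e6fa-4376-a78a-86c683885764 & 1004 & 4 & 4 & 4 \\
5f9347a1-bf84-48ee-b3cc-c357401780cf & 1004 & 1 & 1 & 0 \\
"""


def parse_rows(text):
    """Turn '&'-separated rows into dicts with integer labels."""
    records = []
    for line in text.strip().splitlines():
        # Drop the trailing row terminator, then split into cells.
        cells = [c.strip() for c in line.rstrip("\\ ").split("&")]
        measurement_id, subject_id, on_off, dyskinesia, tremor = cells
        records.append({
            "measurement_id": measurement_id,
            "subject_id": int(subject_id),
            "on_off": int(on_off),
            "dyskinesia": int(dyskinesia),
            "tremor": int(tremor),
        })
    return records


def pearson(xs, ys):
    """Pearson correlation; returns None if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


records = parse_rows(ROWS)
on_off = [r["on_off"] for r in records]
dyskinesia = [r["dyskinesia"] for r in records]
tremor = [r["tremor"] for r in records]

r_on_off_dyskinesia = pearson(on_off, dyskinesia)
r_on_off_tremor = pearson(on_off, tremor)
print(r_on_off_dyskinesia, r_on_off_tremor)
```

On this excerpt, on_off and dyskinesia are identical in every row, while tremor differs in the single row labelled 1, 1, 0, so its correlation with on_off is slightly lower.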

Created by Dactyl Hygiea Dactyl
@aragonj - To elaborate on Larsson's explanation: when patients are asked to keep journals or diaries, they sometimes don't fill them out in real time, but wait until they're due (i.e. in the parking lot just before their doctor's appointment), which makes the entries less accurate.
Dear moderator, excuse me, but may I ask what you mean by "parking lot effect"? I'm not sure that I understand your response correctly. Thanks
Dear @Dactyl, you are right about the clinical interpretation of the reported values, and it is indeed odd to see OFF and severe dyskinesia together. The participants should have been aware of how to report their symptoms, but the level of care and engagement with which they participate is likely variable. In many ways, this is one of the motivations for a challenge like this: self-report data isn't unbiased and can be full of issues such as "parking lot effects", i.e. where patients and research participants fill in their diaries in the proverbial parking lot of the clinic, as well as an inverse Hawthorne effect.
Hi @phil, thanks for your answer. Yes, as I wrote in my post, in the cases where all values are 0, it actually does make sense, see above. However, with reference to all other cases, i.e.

on_off ≠ 0, dyskinesia ≠ 0, tremor ≠ 0

I strongly disagree with your comment:

"Likewise for fully off medication (on_off = 4) we might expect severe dyskinesia and tremor (tremor = 4, dyskinesia = 4)."

While severe OFF and severe tremor can plausibly occur at the same time, a patient exhibiting severe OFF (4) [a sign of missing dopaminergic stimulation] **and** severe dyskinesia, i.e. **hyper**kinesia (4) [a sign of severe dopaminergic overstimulation], at the same time makes little to no clinical sense. I believe that either the labels are corrupted or the patients were so poorly instructed that their labels cannot be used in any sensible way.
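To make the objection concrete, here is a minimal sketch that flags reports combining severe OFF with severe dyskinesia. The function name `implausible_reports` and the cut-off of 4 on both scales are assumptions chosen for illustration, not part of the challenge tooling; the three example rows are taken from the table in the original post.

```python
def implausible_reports(rows, threshold=4):
    """Return rows where on_off and dyskinesia both reach the threshold.

    The threshold of 4 is an assumed cut-off for "severe" on both scales.
    """
    return [
        row for row in rows
        if row["on_off"] >= threshold and row["dyskinesia"] >= threshold
    ]


# Three rows copied from the table in the original post.
rows = [
    {"measurement_id": "f4728921-8a52-468b-b6af-523108a1285f",
     "on_off": 3, "dyskinesia": 3, "tremor": 3},
    {"measurement_id": "daf11494-e6fa-4376-a78a-86c683885764",
     "on_off": 4, "dyskinesia": 4, "tremor": 4},
    {"measurement_id": "9152519b-4b57-43be-963c-dd7218495001",
     "on_off": 0, "dyskinesia": 0, "tremor": 0},
]

flagged = implausible_reports(rows)
for row in flagged:
    print(row["measurement_id"])
```

Lowering `threshold` to 3 would also flag the first row; where to draw that line is a clinical judgement, not something the code can settle.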
Hi @Dactyl, labels with perfect correlation do seem like an eerie coincidence, but why should that be surprising in this case? If a participant is fully ON medication (on_off = 0) they are more likely to have no tremor or dyskinesia (tremor = 0, dyskinesia = 0). Likewise for fully off medication (on_off = 4) we might expect severe dyskinesia and tremor (tremor = 4, dyskinesia = 4).

In any case, subject 1004 has 174 measurements in the source file provided to us by the CIS-PD folk. Only 82 of these 174 self-reports made it into the training set and 27 into the testing set, since the participant wasn't always wearing the smartwatch when they provided a symptom/medication report. Of those 174 measurements in the source file, 120 have the same reported label across on_off, tremor, and dyskinesia. This phenomenon is not uncommon across the other participants, either: 1570 of the 4433 self-reports (again, not all of these measurements are included in the training/testing/auxiliary datasets) have on_off = tremor = dyskinesia.
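A count like the one above can be sketched in a few lines of Python. The function name `identical_label_share` and the toy rows are hypothetical and only illustrate the tally; they are not taken from the CIS-PD source file. The last two lines simply turn the counts quoted above into shares.

```python
from collections import defaultdict


def identical_label_share(rows):
    """Per subject, count reports where on_off == dyskinesia == tremor.

    rows: iterable of (subject_id, on_off, dyskinesia, tremor) tuples.
    Returns {subject_id: (identical, total, identical / total)}.
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [identical, total]
    for subject_id, on_off, dyskinesia, tremor in rows:
        counts[subject_id][1] += 1
        if on_off == dyskinesia == tremor:
            counts[subject_id][0] += 1
    return {
        subject: (same, total, same / total)
        for subject, (same, total) in counts.items()
    }


# Made-up toy rows, for illustration only (not taken from the dataset).
toy_rows = [
    ("A", 1, 1, 1),
    ("A", 2, 2, 2),
    ("A", 1, 1, 0),
    ("B", 0, 0, 0),
    ("B", 3, 1, 2),
]
shares = identical_label_share(toy_rows)
print(shares)

# Shares implied by the counts quoted above.
share_subject_1004 = 120 / 174   # identical labels for subject 1004
share_all_reports = 1570 / 4433  # identical labels across all self-reports
print(f"{share_subject_1004:.1%} vs {share_all_reports:.1%}")
```

Going by the quoted counts, roughly two thirds of subject 1004's reports have identical labels, compared with about one third across all participants.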

Corrupted Labels?