I am currently working with the BRIGHTEN and START study datasets for this project, but I have a few questions about their original study designs. In particular, the BRIGHTEN study (I believe this is the paper: https://innovations.bmj.com/content/2/1/14) mentions two apps, but the data available here does not make this distinction.
My further questions are:
• I noticed the UIDs are grouped by colour. Is there a meaning behind these groupings?
• Are calendar dates available?
• Both datasets have a variable called tasktype, whose values are either Survey or Passive. What do these values mean?
  o If two activities occur in the same hour and both have a value of "Survey", do they record the same activity? Or could they be different activities that are both survey-related (i.e. the same type but not the same activity)? If so, do you have data that helps distinguish between them?
  o For task types of "passive-sensor", what data is being collected, and through what mechanism (e.g. a different app on the phone, such as Apple Health, or a device such as a Fitbit)?
• The time of day is a good indicator of when each tasktype is performed, but another useful indicator of engagement is likely to be the time spent on each page/activity and the time taken to move between activities. This isn't possible with the current data, but is this the kind of information contained in the "less clean" data that is available?
• BRIGHTEN was an RCT, and the published paper reports withdrawals during the study. Are these patients included in the datasets? If so, is there a way to identify them and when they withdrew?
• In START, what does "Casestatus" refer to?
• Regarding the "list_of_healthCodes_tobe_removed" dataset: does the name mean the dataset is not relevant, or does it refer to something that needs to be cleaned from these datasets?
I apologise if this is not appropriate for the discussion thread.
Thank you,
Jack