I am guessing that the training set in the synthetic data is still not real data. It would be much more useful if some real data were shared so that we could train and optimize models locally. This might improve the performance and accuracy of the models considerably. Otherwise progress will be much slower. How can patient privacy be a concern if the patient names are hidden? Thanks.
Created by Zafer Aydin zaferaydin
Hi @zaferaydin,
We understand that having real data would be useful. However, there are still privacy concerns even when the name and other directly identifying information are hidden.
1. Malin, B., Sweeney, L., & Newton, E. (2003). Trail re-identification: learning who you are from where you have been. Proc. LIDAP-WP12. https://dataprivacylab.org/dataprivacy/projects/trails/trails1.pdf
2. Benitez, K., & Malin, B. (2010). Evaluating re-identification risks with respect to the HIPAA privacy rule. Journal of the American Medical Informatics Association: JAMIA, 17(2), 169–177. https://doi.org/10.1136/jamia.2009.000026
Even de-identified EHR data is susceptible to a join attack, in which unique patterns in an individual record are matched to a public record (e.g., voter records) or another data set (e.g., news articles) that contains identifying information.
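To make the idea concrete, here is a minimal sketch of how such a join could work. Everything in it is made up for illustration: the records, the field names (`zip`, `birth_year`, `sex`), and the `join_attack` helper are assumptions, not our data or any real attack tool. It only shows that a record with no name can still be tied to a person when its combination of attributes is unique in a public list.

```python
from collections import defaultdict

# Hypothetical de-identified EHR rows: names removed, other attributes kept.
ehr_rows = [
    {"zip": "98101", "birth_year": 1950, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "98101", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
    {"zip": "98052", "birth_year": 1975, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public records (for example, a voter roll) that include names.
voter_rows = [
    {"name": "Alice Example", "zip": "98101", "birth_year": 1950, "sex": "F"},
    {"name": "Bob Example", "zip": "98101", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "98052", "birth_year": 1975, "sex": "F"},
    {"name": "Dana Example", "zip": "98052", "birth_year": 1975, "sex": "F"},
]

# Attributes present in both data sets (assumed for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def join_attack(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records that match exactly one person."""
    # Index the public records by their quasi-identifier combination.
    index = defaultdict(list)
    for row in public:
        index[tuple(row[k] for k in keys)].append(row["name"])

    reidentified = []
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        # Only a unique match pins the record to a single individual.
        if len(names) == 1:
            reidentified.append((names[0], row["diagnosis"]))
    return reidentified


matches = join_attack(ehr_rows, voter_rows)
for name, diagnosis in matches:
    print(f"{name}: {diagnosis}")
```

In this toy example the first two records are linked to a name because their attribute combination appears only once in the public list, while the third stays ambiguous because two people share it. Real EHR records carry many more attributes (visit dates, locations, procedures), which tends to make combinations more unique, not less.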
Hope this helps clarify our current policies!
Thanks,
@trberg