The train set in synthetic data

I am guessing that the train set in the synthetic data is still not real data. It would be much more useful if some real data were shared so that we could train and optimize models locally. This could improve the performance and accuracy of the models considerably; otherwise, progress will be much slower. How can patient privacy be a concern if the patient names are hidden? Thanks.

Created by Zafer Aydin (zaferaydin)
Hi @zaferaydin,

We understand that having real data would be useful. However, there are still privacy concerns even when only the name, or other identifying information, is hidden:

1. Malin, B., Sweeney, L., & Newton, E. (2003). Trail re-identification: learning who you are from where you have been. Proc. LIDAP-WP12. https://dataprivacylab.org/dataprivacy/projects/trails/trails1.pdf
2. Benitez, K., & Malin, B. (2010). Evaluating re-identification risks with respect to the HIPAA privacy rule. Journal of the American Medical Informatics Association: JAMIA, 17(2), 169–177. https://doi.org/10.1136/jamia.2009.000026

Even de-identified EHR data is susceptible to a join attack, in which unique patterns in an individual record are matched against another public record (e.g., voter records) or data set (e.g., news articles) that contains identifying information.

Hope this helps clarify our current policies!

Thanks,
@trberg
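To make the join attack concrete, here is a minimal Python sketch. All records below are hypothetical toy data, and the choice of ZIP code, birth date, and sex as the linking fields ("quasi-identifiers") is an assumption for illustration. No names appear in the de-identified table, yet any record whose combination of those fields is unique in the public table can be tied back to a name.

```python
# Hypothetical toy records for illustration only; not real patient or voter data.
deidentified_ehr = [
    {"zip": "02139", "birth_date": "1965-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02140", "birth_date": "1972-03-02", "sex": "F", "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "Alice Example", "zip": "02139", "birth_date": "1965-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02141", "birth_date": "1990-11-20", "sex": "F"},
]

# Assumed quasi-identifiers: fields present in both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def qi_key(record):
    """Build the tuple of quasi-identifier values used as the join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def join_attack(deidentified, public):
    """Link de-identified records to names via shared quasi-identifiers."""
    # Index public records by their quasi-identifier combination.
    index = {}
    for person in public:
        index.setdefault(qi_key(person), []).append(person["name"])

    matches = []
    for record in deidentified:
        names = index.get(qi_key(record), [])
        # A single match means the "anonymous" record is re-identified.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in join_attack(deidentified_ehr, public_voter_roll):
        print(f"{name} -> {diagnosis}")
```

In this toy example, two of the three de-identified records link to a named person even though the EHR table never contained a name. The same idea applies to any sufficiently distinctive pattern in a record, not just these three fields.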
