Citizen Scientists will apparently only have access to synthetic data for this challenge. This raises the question of whether the statistics of the synthetic data will be sufficiently similar to real data to be useful as training data for ML models. Can someone clarify this?

Created by Bruce Cragin Bcragin
Hi @Bcragin, I apologize, I was mistaken. Citizen Scientists are only able to access synthetic data. The challenge will use the de-identified dataset, and at the moment the synthetic data is not going to be very useful for it. Unless you're able to get access through an institution, I'm not sure there's a good way for citizen scientists to participate in a meaningful way. We'd love to have you get involved, but the N3C data is very heavily protected, which makes running a traditional open challenge very difficult. Thank you, Tim
Hi @trberg, I will go ahead and apply, but per Chris Dillon at NCATS, "Additionally, You will be restricted to Synthetic data If you come in under a citizen scientist DUA; not sure what type of data the BARDA challenge will be using." Given this restriction, along with the rule that the use of "external data" is not allowed, it's hard to see how citizen scientists can participate in any meaningful way.
Hi @Bcragin, You should be able to [apply for access](https://ncats.nih.gov/n3c/about/applying-for-access) to the N3C enclave as a citizen scientist, and I don't believe you'll be limited to synthetic data.
