Dear operator,
Hi, I have one question about the RNA expression data in the synthetic data.
I wonder whether the synthetic data went through the quartile normalization step (the Perl script) suggested in the data processing section.
Thank you.
KyounJun Lee
Created by kyoungjun lee kyoungjunlee

Dear @kyoungjunlee,
Thanks for participating in this Challenge. The synthetic RNA-seq data was created from the fully processed real data, *at the gene level*. Specifically:
> 1. A random sample with replacement was taken for a given gene across all patient samples.
> 2. The resulting values were checked to ensure there were enough non-zero unique numbers.
> 3. The values were then fit with a lognormal distribution.
> 4. This distribution was then sampled the appropriate number of times.
This was done independently for each gene. **It should be noted that these values are not for building models or improving accuracy but simply for facilitating data ingest.**
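
For readers who want to picture the four steps, here is a minimal Python sketch of the procedure as described above. It is not the script that was actually used for the Challenge. The function names, the `min_unique_nonzero` threshold, the choice to fit only the non-zero values, and the estimate of the lognormal parameters from the mean and standard deviation of the logged values are all assumptions made for illustration.

```python
import numpy as np


def synthesize_gene(values, n_out, rng, min_unique_nonzero=3):
    """Illustrative sketch of the four steps for ONE gene (not the real script).

    values: processed expression values for this gene across all patients.
    n_out: number of synthetic values to draw.
    min_unique_nonzero: hypothetical threshold; the real cutoff is not stated.
    """
    values = np.asarray(values, dtype=float)

    # Step 1: random sample with replacement across all patient samples.
    boot = rng.choice(values, size=values.size, replace=True)

    # Step 2: check that there are enough unique non-zero values.
    nonzero = boot[boot > 0]
    if np.unique(nonzero).size < min_unique_nonzero:
        return None  # assumption: such genes are skipped

    # Step 3: fit a lognormal distribution. Assumption: only non-zero values
    # are used, with parameters taken from the mean and standard deviation
    # of the logged values.
    logs = np.log(nonzero)
    mu, sigma = logs.mean(), logs.std()

    # Step 4: sample the fitted distribution the required number of times.
    return rng.lognormal(mean=mu, sigma=sigma, size=n_out)


def synthesize_matrix(expr, n_out, seed=0):
    """Apply the per-gene procedure independently to each gene.

    expr: dict mapping gene name -> sequence of values across patients.
    """
    rng = np.random.default_rng(seed)
    return {gene: synthesize_gene(vals, n_out, rng) for gene, vals in expr.items()}
```

Because each gene is handled on its own, a sketch like this does not preserve gene-to-gene correlations or any link to outcomes, which is consistent with the note above that the synthetic values are meant for testing data ingest rather than for building models.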