Hi there, I am a little confused about the dataset. For example, suppose we have a count matrix named `DS1C` processed from the scRNA-seq experiment, and we get `DS1C_p10k` after downsampling by reads (to 10k in total). Is any further step taken to generate the input files for testing, such as randomly masking 50% of the cells to simulate dropout (`DS1C_p10k_p50mask`)? Will the `input-ground truth` pair be:

- `DS1C_p10k_mask` - `DS1C_p10k`,
- `DS1C_p10k_mask` - `DS1C`, or
- `DS1C_p10k` - `DS1C`?

Thanks so much!

Created by zoradeng
Hi @zoradeng,

For Task 1, the testing data in the `/input` folder were downsampled either by reads or by cells (at different proportions) from the ground truth data. Before downsampling, cells and genes with 0 counts, as well as mitochondrial genes, were filtered out of the ground truth data. Therefore, the file structure of all downsampled/input files will look like the listing below. More details on how the data were prepared can be found on the [Data > Task 1: scRNA-seq wiki page](https://www.synapse.org/#!Synapse:syn26720920/wiki/620137).

```
/input/
├── ds1c_p00625_n1.csv
├── ds1c_p0125_n2.csv
├── ds1c_p025_n3.csv
├── ds1c_p07_n1.csv
├── ...
└── ds1c_p20k_n3.csv
```

Thank you for your question! I hope it helps.
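To make the two downsampling modes mentioned above more concrete, here is a minimal Python sketch. It is not the challenge's actual preparation code (that is documented on the wiki page); the function names `downsample_by_reads` and `downsample_by_cells`, the genes x cells list-of-lists layout, and the choice of sampling without replacement are all assumptions made for illustration. It needs Python 3.9+ for the `counts=` argument of `random.sample`.

```python
import random


def downsample_by_reads(counts, target_total, seed=0):
    """Hypothetical helper: keep `target_total` reads, drawn uniformly
    without replacement from all reads in a genes x cells count matrix."""
    rng = random.Random(seed)
    n_cells = len(counts[0])
    flat = [c for row in counts for c in row]
    if target_total >= sum(flat):
        # Nothing to remove: return an unchanged copy.
        return [row[:] for row in counts]
    # Each index stands for one matrix entry; `counts=` repeats it once per read.
    picked = rng.sample(range(len(flat)), k=target_total, counts=flat)
    out = [0] * len(flat)
    for i in picked:
        out[i] += 1
    # Restore the original genes x cells shape.
    return [out[r * n_cells:(r + 1) * n_cells] for r in range(len(counts))]


def downsample_by_cells(counts, proportion, seed=0):
    """Hypothetical helper: keep a random subset of cells (columns),
    leaving the counts of the retained cells untouched."""
    rng = random.Random(seed)
    n_cells = len(counts[0])
    n_keep = max(1, round(n_cells * proportion))
    keep = sorted(rng.sample(range(n_cells), k=n_keep))
    return [[row[j] for j in keep] for row in counts]


# Toy ground truth: 2 genes x 3 cells, 20 reads in total (assumed data).
ground_truth = [[5, 0, 3], [2, 4, 6]]
by_reads = downsample_by_reads(ground_truth, target_total=10)
by_cells = downsample_by_cells(ground_truth, proportion=2 / 3)
```

In this sketch, the downsampled matrix plays the role of the input and the untouched `ground_truth` matrix plays the role of the reference it was derived from.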

Principles for test dataset construction