We would be allowed to select 20/40/60 genes out of ~8k genes, but the questions are limited to the 84 genes. Is there a reason for this? If we have the locations (1-3039, or the coordinates of the cells) as well as the expression of ~8k genes for every single cell, the question should be to select 20/40/60 genes out of the ~8k and build models to predict locations. Correct? Please clarify if I am misunderstanding something here. Thanks! -Samad

Created by S J SAJA
Thanks Pablo!
In short, yes, that is OK, since our point is that you use fewer in situ genes. In long: the 84 in situ genes are the "gold standard", i.e., the reference for the cell locations obtained from the localization information in the BDTNP. Indeed, these 84 genes are the most important transcription factors that determine development and hence cell differentiation. As you can see in the graph in the challenge description, using fewer of the 84 in situ genes leaves the cells' positions undetermined; we interpret this as localization information being encoded in these genes. We want to learn how you can make up for this "loss" of information when using fewer in situ genes (with positional information) by complementing them with RNAseq information. Maybe you actually don't need any in situ information from these genes apart from the cell location. Good luck!
Hi Pablo, thanks for the quick reply. My other question: is the reason for selecting 20/40/60 genes from the 84 in situ genes that you are sure these 84 genes are the best possible set for predicting locations? And if somebody selects 20/40/60 genes from the ~8k genes, comes up with a different set of genes, and gets good prediction accuracy, is that OK with you?
The 20/40/60 genes are selected from the list of 84 in situ genes; you can use all ~8k genes in the RNAseq data, even if they overlap with the 84.
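To make the constraint concrete, here is a minimal sketch of one possible baseline (not the organizers' method or the challenge's scoring): the selected genes must come from the in situ list, and each cell is matched against the in situ reference bins (e.g. 3039 bins) using only those selected genes, ranking bins by Pearson correlation. The function name `predict_locations`, its arguments, and the correlation-matching approach are all hypothetical choices for illustration. This sketch does not use the remaining RNAseq genes; a more elaborate model could add them as extra features.

```python
import numpy as np


def predict_locations(rnaseq, insitu, rnaseq_genes, insitu_genes, selected, k=10):
    """Hypothetical baseline: rank reference bins per cell by Pearson correlation.

    rnaseq:       (n_cells, n_rnaseq_genes) single-cell expression matrix
    insitu:       (n_bins, n_insitu_genes) in situ reference (e.g. 3039 x 84)
    rnaseq_genes: list of gene names for the rnaseq columns
    insitu_genes: list of gene names for the insitu columns
    selected:     the 20/40/60 gene names chosen from insitu_genes
    k:            number of candidate bins to return per cell
    """
    # Challenge constraint: the selected genes must come from the in situ list.
    if not set(selected) <= set(insitu_genes):
        raise ValueError("selected genes must be a subset of the in situ genes")

    # Pull out only the selected genes on both sides, in the same order.
    r_idx = [rnaseq_genes.index(g) for g in selected]
    s_idx = [insitu_genes.index(g) for g in selected]
    X = np.asarray(rnaseq, dtype=float)[:, r_idx]
    Y = np.asarray(insitu, dtype=float)[:, s_idx]

    # Z-score each row so that the scaled dot product is a Pearson correlation.
    # The small epsilon keeps constant rows at zero instead of dividing by zero.
    Xz = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-9)
    Yz = (Y - Y.mean(axis=1, keepdims=True)) / (Y.std(axis=1, keepdims=True) + 1e-9)
    scores = Xz @ Yz.T / len(selected)  # shape (n_cells, n_bins)

    # Indices of the k best-matching bins for each cell, best first.
    return np.argsort(-scores, axis=1)[:, :k]
```

Rerunning the same function with selections of 20, 40, and 60 genes is one simple way to see how much positional information is lost as the in situ subset shrinks.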

Why are the sub-challenges limited to a subset of the 84 genes?