I am still a bit confused by the rule about not using the in situ data for the 84 genes. Here is what I am thinking. There are 2 tasks:
1. I use the expression data from the dge normalized data file to select 20/40/60 genes out of the 84 genes.
2. I use the expression data of the selected 20/40/60 genes to find the matching positions against the bdtnp file, using some classification algorithm.
Is this correct? Or should the 20/40/60 genes selected in the first step be genes other than the 84? That is, do I need to remove the listed 84 genes and then do feature extraction to select 20/40/60 new genes? Thank you.

Created by Bharata Kalbuaji barbarian
You can indeed use geometry.csv. I meant that you can only use geometry.txt and nothing else from the challenge. Not very useful, though...
Hi Pablo, In your comment "No, you have to select your 20/40/60 genes wo using any other information from geometry.csv", shouldn't 'geometry.csv' be 'bdtnp.txt'? Can't we also use 'geometry.txt', together with 'dge_normalized.txt', for selecting the 20/40/60 genes? Thanks
Yes, 2) seems OK. Remember not to overfit, as gene patterns are also important. P
Thanks for your quick reply. So once the genes are selected, it's valid to do 2)? (Correction for the previous post: bdtnp.txt and geometry.txt files, not bdtnp.csv and geometry.csv)
No, you have to select your 20/40/60 genes wo using any other information from geometry.csv P
Hi Pablo, Is it valid to 1. use 'bdtnp.csv' and 'geometry.csv' to select 20/40/60 genes? 2. learn a model using 'bdtnp.csv' and 'geometry.csv' to predict locations for 'dge' files, and only 20/40/60 genes selected above are included in the model? Thanks.
The way you are thinking about it seems right to me. Thanks.
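For anyone reading along, here is a rough sketch of the two-step workflow described in the question. It is only an illustration under several assumptions: the data are synthetic stand-ins (not the real dge or bdtnp files), the matrices are laid out as rows = cells or positions and columns = genes, variance is used as a placeholder selection criterion, and Pearson correlation stands in for "some classification algorithm". The function names `select_genes` and `match_positions` are made up for this example.

```python
import numpy as np

def select_genes(dge, gene_names, k):
    """Step 1: pick k genes using only the dge expression matrix.
    Highest variance across cells is a placeholder criterion (assumption)."""
    variances = dge.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return [gene_names[i] for i in sorted(top)]

def zscore_rows(m):
    """Standardize each row so a scaled dot product equals Pearson correlation."""
    m = m - m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant rows
    return m / sd

def match_positions(dge, atlas, dge_genes, atlas_genes, selected):
    """Step 2: for each cell, return the index of the reference position whose
    profile over the selected genes correlates best with the cell."""
    d_idx = [dge_genes.index(g) for g in selected]
    a_idx = [atlas_genes.index(g) for g in selected]
    cells = zscore_rows(dge[:, d_idx])
    bins = zscore_rows(atlas[:, a_idx])
    corr = cells @ bins.T / len(selected)  # cells x positions correlation matrix
    return corr.argmax(axis=1)

# Synthetic stand-ins: 50 reference positions, 200 cells, 84 genes.
rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(84)]
atlas = rng.random((50, 84))                      # plays the role of the bdtnp reference
true_bins = rng.integers(0, 50, size=200)         # hidden position of each cell
dge = atlas[true_bins] + rng.normal(0, 0.01, size=(200, 84))  # noisy cell profiles

selected = select_genes(dge, genes, 20)
pred = match_positions(dge, atlas, genes, genes, selected)
accuracy = float((pred == true_bins).mean())
```

Note that step 1 only looks at the dge matrix, and the reference is only used in step 2. The 20 can be swapped for 40 or 60, and any other selection criterion or matching method can replace the placeholders.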
