Hi,
I noticed you uploaded an overlapped entrez_id file earlier.
However, it does not match my overlapped entrez_ids. Could you provide a file with the ENSG-to-entrez_id mapping?
In the first round, I actually found a total of 15319 overlapped entrez_ids when I followed the ENSG-to-Entrez mapping from the docker sample preprocess script.
Thank you.
Created by Di He DiHe
After I switched to the org.Hs.eg.db package, I found that to match the provided overlapped entrez_ids I still have to keep Ensembl IDs that map to multiple entrez_ids (ambiguous translation). Is that how you generated the overlapped entrez_id file?
But I think ambiguous Ensembl-to-Entrez translations should be deleted. If the original gene-level file is Ensembl based, translating it to an Entrez-based gene-level file would mean splitting one Ensembl ID column into multiple entrez_id columns.
I don't think that makes sense in this case.
Thank you.
Dear Di,
Please note that we provide an Entrez ID based TPM file [here](https://www.synapse.org/#!Synapse:syn10573789). I would start there to make life easier. Our mappings are generated via the org.Hs.eg.db package that we mention on the [resources](https://www.synapse.org/#!Synapse:syn9748391) site. I generally prefer that package over biomart for ease of use and stability, though plenty of my colleagues use biomart.
For scenarios where one Ensembl ID maps to multiple entrez_ids, I will delete the Ensembl ID (using BioMart).
For scenarios where one entrez_id maps to multiple Ensembl IDs, I will keep the entrez_id.
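The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IDs are made up, `pairs` stands in for the rows of whatever mapping table org.Hs.eg.db or BioMart returns, and `unambiguous_mapping` is a hypothetical helper, not part of either package.

```python
from collections import defaultdict

# Made-up (Ensembl ID, Entrez ID) pairs standing in for a real mapping table.
pairs = [
    ("ENSG_A", "101"),
    ("ENSG_B", "102"),
    ("ENSG_B", "103"),  # ENSG_B maps to two Entrez IDs: ambiguous, dropped
    ("ENSG_C", "104"),
    ("ENSG_D", "104"),  # two Ensembl IDs map to one Entrez ID: kept
]


def unambiguous_mapping(pairs):
    """Return {ensembl_id: entrez_id}, dropping Ensembl IDs with more than one Entrez ID."""
    targets = defaultdict(set)
    for ensembl_id, entrez_id in pairs:
        targets[ensembl_id].add(entrez_id)
    return {
        ensembl_id: next(iter(entrez_ids))
        for ensembl_id, entrez_ids in targets.items()
        if len(entrez_ids) == 1
    }


mapping = unambiguous_mapping(pairs)
entrez_kept = set(mapping.values())
```

Here `ENSG_B` is removed, while Entrez ID `104` survives even though two Ensembl IDs point to it. How the expression values of those two Ensembl IDs should be combined is a separate decision that this sketch does not make.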
I did check, and there are indeed some entrez_ids in the overlapped entrez_id file that do not appear in the MMRF dataset (after I map the Ensembl IDs to entrez_ids).
I could run my model successfully with those 15319 entrez_ids in the first round.
What do you suggest we do in the second round and the final round? I believe the leaderboard validation set will be the same as in the first round?
I think this issue could affect later models.
Is there a recommended procedure we could follow to make sure no mistakes are introduced? FYI, when one entrez_id maps to multiple Ensembl IDs, I keep the entrez_id. Maybe the question is how to handle the case where one Ensembl ID maps to multiple entrez_ids: delete it or keep it? The total is 13666 overlapped entrez_ids.
Dear Di,
Can you give me the numbers for the overlap between our list and yours?
I want to see if ours are all contained in yours or vice versa.
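One way to produce those numbers is a plain set comparison. The sketch below assumes each list has already been loaded as a collection of Entrez ID strings; `compare_id_lists` and the toy IDs are hypothetical and only illustrate the check.

```python
def compare_id_lists(ours, yours):
    """Summarise the overlap between two collections of Entrez IDs."""
    ours, yours = set(ours), set(yours)
    return {
        "shared": len(ours & yours),        # IDs present in both lists
        "only_ours": len(ours - yours),     # IDs missing from the other list
        "only_yours": len(yours - ours),
        "ours_in_yours": ours <= yours,     # True if ours is fully contained
        "yours_in_ours": yours <= ours,
    }


# Toy example with made-up IDs.
summary = compare_id_lists(["1", "2", "3"], ["2", "3", "4", "5"])
```

If either containment flag is true, one list is a subset of the other; if both are false, each side has IDs the other lacks, which would point to a difference in the mapping step.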