Hi!
I was wondering which method you are going to use to convert microarray probes to ENSG ids.
For example, if a probe matches multiple ENSG ids, how are you going to resolve this?
Also, it would be important for my model to know all the microarray platforms that are going to be used.
In the input file you mentioned: Affymetrix HG-U133 Plus 2.0, Affymetrix Human Gene 1.0 ST, Illumina HumanHT-12 V4.0, Illumina HiSeq 2000, Affymetrix Human Gene 1.1 ST, Affymetrix Human Gene PrimeView.
If there are going to be any others, it would be good to know.
Thanks!
Created by Martin Guerrero martinguerrero89
Thank you Brian! Yes, it does perfectly!
Hi @martinguerrero89
Yes, the list of platforms given under "platform" in the Input file specification below is exhaustive:
https://www.synapse.org/#!Synapse:syn15589870/wiki/592699
On that same page, the ensg.compression.function field tells you how we match probes to ENSG ids. There are 3 potential approaches listed:
'colMeans' takes the mean expression value of all probes mapping to a gene as that gene's expression;
'choose.max.mad.row' uses the probe with the maximum MAD (median absolute deviation) among all probes mapping to a gene as that gene's expression;
'identity' indicates that the expression value is the same as in the native format.
Only colMeans and choose.max.mad.row are used for microarray data, and colMeans is used most often. identity may be used for RNA-seq data. This field tells you how we handle multiple probes mapping to a single ENSG id. As for a single probe mapping to multiple ENSG ids, we would just replicate that probe's value for each ENSG id.
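To make the rules above concrete, here is a minimal Python sketch of the two microarray approaches and of the replication rule. It is an illustration only, not the challenge's actual code: the probe names, ENSG labels, expression values and function names are all made up, and it assumes the MAD of a probe is computed across samples.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy data: probe -> expression values across three samples.
probe_expr = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [2.0, 2.0, 2.0],
    "p3": [5.0, 7.0, 9.0],
}

# Hypothetical annotation: p1 and p2 map to the same gene,
# while p3 maps to two different genes.
probe_to_ensg = {
    "p1": ["ENSG_A"],
    "p2": ["ENSG_A"],
    "p3": ["ENSG_B", "ENSG_C"],
}


def mad(values):
    """Median absolute deviation of one probe (assumed: across samples)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def group_probes_by_gene(mapping):
    """Invert probe -> genes into gene -> probes.

    A probe that maps to several genes is listed under each of them,
    which is what replicates its value for every ENSG id.
    """
    by_gene = defaultdict(list)
    for probe, genes in mapping.items():
        for gene in genes:
            by_gene[gene].append(probe)
    return dict(by_gene)


def compress_mean(expr, mapping):
    """Like 'colMeans': per-sample mean over all probes of a gene."""
    out = {}
    for gene, probes in group_probes_by_gene(mapping).items():
        rows = [expr[p] for p in probes]
        out[gene] = [mean(col) for col in zip(*rows)]
    return out


def compress_max_mad(expr, mapping):
    """Like 'choose.max.mad.row': keep the probe with the largest MAD."""
    out = {}
    for gene, probes in group_probes_by_gene(mapping).items():
        best = max(probes, key=lambda p: mad(expr[p]))
        out[gene] = list(expr[best])
    return out


by_mean = compress_mean(probe_expr, probe_to_ensg)
by_mad = compress_max_mad(probe_expr, probe_to_ensg)
```

With this toy input, the mean approach averages p1 and p2 for ENSG_A, the max-MAD approach keeps p1 (p2 is constant, so its MAD is 0), and p3's values are copied to both ENSG_B and ENSG_C under either approach.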
Please let me know if that doesn't answer your question.
Brian