Hi, I checked several different chromosomes (CMC_MSSM-Penn-Pitt_DLPFC_DNA_imputed_chr10.0-5Mb.gen, for instance), and when I computed the minor allele frequencies (I added up all the genotypes and divided by the number of subjects), they always came out to 0.33 for every SNP. Is this an imputation problem? I was doing this to filter by MAF, but now I am wondering whether I am missing something. Thanks!
Created by Christin Glorioso (glorioso)
Ok, gotcha, thanks! Christin
These files come directly from Impute2. Description of the .gen file is here: http://www.stats.ox.ac.uk/~marchini/software/gwas/file_format.html
Description of the gen_info file can be found here: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#output_options
I hope that helps.
Ah yes, this makes more sense. Forgive me if I am missing something obvious, but I couldn't find an explanation of the column headers for the gen_info or the .gen files. I am guessing that allele 0 is to the left of allele 1, so if column 4 of the .gen file is "A" and column 5 is "T", then allele 0 is "A" and allele 1 is "T". Further, I am guessing that this corresponds to the left-to-right order of the next three columns for the first subject, i.e. column 6 is P[AA] for subject 1, column 7 is P[AT] for subject 1, and column 8 is P[TT] for subject 1. I am also not sure what the info_type0 column of the info file refers to, for example. Thanks!
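For anyone following along, here is a rough Python sketch of the column layout guessed at above. It assumes five leading columns (SNP ID, rsID, position, allele 0, allele 1) followed by three probabilities per subject, which is how I read the linked .gen format description. The function name `parse_gen_line` and the example line are made up for illustration and are not taken from the actual files.

```python
def parse_gen_line(line):
    """Split one whitespace-delimited .gen line into SNP fields and per-subject triples.

    Assumed layout: SNP ID, rsID, position, allele 0, allele 1,
    then P[00], P[01], P[11] repeated once per subject.
    """
    fields = line.split()
    snp_id, rsid, position, allele0, allele1 = fields[:5]
    values = [float(x) for x in fields[5:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three probabilities per subject")
    # Group the flat list into (P[00], P[01], P[11]) tuples, one per subject.
    probs = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return {
        "snp_id": snp_id,
        "rsid": rsid,
        "position": int(position),
        "allele0": allele0,
        "allele1": allele1,
        "probs": probs,
    }


# Hypothetical line with two subjects; not real data.
example = "--- rs0000 1000 A T 0.9 0.1 0 0 0.2 0.8"
record = parse_gen_line(example)
print(record["allele0"], record["allele1"], record["probs"])
```

With this layout, `record["probs"][0]` holds P[AA], P[AT], P[TT] for subject 1 when allele 0 is "A" and allele 1 is "T".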
Christin @glorioso-
I suspect you're computing your allele frequency incorrectly. The .gen files contain the imputed genotype probabilities (P[00], P[01], P[11] for the 0 and 1 alleles), so you'll need to weight these probabilities by the number of copies of the allele each genotype carries. For the frequency of the 1 allele, that is sum(P[01] + 2*P[11]) / (2N), where the sum runs over all N subjects. I hope that helps.
Solly
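In case it is useful, here is a minimal Python sketch of the weighting described above. It assumes each subject is represented as a (P[00], P[01], P[11]) triple that sums to 1. The function names and the four example subjects are made up for illustration. It also shows one likely reason for the 0.33: if each triple sums to 1, the plain average of all the probability values is 1/3 regardless of the data.

```python
def allele1_frequency(probs):
    """Expected frequency of allele 1: sum(P[01] + 2*P[11]) / (2N)."""
    n_subjects = len(probs)
    expected_copies = sum(p01 + 2.0 * p11 for _p00, p01, p11 in probs)
    return expected_copies / (2.0 * n_subjects)


def minor_allele_frequency(probs):
    """MAF is the smaller of the allele 1 frequency and its complement."""
    freq1 = allele1_frequency(probs)
    return min(freq1, 1.0 - freq1)


def naive_mean(probs):
    """Unweighted average of every probability value (not an allele frequency)."""
    values = [p for triple in probs for p in triple]
    return sum(values) / len(values)


# Hypothetical probabilities for four subjects at one SNP; not real data.
probs = [
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.2, 0.8),
    (0.9, 0.1, 0.0),
]

print("allele 1 frequency:", allele1_frequency(probs))
print("MAF:", minor_allele_frequency(probs))
print("naive mean:", naive_mean(probs))
```

For these made-up subjects the weighted frequency works out to 1.9 / 8, while the naive mean is 4 / 12, which is the 0.33 pattern seen in the question.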
Imputation problem: imputed genotype data allele frequencies all 0.33?