Previous attempts to identify genetic risk factors for common diseases have mostly failed because the statistical methods tested individual SNPs (single nucleotide polymorphisms, i.e., "letters" in the genome) one at a time. When a disease reduces reproductive fitness, however, any single SNP conferring risk would be selected against within a few generations. Only combinations of SNPs ("words") can evade selective pressure.
Using a novel computational biostatistics approach for wide-locus GWAS based on u-statistics for multivariate, genetically structured data (US Pat 7,664,616), we were able to identify those common, complex risk factors while reducing the required sample size from 100,000 to 500-1,500 subjects, so that we could focus on mutism as a clearly defined phenotype.
Key words: multivariate data, multiple outcomes, cost-effectiveness, risk, decision support, comparison shopping, personalized medicine, ordinal data, censored data
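
To illustrate the general idea of u-statistics for multivariate ordinal data mentioned above, the following minimal sketch scores each subject by a partial ordering: a subject's score is the number of subjects it dominates minus the number that dominate it. This is a generic textbook-style construction, not the patented method; the function name u_scores and the toy genotypes (minor-allele counts at three neighboring SNPs) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def u_scores(rows: Sequence[Sequence[float]]) -> List[int]:
    """Score each subject as (# subjects it dominates) - (# subjects dominating it)."""

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if a >= b in every component and a > b in at least one.
        # Pairs that disagree across components are left unordered (contribute 0).
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    scores = []
    for a in rows:
        below = sum(dominates(a, b) for b in rows)  # subjects that a dominates
        above = sum(dominates(b, a) for b in rows)  # subjects that dominate a
        scores.append(below - above)
    return scores


# Hypothetical data: minor-allele counts (0, 1, 2) at three neighboring SNPs
genotypes = [
    (0, 0, 0),
    (1, 0, 1),
    (2, 1, 1),
    (0, 2, 0),
]
print(u_scores(genotypes))  # prints [-3, 0, 2, 1]
```

The scores use only the ordering of each component, so no weights or linear combination of SNPs have to be assumed, and they always sum to zero. Such scores could then be compared between cases and controls with a rank-based test.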