STATISTICAL MODEL SELECTION METHOD TO ANALYZE COMBINATORIAL EFFECTS OF SNPS AND ENVIRONMENTAL FACTORS FOR BINARY DISEASE
Abstract
We propose a model selection method for estimating the relation between multiple SNPs, environmental factors, and a binary disease trait. The method combines logistic regression with a genetic algorithm. The logistic regression model can capture the continuous effects of environmental factors without categorizing them, a step that loses information. To construct an accurate prediction rule for the binary trait, we adopt Akaike's information criterion (AIC) to find the most effective set of SNPs and environmental factors: the set that gives the smallest AIC is chosen as the optimal set. Since the number of possible combinations of SNPs and environmental factors is usually huge, we propose using the genetic algorithm to search for the set that is optimal in the sense of AIC. We show the effectiveness of the proposed method through the analysis of case/control populations of diabetes, Alzheimer's disease, and obesity patients. We found an efficient set of predictors for types of diabetes, as well as some SNPs that interact strongly with age although they are not significant as single loci.
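The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated data, the function names (`logistic_aic`, `run_ga`), the Newton fitting routine with a tiny ridge term, and the genetic-algorithm settings (tournament selection, uniform crossover, bit-flip mutation, elitism) are all assumptions made for illustration. Each candidate model is a 0/1 mask over the candidate variables, and its fitness is the AIC of the corresponding logistic regression.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def prob(beta, row):
    """Logistic probability, with the linear predictor clamped to avoid overflow."""
    z = sum(b * v for b, v in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def logistic_aic(X, y, iters=12, ridge=1e-6):
    """Fit by Newton's method and return AIC = -2 log-likelihood + 2 k.
    The tiny ridge only keeps the Hessian invertible (an assumption of this sketch)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        for row, yi in zip(X, y):
            mu = prob(beta, row)
            w = mu * (1.0 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = sum(math.log(prob(beta, r)) if yi else math.log(1.0 - prob(beta, r))
                 for r, yi in zip(X, y))
    return -2.0 * loglik + 2.0 * k

def run_ga(candidates, y, pop_size=12, generations=10, p_mut=0.1, seed=0):
    """Search 0/1 masks over candidate columns for the smallest AIC."""
    rng = random.Random(seed)
    m = len(candidates[0])
    cache = {}

    def aic(mask):
        key = tuple(mask)
        if key not in cache:
            # Design matrix: intercept plus the selected candidate columns.
            X = [[1.0] + [v for v, keep in zip(row, mask) if keep] for row in candidates]
            cache[key] = logistic_aic(X, y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=aic)
        history.append(aic(pop[0]))
        nxt = [pop[0][:]]                                      # elitism: keep the best mask
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=aic)               # tournament selection
            b = min(rng.sample(pop, 2), key=aic)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=aic)
    history.append(aic(best))
    return best, aic(best), history

# Simulated data (hypothetical): 5 SNPs coded 0/1/2 and standardized age.
# Only SNP 0 and age influence the simulated disease status.
sim = random.Random(1)
data, labels = [], []
for _ in range(150):
    snps = [sim.choice([0, 1, 2]) for _ in range(5)]
    age = sim.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * snps[0] + 1.0 * age)))
    data.append([float(s) for s in snps] + [age])
    labels.append(1 if sim.random() < p else 0)

best_mask, best_aic, history = run_ga(data, labels)
print("selected mask (SNP0..SNP4, age):", best_mask, "AIC:", round(best_aic, 2))
```

In this sketch age enters the model as a continuous covariate, without categorization. Interaction terms such as SNP × age could be appended to `data` as extra candidate columns, at the cost of a larger search space. With only six candidates an exhaustive search would also be feasible; the genetic algorithm becomes useful when the number of candidate subsets is too large to enumerate.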