DOI: 10.1145/1389095.1389154

Mask functions for the symbolic modeling of epistasis using genetic programming

Published: 12 July 2008 Publication History

Abstract

The study of common, complex multifactorial diseases in genetic epidemiology is complicated by nonlinearity in the genotype-to-phenotype mapping that is due, in part, to epistasis, or gene-gene interaction. Symbolic discriminant analysis (SDA) is a flexible modeling approach that uses genetic programming (GP) to evolve an optimal predictive model from a predefined collection of mathematical functions, constants, and attributes. This has been shown to be an effective strategy for modeling epistasis. In the present study, we introduce the genetic "mask" as a novel building block that exploits expert knowledge in the form of a pre-constructed relationship between two attributes. The goal of this study was to determine whether the availability of "mask" building blocks improves SDA performance. The results of this study support the idea that pre-processing data improves GP performance.
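To make the idea of a pre-constructed relationship between two attributes concrete, the following is a minimal sketch, not the authors' implementation. It assumes genotypes at each locus are coded 0, 1, 2 and that a mask can be represented as a 3x3 lookup table over two loci. The names `XOR_MASK`, `apply_mask`, and `mask_attribute`, and the XOR-like table itself, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Assumption: each genotype is coded 0, 1, or 2, so a two-locus
# mask is a 3x3 table indexed by (genotype at locus A, genotype at locus B).
Mask = Sequence[Sequence[int]]

# Hypothetical mask with an XOR-like two-locus pattern (illustrative only).
XOR_MASK: Mask = (
    (0, 1, 0),
    (1, 0, 1),
    (0, 1, 0),
)


def apply_mask(mask: Mask, g1: int, g2: int) -> int:
    """Look up the mask value for one pair of genotypes."""
    return mask[g1][g2]


def mask_attribute(mask: Mask, snp_a: Sequence[int], snp_b: Sequence[int]) -> List[int]:
    """Build one constructed attribute from two genotype columns."""
    return [apply_mask(mask, a, b) for a, b in zip(snp_a, snp_b)]


# Toy genotype columns for four individuals (made-up values).
snp_a = [0, 1, 2, 1]
snp_b = [1, 1, 0, 2]
constructed = mask_attribute(XOR_MASK, snp_a, snp_b)
print(constructed)
```

In a GP function set, a node like `apply_mask` could sit alongside arithmetic operators, so an evolved expression tree can use the two-locus pattern directly rather than having to rediscover it from the raw attributes.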


Cited By

  • (2023) Exploring genetic influences on adverse outcome pathways using heuristic simulation and graph data science. Computational Toxicology 25 (100261). DOI: 10.1016/j.comtox.2023.100261. Online publication date: Feb-2023
  • (2017) Mathematical Modeling of Intestinal Iron Absorption Using Genetic Programming. PLOS ONE 12:1 (e0169601). DOI: 10.1371/journal.pone.0169601. Online publication date: 10-Jan-2017


        Published In

        GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation
        July 2008
        1814 pages
        ISBN:9781605581309
        DOI:10.1145/1389095
        • Conference Chair: Conor Ryan
        • Editor: Maarten Keijzer

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. function set
        2. genetic analysis
        3. genetic epidemiology
        4. genetic mask
        5. genetic programming
        6. symbolic discriminant analysis
        7. symbolic regression
        8. two-locus model

        Qualifiers

        • Research-article

        Conference

        GECCO08
        Acceptance Rates

        Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
