ABSTRACT
Error correcting codes over the DNA alphabet are used as embeddable biomarkers. Error correction provides resilience of identification in spite of sequencing errors. Ring optimization is a type of spatially structured evolutionary algorithm derived from models of ring species in nature. This paper compares the performance of a ring optimizer with a standard evolutionary algorithm at searching for large edit metric codes of a given length and able to correct a specified number of errors. The algorithm also incorporates a novel variation operator called Conway crossover that uses Conway's lexicode algorithm as the basis for a binary variation operator. The ring optimizer is found to yield substantially inferior performance. A new type of statistic, last time of innovation is defined and used to compare the two algorithms. Several improvements to the table of best known edit metric codes are presented. Both algorithms yielded improvements to the table, but the improvements due to the ring optimizer were never better than those located by the standard algorithm and worse half of the time.
- D. Ashlock. Optimization and Modeling with Evolutionary Computation. Springer-Verlag, New York, 2006.Google Scholar
- D. Ashlock, K. Bryden, and D. McCorkle. Logic function induction with the blender algorithm using function stacks. In Intelligent Engineering Systems Through Artificial Neural Networks, volume 19, pages 189--196, Piscataway, NJ, 2009. IEEE Press.Google ScholarCross Ref
- D. Ashlock, L. Guo, and F. Qiu. Greedy closure genetic algorithms. In Proceedings of the 2002 Congress on Evolutionary Computation, pages 1296--1301, Piscataway, NJ, 2002. IEEE Press. Google ScholarDigital Library
- D. Ashlock and S. Houghten. A novel variation operator for more rapid evolution of dna error correcting codes. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 53--60, 2005.Google ScholarCross Ref
- D. Ashlock and S. Houghten. Dna error correcting codes: No crossover. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 38--45, 2009. Google ScholarDigital Library
- D. Ashlock and A. McEachern. Ring optimization of side effect machines. In Intelligent Engineering Systems Through Artificial Neural Networks, volume 19, pages 165--172. ASCME, 2009.Google ScholarCross Ref
- D. Ashlock and T. VonKonigslow. Evolution of artifical ring species. In Proccedings of the 2008 Congress on Evolutionary Computation, pages 653--659, Piscataway NJ, 2008. IEEE Press.Google Scholar
- D. Ashlock, T. VonKonigslow, E. Clare, and W. Ashlock. Transience in the simulation of ring species. In Proccedings of the 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 256--263, Piscataway NJ, 2008. IEEE Press.Google ScholarCross Ref
- D. A. Ashlock, K. M. Bryden, and S. Corns. Small population effects and hybridization. In Proceedings of the 2008 Congress on Evolutionary Computation, pages 2642--2648, Piscataway, NJ, 2008. IEEE Press.Google ScholarCross Ref
- W. Ashlock. Using very small population sizes in genetic programming. In Proceedings of the 2006 Congress on Evolutionary Computation, pages 319--326, Piscataway, NJ, 2006. IEEE Press.Google ScholarCross Ref
- S. Baker, R. Flack, and S. Houghten. Optimal variable-length edit metric and insertion-deletion correcting codes. Congressus Numerantium, 184:65--80, 2007.Google Scholar
- W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann, San Francisco, 1998. Google ScholarDigital Library
- J. K. Campbell. Enumeration and Symmetry of Edit Metric Spaces. PhD thesis, Iowa State University, 2005.Google Scholar
- R. Flack and S. Houghten. Generation of good edit codes from classical hammming distance codes. Congressus Numerantium, 190:97--108, 2008.Google Scholar
- R. Flack and S. Houghten. Quaternary edit codes built from classical hammming distance codes. Technical report, Brock University, 2010.Google Scholar
- D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Boston, 1997. Google ScholarDigital Library
- S. Houghten, D. Ashlock, and J. Lenarz. Construction of optimal edit metric codes. In Proceedings of the 2006 IEEE Workshop on Information Theory, pages 259--263, Piscataway, NJ, 2006. IEEE Press.Google ScholarCross Ref
- D. E. Irwin, S. Bensch, and T. D. Price. Speciation in a ring. Nature, 409:333--337, 2001.Google ScholarCross Ref
- F. Qiu, T. Wen, D. Ashlock, and P. Schnable. Dna sequence-based bar-codes for tracking the origins of ests from a maize cdna library constructed using multiple mrna sources. Plant Physiology, 133:475--481, 2003.Google ScholarCross Ref
- Q. Qiu, D. Burns, P. Mukre, and Q. Wu. Hardware acceleration of multi-deme genetic algorithm for the application of dna codeword searching. In Proccedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 1349--1356. Association for Computing Machinery, 2007. Google ScholarDigital Library
- R. C. Stebbins. Speciation in salamanders of the plethodontid genus ensatina. University of California Publications in Zoology, 54:47--124, 1986.Google Scholar
- T. M. Thompson. From Error-Correcting Codes through Sphere Packings to Simple Groups. Cambridge University Press, New York, NY, 2004.Google Scholar
Index Terms
- Ring optimization of edit metric codes in DNA
Recommendations
FPGA accelerated DNA error correction
DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & ExhibitionCorrecting errors in DNA sequencing data is an important process that can improve the quality of downstream analysis using the data. Even though many error-correction methods have been proposed for Illumina reads, their throughput is not high enough to ...
Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error
BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical InformaticsNext Generation Sequencing (NGS) technologies produce large quantities of short length reads with higher error rates. Erroneous reads that cannot be aligned, are either ignored during de-novo sequencing, or must be suitably corrected. Such reads pose ...
Modeling evolutionary fitness for DNA motif discovery
GECCO '09: Proceedings of the 11th Annual conference on Genetic and evolutionary computationThe motif discovery problem consists of finding over-represented patterns in a collection of sequences. Its difficulty stems partly from the large number of possibilities to define both the motif space to be searched and the notion of over-...
Comments