Multi-Objective Evolutionary Optimization of DNA Sequences for Molecular Computing


Abstract

DNA computing is a new computing paradigm which uses bio-molecules as information storage media and biochemical tools as information processing operators. It has shown many successful and promising results for various applications. However, DNA computing has suffered from some drawbacks due to the biochemical properties of DNA reactions. Since DNA reactions are probabilistic reactions, it can cause the different results for the same situation, which can be regarded as errors in the computation. To overcome the drawbacks, much works have focused on improving the reliability and efficiency of DNA computing. Especially, many researchers have tried to design the error-minimized DNA sequences (DNA codewords) to improve the reliability of DNA computing. In this dissertation, we focused on DNA codeword design, since it is the basic and important step for DNA computing as well as the most promising methods to overcome the drawbacks of DNA computing. Designing DNA codewords requires libraries of DNA sequences which form the specific duplexes during hybridization while simultaneously prevent other undesirable cross-hybridizations. Therefore, codeword design involves a number of heterogeneous and conflicting design criteria which make traditional optimization methods face difficulties. This dissertation formulated DNA codeword design as a constrained multi-objective optimization problem based on its characteristics and used multi-objective evolutionary algorithms (MOEAs) with constraint handling. MOEAs can handle lots of fitness functions efficiently without the artificial fix-ups such as weights for each fitness functions. First, elitist non-dominated sorting genetic algorithm (NSGA-II) was used based on its wide applications and comparative performance. Then, e-multi-objective evolutionary algorithm (e-MOEA) was applied to DNA codeword design due to its high speed and better performance than NSGA-II. NSGA-II and e-MOEA were compared for DTLZ2 benchmark function to confirm the ability of e-MOEA, then e-MOEA was analyzed to investigate which characteristics of e-MOEA affect the performance. This multi-objective approach was implemented into the integrated DNA computing platform, NACST, which consists of the DNA sequence design tool, sequence analysis tools, and in silico laboratory experiments tool. We can design codewords using MOEAs, then choose the best codeword set by sequence verification of in silico hybridization. The performance of the proposed multi-objective approach was first compared to other sequence design methods. Multi-objective evolutionary sequence optimization by NACST outperformed the other existing codeword design techniques such as single genetic algorithms, simulated annealing, and specialized heuristic method, and thermodynamic method. Then, the codewords optimized by MOEAs were applied to the real applications such as travelling salesman problem, version space learning, and theorem proving. The optimized codewords were verified by laboratory experiments and solved the given problems successfully. Based on these results, the proposed approach was applied to the probe design for microarray which also needs the error-minimized DNA sequences. e-MOEA found 19 probes for DNA microarray to detect human papillomavirus, which are more reliable compared to probe set designed by human experts. Therefore, our multi-objective strategy has been proven as a practical and useful method for any kind of applications which needs error-minimized DNA sequences, i.e. DNA computing, probe design for microarray, and DNA nano-structures. Keywords: DNA computing, DNA codeword design, Multi-objective evolutionary algorithms, Non-dominated sorting genetic algorithm-II, e-multi-objective evolutionary algorithm, Microarray probe design.