Refinement of Protein Structure Models with Multi-Objective Genetic Algorithms

Abstract

Here I investigate the protein structure refinement problem for homology-based protein structure models. The refinement problem has been identified as a major bottleneck in the structure prediction process, and it hinders the goal of producing high-resolution, experimental-quality structures for target protein sequences. This thesis comprises three investigations into aspects of template-based modelling and refinement. In the primary investigation, empirical evidence is provided to support the hypothesis that using multiple template-based structures to model a target sequence can improve the quality of the prediction over that obtained by using the single best prediction alone. A multi-objective genetic algorithm is used to optimize protein structure models by combining the structural information from a set of predictions, guided by various objective functions. The effect of multi-objective optimization on model quality is examined. In the second investigation, energy functions and model quality assessment methods are benchmarked in the context of automated homology modelling to assess their ability to discriminate nearer-native structures within a set of predictions. These model quality assessment methods were unable to significantly improve the ranking of threading-based prediction methods, though some improved model selection for methods that use sequence information alone. The results suggest that structural information is valuable for distinguishing better models where only sequence information has been used for modelling. The suitability of these energy functions for high-resolution refinement is discussed. Finally, a stochastic optimization algorithm based on evolutionary algorithms is developed for refining homology-based protein structure models. This approach uses multiple structural model inputs, conformational sampling operators, and objective functions to guide a search through conformational space. Single- and multi-objective genetic algorithm variants are applied to homology model predictions for 35 target proteins. The refinement results are discussed, and the performance of the two algorithmic variants is compared and contrasted.
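
As an illustration of what multi-objective selection means here, the following is a minimal sketch of Pareto dominance over candidate models. It is not the algorithm developed in this thesis: the function names, the two objectives (an energy and a clash score), and the numeric values are all hypothetical, and every objective is assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (energy, clash_score) pairs for four candidate models.
candidates = [(-120.0, 3.0), (-110.0, 1.0), (-100.0, 4.0), (-120.0, 5.0)]
front = pareto_front(candidates)
print(front)
```

A single-objective variant would instead collapse the scores into one number (for example a weighted sum) and keep only the best, whereas the non-dominated set retains every model that represents a distinct trade-off between objectives.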