idDock+: Integrating Machine Learning in Probabilistic Search for Protein-Protein Docking.

Title: idDock+: Integrating Machine Learning in Probabilistic Search for Protein-Protein Docking.
Publication Type: Journal Article
Year of Publication: 2015
Authors: Hashmi I, Shehu A
Journal: J Comput Biol
Volume: 22
Issue: 9
Pagination: 806-22
Date Published: 2015 Sep
ISSN: 1557-8666
Keywords: Algorithms, Binding Sites, Computational Biology, Ligands, Machine Learning, Molecular Docking Simulation, Protein Binding, Protein Conformation, Protein Multimerization, Proteins
Abstract

Predicting the three-dimensional native structures of protein dimers, a problem known as protein-protein docking, is key to understanding molecular interactions. Docking is a computationally challenging problem due to the diversity of interactions and the high dimensionality of the configuration space. Existing methods draw configurations systematically or at random from the configuration space. The inaccuracy of scoring functions used to evaluate drawn configurations presents additional challenges. Evidence is growing that optimization of a scoring function is an effective technique only once the drawn configuration is sufficiently similar to the native structure. Therefore, in this article we present a method that employs optimization of a sophisticated energy function, FoldX, only to locally improve a promising configuration. The main question of how promising configurations are identified is addressed through a machine learning method trained a priori on an extensive dataset of functionally diverse protein dimers. To deal with the vast configuration space, a probabilistic search algorithm operates on top of the learner, feeding it configurations drawn at random. We refer to our method as idDock+, for informatics-driven Docking. idDock+ is tested on 15 dimers of different sizes and functional classes. Analysis shows that on all systems idDock+ finds a near-native structure and is comparable in accuracy to other state-of-the-art methods. idDock+ represents one of the first highly efficient hybrid methods that combine fast machine learning models with demanding optimization of sophisticated energy scoring functions. Our results indicate that this is a promising direction to improve both efficiency and accuracy in docking.
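
The control flow described in the abstract (draw configurations at random, let a fast learned filter decide which ones are promising, and spend expensive energy optimization only on those) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name below is hypothetical, the 6-dimensional vector is only a toy stand-in for a docking configuration, `looks_promising` is a placeholder for the trained learner, and `energy` is a toy quadratic rather than FoldX.

```python
import random


def sample_configuration(rng):
    """Draw a random configuration (toy assumption: a 6-D vector in [-1, 1])."""
    return [rng.uniform(-1.0, 1.0) for _ in range(6)]


def looks_promising(config, threshold=1.0):
    """Placeholder for the trained learner: a cheap accept/reject test."""
    return sum(x * x for x in config) < threshold


def energy(config):
    """Placeholder for an expensive scoring function (toy quadratic, not FoldX)."""
    return sum(x * x for x in config)


def local_refine(config, rng, steps=50, step_size=0.05):
    """Greedy local improvement: keep a perturbation only if it lowers the energy."""
    best, best_e = list(config), energy(config)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, step_size) for x in best]
        e = energy(candidate)
        if e < best_e:
            best, best_e = candidate, e
    return best, best_e


def filtered_search(n_samples=500, seed=0):
    """Random sampling, learned filter, then local refinement of survivors only."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        config = sample_configuration(rng)
        if not looks_promising(config):
            continue  # rejected configurations never reach the expensive step
        results.append(local_refine(config, rng))
    results.sort(key=lambda pair: pair[1])  # lowest energy first
    return results
```

Calling `filtered_search()` returns the refined configurations sorted by energy. The point of the design is that the costly `local_refine` step runs only on the fraction of samples the filter accepts.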

DOI: 10.1089/cmb.2015.0108
Alternate Journal: J. Comput. Biol.
PubMed ID: 26222714