Publication

Protein family-specific models using deep neural networks and transfer learning improve virtual screening and highlight the need for more data

DOI: 10.1021/acs.jcim.8b00350

Authors: Fergus Imrie (University of Oxford), Anthony R. Bradley (Structural Genomics Consortium, University of Oxford; Diamond Light Source), Mihaela van der Schaar (University of Oxford; Alan Turing Institute), Charlotte M. Deane (University of Oxford)
Co-authored by industrial partner: No

Type: Journal Paper
Journal: Journal of Chemical Information and Modeling

State: Published (Approved)
Published: October 2018

Open Access

Abstract: Machine learning has shown enormous potential for computer-aided drug discovery. Here we show how modern convolutional neural networks (CNNs) can be applied to structure-based virtual screening. We have coupled our densely connected CNN (DenseNet) with a transfer learning approach which we use to produce an ensemble of protein family-specific models. We conduct an in-depth empirical study and provide the first guidelines on the minimum requirements for adopting a protein family-specific model. Our method also highlights the need for additional data, even in data-rich protein families. Our approach outperforms recent benchmarks on the DUD-E data set and an independent test set constructed from the ChEMBL database. Using a clustered cross-validation on DUD-E, we achieve an average AUC ROC of 0.92 and a 0.5% ROC enrichment factor of 79. This represents an improvement in early enrichment of over 75% compared to a recent machine learning benchmark. Our results demonstrate that the continued improvements in machine learning architecture for computer vision apply to structure-based virtual screening.
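The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that ROC enrichment is the ratio of true positive rate to false positive rate at a fixed false positive rate, which is one common convention; the paper may handle ties and rounding differently. The function names and the toy scores are hypothetical and are not taken from the paper or its code.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def roc_enrichment(scores, labels, fpr=0.005):
    """TPR / FPR at the point where a fraction `fpr` of decoys has been retrieved.

    Assumed convention: the threshold is the k-th highest decoy score,
    with k = max(1, round(fpr * n_decoys)); actives must score strictly higher.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    k = max(1, round(fpr * len(decoys)))
    threshold = decoys[k - 1]
    tpr = sum(a > threshold for a in actives) / len(actives)
    # Dividing by the achieved FPR (k / n_decoys), written to avoid rounding noise.
    return tpr * len(decoys) / k


# Toy data (hypothetical): 200 decoys with low scores and 4 actives.
decoy_scores = [i / 1000 for i in range(200)]
active_scores = [0.9, 0.8, 0.1505, 0.0505]
scores = decoy_scores + active_scores
labels = [0] * len(decoy_scores) + [1] * len(active_scores)

auc = roc_auc(scores, labels)
ef = roc_enrichment(scores, labels, fpr=0.005)
print(f"AUC ROC: {auc:.4f}, 0.5% ROC enrichment: {ef:.1f}")
```

Under this convention the 0.5% ROC enrichment has a maximum of 200 (every active ranked ahead of the first 0.5% of decoys), which gives a scale for the reported value of 79.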

Journal Keywords: Peptides and proteins; Layers; Machine learning; Power; Receptors

Subject Areas: Chemistry, Medicine, Information and Communication Technology


Added On: 23/10/2018 11:16

Documents:
ac45444s.jcim.pdf

Discipline Tags:

Information & Communication Technologies, Artificial Intelligence, Life Sciences & Biotech, Health & Wellbeing, Drug Discovery, Structural biology, Chemistry, Biochemistry