|
Rhiju Das, Rachael C. Kretsch, Adam J. Simpkin, Thomas Mulvaney, Phillip Pham, Ramya Rangan, Fan Bu, Ronan M. Keegan, Maya Topf, Daniel J. Rigden, Zhichao Miao, Eric Westhof
Open Access
Abstract: The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than those from these top-ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
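The Z-score ranking approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes each group has one score per target (higher is better), converts scores to per-target Z-scores, floors them at a threshold, and sums over targets. The floor value, the function names, and the example scores are illustrative assumptions, not the assessors' actual protocol or data.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert one target's raw scores (group -> value) into Z-scores."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All groups scored identically, so no group stands out.
        return {group: 0.0 for group in scores}
    return {group: (value - mu) / sigma for group, value in scores.items()}


def rank_groups(per_target_scores, floor=0.0):
    """Sum floored Z-scores over all targets and sort groups best-first.

    The floor (an assumed value here) stops one very poor model from
    dominating a group's total.
    """
    totals = {}
    for scores in per_target_scores.values():
        for group, z in z_scores(scores).items():
            totals[group] = totals.get(group, 0.0) + max(z, floor)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for illustration only; not CASP15 data.
example = {
    "T1": {"groupA": 0.80, "groupB": 0.60, "groupC": 0.40},
    "T2": {"groupA": 0.70, "groupB": 0.75, "groupC": 0.20},
}
ranking = rank_groups(example)
for group, total in ranking:
    print(f"{group}: {total:.2f}")
```

Summing floored Z-scores rewards groups that are consistently above average across targets rather than those that excel on a single target.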
|
Oct 2023
|
|
|
Open Access
Abstract: The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, which also lacked templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
|
Sep 2023
|
|
|
Jon Agirre, Mihaela Atanasova, Haroldas Bagdonas, Charles B. Ballard, Arnaud Basle, James Beilsten-Edmands, Rafael J. Borges, David G. Brown, J. Javier Burgos-Marmol, John M. Berrisford, Paul S. Bond, Iracema Caballero, Lucrezia Catapano, Grzegorz Chojnowski, Atlanta G. Cook, Kevin D. Cowtan, Tristan I. Croll, Judit É. Debreczeni, Nicholas E. Devenish, Eleanor J. Dodson, Tarik R. Drevon, Paul Emsley, Gwyndaf Evans, Phil R. Evans, Maria Fando, James Foadi, Luis Fuentes-Montero, Elspeth F. Garman, Markus Gerstel, Richard J. Gildea, Kaushik Hatti, Maarten L. Hekkelman, Philipp Heuser, Soon Wen Hoh, Michael A. Hough, Huw T. Jenkins, Elisabet Jiménez, Robbie P. Joosten, Ronan M. Keegan, Nicholas Keep, Eugene B. Krissinel, Petr Kolenko, Oleg Kovalevskiy, Victor S. Lamzin, David M. Lawson, Andrey Lebedev, Andrew G. W. Leslie, Bernhard Lohkamp, Fei Long, Martin Maly, Airlie Mccoy, Stuart J. Mcnicholas, Ana Medina, Claudia Millán, James W. Murray, Garib N. Murshudov, Robert A. Nicholls, Martin E. M. Noble, Robert Oeffner, Navraj S. Pannu, James M. Parkhurst, Nicholas Pearce, Joana Pereira, Anastassis Perrakis, Harold R. Powell, Randy J. Read, Daniel J. Rigden, William Rochira, Massimo Sammito, Filomeno Sanchez Rodriguez, George M. Sheldrick, Kathryn L. Shelley, Felix Simkovic, Adam J. Simpkin, Pavol Skubak, Egor Sobolev, Roberto A. Steiner, Kyle Stevenson, Ivo Tews, Jens M. H. Thomas, Andrea Thorn, Josep Triviño Valls, Ville Uski, Isabel Uson, Alexei Vagin, Sameer Velankar, Melanie Vollmar, Helen Walden, David Waterman, Keith S. Wilson, Martyn Winn, Graeme Winter, Marcin Wojdyr, Keitaro Yamashita
Open Access
Abstract: The Collaborative Computational Project No. 4 (CCP4) is a UK-led international collective with a mission to develop, test, distribute and promote software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs brought together by familiar execution routines, a set of common libraries and graphical interfaces. The CCP4 suite has experienced several considerable changes since its last reference article, involving new infrastructure, original programs and graphical interfaces. This article, which is intended as a general literature citation for the use of the CCP4 software suite in structure determination, will guide the reader through such transformations, offering a general overview of the new features and outlining future developments. As such, it aims to highlight the individual programs that comprise the suite and to provide the latest references to them for perusal by crystallographers around the world.
|
Jun 2023
|
|
|
Open Access
Abstract: We report here an assessment of the model refinement category of the 14th round of Critical Assessment of Structure Prediction (CASP14). As before, predictors submitted up to five ranked refinements, along with associated residue-level error estimates, for targets that had a wide range of starting quality. The ability of groups to accurately rank their submissions and to predict coordinate error varied widely. Overall only four groups out-performed a “naïve predictor” corresponding to resubmission of the starting model. Among the top groups there are interesting differences of approach and in the spread of improvements seen: some methods are more conservative, others more adventurous. Some targets were “double-barrelled” for which predictors were offered a high-quality AlphaFold 2 (AF2)-derived prediction alongside another of lower quality. The AF2-derived models were largely unimprovable, many of their apparent errors being found to reside at domain and, especially, crystal lattice contacts. Refinement is shown to have a mixed impact overall on structure-based function annotation methods to predict nucleic acid binding, spot catalytic sites and dock protein structures.
|
Jul 2021
|
|
|
Open Access
Abstract: Covariance-based predictions of residue contacts and inter-residue distances are an increasingly popular data type in protein bioinformatics. Here we present ConPlot, a web-based application for convenient display and analysis of contact maps and distograms. Integration of predicted contact data with other predictions is often required to facilitate inference of structural features. ConPlot can therefore use the empty space near the contact map diagonal to display multiple coloured tracks representing other sequence-based predictions. Popular file formats are natively read and bespoke data can also be flexibly displayed. This novel visualisation will enable easier interpretation of predicted contact maps.
|
Jan 2021
|
|
|
Open Access
Abstract: The conventional approach in molecular replacement is the use of a related structure as a search model. However, this is not always possible as the availability of such structures can be scarce for poorly characterized families of proteins. In these cases, alternative approaches can be explored, such as the use of small ideal fragments that share high, albeit local, structural similarity with the unknown protein. Earlier versions of AMPLE enabled the trialling of a library of ideal helices, which worked well for largely helical proteins at suitable resolutions. Here, the performance of libraries of helical ensembles created by clustering helical segments is explored. The impacts of different B-factor treatments and different degrees of structural heterogeneity are explored. A 30% increase in the number of solutions obtained by AMPLE was observed when using this new set of ensembles compared with the performance with ideal helices. The boost in performance was notable across three different fold classes: transmembrane, globular and coiled-coil structures. Furthermore, the increased effectiveness of these ensembles was coupled to a reduction in the time required by AMPLE to reach a solution. AMPLE users can now take full advantage of this new library of search models by activating the 'helical ensembles' mode.
|
Oct 2020
|
|
I04-Macromolecular Crystallography
|
Abstract: Molecular replacement (MR) is the most popular technique to solve the phase problem in macromolecular crystallography. The conventional approach to finding search models for MR is to use the sequence of the target structure to identify a suitable homologue. This approach is based on the assumption that sequence similarity is a useful guide to structural similarity. Whilst largely true, this strategy is not always effective, for example when a contaminant protein has been crystallised or when the closest matches by sequence are not the closest by structure. This thesis describes the development of SIMBAD, a three-step pipeline to perform sequence-independent MR. The first step performs a lattice-parameter search against the entire Protein Data Bank (PDB), rapidly determining whether the protein or a close homologue has been solved in the same crystal form. The second step is designed to screen the data against a database of known contaminants, thus determining if a contaminant protein has been crystallised. The final step is a brute-force search of a non-redundant derivative of the PDB provided by the MoRDa software package. In Chapter 3 the initial implementation of SIMBAD using AMoRe's fast rotation function is presented, with encouraging results. Testing on a set of structures that covered a wide range of resolution limits, copies in the asymmetric unit, space groups, monomer sizes and secondary-structure types gave a 40% success rate with the full MoRDa database search, which increased to 52% when combined with the lattice-parameter search. Further validation has come in the form of nine structures deposited to the PDB which used SIMBAD for structure solution. Leading on from the work in Chapter 3, research was carried out on whether the maximum-likelihood enhanced rotation function in Phaser would improve the sensitivity of the full MoRDa database search.
Results presented in Chapter 4 show that the use of Phaser yielded a 60% success rate on the test cases, a marked improvement on the previous iteration of SIMBAD. Combining this method with ensemble search models improved this further to 68%. Lastly, Chapter 5 explores the use of anomalous Fourier maps (AFMs) to validate partial MR solutions obtained from SIMBAD. This was necessary as the absence of sequence information meant that automated model building could not be included in the pipeline as a means to test the correctness of a potential solution. The findings in Chapter 5 demonstrate that when anomalous signal was available, the maximum peak height obtained in AFMs could be combined with R-free to train a classifier with 99% precision and recall.
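For readers unfamiliar with the classifier metrics quoted at the end of this abstract, the following minimal sketch shows how precision and recall are computed from binary predictions. The function name and the example labels are hypothetical and are not taken from the thesis.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for paired boolean predictions and labels.

    precision = TP / (TP + FP): fraction of predicted solutions that are correct.
    recall    = TP / (TP + FN): fraction of correct solutions that were found.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: True means "this MR solution is correct".
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```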
|
Apr 2020
|
|
I04-Macromolecular Crystallography
|
Adam J. Simpkin, Felix Simkovic, Jens M. H. Thomas, Martin Savko, Andrey Lebedev, Ville Uski, Charles Ballard, Marcin Wojdyr, Rui Wu, Ruslan Sanishvili, Yibin Xu, María-Natalia Lisa, Alejandro Buschiazzo, William Shepard, Daniel J. Rigden, Ronan M. Keegan
Diamond Proposal Number(s): [15945]
Open Access
Abstract: The conventional approach to finding structurally similar search models for use in molecular replacement (MR) is to use the sequence of the target to search against those of a set of known structures. Sequence similarity often correlates with structure similarity. Given sufficient similarity, a known structure correctly positioned in the target cell by the MR process can provide an approximation to the unknown phases of the target. An alternative approach to identifying homologous structures suitable for MR is to exploit the measured data directly, comparing the lattice parameters or the experimentally derived structure-factor amplitudes with those of known structures. Here, SIMBAD, a new sequence-independent MR pipeline which implements these approaches, is presented. SIMBAD can identify cases of contaminant crystallization and other mishaps such as mistaken identity (swapped crystallization trays), as well as solving unsequenced targets and providing a brute-force approach where sequence-dependent search-model identification may be nontrivial, for example because of conformational diversity among identifiable homologues. The program implements a three-step pipeline to efficiently identify a suitable search model in a database of known structures. The first step performs a lattice-parameter search against the entire Protein Data Bank (PDB), rapidly determining whether or not a homologue exists in the same crystal form. The second step is designed to screen the target data for the presence of a crystallized contaminant, a not uncommon occurrence in macromolecular crystallography. Solving structures with MR in such cases can remain problematic for many years, since the search models, which are assumed to be similar to the structure of interest, are not necessarily related to the structures that have actually crystallized. To cater for this eventuality, SIMBAD rapidly screens the data against a database of known contaminant structures. 
Where the first two steps fail to yield a solution, a final step in SIMBAD can be invoked to perform a brute-force search of a nonredundant PDB database provided by the MoRDa MR software. Through early-access usage of SIMBAD, this approach has solved novel cases that have otherwise proved difficult to solve.
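The idea behind the first step, a lattice-parameter search, can be sketched as a tolerance-based comparison of unit-cell parameters. This is a simplified illustration and not SIMBAD's implementation: it assumes all cells are already expressed in a comparable (for example, reduced) setting, and the class name, function names, tolerances, and database entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCell:
    a: float      # cell lengths in angstroms
    b: float
    c: float
    alpha: float  # cell angles in degrees
    beta: float
    gamma: float


def cells_match(target, candidate, length_tol=0.05, angle_tol=2.0):
    """True if lengths agree within a relative tolerance and angles
    within an absolute tolerance (both tolerances are assumed values)."""
    lengths_ok = all(
        abs(t - c) / t <= length_tol
        for t, c in zip((target.a, target.b, target.c),
                        (candidate.a, candidate.b, candidate.c))
    )
    angles_ok = all(
        abs(t - c) <= angle_tol
        for t, c in zip((target.alpha, target.beta, target.gamma),
                        (candidate.alpha, candidate.beta, candidate.gamma))
    )
    return lengths_ok and angles_ok


def lattice_search(target, database, **tolerances):
    """Return the names of database entries whose cell resembles the target."""
    return [name for name, cell in database.items()
            if cells_match(target, cell, **tolerances)]


# Hypothetical entries for illustration only; not real PDB cells.
database = {
    "entry_1": UnitCell(50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "entry_2": UnitCell(51.0, 59.5, 70.5, 90.0, 90.0, 90.0),
    "entry_3": UnitCell(80.0, 80.0, 120.0, 90.0, 90.0, 120.0),
}
target = UnitCell(50.5, 60.2, 69.8, 90.0, 90.0, 90.0)
hits = lattice_search(target, database)
print(hits)
```

Because this comparison needs only six numbers per entry, it is cheap enough to run against a very large database before any more expensive search is attempted.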
|
Jul 2018
|
|