|
|
Open Access
Abstract: The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map–model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3–5 Å resolution structures in the PDB. Unlike most methods, the application of this approach yields suggested corrections to the register of affected regions, which it is shown, even by limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors such as fold-switching proteins are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, helping to ensure the accuracy of future depositions.
|
Nov 2024
|
|
|
|
Rhiju Das, Rachael C. Kretsch, Adam J. Simpkin, Thomas Mulvaney, Phillip Pham, Ramya Rangan, Fan Bu, Ronan M. Keegan, Maya Topf, Daniel J. Rigden, Zhichao Miao, Eric Westhof
Open Access
Abstract: The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
|
Oct 2023
|
|
|
|
Open Access
Abstract: The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups, led by PEZYFoldings, UM-TBM, and Yang Server, employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic atlas: the confidence estimates of the former were also notably accurate.
|
Sep 2023
|
|
|
|
Jon Agirre, Mihaela Atanasova, Haroldas Bagdonas, Charles B. Ballard, Arnaud Basle, James Beilsten-Edmands, Rafael J. Borges, David G. Brown, J. Javier Burgos-Marmol, John M. Berrisford, Paul S. Bond, Iracema Caballero, Lucrezia Catapano, Grzegorz Chojnowski, Atlanta G. Cook, Kevin D. Cowtan, Tristan I. Croll, Judit É. Debreczeni, Nicholas E. Devenish, Eleanor J. Dodson, Tarik R. Drevon, Paul Emsley, Gwyndaf Evans, Phil R. Evans, Maria Fando, James Foadi, Luis Fuentes-Montero, Elspeth F. Garman, Markus Gerstel, Richard J. Gildea, Kaushik Hatti, Maarten L. Hekkelman, Philipp Heuser, Soon Wen Hoh, Michael A. Hough, Huw T. Jenkins, Elisabet Jiménez, Robbie P. Joosten, Ronan M. Keegan, Nicholas Keep, Eugene B. Krissinel, Petr Kolenko, Oleg Kovalevskiy, Victor S. Lamzin, David M. Lawson, Andrey Lebedev, Andrew G. W. Leslie, Bernhard Lohkamp, Fei Long, Martin Maly, Airlie Mccoy, Stuart J. Mcnicholas, Ana Medina, Claudia Millán, James W. Murray, Garib N. Murshudov, Robert A. Nicholls, Martin E. M. Noble, Robert Oeffner, Navraj S. Pannu, James M. Parkhurst, Nicholas Pearce, Joana Pereira, Anastassis Perrakis, Harold R. Powell, Randy J. Read, Daniel J. Rigden, William Rochira, Massimo Sammito, Filomeno Sanchez Rodriguez, George M. Sheldrick, Kathryn L. Shelley, Felix Simkovic, Adam J. Simpkin, Pavol Skubak, Egor Sobolev, Roberto A. Steiner, Kyle Stevenson, Ivo Tews, Jens M. H. Thomas, Andrea Thorn, Josep Triviño Valls, Ville Uski, Isabel Uson, Alexei Vagin, Sameer Velankar, Melanie Vollmar, Helen Walden, David Waterman, Keith S. Wilson, Martyn Winn, Graeme Winter, Marcin Wojdyr, Keitaro Yamashita
Open Access
Abstract: The Collaborative Computational Project No. 4 (CCP4) is a UK-led international collective with a mission to develop, test, distribute and promote software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs brought together by familiar execution routines, a set of common libraries and graphical interfaces. The CCP4 suite has experienced several considerable changes since its last reference article, involving new infrastructure, original programs and graphical interfaces. This article, which is intended as a general literature citation for the use of the CCP4 software suite in structure determination, will guide the reader through such transformations, offering a general overview of the new features and outlining future developments. As such, it aims to highlight the individual programs that comprise the suite and to provide the latest references to them for perusal by crystallographers around the world.
|
Jun 2023
|
|
|
|
Open Access
Abstract: Determination of protein structures typically entails building a model that satisfies the collected experimental observations and its deposition in the Protein Data Bank. Experimental limitations can lead to unavoidable uncertainties during the process of model building, which result in the introduction of errors into the deposited model. Many metrics are available for model validation, but most are limited to consideration of the physico-chemical aspects of the model or its match to the experimental data. The latest advances in the field of deep learning have enabled the increasingly accurate prediction of inter-residue distances, an advance which has played a pivotal role in the recent improvements observed in the field of protein ab initio modelling. Here, new validation methods are presented based on the use of these precise inter-residue distance predictions, which are compared with the distances observed in the protein model. Sequence-register errors are particularly clearly detected and the register shifts required for their correction can be reliably determined. The method is available in the ConKit package (https://www.conkit.org).
|
Dec 2022
|
|
I03-Macromolecular Crystallography
|
Diamond Proposal Number(s): [12342]
Abstract: Insect juvenile hormones (JHs) are a family of sesquiterpenoid molecules that are secreted into the haemolymph. JHs have multiple roles in insect development, metamorphosis and sexual maturation. A number of pesticides work by chemically mimicking JHs, thus preventing insects from developing and reproducing normally. The haemolymph levels of JH are governed by the rates of its biosynthesis and degradation. One enzyme involved in JH catabolism is JH diol kinase (JHDK), which uses ATP (or GTP) to phosphorylate JH diol to JH diol phosphate, which can be excreted. The X-ray structure of JHDK from the silkworm Bombyx mori has been determined at a resolution of 2.0 Å with an R factor of 19.0% and an Rfree of 24.8%. The structure possesses three EF-hand motifs which are occupied by calcium ions. This is in contrast to the recently reported structure of the JHDK-like-2 protein from B. mori (PDB entry 6kth), which possessed only one calcium ion. Since JHDK is known to be inhibited by calcium ions, it is likely that our structure represents the calcium-inhibited form of the enzyme. The electrostatic surface of the protein suggests a binding site for the triphosphate of ATP close to the N-terminal end of the molecule in a cavity between the N- and C-terminal domains. Superposition with a number of calcium-activated photoproteins suggests that there may be parallels between the binding of JH diol to JHDK and the binding of luciferin to aequorin.
|
Dec 2021
|
|
|
|
Open Access
Abstract: Covariance-based predictions of residue contacts and inter-residue distances are an increasingly popular data type in protein bioinformatics. Here we present ConPlot, a web-based application for convenient display and analysis of contact maps and distograms. Integration of predicted contact data with other predictions is often required to facilitate inference of structural features. ConPlot can therefore use the empty space near the contact map diagonal to display multiple coloured tracks representing other sequence-based predictions. Popular file formats are natively read and bespoke data can also be flexibly displayed. This novel visualisation will enable easier interpretation of predicted contact maps.
|
Jan 2021
|
|
|
|
Open Access
Abstract: The conventional approach in molecular replacement is the use of a related structure as a search model. However, this is not always possible as the availability of such structures can be scarce for poorly characterized families of proteins. In these cases, alternative approaches can be explored, such as the use of small ideal fragments that share high, albeit local, structural similarity with the unknown protein. Earlier versions of AMPLE enabled the trialling of a library of ideal helices, which worked well for largely helical proteins at suitable resolutions. Here, the performance of libraries of helical ensembles created by clustering helical segments is explored. The impacts of different B-factor treatments and different degrees of structural heterogeneity are explored. A 30% increase in the number of solutions obtained by AMPLE was observed when using this new set of ensembles compared with the performance with ideal helices. The boost in performance was notable across three different fold classes: transmembrane, globular and coiled-coil structures. Furthermore, the increased effectiveness of these ensembles was coupled to a reduction in the time required by AMPLE to reach a solution. AMPLE users can now take full advantage of this new library of search models by activating the 'helical ensembles' mode.
|
Oct 2020
|
|
I04-Macromolecular Crystallography
|
Adam J. Simpkin, Felix Simkovic, Jens M. H. Thomas, Martin Savko, Andrey Lebedev, Ville Uski, Charles Ballard, Marcin Wojdyr, Rui Wu, Ruslan Sanishvili, Yibin Xu, María-Natalia Lisa, Alejandro Buschiazzo, William Shepard, Daniel J. Rigden, Ronan M. Keegan
Diamond Proposal Number(s): [15945]
Open Access
Abstract: The conventional approach to finding structurally similar search models for use in molecular replacement (MR) is to use the sequence of the target to search against those of a set of known structures. Sequence similarity often correlates with structure similarity. Given sufficient similarity, a known structure correctly positioned in the target cell by the MR process can provide an approximation to the unknown phases of the target. An alternative approach to identifying homologous structures suitable for MR is to exploit the measured data directly, comparing the lattice parameters or the experimentally derived structure-factor amplitudes with those of known structures. Here, SIMBAD, a new sequence-independent MR pipeline which implements these approaches, is presented. SIMBAD can identify cases of contaminant crystallization and other mishaps such as mistaken identity (swapped crystallization trays), as well as solving unsequenced targets and providing a brute-force approach where sequence-dependent search-model identification may be nontrivial, for example because of conformational diversity among identifiable homologues. The program implements a three-step pipeline to efficiently identify a suitable search model in a database of known structures. The first step performs a lattice-parameter search against the entire Protein Data Bank (PDB), rapidly determining whether or not a homologue exists in the same crystal form. The second step is designed to screen the target data for the presence of a crystallized contaminant, a not uncommon occurrence in macromolecular crystallography. Solving structures with MR in such cases can remain problematic for many years, since the search models, which are assumed to be similar to the structure of interest, are not necessarily related to the structures that have actually crystallized. To cater for this eventuality, SIMBAD rapidly screens the data against a database of known contaminant structures. 
Where the first two steps fail to yield a solution, a final step in SIMBAD can be invoked to perform a brute-force search of a nonredundant PDB database provided by the MoRDa MR software. Through early-access usage of SIMBAD, this approach has solved novel cases that have otherwise proved difficult to solve.
|
Jul 2018
|
|
I03-Macromolecular Crystallography
|
Diamond Proposal Number(s): [12342]
Abstract: Pullulan-hydrolysing enzymes, more commonly known as debranching enzymes for starch and other polysaccharides, are of great interest and have been widely used in the starch-saccharification industry. Type III pullulan hydrolase from Thermococcus kodakarensis (TK-PUL) possesses both pullulanase and α-amylase activities. Until now, only two enzymes in this class, which are capable of hydrolysing both α-1,4- and α-1,6-glycosidic bonds in pullulan to produce a mixture of maltose, panose and maltotriose, have been described. TK-PUL shows highest activity in the temperature range 95–100°C and has a pH optimum in the range 3.5–4.2. Its unique ability to hydrolyse maltotriose into maltose and glucose has not been reported for other homologous enzymes. The crystal structure of TK-PUL has been determined at a resolution of 2.8 Å and represents the first analysis of a type III pullulan hydrolase. The structure reveals that the last part of the N-terminal domain and the C-terminal domain are significantly different from homologous structures. In addition, the loop regions at the active-site end of the central catalytic domain are quite different. The enzyme has a well defined calcium-binding site and possesses a rare vicinal disulfide bridge. The thermostability of TK-PUL and its homologues may be attributable to several factors, including the increased content of salt bridges, helical segments, Pro, Arg and Tyr residues and the decreased content of serine.
|
Apr 2018
|
|