|
|
Beatriz Costa-Gomes, Joel Greer, Nikolai Juraschko, James Parkhurst, Jola Mirecka, Marjan Famili, Camila Rangel-Smith, Oliver Strickson, Alan Lowe, Mark Basham, Tom Burnley
Open Access
Abstract: Ease of access to data, tools and models expedites scientific research. In structural biology there are now numerous open repositories of experimental and simulated data sets. Being able to easily access and utilize these is crucial to allow researchers to make optimal use of their research effort. The tools presented here are useful for collating existing public cryoEM data sets and/or creating new synthetic cryoEM data sets to aid the development of novel data processing and interpretation algorithms. In recent years, structural biology has seen the development of a multitude of machine-learning-based algorithms to aid numerous steps in the processing and reconstruction of experimental data sets and the use of these approaches has become widespread. Developing such techniques in structural biology requires access to large data sets, which can be cumbersome to curate and unwieldy to make use of. In this paper, we present a suite of Python software packages, which we collectively refer to as PERC (profet, EMPIARreader and CAKED). These are designed to reduce the burden which data curation places upon structural biology research. The protein structure fetcher (profet) package allows users to conveniently download and cleave sequences or structures from the Protein Data Bank or AlphaFold databases. EMPIARreader allows lazy loading of Electron Microscopy Public Image Archive data sets in a machine-learning-compatible structure. The Class Aggregator for Key Electron-microscopy Data (CAKED) package is designed to seamlessly facilitate the training of machine-learning models on electron microscopy data, including electron-cryo-microscopy-specific data augmentation and labeling. These packages may be utilized independently or as building blocks in workflows. All are available in open-source repositories and designed to be easily extensible to facilitate more advanced workflows if required.
|
Oct 2025
|
|
VMXi-Versatile Macromolecular Crystallography in situ
|
Open Access
Abstract: A group of three deep-learning tools, referred to collectively as CHiMP (Crystal Hits in My Plate), were created for analysis of micrographs of protein crystallization experiments at the Diamond Light Source (DLS) synchrotron, UK. The first tool, a classification network, assigns images into categories relating to experimental outcomes. The other two tools are networks that perform both object detection and instance segmentation, resulting in masks of individual crystals in the first case and masks of crystallization droplets in addition to crystals in the second case, allowing the positions and sizes of these entities to be recorded. The creation of these tools used transfer learning, where weights from a pre-trained deep-learning network were used as a starting point and repurposed by further training on a relatively small set of data. Two of the tools are now integrated at the VMXi macromolecular crystallography beamline at DLS, where they have the potential to remove the need for any user input, both for monitoring crystallization experiments and for triggering in situ data collections. The third is being integrated into the XChem fragment-based drug-discovery screening platform, also at DLS, to allow the automatic targeting of acoustic compound dispensing into crystallization droplets.
|
Oct 2024
|
|
|
|
Open Access
Abstract: For cryo-electron tomography (cryo-ET) of beam-sensitive biological specimens, a planar sample geometry is typically used. As the sample is tilted, the effective thickness of the sample along the direction of the electron beam increases and the signal-to-noise ratio concomitantly decreases, limiting the transfer of information at high tilt angles. In addition, the tilt range where data can be collected is limited by a combination of various sample-environment constraints, including the limited space in the objective lens pole piece and the possible use of fixed conductive braids to cool the specimen. Consequently, most tilt series are limited to a maximum of ±70°, leading to the presence of a missing wedge in Fourier space. The acquisition of cryo-ET data without a missing wedge, for example using a cylindrical sample geometry, is hence attractive for volumetric analysis of low-symmetry structures such as organelles or vesicles, lysis events, pore formation or filaments for which the missing information cannot be compensated by averaging techniques. Irrespective of the geometry, electron-beam damage to the specimen is an issue and the first images acquired will transfer more high-resolution information than those acquired last. There is also an inherent trade-off between higher sampling in Fourier space and avoiding beam damage to the sample. Finally, the necessity of using a sufficient electron fluence to align the tilt images means that this fluence needs to be fractionated across a small number of images; therefore, the order of data acquisition is also a factor to consider. Here, an n-helix tilt scheme is described and simulated which uses overlapping and interleaved tilt series to maximize the use of a pillar geometry, allowing the entire pillar volume to be reconstructed as a single unit. 
Three related tilt schemes are also evaluated that extend the continuous and classic dose-symmetric tilt schemes for cryo-ET to pillar samples to enable the collection of isotropic information across all spatial frequencies. A fourfold dose-symmetric scheme is proposed which provides a practical compromise between uniform information transfer and complexity of data acquisition.
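The two effects this abstract starts from, the growth of effective thickness with tilt and the ordering of a classic dose-symmetric series, can be illustrated with a minimal sketch. It assumes a planar slab and a simple alternating-sign ordering; it does not implement the n-helix or fourfold schemes proposed in the paper, and the function names are hypothetical.

```python
import math


def effective_thickness(t0, tilt_deg):
    """Path length along the beam through a planar slab of thickness t0
    tilted by tilt_deg degrees (simple 1/cos geometry, assumed here)."""
    return t0 / math.cos(math.radians(tilt_deg))


def dose_symmetric_order(max_tilt, step):
    """Illustrative dose-symmetric ordering for a planar sample:
    start at 0 and move outward, swapping which sign is taken first
    at each new tilt magnitude so low tilts get the earliest exposures."""
    order = [0]
    angle = step
    positive_first = True
    while angle <= max_tilt:
        pair = [angle, -angle] if positive_first else [-angle, angle]
        order.extend(pair)
        positive_first = not positive_first
        angle += step
    return order


if __name__ == "__main__":
    # Thickness roughly doubles at 60 degrees and nearly triples at 70 degrees.
    for tilt in (0, 30, 60, 70):
        print(tilt, round(effective_thickness(100.0, tilt), 1))
    print(dose_symmetric_order(9, 3))
```

The 1/cos growth is why information transfer falls off at high tilt for planar samples; a pillar geometry avoids this because the path length through the sample does not depend on the rotation angle.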
|
Jun 2024
|
|
I23-Long wavelength MX
|
Yishun Lu, Ramona Duman, James Beilsten-Edmands, Graeme Winter, Mark Basham, Gwyndaf Evans, Jos J. A. G. Kamps, Allen M. Orville, Hok-Sau Kwong, Konstantinos Beis, Wesley Armour, Armin Wagner
Open Access
Abstract: Processing of single-crystal X-ray diffraction data from area detectors can be separated into two steps. First, raw intensities are obtained by integration of the diffraction images, and then data correction and reduction are performed to determine structure-factor amplitudes and their uncertainties. The second step considers the diffraction geometry, sample illumination, decay, absorption and other effects. While absorption is only a minor effect in standard macromolecular crystallography (MX), it can become the largest source of uncertainty for experiments performed at long wavelengths. Current software packages for MX typically employ empirical models to correct for the effects of absorption, with the corrections determined through the procedure of minimizing the differences in intensities between symmetry-equivalent reflections; these models are well suited to capturing smoothly varying experimental effects. However, for very long wavelengths, empirical methods become an unreliable approach to model strong absorption effects with high fidelity. This problem is particularly acute when data multiplicity is low. This paper presents an analytical absorption correction strategy (implemented in new software AnACor) based on a volumetric model of the sample derived from X-ray tomography. Individual path lengths through the different sample materials for all reflections are determined by a ray-tracing method. Several approaches for absorption corrections (spherical harmonics correction, analytical absorption correction and a combination of the two) are compared for two samples, the membrane protein OmpK36 GD, measured at a wavelength of λ = 3.54 Å, and chlorite dismutase, measured at λ = 4.13 Å. Data set statistics, the peak heights in the anomalous difference Fourier maps and the success of experimental phasing are used to compare the results from the different absorption correction approaches.
The strategies using the new analytical absorption correction are shown to be superior to the standard spherical harmonics corrections. While the improvements are modest in the 3.54 Å data, the analytical absorption correction outperforms spherical harmonics in the longer-wavelength data (λ = 4.13 Å), which is also reflected in the reduced amount of data being required for successful experimental phasing.
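As a rough illustration of how path lengths through different materials turn into an absorption correction, the sketch below applies the Beer-Lambert law to a single ray. It is a simplification under stated assumptions and not AnACor's implementation: the function names, material labels and coefficient values are hypothetical, and a full analytical correction would typically average the transmission over many points in the crystal volume rather than use one ray.

```python
import math


def transmission(path_lengths_mm, mu_per_mm):
    """Beer-Lambert transmission for one ray.

    path_lengths_mm: combined incident + diffracted path length in each
                     material, e.g. {"crystal": 0.05, "liquor": 0.02}.
    mu_per_mm:       linear absorption coefficient of each material
                     at the wavelength used (hypothetical values here).
    """
    exponent = sum(mu_per_mm[name] * length
                   for name, length in path_lengths_mm.items())
    return math.exp(-exponent)


def corrected_intensity(observed, path_lengths_mm, mu_per_mm):
    """Scale an observed intensity up by the inverse transmission."""
    return observed / transmission(path_lengths_mm, mu_per_mm)


if __name__ == "__main__":
    # Hypothetical coefficients and path lengths, for illustration only.
    mu = {"crystal": 4.0, "liquor": 3.0, "loop": 2.0}
    paths = {"crystal": 0.05, "liquor": 0.02, "loop": 0.01}
    print(transmission(paths, mu))
    print(corrected_intensity(1000.0, paths, mu))
```

Because the exponent scales with the absorption coefficients, which rise steeply at long wavelengths, small errors in path length matter far more at λ = 4.13 Å than at standard MX wavelengths.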
|
Jun 2024
|
|
I11-High Resolution Powder Diffraction
|
Claire A. Murray, Project M Scientists, Laura Holland, Rebecca O'Brien, Alice Richards, Annabelle Baker, Mark Basham, David Bond, Leigh D. Connor, Sarah J. Day, Jacob Filik, Stuart Fisher, Peter Holloway, Karl Levik, Ronaldo Mercado, Jonathan Potter, Chiu C. Tang, Stephen P. Thompson, Julia E. Parker
Diamond Proposal Number(s): [15723]
Open Access
Abstract: Calcite and vaterite crystallisation is strongly influenced by the presence of additives during the reaction process, as demonstrated by organic molecules in biogenic calcium carbonate formation. The effects of additives on the lattice parameters of calcite and vaterite in syntheses are frequently reported, but only as discrete studies discussing a single polymorph. The intertwined nature of these polymorphs, due to their shared reaction pathway, is rarely discussed. In this work we report the results of a large-scale citizen science project to explore the influence of amino acids and related additives on both polymorphs, highlighting their differences and commonalities in terms of the effect on the lattice parameters and polymorph selectivity.
|
Jan 2024
|
|
|
|
Open Access
Abstract: Many bioimaging research projects require objects of interest to be identified, located, and then traced to allow quantitative measurement. Depending on the complexity of the system and imaging, instance segmentation is often done manually, and automated approaches still require weeks to months of an individual’s time to acquire the necessary training data for AI models. As such, there is a strong need to develop approaches for instance segmentation that minimize the use of expert annotation while maintaining quality on challenging image analysis problems.
Herein, we present our work on a citizen science project we ran called Science Scribbler: Virus Factory on the Zooniverse platform, in which citizen scientists annotated a cryo-electron tomography volume by locating and categorising viruses using point-based annotations instead of manually drawing outlines. One crowdsourcing workflow produced a database of virus locations, and the other workflow produced a set of classifications of those locations. Together, this allowed mask annotation to be generated for training a deep learning–based segmentation model. From this model, segmentations were produced that allowed for measurements such as counts of the viruses by virus class.
The application of citizen science–driven crowdsourcing to the generation of instance segmentations of volumetric bioimages is a step towards developing annotation-efficient segmentation workflows for bioimaging data. This approach aligns with the growing interest in citizen science initiatives that combine the collective intelligence of volunteers with AI to tackle complex problems while involving the public with research that is being undertaken in these important areas of science.
|
Jan 2024
|
|
|
|
Open Access
Abstract: Simulations of cryo-electron microscopy (cryo-EM) images of biological samples can be used to produce test datasets to support the development of instrumentation, methods, and software, as well as to assess data acquisition and analysis strategies. To be useful, these simulations need to be based on physically realistic models which include large volumes of amorphous ice. The gold standard model for EM image simulation is a physical atom-based ice model produced using molecular dynamics simulations. Although this is practical for small sample volumes, it can be too computationally expensive for simulation of cryo-EM data from large sample volumes. We have evaluated a Gaussian Random Field (GRF) ice model which is shown to be more computationally efficient for large sample volumes. The simulated EM images are compared with the gold standard atom-based ice model approach and shown to be directly comparable. Comparison with experimentally acquired data shows the Gaussian random field ice model produces realistic simulations. The software required has been implemented in the Parakeet software package and the underlying atomic models are available online for use by the wider community.
|
Nov 2023
|
|
|
|
Open Access
Abstract: Public participation in research, also known as citizen science, is being increasingly adopted for the analysis of biological volumetric data. Researchers working in this domain are applying online citizen science as a scalable distributed data analysis approach, with recent research demonstrating that non-experts can productively contribute to tasks such as the segmentation of organelles in volume electron microscopy data. This, alongside the growing challenge of rapidly processing the large amounts of biological volumetric data now routinely produced, means there is increasing interest within the research community in applying online citizen science to the analysis of data in this context. Here, we synthesise core methodological principles and practices for applying citizen science to the analysis of biological volumetric data. We collate and share the knowledge and experience of multiple research teams who have applied online citizen science for the analysis of volumetric biological data using the Zooniverse platform (www.zooniverse.org). We hope this provides inspiration and practical guidance regarding how contributor effort via online citizen science may be usefully applied in this domain.
|
Jun 2023
|
|
|
|
Open Access
Abstract: An emergent volume electron microscopy technique called cryogenic serial plasma focused ion beam milling scanning electron microscopy (pFIB/SEM) can decipher complex biological structures by building a three-dimensional picture of biological samples at mesoscale resolution. This is achieved by collecting consecutive SEM images after successive rounds of FIB milling that expose a new surface after each milling step. Due to instrumental limitations, some image processing is necessary before 3D visualization and analysis of the data is possible. SEM images are affected by noise, drift, and charging effects that can make precise 3D reconstruction of biological features difficult. This article presents Okapi-EM, an open-source napari plugin developed to process and analyze cryogenic serial pFIB/SEM images. Okapi-EM enables automated image registration of slices, evaluation of image quality metrics specific to pFIB/SEM imaging, and mitigation of charging artifacts. Implementation of Okapi-EM within the napari framework ensures that the tools are both user- and developer-friendly, through provision of a graphical user interface and access to Python programming.
|
Mar 2023
|
|
Krios I-Titan Krios I at Diamond
|
Open Access
Abstract: Electron cryo-tomography is an imaging technique for probing 3D structures at the nanometer scale. This technique has been used extensively in the biomedical field to study the complex structures of proteins and other macromolecules. With the advancement in technology, microscopes are currently capable of producing images amounting to terabytes of data per day, posing great challenges for scientists as the speed of processing of the images cannot keep up with the ever-higher throughput of the microscopes. Therefore, automation is an essential and natural pathway on which image processing, from individual micrographs to full tomograms, is developing. In this paper, we present Ot2Rec, an open-source pipelining tool which aims to enable scientists to build their own processing workflows in a flexible and automatic manner. The basic building blocks of Ot2Rec are plugins which follow a unified application programming interface structure, making it simple for scientists to contribute to Ot2Rec by adding features which are not already available. In this paper, we also present three case studies of image processing using Ot2Rec, through which we demonstrate the speedup of using a semi-automatic workflow over a manual one, the possibility of writing and using custom (prototype) plugins, and the flexibility of Ot2Rec which enables the mix-and-match of plugins. We also demonstrate, in the Supplementary Material, a built-in reporting feature in Ot2Rec which aggregates the metadata from all processes being run and outputs it in the Jupyter Notebook and/or HTML formats for quick review of image processing quality. Ot2Rec can be found at https://github.com/rosalindfranklininstitute/ot2rec.
|
Mar 2023
|
|