I22-Small angle scattering & Diffraction
|
Minghui Sun, Zheng Dong, Liyuan Wu, Haodong Yao, Wenchao Niu, Deting Xu, Ping Chen, Himadri S. Gupta, Yi Zhang, Yuhui Dong, Chunying Chen, Lina Zhao
Open Access
Abstract: Elucidating the structure of biological materials can improve our understanding of nature's design principles and inspire research on artificial materials. Synchrotron microfocus X-ray diffraction is one of the main techniques for characterizing hierarchically structured biological materials, especially the 3D orientation distribution of their interpenetrating nanofiber networks. However, extracting 3D fiber orientation from X-ray patterns is still carried out by iterative parametric fitting, which is time-consuming and demands both expertise and initial parameter estimates. Existing analysis methods therefore cannot meet the real-time demands of high-throughput experiments. In this work, under the assumption that the X-ray-illuminated volume of a gradient biological composite is dominated by two groups of nanofibers, a machine-learning-based method is proposed for fast, automatic prediction of fiber orientation metrics from synchrotron microfocus X-ray diffraction data. The simulated training data were deliberately corrupted so that the trained model generalizes to real-world experimental data. Label transformation was used to resolve the jump discontinuity that arises when predicting angle parameters. The proposed method shows promise for automatic data-processing pipelines that must rapidly analyze the vast data generated by multiscale diffraction-based tomography of textured biomaterials.
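The jump-discontinuity problem the abstract mentions can be illustrated with a common label-transformation trick (the paper's exact transformation may differ): a fiber orientation angle has period 180°, so encoding it as the continuous pair (sin 2θ, cos 2θ) makes labels near 0° and 180° close in label space, as they are physically.

```python
import math

def encode_angle(theta_deg):
    """Map a fiber orientation angle (period 180 deg) to a continuous
    2-vector, removing the jump discontinuity at 0/180 deg."""
    t = math.radians(theta_deg)
    return math.sin(2 * t), math.cos(2 * t)

def decode_angle(s, c):
    """Invert the encoding back to an angle in [0, 180)."""
    return (0.5 * math.degrees(math.atan2(s, c))) % 180.0

# Angles 1 deg and 179 deg are physically only 2 deg apart for a fiber
# axis; their encoded labels are correspondingly close, unlike the raw
# angle values, so a regression model is not penalized across the wrap.
a, b = encode_angle(1.0), encode_angle(179.0)
dist = math.hypot(a[0] - b[0], a[1] - b[1])
```

A network trained on such labels outputs the 2-vector, which is decoded back to an angle at prediction time.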
|
May 2023
|
I22-Small angle scattering & Diffraction
|
Zhongzheng Zhou, Chun Li, Xiaoxue Bi, Chenglong Zhang, Yingke Huang, Jian Zhuang, Wenqiang Hua, Zheng Dong, Lina Zhao, Yi Zhang, Yuhui Dong
Open Access
Abstract: With advances in the instrumentation of next-generation synchrotron light sources, methodologies for small-angle X-ray scattering (SAXS)/wide-angle X-ray diffraction (WAXD) experiments have evolved dramatically. Such experiments have developed into dynamic, multiscale in situ characterizations, making prolonged exposure times and radiation-induced damage a serious concern. However, reducing exposure time or dose yields noisier images with a lower signal-to-noise ratio, demanding powerful denoising mechanisms for the retrieval of physical information. Here, we tackle the problem from an algorithmic perspective by proposing a small yet effective machine-learning model for denoising experimental SAXS/WAXD images, leaving more headroom for exposure-time or dose reduction. Compared with classic models developed for natural images, our model provides a bespoke denoising solution, demonstrating superior performance on highly textured SAXS/WAXD images. The model is versatile and can be applied to denoising in other synchrotron imaging experiments where data volume and image complexity are a concern.
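The trade-off motivating this work — shorter exposures give noisier images — follows from photon-counting statistics: detector counts are approximately Poisson-distributed, so the signal-to-noise ratio scales roughly as the square root of the exposure. A minimal stdlib illustration (not the paper's noise model):

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for modest lambda)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def empirical_snr(rate_per_s, exposure_s, n_frames, seed=0):
    """Mean/std of repeated photon counts at one detector pixel."""
    rng = random.Random(seed)
    counts = [poisson(rate_per_s * exposure_s, rng) for _ in range(n_frames)]
    mean = sum(counts) / n_frames
    var = sum((c - mean) ** 2 for c in counts) / n_frames
    return mean / math.sqrt(var) if var else float("inf")

# Quadrupling the exposure should roughly double the SNR
# (SNR ~ sqrt(lambda) for a Poisson-limited pixel).
snr_short = empirical_snr(100.0, 0.1, 2000)  # lambda = 10
snr_long = empirical_snr(100.0, 0.4, 2000)   # lambda = 40
```

A denoiser that works well at low counts effectively buys back part of that square-root factor, which is what "redundancy for exposure time or dose reduction" refers to.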
|
Apr 2023
|
David M. Rogers, Rupesh Agarwal, Josh V. Vermaas, Micholas Dean Smith, Rajitha T. Rajeshwar, Connor Cooper, Ada Sedova, Swen Boehm, Matthew Baker, Jens Glaser, Jeremy C. Smith
Open Access
Abstract: This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against 6 structural targets from SARS-CoV-2, representing 5 unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out using the AutoDock-GPU platform on the Summit supercomputer and Google Cloud. The docking procedure employed the Solis-Wets search method to generate 20 independent ligand binding poses per compound. Each compound geometry was scored using the AutoDock free-energy estimate and rescored using the RFScore v3 and DUD-E machine-learned rescoring models. Input protein structures are included, suitable for use with AutoDock-GPU and other docking programs. As the result of an exceptionally large docking campaign, this dataset is a valuable resource for discovering trends across small-molecule and protein binding sites, training AI models, and comparing against known inhibitors of SARS-CoV-2. The work also illustrates how to organize and process data from ultra-large docking screens.
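At its core, reducing a screen like this one means keeping the best-scoring of the 20 poses per compound (AutoDock free energies are negative; lower is better). A hypothetical sketch of that reduction step — the record layout is illustrative, not the dataset's actual schema:

```python
def best_poses(records):
    """Keep the lowest (most favorable) docking score per compound.

    records: iterable of (compound_id, pose_id, score) tuples,
    e.g. 20 independent Solis-Wets poses per compound."""
    best = {}
    for compound_id, pose_id, score in records:
        cur = best.get(compound_id)
        if cur is None or score < cur[1]:
            best[compound_id] = (pose_id, score)
    return best

poses = [
    ("mol-1", 0, -6.2), ("mol-1", 1, -7.9), ("mol-1", 2, -5.4),
    ("mol-2", 0, -4.1), ("mol-2", 1, -4.8),
]
winners = best_poses(poses)  # {'mol-1': (1, -7.9), 'mol-2': (1, -4.8)}
```

At the billion-compound scale the same single-pass reduction would be sharded across files, but the per-compound logic is unchanged.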
|
Mar 2023
|
William McCorkindale, Kadi L. Saar, Daren Fearon, Melissa Boby, Haim Barr, Amir Ben-Shmuel, Nir London, Frank von Delft, John D. Chodera, Alpha A. Lee, The COVID Moonshot Consortium
Open Access
Abstract: A common challenge in drug design is finding chemical modifications to a ligand that increase its affinity for the target protein. An underutilized advance is the increase in structural biology throughput, which has progressed from an artisanal endeavor to a monthly throughput of hundreds of different ligands against a protein at modern synchrotrons. The missing piece, however, is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here, we designed a simple machine-learning approach that predicts protein–ligand affinity from experimental structures of diverse ligands against a single protein, paired with biochemical measurements. Our key insight is the use of physics-based energy descriptors to represent protein–ligand complexes, together with a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 main protease (MPro), obtaining parallel measurements of over 200 protein–ligand complexes and their binding activities. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a noncovalent, nonpeptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands into unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.
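The learning-to-rank idea can be caricatured with a minimal RankNet-style pairwise logistic model over toy descriptor vectors (pure Python; the paper's actual energy descriptors and learner will differ): for each pair of ligands against the same protein, the model is trained so that the higher-affinity ligand receives the higher score.

```python
import math

def train_pairwise(pairs, n_features, lr=0.5, epochs=200):
    """Pairwise logistic ranking: learn w so that score(x_better) > score(x_worse).

    pairs: list of (x_better, x_worse) feature vectors, where x_better
    belongs to the higher-affinity ligand of the pair."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for xb, xw in pairs:
            margin = sum(wi * (a - b) for wi, a, b in zip(w, xb, xw))
            g = 1.0 / (1.0 + math.exp(margin))  # gradient of log(1+e^-margin)
            for i in range(n_features):
                w[i] += lr * g * (xb[i] - xw[i])
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy descriptors: feature 0 correlates with binding, feature 1 is noise.
pairs = [([1.0, 0.2], [0.1, 0.9]), ([0.8, 0.5], [0.2, 0.4])]
w = train_pairwise(pairs, 2)
```

Training only on within-protein pairs is what lets the model ignore protein-specific offsets and learn the *differences* between binding modes, which is the key point of the ranking formulation.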
|
Mar 2023
|
E02-JEM ARM 300CF
|
Diamond Proposal Number(s): [19064, 28749]
Open Access
Abstract: Characterisation of structure at the nanometre scale is key to bridging the gap between the local atomic environment and the macro-scale, and can be achieved by means of scanning electron nanobeam diffraction (SEND). SEND accommodates a broad range of samples, being relatively tolerant of specimen thickness while using a low electron dose. This, coupled with the capacity to automate data collection over wide areas, allows statistically representative probing of the microstructure. This paper outlines a versatile, data-driven approach for producing domain maps, and a statistical approach for assessing their applicability. The workflow uses a variational autoencoder to identify the sources of variance in the diffraction signal; this, in combination with clustering techniques, is used to produce domain maps. The approach is agnostic to domain crystallinity, requires no prior knowledge of the crystal structure, and does not require simulation of a library of expected diffraction patterns.
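The clustering stage of such a workflow can be sketched with a tiny k-means over hypothetical 2-D latent coordinates produced by the autoencoder (a real pipeline would use a library implementation, more latent dimensions, and careful seeding): scan positions whose diffraction patterns encode to nearby latent points are grouped into the same domain.

```python
def kmeans(points, k, iters=50):
    """Plain k-means on (x, y) tuples; the first k points seed the centroids."""
    cents = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each latent point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: (p[0] - cents[c][0]) ** 2
                                        + (p[1] - cents[c][1]) ** 2)
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                cents[c] = (sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members))
    return labels, cents

# Two well-separated blobs of "latent" points -> two domains in the map.
latents = [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1),
           (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, _ = kmeans(latents, 2)
```

Mapping the cluster label back to each probe position on the scan grid is what yields the domain map.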
|
Jan 2023
|
Open Access
Abstract: We describe how to use several machine-learning techniques, organized in a learning pipeline, to segment and identify cell-membrane structures in cryo-electron tomograms that are difficult to analyze with traditional segmentation tools. The pipeline starts with supervised learning via a special convolutional neural network trained on simulated data. It continues with semi-supervised reinforcement learning and/or a region-merging technique that tries to piece together disconnected components belonging to the same membrane structure. A parametric or non-parametric fitting procedure is then used to enhance the segmentation results and quantify uncertainties in the fitting. Domain knowledge is used both in generating the training data for the neural network and in guiding the fitting procedure through appropriately chosen priors and constraints. We demonstrate that the proposed approach works well for extracting membrane surfaces in two real tomogram datasets.
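The region-merging step — piecing together disconnected components of the same membrane — can be sketched with a union-find structure that merges components whose closest points fall within a distance threshold (an illustrative stand-in for the paper's actual merging criterion):

```python
import math

def merge_components(components, threshold):
    """Union-find merge of 3-D point sets whose closest points are within threshold.

    components: list of lists of (x, y, z) points. Returns a group id per component."""
    n = len(components)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def close(a, b):
        return any(math.dist(p, q) <= threshold for p in a for q in b)

    for i in range(n):
        for j in range(i + 1, n):
            if close(components[i], components[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    return [find(i) for i in range(n)]

# Two nearby fragments of one membrane plus a distant, separate structure.
frags = [[(0, 0, 0), (1, 0, 0)], [(1.5, 0, 0)], [(50, 0, 0)]]
groups = merge_components(frags, threshold=1.0)
```

Real tomogram components contain far too many voxels for the all-pairs distance check above; a production version would compare bounding boxes or use a spatial index first.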
|
Jan 2023
|
I13-2-Diamond Manchester Imaging
|
Diamond Proposal Number(s): [16205]
Open Access
Abstract: Methane (CH4) hydrate dissociation and CH4 release are potential geohazards currently investigated using X-ray computed tomography (XCT). Image segmentation is an important data-processing step in this type of research, but it is often time-consuming, computing-resource-intensive, operator-dependent, and must be tailored to each XCT dataset owing to differences in greyscale contrast. In this paper, U-Nets, a class of convolutional neural network, are investigated for segmenting synchrotron XCT images of CH4-bearing sand during hydrate formation and for extracting porosity and CH4 gas saturation. Three U-Net deployments previously untried for this task are assessed: (1) a bespoke 3D hierarchical method, (2) a 2D multi-label, multi-axis method and (3) RootPainter, a 2D U-Net application with interactive corrections. The U-Nets are trained on small, targeted, hand-annotated datasets to reduce operator time. The segmentation accuracy of all three methods surpasses mainstream watershed and thresholding techniques. Accuracy falls slightly for low-contrast data, which affects volume-fraction measurements, but the errors are small compared with gravimetric methods. Moreover, U-Net models trained on low-contrast images can segment higher-contrast datasets without further training. This demonstrates model portability, which can expedite the segmentation of large datasets over short timespans.
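Once the voxels are segmented, the porosity and gas-saturation values the paper extracts are simple voxel fractions. A sketch with hypothetical integer labels (0 = grain, 1 = brine, 2 = CH4 gas, 3 = hydrate — the real label scheme may differ):

```python
def pore_metrics(labels):
    """Porosity and CH4 gas saturation from a flat list of voxel labels."""
    GRAIN, BRINE, GAS, HYDRATE = 0, 1, 2, 3
    total = len(labels)
    pore = sum(1 for v in labels if v != GRAIN)  # everything that is not grain
    gas = sum(1 for v in labels if v == GAS)
    porosity = pore / total
    gas_saturation = gas / pore if pore else 0.0  # fraction of pore space
    return porosity, gas_saturation

voxels = [0] * 60 + [1] * 20 + [2] * 15 + [3] * 5  # 100 voxels
porosity, sg = pore_metrics(voxels)  # 0.40 porosity, 0.375 gas saturation
```

This is why segmentation accuracy feeds directly into the volume-fraction errors discussed above: every misclassified voxel shifts these ratios.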
|
Dec 2022
|
Open Access
Abstract: Compound availability is a critical property for design prioritization across the drug-discovery pipeline. Historically, and despite their multiple limitations, compound-oriented synthetic accessibility scores have been used as proxies for this problem. However, the size of the catalogues of commercially available molecules has increased dramatically over the last decade, redefining compound accessibility as a matter of budget. In this paper we show that if compound prices are the desired proxy for compound availability, then synthetic accessibility scores are not effective strategies for use in selection. Our approach, CoPriNet, is a retrosynthesis-free deep-learning model trained on 2D graph representations of compounds alongside their prices extracted from the Mcule catalogue. We show that CoPriNet provides price predictions that correlate far better with actual compound prices than any synthetic accessibility score. Moreover, unlike standard retrosynthesis methods, CoPriNet is rapid, with execution times comparable to popular synthetic accessibility metrics, making it suitable for high-throughput experiments, including virtual screening and de novo compound generation. While the Mcule catalogue is a proprietary dataset, the CoPriNet source code, the model trained on the proprietary data, and the fraction of the catalogue (100 K compound/price pairs) used as the test dataset have been made publicly available at https://github.com/oxpig/CoPriNet.
|
Nov 2022
|
Open Access
Abstract: The interaction between light and matter provides a sensitive probe of the electronic structure of materials, on length scales determined by the difference between the incident and outgoing wavevectors of the light. Reflectometry techniques involve scattering off the surface of a material, placing a detector at a point such that any light reaching the detector must scatter through a vector approximately parallel to the material's surface normal. Typically, for various experiment-specific reasons, the raw data recorded by a detector will not be proportional to the quantity of interest: the modulus squared of the scattering matrix element ⟨k′|V̂|k⟩. This is particularly true when the length of the scattering vector |Q| = |k − k′| is small, as is the case in reflectivity experiments. Then, in addition to corrections that must be applied in any scattering experiment, the finite size of the sample will affect the intensity of the reflected beam, and it is often necessary to also correct for manual changes to the beam's attenuation. For these reasons, all reflectivity experiments need at least some form of data reduction, with the exact requirements being experiment-specific and often numerous. islatu provides a simple, performant and rigorously tested library and command-line interface for carrying out these correction steps, which aims to substantially simplify the process of converting instrument data to a reflectivity curve. This curve can then be analysed using one of the many widely available reflectivity fitting tools, such as those of Björck & Andersson (2007), A. R. Nelson & Prescott (2019) and A. Nelson (2006).
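The finite-sample "footprint" correction mentioned above can be illustrated simply, assuming a uniform beam profile (islatu itself models more realistic beam shapes): at grazing incidence the beam's projection on the surface can exceed the sample, so only a fraction of the intensity is reflected, and the measured data must be divided by that fraction.

```python
import math

def footprint_fraction(theta_deg, beam_width_mm, sample_length_mm):
    """Fraction of a uniform beam intercepted by a flat sample at angle theta."""
    projected = sample_length_mm * math.sin(math.radians(theta_deg))
    return min(1.0, projected / beam_width_mm)

def footprint_correct(intensities, angles_deg, beam_width_mm, sample_length_mm):
    """Divide each measured intensity by the illuminated fraction at its angle."""
    return [i / footprint_fraction(t, beam_width_mm, sample_length_mm)
            for i, t in zip(intensities, angles_deg)]

# At 0.1 deg a 50 mm sample intercepts under half of a 0.2 mm beam,
# so the raw intensity there must be scaled up accordingly.
f_low = footprint_fraction(0.1, 0.2, 50.0)
f_high = footprint_fraction(1.0, 0.2, 50.0)  # sample fully illuminated -> 1.0
```

The attenuator-change correction works analogously, rescaling segments of the curve so they join smoothly across manual attenuation steps.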
|
Sep 2022
|
Open Access
Abstract: A general method is presented for inverting the parameter distributions of a polydisperse system using data acquired from a small-angle scattering (SAS) experiment. The forward problem, i.e. calculating the scattering intensity given the distributions of any causal parameters of a theoretical model, is generalized as a multi-linear map, characterized by a high-dimensional Green tensor that represents the complete scattering physics. The inverse problem, i.e. finding the maximum-likelihood estimate of the parameter distributions (in free form) given the scattering intensity (either a curve or an image) acquired from an experiment, is formulated as a constrained nonlinear programming (NLP) problem. This NLP problem is solved with high accuracy and efficiency via several theoretical and computational enhancements, such as automatic data scaling for accuracy preservation and GPU acceleration for large-scale multi-parameter systems. Six numerical examples are presented, including both synthetic tests and solutions to real neutron and X-ray data sets, in which the method is compared with several existing methods in terms of generality, accuracy and computational cost. These examples show that SAS inversion is subject to a high degree of solution non-uniqueness, or structural ambiguity. With ultra-high accuracy, the method can yield a series of near-optimal solutions that fit the data to different acceptable levels.
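The forward/inverse structure described above can be caricatured in a few lines for the single-parameter, discretized case (the paper's Green tensor and NLP formulation are far more general): the forward map becomes a small "Green matrix" G, so I = G·w with a non-negative weight vector w, and w is recovered by projected gradient descent.

```python
def matvec(G, w):
    """Forward problem: intensity = Green matrix applied to the distribution."""
    return [sum(g * x for g, x in zip(row, w)) for row in G]

def invert(G, I, iters=20000, lr=0.05):
    """Non-negative least squares via projected gradient descent:
    minimize 0.5*||G w - I||^2 subject to w >= 0."""
    n = len(G[0])
    w = [0.0] * n
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(G, w), I)]  # residual G w - I
        grad = [sum(G[i][j] * r[i] for i in range(len(G))) for j in range(n)]
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]  # project onto w >= 0
    return w

# Toy 'Green matrix': 3 size bins observed at 4 q-points.
G = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.5],
     [0.2, 0.5, 1.0],
     [0.1, 0.2, 0.5]]
w_true = [0.3, 0.0, 0.7]
I = matvec(G, w_true)   # simulate a noiseless measurement
w_est = invert(G, I)    # recover the bin weights
```

In a noiseless, well-conditioned toy like this the recovery is essentially exact; the structural ambiguity the abstract highlights appears when G has near-degenerate columns, in which case many distinct w fit the data comparably well.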
|
Aug 2022
|