Characterising fitness landscapes of protein evolution by next-generation sequencing

DOI: 10.17863/CAM.58220

Authors: Maya Petek (University of Cambridge)
Co-authored by industrial partner: No

Type: Thesis

State: Published (Approved)
Published: October 2020

Open Access

Abstract: A protein’s amino acid sequence determines its structural, chemical and physical properties, yet how sequence variation influences protein function is still incompletely understood. Protein fitness landscapes powerfully describe the sequence-function relationship by dividing sequence space into functional hills and valleys. This representation is often invoked yet lacks experimental evidence; the vastness of possible sequence space makes comprehensive high-quality datasets difficult to obtain. Laboratory directed evolution has focused on optimal utilisation of substitution libraries; however, examination of functional innovation in Nature shows that short insertions and deletions (InDels) also play a key role. Beyond rare targeted studies of specific InDels, high-throughput data on fitness landscapes for mutations other than substitutions are lacking entirely. In my PhD, I worked towards experimentally describing the fitness landscapes of InDels and substitutions in three systems: GFP, phosphotriesterase (PTE) and the kinase MKK1 docking domain. Towards this goal, I established two experimental assays (GFP, PTE) for deep mutational scanning and a new software toolkit, InDelScanner, for interpreting resulting data that contain InDels. With GFP, I sorted the deletion and substitution libraries into three activity fractions using FACS, then deep sequenced them with Illumina MiSeq to obtain a pilot dataset. Comparison of the effects of deletions of different lengths (-3, -6 and -9 bp) indicates that deletions are partially tolerated in eGFP, with tolerance improved for short deletions and in the stabilised starting point GFP8. Further interpretation of data was complicated by limited resolution in the sequencing dataset stemming from poor FACS separation, so I optimised the conditions for better sorting resolution using the mKate2 fluorescent protein as an expression reporter.
In the second iteration of the activity sorting, I additionally included unique molecular identifiers (UMIs) in the plasmid design to improve the utilisation of NGS capacity. In the case of PTE, I performed proof-of-concept experiments for microfluidic droplet sorting in an integrated device with an in-line incubation line and a fluorescent sorting design. In parallel, testing of solubility and activity of random InDel variants showed that functional InDels do not necessarily suffer from a stability handicap, making InDel mutagenesis a viable strategy for gene randomisation in directed evolution. One challenge of InDel library data analysis is that InDels are not compatible with existing, substitution-focused software. Using the GFP deletions dataset, I developed the InDelScanner scripts, which accurately detect, aggregate and filter insertions, deletions and substitutions. Using the scripts for composition analysis of TRIAD libraries in PTE showed that these libraries are well balanced and highly diverse. Finally, I used the InDelScanner scripts to interpret a deep mutational scanning dataset that recorded the sequence preferences of the MKK1 docking domain in activating ERK2. This experiment showed that the fitness landscape in this kinase pair is shaped by the activating effect of hydrophobic residues in the docking groove, as well as widespread positive epistasis. Together, the projects in this thesis demonstrate that deep mutational scanning experiments are a powerful method for exploring the sequence-function relationship in proteins, one that can extend to comparing different types of mutations and probing their (epistatic) interactions.

Journal Keywords: protein evolution; next generation sequencing; InDelScanner; InDels; insertions and deletions; GFP; phosphotriesterase

Subject Areas: Biology and Bio-materials, Technique Development

Instruments: I04-1-Macromolecular Crystallography (fixed wavelength)