Abstract:
We are investigating ways to emulate evolution in the laboratory in order to create new proteins with desirable properties. This approach circumvents our profound ignorance of how the amino acid sequence encodes protein function and exploits the ability of biological systems to evolve and adapt. Here I will describe recent efforts to accelerate the discovery of novel proteins through a combination of evolutionary and computational design approaches. I will also discuss what we can learn from the resulting protein sequences and the functions they encode.
We have developed computational tools (SCHEMA) to identify elements of sequence and structure that can be swapped among related proteins while minimizing structural disruption. Structure-guided SCHEMA recombination of homologous proteins generates diverse sequences that still have a high probability of retaining the parental fold and function. We have used this approach to construct synthetic families of beta-lactamases and cytochrome P450 heme domains whose members differ from the parents by many dozens of amino acid substitutions on average. Analysis of these laboratory-generated protein families provides new insights into what it takes to make stable, functional enzymes, free from many of the filtering effects of natural selection. Unlike datasets of natural protein sequences, the datasets generated by high-throughput sequencing and functional analysis of the laboratory-generated proteins include sequences with nonnatural properties (e.g. sequences that do not fold or do not function, which are particularly useful for testing folding and function predictions) and thereby explore what is physically possible within a given protein family.
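
To make the idea of "minimizing structural disruption" concrete, the following is a minimal sketch and not the SCHEMA software itself. It assumes that disruption is scored as the number of residue-residue contacts whose amino acid pair in the chimera does not occur at the same two positions in any parent. The aligned parent sequences, block assignment, and contact list are toy, hypothetical data, and the function names (build_chimera, disruption, min_mutations) are illustrative only.

```python
# Toy sketch of scoring a recombined (chimeric) sequence.
# ASSUMPTION: disruption = number of contacts (i, j) whose residue pair in the
# chimera is not found at positions (i, j) in any parent sequence.
# All sequences, blocks, and contacts below are hypothetical examples.

def build_chimera(parents, block_of, choice):
    """Copy each position i from the parent chosen for that position's block."""
    return "".join(parents[choice[block_of[i]]][i] for i in range(len(block_of)))

def disruption(chimera, parents, contacts):
    """Count contacts whose residue pair in the chimera occurs in no parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken

def min_mutations(chimera, parents):
    """Number of substitutions relative to the closest parent."""
    return min(sum(a != b for a, b in zip(chimera, p)) for p in parents)

# Two aligned toy parents of length 6, split into two blocks of three residues.
parents = ["ACDEFG", "AKDQFW"]
block_of = [0, 0, 0, 1, 1, 1]                 # block index of each position
contacts = [(0, 1), (1, 3), (2, 5), (3, 4)]   # hypothetical contacting pairs

# Block 0 taken from parent 0, block 1 taken from parent 1.
chimera = build_chimera(parents, block_of, choice=(0, 1))
E = disruption(chimera, parents, contacts)
m = min_mutations(chimera, parents)
print(chimera, E, m)
```

Under this assumed scoring, contacts that fall within a single block, or that involve positions where the parents are identical, are never counted as broken. Only contacts spanning blocks inherited from different parents can contribute, which is why the placement of block boundaries determines how disruptive a recombination library is.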