Season 5, Episode 1: Building the DNA Oracle with Eeshit Vaishnav
Episode Contributors: Ayush Noori, Ashton Trotman-Grant, Eeshit Vaishnav
Episode Summary: Gene expression, the process by which genes in our genome are used to produce proteins and non-coding RNAs (the building blocks of life), is critical to life and human biology. The ability to predict how strongly a gene is expressed from its regulatory DNA, or promoter sequence, would therefore help us both understand gene expression, regulation, and evolution, and design new, synthetic genes for better cell therapies, gene therapies, and other genomic medicines in bioengineering.
However, the way gene transcription is regulated is incredibly complex; as a result, predicting transcriptional regulation has been an open problem in the field for over half a century. In his work, Eeshit used neural networks to predict gene expression levels from promoter sequences. He then reverse-engineered the model to design specific sequences that elicit desired expression levels. Eeshit’s work developing a sequence-to-expression oracle also provided a framework to model and test theories of gene evolution.
About the Guest
Eeshit earned a double major in CS & Engineering and Biological Sciences & Engineering from the Indian Institute of Technology Kanpur.
During his PhD at MIT in Dr. Aviv Regev’s lab, he published four papers in Nature-family journals, including two cover articles, one of which he led as first and corresponding author. Eeshit’s work has appeared in Cell, Nature Biotechnology, Nature Medicine, Nature Communications, and other journals.
Key Takeaways
Cis-regulatory elements such as promoters interact with transcription factors in the cell to regulate gene expression.
Variation in cis-regulatory elements drives phenotypic variation and influences organismal fitness.
Modeling the relationship between promoter sequences and their function (in this case, the expression levels they induce) is important both for understanding regulatory evolution and for engineering regulatory sequences with specific functions, with applications across therapeutics and cell-based biomanufacturing.
By cloning 50 million sequences into a yellow fluorescent protein (YFP) expression vector in S. cerevisiae and measuring the YFP levels they induced, Eeshit generated a rich dataset to map yeast promoter sequence to expression levels.
Next, Eeshit trained neural network models, including convolutional neural networks and Transformers, to predict expression from sequence with high accuracy.
Eeshit then “reverse-engineered” these models by using them as an oracle within a genetic algorithm that designed sequences to induce desired expression levels.
Finally, Eeshit’s sequence-to-expression oracle allowed for the computational evaluation of regulatory evolution across different evolutionary scenarios, including genetic drift, stabilizing selection, and directional selection.
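The last two takeaways describe using a sequence-to-expression oracle to guide sequence design and to simulate evolution under different regimes. The Python sketch below illustrates that idea under stated assumptions; it is not Eeshit’s code or model. toy_oracle is a hypothetical stand-in for a trained model (a single invented motif filter plus a GC-content term), and the mutation scheme, population size, and selection rules are illustrative choices. Each generation applies one random substitution per sequence, then keeps survivors chosen at random (genetic drift), by highest predicted expression (directional selection), or by closeness to a target level (stabilizing selection). Running the stabilizing regime with a chosen target is one simple way to mimic designing a sequence for a desired expression level.

```python
import random
from typing import List, Optional

ALPHABET = "ACGT"

# HYPOTHETICAL stand-in for a trained sequence-to-expression model.
# It scans the sequence with one fixed "motif" (like a single convolutional
# filter) and adds a GC-content term. The motif and weights are invented.
MOTIF = "TATAAA"


def toy_oracle(seq: str) -> float:
    """Return a made-up 'expression' score for a DNA sequence."""
    k = len(MOTIF)
    # Sliding-window scan: best number of bases matching MOTIF at any offset.
    best_match = max(
        sum(a == b for a, b in zip(seq[i:i + k], MOTIF))
        for i in range(len(seq) - k + 1)
    )
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return best_match + 2.0 * gc_fraction


def mutate(seq: str, rng: random.Random) -> str:
    """Apply one random single-base substitution."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def evolve(seed: str, regime: str, target: Optional[float] = None,
           generations: int = 50, pop_size: int = 100, keep: int = 20,
           rng_seed: int = 0) -> List[str]:
    """Evolve copies of `seed` under 'drift', 'directional' or 'stabilizing'.

    Returns the final population sorted by toy_oracle score, highest first.
    """
    if regime == "stabilizing" and target is None:
        raise ValueError("stabilizing selection needs a target level")
    rng = random.Random(rng_seed)
    population = [seed] * pop_size
    for _ in range(generations):
        offspring = [mutate(s, rng) for s in population]
        if regime == "drift":
            # No selection: survivors are a random subset.
            survivors = rng.sample(offspring, keep)
        elif regime == "directional":
            # Keep the sequences with the highest predicted expression.
            survivors = sorted(offspring, key=toy_oracle, reverse=True)[:keep]
        elif regime == "stabilizing":
            # Keep the sequences whose prediction is closest to the target.
            survivors = sorted(
                offspring, key=lambda s: abs(toy_oracle(s) - target)
            )[:keep]
        else:
            raise ValueError(f"unknown regime: {regime}")
        # Next generation: resample survivors with replacement.
        population = [rng.choice(survivors) for _ in range(pop_size)]
    return sorted(population, key=toy_oracle, reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    seed = "".join(rng.choice(ALPHABET) for _ in range(80))
    print("seed", round(toy_oracle(seed), 2))
    for regime in ("drift", "directional", "stabilizing"):
        top = evolve(seed, regime, target=4.0)[0]
        print(regime, round(toy_oracle(top), 2))
```

With a real trained model, toy_oracle would in principle be replaced by the model’s prediction for a sequence, while the mutation and selection loop could stay the same; that separation between the oracle and the search procedure is the main design point of the sketch.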
Impact
Eeshit’s work developing a sequence-to-expression oracle provided a framework to model and test theories of gene evolution.
This framework can help us both understand gene expression, regulation, and evolution, and design new, synthetic genes for better cell therapies, gene therapies, and other genomic medicines in bioengineering.
Paper: The evolution, evolvability and engineering of gene regulatory DNA