## Stephen Yau

Tsinghua University and Beijing Institute of Mathematical Sciences and Applications

4:30 PM

127 Hayes-Healy

A reception will precede the event at 4:00 PM in 101A Crowley Hall.

### The geometry of genome space and its applications

Following Hilbert, who proposed twenty-three problems in mathematics in 1900, DARPA proposed twenty-three problems in pure and applied mathematics in 2008. These problems are expected to be very influential for the development of mathematics in the 21st century. DARPA problem number 15 asks us to understand "The Geometry of Genome Space". A genome space consists of all known genomes of living beings and provides insights into their relationships, reflecting the essential nature of the genomic universe. Mathematically, the genome space can be regarded as a moduli space. In this talk, we shall show that genome sequences can be canonically embedded in a high-dimensional Euclidean space by means of their natural vectors, which describe the distribution of nucleotides within the genome sequence. In this way, we construct genome space as a subspace of a high-dimensional Euclidean space. In this space, each genome sequence is uniquely represented as a point, and the distribution of sequences in the genome space is determined. The similarity of sequences can be measured by the natural metric, which is different from the metric induced by the ambient Euclidean space. As in our physical world, dark matter / dark energy plays a crucial role in the construction of the correct natural metric in genome space. Here, we report the construction of genome spaces of viruses, bacteria, and plants with natural metrics. These metrics are quite different in each genome space because different dark matter / dark energy may bend space-time, as predicted by Einstein's theory.
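The abstract does not define the natural vector, so the following is only a minimal sketch under an assumption: it uses a commonly cited 12-dimensional form in which each nucleotide contributes its count, its mean position, and a normalized second central moment of its positions. The function name `natural_vector` and the normalization by `n_k * n` are illustrative choices, not taken from the talk.

```python
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def natural_vector(seq: str) -> List[float]:
    """Return an assumed 12-dimensional natural vector of a DNA sequence.

    Layout: [n_A, n_C, n_G, n_T, mu_A, ..., mu_T, D2_A, ..., D2_T], where
    n_k is the count of nucleotide k, mu_k its mean 1-based position, and
    D2_k = sum((p - mu_k)^2) / (n_k * n) with n the sequence length.
    Symbols other than A, C, G, T are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    n = len(seq)

    # Collect the 1-based positions at which each nucleotide occurs.
    positions: Dict[str, List[int]] = {k: [] for k in NUCLEOTIDES}
    for i, base in enumerate(seq, start=1):
        if base in positions:
            positions[base].append(i)

    counts: List[float] = []
    means: List[float] = []
    moments: List[float] = []
    for k in NUCLEOTIDES:
        pos = positions[k]
        n_k = len(pos)
        if n_k == 0:
            # Absent nucleotide: all three components are set to zero.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(pos) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)

    return counts + means + moments


if __name__ == "__main__":
    print(natural_vector("ACGTTGCA"))
```

Under this assumed definition, every sequence maps to one point in 12-dimensional Euclidean space. The sketch says nothing about the natural metric discussed above, which the abstract distinguishes from the ordinary Euclidean distance.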

DARPA problem number 23 asks: What are the Fundamental Laws of Biology? Our convex hull principle for molecular biology states that the convex hull formed from the natural vectors of one biological group does not intersect the convex hull formed from those of any other biological group. This can be viewed as one of the Fundamental Laws of Biology that DARPA has been seeking since 2008. As applications, we provide the first mathematical method to find undiscovered genome sequences. Our theory allows us to explore where SARS-CoV-2 originated. It provides a novel geometric perspective for studying molecular biology. It also gives an accurate method for large-scale sequence comparison in real time.
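One way to make the disjointness statement concrete: two finite point sets have disjoint convex hulls whenever some hyperplane strictly separates them. The sketch below searches for such a hyperplane with a plain perceptron. This is a stand-in chosen to keep the example dependency-free, not the method used in the talk, and the name `separating_hyperplane` and the `max_epochs` cutoff are hypothetical. A returned hyperplane certifies that the hulls are disjoint; a `None` result is inconclusive, since the hulls may intersect or the search may simply have stopped early.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def separating_hyperplane(
    group_a: Sequence[Point],
    group_b: Sequence[Point],
    max_epochs: int = 1000,
) -> Optional[Tuple[List[float], float]]:
    """Search for (w, b) with w.x + b > 0 on group_a and < 0 on group_b.

    A strict separator certifies that the two convex hulls are disjoint.
    Returns None if no separator is found within max_epochs passes.
    """
    dim = len(group_a[0])
    w = [0.0] * dim
    b = 0.0
    labelled = [(x, 1.0) for x in group_a] + [(x, -1.0) for x in group_b]

    for _ in range(max_epochs):
        mistakes = 0
        for x, y in labelled:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Perceptron update: move the hyperplane toward the point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


if __name__ == "__main__":
    # Toy 2-D points standing in for natural vectors of two groups.
    group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    group_b = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
    print(separating_hyperplane(group_a, group_b) is not None)
```

For a decisive answer in both directions, a linear-programming feasibility test would be the more standard choice; the perceptron is used here only because it needs no external library.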

*Stephen Yau, professor at Tsinghua University and distinguished professor emeritus at the University of Illinois at Chicago, earned his doctorate from the State University of New York at Stony Brook. He was a member of Princeton University’s Institute for Advanced Study and an assistant professor at Harvard University. He founded the Journal of Algebraic Geometry in 1991 and received the ICCM Chern Prize in 2019. He is a Fellow of the American Mathematical Society and of the Institute of Electrical and Electronics Engineers, and he has received Guggenheim and Sloan Research Fellowships.*
