Research

I am broadly interested in the relationship between biological sequence, structure, and function. Specifically, I study how the primary sequence of an antibody determines its structure and function, and how changes in genetic information affect viral phenotype. I approach these questions by developing deep learning models and conducting high-throughput experiments.

Background

Viruses pose a significant risk to global health, as exemplified by the H1N1 influenza pandemic and the COVID-19 pandemic. Vaccination remains the primary preventive measure against viral infections. However, the rapid evolution of viruses and their ability to evade immune responses often compromise vaccine effectiveness. For example, during the 2023-24 flu season, vaccine effectiveness (VE) against influenza A was only 27% to 46%, and when vaccines did not match the circulating strains, effectiveness dropped to as low as 10% to 20%. Human antibodies play a critical role in fighting viral infections, but they also drive the emergence of new viral variants, such as the SARS-CoV-2 KP.3 variant, which can evade pre-existing antibodies by reducing their binding affinity or neutralization potency. Similarly, the appearance of the SARS-CoV-2 Omicron variant in 2021 significantly reduced the efficacy of many antibody-based therapies. Nevertheless, the immune system can adapt through natural infection or vaccination, restoring antibody binding affinity via additional rounds of somatic hypermutation and affinity maturation within germinal centers. This process creates a complex coevolutionary dynamic between viruses and the immune system. Deciphering this dynamic is crucial for developing innovative strategies to monitor, predict, and counteract viral evolution, ultimately enhancing vaccine design and therapeutic effectiveness.

Antibody Response and Design

The immune system, primarily through the action of B cells and the antibodies they produce, exerts selective pressure on viruses, driving their evolution. With technological advances in single-cell high-throughput screening and paired B cell receptor sequencing, it is now possible to sequence the B cell repertoire at an unprecedented scale. However, predicting antibody function and structure from these sequences alone remains a formidable task. Functional characterization of specific B cells often requires downstream experimental techniques, such as ELISA, neutralization assays, X-ray crystallography, and cryogenic electron microscopy (cryo-EM), which are labor-intensive, time-consuming, and inherently low-throughput.

  • Assembled a dataset of ~8,000 antibodies to SARS-CoV-2 and ~5,000 antibodies to influenza virus.
  • Antibodies have distinct convergent sequence and molecular features.
  • Developed a novel language model for analyzing memory B cell receptor (BCR) repertoires, with a focus on accurately predicting B cell specificity.
  • Identified antibody binding motifs, accelerating influenza antibody discovery.

The Evolution of Influenza Virus

Influenza A viruses continuously accumulate mutations in their major surface proteins, hemagglutinin (HA) and neuraminidase (NA). For instance, since the influenza H3N2 virus entered the human population in 1968, its HA and NA proteins have accumulated over 83 and 73 amino acid mutations, respectively. These mutations often drive immune escape, requiring frequent updates to the influenza vaccine. A key challenge in studying influenza virus evolution is forecasting how these mutations affect the virus's function and antigenicity.

  • High-throughput combinatorial mutational scanning of antigenic regions.
  • Development of a high-throughput mutational fitness screening pipeline to quantify the effects of single amino acid mutations on viral proteins.
  • Quantification of epistasis and biophysical constraints to build a model of influenza evolution.

Deep Learning for Antibody-Virus Coevolution

While our previous model has demonstrated its power, it focused on individual components rather than capturing the full complexity of protein-protein interactions or evolutionary dynamics. Building on my expertise, my future research will focus on developing a multimodal language model that simultaneously incorporates viral and antibody sequences, protein structures, and protein-protein interaction data to comprehensively study the coevolution between viruses and the immune system. The successful execution of this research will advance our understanding of virus-antibody coevolution and provide essential tools for improving vaccine efficacy and therapeutic strategies against emerging viral threats. In the long term, this approach will contribute to unraveling the intricate relationships between biological sequences, structures, and functions, from the "Omics" level down to the atomic scale.

  • Building a comprehensive database and developing a novel antibody-virus language model.
  • Developing a multimodal generative model for virus-antibody dynamics.
  • Establishing a continual learning platform for model prediction and experimental feedback.