Predicting the mutations of future SARS-CoV-2 variants of concern


Abstract: During the pandemic, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread and evolved, and many variants have emerged under immune pressure from vaccination and natural infection. The spike protein, especially its receptor-binding domain (RBD) that binds the human ACE2 receptor, is the primary target of mutation because of its role in infection and antibody neutralization. Dominant variants of concern (VOC), such as B.1.351 (Beta), B.1.617.2 (Delta), B.1.1.529 (Omicron), and BA.2 (an Omicron sublineage), carry multiple mutations in this region. Here we develop a novel in silico approach that combines antibody/ACE2 structure modeling with a large transformer-based protein language model applied to RBD sequences. The approach models the fitness landscape of the spike RBD by accurately predicting the effects of mutations on ACE2 binding and antibody escape. We validate the model on existing RBD sequences from the GISAID database. Under immune selection pressure, the model scores correlate significantly with the sampling times of known variants (Spearman r = 0.55). Combined with a genetic algorithm, the model can forecast novel mutations that may cause future concern. The model predicted the RBD mutations of the Alpha and Omicron variants before those variants appeared and spread. We further apply the model to forecast variants that may emerge and spread in the future, and we characterize how these variants bind antibodies in vitro. This approach could be used for predictive profiling of rapidly evolving viral diseases and of other potential outbreaks, such as antibiotic resistance.
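The abstract pairs a fitness model with a genetic algorithm to search for candidate mutations. As an illustration only, here is a minimal sketch of a generic genetic algorithm over a protein sequence, assuming a scoring function that maps a sequence to a fitness value. The names `toy_fitness`, `WILD_TYPE`, and `TARGET` are hypothetical placeholders: they are not the speaker's structure-plus-language-model score and not the real RBD sequence. The operators (random substitution, single-point crossover, elitist selection) and all parameter values are likewise illustrative choices, not the method described in the talk.

```python
import random
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy inputs: NOT the spike RBD sequence and NOT the speaker's model.
WILD_TYPE = "MKTAYIAKQRQISFVKSHFS"
TARGET = "MKTWYIAKQREISFVKSYFS"


def toy_fitness(seq: str) -> float:
    """Hypothetical stand-in score: number of positions matching TARGET."""
    return float(sum(a == b for a, b in zip(seq, TARGET)))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue by a different random amino acid with probability `rate`."""
    chars = list(seq)
    for i, aa in enumerate(chars):
        if rng.random() < rate:
            chars[i] = rng.choice([a for a in AMINO_ACIDS if a != aa])
    return "".join(chars)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(seed: str, fitness: Callable[[str], float], generations: int = 50,
           pop_size: int = 40, elite: int = 10, rate: float = 0.05,
           rng_seed: int = 0) -> List[str]:
    """Elitist GA: keep the top `elite` sequences, refill with mutated crossovers."""
    rng = random.Random(rng_seed)
    # Include the unmodified seed so the best score can never fall below it.
    population = [seed] + [mutate(seed, rate, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    # Return the final population ranked from highest to lowest fitness.
    return sorted(population, key=fitness, reverse=True)


if __name__ == "__main__":
    ranked = evolve(WILD_TYPE, toy_fitness)
    print(ranked[0], toy_fitness(ranked[0]))
```

In a real setting, `toy_fitness` would be replaced by a model that scores ACE2 binding and antibody escape. Because the elite sequences are carried over unchanged each generation, the top score never decreases, which keeps the search anchored to the best candidates found so far.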

Bio: Wenkai received Bachelor's degrees in both Bioscience and Computer Science from the University of Science and Technology of China, Hefei, China, where he studied from 2014 to 2018. He then received his Master's degree in Computer Science from King Abdullah University of Science and Technology (KAUST), Saudi Arabia. He is currently a Ph.D. candidate in the Department of Computer, Electrical and Mathematical Sciences & Engineering at KAUST. His current research includes protein informatics, transcriptomic data analysis, and microbiology.