Preliminary structural proteome of the monkeypox virus causing a multi-country outbreak in May 2022
Lena Parigger, Innophore GmbH
Andreas Krassnigg, Innophore GmbH
Stefan Grabuschnig, Innophore GmbH
Verena Resch, Innophore GmbH
Karl Gruber, Innophore GmbH
Georg Steinkellner, Innophore GmbH
Christian C. Gruber (christian.gruber@innophore.com), Innophore GmbH

Research Article
Keywords: Monkeypox, epidemic, proteome, genome, homology modeling, structure prediction, structural genomics
Posted Date: May 26th, 2022
DOI: https://doi.org/10.21203/rs.3.rs-1693803/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
The monkeypox virus (MPX) belongs to the Orthopoxvirus genus of the Poxviridae family, is endemic in parts of Africa, and causes a disease in humans similar to smallpox. The most recent outbreak of MPX in 2022 is already affecting 19 countries on different continents and has consequently become a focus of interest. In particular, a molecular understanding of the virus is essential to study infection processes and pathogen-host interactions, predict tropism changes, or guide drug development and discovery as well as vaccine development or adaptation at a very early stage. Herein we present a study of the structural genome of the currently emerging MPX virus: our analysis revealed 10,043 characteristic candidate open reading frames (ORFs), and subsequent BLAST searches of the non-redundant protein database and the PDB reduced the number of suspected ORFs to 925 and 123 protein sequences, respectively. Finally, we provide the 3D structures of these 123 protein sequences, which were predicted by homology modeling and are available for download.

Introduction
The monkeypox (MPX) virus was initially isolated from orangutans in an Indonesian zoo by Rijk Gispen [1,2] in 1949. Its description as a member of the pox family of viruses occurred almost ten years later in 1958 and was published by Magnus et al. in 1959 [2,3]. The approximately 197 kbp long genome of the MPX virus shows over 96% identity to the smallpox variola (VAR) virus. Although the two viruses are closely related, the MPX virus does not seem to be the ancestor of the VAR virus, nor vice versa [4]. In a previous analysis of the genome from 2002, 190 open reading frames (ORFs) were identified. These largely correspond to the essential gene set of the Orthopoxvirus genus but differ, e.g., in their complement of immunomodulatory and host-range genes [5,6]. Although the virus was discovered in primates, its original reservoir is considered to be rodents such as different species of squirrels or striped grass mice [7–9].
Transmission of the virus to human hosts was first described in 1970 in the Democratic Republic of the Congo [4,9,10]. In humans, it causes a zoonotic disease [11] with clinical manifestations similar to smallpox [9]. Since the first transmission, repeated outbreaks in human populations have mainly been reported from within the Congo basin. However, the disease has also spread outside the African continent, and a general rise in cases is thought to relate to the suspension of the smallpox vaccination programme [12,13]. Since natural and potentially altered variants [14] of MPX are reportedly a subject of study in biological-warfare programmes [15,16], countries such as the United States have more recently been stockpiling smallpox vaccines as a precaution against bioterrorism and biological warfare [17,18], as well as an antiviral treatment that is said to be effective against various strains of previously identified smallpox [19]. In light of the ongoing COVID-19 pandemic and its consequences, the recent outbreak of MPX already involves 19 countries (by May 2022) on different continents and thus has become an additional focus of attention. By definition [20], an outbreak is a sudden increase in the incidence of a disease, when the
number of cases exceeds the normal expectation for the location or season. Therefore, a prompt response to emerging diseases is essential so that therapeutic and non-therapeutic countermeasures can be taken early, even before a PHEIC, epidemic, or pandemic status is declared. Here we present the first full structural genome of the currently spreading MPX to support the responses of the scientific community and the pharmaceutical industry to this putative risk.

Results and Discussion
Scanning the first genome draft of the emerging MPX virus [21] for all possible ORFs with a minimum length of ten amino acids, covering the forward and reverse strands in all three reading frames each, yielded 10,043 distinct ORFs. Subsequent BLAST searches of the non-redundant protein database [22] and the PDB decreased the number of putative ORFs to 925 and 123 protein sequences, respectively. Structures of those protein sequences that aligned to proteins in the PDB were predicted by homology modeling (Table 1, Fig. 1). A comprehensive dataset on the process parameters and properties of all potential ORFs within the genome draft of MPX [21] as well as the protein models generated within this study are available at https://doi.org/10.6084/m9.figshare.19877842.v1.

Table 1 | Modeling parameters of the putative structural proteome. Structures of 123 putative ORFs matching proteins in the PDB were predicted by homology modeling using the Catalophore™ DrugSolver Platform employing Yasara [23]. The table summarizes the position of each ORF within the genome sequence, its putative function based on a BLASTP search [24,25] of the non-redundant database, and modeling parameters such as sequence identity and similarity. QX: number of ‘X’ in the query sequence, resulting from unresolved sections in the genome sequence; S: strand on which the putative ORF is located [forward (+) or reverse (-)]; RF: reading frame; HM: homology modeling.
Color coding: 0 (red) to 100% (green).

Figure 1 | Genomic map of the putative structural proteome. Potential ORFs resulting in matches to proteins in the non-redundant protein database are depicted along the genome sequence draft of MPX [21]. Putative protein sequences in the forward and the reverse strands in three reading frames each are depicted above and below the genome, respectively, and are labeled by their query ID. Protein structures were modeled from the orange colored ORFs. Yellow colored sections in the genome refer to low quality regions, indicated with ‘N’ in the genome sequence. The figure was created using the Python package Matplotlib [26] and Blender 3.1.2, available at www.blender.org.

The early-stage structural models presented in this work are intended to serve as a promptly available initial collection of putative proteins of the currently spreading MPX, a body of information that could support timely drug discovery, mutational analyses, and vaccine development. Most probably, the list of models (contained in Table 1) does not represent the complete structural proteome, since 190 ORFs have previously been described in the MPX genome [5]. Notably, the list of 925 distinct ORFs
which showed sequence similarity to entries in the non-redundant database will certainly include additional protein sequences within the MPX proteome, which we expect to comprise close to 190 ORFs. The remaining potential ORFs of this set may contain fragments of protein sequences involved in the evolutionary origin of this virus. In addition, the set may include as-yet-unidentified physiological proteins of MPX. Eventually, a further (dynamic) refinement of the initial putative structural proteome presented here should be considered, especially for drug targets such as F13L [27,28], which corresponds to query sequence 9984.

Conclusion
The molecular understanding of an emerging disease such as MPX or COVID-19 is essential to study infection processes and pathogen-host interactions, predict tropism changes, or guide drug repurposing and discovery as well as vaccine development or adaptation at a very early stage. While traditional structural-biology methods such as NMR, cryo-EM, or X-ray crystallography have advanced significantly in recent decades, even in the rapid international response to the SARS-CoV-2 outbreak in Wuhan, China, in January 2020 it took a while until the first experimentally determined (complex) structures of the SARS-CoV-2 main protease (Mpro) were published by Yang et al. [29] and Hilgenfeld et al. [30]. To overcome the limitations of time- and personnel-intensive experimental methods, structural bioinformatics provides early insight into the genome from a 3D molecular perspective by predicting genome-wide protein structures for a complete pathogen if the sequence is available. For example, we published the first structural model of SARS-CoV-2 Mpro on 23 January 2020 [31], one week after the 2019-nCoV draft genome was published. This model was in good agreement with the crystal structure (PDB 6LU7), released afterwards, with a root-mean-square deviation of 0.6 Ångström for 282 out of 306 superimposed Cα atoms.
Interestingly, the binding site identified in this model ultimately proved to be the drug target site of the majority of today's approved or investigated SARS-CoV-2 direct-acting antivirals (DAAs) such as Paxlovid [32,33]. On a larger scale, early predictive structural genomes enable large-scale virtual screening for drug repurposing or new-drug development [34,35], a deeper understanding of viral evolution and its structural implications, and prediction of the significance and impact of emerging viral variants [36,37]. This allows vaccine developers to monitor and adapt their candidates to emerging variants if required [38]. Therefore, structural-bioinformatics pipelines [39] in combination with transparently shared open scientific data have proven to be an essential early-response tool for outbreaks [40].

Methodology
Identification of potential ORFs in the MPX virus genome sequence
When a novel virus variant emerges, its genome sequence provides critical information for understanding its molecular characteristics. As bioinformatic molecular
interaction studies rely on protein models, we developed a pipeline embedded in the Catalophore™ DrugSolver platform to quickly process genome sequences and provide a set of potential proteins, a putative structural proteome. The first step consists of translating both the forward and the reverse strand in three reading frames each, in order to account for every possible translation frame. As a result, cases with overlapping genes, shifted translation starts, and AUG-codon-independent translation initiation [41] are also included in the set of initially considered ORFs. Subsequently, the translated sequences are split at stop codons (TAA, TAG, TGA), and only ORFs consisting of at least ten amino acids are accepted. The Biopython packages [42] Bio.Entrez, Bio.SeqIO and Bio.Blast are employed for this purpose. In this study, a list of 10,043 distinct potential ORFs was identified within the recently published genome-sequence draft of the MPX virus [21], which caused a multi-country outbreak in May 2022. This set was narrowed down to protein sequences with an increased likelihood of being part of the physiological MPX virus proteome by performing a command-line BLASTP [24,25] search with default options against the non-redundant protein database [22], resulting in 925 matches.

Modeling of the putative proteome
In addition to the search of the non-redundant protein database, a BLASTP search against the PDB with a maximal E value of 0.04 was executed, identifying 123 of the 10,043 ORFs as matches to proteins whose structure has already been solved. These 123 ORFs were subjected to homology modeling using the Catalophore™ DrugSolver Platform employing Yasara [23]. With six PSI-BLAST iterations, a maximal E value of 0.5 and five templates to consider, 115 of the 123 putative proteins were successfully modeled.
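As a rough illustration of the ORF-identification step described under "Identification of potential ORFs in the MPX virus genome sequence", the following minimal Python sketch translates both strands in three reading frames each, splits the translations at stop codons, and keeps stretches of at least ten amino acids. It is not the pipeline used in this study: it uses only the Python standard library instead of Biopython, it assumes the standard genetic code, and the function names (reverse_complement, translate, six_frame_orfs) as well as the toy sequence are hypothetical. Codons containing unresolved bases are rendered as 'X', mirroring the 'X' positions counted as QX in Table 1.

```python
from itertools import product

# Standard genetic code (assumed here), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unresolved codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(genome, min_length=10):
    """Yield (strand, reading_frame, peptide) for every stop-free stretch
    of at least min_length residues. No start codon is required, so
    AUG-independent initiation is not excluded."""
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) >= min_length:
                    yield strand, frame + 1, peptide


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the MPX genome draft.
    toy = "ATG" + "GCT" * 10 + "TAA" + "CC"
    for strand, frame, peptide in six_frame_orfs(toy):
        print(strand, frame, peptide)
```

Because every stop-free stretch is reported regardless of its first codon, such a scan deliberately over-generates candidates; in the workflow described here, the subsequent BLASTP searches are what narrow the set down.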
For the remaining eight (query sequences 990, 2104, 2322, 5796, 5904, 6017, 7260, and 9571), no suitable template was found within the homology modeling process; in these cases, the protein structure that matched the respective query sequence in the BLASTP search was used as a template. Modeling was performed with a maximum of five alignment variations per template and 50 conformations tried per loop, excluding terminal loop residues.

Data availability
Publicly available datasets were analyzed in this study. These data can be found at https://virological.org/uploads/short-url/39ehomlUxjRilxaJDMispIdu3JO.zip [21]. The dataset containing the process parameters and properties of all potential ORFs within the genome draft of MPX as well as the final protein structure models generated within this study are available at https://doi.org/10.6084/m9.figshare.19877842.v1.

Declarations
Acknowledgments
Financial support was provided by Innophore GmbH and by the Austrian Research Promotion Agency General Programme funding scheme project nr. 41404876 “VirtualCure - Rapid Development of an Automated & Expandable In-silico High-Throughput Drug Repurposing Screening Pipeline”. The
computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC) and HPC resources provided by Innophore. Technical and infrastructure support was provided by the Amazon Web Services Diagnostic Development Initiative (DDI). Some computational results presented in this manuscript have been produced in cloud computing facilities provided by Amazon Web Services within DDI, project nr. “CC ADV 00502188 2021 TR” entitled “virus.watch/monkeypox”. Catalophore is a registered trademark (AT 295631) of Innophore GmbH. Calculations were carried out using in-house software (e.g. pathogen-seqscan), as described in the methods section, embedded in the Catalophore™ DrugSolver platform with a non-commercial open-science license granted by Innophore GmbH.

Author contributions
L.P. and S.G. performed the genome analysis, prepared structural models and drafted the manuscript with input from all authors. A.K. contributed to analysis of data, revised the manuscript, and advised on and contributed to the genome analysis. V.R. supported the visualization of the obtained structural models and created the structural-genome landscape representation. K.G. gave structural advice and structural-biology input for data analysis and gave modeling-pipeline advice. C.C.G. and G.S. contributed to evaluating, preparing and interpreting the data and designed, managed and supervised the project. All authors edited the manuscript to its final form.

Declaration of interests
L.P., S.G., A.K. and V.R. report working for Innophore. K.G., G.S. and C.C.G. report being shareholders of Innophore, an enzyme and drug discovery company. Additionally, G.S. and C.C.G. report being managing directors of Innophore. The research described here is scientifically and financially independent of the efforts in any of the above-mentioned companies and is open science.

Competing financial interests
The authors declare no competing interests.

References
1. Gispen, R. Smallpox reinfections in Indonesia.
Ned T Geneesk 93, 3686–3695 (1949).
2. Arita, I. & Henderson, D. A. Smallpox and monkeypox in non-human primates. Bull. World Health Organ. 39, 277–283 (1968).
3. Magnus, P. von, Andersen, E. K., Petersen, K. B. & Birch-Andersen, A. A pox-like disease in cynomolgus monkeys. Acta Pathol. Microbiol. Scand. 46, 156–176 (1959).
4. Shchelkunov, S. N. et al. Human monkeypox and smallpox viruses: genomic comparison. FEBS Lett. 509, 66–70 (2001).
5. Shchelkunov, S. N. et al. Analysis of the Monkeypox Virus Genome. Virology 297, 172–194 (2002).
6. Weaver, J. R. & Isaacs, S. N. Monkeypox virus and insights into its immunomodulatory proteins. Immunol. Rev. 225, 96–113 (2008).
7. Ježek, Z. & Fenner, F. Human monkeypox. (Karger, 1988).
8. Khodakevich, L., Jezek, Z. & Messinger, D. Monkeypox virus: ecology and public health significance. Bull. World Health Organ. 66, 747–752 (1988).
9. Damon, I. K. Status of human monkeypox: clinical disease, epidemiology and research. Vaccine 29 Suppl 4, D54–D59 (2011).
10. Marennikova, S. S., Seluhina, E. M., Mal’ceva, N. N., Cimiskjan, K. L. & Macevic, G. R. Isolation and properties of the causal agent of a new variola-like disease (monkeypox) in man. Bull. World Health Organ. 46, 599–611 (1972).
11. Di Giulio, D. B. & Eckburg, P. B. Human monkeypox: an emerging zoonosis. Lancet Infect. Dis. 4, 15–25 (2004).
12. Bunge, E. M. et al. The changing epidemiology of human monkeypox—A potential threat? A systematic review. PLoS Negl. Trop. Dis. 16, e0010141 (2022).
13. Rimoin, A. W. et al. Major increase in human monkeypox incidence 30 years after smallpox vaccination campaigns cease in the Democratic Republic of Congo. Proc. Natl. Acad. Sci. 107, 16262–16267 (2010).
14. Gilsdorf, J. R. & Zilinskas, R. A. New Considerations in Infectious Disease Outbreaks: The Threat of Genetically Modified Microbes. Clin. Infect. Dis. 40, 1160–1165 (2005).
15. Kuhn, J. H. & Leitenberg, M. The Soviet Biological Warfare Program. in Biological Threats in the 21st Century 79–102 (Imperial College Press, 2016). doi:10.1142/9781783269488_0005.
16. Tucker, J. B. Biological weapons in the former Soviet Union: An interview with Dr. Kenneth Alibek. Nonproliferation Rev. 6, 1–10 (1999).
17. Henderson, D. A. et al. Smallpox as a Biological Weapon: Medical and Public Health Management. JAMA 281, 2127 (1999).
18. Cieslak, T. J. et al. Beyond the Dirty Dozen: A Proposed Methodology for Assessing Future Bioweapon Threats. Mil. Med. 183, e59–e65 (2018).
19. Kozlov, M. Monkeypox goes global: why scientists are on alert. Nature d41586-022-01421-8 (2022) doi:10.1038/d41586-022-01421-8.
20. Green, M. S. et al. When is an epidemic an epidemic?
Isr. Med. Assoc. J. IMAJ 4, 3–6 (2002).
21. First draft genome sequence of Monkeypox virus associated with the suspected multi-country outbreak, May 2022 (confirmed case in Portugal) - Monkeypox. Virological https://virological.org/t/first-draft-genome-sequence-of-monkeypox-virus-associated-with-the-suspected-multi-country-outbreak-may-2022-confirmed-case-in-portugal/799 (2022).
22. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
23. Krieger, E. & Vriend, G. New ways to boost molecular dynamics simulations. J. Comput. Chem. 36, 996–1007 (2015).
24. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
25. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
26. Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 9, 90–95 (2007).
27. Grosenbach, D. W. et al. Oral Tecovirimat for the Treatment of Smallpox. N. Engl. J. Med. 379, 44–53 (2018).
28. Duraffour, S. et al. ST-246 is a key antiviral to inhibit the viral F13L phospholipase, one of the essential proteins for orthopoxvirus wrapping. J. Antimicrob. Chemother. 70, 1367–1380 (2015).
29. Jin, Z. et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293 (2020).
30. Zhang, L. et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science 368, 409–412 (2020).
31. Gruber, C. C. & Steinkellner, G. Wuhan coronavirus 2019-nCoV—what we can find out on a structural bioinformatics level. Innophore GmbH, Austria (2020) doi:10.6084/M9.FIGSHARE.11752749.V3.
32. Owen, D. R. et al. An oral SARS-CoV-2 Mpro inhibitor clinical candidate for the treatment of COVID-19. Science 374, 1586–1593 (2021).
33. Richardson, P. et al. Baricitinib as potential treatment for 2019-nCoV acute respiratory disease. Lancet Lond. Engl. 395, e30 (2020).
34. Gorgulla, C. et al. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening. iScience 24, 102021 (2021).
35. Kodchakorn, K., Poovorawan, Y., Suwannakarn, K. & Kongtawelert, P. Molecular modelling investigation for drugs and nutraceuticals against protease of SARS-CoV-2. J. Mol. Graph. Model. 101, 107717 (2020).
36. Singh, A., Steinkellner, G., Köchl, K., Gruber, K. & Gruber, C. C. Serine 477 plays a crucial role in the interaction of the SARS-CoV-2 spike protein with the human receptor ACE2. Sci. Rep. 11, 4320 (2021).
37. Durmaz, V. et al.
Structural-bioinformatics analysis of SARS-CoV-2 variants reveals higher hACE2 receptor binding affinity for Omicron B.1.1.529 spike RBD compared to wild-type reference. (2021) doi:10.21203/rs.3.rs-1153124/v1.
38. Schrörs, B. et al. Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the need for continuous screening of virus isolates. PLOS ONE 16, e0249254 (2021).
39. Hodgson, J. The pandemic pipeline. Nat. Biotechnol. 38, 523–532 (2020).
40. Open for outbreaks. Nat. Biotechnol. 38, 377 (2020).
41. Ho, J. S. Y., Zhu, Z. & Marazzi, I. Unconventional viral gene expression mechanisms as therapeutic targets. Nature 593, 362–371 (2021).
42. Biopython Tutorial and Cookbook. http://biopython.org/DIST/docs/tutorial/Tutorial.html.

Table 1
Table 1 is available in the Supplementary Files section.

Figures
Figure 1
Genomic map of the putative structural proteome. Potential ORFs resulting in matches to proteins in the non-redundant protein database are depicted along the genome sequence draft of MPX [21]. Putative protein sequences in the forward and the reverse strands in three reading frames each are depicted above and below the genome, respectively, and are labeled by their query ID. Protein structures were modeled from the orange colored ORFs. Yellow colored sections in the genome refer to low quality regions, indicated with ‘N’ in the genome sequence. The figure was created using the Python package Matplotlib and Blender 3.1.2, available at www.blender.org.

Supplementary Files
This is a list of supplementary files associated with this preprint.
MonkeyPoxHomologymodelsPDBs.zip
MonkeyPoxPT00012022putativeproteomescan.csv
Table1.docx