Preliminary structural proteome of the monkeypox virus causing a multi-country outbreak in May 2022

Lena Parigger
 Innophore GmbH
Andreas Krassnigg
 Innophore GmbH
Stefan Grabuschnig
 Innophore GmbH
Verena Resch
 Innophore GmbH
Karl Gruber
 Innophore GmbH
Georg Steinkellner
 Innophore GmbH
Christian C. Gruber (christian.gruber@innophore.com)
 Innophore GmbH

Research Article

Keywords: Monkeypox, epidemic, proteome, genome, homology modeling, structure prediction, structural
genomics

Posted Date: May 26th, 2022

DOI: https://doi.org/10.21203/rs.3.rs-1693803/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract
The monkeypox virus (MPX) belongs to the Orthopoxvirus genus of the Poxviridae family, is endemic in
parts of Africa, and causes a disease in humans similar to smallpox. The most recent MPX outbreak in
2022 is already affecting 19 countries on different continents and has consequently become a focus of
interest. In particular, a molecular understanding of the virus is essential to study infection processes and
pathogen-host interactions, predict tropism changes, or guide drug development and discovery as well as
vaccine development or adaptation at a very early stage. Herein we present a study of the structural
genome of the currently emerging MPX virus: our analysis revealed 10,043 distinct candidate open
reading frames (ORFs), and subsequent BLAST searches of the non-redundant protein database and the
Protein Data Bank (PDB) reduced the number of putative ORFs to 925 and 123 protein sequences,
respectively. Finally, we provide 3D structures of these 123 protein sequences, predicted by homology
modeling and available for download.

Introduction
The monkeypox (MPX) virus was initially isolated from orangutans in an Indonesian zoo by Rijk Gispen1,2
in 1949. Its description as a member of the pox family of viruses followed almost ten years later, in 1958,
and was published by Magnus et al. in 19592,3. The approximately 197-kbp genome of the MPX
virus shows over 96% identity to that of the variola (VAR) virus, which causes smallpox. Although the two
viruses are closely related, neither appears to be the ancestor of the other4. A 2002 analysis of the
genome identified 190 open reading frames (ORFs). These largely correspond to the essential gene set
of the Orthopoxvirus genus but differ, e.g., in their complement of immunomodulatory and host-range
genes5,6.

Although the virus was discovered in primates, its original reservoir is thought to be rodents, such as
several species of squirrels or striped grass mice7–9. Transmission of the virus to humans was first
described in 1970 in the Democratic Republic of the Congo4,9,10. In humans, it causes a zoonotic
disease11 with clinical manifestations similar to those of smallpox9. Since this first transmission,
recurrent outbreaks in human populations have been reported mainly from the Congo Basin. However,
the disease has also spread beyond the African continent, and a general rise in cases is thought to be
related to the cessation of the smallpox vaccination programme12,13. Because natural and potentially
altered variants14 of MPX have reportedly been studied in biological-warfare programmes15,16, countries
such as the United States have more recently been stockpiling smallpox vaccines as a precaution
against bioterrorism and biological warfare17,18, as well as an antiviral treatment reported to be effective
against several previously identified smallpox strains19.

In light of the ongoing COVID-19 pandemic and its consequences, the recent MPX outbreak, which by
May 2022 already involved 19 countries on different continents, has become an additional focus of
attention. By definition20, an outbreak is a sudden increase in the incidence of a disease, in which the
number of cases exceeds the normal expectation for the location or season. A prompt response to
emerging diseases is therefore essential, so that therapeutic and non-therapeutic countermeasures can
be taken early, even before a Public Health Emergency of International Concern (PHEIC), an epidemic, or
a pandemic is declared. Here we present the first genome-wide, preliminary structural proteome of the
currently spreading MPX to support the responses of the scientific community and the pharmaceutical
industry to this putative risk.

Results And Discussion
Scanning the first draft genome21 of the emerging MPX virus for all possible ORFs with a minimum
length of ten amino acids, covering the forward and reverse strands in all three reading frames, yielded
10,043 distinct ORFs. Subsequent BLAST searches of the non-redundant protein database22 and the
PDB reduced the number of putative ORFs to 925 and 123 protein sequences, respectively.
Structures of the protein sequences that aligned to proteins in the PDB were predicted by homology
modeling (Table 1, Fig. 1). A comprehensive dataset on the process parameters and properties of all
potential ORFs within the MPX draft genome21, as well as the protein models generated in this
study, is available at https://doi.org/10.6084/m9.figshare.19877842.v1.

Table 1 | Modeling parameters of the putative structural proteome.

Structures of 123 putative ORFs matching proteins in the PDB were predicted by homology modeling
using the Catalophore™ DrugSolver Platform employing Yasara23. The table summarizes the position of
each ORF within the genome sequence, its putative function based on a BLASTP search24,25 of the
non-redundant database, and modeling parameters such as sequence identity and similarity.
QX: number of 'X' residues in the query sequence, resulting from unresolved sections of the genome
sequence; S: strand on which the putative ORF is located [forward (+) or reverse (-)]; RF: reading frame;
HM: homology modeling. Color coding: 0% (red) to 100% (green).

Figure 1 | Genomic map of the putative structural proteome.

Potential ORFs matching proteins in the non-redundant protein database are shown along the MPX
draft genome sequence21. Putative protein sequences on the forward and reverse strands, in three
reading frames each, are shown above and below the genome, respectively, and are labeled by their
query ID. Protein structures were modeled from the ORFs shown in orange. Yellow sections of the
genome mark low-quality regions, denoted by 'N' in the genome sequence. The figure was created using
the Python package Matplotlib26 and Blender 3.1.2, available at www.blender.org.

The early-stage structural models presented in this work are intended to serve promptly as an initial
collection of putative proteins of the currently spreading MPX, a body of information that could support
timely drug discovery, mutational analyses, and vaccine development. The models listed in Table 1 most
likely do not represent the complete structural proteome, since 190 ORFs have previously been described
in the MPX genome5. Notably, the 925 distinct ORFs that showed sequence similarity to entries in the
non-redundant database very likely include additional protein sequences of the MPX proteome, which
we expect to comprise close to 190 ORFs. The remaining potential ORFs in this set may contain fragments
of protein sequences that reflect the evolutionary origin of the virus. In addition, the set may include
as-yet-unidentified physiological proteins of MPX. Finally, further (dynamic) refinement of the initial
putative structural proteome presented here should be considered, especially for drug targets such as
F13L27,28 (query sequence 9984).

Conclusion
The molecular understanding of an emerging disease such as MPX or COVID-19 is essential to study infection
processes and pathogen-host interactions, predict tropism changes, or guide drug repurposing and
discovery as well as vaccine development or adaptation at a very early stage. Although traditional
structural-biology methods such as NMR, cryo-EM, and X-ray crystallography have advanced significantly in
recent decades, even during the rapid international response to the SARS-CoV-2 outbreak in Wuhan, China,
in January 2020 it took some time until the first experimentally determined structures of the SARS-CoV-2
main protease (Mpro), including inhibitor complexes, were published by the groups of Yang29 and
Hilgenfeld30. To overcome the limitations of time- and labor-intensive experimental methods, structural
bioinformatics provides early insight into a genome from a 3D molecular perspective by predicting
genome-wide protein structures for a complete pathogen as soon as its sequence is available. For example,
we published the first structural model31 of SARS-CoV-2 Mpro on 23 January 2020, one week after the
2019-nCoV draft genome was published. This model agreed well with the crystal structure released
afterwards (PDB 6LU7), with a root-mean-square deviation of 0.6 Å for 282 of 306 superimposed Cα atoms.
Notably, the binding site identified in this model later proved to be the target site of most of today's
approved or investigational SARS-CoV-2 direct-acting antivirals (DAAs), such as Paxlovid32,33. On a larger
scale, early predictive structural genomes enable large-scale virtual screening for drug repurposing or
new-drug development34,35, a deeper understanding of viral evolution and its structural implications, and
prediction of the significance and impact of emerging viral variants36,37. This allows vaccine developers
to monitor and adapt their candidates to emerging variants if required38. Structural-bioinformatics
pipelines39 combined with transparently shared open scientific data have therefore proven to be an
essential early-response tool for outbreaks40.
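The agreement quoted above is a root-mean-square deviation after optimal superposition, RMSD = sqrt((1/N) Σ ||R·x_i + t - y_i||²) over N matched Cα pairs (N = 282 here), where the rotation R and translation t are chosen to minimize the sum. A minimal NumPy sketch of this measure using the Kabsch algorithm follows. It assumes the model and crystal Cα coordinates are already paired, since neither the comparison tool nor the atom-selection procedure behind the 282-of-306 subset is stated; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """Hypothetical helper: RMSD between paired (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove the translation by centering both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # force a proper rotation, never a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

For a model-versus-crystal comparison, `P` and `Q` would hold the (N, 3) arrays of paired Cα coordinates. Restricting the pairs to a subset, as with 282 of 306 atoms, changes N and therefore the reported value.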

Methodology
Identification of potential ORFs in the MPX virus genome sequence
When a novel virus variant emerges, its genome sequence provides critical information for understanding
its molecular characteristics. Because bioinformatic molecular-interaction studies rely on protein models,
we developed a pipeline embedded in the Catalophore™ DrugSolver platform that rapidly processes
genome sequences and provides a set of potential proteins, i.e., a putative structural proteome. The first
step translates both the forward and the reverse strand in all three reading frames to account for every
possible translation frame. As a result, cases with overlapping genes, shifted translation starts, and
AUG-codon-independent translation initiation41 are also included in the set of initially considered ORFs.
The translated sequences are then split at stop codons (TAA, TAG, TGA), and only ORFs of at least ten
amino acids are retained. The Biopython42 modules Bio.Entrez, Bio.SeqIO, and Bio.Blast are employed for
this purpose. In this study, 10,043 distinct potential ORFs were identified in the recently published draft
genome sequence of the MPX virus21, which caused a multi-country outbreak in May 2022. To restrict
this set to sequences more likely to be part of the physiological MPX proteome, a command-line
BLASTP24,25 search of the non-redundant protein database22 was performed with default options,
resulting in 925 matches.
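The translation and splitting steps described above can be illustrated with a minimal, stand-alone Python sketch. It is not the pipeline's code: instead of Biopython it uses a hand-written standard codon table (NCBI translation table 1), it assumes that "distinct" means unique amino-acid sequences, and it translates any codon containing 'N' as 'X', which is how unresolved genome regions would surface as the 'X' residues counted in Table 1. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with bases ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Other IUPAC ambiguity codes are left uncomplemented but still translate to 'X'
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; any codon not in the table (e.g. containing N) becomes 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(genome: str, min_len: int = 10) -> list[dict]:
    """Hypothetical helper: distinct stop-to-stop fragments (>= min_len residues) from all six frames.

    Coordinates are 1-based, inclusive, on the forward strand, and exclude the stop codon.
    Distinctness is assumed to mean unique amino-acid sequences; the first occurrence is kept.
    """
    genome = genome.upper()
    n = len(genome)
    distinct: dict[str, dict] = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            offset = 0  # residue index at which the current fragment starts in this frame
            for fragment in protein.split("*"):  # '*' marks TAA, TAG and TGA
                if len(fragment) >= min_len:
                    s = frame + 3 * offset     # 0-based start on the strand being read
                    e = s + 3 * len(fragment)  # 0-based exclusive end
                    start, end = (s + 1, e) if strand == "+" else (n - e + 1, n - s)
                    distinct.setdefault(fragment, {"strand": strand, "frame": frame + 1,
                                                   "start": start, "end": end, "seq": fragment})
                offset += len(fragment) + 1  # skip the fragment and the stop codon after it
    return list(distinct.values())
```

Calling `six_frame_orfs(genome)` on the draft genome string (read, for example, with `str(SeqIO.read(path, "fasta").seq)` from Bio.SeqIO) returns one record per distinct fragment with its strand, frame, and forward-strand coordinates. Because fragments run from stop to stop rather than from an ATG, this deliberately keeps candidates with non-AUG or shifted starts, at the cost of many short spurious ORFs that the subsequent BLASTP filtering has to remove.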

Modeling of the putative proteome
In addition to the non-redundant protein database, the PDB was searched with BLASTP using a maximum
E-value of 0.04, which identified 123 of the 10,043 ORFs as matches to proteins with an experimentally
solved structure. These 123 ORFs were subjected to homology modeling using the Catalophore™
DrugSolver Platform employing Yasara23. With six PSI-BLAST iterations, a maximum E-value of 0.5, and
up to five templates considered, 115 of the 123 putative proteins were successfully modeled. For the
remaining eight (query sequences 990, 2104, 2322, 5796, 5904, 6017, 7260, and 9571), the homology-
modeling step found no suitable template; in these cases, the PDB structure matched to the respective
query sequence in the BLASTP search was used as the template. Modeling was performed with a
maximum of five alignment variations per template and 50 conformations tried per loop, excluding
terminal loop residues.
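The PDB filtering and the template fallback described above can be sketched in Python, assuming the BLASTP results were written in BLAST+ tabular format (for instance with `blastp -query orfs.fasta -db pdbaa -evalue 0.04 -outfmt 6 -out pdb_hits.tsv`; the file and database names are assumptions, since the text does not state them). The sketch keeps the best hit per query under the E-value cutoff and, mirroring the rule applied to the eight queries without a modeling template, falls back to that BLASTP hit. The helpers `best_hits` and `choose_templates` are hypothetical and do not reproduce the Yasara homology-modeling step itself.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, max_evalue: float = 0.04) -> dict[str, dict]:
    """Hypothetical helper: per query, the highest-scoring hit with E-value <= max_evalue."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # outside the PDB search cutoff
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def choose_templates(query_id: str, modeling_templates: dict[str, list[str]],
                     blast_best: dict[str, dict]) -> list[str]:
    """Hypothetical helper: prefer templates found during modeling, else fall back to the BLASTP hit."""
    found = modeling_templates.get(query_id, [])
    if found:
        return found
    hit = blast_best.get(query_id)
    return [hit["sseqid"]] if hit else []
```

Re-applying the cutoff in Python is redundant if BLAST was already run with `-evalue 0.04`, but it makes the threshold explicit when the same table was produced with a looser setting. Ranking by bit score rather than E-value is a design choice of this sketch; for hits of one query against one database the two orderings typically agree.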
Data availability

Publicly available datasets were analyzed in this study21. These data can be found here:
https://virological.org/uploads/short-url/39ehomlUxjRilxaJDMispIdu3JO.zip. The dataset containing
the process parameters and properties of all potential ORFs within the MPX draft genome, as well as
the final protein structure models generated in this study, is available
at https://doi.org/10.6084/m9.figshare.19877842.v1.

Declarations
Acknowledgments

Financial support was provided by Innophore GmbH and by the Austrian Research Promotion Agency
General Programme funding scheme project nr. 41404876 “VirtualCure - Rapid Development of an
Automated & Expandable In-silico High-Throughput Drug Repurposing Screening Pipeline”. The
computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC) and
HPC resources provided by Innophore. Technical and infrastructure support was provided by the Amazon
Web Services Diagnostic Development Initiative (DDI). Some computational results presented in this
manuscript were produced in cloud computing facilities provided by Amazon Web Services within
DDI, project nr. “CC ADV 00502188 2021 TR” entitled “virus.watch/monkeypox”. Catalophore is a registered
trademark (AT 295631) of Innophore GmbH. Calculations were carried out using in-house software
(e.g., pathogen-seqscan) embedded in the Catalophore™ DrugSolver platform, as described in the
Methodology section, with a non-commercial open-science license granted by Innophore GmbH.

Author contributions

L.P. and S.G. performed the genome analysis, prepared structural models and drafted the manuscript with
input from all authors. A.K. contributed to analysis of data, revised the manuscript, advised and
contributed to the genome analysis. V.R. supported the visualization of the obtained structural models
and created the structural-genome landscape representation. K.G. gave structural advice and structural-biology
input for data analysis and gave modeling-pipeline advice. C.C.G. & G.S. contributed to
evaluating, preparing and interpreting the data, and designed, managed and supervised the project. All
authors edited the manuscript to its final form.

Declaration of interests

L.P., S.G., A.K. and V.R. report working for Innophore. K.G., G.S. and C.C.G. report being shareholders of
Innophore, an enzyme and drug discovery company. Additionally, G.S. and C.C.G. report being managing
directors of Innophore. The research described here is scientifically and financially independent of the
efforts of the above-mentioned company and was conducted as open science.

Competing financial interests
The authors declare no competing interests.

References
 1. Gispen, R. Smallpox reinfections in Indonesia. Ned T Geneesk 93, 3686–3695 (1949).
 2. Arita, I. & Henderson, D. A. Smallpox and monkeypox in non-human primates. Bull. World Health
 Organ. 39, 277–283 (1968).
 3. Magnus, P. von, Andersen, E. K., Petersen, K. B. & Birch-Andersen, A. A pox-like disease in cynomolgus
 monkeys. Acta Pathol. Microbiol. Scand. 46, 156–176 (1959).
 4. Shchelkunov, S. N. et al. Human monkeypox and smallpox viruses: genomic comparison. FEBS Lett.
 509, 66–70 (2001).
 5. Shchelkunov, S. N. et al. Analysis of the Monkeypox Virus Genome. Virology 297, 172–194 (2002).
 6. Weaver, J. R. & Isaacs, S. N. Monkeypox virus and insights into its immunomodulatory proteins.
 Immunol. Rev. 225, 96–113 (2008).
 7. Ježek, Z. & Fenner, F. Human monkeypox. (Karger, 1988).
 8. Khodakevich, L., Jezek, Z. & Messinger, D. Monkeypox virus: ecology and public health significance.
 Bull. World Health Organ. 66, 747–752 (1988).
 9. Damon, I. K. Status of human monkeypox: clinical disease, epidemiology and research. Vaccine 29
 Suppl 4, D54-59 (2011).
10. Marennikova, S. S., Seluhina, E. M., Mal’ceva, N. N., Cimiskjan, K. L. & Macevic, G. R. Isolation and
 properties of the causal agent of a new variola-like disease (monkeypox) in man. Bull. World Health
 Organ. 46, 599–611 (1972).
11. Di Giulio, D. B. & Eckburg, P. B. Human monkeypox: an emerging zoonosis. Lancet Infect. Dis. 4, 15–
 25 (2004).
12. Bunge, E. M. et al. The changing epidemiology of human monkeypox—A potential threat? A
 systematic review. PLoS Negl. Trop. Dis. 16, e0010141 (2022).
13. Rimoin, A. W. et al. Major increase in human monkeypox incidence 30 years after smallpox
 vaccination campaigns cease in the Democratic Republic of Congo. Proc. Natl. Acad. Sci. 107,
 16262–16267 (2010).
14. Gilsdorf, J. R. & Zilinskas, R. A. New Considerations in Infectious Disease Outbreaks: The Threat of
 Genetically Modified Microbes. Clin. Infect. Dis. 40, 1160–1165 (2005).
15. Kuhn, J. H. & Leitenberg, M. The Soviet Biological Warfare Program. in Biological Threats in the 21st
 Century 79–102 (Imperial College Press, 2016). doi:10.1142/9781783269488_0005.
16. Tucker, J. B. Biological weapons in the former Soviet Union: An interview with Dr. Kenneth Alibek.
 Nonproliferation Rev. 6, 1–10 (1999).
17. Henderson, D. A. et al. Smallpox as a Biological Weapon: Medical and Public Health Management.
 JAMA 281, 2127 (1999).
18. Cieslak, T. J. et al. Beyond the Dirty Dozen: A Proposed Methodology for Assessing Future
 Bioweapon Threats. Mil. Med. 183, e59–e65 (2018).
19. Kozlov, M. Monkeypox goes global: why scientists are on alert. Nature (2022)
 doi:10.1038/d41586-022-01421-8.
20. Green, M. S. et al. When is an epidemic an epidemic? Isr. Med. Assoc. J. IMAJ 4, 3–6 (2002).
21. First draft genome sequence of Monkeypox virus associated with the suspected multi-country
 outbreak, May 2022 (confirmed case in Portugal) - Monkeypox. Virological
 https://virological.org/t/first-draft-genome-sequence-of-monkeypox-virus-associated-with-the-
 suspected-multi-country-outbreak-may-2022-confirmed-case-in-portugal/799 (2022).
22. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic
 expansion, and functional annotation. Nucleic Acids Res. 44, D733-745 (2016).
23. Krieger, E. & Vriend, G. New ways to boost molecular dynamics simulations. J. Comput. Chem. 36,
 996–1007 (2015).
24. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
25. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J.
 Mol. Biol. 215, 403–410 (1990).
26. Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 9, 90–95 (2007).
27. Grosenbach, D. W. et al. Oral Tecovirimat for the Treatment of Smallpox. N. Engl. J. Med. 379, 44–53
 (2018).
28. Duraffour, S. et al. ST-246 is a key antiviral to inhibit the viral F13L phospholipase, one of the
 essential proteins for orthopoxvirus wrapping. J. Antimicrob. Chemother. 70, 1367–1380 (2015).
29. Jin, Z. et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293
 (2020).
30. Zhang, L. et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of
 improved α-ketoamide inhibitors. Science 368, 409–412 (2020).
31. Gruber, C. C. & Steinkellner, G. Wuhan coronavirus 2019-nCoV—what we can find out on a structural
 bioinformatics level. Innophore GmbH, Austria (2020)
 doi:10.6084/M9.FIGSHARE.11752749.V3.
32. Owen, D. R. et al. An oral SARS-CoV-2 Mpro inhibitor clinical candidate for the treatment of COVID-19.
 Science 374, 1586–1593 (2021).
33. Richardson, P. et al. Baricitinib as potential treatment for 2019-nCoV acute respiratory disease.
 Lancet Lond. Engl. 395, e30 (2020).
34. Gorgulla, C. et al. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual
 screening. iScience 24, 102021 (2021).
35. Kodchakorn, K., Poovorawan, Y., Suwannakarn, K. & Kongtawelert, P. Molecular modelling
 investigation for drugs and nutraceuticals against protease of SARS-CoV-2. J. Mol. Graph. Model.
 101, 107717 (2020).
36. Singh, A., Steinkellner, G., Köchl, K., Gruber, K. & Gruber, C. C. Serine 477 plays a crucial role in the
 interaction of the SARS-CoV-2 spike protein with the human receptor ACE2. Sci. Rep. 11, 4320 (2021).
37. Durmaz, V. et al. Structural-bioinformatics analysis of SARS-CoV-2 variants reveals higher hACE2
 receptor binding affinity for Omicron B.1.1.529 spike RBD compared to wild-type reference. (2021)
 doi:10.21203/rs.3.rs-1153124/v1.
38. Schrörs, B. et al. Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the
 need for continuous screening of virus isolates. PLOS ONE 16, e0249254 (2021).
39. Hodgson, J. The pandemic pipeline. Nat. Biotechnol. 38, 523–532 (2020).
40. Open for outbreaks. Nat. Biotechnol. 38, 377 (2020).
41. Ho, J. S. Y., Zhu, Z. & Marazzi, I. Unconventional viral gene expression mechanisms as therapeutic
 targets. Nature 593, 362–371 (2021).
42. Biopython Tutorial and Cookbook. http://biopython.org/DIST/docs/tutorial/Tutorial.html.

Table 1
Table 1 is available in the Supplementary Files section.

Figures

Figure 1

Genomic map of the putative structural proteome.

Supplementary Files
This is a list of supplementary files associated with this preprint.

 MonkeyPoxHomologymodelsPDBs.zip
 MonkeyPoxPT00012022putativeproteomescan.csv
 Table1.docx