Comparative Analysis of Genome of Ehrlichia sp. HF, a Model Bacterium to Study Fatal Human Ehrlichiosis
Lin et al. BMC Genomics (2021) 22:11
https://doi.org/10.1186/s12864-020-07309-z

RESEARCH ARTICLE, Open Access

Mingqun Lin¹*, Qingming Xiong¹, Matthew Chung², Sean C. Daugherty², Sushma Nagaraj², Naomi Sengamalay², Sandra Ott², Al Godinez², Luke J. Tallon², Lisa Sadzewicz², Claire Fraser²,³, Julie C. Dunning Hotopp²,⁴,⁵ and Yasuko Rikihisa¹*

Abstract

Background: The genus Ehrlichia consists of tick-borne obligatory intracellular bacteria that can cause deadly diseases of medical and agricultural importance. Ehrlichia sp. HF, isolated from Ixodes ovatus ticks in Japan [also referred to as I. ovatus Ehrlichia (IOE) agent], causes acute fatal infection in laboratory mice that resembles acute fatal human monocytic ehrlichiosis caused by Ehrlichia chaffeensis. As there is no small laboratory animal model to study fatal human ehrlichiosis, Ehrlichia sp. HF provides a needed disease model. However, the inability to culture Ehrlichia sp. HF and the lack of genomic information have been barriers to advancing this animal model. In addition, Ehrlichia sp. HF has several designations in the literature as it lacks a taxonomically recognized name.

Results: We stably cultured Ehrlichia sp. HF in canine histiocytic leukemia DH82 cells from the HF strain-infected mice, and determined its complete genome sequence. Ehrlichia sp. HF has a single double-stranded circular chromosome of 1,148,904 bp, which encodes 866 proteins with a metabolic potential similar to that of E. chaffeensis. Ehrlichia sp. HF encodes homologs of all virulence factors identified in E. chaffeensis, including 23 paralogs of P28/OMP-1 family outer membrane proteins, type IV secretion system apparatus and effector proteins, two-component systems, ankyrin-repeat proteins, and tandem repeat proteins. Ehrlichia sp. HF is a novel species in the genus Ehrlichia, as demonstrated through whole genome comparisons with six representative Ehrlichia species, subspecies, and strains, using average nucleotide identity, digital DNA-DNA hybridization, and core genome alignment sequence identity.

Conclusions: The genome of Ehrlichia sp. HF encodes all known virulence factors found in E. chaffeensis, substantiating it as a model Ehrlichia species to study fatal human ehrlichiosis. Comparisons between Ehrlichia sp. HF and E. chaffeensis will enable identification of in vivo virulence factors that are related to host specificity, disease severity, and host inflammatory responses. We propose to name Ehrlichia sp. HF as Ehrlichia japonica sp. nov. (type strain HF), to denote the geographic region where this bacterium was initially isolated.

Keywords: Ehrlichia sp. HF, Monocytic Ehrlichiosis, Mouse model, Comparative genomic analysis, Core genome alignment, Virulence factors

* Correspondence: lin.427@osu.edu; rikihisa.1@osu.edu
¹ Department of Veterinary Biosciences, The Ohio State University, 1925 Coffey Road, Columbus, OH 43210, USA
Full list of author information is available at the end of the article

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Background

The incidence of tick-borne diseases has risen dramatically in the past two decades, and continues to rise [1–3]. The 2011 Institute of Medicine report “Critical Needs and Gaps in...Lyme and Other Tick-Borne Diseases” revealed the urgent need for research into tick-borne diseases [4]. Ehrlichia species are tick-borne obligate intracellular bacteria, which are maintained via the natural transmission and infection cycle between particular species of ticks and mammals (Table 1). The genus Ehrlichia belongs to the family Anaplasmataceae in the order Rickettsiales. According to the International Code of Nomenclature of Prokaryotes and the International Journal of Systematic and Evolutionary Microbiology [46], and following the reorganization of genera in the family Anaplasmataceae based on molecular phylogenetic analysis [47], the genus Ehrlichia currently consists of six taxonomically classified species with validly published names, including E. chaffeensis, E. ewingii, E. canis, E. muris, E. ruminantium, and a recently culture-isolated E. minasensis that is closely related to E. canis (Table 1) [19, 37].

Table 1 Biological characteristics of representative Ehrlichia species

| Species¹ (Type strain) | Diseases | Mammalian Host | Tick Vector/Host | Geographic Distribution | References |
|---|---|---|---|---|---|
| Ehrlichia sp. HF (HF565) | Acute fatal infection of mice (experimental) | Unknown | Ixodes ovatus, I. ricinus, and I. apronophorus ticks | Japan, France, Serbia, Romania | [5–10] |
| E. chaffeensis (Arkansas)² | Human monocytic ehrlichiosis (HME) | Deer, Human, Dog, Coyote, Fox³ | Amblyomma americanum (Lone star tick) | USA, Africa, South America, Europe, Japan | [11–16] |
| E. muris subsp. muris (AS145) | Murine monocytic ehrlichiosis (chronic systemic infection of mice) | Mouse, Vole | Ticks (Haemaphysalis flava or Ixodes persulcatus) | Japan, Russia⁴ | [17, 18] |
| E. muris subsp. eauclairensis (Wisconsin) | Human or murine monocytic ehrlichiosis (fatal infection of mice) | Human, Mouse | Ixodes scapularis (black-legged tick) | Wisconsin and Minnesota, USA | [19, 20] |
| E. canis (Oklahoma) | Canine tropical pancytopenia, Venezuelan Human Ehrlichiosis⁵ | Dog, Human | Rhipicephalus sanguineus (brown dog tick) | Global | [21–25] |
| E. ruminantium (Welgevonden) | Heartwater | Ruminants (Cattle, Sheep, Goats, Antelope) | Various Amblyomma species of ticks | Africa, Caribbean⁶ | [26–33] |
| E. ewingii (Stillwater) | Canine granulocytic ehrlichiosis, Human ewingii ehrlichiosis | Deer, Dog, Human | Amblyomma americanum | USA, Japan | [34–36] |
| E. minasensis (UFMG-EV) | Ehrlichiosis | Cattle, Deer, Dog⁷ | Rhipicephalus microplus tick | Brazil, Global | [37–45] |

¹ Based on the International Code of Nomenclature of Prokaryotes and as published in the International Journal of Systematic and Evolutionary Microbiology, which carries the officially approved lists of bacterial classification and nomenclature, the genus Ehrlichia currently consists of six validly published species with correct names (https://lpsn.dsmz.de/genus/ehrlichia)
² Ehrlichia sp. HF, or Ixodes ovatus Ehrlichia (IOE) agent, is a field tick isolate of Ehrlichia species in Fukushima Prefecture, Japan from 1993 to 1994. Ehrlichia sp. HF DNA was also detected in I. ricinus ticks from Brittany, France and Serbia, and in an I. apronophorus tick in Romania
³ E. chaffeensis DNA was detected in 71% of free-ranging coyotes in Oklahoma and in experimentally infected red foxes
⁴ E. muris DNA was found in I. persulcatus ticks and small mammals in Russia
⁵ Human infection with E. canis with clinical signs was reported in Venezuela, and E. canis was culture isolated from a VHE patient. In addition, E. canis DNA was detected in human blood bank donors in Costa Rica
⁶ Heartwater in the Caribbean islands of Guadeloupe was caused by E. ruminantium Gardel, which is transmitted by Amblyomma variegatum (Tropical bont tick) and is exceptionally virulent in Dutch goats. More heartwater cases in wild and domestic ruminants have been reported in five Caribbean islands, posing an increasing threat to domestic and wild ruminants in the continental US
⁷ E. minasensis strain UFMG-EVᵀ was isolated from the haemolymph of engorged Rhipicephalus microplus female ticks in Brazil, whereas strain Cuiaba was isolated from the whole blood of a naturally infected cattle. E. minasensis DNAs have also been reported in ticks, cervids, and dogs from France, Pakistan, Ethiopia, and Israel

Accidental transmission and infection of domestic animals and humans can cause potentially severe to fatal diseases, and four species (E. chaffeensis, E. ewingii, E. canis, and E. muris) are known to infect humans and cause emerging tick-borne zoonoses [11, 19–21, 34, 48, 49]. In the US, the most common human ehrlichiosis is human monocytic ehrlichiosis (HME) caused by E. chaffeensis, which was discovered in 1986 [12], followed by human ewingii ehrlichiosis discovered in 1998 [34]. The most recently discovered human ehrlichiosis is caused by E. muris subsp. eauclairensis [originally referred to as E. muris-like agent (EMLA)] [19, 20]. Human infection with E. canis has been reported in South and Central America [21, 22, 49]. Regardless of the Ehrlichia species, clinical signs of human ehrlichiosis include fever, headache, myalgia, thrombocytopenia, leukopenia, and elevated serum liver enzyme levels [20, 21, 34, 48–50]. HME is a significant, emerging tick-borne disease with serious health impacts, with the highest incidence in people over 60 years of age and immunocompromised individuals [48]. Life-threatening complications such as
renal failure, adult respiratory distress syndrome, meningoencephalitis, multi-system organ failure, and toxic shock occur in a substantial portion of hospitalized patients, resulting in a case fatality rate of 3% [48]. However, there is no vaccine available for HME [51], and the only drug of choice is doxycycline, which is only effective with early diagnosis and treatment, and is not suitable for all patient groups [48]. In addition, pathogenesis and immunologic studies on human ehrlichiosis have been hampered due to the lack of an appropriate small animal disease model, as E. chaffeensis only transiently infects immunocompetent laboratory mice [52, 53]. E. chaffeensis naturally infects dogs and deer with mild to no clinical signs [53–55]. However, use of these animals is difficult and cost-prohibitive, and they are not suitable for pathogenesis studies.

In an attempt to determine the pathogens harbored by Ixodes ovatus ticks prevalent in Japan, Fujita and Watanabe inoculated tick homogenates into the intraperitoneal cavity of laboratory mice, followed by serial passage through naïve mice using homogenized spleens from infected mice [5]. From 1983 to 1994, twelve “HF strains” were isolated from I. ovatus ticks in this manner, with the strain named after the scientist Hiromi Fujita who first discovered and isolated this bacterium [5]. Electron micrographs of HF326 showed the typical ultrastructure of Ehrlichia in the mouse liver [5]. A few years later, analysis of the 16S rRNA gene of the HF strains showed that four isolates (HF565, HF568-1, HF568-2, and HF639-2) from Fukushima, and two isolates (HF642 and HF652) from Aomori, northern Japan, were identical and closely related to Ehrlichia spp. [6]. The phylogenetic comparison of 16S rRNA and GroEL protein sequences of HF565 with those of members of the family Anaplasmataceae, and electron micrographs of HF565 verified that the HF strain belongs to the genus Ehrlichia [6]. Recent studies indicated that DNA sequences of Ehrlichia sp. HF have been detected not only in I. ovatus ticks throughout Japan, but also in Ixodes ricinus ticks in France [7] and Serbia [8], and Ixodes apronophorus ticks in Romania [9].

Unlike E. muris, HF565 does not induce splenomegaly but is highly virulent in mice, as intraperitoneal inoculation kills immunocompetent laboratory mice in 6–10 days [5, 6, 10, 56]. HF565 (the HF strain described here) was requested by and distributed to several US laboratories, where the strain was dubbed the I. ovatus Ehrlichia (IOE) agent. Using the HF strain-infected mouse spleen homogenate as the source of HF bacterium, pathogenesis studies in inoculated mice revealed that these bacteria induce a toxic shock-like cytokine storm, involving cytotoxic T-cells, NKT cells, and neutrophils similar to those reported in fatal HME [57–68]. Therefore, Ehrlichia sp. HF has been increasingly serving as a needed immunocompetent mouse model for studying fatal ehrlichiosis.

The major barriers for advancing research on Ehrlichia sp. HF, however, have been the inability to stably culture it in a mammalian macrophage cell line and the lack of genome sequence and analysis data. Previously, it was cultured in monkey endothelial RF/6A cells and Ixodes scapularis tick embryo ISE6 cells [69]. To facilitate studies using Ehrlichia sp. HF, we stably cultured the HF strain in a canine histiocytic leukemia cell line DH82, and obtained the complete whole genome sequence (GenBank accession NZ_CP007474). Despite many studies being conducted with Ehrlichia sp. HF, this bacterium has not been classified into any species, causing confusion in the literature with several different names (IOE agent, Ehrlichia sp. HF, the HF strain). Comparative core genome alignment and phylogenetic analysis reveal that Ehrlichia sp. HF is a new species that is most closely related to E. muris and E. chaffeensis, justifying the formal nomenclature of this species. The genome sequencing and analysis, including comparative virulence factor analysis of Ehrlichia sp. HF, provides important insights, resources, and validation for advancing the research on emerging human ehrlichioses.

Results and Discussion

Culture Isolation of Ehrlichia sp. HF and purification of Ehrlichia genomic DNA

To obtain sufficient amounts of bacterial DNA free from host cell DNA, we stably cultured Ehrlichia sp. HF in the DH82 cells. Spleen and blood samples were collected from Ehrlichia sp. HF-infected mice euthanized at an acute stage of illness (8 d post inoculation) (Fig. S1A). Diff-Quik staining showed that the bacteria were present in blood monocytes (Fig. S1B). After 2–3 weeks of co-culturing with infected spleen homogenates, large vacuoles (inclusions) containing numerous bacteria (known as morulae) were observed in the cytoplasm of DH82 (Fig. S1C) and RF/6A cells (Fig. S1D). Ehrlichia sp. HF could also be successfully passaged from DH82 cells to ISE6 cells (Fig. S1E). Morulae of Ehrlichia sp. HF in cell cultures were like those seen in the tissue sections of the thymus and the lungs of infected mice [6], and in the endothelial cells of most organs of infected mice [10]. Ehrlichia sp. HF cultured in DH82 cells infects and kills mice at 7–10 days post intraperitoneal inoculation, similar to those inoculated with the infected mouse spleen homogenate, demonstrating that the Ehrlichia sp. HF culture isolate maintains mouse virulence [56]. The mouse LD50 of Ehrlichia sp. HF cultured in DH82 cells is approximately 100 bacteria [56].

General features of the Ehrlichia sp. HF genome

The complete genome of Ehrlichia sp. HF was sequenced using both Illumina and PacBio platforms, and the reads from both platforms were combined at multiple levels in order to obtain a reliable assembly. The
genome was rotated to the replication origin of Ehrlichia sp. HF (Fig. 1), which was predicted to be the region between hemE (uroporphyrinogen decarboxylase, EHF_0001) and tlyC (hemolysin or related HlyC/CorC family transporter, EHF_0999) as described for other members in the family Anaplasmataceae [70]. Annotation of the finalized genome assembly was generated using the IGS prokaryotic annotation pipeline [71]. The completed genome of Ehrlichia sp. HF is a single double-stranded circular chromosome of 1,148,904 bp with an overall G+C content of ~30%, which is similar to those of E. chaffeensis Arkansas [72], E. muris subsp. eauclairensis Wisconsin [19], and E. muris AS145ᵀ [73] (Table 2).

The Ehrlichia sp. HF genome encodes one copy each of the 5S, 16S, and 23S rRNA genes, which are separated in 2 locations with the 5S and 23S rRNA being adjacent (Fig. 1, red bars in the middle circle) as in other sequenced members in the family Anaplasmataceae [72, 74]. Thirty-six tRNA genes are identified with cognates for all 20 amino acids (AA) (Table 2 and Fig. 1, black bars in the middle circle), similar to other Ehrlichia spp. (36–37 genes, Table 2).

Comparative genomic analysis of Ehrlichia sp. HF with other Ehrlichia species

Previous studies have shown that some Anaplasma spp. and Ehrlichia spp. have a single large-scale symmetrical inversion (X-alignment) near the replication origin, which may have resulted from recombination between duplicated, but not identical rho termination factors [72, 75, 76]. All genomes of the sequenced Ehrlichia spp. encode

Fig. 1 Circular representation of Ehrlichia sp. HF genome. From outside to inside, the first circle represents predicted protein coding sequences (ORFs) on the plus and minus strands, respectively. The second circle represents RNA genes, including tRNAs (black), rRNAs (red), tmRNAs (blue), and ncRNAs (orange). The third circle represents GC skew values [(G-C)/(G+C)] with a window size of 500 bp and a step size of 250 bp. Colors indicate the functional role categories of ORFs - black: hypothetical proteins or proteins with unknown functions; gold: amino acid and protein biosynthesis; sky blue: purines, pyrimidines, nucleosides, and nucleotides; cyan: fatty acid and phospholipid metabolism; light blue: biosynthesis of cofactors, prosthetic groups, and carriers; aquamarine: central intermediary metabolism; royal blue: energy metabolism; pink: transport and binding proteins; dark orange: DNA metabolism and transcription; pale green: protein fate; tomato: regulatory functions and signal transduction; peach puff: cell envelope; pink: cellular processes; maroon: mobile and extrachromosomal element functions
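
The GC skew track described in the Fig. 1 caption, (G-C)/(G+C) over 500 bp windows advanced in 250 bp steps, can be illustrated with a short calculation. The following is a minimal sketch only, not the pipeline used for the figure: the function name `gc_skew` is hypothetical, and the sketch assumes a linear sequence string, so windows that would wrap around the origin of a circular chromosome are simply not emitted.

```python
def gc_skew(seq, window=500, step=250):
    """Return a list of (window_start, skew) pairs.

    skew = (G - C) / (G + C) for each window; a window with no G or C
    is reported as 0.0 (an assumption made here to avoid dividing by zero).
    """
    seq = seq.upper()
    values = []
    # Slide a fixed-size window along the sequence; no wrap-around.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        values.append((start, skew))
    return values


# Toy sequence with small window and step so the result is easy to check.
toy = gc_skew("GGGGCCCC", window=4, step=2)
```

On a real genome, a sign change in the skew along the chromosome is commonly read as a hint to the replication origin or terminus, which is why such a track is usually drawn alongside the gene circles.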
duplicated rho genes. Whole genome alignments demonstrate that the Ehrlichia sp. HF genome exhibits almost complete synteny with other Ehrlichia spp., including E. muris, E. canis, and E. ruminantium, without any significant genomic rearrangements or inversions despite these genomes being oriented in the opposite directions (Fig. 2). However, Ehrlichia sp. HF has a single large-scale symmetrical inversion relative to E. chaffeensis at the duplicated rho genes (Fig. 2b). Large-scale inversions have also been reported in other bacteria such as Yersinia and Legionella species when genomes of closely related species are compared [77]. However, the biological meaning and evolutionary implications of such a process, if any, are largely unknown.

Table 2 Genome properties of representative Ehrlichia species

| Ehrlichia Species¹ | EHF | ECH | EMU | EmCRT² | ECA | ERW |
|---|---|---|---|---|---|---|
| NCBI RefSeq | NZ_CP007474 | NC_007799 | NC_023063 | NZ_LANU01000001 | NC_007354 | NC_005295 |
| Size (bp) | 1,148,904 | 1,176,248 | 1,196,717 | 1,148,958 | 1,315,030 | 1,516,355 |
| GC (%) | 29.6 | 30.1 | 29.7 | 29.8 | 29.0 | 27.5 |
| Protein | 866 | 892 | 874 | 866 | 933 | 934 |
| tRNA | 36 | 37 | 37 | 36 | 36 | 36 |
| rRNA | 3 | 3 | 3 | 3 | 3 | 3 |
| Other RNA | 4 | 3 | 3 | 4 | 3 | 4 |
| Pseudogene | 11 | 17 | 24 | 15 | 10 | 18 |
| Total Gene | 920 | 952 | 941 | 924 | 985 | 995 |

¹ Abbreviations: EHF Ehrlichia sp. HF (HF565), EMU E. muris subsp. muris AS145, EmCRT E. muris subsp. eauclairensis Wisconsin, ECH E. chaffeensis Arkansas, ECA E. canis Jake, ERW E. ruminantium Welgevonden
² The genome of E. muris subsp. eauclairensis Wisconsin is incomplete, consisting of 3 contigs, NZ_LANU01000001, NZ_LANU01000002, and NZ_LANU01000003

In order to compare the protein ortholog groups among four closely-related Ehrlichia spp., including Ehrlichia sp. HF, E. muris subsp. eauclairensis, E. muris AS145, and E. chaffeensis Arkansas, 4-way comparisons were performed using the reciprocal BLASTP algorithm with E-value < 1e-10 (Fig. 3). The four-way comparison showed that the core proteome, defined as the set of proteins present in all four genomes, consists of 823 proteins representing 94.9% of the total 867 protein-coding ORFs in Ehrlichia sp. HF (Fig. 3 and Table 3). Among these conserved proteins, the majority are associated with housekeeping functions and are likely essential for Ehrlichia survival (Table 3).

By 4-way comparison, a hypothetical protein (EHF_RS02845 or MR76_RS01735) is found only in Ehrlichia sp. HF and E. muris subsp. muris, the two strains that do not infect humans, but not in E. chaffeensis and E. muris subsp. eauclairensis, which both infect humans [11, 12, 78] (Table S1). On the other hand, the human-infecting strains of E. chaffeensis and E. muris subsp. eauclairensis have genes encoding a bifunctional DNA-formamidopyrimidine glycosylase/DNA-(apurinic or apyrimidinic site) lyase protein, MutM (ECH_RS02515 or EMUCRT_RS01070) (Table S1). In addition, transposon mutagenesis studies have identified intragenic insertions in genes encoding DNA mismatch repair proteins MutS and MutL in Ehrlichia sp. HF [56]. The biological relationship between MutM and human infectivity remains to be investigated.

E. muris subsp. muris, E. muris subsp. eauclairensis, and Ehrlichia sp. HF cause persistent or lethal infection in mice, whereas immunocompetent mice clear E. chaffeensis infection within 10–16 days [79–81]. A metallophosphoesterase (ECH_RS03950/ECH_0964), which may function as a phosphodiesterase or serine/threonine phosphoprotein phosphatase, was found only in E. chaffeensis but not in the other three Ehrlichia spp. (Table S2).

Except for 28 E. chaffeensis-specific proteins, there are fewer than 10 species-specific proteins present in Ehrlichia sp. HF, E. muris subsp. muris AS145, or E. muris subsp. eauclairensis (Table S2), all of which are hypothetical proteins without any known functions or domains. Potentially, these proteins may be involved in differential pathogenesis of these Ehrlichia species.

Two-way comparisons identified further proteins that are unique to Ehrlichia sp. HF, but absent in other Ehrlichia spp. (Table S3). Several of these proteins are involved in DNA metabolism, mutation repairs, or regulatory functions that were only found in Ehrlichia sp. HF (Table S3). For example, compared to Ehrlichia sp. HF proteomes, E. chaffeensis lacks a patatin-like phospholipase family protein (ECH_RS03820, a pseudogene with internal frameshift at AA180), which has phospholipase A2 activity catalyzing the nonspecific hydrolysis of phospholipids, glycolipids, and other lipid acyl hydrolase activities [82–84]. E. muris subsp. muris lacks CckA protein, a histidine kinase that can phosphorylate response regulator CtrA and regulate the DNA segregation and cell division of E. chaffeensis [85, 86]. However, the absence of these proteins needs to be further validated since sequencing errors and
mis-annotations can frequently confound such analyses. For example, although the homolog to E. chaffeensis TRP120 was not identified in E. muris subsp. eauclairensis, TBLASTN searches indicated that this ORF is split into two pseudogenes (EMUCRT_0995 and EMUCRT_0731) in two separate contigs of the draft genome sequences. In addition, RpoB/C were misannotated in the E. muris subsp. eauclairensis genome as a concatenated pseudogene EMUCRT_RS04655, whereas several genes encoding GyrA, PolI, AtpG, and CckA of E. muris AS145 were annotated as pseudogenes due to frameshifts in homopolymeric tracts (Table S3).

Fig. 2 Whole genome alignment between Ehrlichia sp. HF and other Ehrlichia spp. Genome sequences were aligned between Ehrlichia sp. HF and E. muris subsp. muris AS145 (a), E. chaffeensis Arkansas (b), E. canis Jake (c), or E. ruminantium Gardel (d) using the MUGSY program with default parameters, and the graphs were generated using GMAJ. The Ehrlichia sp. HF genome has a single large-scale symmetrical inversion with E. chaffeensis, but exhibits almost complete synteny with other Ehrlichia spp.

Metabolic and Biosynthetic Potential

The metabolic potential of Ehrlichia sp. HF (Table 3) was analyzed by functional role categories using Genome Properties [87], Kyoto Encyclopedia of Genes and Genomes (KEGG) [88], and Biocyc [89]. In addition, two- and four-way comparisons between Ehrlichia sp. HF and E. chaffeensis (Fig. 3 and Table 3) indicated that Ehrlichia sp. HF possesses metabolic pathways similar to those previously described for E. chaffeensis [72]. The Ehrlichia sp. HF genome encodes pathways for aerobic respiration to produce ATP, including pyruvate metabolism, the tricarboxylic acid (TCA) cycle, and the electron transport chain, but lacks critical enzymes for glycolysis and gluconeogenesis. Similar to E. chaffeensis, Ehrlichia sp. HF can synthesize fatty acids, nucleotides, and cofactors, but has very limited capabilities for amino acid biosynthesis, and is predicted to make only glycine, glutamine, glutamate, aspartate, arginine, and lysine. Ehrlichia sp. HF encodes very few enzymes related to central
Fig. 3 Numbers of protein homologs conserved among representative Ehrlichia spp. A Venn diagram was constructed showing the comparison of conserved and unique genes between Ehrlichia spp. as determined by reciprocal BLASTP algorithm using an E-value of < 1e-10. Numbers within the intersections of different circles indicate protein homologs conserved within 2, 3, or 4 organisms. Species indicated in the diagram are abbreviated as follows: EHF, Ehrlichia sp. HF; ECH, E. chaffeensis Arkansas; EMU, E. muris subsp. muris AS145; EmCRT, E. muris subsp. eauclairensis Wisconsin.

Table 3 Role category breakdown of protein coding genes in Ehrlichia species

| Role Category¹ | EHF | ECH | EMU | EmCRT | Unique in EHF² |
|---|---|---|---|---|---|
| Amino acid biosynthesis | 22 | 23 | 23 | 22 | |
| Biosynthesis of cofactors, prosthetic groups, and carriers | 64 | 60 | 65 | 61 | |
| Cell envelope | 53 | 51 | 51 | 48 | 1 |
| Cellular processes | 42 | 41 | 42 | 41 | |
| Central intermediary metabolism | 3 | 3 | 5 | 3 | |
| DNA metabolism | 41 | 44 | 41 | 42 | |
| Energy metabolism | 84 | 82 | 80 | 83 | |
| Fatty acid and phospholipid metabolism | 20 | 19 | 21 | 21 | |
| Mobile elements | 4 | 4 | 4 | 4 | |
| Protein fate | 79 | 78 | 77 | 78 | |
| Protein synthesis | 108 | 108 | 107 | 107 | |
| Nucleotide biosynthesis | 35 | 35 | 35 | 35 | |
| Regulatory functions | 14 | 15 | 13 | 14 | |
| Transcription | 21 | 21 | 19 | 19 | |
| Transport and binding proteins | 33 | 33 | 32 | 33 | |
| Hypothetical proteins or proteins with unknown functions | 244 | 276 | 268 | 255 | 8 |
| Total Assigned Functions | 623 | 617 | 615 | 611 | |
| Total Proteins | 867 | 893 | 883 | 866 | |

¹ Abbreviations: EHF Ehrlichia sp. HF, ECH E. chaffeensis Arkansas, EMU E. muris subsp. muris AS145, EmCRT E. muris subsp. eauclairensis Wisconsin.
² Proteins specific to Ehrlichia sp. HF are based on 4-way comparison analysis among four Ehrlichia spp. by Blastp (E < 1e-10)
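
Fig. 3 and Table 3 rest on reciprocal BLASTP comparisons with an E-value cutoff of 1e-10. The article does not spell out how hits were paired, so the following is only a sketch of one common interpretation, reciprocal best hits, and should not be read as the authors' method. It assumes each BLASTP result has already been parsed into `(query, subject, evalue, bitscore)` tuples; the function names and the protein identifiers in the toy data are hypothetical.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to its highest-bitscore subject with evalue < max_evalue."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue  # hit fails the E-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data with made-up identifiers: genome A searched against B, and B against A.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-20, 90.0),
    ("a2", "b2", 1e-5, 40.0),   # filtered out by the E-value cutoff
    ("a3", "b3", 1e-30, 120.0),
]
b_vs_a = [
    ("b1", "a1", 1e-48, 198.0),
    ("b2", "a1", 1e-20, 90.0),  # b2's best hit is a1, but a1 prefers b1
    ("b3", "a3", 1e-30, 120.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Under this interpretation, a four-way core set would be the proteins of one genome that have a reciprocal partner in each of the other three, and proteins with no partner in any other genome would be counted as species-specific.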
intermediary metabolism (Table 3) and partially lacks genes for glycerophospholipid biosynthesis, rendering this bacterium dependent on the host for its nutritional needs, like E. chaffeensis [90, 91].

Ehrlichia species, including the HF strain and E. chaffeensis, are deficient in biosynthesis pathways of typical pathogen-associated molecular patterns (PAMPs), including lipopolysaccharide, peptidoglycan, common pili, and flagella. Nevertheless, both E. chaffeensis and Ehrlichia sp. HF induce acute and/or chronic inflammatory cytokine production in a MyD88-dependent, but Toll-like receptor (TLR)-independent manner [92–94]. Similar to acute severe cases of HME, Ehrlichia sp. HF causes an acute toxic shock-like syndrome in mice involving many inflammatory factors and kills mice within 10 days [56, 61, 66, 67], suggesting that Ehrlichia species have unique, yet to be identified inflammatory molecules.

Two-component regulatory systems

A two-component regulatory system (TCS) is a bacterial signal transduction system, generally composed of a sensor histidine kinase and a cognate response regulator, which allows bacteria to sense and respond rapidly to environmental changes [95]. Our previous studies showed that E. chaffeensis encodes three pairs of TCSs, including CckA/CtrA, PleC/PleD, and NtrX/NtrY, and that the histidine kinase activities were required for bacterial infection [85, 86]. Analysis showed that all three histidine kinases were identified in four species of Ehrlichia including Ehrlichia sp. HF (Table 4). However, the histidine kinase gene cckA of E. muris subsp. muris AS145 was annotated as a pseudogene due to an internal frameshift (Table 4). Since CckA regulates the critical biphasic developmental cycle of Ehrlichia, which converts between the infectious compact dense-cored cell (DC) and the replicative larger reticulate cell (RC) form [85], the mutation of cckA in E. muris AS145 needs to be further validated to rule out sequencing error in a homopolymeric tract.

Ehrlichia Outer Membrane Proteins (Omps)

Ehrlichia spp. encode 14–23 tandemly-arrayed paralogous Omp-1/P28 major outer membrane family proteins in a >26 kb genomic region [52, 93, 96–98]. This polymorphic multigene family is located downstream of tr1, a putative transcription factor, and upstream of the secA gene [97]. Compensating for incomplete metabolic pathways, the major outer membrane proteins P28 and Omp-1F of E. chaffeensis possess porin activities for nutrient uptake from the host, which allow the passive diffusion of L-glutamine, the monosaccharides arabinose and glucose, the disaccharide sucrose, and even the tetrasaccharide stachyose as determined by a proteoliposome swelling assay [99]. The Ehrlichia sp. HF genome has 23 paralogous omp-1/p28 family genes, named omp-1.1 to omp-1.23 (Fig. 4), which are similarly flanked by the tr1 and secA genes. Compared with the E. chaffeensis Omp-1/P28 proteins by the best matches from BLASTP search, the HF genome lacks orthologs of E. chaffeensis Omp-1Z, C, D, F, and P28-2, but has duplicated Omp-1H and 6 copies of Omp-1E (Fig. 4). Since P28 and OMP-1F of E. chaffeensis showed different solute diffusion rates [99], the divergence of the Ehrlichia sp. HF Omp-1 protein family could affect the effectiveness of nutrient acquisition by these bacteria.

Gram-negative bacteria encode a conserved outer membrane protein Omp85 (or YaeT) for outer membrane protein assembly [100, 101], and a molecular chaperone OmpH that interacts with unfolded proteins as they emerge in the periplasm from the Sec translocation machinery [102, 103]. The outer membrane lipoprotein OmpA of E. chaffeensis is highly expressed [104–106], and OmpA family proteins in other Gram-negative bacteria are well characterized for their roles in porin functions, bacterial pathogenesis, and immunity [107]. All three outer membrane proteins were identified in Ehrlichia sp. HF, and are highly conserved in these Ehrlichia spp. (Table 4), suggesting their essential roles in bacterial infection and survival.

Our previous studies showed that E. chaffeensis uses its outer membrane invasin EtpE to bind host cell receptor DNase X, and regulates signaling pathways required for entry and concomitant blockade of reactive oxygen species production for successful infection of host monocytes [108–111]. Analysis showed that the homologs of EtpE were present in Ehrlichia sp. HF as well as other Ehrlichia (Table 4), suggesting these bacteria might use similar mechanisms for entry and infection of their host cells.

Protein secretion systems

Ehrlichia sp. HF encodes all major components for the Sec-dependent protein export system to secrete proteins across the membranes. In addition, intracellular bacteria often secrete effector molecules into host cells via Sec-independent pathways, which regulate host cell physiological processes, thus enhancing bacterial survival and/or causing diseases [112]. Analysis of the Ehrlichia sp. HF genome identifies the Sec-independent Type I secretion system (T1SS), which can transport target proteins with a C-terminal secretion signal across both inner and outer membranes into the extracellular medium, and the twin-arginine dependent translocation (TAT) pathway, which can transport folded proteins across the bacterial cytoplasmic membrane by recognizing N-terminal signal peptides harboring a distinctive twin-arginine motif (Table 4) [113].
Table 4 Potential pathogenic genes in Ehrlichia sp. HF, E. chaffeensis, E. muris subsp. muris, and E. muris subsp. eauclairensis

Organisms¹                                   EHF           ECH           EMU            EmCRT
Outer Membrane Proteins:
  Omp-1/P28 family proteins                  23            22            20             20
  Omp85                                      +             +             +              +
  OmpH                                       +             +             +              +
  OmpA family protein                        +             +             +              +
  EtpE                                       +             +             +              +
Type IV Secretion System:
  VirB1/B5                                   -             -             -              -
  VirB2                                      + (5)         + (4)         + (4)          + (5)
  VirB3                                      +             +             +              +
  VirB4                                      + (2)         + (2)         + (2)          + (2)
  VirB6                                      + (4)         + (4)         + (4)          + (4)
  VirB7                                      +             +             +              +
  VirB8                                      + (2)         + (2)         + (2)          + (2)
  VirB9                                      + (2)         + (2)         + (2)          + (2)
  VirB10/B11/D4                              +             +             +              +
Putative T4SS Effectors:
  Etf-1                                      +             +             +              +
  Etf-2²                                     ±             +             ±              ±
  Etf-3                                      +             +             +              +
Type I Secretion System³                     +             +             +              +
Twin-arginine Translocation (TAT) Pathway⁴   +             +             +              +
TRP Proteins:
  TRP32                                      + (94 aa)     + (198 aa)    + (112 aa)     + (105 aa)
  TRP47                                      + (255 aa)    + (316 aa)    + (228 aa)     + (252 aa)
  TRP120                                     + (584 aa)    + (548 aa)    + (1288 aa)    +⁵
Ankyrin-repeat domain proteins               5             5             5              5
Two-Component Regulatory Systems:
  PleC/PleD                                  +             +             +              +
  NtrY/NtrX                                  +             +             +              +
  CckA/CtrA⁶                                 +             +             ±              +

¹ Abbreviations: EHF, Ehrlichia sp. HF; ECH, E. chaffeensis Arkansas; EMU, E. muris subsp. muris AS145; EmCRT, E. muris subsp. eauclairensis Wisconsin. Numbers inside parentheses indicate the copy number of the gene; otherwise, only a single copy exists. +, gene present; -, homolog of the gene not identified based on Blast searches.
² In addition to Etf-2 (ECH_0261, 264 aa), E. chaffeensis encodes six paralogs of Etf-2 with protein sizes ranging from 190 ~ 350 AA (ECH_0243, 293 aa; ECH_0246, 285 aa; ECH_0247, 316 aa; ECH_0253, 189 aa; ECH_0255, 352 aa; and ECH_0257, 226 aa). However, only low homologies (26 ~ 32% AA sequence identity) to E. chaffeensis Etf-2 were identified in other Ehrlichia spp. (indicated by ±).
³ The Type I Secretion System consists of an outer membrane channel protein TolC, a membrane fusion protein HlyD, and an ATPase HlyB. All are present in these Ehrlichia spp.
⁴ Both twin-arginine translocase subunits TatA and TatC were identified in all Ehrlichia spp.
⁵ Tblastn search indicates that the homolog of E. chaffeensis TRP120 in E. muris subsp. eauclairensis Wisconsin is split into two pseudogenes (EMUCRT_0995 and EMUCRT_0731) present in two separate contigs (NZ_LANU01000002 and NZ_LANU01000003) of the incomplete genome sequences.
⁶ The gene encoding the CtrA protein was identified in the E. muris subsp. muris AS145 genome. However, the cckA gene is annotated as a pseudogene due to an internal deletion, causing a frameshift at 1,123 bp.

The Type IV secretion system (T4SS) is a protein secretion system of Gram-negative bacteria that can translocate bacterial effector molecules into host cells and plays a key role in pathogen-host interactions [90, 114]. Except for VirB1 and VirB5, all key components of the T4SS apparatus were identified in Ehrlichia sp. HF, similar to those of E. chaffeensis (Table 4). The minor pilus subunit VirB5 is absent in all Rickettsiales [115]. VirB1, which is involved in murein degradation, is not present in Ehrlichia spp., likely due to the lack of peptidoglycan.
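For readers who want to query Table 4 programmatically, the sketch below records the copy numbers of the Omp-1/P28 family, the T4SS apparatus components, and the ankyrin-repeat proteins, and lists the families whose copy number differs among the four genomes. The dictionary layout, the helper `same`, and the function name are assumptions made for illustration; entries shown only as "+" in the table are recorded as a single copy, as footnote 1 states.

```python
GENOMES = ("EHF", "ECH", "EMU", "EmCRT")


def same(n):
    """Hypothetical helper: a row with the same copy number in every genome."""
    return {genome: n for genome in GENOMES}


# Copy numbers transcribed from Table 4 ("+" without parentheses = 1 copy).
TABLE4_COPIES = {
    "Omp-1/P28": {"EHF": 23, "ECH": 22, "EMU": 20, "EmCRT": 20},
    "VirB2": {"EHF": 5, "ECH": 4, "EMU": 4, "EmCRT": 5},
    "VirB3": same(1),
    "VirB4": same(2),
    "VirB6": same(4),
    "VirB7": same(1),
    "VirB8": same(2),
    "VirB9": same(2),
    "Ankyrin-repeat proteins": same(5),
}


def variable_families(copies):
    """Return the families whose copy number is not identical in all genomes."""
    return sorted(
        name for name, row in copies.items() if len(set(row.values())) > 1
    )


print(variable_families(TABLE4_COPIES))
```

With the values as transcribed, only the Omp-1/P28 family and VirB2 vary in copy number, which matches the emphasis the text places on these two paralogous families.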
Fig. 4 Gene structures of Omp-1/P28 family outer membrane proteins. E. chaffeensis Arkansas encodes 22 copies of Omp-1/P28 major outer membrane proteins clustered in tandem. Ehrlichia sp. HF encodes 23 copies, which are named Omp-1.1 to Omp-1.23 consecutively. However, it lacks homologs to E. chaffeensis Omp-1Z, C, D, F, and P28-2, but has duplicated Omp-1H and 6 copies of Omp-1E (based on best Blastp matches to E. chaffeensis Omp-1/P28 proteins). Note: omp-1.1 of Ehrlichia sp. HF (EHF_0067, ortholog of E. chaffeensis omp-1m) was initially annotated as a pseudogene by the NCBI automated annotation pipeline. A new start site was determined based on homology to E. chaffeensis omp-1m. Grey bars indicate non-omp-1 genes within Ehrlichia omp-1/p28 gene clusters.

These virB/D genes encoding the T4SS apparatus are split into three major operons as well as single genes in three separate loci that encode VirB7 and duplicated VirB8/9 proteins (Table 4 and Fig. S2). Genes encoding VirB4 are also duplicated, and are clustered with multiple paralogs of virB2 and virB6 genes (Table 4 and Fig. S2). Ehrlichia sp. HF encodes four tandem functionally uncharacterized VirB6-like paralogs (800 – 1,942 AA), which have increasing masses and are three- to six-fold larger than Agrobacterium tumefaciens VirB6 (~300 AA), with extensions found at both N- and C-terminus [116].

In A. tumefaciens, VirB2 is the major T-pilus component that forms the main body of this extracellular structure, which is believed to initiate cell-cell contact with plant cells prior to the initiation of T-complex transfer [117, 118]. A yeast two-hybrid screen identified interaction partners in Arabidopsis thaliana, suggesting that Agrobacterium VirB2 directly contacts the host cell during the substrate translocation process [114, 119, 120]. Compared to E. chaffeensis and E. muris subsp. muris AS145, which encode four VirB2 paralogs, both Ehrlichia sp. HF and E. muris subsp. eauclairensis encode five VirB2 paralogs at ~120 AA (Table 4 and Fig. 5). Most virB2 genes are clustered in tandem except for virB2-1, which is separated from the rest. VirB2 paralogs are quite divergent and share only 26% identity despite their similar sizes and domain architecture among Rickettsiales [115, 121]. Phylogenetic analysis of VirB2 paralogs in representative Ehrlichia species showed that VirB2-1 proteins are clustered in a separate branch, whereas the rest of the VirB2 paralogs are more divergent (Fig. S3). A. tumefaciens VirB2 undergoes a novel head-to-tail cyclization reaction and polymerizes to form the T-pilus [116], and mature VirB2 integrates into the cytoplasmic membrane via two hydrophobic α-helices [122, 123]. Analysis of Ehrlichia sp. HF VirB2-4 showed that it possesses a signal peptide (cleavage site between residues 29 and 30) and two hydrophobic transmembrane α-helices (Fig. S4A). Alignment of these VirB2 paralogs showed that the two hydrophobic α-helices are completely conserved, although the paralogs are more divergent at the N- and C-terminus (Fig. S4B), suggesting that Ehrlichia VirB2s could form the secretion channels for mature T4SS pili as in Agrobacterium [121]. Our previous study confirmed that VirB2 is expressed on the surface of a closely related bacterium, Neorickettsia risticii [124].

Fig. 5 Gene structures of Ehrlichia VirB2 paralogs. All Ehrlichia spp. encode 3 - 5 copies of VirB2 paralogs. Ehrlichia sp. HF and E. muris subsp. eauclairensis encode five VirB2 paralogs at ~120 AA, whereas E. chaffeensis and E. muris subsp. muris AS145 encode four VirB2 paralogs. E. canis encodes only 3 copies of VirB2 paralogs, and E. ruminantium encodes 4 copies with larger gaps between the virB2 paralogs. Except for E. ruminantium, most virB2 genes are clustered in tandem with virB2-1 separated from the rest.
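Percent identity figures such as the 26% quoted above for VirB2 paralogs are derived from pairwise alignments. A minimal sketch of that calculation follows, assuming two sequences that have already been aligned to equal length with "-" as the gap character. The choice of denominator (all columns except those where both sequences have a gap) is one common convention and an assumption here; the text does not state which convention was used, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Columns where both sequences have a gap are ignored; columns with a gap
    in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences, not real VirB2 paralogs).
print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))
```

Because different tools normalize by alignment length, shorter-sequence length, or ungapped columns, identity values are only comparable when the same convention is applied to every pair.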
Studies indicated that VirB2 paralogs of Anaplasma phagocytophilum are differentially expressed in tick and mammalian cells [125], and an outer membrane vaccine of Anaplasma marginale containing VirB2 can protect against the disease and persistent infection [126, 127]. Therefore, the expression of VirB2 paralogs could be specific to the host environment, and their highly divergent C-terminus may offer antigenic variations for protection from host adaptive immunity.

Putative T4SS Effectors
In contrast to other intracellular pathogens with enormous numbers of effectors (e.g., Legionella pneumophila), E. chaffeensis encodes far fewer but versatile effectors [128]. Three E. chaffeensis T4SS effectors have been experimentally characterized, namely Ehrlichia translocated factor (Etf)-1, -2, and -3 [129]. These T4SS effectors are essential for infection of host cells, through inhibition of host apoptosis by Etf-1 [129], acquisition of host nutrients by Etf-1-induced autophagosomal pathways [90], or maintenance of the bacterial replication compartments by Etf-2-mediated inhibition of endosome maturation [130]. Homologs of Etf-1 and Etf-3 were identified in all Ehrlichia spp., and they are highly conserved with percent protein identities over 77% and 85%, respectively. Etf-2 proteins are more divergent among Ehrlichia spp., and E. chaffeensis encodes six paralogs of Etf-2 with protein lengths ranging from 190 ~ 350 AA; however, only low homologies (26 ~ 32% protein identity) to E. chaffeensis Etf-2 were identified in Ehrlichia sp. HF and other Ehrlichia species (Table 4). Whether these proteins contain a T4SS motif and can be secreted into the host cell cytoplasm remains to be studied.

Ankyrin-repeat containing proteins
Ankyrin-repeats (Ank) are structural repeating motifs that consist of 33 AA with two anti-parallel α-helices connected to the next repeat via a loop region [131]. Ank proteins are more common in eukaryotes, where they mediate protein–protein interactions involved in a multitude of host processes including cytoskeletal motility, tumor suppression, and transcriptional regulation [131]. AnkA of A. phagocytophilum is one of a few known T4SS effectors; it can be translocated into the host cells and tyrosine-phosphorylated, and plays an important role in facilitating intracellular infection by regulating host signaling pathways [132–134]. A. phagocytophilum AnkA can also be translocated to the cell nucleus and bind to transcriptional regulatory regions of the CYBB locus to suppress the host-cell innate immune response [135, 136]. The AnkA homolog in E. chaffeensis, Ank200, also contains tyrosine kinase phosphorylation sites and can be tyrosine-phosphorylated in the infected host cells [137, 138]. E. chaffeensis Ank200 interacts with Alu-Sx elements to regulate several genes associated with ehrlichial pathobiology [139]. A homolog of E. chaffeensis Ank200 was identified in Ehrlichia sp. HF (EHF_0607), which also contains two putative tyrosine kinase phosphorylation sites and SH3 domains in addition to 14 Ank repeats [133] (Fig. 6). Our analysis identified four additional Ank-repeat containing proteins in these representative Ehrlichia spp. (Table 4). In Ehrlichia sp. HF, these proteins range from ~150 to over 3,000 AA in length and contain 2 - 14 copies of Ank repeats (Fig. 6).

Fig. 6 Domain structures of Ankyrin-repeat containing proteins in Ehrlichia sp. HF. Ehrlichia sp. HF encodes 5 Ank-repeat containing proteins, including the E. chaffeensis Ank200 homolog (EHF_0607). Ank-repeat domains were determined by the NCBI Conserved Domains Database (CDD, https://www.ncbi.nlm.nih.gov/Structure/cdd) [140, 141], and eukaryotic phosphorylation sites were determined by Scansite 4.0 (https://scansite4.mit.edu/) [142]. In addition to 14 Ank repeats, Ank200 (EHF_0607) contains two tyrosine kinase phosphorylation sites (red bars), two SH3 domains (blue), and one Ser/Thr kinase site (green). Domain abbreviations: Ank, Ankyrin repeat; DUF5401, family of unknown function initially found in Chromadorea like Caenorhabditis elegans; NtpI, Archaeal/vacuolar-type H+-ATPase subunit I/STV1.

It remains to be elucidated if any of the ankyrin repeat-containing proteins in Ehrlichia sp. HF can be secreted, and whether these proteins
regulate host cell signaling to benefit intracellular ehrlichial infection.

Tandem-repeat containing proteins (TRPs)
Using a heterologous Escherichia coli T1SS apparatus, studies have identified four potential E. chaffeensis T1SS effectors, including the ankyrin-repeat containing protein Ank200 and three tandem-repeat containing proteins (TRPs), TRP47, TRP120, and TRP32 [138]. TRP120 protein also contains a motif that is rich in glycine and aspartate and relates to the repeats-in-toxins (RTX) family of exoproteins [93, 138, 143]. Our current analysis identified homologs of E. chaffeensis TRP proteins in Ehrlichia sp. HF and other representative Ehrlichia spp. (Table 4). In E. chaffeensis, all three TRP proteins contain various numbers of tandem repeats with repeat lengths ranging from 19 ~ 80 AA. However, bioinformatic analysis of TRP homologs in Ehrlichia spp. indicated that these proteins are highly variable, and the length and numbers of repeats are different among all Ehrlichia spp. (Fig. 7, Table S4). Unlike E. chaffeensis TRP32 and TRP47, no repeats or variable-length PCR target (VLPT) domains were detected in their homologs in Ehrlichia sp. HF and other Ehrlichia spp. (Fig. 7a - b). Interestingly, the TRP120 homolog of Ehrlichia sp. HF has longer tandem repeats (100 AA), whereas that of E. muris AS145 is a very large protein of 1,288 AA with over 12 repeats that are highly enriched in glutamic acid (Fig. 7c, Table S4). A TRP120 homolog is also identified in E. muris subsp. eauclairensis, where it is split into two ORFs in two separate contigs of the incomplete genome sequences, and has a total of ~11 repeats (Fig. 7c, Table S4). Previous studies have indicated that E. chaffeensis TRP proteins are highly immunogenic in infected patients and animals [144], and could play important roles in host–pathogen interactions [143, 145–152]. Our recent study using Himar1 transposon mutagenesis of Ehrlichia sp. HF recovered a mutant with an insertion within the TRP120 gene from DH82 cells, indicating that TRP120 is not essential for survival and infection of Ehrlichia sp. HF in DH82 cells [56]. As targeted mutagenesis of Ehrlichia is still unavailable, future studies using the cloned TRP120 mutant will benefit functional analysis of TRP120. In addition, it remains to be studied if any of the TRPs of Ehrlichia sp. HF can be secreted by the T1SS, and whether these proteins regulate host cell signaling to benefit intracellular ehrlichial infection or pathogenicity.

Ehrlichia sp. HF is a new Ehrlichia species based on genome and proteome phylogenetic analysis
To classify Ehrlichia sp. HF in the genus Ehrlichia, we conducted phylogenetic analyses of Ehrlichia sp. HF by using nucleotide-based core genome alignment of Ehrlichia sp. HF and 6 representative Ehrlichia species, subspecies, and strains by using three different parameters: (1) average nucleotide identity (ANI) [153], (2) digital DNA-DNA hybridization (dDDH) [154], and (3) core genome alignment sequence identity (CGASI) [155]. ANI values are calculated by first splitting the genome of one organism into 1 kbp fragments, which are then searched against the genome of the other organism. ANI is then calculated by taking the average sequence identity of all matches spanning >70% of their length with >60% sequence identity [153]. dDDH values are calculated by using the sequence similarity of conserved regions between two genomes and taking the sum of all identities found in matches divided by the overall match length [154]. CGASI values between genomes are calculated by generating a core genome alignment, consisting of all positions present in all analyzed genomes, and calculating the sequence identities between them [155].

Using the core genome alignment used to calculate CGASI values, the maximum-likelihood phylogenetic tree of the seven recognized species in the genus Ehrlichia showed that Ehrlichia sp. HF is a sister taxon to E. muris (Fig. 8a), being most closely related to E. muris subsp. muris AS145. However, between the two genomes, ANI, dDDH, and CGASI values are 91.8%, 43.2%, and 95.7%, respectively, all below the species cutoffs (95%, 70%, and 96.8%, respectively) [155]. Additionally, the current species designations for these 7 Ehrlichia genomes are supported by all three parameters, with the exception of the two subspecies in E. muris (Fig. 8b).

Similar results are observed in phylogenetic analyses based on the 16S rRNA sequences (Fig. S5A) or eight concatenated protein sequences (3,188 AA total) consisting of five conserved housekeeping proteins (TyrB/Mdh/Adk/FumC/GroEL) and three more divergent surface proteins like major outer membrane or T4SS apparatus proteins (P28/VirB2-1/VirB6-1) (Fig. S5B). However, the nodes on the phylogenetic tree generated using the core nucleotide alignment consistently have higher bootstrap support values than those of 16S rRNAs or concatenated proteins (Fig. 8a and S5). Based on these analyses, we propose the following new classification of Ehrlichia sp. HF.

Description of Ehrlichia japonica sp. nov. (japonica, N.L. fem. adj. japonica from Japan)
The distances observed between Ehrlichia sp. HF and other Ehrlichia species by whole genome sequence-based phylogenetic analysis indicate that Ehrlichia sp. HF represents a new species in the genus Ehrlichia. This species is therefore named Ehrlichia japonica sp. nov. to denote the geographic region where this bacterium was initially isolated. The type strain, HFT, was named
after the scientist Hiromi Fujita who first discovered and isolated this bacterium [5].

To date, E. japonica has been found exclusively in various species of Ixodes ticks in Japan, France, Serbia, and Romania. This species is highly pathogenic to mice. E. japonica can be distinguished from other Ehrlichia species by PCR of 16S rRNA using the Ehrlichia sp. HF-specific primer pair HF51f/HF954r (923 bp target size, Table S5, Fig. S6) [156]. E. japonica HFT can be stably cultured in DH82 cells, and is available from BEI Resources (Deposit ID# NR-46450, Manassas, VA) and Collection de Souches de l’Unité des Rickettsies (CSUR Q1926, Marseille, France).

Fig. 7 Analysis of Ehrlichia tandem repeat proteins TRP-32/47/120. TRP homologs of E. chaffeensis Arkansas were first identified using BLASTP among Ehrlichia spp., and the internal repeats were determined by XSTREAM (https://amnewmanlab.stanford.edu/xstream/). Colored boxes indicate different repeat sequences and lengths, and are drawn to scale with the protein lengths. TRP proteins are highly variable, and the length and numbers of repeats are different among all Ehrlichia spp. a E. chaffeensis TRP32 (ECH_0170) protein (or variable length PCR target/VLPT, 198 AA) contains 4 consecutive VLPT repeats (30-AA). However, no repeats or VLPT domains were detected in Ehrlichia sp. HF (EHF_0893/EHF_RS04015, only 90 AA with 45% identity matched to the C-terminus of ECH_0170), E. muris subsp. eauclairensis (EMUCRT_RS02860, 105 AA), and E. muris subsp. muris (EMUR_00520/MR76_RS00500, 112 AA). b E. chaffeensis TRP47 protein (ECH_0166, 316 AA) contains eight consecutive 19-AA repeats at its C-terminus. The TRP47 homolog in Ehrlichia sp. HF (EHF_0897/EHF_RS04625, annotation revised based on TBLASTN against the Ehrlichia sp. HF genome) encodes a smaller protein (255 AA) with 40% identity, mostly conserved in the N-terminus. However, no repeat sequences were identified in TRP47 homologs in Ehrlichia sp. HF, E. muris subsp. eauclairensis (EMUCRT_0637/EMUCRT_RS04575, 252 AA), and E. muris subsp. muris (EMUR_00500/MR76_RS04630, 228 AA). c E. chaffeensis TRP120 protein (ECH_0039, 548 AA) contains 4⅓ consecutive 80-AA repeats. The TRP120 homolog in Ehrlichia sp. HF (EHF_0897/EHF_RS04625, 584 AA) contains 4¼ consecutive 100-AA repeats. A much larger protein was identified in E. muris subsp. muris AS145 (EMUR_0035/MR76_RS00035, 1,288 AA) with 12⅓ repeats (8 repeats of 67-AA length and 4⅓ repeats of 56-AA length). Two ORFs (EMUCRT_0995 and EMUCRT_0731) in E. muris subsp. eauclairensis that match E. chaffeensis TRP120 at the N- and C-terminus, respectively, were identified in two contigs (NZ_LANU01000002 and NZ_LANU01000003) of the incomplete genome sequences. Nine repeats of 65-AA length were identified in both proteins, whereas two shorter repeats of 38-AA length were found in EMUCRT_0995 only.

Conclusions
By comparing with closely related Ehrlichia spp., this study indicates that the genome of Ehrlichia sp. HF encodes homologs of all virulence factors of E. chaffeensis required to infect host cells, including outer membrane proteins, protein secretion systems and effectors, supporting that this species can serve as a model bacterium to study in vivo pathogenesis and immune responses for fatal ehrlichiosis. Whole genome alignment and phylogenetic analyses indicate that Ehrlichia sp. HF can be classified as a new species in the genus Ehrlichia, and we propose to name it Ehrlichia japonica sp. nov. Availability of this bacterial strain in macrophage cultures and complete
whole genome sequence data will greatly advance ehrlichiosis research, including studies of in vivo virulence factors, therapeutic interventions, and vaccines.

Fig. 8 Phylogenetic analysis and determination of ANI, dDDH, and CGASI values of 7 representative Ehrlichia species. a A maximum-likelihood phylogenetic tree with 1,000 bootstraps was generated using the core genome alignment used to calculate CGASI values. Bootstrap values are indicated next to their respective nodes. b The values of ANI, dDDH, and CGASI are calculated between 7 Ehrlichia genomes and plotted as a heatmap. The respective values for each pairwise comparison are shown in each cell. Colored circles next to each strain name indicate whether each genome belongs to the same species, with circles of the same color indicating genomes that are of the same species according to ANI, dDDH, or CGASI values at or above the species cutoffs of 95%, 70%, and 96.8%, respectively. Abbreviations and GenBank Accession numbers: EHF, Ehrlichia sp. HF (NZ_CP007474.1); EchA, E. chaffeensis Arkansas (NC_007799.1); EmuA, E. muris subsp. muris AS145 (NC_023063.1); EmuW, E. muris subsp. eauclairensis Wisconsin (NZ_LANU01000001, NZ_LANU01000002, and NZ_LANU01000003); EcaJ, E. canis Jake (NC_007354.1); EruW, E. ruminantium Welgevonden (NC_005295.2); EruG, E. ruminantium Gardel (NC_006831.1).

Methods
Culture isolation of Ehrlichia sp. HF
Two C57BL/6 mice (Envigo, Indianapolis, IN) were intraperitoneally inoculated with mouse spleen homogenates containing Ehrlichia sp. HF in RPMI-1640 (Mediatech, Manassas, VA) freezing medium containing 20% fetal bovine serum (FBS; Atlanta Biologicals, Lawrenceville, GA) and 10% DMSO (Millipore Sigma, Burlington, MA), which were stored in liquid nitrogen at approximately 0.35 ml, equivalent to ½ of an infected spleen. Clinical signs and body weight were monitored daily. Moribund mice at 8 days post inoculation were euthanized by CO2 inhalation and cervical dislocation. Blood samples were collected by cardiac puncture, and the buffy coat was separated by centrifugation at 1,000 × g. The presence of Ehrlichia sp. HF in monocytes in the blood smear was confirmed by Diff-Quik staining (Thermo Fisher Scientific, Waltham, MA). The spleen was aseptically excised and a single-cell suspension was prepared in 0.7 ml of RPMI-1640 medium after lysing red blood cells with ammonium chloride. DH82 cells were cultured in DMEM (Dulbecco's modified Eagle's medium; Mediatech) supplemented with 5% FBS and 2 mM L-glutamine (L-Gln; GIBCO, Waltham, MA) at 37°C under 5% CO2 in a humidified atmosphere as described previously [157]. RF/6A cells (ATCC) were cultured in advanced minimal essential medium (AMEM, Gibco) supplemented with 5% FBS and 2 mM L-glutamine. The ISE6 cell line, derived from the Ixodes scapularis tick embryo, was cultured in L15C300 medium at 34°C as described previously [158]. Half of the buffy coat cells and spleen cell suspension from one mouse were overlaid on DH82 and RF/6A cells in their respective culture media, and cultured with the addition of 0.1 μg/mL cycloheximide (Millipore Sigma). To assess the degree of Ehrlichia infection in host cells, a drop of infected cells was centrifuged onto a slide in a Shandon Cytospin 4 cytocentrifuge (Thermo Fisher), and the presence of Ehrlichia-containing inclusions was examined in both cell types by Diff-Quik staining every 3 – 4 days. Ehrlichia sp. HF was continuously passaged in DH82 cells with the addition of 0.1 μg/mL cycloheximide.

Culture and purification of host cell-free Ehrlichia sp. HF and bacterial genomic DNA
Twelve T175 flasks of Ehrlichia sp. HF-infected DH82 cells (>80% infectivity) at 3 d post infection (pi) were homogenized in 30 ml of 1× SPK buffer (0.2 M sucrose and 0.05 M potassium phosphate, pH 7.4) 30 times with a type A tight-fitting pestle in a Dounce homogenizer (Wheaton, Millville, NJ). After centrifugation at 700 × g (Sorvall 6000D, Thermo Fisher), the pellet was further homogenized an additional 30 times. Homogenates were combined and step-wise centrifuged at 700, 1,000, and 1,500 × g for 10 min without using the brake function of the centrifuge to avoid disturbing the loosely-packed pellets, then passed through 5.0- and 2.7-μm filters, and centrifuged at 10,000 × g for 10 min (Sorvall RC 5C Plus using SS-34 rotor). The purity of bacteria was determined by Diff-Quik staining (Fig. S6A). Genomic DNA samples were prepared using Qiagen genomic tips (Qiagen, Germantown, MD) according to the manufacturer’s instructions, and resuspended in TE buffer. The quantity and quality of genomic DNA were determined by Nanodrop (8.41 μg total DNA; Thermo Fisher) as well as 0.9% agarose gel electrophoresis with BioLine markers (Fig. S6B). The purity of bacterial genomic DNA was confirmed by PCR and agarose gel electrophoresis using specific primers targeting the Ehrlichia sp. HF 16S rRNA gene (HF51f/HF954r) and canine G3PDH DNA (Table S5 and Fig. S6) [156, 159]. The contamination of host DNA was estimated to be satisfactorily low for shotgun sequencing to obtain the complete genome sequence (Fig. S6B).

Sequencing and annotation
Indexed Illumina mate pair libraries were prepared following the mate pair library v2 sample preparation guide (Illumina, San Diego, CA), with two modifications. First, the shearing was performed with the Covaris E210 (Covaris, Woburn, MA) using the following conditions: duty cycle, 10; time, 120 sec; intensity, 4; and cycles per burst, 200. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XT beads (Beckman Coulter Genomics, Danvers, MA).

Paired-end genomic DNA libraries for sequencing using the Illumina platform were constructed using the KAPA library preparation kit (Kapa Biosystems, Woburn, MA). DNA was fragmented with the Covaris E210 and the libraries were prepared using a modified version of the manufacturer’s protocol. The DNA was purified between enzymatic reactions and the size selection of the library was performed with AMPure XT beads (Beckman Coulter Genomics), using 33.3 μl beads for 50 μl purified ligation product. For indexed samples, the PCR amplification step was performed with primers containing a six-nucleotide index sequence.

Concentration and fragment size of libraries were determined using the DNA High Sensitivity Assay on the LabChip GX (Perkin Elmer, Waltham, MA) and qPCR using the KAPA Library Quantification Kit (Complete, Universal) (Kapa Biosystems, Woburn, MA). The mate pair libraries were sequenced on an Illumina HiSeq 2500 (Illumina), producing 23.8 M reads (4.8G bases), while the paired-end libraries were sequenced on an Illumina MiSeq (Illumina), producing 1.6 M reads (826.2M bases).

DNA samples for PacBio sequencing were sheared to 8 kbp using the Covaris gTube (Woburn, MA). Sequencing libraries were constructed and prepared for sequencing using the SMRTbell Express Template Prep Kit 2.0 (3kbp - 10kbp) and the DNA/Polymerase Binding Kit 2.0 (Pacific Biosciences, Menlo Park, CA). Libraries were loaded onto v2 SMRT Cells, and sequenced with the DNA Sequencing Kit 2.0 (Pacific Biosciences), producing 81,741 reads (388.8M bases). All sequence reads were deposited at NCBI Sequence Read Archive (SRA, BioProject accession number PRJNA187357).

Five assemblies were generated with various combinations of the data and assembly algorithms: (1) Celera Assembler v7.0 of only PacBio data, (2) Celera Assembler v7.0 of PacBio data with correction using Illumina paired-end data, (3) HGAP assembly of only PacBio data, (4) MaSuRCA 1.9.2 assembly of Illumina paired-end data subsampled to 50× coverage, and (5) MaSuRCA 1.9.2 assembly of Illumina paired-end data subsampled to 80× coverage. The first assembly was the optimal assembly, namely the one generated with Celera Assembler v7.0 with only the PacBio data. The data set was subsampled to ~22× coverage of the longest reads using an 8 Kbp minimum read length cutoff, with the remainder of the reads used for the error correction step. The resulting single-contig assembly totaled ~89.4 Kbp with 41.68% GC-content. The genome was trimmed to remove overlapping sequences, oriented, circularized, and rotated to the predicted origin of replication.
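The coverage figures in the assembly description can be related to the reported sequencing yields with simple arithmetic. The sketch below is illustrative only: it assumes that every sequenced base derives from the 1,148,904 bp Ehrlichia sp. HF chromosome, so the depths it prints are upper bounds (residual host DNA would lower them), and the function names are hypothetical rather than taken from any assembler.

```python
GENOME_BP = 1_148_904  # chromosome length reported for Ehrlichia sp. HF


def mean_coverage(total_bases, genome_bp=GENOME_BP):
    """Average sequencing depth if every base mapped to the genome."""
    return total_bases / genome_bp


def subsample_fraction(total_bases, target_coverage, genome_bp=GENOME_BP):
    """Fraction of reads to keep to reach the target depth (capped at 1.0)."""
    return min(1.0, target_coverage * genome_bp / total_bases)


miseq_bases = 826.2e6   # Illumina paired-end (MiSeq) yield reported above
pacbio_bases = 388.8e6  # PacBio yield reported above

print(f"MiSeq depth (upper bound):  {mean_coverage(miseq_bases):.0f}x")
print(f"PacBio depth (upper bound): {mean_coverage(pacbio_bases):.0f}x")
for target in (50, 80):
    fraction = subsample_fraction(miseq_bases, target)
    print(f"Keep {fraction:.1%} of MiSeq reads for {target}x")
```

Under these assumptions both data sets far exceed the 50× and 80× targets, which is consistent with the Illumina reads being subsampled before the MaSuRCA assemblies.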