GBE A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus)

Page created by Jared Carr
 
CONTINUE READING
GBE A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus)
GBE

A Chromosome-Level Genome Assembly of the Reed Warbler
(Acrocephalus scirpaceus)
Camilla Lo Cascio Sætre1,*, Fabrice Eroukhmanoff1, Katja Rönk€a2,3, Edward Kluen2,3, Rose Thorogood2,3,
James Torrance4, Alan Tracey4, William Chow4, Sarah Pelan4, Kerstin Howe4, Kjetill S. Jakobsen1, and
Ole K. Tørresen1

1
 Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway

                                                                                                                                                                                           Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
2
 HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland
3
 Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
4
 Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom

*Corresponding author: E-mail: c.l.c.satre@ibv.uio.no.
Accepted: 6 September 2021

Abstract
The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species
has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in
the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain
areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed
warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10, and Hi-C sequencing. The
genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp.
BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence
of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between
chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a
BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource,
and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations
to climate change, as well as chromosomal rearrangements in birds.
Key words: genome assembly, Hi-C sequencing, long reads, reference genome, Acrocephalus scirpaceus. .

    Significance
    The reed warbler (Acrocephalus scirpaceus) has been lacking a genomic resource, despite having been broadly
    researched in studies of coevolution, ecology, and adaptations to climate change. Here, we generated a chromo-
    some-length genome assembly of the reed warbler, and found evidence of macrochromosomal fusions in its genome,
    which are likely of recent origin. This genome will provide the opportunity for a deeper understanding of the evolution
    of genomes in birds, as well as the evolutionary path and possible future of the reed warbler.

Introduction                                                                                    parasitic common cuckoo (Cuculus canorus) (Davies and
The ecology and evolution of the reed warbler (Acrocephalus                                     Brooke 1989; Stokke et al. 2018). Decades of field experiments
scirpaceus) has been of interest for over 40 years (Thorogood et                                have demonstrated behavioral coevolution and spatial and
al. 2019) as it is one of the favorite host species of the brood-                               temporal variation in species interactions (e.g., Thorogood

ß The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse,
distribution, and reproduction in any medium, provided the original work is properly cited.

Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021                                                                                         1
Sætre et al.                                                                                                           GBE

and Davies 2013). However, the reed warbler’s response to              277,617,608 paired-end reads (2150) with 10 Genomics,
climate change has begun to attract increasing attention.              and 185,974,525 paired-end reads (2150) with Hi-C, at
Reed warblers are experiencing far less severe declines in pop-        83 and 56 coverage, respectively. The final genome as-
ulation size than is typical for long-distance migrants (Both et al.   sembly was 1.08 Gb in length, and contains 1,081 contigs
2010; Vickery et al. 2014). In fact, they are expanding their          (contig N50 of 13 Mb) and 200 scaffolds (scaffold N50 of
breeding range northwards into Fennoscandia (J€arvinen and             74 Mb) (table 1). The completeness of the assembled genome
Ulfstrand 1980; Røed 1994; Stolt 1999; Brommer et al.                  is high: of the 8,338 universal avian single-copy orthologs, we
2012), and have generally increased their productivity follow-         identified 7,978 complete BUSCOs (95.7%), including 7,920
ing the rise in temperature (Schaefer et al. 2006; Eglington et        single-copy (95.0%), and 58 duplicated BUSCOs (0.7%).
al. 2015; Meller et al. 2018). They are also showing rapid             Fifty-nine BUSCOs (0.7%) were fragmented, and 301

                                                                                                                                         Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
changes in phenology (Halupka et al. 2008), and migratory              BUSCOs (3.6%) were missing.
behavior; instead of crossing the Sahara, monitoring suggests
that some reed warblers now remain on the Iberian Peninsula
over winter (Chamorro et al. 2019). Morphological traits such          Genome Annotation
as body mass and wing shape have been shown to change                  The GC content of the reed warbler genome assembly was
rapidly in reed warbler populations, indicating possible local         41.9%. The total repeat content of the assembly was
adaptation (Kralj et al. 2010; Salewski et al. 2010; Sætre et          10.94%, with LTR elements as the most common type of
al. 2017). Genetic differentiation is generally low between reed       repeat (4.50%) followed by LINEs (4.11%) (table 1).
warbler populations, but moderate levels of differentiation               Using the Comparative Annotation Toolkit, based on a
have been connected to both migratory behavior (Prochazka             whole-genome multiple alignments from Cactus, we pre-
et al. 2011) and wing shape (Kralj et al. 2010). Reed warblers         dicted 14,645 protein coding genes, with an average
thus provide a promising system to study population, pheno-            Coding DNA Sequence (CDS) length of 1,782 bp, and an av-
typic, and genetic responses to climate change.                        erage intron length of 2,918 bp (table 1). The annotated genes
    Although there has been an increasing number of avian              had 97.5% completeness (based on predicted proteins).
genome assemblies in recent years (e.g., Feng et al. 2020),
many nonmodel species, including the reed warbler, are still
lacking a genome resource. To date, the closest relative to            Synteny Analysis
the reed warbler with a published reference genome is the              The reed warbler genome showed high synteny with the great
great tit (Parus major) (GCA_001522545.3, deposited in                 tit genome, though with some notable differences (fig. 1). The
NCBI; Laine et al. 2016), but the unpublished genome of                reed warbler chromosome 6 is a fusion of great tit chromo-
the garden warbler (Sylvia borin) is available in public data-         somes 7 and 8, and reed warbler chromosome 8 is a fusion of
bases (GCA_014839755.1, deposited in NCBI). There is also              great tit chromosomes 6 and 9. Interestingly, these chromo-
a genome in preprint from the Acrocephalus genus, the                  somes are not fused in the garden warbler genome (supple-
great reed warbler (A. arundinaceus) (Sigeman, Strandh, et             mentary fig. 1, Supplementary Material online), but
al. 2020), but the scaffolds are not chromosome length.                correspond to the great tit chromosomes. This suggests that
    Here, we present the first genome assembly of the reed             the fusions evolved relatively recently, perhaps at the base of
warbler, based on PacBio, 10, and Hi-C sequencing, with               the Acrocephalidae branch within Sylvioidea, but further re-
descriptions of the assembly, manual curation, and annotation.         search is needed to determine this. Hi-C contact maps confirm
This genome will be a valuable resource for a number of studies,       that the chromosomes assembled in the reed warbler genome
including studies of coevolution, population genomics, adaptive        are unbroken (supplementary fig. 2, Supplementary Material
evolution, and comparative genomics. For reduced-                      online). Interchromosomal rearrangements are rare in avian
representation sequencing (e.g., RAD-seq) studies, it will help        evolution (Ellegren 2010; Skinner and Griffin 2012), with
produce a more robust SNP set than with a de novo approach             some exceptions, such as in the orders Falconiformes
(Shafer et al. 2017). It will facilitate the detection of selective    (Damas et al. 2017) and Psittaciformes (Furo et al. 2018). In
sweeps, and provide the physical localization of variants (Manel       fact, in all or most species of Psittaciformes, chicken chromo-
et al. 2016), thus giving insight into the potential genes involved    somes 6 and 7, and 8 and 9 are fused (Furo et al. 2018;
in adaptation. Furthermore, the genome will be an important            Kretschmer et al. 2018)—the same chromosomes involved
resource in the study of chromosomal rearrangements in birds.          in the fusions discovered in the reed warbler genome. We
                                                                       can only speculate about the significance of this without
Results and Discussion                                                 more data. Passeriformes, the sister group of Psittaciformes,
                                                                       exhibit much lower rates of interchromosomal rearrange-
Genome Assembly and Genome Quality Evaluation
                                                                       ments, despite being a large, highly diverse order
We generated 3,810,665 reads with PacBio, with an average              (Kretschmer et al. 2021). There is still a large knowledge gap
read length of 16 kb at 61 coverage. We further obtained              in the cytogenetics of birds (Degrandi et al. 2020), and more

2   Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021
Genome Assembly of the Reed Warbler                                                                                            GBE

Table 1
Summary Statistics of the Reed Warbler Genome Assembly and Annotation
Genome Assembly
                                          Estimated genome size                                     1.13 Gb
                                          Guanine and cytosine content                              41.91%
                                          N50 length (contig)                                       13 Mb
                                          Longest contig                                            48 Mb
                                          Total length of contigs                                   1.07 Gb
                                          N50 length (scaffold)                                     74.44 Mb
                                          Longest scaffold                                          153.80 Mb

                                                                                                                                                  Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
                                          Total length of scaffolds                                 1.08 Gb
                                          Complete BUSCOs                                           95.7%

Transposable Elements                                                                               Percent (%)                    Total length

                                          DNA                                                       0.22                           2.35 Mb
                                          LINE                                                      4.11                           44.2 Mb
                                          SINE                                                      0.09                           0.98 Mb
                                          LTR                                                       4.50                           48.4 Mb
                                          Unknown                                                   0.55                           5.9 Mb
                                          Other (satellites, simple repeats,                        1.49                           16 Mb
                                            low complexity)
                                          Total                                                     10.94                          117.6 Mb

Protein-Coding Genes

                                          Predicted genes                                           14,645
                                          Average coding sequence length                            1,782
                                            (bp)
                                          Average exon length (bp)                                  284
                                          Average intron length (bp)                                2918
                                          Complete BUSCOs                                           97.5%

research is needed to determine the rarity of the fusions we                   Sylvioidea, we found unequivocal evidence of two novel
discovered in the reed warbler genome.                                         macrochromosomal fusions in the reed warbler genome.
   We furthermore confirm the previously identified neo-sex                    Further research is needed to determine the evolutionary
chromosome (Pala et al. 2012; Sigeman, Ponnikas, et al.                        age of these fusions, especially because they are not pre-
2020), a fusion between the ancestral chromosome Z and a                       sent in the garden warbler genome, suggesting they are
part of chromosome 4A (according to chromosome naming                          relatively new. This genome will serve as an important
from the zebra finch). This fusion is thought to have occurred                 resource to increase our knowledge of chromosomal rear-
at the base of the Sylvioidea branch (Pala et al. 2012) and is                 rangements in birds, both their prevalence and their sig-
shared with all species of Sylvioidea studied so far (Sigeman,                 nificance for avian evolution. Furthermore, the genome
Ponnikas, et al. 2020). Figure 1 clearly shows that reed war-                  will, through the identification of genetic variants and in-
bler chromosome Z corresponds to great tit chromosome Z,                       formation of the function of associated genes, provide a
plus a part of great tit chromosome 4A, whereas reed warbler                   deeper insight into the evolution of the reed warbler, a
chromosome Z corresponds to garden warbler chromosome Z                        bird which will continue to fascinate researchers for years
(supplementary fig. 1, Supplementary Material online).                         to come.

Conclusion
                                                                               Materials and Methods
In this study, we present the first assembled and anno-
tated genome for the reed warbler A. scirpaceus. We have                       Sampling and Isolation of Genomic DNA
accomplished this through utilizing long read PacBio se-                       Blood was collected from a brachial vein of a female
quencing, and scaffolding with paired-end 10 and Hi-C                         reed warbler (subspecies A. scirpaceus scirpaceus,
reads. In addition to the previously identified autosome-                      NCBI Taxonomy ID: 126889) in Storminnet, Porvoo
sex chromosome fusion shared by all members of                                 (60 190 24.900 N, 25 350 23.000 E), Finland, on May 22, 2019.

Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021                                                  3
Sætre et al.                                                                                                                            GBE

                                                                                                                                                             Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
     FIG. 1.—Circos plot showing the synteny between the reed warbler (on the right side, denoted with the prefix as [Acrocephalus scirpaceus]) and the
great tit (left side, prefix pm [Parus major]) genome assemblies. The reed warbler chromosome 6 is a fusion of great tit chromosomes 7 and 8, whereas reed
warbler chromosome 8 is a fusion of great tit chromosomes 6 and 9 (see Hi-C contact maps in supplementary fig. 2, Supplementary Material online). The
reed warbler chromosome Z corresponds to great tit chromosome Z, and a part of great tit chromosome 4A.

Catching and sampling procedures complied with the Finnish                      Library Preparation and Sequencing
law on animal experiments and permits were licensed by the                      DNA quality was checked using a combination of a fluoro-
National Animal Experiment Board (ESAVI/3920/2018) and                          metric (Qubit, Invitrogen), UV absorbance (Nanodrop,
Southwest Finland Regional Environment Centre (VARELY/                          Thermo Fisher) and DNA fragment length assays (HS-50 kb
758/2018). Reed warblers were trapped with a mist net,                          fragment kit from AATI, now part of Agilent Inc.). The PacBio
ringed and handled by E.K. under his ringing license.                           library was prepared using the Pacific Biosciences Express li-
   The blood (80 ml) was divided and stored separately in                      brary preparation protocol. DNA was fragmented to 35 kb.
500 ml ethanol, and in 500 ml SET buffer (0.15M NaCl,                           Size selection of the final library was performed using
0.05M Tris, 0.001M EDTA, pH 8.0). The samples were imme-                        BluePippin with a 15 kb cut-off. Six single-molecule real-
diately placed in liquid nitrogen, and kept at 80  C when                     time (SMRT) cells were sequenced using Sequel Polymerase
stored. We performed phenolchloroform DNA isolation on                         v3.0 and Sequencing chemistry v3.0 on a PacBio RS II instru-
the sample stored in SET buffer, following a modified protocol                  ment. The 10 Genomics Chromium linked-read protocol
from Sambrook et al. (1989).                                                    (10 Genomics Inc.) was used to prepare the 10 library,

4   Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021
Genome Assembly of the Reed Warbler                                                                                 GBE

and due to the reed warbler’s smaller sized genome, only            resulting in 521 corrections (breaks, joins and removal of er-
0.7 ng/ml of high molecular weight DNA was used as input.           roneously duplicated sequence). HiGlass (Kerpedjiev et al.
A high-throughput chromosome conformation capture (Hi-C)            2018) and PretextView (https://github.com/wtsi-hpag/
library was constructed using 50 ml of blood, following step 10     PretextView, last accessed September 17, 2021) were used
and onwards in the Arima Hi-C (Arima Genomics) library pro-         to visualize and rearrange the genome using Hi-C data, and
tocol for whole blood. Adaptor ligation with Unique dual            PretextSnapshot                 (https://github.com/wtsi-hpag/
indexing (Illumina) were chosen to match the indexes from           PretextSnapshot, last accessed September 17, 2021) was
the 10 linked-read library for simultaneous paired-end se-         used to generate an image of the Hi-C contact map. The
quencing (150 bp) on the same lane on an Illumina HiSeq X           corrections made reduced the total length of scaffolds by
platform. Both libraries were quality controlled using a            0.5% and the scaffold count by 44.6%, and increased the

                                                                                                                                      Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
Fragment analyzer NGS kit (AATI) and qPCR with the Kapa             scaffold N50 by 20.2%. Curation identified and confirmed 29
library quantification kit (Roche) prior to sequencing.             autosomes and the Z and W chromosomes, to which 98.6%
    The sequencing was provided by the Norwegian                    of the assembly sequences were assigned.
Sequencing Centre (https://www.sequencing.uio.no, last
accessed September 17, 2021), a national technology plat-
                                                                    Genome Quality Evaluation
form hosted by the University of Oslo and supported by the
“Functional Genomics” and “Infrastructure” programs of the          We assessed the quality of the assembly with the assembla-
Research Council of Norway and the South-Eastern Regional           thon_stats.pl script (Bradnam et al. 2013) and investigated the
Health Authorities.                                                 completeness of the genome with Benchmarking Universal
                                                                    Single-Copy Orthologs (BUSCO) v. 5.0.0 (Sim~ao et al. 2015),
Genome Size Estimation and Genome Assembly                          searching for 8,338 universal avian single-copy orthologs
                                                                    (aves_odb10).
The genome size of the reed warbler was estimated by a k-
mer analysis of 10 reads using Jellyfish v. 2.3.0 (Marçais and
                                                                    Genome Annotation
Kingsford 2011) and Genome Scope v. 1.0 (Vurture et al.
2017), with a k-mer size of 21. The estimated genome size           We used a repeat library provided by Alexander Suh called
was 1,130,626,830 bp.                                               bird_library_25Oct2020 and described in Peona et al. (2020)
    We assembled the long-read PacBio sequencing data with          to softmask repeats in the reed warbler genome assembly.
FALCON and FALCON-Unzip (falcon-kit 1.5.2 and falcon-               Softmasked genome assemblies for golden eagle (Aquila
unzip 1.3.5) (Chin et al. 2016). Falcon was run with the fol-       chrysaetos), chicken (Gallus gallus), great tit (Parus major),
lowing parameters: length_cutoff ¼ 1; length_cutoff_pr ¼           Anna’s hummingbird (Calypte anna), zebra finch
1000; pa_HPCdaligner_option ¼ –v –B128 –M24; pa_da-                 (Taeniopygia guttata), great reed warbler (Acrocephalus arun-
ligner_option ¼ –e0.8 –l2000 –k18 –h480 –w8 –s100;                  dinaceus), icterine warbler (Hippolais icterina), collared fly-
ovlp_HPCdaligner_option ¼ –v –B128 –M24; ovlp_daligner_op-          catcher (Ficedula albicollis), and New Caledonian crow
tion ¼ –k24 –e.94 –l3000 –h1024 –s100; pa_DBsplit_option ¼          (Corvus moneduloides) were downloaded from NCBI. The tri-
–x500 –s200; ovlp_DBsplit_option ¼ –x500 –s200; falcon_sen-         angle subcommand from Mash v. 2.3 (Ondov et al. 2016) was
se_option ¼ –output-multi –min-idt 0.70 –min-cov 3 –max-n-          used to estimate a lower-triangular distance matrix, and a
read 200; overlap_filtering_setting ¼ –max-diff 100 –max-cov        Python script (https://github.com/marbl/Mash/issues/9#issue-
100 –min-cov 2. Falcon-unzip was run with default settings. The     comment-509837201, last accessed September 17, 2021)
purge_haplotigs pipeline v. 1.1.0 (Roach et al. 2018) was used to   was used to convert the distance matrix into a full matrix.
curate the diploid assembly, with -l5, -m35, -h190 for the contig   The full matrix was used as input to RapidNJ v. 2.3.2
coverage, and -a60 for the purge pipeline. Next, we scaffolded      (Simonsen et al. 2008) to create a guide tree based on the
the curated assembly with the 10 reads using Scaff10X v. 4.1       neighbor-joining method. Cactus v. 1.3.0 (Armstrong et al.
(https://github.com/wtsi-hpag/Scaff10X,         last    accessed    2020) was run with the guide tree and the softmasked ge-
September 17, 2021), and the Hi-C reads using SALSA v. 2.2          nome assemblies as input.
(Ghurye et al. 2017). Finally, we polished the assembly (com-
                                                                        We also downloaded the annotation for chicken, and used
bined with the alternative assembly from Falcon-Unzip), first
                                                                    it as input to the Comparative Annotation Toolkit (CAT) v.
with PacBio reads using pbmm2 v. 1.2.1, which uses minimap2
                                                                    2.2.1-36-gfc1623d (Fiddes et al. 2018) together with the hi-
(Li 2018) internally (v. 2.17), and then with 10 reads for two
                                                                    erarchical alignment format file from Cactus. Chicken was
rounds with Long Ranger v. 2.2.2 (Marks et al. 2019) and
                                                                    used as reference genome, reed warbler as the target ge-
FreeBayes v. 1.3.1 (Garrison and Marth 2012).
                                                                    nome and the AUGUSTUS (Stanke et al. 2008) species param-
                                                                    eter was set to “chicken.” InterProScan v. 5.34-73 (Jones et
Curation                                                            al. 2014) was run on the predicted proteins to find functional
The assembly was decontaminated and manually curated us-            annotations, and DIAMOND v. 2.0.7 (Buchfink et al. 2021)
ing the gEVAL browser (Chow et al. 2016; Howe et al. 2021),         was used to compare the predicted proteins against

Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021                                     5
Sætre et al.                                                                                                                 GBE

UniProtKB/Swiss-Prot release 2021_03 (The UniProt                  and can be accessed at https://doi.org/10.6084/m9.figshare.
Consortium 2021). AGAT v. 0.5.1 (Dainat 2021) was used             16622302.v1.
to generate statistics from the GFF3 file with annotations
and to add functional annotations from InterProScan and
                                                                   Literature Cited
gene names from UniProtKB/Swiss-Prot. BUSCO v. 5.0.0
                                                                   Armstrong J, et al. 2020. Progressive Cactus is a multiple-genome aligner
was used to assess the completeness of the annotation.
                                                                        for the thousand-genome era. Nature 587(7833):246–251.
                                                                   Both C, et al. 2010. Avian population consequences of climate change are
                                                                        most severe for long-distance migrants in seasonal habitats. Proc Biol
Synteny Analysis                                                        Sci. 277(1685):1259–1266.
                                                                   Bradnam KR, et al. 2013. Assemblathon 2: evaluating de novo methods of
We aligned the assembly against the great tit (Parus major)

                                                                                                                                                  Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
                                                                        genome assembly in three vertebrate species. Gigascience 2(1):10.
and the garden warbler (Sylvia borin) genome assemblies with       Brommer JE, Lehikoinen A, Valkama J. 2012. The breeding ranges of
minimap2 v. 2.18-r1015 and extracted only alignments lon-               central European and arctic bird species move poleward. PLoS One
ger than 5,000 bp. The bundlelinks from circos-tools v. 0.23            7(9):e43648.
was used to merge neighboring links using default options          Buchfink B, Reuter K, Drost HG. 2021. Sensitive protein alignments
                                                                        at tree-of-life scale using DIAMOND. Nat Methods
and a plot was created using circos v. 0.69-8.
                                                                        18(4):366–368.
                                                                   Chamorro D, Nieto I, Real R, Mun   ~oz AR. 2019. Wintering areas on the
                                                                        move in the face of warmer winters. Ornis Fennica 96:41–54.
Supplementary Material                                             Chin CS, et al. 2016. Phased diploid genome assembly with single mole-
Supplementary data are available at Genome Biology and                  cule real-time sequencing. Nat Methods 13(12):1050–1054.
Evolution online.                                                  Chow W, et al. 2016. gEVAL—a web-based browser for evaluating ge-
                                                                        nome assemblies. Bioinformatics 32(16):2508–2510.
                                                                   Dainat J. 2021. AGAT: Another Gff Analysis Toolkit to Handle Annotations
                                                                        in Any GTF/GFF Format (Version v0.5.1). Zenodo. Available from:
Acknowledgments                                                         https://www.doi.org/10.5281/zenodo.3552717.                   Accessed
                                                                        September 17, 2021.
We would like to thank Marjo Saastamoinen, Suvi Sallinen,          Damas J, et al. 2017. Upgrading short-read animal genome assemblies to
Paolo Momigliano, and the Molecular Ecology and                         chromosome level using comparative genomics and a universal probe
Systematics laboratory in the University of Helsinki for facili-        set. Genome Res. 27(5):875–884.
tating DNA extraction. We would like to thank Ave Tooming-         Davies NB, Brooke ML. 1989. An experimental study of co-evolution be-
Klunderud and the Norwegian Sequencing Centre for per-                  tween the cuckoo, Cuculus canorus, and its hosts. I. Host egg discrim-
                                                                        ination. J Anim Ecol. 58(1):207–224.
forming the sequencing. We also thank Pasi Rastas from             Degrandi TM, et al. 2020. Introducing the bird chromosome database: an
the HiLIFE BioData Analytics Service unit, for his assistance           overview of cytogenetic studies in birds. Cytogenet Genome Res.
with preliminary analyses. The computations were performed              160(4):199–205.
on resources provided by UNINETT Sigma2 - the National             Eglington SM, et al. 2015. Latitudinal gradients in the productivity of
Infrastructure for High Performance Computing and Data                  European migrant warblers have not shifted northwards during a pe-
                                                                        riod of climate change. Global Ecol Biogeogr. 24(4):427–436.
Storage in Norway. This work was supported by Research             Ellegren H. 2010. Evolutionary stasis: the stable chromosomes of birds.
Council of Norway by grant numbers 251076 and 300032                    Trends Ecol Evol. 25(5):283–291.
to K.S.J. and a HiLIFE Start-up grant and a University of          Feng S, et al. 2020. Dense sampling of bird diversity increases power of
Helsinki Faculty of Biological and Environmental Sciences               comparative genomics. Nature 587(7833):252–257.
travel grant to R.T.                                               Fiddes IT, et al. 2018. Comparative Annotation Toolkit (CAT) – simulta-
                                                                        neous clade and personal genome annotation. Genome Res.
                                                                        28(7):1029–1038.
Author Contributions                                               Furo IDO, et al. 2018. Chromosome painting in Neotropical long- and
                                                                        short-tailed parrots (Aves, Psittaciformes): phylogeny and proposal
C.L.C.S., F.E., K.R., K.S.J., O.K.T., and R.T. designed the re-         for a putative ancestral karyotype for tribe Arini. Genes 9: 491.
search. E.K., K.R., and R.T. collected the sample. K.R.                 Available from: 10.3390/genes9100491. Accessed September 17,
extracted DNA. C.L.C.S. and O.K.T. performed the research               2021.
                                                                   Garrison E, Marth G. 2012. Haplotype-based variant detection from short-
and/or analyzed the data. A.T., J.T., K.H., S.P., and W.C. cu-          read sequencing. arXiv Preprint. Available from: https://arxiv.org/abs/
rated the assembly. C.L.C.S. drafted the manuscript. All                1207.3907. Accessed September 17, 2021.
authors read and approved the final manuscript.                    Ghurye J, Pop M, Koren S, Bickhart D, Chin CS. 2017. Scaffolding of long
                                                                        read assemblies using long range contact information. BMC Genomics
                                                                        18(1):527.
Data Availability                                                  Halupka L, Dyrcz A, Borowiec M. 2008. Climate change affects breeding
                                                                        of reed warblers Acrocephalus scirpaceus. J Avian Biol. 39(1):95–100.
The reference genome of Acrocephalus scirpaceus (bAcrSci1),
                                                                   Howe K, et al. 2021. Significantly improving the quality of genome as-
and the raw sequence data, have been deposited in the                   semblies through curation. GigaScience 10(1):giaa153.
European Nucleotide Archive under the BioProject number            J€
                                                                    arvinen O, Ulfstrand S. 1980. Species turnover of a continental bird fauna:
PRJEB45715. Genome annotations are available at Figshare                northern Europe, 1850-1970. Oecologia 46(2):186–195.

6   Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021
Genome Assembly of the Reed Warbler                                                                                                        GBE

Jones P, et al. 2014. InterProScan 5: genome-scale protein function clas-      Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular cloning: a laboratory
    sification. Bioinformatics 30(9):1236–1240.                                    manual, 2nd ed. New York: Cold Spring, Harbor Laboratory Press.
Kerpedjiev P, et al. 2018. HiGlass: web-based visual exploration and anal-     Schaefer T, Ledebur G, Beier J, Leisler B. 2006. Reproductive responses of
    ysis of genome interaction maps. Genome Biol. 19(1):125.                       two related coexisting songbird species to environmental changes:
Kralj J, Prochazka P, Fainov  a D, Patzenhauerov   a H, Tutis V. 2010.         global warming, competition, and population sizes. J Ornithol.
    Intraspecific variation in the wing shape and genetic differentiation          147(1):47–56.
    of reed warblers Acrocephalus scirpaceus in Croatia. Acta Ornithol.        Shafer ABA, et al. 2017. Bioinformatic processing of RAD-seq data dra-
    45(1):51–58.                                                                   matically impacts downstream population genetic inference. Methods
Kretschmer R, Ferguson-Smith MA, de Oliveira EHC. 2018. Karyotype                  Ecol Evol. 8(8):907–917.
    evolution in birds: from conventional staining to chromosome paint-        Sigeman H, Strandh, M, et al. 2020. Genomics of an avian neo-sex chro-
    ing. Genes 9:181.                                                              mosome reveals the evolutionary dynamics of recombination suppres-
Kretschmer R, et al. 2021. Karyotype evolution and genomic organization            sion and sex-linked genes. bioRxiv Preprint, Available from: 10.1101/

                                                                                                                                                                 Downloaded from https://academic.oup.com/gbe/article/13/9/evab212/6367782 by guest on 18 December 2021
    of repetitive DNAs in the saffron finch, Sicalis flaveola (Passeriformes,       2020.09.25.314088. Accessed September 17, 2021.
    Aves). Animals 11(5):1456. Available from: 10.3390/ani11051456.            Sigeman H, Ponnikas S, Hansson B. 2020. Whole-genome analysis across
    Accessed September 17, 2021.                                                   10 songbird families within Sylvioidea reveals a novel autosome-sex
Laine VN, Great Tit HapMap Consortium, et al. 2016. Evolutionary signals           chromosome fusion. Biol Lett. 16(4):20200082.
    of selection on cognition from the great tit genome and methylome.         Sim~ao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM.
    Nat Commun. 7:10474.                                                           2015. BUSCO: assessing genome assembly and annotation
Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences.                 completeness      with     single-copy     orthologs.      Bioinformatics.
    Bioinformatics 34(18):3094–3100.                                               31(19):3210–3212.
Manel S, et al. 2016. Genomic resources and their influence on the de-         Simonsen M, Mailund T, Pedersen CNS. 2008. Rapid neighbour-joining. In:
    tection of the signal of positive selection in genome scans. Mol Ecol.         Crandall KA, Lagergren J, editors. Algorithms in bioinformatics. WABI
    25(1):170–184.                                                                 2008. Lecture Notes in Computer Science. Vol 5251. Berlin,
Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient            Heidelberg (Germany): Springer. p. 113–122. Available from:
    parallel counting of occurrences of k-mers. Bioinformatics                     10.1007/978-3-540-87361-7_10. Accessed September 17, 2021.
    27(6):764–770.                                                             Skinner BM, Griffin DK. 2012. Intrachromosomal rearrangements in avian
Marks P, et al. 2019. Resolving the full spectrum of human genome var-             genome evolution: evidence for regions prone to breakpoints.
    iation using linked-reads. Genome Res. 29(4):635–645.                          Heredity (Edinb). 108(1):37–41.
Meller K, Piha M, V€ah€atalo AV, Lehikoinen A. 2018. A positive relationship   Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and
    between spring temperature and productivity in 20 songbird species in          syntenically mapped cDNA alignments to improve de novo gene find-
    the boreal zone. Oecologia 186(3):883–893.                                     ing. Bioinformatics 24(5):637–644.
Ondov BD, et al. 2016. Mash: fast genome and metagenome distance               Stokke B, et al. 2018. Characteristics determining host suitability for a
    estimation using MinHash. Genome Biol. 17(1):132.                              generalist parasite. Sci Rep. 8(1):6285.
Pala I, et al. 2012. Evidence of a neo-sex chromosome in birds. Heredity       Stolt BO. 1999. The Swedish reed warbler Acrocephalus scirpaceus pop-
    (Edinb). 108(3):264–272.                                                       ulation estimated by a capture-recapture technique. Ornis Svecica
Peona V, et al. 2020. The avian W chromosome is a refugium for endog-              9:35–46.
    enous retroviruses with likely effects on female-biased mutational load    Sætre CL, et al. 2017. Rapid adaptive phenotypic change following colo-
    and genetic incompatibilities. Philos Trans R Soc B. 376:20200186.             nization of a newly restored habitat. Nat Commun. 8:14159.
Prochazka P, et al. 2011. Low genetic differentiation among reed warbler      The UniProt Consortium. 2021. UniProt: the universal protein knowledge-
    Acrocephalus scirpaceus populations across Europe. J Avian Biol.               base in 2021. Nucleic Acids Res. 49:D480–D489.
    42(2):103–113.                                                             Thorogood R, Davies NB. 2013. Reed warbler hosts fine-tune their defenses
Roach MJ, Schmidt SA, Borneman AR. 2018. Purge Haplotigs: allelic contig           to track three decades of cuckoo decline. Evolution 67(12):3545–3555.
    reassignment for third-gen diploid genome assemblies. BMC                  Thorogood R, Spottiswoode CN, Portugal SJ, Gloag R. 2019. The coevo-
    Bioinformatics 19(1):460.                                                      lutionary biology of brood parasitism: a call for integration. Philos Trans
Røed T. 1994. “Rørsanger.” In Norsk Fugleatlas, 382–383. Norsk                     R Soc B. 374:20180190.
    Ornitologisk Forening.                                                     Vickery JA, et al. 2014. The decline of Afro-Palaearctic migrants and an
Salewski V, Hochachka WM, Fiedler W. 2010. Global warming and                      assessment of potential causes. Ibis 156(1):1–22.
    Bergmann’s rule: do central European passerines adjust their body          Vurture GW, et al. 2017. GenomeScope: fast reference-free genome pro-
    size to rising temperatures? Oecologia 162(1):247–260.                         filing from short reads. Bioinformatics 33(14):2202–2204.

                                                                               Associate editor: Bonnie Fraser

Genome Biol. Evol. 13(9) doi:10.1093/gbe/evab212 Advance Access publication 9 September 2021                                                                7
You can also read