Origin of Spliceosomal Introns and Alternative Splicing - CSH ...

Page created by Roy James
 
CONTINUE READING
Origin of Spliceosomal Introns and Alternative Splicing - CSH ...
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 Origin of Spliceosomal Introns and
                 Alternative Splicing

                 Manuel Irimia1 and Scott William Roy2
                 1
                  The Donnelly Centre, University of Toronto, Toronto, Ontario M5S3E1, Canada
                 2
                  Department of Biology, San Francisco State University, San Francisco, California 94132
                 Correspondence: mirimia@gmail.com

                 In this work we review the current knowledge on the prehistory, origins, and evolution of
                 spliceosomal introns. First, we briefly outline the major features of the different types of
                 introns, with particular emphasis on the nonspliceosomal self-splicing group II introns,
                 which are widely thought to be the ancestors of spliceosomal introns. Next, we discuss
                 the main scenarios proposed for the origin and proliferation of spliceosomal introns, an
                 event intimately linked to eukaryogenesis. We then summarize the evidence that suggests
                 that the last eukaryotic common ancestor (LECA) had remarkably high intron densities and
                 many associated characteristics resembling modern intron-rich genomes. From this intron-
                 rich LECA, the different eukaryotic lineages have taken very distinct evolutionary paths
                 leading to profoundly diverged modern genome structures. Finally, we discuss the origins
                 of alternative splicing and the qualitative differences in alternative splicing forms and func-
                 tions across lineages.

                 SURPRISES AND MYSTERIES OF INTRONS                                 had to be removed to generate intact protein-
                 AND INTRON EVOLUTION                                               coding messenger RNAs. The initial findings in
                                                                                    viruses (Berget et al. 1977; Chow et al. 1977)
                        ith mid-20th century breakthroughs, mo-                     were soon extended to many cellular genes.
                 W      lecular cell biology finally seemed to obey
                 a relatively simple logic. Genetic information
                                                                                    With the advent of large-scale sequencing proj-
                                                                                    ects, it became clear that one kind of intron
                 was encoded in DNA genes (Avery et al. 1944;                       (the spliceosomal introns), as well as the cellu-
                 Watson and Crick 1953), which were tran-                           lar machinery that removes them (the spliceo-
                 scribed into RNA and subsequently translated                       some), are ubiquitous in eukaryotic genomes.
                 into functional proteins (Crick 1958). However,                    For example, the average human transcript
                 a most unexpected finding “interrupted” this                       contains 9 introns, totaling several hundred
                 logic. The coding information of DNA genes                         thousand introns across the genome and com-
                 was sometimes broken into pieces separated                         prising a quarter of the DNA content of each
                 by sequences whose sole apparent purpose was                       cell (Lander et al. 2001; Venter et al. 2001).
                 to generate an extra RNA sequence that then                        Moreover, functions for some introns began to

                 Editors: Patrick J. Keeling and Eugene V. Koonin
                 Additional Perspectives on The Origin and Evolution of Eukaryotes available at www.cshperspectives.org
                 Copyright # 2014 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a016071
                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071

                                                                                                                                   1
Origin of Spliceosomal Introns and Alternative Splicing - CSH ...
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                 emerge. In particular, by regulating the remov-           of eukaryotes (and thus are quite ancient even if
                 al of introns and subsequent rejoining of exons           not primordial), and originated from preexist-
                 (so-called intron splicing), eukaryotic genes can         ing self-splicing introns, which were likely pres-
                 generate multiple transcripts, vastly expanding           ent at very early stages of life.
                 molecular diversity. First hypothesized by Gil-
                 bert (1978), this process, known as alternative
                                                                           TYPES OF LARIAT INTRONS: SELF-SPLICING
                 splicing (AS), appears to be widespread in eu-
                                                                           INTRONS AND THE PREHISTORY OF THE
                 karyotes, seemingly reaching its apex in mam-
                                                                           SPLICEOSOMAL SYSTEM
                 mals (Barbosa-Morais et al. 2012), in which 95%
                 of multiexon genes undergo AS (Pan et al. 2008;           Spliceosomal introns are just one of the four
                 Wang et al. 2008).                                        major classes of introns found in nature, to-
                     Nearly four decades after the discovery of            gether with group I and group II self-splicing
                 introns, many questions remain unanswered.                introns, and tRNA introns. Intron types are de-
                 The most fascinating questions are still the              fined based on various structural and mechanis-
                 most fundamental ones: Why do introns exist?              tic features, and they have distinct phylogenetic
                 When and how did they arise? These questions              distributions. In this section we discuss different
                 were first formulated as part of the exciting and         intron types and compare crucial aspects of the
                 contentious “introns early/late” debate. Introns          evolutionarily related spliceosomal and group II
                 early held that introns were very ancient struc-          introns, collectively known as lariat introns.
                 tures that predated cellular life, and that mod-
                 ern organisms with few or no introns had lost
                                                                           Spliceosomal Introns
                 them secondarily (in particular, all prokaryotic
                 genomes contain either no introns or only a few           Spliceosomal introns are found in all studied
                 nonspliceosomal introns) (see below) (Darnell             eukaryotic nuclear genomes (with two possible
                 1978; Gilbert 1987; Long et al. 1995; de Souza            exceptions) (Andersson et al. 2007; Lane et al.
                 et al. 1996). The related “introns first” hypoth-         2007) and only in eukaryotic nuclear genomes,
                 esis holds that exons emerged from noncoding              although the total number of introns in each
                 regions between RNA genes in the RNA world                species varies by orders of magnitude. They
                 (Poole et al. 1998; Penny et al. 2009). On the            are characterized by their mechanism of splic-
                 other hand, introns late counters that spliceo-           ing, which is catalyzed by the spliceosome, a
                 somal introns arose later, at some point dur-             complex ribonucleoprotein machinery formed
                 ing eukaryotic evolution (Cavalier-Smith 1985,            by five small RNAs (the U1, U2, U4, U5, and U6
                 1991; Dibb and Newman 1989; Stoltzfus 1994;               snRNAs) and more than 200 proteins (Wahl
                 Logsdon et al. 1995; Logsdon 1998). This debate           et al. 2009). Although there are also marked
                 spanned over 20 years despite (or perhaps ow-             lineage-specific differences, the sequence struc-
                 ing to) the scarcity of directly relevant data,           ture of most spliceosomal introns consists of a
                 finding resolution only with the availability of          short 50 splice site (50 ss) boundary, a minimal
                 many whole genome sequences. Although ad-                 AG dinucleotide 30 splice site (30 ss) boundary, a
                 herents to both perspectives remain, the introns          catalytic adenosine (the branch point [BP]),
                 early perspective has been weakened by the find-          and a polypyrimidine tract between the BP
                 ing of low or zero intron density in all prokary-         and the 30 ss (Fig. 1). These sequences are recog-
                 otic lineages, and the gradual weakening of (and          nized and bound by the core components of the
                 emergence of potential alternative explanations           spliceosome and are crucial for the splicing
                 for) statistical signals suggestive of early introns.     reaction (Ruskin and Green 1985; Chiara and
                 However, although, formally, the data tipped              Reed 1995; Umen and Guthrie 1995; Chiara
                 the scales in favor of introns late, the current          et al. 1997; Du and Rosbash 2002). The rest of
                 consensus may be seen as a mixture of the two             the intronic sequence, which in some instances
                 perspectives (Koonin 2006): spliceosomal in-              in vertebrates may reach up to a million nucle-
                 trons appeared abruptly at the time of the origin         otides, is generally evolutionarily unconstrained

                 2                                              Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                     A

                                    2

                             Bits
                                    1

                                    0
                                Major/U2 (H. sapiens)

                     B

                                    2
                             Bits

                                    1

                                    0

                                Major/U2 (S. cerevisiae)

                     C                  EXON INTRON                                                    EXON INTRON
                                    2
                             Bits

                                    1
                                                                                              2–5 nt

                                    0
                                Minor/U12 (H. sapiens)

                 Figure 1. Consensus sequences of spliceosomal intron core splicing signals for (A) human major/U2 introns,
                 (B) yeast major/U2 introns, and (C ) human minor/U12 introns. The branch point (BP) adenosine is indicated
                 by a red arrow.

                 and highly variable (Irimia and Roy 2008b).                the major or U2 subtype described above). The
                 The splicing process consists of two consecu-              mechanism of splicing is nearly identical, but
                 tive transesterification reactions, catalyzed by           the minor spliceosome is assembled around
                 the snRNAs of the spliceosome (Domdey et al.               four distinct snRNAs—U11, U12, U4atac, and
                 1984; Padgett et al. 1984; Ruskin et al. 1984; Lin         U6atac—and a few specific accessory proteins
                 et al. 1985). First, the BP adenosine performs a           (Hall and Padgett 1996; Tarn and Steitz 1996;
                 nucleophilic attack on the 50 end of the intron,           Will et al. 1999); the rest of the protein machin-
                 resulting in the cleavage of the 50 nucleotide of          ery and the U5 snRNA are shared with the U2
                 the intron (generally the “G” in the GT) from              spliceosome. Although the sequence structure
                 the upstream exon. Second, the intron is fully             of both subtypes is similar overall, U12 introns
                 excised from the mRNA by another nucleo-                   have a different and stricter consensus sequence
                 philic attack by the upstream exon, which is               at the 50 ss and near the BP, and the distance be-
                 ligated to the 50 of the downstream exon, gen-             tween the BP and the 30 ss is highly constrained
                 erating a precise exon –exon junction and liber-           (typically 10 – 15 nucleotides) (Fig. 1C) (Jack-
                 ating the intronic sequence in a characteristic            son 1991; Hall and Padgett 1994). Although
                 lariat structure.                                          the much more limited phylogenetic distribu-
                     Interestingly, there is another subtype of             tion of U12 introns initially suggested they
                 spliceosomal introns in eukaryotes, the so-                might have arisen more recently in evolution,
                 called minor or U12 introns (in contrast to                U12 introns and spliceosomal components have

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                           3
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                 since been discovered in diverse eukaryotic lin-        liferated from a single intron copy, providing
                 eages, implying the existence of the U12 sys-           a common splicing apparatus, and allowing
                 tem in the last eukaryotic common ancestor              some IEP copies to degenerate (Dai and Zim-
                 (LECA) (Russell et al. 2006; Dávila López             merly 2003; Meng et al. 2005). In addition to the
                 et al. 2008).                                           role in assisting intron splicing, IEPs are multi-
                                                                         functional proteins that also have reverse tran-
                                                                         scriptase activity, which allows group II introns
                 Group II Self-Splicing Introns
                                                                         to reverse transcribe into DNA, conferring them
                 Found in bacterial genomes and in chloroplast           their nature of mobile genetic elements (Cous-
                 and mitochondrial genomes of widely diverged            ineau et al. 2000; Lambowitz and Zimmerly
                 eukaryotes (Ferat and Michel 1993; Lambowitz            2011).
                 and Zimmerly 2011; Candales et al. 2012),                    There are at least three major subgroups
                 group II self-splicing introns are believed to pre-     of group II introns, with further subdivisions,
                 date the origin of eukaryotes, perhaps even pre-        distributed among eight phylogenetically sup-
                 ceding the origin of cellular life (Koonin 2006).       ported lineages (Zimmerly et al. 2001; Simon
                 They have been identified, always in low num-           et al. 2008). Each group is characterized by spe-
                 bers, in around a quarter of the sequenced              cific RNA secondary structures, and by pecu-
                 bacterial genomes (Lambowitz and Zimmerly               liarities of splicing and mobility (Lambowitz
                 2011). On the other hand, they are rare in ar-          and Zimmerly 2011). In particular, although
                 chaea (only described, so far, in two related spe-      bacterial introns are usually highly active and
                 cies) (Dai and Zimmerly 2003; Rest and Mindell          functionally complete, introns in eukaryotic
                 2003; Doose et al. 2013). The mechanism of              organelles often lack important secondary do-
                 splicing of group II introns is very similar to         mains and/or encode degenerate IEPs (Coper-
                 that of spliceosomal introns—and likely evolu-          tino and Hallick 1993; Michel and Ferat 1995;
                 tionarily related—involving two transesterifi-          Bonen 2008; Barkan 2009). Importantly, the
                 cation reactions and the release of an excised          splicing of these degenerate introns likely relies
                 intron lariat. However, in this case, these reac-       on the action of trans-acting RNAs from other
                 tions are catalyzed by the intronic RNA itself,         introns and/or host-encoded proteins (Jarrell
                 which has ribozymatic activity (Peebles et al.          et al. 1988; Goldschmidt-Clermont et al. 1991;
                 1986; Schmelzer and Schweyen 1986; van der              Suchy and Schmelzer 1991; Lambowitz and
                 Veen et al. 1986). Accordingly, group II intronic       Zimmerly 2011), potentially mirroring inter-
                 RNA sequences show very complex and con-                mediate steps hypothesized for the origin of
                 served secondary structures that span 400– 800          spliceosomal introns.
                 nucleotides (Lambowitz and Zimmerly 2011).
                 Group II intron sequences are organized in six
                                                                         Other Types of Introns
                 main domains (DI– DVI) consisting of various
                 functional “loops” and “bulges,” which are the          (i) Group I introns: present in representatives of
                 basis of the ribozymatic activity (Michel and           most eukaryotic supergroups, either in nuclear
                 Ferat 1995; Qin and Pyle 1998; Lambowitz                ribosomal RNA genes or in mitochondrial and/
                 and Zimmerly 2011). Although group II introns           or plastid genes, as well as in some bacteria and
                 are capable of self-splicing in vitro, efficient        viruses (Haugen et al. 2005). They also catalyze
                 splicing in vivo requires specific proteins, which      their own splicing reaction by a very different
                 are usually encoded by the intron, but can also         mechanism, utilizing an exogenous guanosine
                 be recruited from the host (Solem et al. 2009).         (exoG) as a cofactor that does not release a lariat
                 The protein encoded by the intron (the “intron-         (Cech et al. 1981; Bass and Cech 1984; van der
                 encoded protein,” IEP) usually acts in cis, assist-     Horst and Tabak 1985).
                 ing the splicing only of its own intron. However,           (ii) Introns in nuclear and archaeal transfer
                 in some cases, an IEP can evolve the ability to         RNA genes: present in tRNA genes of eukary-
                 aid splicing of multiple related introns that pro-      otic nuclear or archaeal genomes (Marck and

                 4                                            Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                 Grosjean 2003; Randau and Söll 2008), and                 ilar boundary sequences (GT-AY in group II in-
                 also in some coding genes in archaea (Yokobori             trons and usually GT-AG in spliceosomal in-
                 et al. 2009; Doose et al. 2013). They do not share         trons, although some U12 introns are AT-AC)
                 functional or structural similarities with the             (Lambowitz and Zimmerly 2011), and there
                 other types of introns, as they are typically              are striking structural similarities between key
                 very short and their splicing is fully catalyzed           regions of group II intron domains and spli-
                 by protein enzymes (rather than ribozymes)                 ceosomal snRNAs (Lambowitz and Zimmerly
                 (Randau and Söll 2008).                                   2011). These include at least (i) domain DV
                                                                            and U6 snRNA, with divalent metal-ion binding
                                                                            sites involved in catalysis and similar base-pair-
                 ORIGIN AND ESTABLISHMENT OF THE
                                                                            ing interactions (Jarrell et al. 1988; Peebles et al.
                 SPLICEOSOMAL SYSTEM DURING
                                                                            1995; Yu et al. 1995; Abramovitz et al. 1996;
                 EUKARYOGENESIS
                                                                            Konforti et al. 1998; Yean et al. 2000; Shukla
                 In this section, we discuss the major steps lead-          and Padgett 2002), further supported by crystal
                 ing to the origin and establishment of the                 structure (Toor et al. 2008; Keating et al. 2010);
                 spliceosomal system in eukaryotes. Despite the             (ii) ID3 subdomain and d –d0 motifs and U5
                 persistence of disagreement on some important              snRNA stem loop, involved in the recognition
                 aspects, there is a general consensus about how            of 50 and 30 exons (Hetzer et al. 1997); and (iii)
                 this process may have unfolded (Fig. 2). Accord-           DVI and the U2-intron pairing that include the
                 ing to this general model, spliceosomal introns            BP adenosine (Schmelzer and Schweyen 1986;
                 evolved from invading group II introns, perhaps            Parker et al. 1987; Li et al. 2011). In addition, it
                 derived from the early mitochondrion (thought              has also been shown that extracts of snRNAs can
                 to be descended from an engulfed member of                 catalyze both splicing reactions without proteins
                 the a-proteobacteria, whose modern members                 in vitro (Valadkhan et al. 2007, 2009), in a sim-
                 contain group II introns). For some reason(s),             ilar manner to complete group II introns.
                 these introns then proliferated to an unprece-                  Whereas it is theoretically possible that
                 dented level in the host genome. Over time, the            group II introns evolved from spliceosomal in-
                 self-splicing activities of these many intron cop-         trons (or that both share a distinct common
                 ies degenerated, which was associated with the             ancestor [Vesteg et al. 2012]), it seems much
                 increase of trans-encoded RNAs and proteins                more likely that group II introns gave rise to
                 that promoted efficient intron splicing, setting           spliceosomal introns (Cech 1986). The presence
                 the basis of the protospliceosomal machinery,              of spliceosomal introns in all extant eukaryotic
                 and further releasing selective pressure on cis-           supergroups indicates that this transformation
                 intronic splicing elements. As this protospliceo-          occurred before LECA. The most favored hy-
                 somal machinery recruited more proteins and                pothesis is that spliceosomal introns originated
                 became more efficient, introns became increas-             from a-proteobacterial group II introns trans-
                 ingly reliant on the emerging spliceosome for              ferred from the protomitochondria to the host
                 proper splicing.                                           genome soon after the endosymbiotic event
                                                                            (Cavalier-Smith 1991), along with other parts
                                                                            of the protomitochondrial genome (Thorsness
                 Transfer of Group II Introns to the Host
                                                                            and Weber 1996; Adams and Palmer 2003). Sup-
                 Genome
                                                                            porting this view, complete and mobile group
                 Structural and functional evidence suggests that           II introns are relatively common in modern
                 spliceosomal and group II self-splicing introns            a-proteobacteria, but (nearly) absent in the
                 are evolutionarily related. Both types of introns          archaeal domain (Lambowitz and Zimmerly
                 are spliced through a similar two-step catalytic           2011), to which the preendosymbiotic protoeu-
                 reaction that relies on an endogenous adenosine            karyote is thought to be more closely related
                 (the BP), and releases the excised intron as a             (see Guy et al. 2014). In addition, few intron
                 lariat structure. The two intron types have sim-           positions are shared between ancient paralogs

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                            5
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                                                                                     α-Proteobacteria with
                                  Endosymbiosis                                         group II introns
                                       and
                                  Intron transfer

                                Intron proliferation                Sexual host? Low Ne? Nucleus?

                                Establishment of                      Intronic sequence degeneration
                                the spliceosomal                      Splicing trans-complementation
                                     system
                                                                      Protein recruitment

                                                                             Intron-rich
                                                                             Weak splice site signals
                                     LECA                                    Complex spliceosomes
                                                                             IR-dominated AS

                                                                                Intense intron loss
                                Evolutionary history of
                                                                                Intense intron gain
                                spliceosomal introns

                 Figure 2. Steps leading to the origin and establishment of the complex spliceosomal system of LECA during
                 eukaryogenesis (see text).

                 (i.e., gene duplicates originated before eukaryo-       from the endosymbiont to the host genome,
                 genesis), suggesting that at least most introns         at least one event of massive intron prolifera-
                 arose during or after the advent of eukaryotes          tion took place (Fig. 2). From a number perhaps
                 (Sverdlov et al. 2007).                                 similar to modern a-proteobacteria—usual-
                                                                         ly ,30 introns per genome (Lambowitz and
                                                                         Zimmerly 2004)—the ancestors of LECA ex-
                 Unprecedented Group II Intron Proliferation
                                                                         perienced an expansion of introns that populat-
                 in the Host Genome
                                                                         ed their genomes with thousands of elements,
                 It is also widely accepted that, following the          estimated at 2 introns per kbp of coding
                 transfer of ( presumably few) group II introns          sequence, and hypothesized to account for

                 6                                            Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                 .70% of the ancestral protoeukaryotic genome               2006) invokes a very different causality, drawing
                 (Koonin 2009). This proliferation is orders of             on the classic arguments on transposable ele-
                 magnitude larger than any known expansion                  ments by Hickey (1982). Hickey showed that
                 in prokaryotes, suggesting exceptional circum-             frequent sexual reproduction is crucial to the
                 stances. Although most researchers agree on the            success of transposable elements: whereas with
                 evolutionary singularity of this event, the na-            no or infrequent sex, the fitness costs of aggres-
                 ture of these circumstances is strongly debated.           sively propagating elements will doom them
                 Three (nonmutually exclusive) major explana-               to extinction, in the presence of frequent sex,
                 tions have been proposed: (i) the archaeal host            even highly deleterious elements can spread
                 lacked unknown defense mechanism against re-               through the genome. This suggests the possibil-
                 troelement multiplication (Koonin 2006); (ii)              ity that the advent of frequent sexual reproduc-
                 an extremely low effective population size (Ne)            tion could have brought an unprecedented
                 after an evolutionary bottleneck allowed the               spread of type II introns. Consistent with this
                 protoeukaryote to fix thousands of intron cop-             notion, meiosis and sexual reproduction appear
                 ies by simple genetic drift, despite the selective         to have been well established by LECA (Ramesh
                 disadvantage (Koonin 2006); and (iii) the oc-              et al. 2005), although, crucially, the order of
                 currence of frequent meiosis or meiosislike sex-           appearance in the evolution of frequent sexual
                 ual reproduction in the eukaryotic ancestor al-            reproduction and high intron density remains
                 lowed large-scale spreading of introns at the              unknown.
                 expense of the host’s fitness (Poole 2006), as
                 previously proposed by Hickey more generally
                                                                            Need and Benefits of
                 for any class of mobile genetic element (Hickey
                                                                            Trans-Complementation:
                 1982).
                                                                            Emergence of the Spliceosome
                     The discovery of two archaea species with
                 a few (4 and 21) group II introns, likely later-           Whatever the cause, the effect of this intron
                 ally transferred from bacteria (Dai and Zim-               propagation was a genome newly riddled with
                 merly 2003; Rest and Mindell 2003), for which              thousands of elements interrupting most cod-
                 no dramatic proliferation has occurred, argues             ing genes that required removal from pre-
                 that the first explanation is insufficient (the lack       mRNAs. Although intact group II introns are
                 of an unknown defense system against group II              capable of self-splicing and thus may be (near-
                 intron expansion in this lineage). The second              ly) functionally neutral, the need to maintain
                 proposed explanation, which can be integrated              a high number of constrained functional se-
                 within a more general hypothesis of the origin             quences and tertiary structures in every intron
                 and evolution of eukaryotes and their genomes              implies strong mutational pressure. Although
                 (Lynch and Conery 2003; Lynch 2006), current-              the number of intronic and exonic sites re-
                 ly has considerable support. This hypothesis               quired for splicing of spliceosomal introns has
                 posits that many eukaryotic genomic features               been estimated to be around 20– 40 per intron
                 initially proliferated owing to selection being            (Lynch 2002, 2006), the number of potentially
                 too inefficient to purge them from the popula-             constrained sites in a group II intron seems to be
                 tion, which situation might arise if the effective         at least one order of magnitude higher (Lambo-
                 population size of these ancestral organisms was           witz and Zimmerly 2011). Therefore, a spliceo-
                 very small (a conjecture for which direct tests            somal system that made use of only a few trans-
                 are very difficult). The presence of these slight-         acting factors to replace hundreds of thousands
                 ly deleterious elements is nonetheless argued              of constrained sequences in cis may have signif-
                 to have driven the emergence of profound and               icantly relieved much of the mutational pres-
                 defining eukaryotic characteristics, such as               sure associated with an extremely high number
                 the nucleus, as a defense mechanism (Koonin                of introns.
                 2006; López-Garcı́a and Moreira 2006; Martin                   There is also compelling mechanistic evi-
                 and Koonin 2006). The third hypothesis (Poole              dence that the transition from a system of self-

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                           7
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                 spliced introns to one based on trans-comple-           distribution ( present in all eukaryotic groups,
                 mentation may have evolved gradually. The first         and thus in LECA) and the presence of shared
                 step would involve the partial degeneration of          intron positions among paralogs, can be traced
                 a number of introns, which would lose impor-            to the period between the initial intron prolif-
                 tant functional structural elements and/or their        eration and the completion of the fully modern
                 IEPs. As in many of modern organelle’s group II         spliceosome (Veretnik et al. 2009). By the time
                 introns (Jarrell et al. 1988; Goldschmidt-Cler-         of LECA, a very complex, modern-looking spli-
                 mont et al. 1991; Suchy and Schmelzer 1991;             ceosome, composed of at least 78 proteins and
                 Lambowitz and Zimmerly 2011), the splicing              all the snRNAs, had already been established
                 of these partially degenerated introns would            (Collins and Penny 2005).
                 have been assisted in trans by ( partial) RNA
                 sequences of other introns. Presumably, some
                                                                         The Origins of the Minor and Major
                 of these sequences were more efficient than oth-
                                                                         Spliceosomal Systems
                 ers at assisting exogenous splicing, and thus
                 conferred a benefit and became fixed in the pop-        In fact, LECA had not only one complex spli-
                 ulations, eventually giving rise to the snRNAs          ceosome, but two. Comparative genomics have
                 of the two types of modern spliceosomes (Sharp          revealed that both U2/major and U12/minor
                 1991). Various pieces of evidence support this          spliceosomal introns, as well as RNA and pro-
                 idea. In addition to their structural similarities,     tein components associated with their distinct
                 some domains of group II introns can directly           splicing machineries, were present in LECA.
                 replace snRNAs in spliceosomal splicing in vitro        These two subtypes are quite similar, and they
                 (e.g., group II domain DV RNA can act instead           share most of the protein machinery involved in
                 of U6atac snRNA [Shukla and Padgett 2002]).             their splicing; however, each subtype uses a spe-
                 Conversely, U5 snRNA can complement group               cific subset of snRNAs (with the exception of
                 II introns missing the ID3 region (Hetzer et al.        U5), which are likely the most ancestral part of
                 1997).                                                  the spliceosome. These subsets of snRNAs are
                      These ( probably quite inefficient) initial        nonetheless evolutionarily related to one anoth-
                 protospliceosomes could then have evolved in-           er and probably originated by gene duplication
                 creased efficiency by recruiting auxiliary pro-         some time before LECA (Russell et al. 2006).
                 teins. Some of these factors could have been de-        Thus, debate has emerged as to which subtype
                 rived from the multiple available IEP copies,           originated first in evolution and which one is
                 which can also act in trans in the splicing of          derived. Several nonconclusive arguments can
                 related introns (Dai and Zimmerly 2003; Meng            be put forward for each case (Roy and Irimia
                 et al. 2005). Others could have been coopted            2009; Rogozin et al. 2012). Supporting U12 in-
                 from various other processes of RNA metabo-             trons’ ancestrality is their higher structural sim-
                 lism (e.g., Lsm/Sm proteins, whose homologs             ilarity with group II introns. The BP in both
                 play diverse roles in prokaryotes) (Wilusz and          types typically consist of an A within a polypy-
                 Wilusz 2005; Mura et al. 2013), or recruited            rimidine tract located at a constant, short dis-
                 from sequences of retrotransposons or other             tance from the 30 ss (domain DIV in group II
                 mobile elements (e.g., Prp8, the largest protein        introns) (Bonen and Vogel 2001; Lambowitz
                 of the spliceosome and an integral part of its          and Zimmerly 2011); on the other hand, U2
                 catalytic center, is thought to have evolved            introns’ BPs are usually located further away
                 from a retroelement-encoded reverse transcrip-          from the 30 ss, and at a much wider range of
                 tase) (Dlakić and Mushegian 2011). Gene du-            distances. In fact, U2 introns ancestrally have a
                 plication and functional specialization are likely      polypyrimidine tract between the BP and the
                 to have also played a role. For example, the            30 ss (Irimia and Roy 2008a), which has been
                 aforementioned Sm/Lsm proteins experienced              hypothesized to derive from the ancestral BP
                 several waves of early gene duplication (Veretnik       sequences of U12 introns (Burge et al. 1998).
                 et al. 2009), which, based on their phylogenetic        Along these lines, U12 introns can be viewed

                 8                                            Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                 in some sense as a transitional form between               LECA. The time point corresponding to LECA
                 the highly constrained structures of group II              represents a quite clear divide in our confidence
                 introns (Bonen and Vogel 2001; Lambowitz                   about evolutionary inferences. Reconstruction
                 and Zimmerly 2011) and the diffuse splicing                of steps leading up to LECA (origins of spli-
                 signals of most U2 spliceosomal introns (Irimia            ceosomal-like introns, reasons for the unprece-
                 et al. 2007b; Irimia and Roy 2008a; Schwartz               dented spread of introns, and transfer of splic-
                 et al. 2008). Consistent with such a directional           ing from cis- to trans-encoded elements, etc.)
                 process, U12 introns are known to often convert            remains uncertain because of a lack of close
                 into U2 introns, whereas the reverse process has           protoeukaryotic-like relatives. In contrast, infer-
                 never been described (e.g., Burge et al. 1998;             ences of features of LECA are relatively much
                 Alioto 2007). Second, positions of U12 introns             more straightforward because of the possibil-
                 may be more conserved between Arabidopsis                  ity of comparing lineages that diverged from
                 and humans than those of U2 introns, and                   LECA. That is, whereas the former by necessity
                 they are also more often localized at the 50 of            relies on largely indirect evidence, the latter can
                 the gene—a signature of ancestral introns that             draw on direct (not to say incontrovertible) ev-
                 have resisted intron loss (Basu et al. 2008a).             idence.
                     On the other hand, phylogenetic reconstruc-                 Comparative studies of modern genomes
                 tion of the insertion sites of ancient ( pre-LECA)         have drawn a surprising picture of the intron –
                 introns revealed a U2-like insertion bias (remi-           exon structures of genes in the genome of LECA
                 niscent of the protosplice site model of intron            (Roy and Irimia 2009; Rogozin et al. 2012; Koo-
                 insertion) (Dibb and Newman 1989). This result             nin et al. 2013), all of them pointing to a com-
                 suggests that U2 introns dominated at least by             plex spliceosomal system. First, comparisons of
                 the time of LECA (Basu et al. 2008c). However,             intron positions across orthologous genes from
                 this leaves open the possibility that this prolifer-       dozens of eukaryotic genomes have shown that
                 ation postdates the secondary advent of U2 in-             LECA was remarkably intron rich. At its sim-
                 trons, because group II introns may not have               plest, the logic is that if modern introns were
                 an insertion bias similar to U2 introns. Finally,          created after the various species diverged, the
                 neither type may be truly ancestral: The two               positions of these introns in homologous genes
                 types may have emerged semi-independently                  might be expected to be quite different across
                 from a larger pool of primitive snRNA dupli-               species, whereas if ancestral genes already con-
                 cates with a shared protein machinery.                     tained many introns and those introns have
                                                                            been retained in modern species, intron posi-
                                                                            tions in homologous genes might largely coin-
                 HISTORY OF THE SPLICEOSOMAL SYSTEM
                                                                            cide across species (Fig. 3). This general strategy
                 In this section we will describe the evolutionary          can be modified to account for the possibil-
                 history of introns since LECA. First, we dis-              ity that general preferences in intron insertion
                 cuss attempts to reconstruct LECA’s intron –               (e.g., into protosplice sites) and/or intron
                 exon structures using comparisons of modern                phase biases (reviewed in Rogozin et al. 2012)
                 genomes. Then, we summarize the evolutionary               could lead to independent intron insertions oc-
                 trajectories taken by different eukaryotic lineag-         curring at homologous positions in different
                 es, ranging from near complete loss and trans-             lineages (Sverdlov et al. 2005; Yoshihama et al.
                 formation of their ancestral introns, to the in-           2006). Avariety of studies have shown that genes
                 tronic stasis observed in many modern lineages.            from intron-rich species from any major eu-
                                                                            karyotic supergroup share a significant number
                                                                            of intron positions with other groups (Fedorov
                 Reconstruction of Intron– Exon
                                                                            et al. 2002; Rogozin et al. 2003; Csurös et al.
                 Structures in LECA
                                                                            2008, 2011). Different investigators have used
                 The evolutionary steps described in the previ-             different methods of evolutionary inference,
                 ous section were largely complete by the time of           ranging from parsimony to maximum likeli-

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                           9
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                        A          Loss              Gene A               B          Gain              Gene A

                        Inferred ancestor

                 Figure 3. Scenarios for intron evolution: shared ancestral introns versus independent gain. Hypothetical com-
                 parisons of intron positions across orthologous genes of distantly related species are shown. (A) Many introns
                 are in homologous positions, suggesting that these introns represent ancestral introns that have been retained.
                 (B) Lack of correspondence between intron positions in different lineages suggest that most introns have been
                 independently gained within each lineage.

                 hood, to attempt to reconstruct ancestral intron          same 50 ss, GTATGT. Put another way, the prob-
                 densities (Rogozin et al. 2003; Qiu et al. 2004;          ability that two randomly chosen Saccharomyces
                 Csurös 2005, 2008; Nguyen et al. 2005; Roy and           cerevisiae introns have the same hexamer splice
                 Gilbert 2005; Carmel et al. 2007, 2009). With             site is 58%, whereas two human 50 ss’s will cor-
                 increasing taxonomic sampling usually yield-              respond only 6% of the time. Following the
                 ing increasingly higher estimates for LECA’s in-          intuitive sense that (i) genome complexity will
                 tron density, the latest reconstructions (Csurös         reflect organismal complexity; and (ii) more
                 et al. 2011) used a Markov chain Monte Carlo              “highly evolved” organisms have more trans-
                 approach to investigate 99 genomes from five              formed genomes, it was initially thought that
                 eukaryotic supergroups, and estimated that                ancestral unicellular eukaryotes would have
                 LECA had 53% – 74% of modern human intron                 had “simple” yeastlike intron sequences, with
                 density, or 4.5 – 6.3 introns per gene (assuming          regular and predictable splice signals, and that
                 similar average protein-coding lengths). That is,         “complex” diverse signals arose by sequence di-
                 available evidence suggests that the intron –exon         vergence in multicellular lineages ( perhaps re-
                 structures of ancestral eukaryotes were exceed-           lated to AS, which makes extensive use of splice
                 ingly complex, with intron densities compara-             site heterogeneity) (Ast 2004). However, com-
                 ble to the most intron-dense modern densities,            parisons across species revealed that in fact yeast
                 and more intron dense than, for instance, mod-            is the exception, with the majority of unicellular
                 ern plants and insects (Fig. 4).                          and multicellular species alike (and thus likely
                      The second unexpected characteristics of             LECA) utilizing heterogeneous splicing signals,
                 LECA’s intron – exon structures involve the in-           a finding that holds both for the 50 ss and the BP
                 tron sequences themselves. Nearly all studied             (Irimia et al. 2007a; Irimia and Roy 2008a;
                 eukaryotic species show similar consensus se-             Schwartz et al. 2008; Keren et al. 2010). More-
                 quences for core splicing sequence motifs—the             over, LECA’s introns also likely harbored a poly-
                 50 splice site (ss), BP, and 30 ss (Fig. 1)—reflecting    pyrimidine tract between the BP and the 30 ss—
                 largely similar sequence preferences across spe-          similar to modern animals and plants, but not
                 cies determined by a shared spliceosomal ma-              yeasts—as inferred by the general presence of
                 chinery. However, the strength of these prefer-           this signal in representatives of most eukaryotic
                 ences—the extent to which individual introns’             supergroups (Irimia and Roy 2008a). This se-
                 specific motifs adhere to this consensus—varies           quence is bound in the first steps of spliceosome
                 dramatically (Fig. 4). For instance, whereas hu-          assembly by the U2AF65 (Zamore et al. 1992),
                 man introns use a vast array of 50 ss, 75% of all         a factor already present in the ancestral spli-
                 introns in the model yeast S. cerevisiae have the         ceosome (Collins and Penny 2005). In addition

                 10                                             Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                                                                             Origin of Introns and Alternative Splicing

                                                                                                                                                                      LECA
                        A

                                                                                                                                             op ter

                                                                                                                                             ns s
                                                                                                                           us

                                                                                                                                                     a
                                                                                                                                        rm as
                                                                                                                                P. s mbe um

                                                                                                                                T. t elan NM

                                                                                                                                         ga an

                                                                                                                                        nh us
                                                                                                                                                 hil
                                                                                    r vu e

                                                                                                                                P. t assa m
                                                                                                                          ca

                                                                                                                                C. neof llanii

                                                                                                                                                lis
                                                                                                                                         z a lia

                                                                                                                                C. astensis
                                                                                  ce NM

                                                                                                                hr2 rin

                                                                                                                                                t ii
                                                                                          a

                                                                                                                                    he og
                                                      me alis

                                                                                                             luc ca

                                                                                                                                    ele orm
                                                             ae

                                                                                                                                              ru

                                                                                                                                             na

                                                                                                                                               s

                                                                                                                                             ns
                                                                                                                                    po ut
                                                                                         m

                                                                                                                                            ico

                                                                                                                                             ad
                                                                               C. evisi
                                                                         li

                                                                                                                     y ti
                                                  C. agin lia

                                                                                                                                    ory ure
                                                                                                                                P. t ayd ri
                                                                                                                                             is

                                                                                                                                              e

                                                                                                                                C. iculo
                                                                                                                                    m s

                                                                                                                                             s
                                                                                                            n-c ma
                                                                      icu

                                                                                                                                N. lcipa
                                                                                                         O. olyti

                                                                                                                                S. corn
                                                                                                                                U. urbe
                                                         rol

                                                                                                                                    tha ii
                                                                                                                                D. atan

                                                                                                                                        tan

                                                                                                                                        pie
                                                                                                                                A. vecte
                                                                                                                                         lia
                                                                                      ta

                                                                                                                 tol
                                                  T. vlamb

                                                                                                                                P. f jae
                                                                                                                   )

                                                                                                                                          v
                                                                                                                                A. ond
                                                                                                                                R. etra
                                                                    un

                                                                                                                                     bre
                                                                                                                 i
                                                                                  the

                                                                                    r

                                                                                                             his
                                                                                  pa

                                                                                                                                    rei

                                                                                                                                    sa
                                                                                                                                    na
                                                                                                                                    sil
                                                                                                                                    cr
                                                                                                                                     o
                                                                                                            lip

                                                                                                                                    m
                                                                                                                                    ri

                                                                                                                                    a
                                                                                                                                    g

                                                                                                                                    n
                                                                     c

                                                                                                                                    c
                                                                                                                                T. g

                                                                                                                                M.
                                                  G.

                                                                               G.

                                                                                                               (no

                                                                                                                                N.

                                                                                                                                N.

                                                                                                                                H.
                                                                  E.

                                                                                                         E.

                                                                                                                                E.
                                                                               S.

                                                                                                                                B.

                                                                                                                                B.
                                                                                                         Y.
                       Introns/gene
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                 For example, nonsense-mediated decay (NMD),             For some lineages, phylogenetic reconstruction
                 a mechanism to eliminate spurious or unwant-            of intron loss and gain suggest that modern high
                 ed transcripts, is found in diverse eukaryotes          intron density is almost completely owing to
                 (Amrani et al. 2004; Chen et al. 2008; Jaillon          retention of ancestral introns (e.g., in the case
                 et al. 2008; Kerényi et al. 2008; Roy and Irimia       of vertebrates, intron number appears to have
                 2009). Indeed NMD and intron spread are likely          decreased somewhat since the ancestor of ani-
                 closely linked, because frequent splicing errors        mals 1 billion years ago) (Sullivan et al. 2006;
                 would necessitate NMD, whereas the presence             Srivastava et al. 2008; Csurös et al. 2011). In
                 of NMD would decrease the costs of intron               other cases, many modern introns appear to
                 spread (Koonin 2006; Lynch et al. 2006). Other          have been gained through a variety of processes,
                 “policing” processes, such as the ubiquitin sig-        most dramatically creation of hundreds or thou-
                 naling at the proteomic level, could also have          sands of introns through propagation of intron-
                 evolved as a defense mechanism against massive          creating elements (Worden et al. 2009; Roy and
                 intron presence (Koonin 2006). The timing of            Irimia 2012; van der Burgt et al. 2012). Similar-
                 emergence of other secondary aspects of the             ly, organelle-derived or horizontally transferred
                 spliceosomal system, most notably the function-         genes sometimes acquire similar densities to
                 al associations between chromatin and splicing          those of more ancestral nuclear host genes (Basu
                 seen in animals (Braunschweig et al. 2013), still       et al. 2008b; Ahmadinejad et al. 2010; Clarke et
                 remain to be elucidated.                                al. 2013, although see Roy et al. 2006 and Flot
                                                                         et al. 2013). Interestingly, both nearly complete
                                                                         loss of introns and dramatic intron prolifera-
                 Diverging from LECA: Different Modes
                                                                         tion have occurred several times independently
                 of Spliceosomal Intron Evolution across
                                                                         across the eukaryotic tree. For instance, nearly
                 Eukaryotes
                                                                         complete loss has occurred in groups as di-
                 From these intron-rich ancestors with hetero-           verged as fungi (at least twice), green and red
                 geneous splicing signals and complex spliceo-           algae, apicomplexa, and excavates (three times)
                 somes, the full diversity of modern eukaryotic          (Irimia and Roy 2008a). The reasons for this are
                 intron – exon structures has emerged over the           still unclear, and the proposed explanations
                 past 1.5 billion years of evolution. The results        range from selection for genome streamlining
                 of this evolutionary process have been dramatic.        to runaway intron loss mutation (reviewed in
                 Most noticeably, intron densities vary by orders        Maeso et al. 2012b). In lineages that have expe-
                 of magnitude, from several per gene and thou-           rienced many new intron gains, new introns
                 sands or tens of thousands per genome in many           appear not to have been created at constant rates
                 unicellular and multicellular lineages (Mer-            over millions of years of evolution, but instead
                 chant et al. 2007; Roy and Irimia 2009; Curtis          to be concentrated in episodes of gain (Roy
                 et al. 2012), to a handful per genome in the            2006; Roy and Penny 2006; Csurös et al. 2011).
                 genomes of some other species (Fig. 4) (Matsu-          Over recent times, many studied modern line-
                 zaki et al. 2004; Vanacova et al. 2005; Morrison        ages appear to be under a sort of intronic evo-
                 et al. 2007). Notably, these densities do not re-       lutionary stasis, experiencing low (or very low)
                 spect our intuitive sense of biological simplicity      numbers of intron gain and loss (Roy et al. 2003,
                 and complexity. Many parasitic lineages have            2006; Nielsen et al. 2004; Roy and Hartl 2006;
                 intron densities comparable to those of most            Coulombe-Huntington and Majewski 2007a,b;
                 intron-rich multicellular lineages, and at least        Roy and Penny 2007a; Stajich et al. 2007; Loh
                 one multicellular lineage has intron densities          et al. 2008; Zhang et al. 2010). On the contrary,
                 comparable to some of the most intron-poor              only a few known exceptions scattered across the
                 known unicellulars (Collén et al. 2013). Al-           eukaryotic tree show significant ongoing intron
                 though the origins of nearly intronless lineag-         gain-dominated evolution (Carmel et al. 2007;
                 es—massive intron loss—are clear, the origins           Roy and Penny 2007b; Worden et al. 2009; De-
                 of the highly intron-dense lineages are less so.        noeud et al. 2010; van der Burgt et al. 2012).

                 12                                           Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                      In addition to intron densities, splicing sig-        though it is difficult to confidently estimate
                 nals also show marked differences across ge-               average and median intron length in LECA, they
                 nomes (Fig. 4). In particular, 50 ss consensus in          may have been similar to those in most eukary-
                 a given species may be either very strict, with            otic genomes studied so far, perhaps around
                 nearly all introns in the genome having an iden-           150 nucleotides on average and with a median
                 tical or similar sequence, or highly heteroge-             centered around 70 – 90 nucleotides. It should
                 neous, as in the case of the human genome (Iri-            be noted, however, that introns were probably
                 mia et al. 2007b). Heterogeneous 50 ss signals             significantly longer at least in earlier ancestors,
                 are observed in most lineages, whereas the few             as group II introns from which they derived are
                 lineages with strict signals are scattered across          at least 400– 800 nucleotides long (Lambowitz
                 the phylogenetic tree, representing exceptions             and Zimmerly 2011), and they may reach 2.5
                 to the general pattern in nearly all major eukary-         kbp when they encode their own IEP (Lambo-
                 otic supergroups (Irimia et al. 2007b). This dis-          witz and Zimmerly 2004; Koonin 2009). Sur-
                 tribution strongly suggests that these exceptions          veys of intron lengths across eukaryotes reveal
                 have evolved independently within each lineage,            striking deviations in intron length distributions
                 echoing the case for lineages that have under-             in both directions. First, tiny introns (18 – 36
                 gone nearly complete intron loss. Unexpectedly,            nucleotides) have independently evolved in a
                 these two processes were found to have a very              few species scattered across the eukaryotic tree,
                 clear correspondence: In nearly intronless spe-            in groups as diverged as animals (Ogino et al.
                 cies, the few remaining introns have highly sim-           2010; Tsai et al. 2013), green algae (Gilson et al.
                 ilar and constrained 50 ss signals (Fig. 4) (Irimia        2006), and ciliates (Russell et al. 1994). More-
                 et al. 2007b). A similar pattern of strict signals         over, in all of these species the distributions
                 has been observed for the BP (Irimia and Roy               of lengths are very sharp—suggesting strong
                 2008a; Schwartz et al. 2008), although with a few          pressure to maintain a seemingly optimal small
                 exceptions involving species with few introns              length—and their splicing signals are highly dif-
                 but heterogeneous BPs (Fig. 4). Finally, in a few          fuse (with the noted exception of microspori-
                 instances, a further constraint on splicing sig-           dians) (Irimia and Roy 2008a; Lee et al. 2010).
                 nals has evolved, in which the BP is “anchored”            On the opposite side, introns from certain lin-
                 to the 30 ss, always present at exactly or almost          eages are unusually long. These include some
                 the same number of nucleotides away (Irimia                plants (Jiang and Goertzen 2011), brown algae
                 and Roy 2008a). Thus, in these extreme cases,              (Cock et al. 2010), and, especially, vertebrates,
                 the remaining U2 introns show similarities to              with averages one order of magnitude higher
                 U12 introns, which are also found in low num-              than most species, and with some introns reach-
                 bers per genome: extended constrained 50 ss and            ing up to one megabase. These expansions seem
                 BP signals and BP position. Although several               likely to be associated with insertion of trans-
                 explanations have been proposed to explain the             posable elements into intronic sequences (Jiang
                 coevolution between splicing signals and intron            and Goertzen 2011), although, in some cases,
                 numbers (Irimia et al. 2007b, 2009), the cause             exceptionally long introns may be associated
                 remains obscure. It seems likely that an impor-            with the presence of large numbers of cis-regu-
                 tant component is changes in the spliceosome:              latory elements (Irimia et al. 2011).
                 Massive introns loss is often accompanied by
                 loss of some auxiliary spliceosomal factors
                                                                            FUNCTIONS OF INTRONS: ORIGINS AND
                 largely involved in recognition of auxiliary splic-
                                                                            EVOLUTION OF ALTERNATIVE SPLICING
                 ing sequence signals such as ESEs (Plass et al.
                 2008), presumably leading to a greater emphasis            Given the fascinating and complex evolutionary
                 on core splicing signals, and thus requiring in-           paths that intron – exon structures have taken
                 creased information content in those signals.              from LECA to each extant eukaryotic lineage,
                      Intron lengths have also experienced con-             perhaps the major paradox is that, for most cas-
                 siderable divergence across eukaryotes. Al-                es, introns are not known to play a general func-

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                          13
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                 M. Irimia and S.W. Roy

                 tional role in the biology of the cell: They are        in intracellular transport, either at the mRNA
                 simply intervening sequences that must be re-           (Buckley et al. 2011) or protein level (Freitag
                 moved for proper gene expression to occur. Sev-         et al. 2012; Kabran et al. 2012). Nevertheless,
                 eral evolutionary functions have been proposed          despite a plethora of described examples of
                 for introns, including enhancing within-gene            each kind, a major unanswered question is still
                 homologous recombination (Comeron and                   to what extent AS is functional or simply splic-
                 Kreitman 2000), and facilitation of exon shuf-          ing “noise” (Sorek et al. 2004; Irimia et al. 2008;
                 fling and thus the evolution of new genes (Gil-         Roy and Irimia 2008).
                 bert 1987; Liu and Grigoriev 2004). In addition,            AS was initially believed a relatively recent
                 certain intron sequences are known to play reg-         innovation, largely associated with the emer-
                 ulatory roles in transcription, either by acting        gence of multicellularity (Ast 2004). However,
                 as a limiting factor in cotranscriptional splic-        the wealth of genomes and transcriptomes ac-
                 ing (Patel et al. 2002), or indirectly by harbor-       cumulated over the past decade has turned
                 ing DNA regulatory elements (e.g., Epstein et al.       around this view. First, significant amounts of
                 1999; Irimia et al. 2012; Maeso et al. 2012).           AS (usually intron retention) have now been
                 Similarly, other introns harbor nested bona             described in nearly all well-studied species.
                 fide protein-coding or noncoding RNA genes              These include representatives of all eukaryotic
                 (e.g., Assis et al. 2008; Kumar 2009; Chorev            supergroups (McGuire et al. 2008; Labadorf
                 and Carmel 2013). However, the most wide-               et al. 2010; Otto et al. 2010; Rhind et al. 2011;
                 spread function of introns is probably through          Shen et al. 2011; Curtis et al. 2012; Sebé-Pedrós
                 AS, which has been estimated to occur in nearly         et al. 2013), and even some intron-poor species
                 all human multiexonic genes (Pan et al. 2008;           (Pleiss et al. 2007; Kabran et al. 2012). Second,
                 Wang et al. 2008). By differentially processing         LECA’s intron – exon structures seem to have
                 introns (and exons), eukaryotic cells can gener-        met all classic requirements for AS to occur
                 ate multiple mRNA and protein isoforms from             (Roy and Irimia 2009; Koonin et al. 2013): It
                 a single gene. This differential processing may         was intron-rich (Irimia et al. 2007b) and had
                 consist of nonsplicing (or retention) of introns,       heterogeneous splice signals, which are associ-
                 in the simplest case, but also of exclusion (or         ated with splicing variation within and across
                 skipping) of an entire exon, alternative choice of      genomes (Stamm et al. 2000; Ast 2004; Baek
                 multiple 50 ss, and/or 30 ss within an intron           and Green 2005). Moreover, functional and evo-
                 (which would produce exon truncations/ex-               lutionary analyses of alternatively spliced genes
                 pansions), or more complex splicing patterns            showed that ancient eukaryotic genes (i.e., those
                 (Irimia and Blencowe 2012).                             present in LECA) show high levels of AS in dif-
                      AS may itself perform different functions.         ferent eukaryotic lineages, suggesting that they
                 First, AS can generate protein isoforms with            could have been amenable to AS also in LECA
                 highly distinct sequences that may have dramat-         (Irimia et al. 2007b). Thus, all these lines of
                 ically different activities and biological roles        evidence strongly suggest that LECA could
                 (e.g., Boise et al. 1993; Gabut et al. 2011); but       have had at least a simple program of AS, per-
                 also proteins with only slight variations that          haps comparable to those observed in most
                 contribute to fine-tuning of cellular processes         modern unicellular eukaryotes, dominated by
                 (Lopez 1998). Second, AS can produce non-               intron retention and perhaps acting mainly as
                 functional transcript variants, serving as an ex-       an extra layer of gene regulation. However, as for
                 tra layer of down-regulation of gene expression.        any modern eukaryote, it is not clear what frac-
                 This may be achieved by on/off protein switches         tion of that transcriptional diversity played any
                 (Bingham et al. 1988), or by coupling missense          relevant biological role.
                 AS to NMD (Lewis et al. 2003; Lareau et al.                 Despite the similar patterns of AS in most
                 2007; Yap et al. 2012), leading to transcript deg-      eukaryotes, AS has also undergone remarkable
                 radation before translation. Third, inclusion           diversification in some specific lineages. For in-
                 of alternative sequences may lead to differences        stance, and most unexpectedly (Ast 2004), the

                 14                                           Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071
Downloaded from http://cshperspectives.cshlp.org/ on October 9, 2021 - Published by Cold Spring Harbor Laboratory Press

                                                                                      Origin of Introns and Alternative Splicing

                 intron-poor yeast S. cerevisiae may have evolved           This species has intron densities comparable
                 regulated, functional AS for the majority, if not          to humans (eight to nine introns per gene),
                 all, of its intron-containing genes. S. cerevisiae         and displays extremely high levels of splicing
                 has 260 introns, with a strong bias toward                variation, including both intron retention and
                 ribosomal genes (Juneau et al. 2006). In-depth             exon skipping. RNAseq quantification of exon
                 transcriptomic analyses have shown that splic-             skipping shows levels comparable only to the
                 ing of specific sets of introns can be down-reg-           human cortex (Curtis et al. 2012), the highest
                 ulated in response to different growth condi-              AS levels described so far in any tissue and or-
                 tions (Pleiss et al. 2007; Parenteau et al. 2011).         ganism (Barbosa-Morais et al. 2012). Whereas
                 For example, only a few minutes after amino                much of B. natans AS may represent noisy splic-
                 acid starvation, splicing of most ribosomal in-            ing, a few potential cases in which AS drives
                 trons is down-regulated, thereby reducing pro-             alternative subcellular targeting were identified
                 duction of ribosomal proteins, thus presumably             (Curtis et al. 2012), suggesting that AS may play
                 reducing global translation (Pleiss et al. 2007).          important and diverse roles in the biology of
                      Nonetheless, the most famous example of               B. natans. Surveys of diverse eukaryotes to iden-
                 extreme AS diversification is found in animals,            tify additional cases of independent evolution
                 particularly in primates (Barbosa-Morais et al.            of widespread functional exon skipping (or oth-
                 2012). Not only is AS much more common in                  er forms of protein-diversifying AS) are crucial
                 metazoans, but the mode of AS is also qualita-             to a full understanding of the functional impact
                 tively different from that of other eukaryotic             of the spliceosomal system in eukaryotes.
                 lineages. Unlike most eukaryotes, in which in-
                 tron retention dominates, animals show fre-                CONCLUDING REMARKS
                 quent usage of exon skipping (McGuire et al.
                                                                            Evidence from biochemistry and comparative
                 2008). In exon skipping, AS leads to the inclu-
                                                                            genomics strongly suggests that spliceosomal
                 sion/exclusion of full exons that behave as “cas-
                                                                            introns originated and proliferated during the
                 settes” of sequence and that can be deployed in
                                                                            origin of eukaryotes, in a process intimately
                 a tissue-specific manner. Consistent with the
                                                                            linked to other evolutionary events of eukar-
                 cassette idea, most regulated alternative exons
                                                                            yogenesis. The resulting portrait of the LECA
                 are multiple of three nucleotides (Xing and Lee
                                                                            spliceosomal system was comparable to that
                 2005), thereby preserving the reading frame
                                                                            of intron-rich modern eukaryotic organisms,
                 when included/excluded. Thus, cassette exons
                                                                            with myriad introns, complex splicing mach-
                 can have a much broader impact on proteome
                                                                            inery, and coupling of splicing to other cellular
                 diversity, by introducing or disrupting func-
                                                                            processes. From that complex ancestor, intron –
                 tional protein domains, or by regulating disor-
                                                                            exon structures have been highly plastic
                 dered regions often involved in protein – protein
                                                                            throughout eukaryotic evolution, with cases of
                 interactions (Buljan et al. 2012; Ellis et al. 2012;
                                                                            extreme divergence from their ancestors that
                 Colak et al. 2013). Intriguingly, the reasons for
                                                                            shaped the genomic landscapes of eukaryotes
                 this transition from intron retention (mostly
                                                                            probably like no other. Particularly remarkable
                 impacting gene regulation) to exon skipping
                                                                            are the multiple instances of recurrent evolution
                 (increasing proteomic diversity) during the
                                                                            of multiple features, including massive intron
                 evolution of early animals are still unknown.
                                                                            loss, constraint of splicing signals, restriction
                 Given the potential centrality of AS for the or-
                                                                            of BP-30 ss distance, loss of polypyrimidine tract,
                 igin of animal complexity, understanding this
                                                                            and complete loss of U12 introns.
                 crucial transition is a central priority for future
                 studies.
                                                                            ACKNOWLEDGMENTS
                      Strikingly, another eukaryotic lineage has
                 recently been reported to have mammalian-                  We thank Eesha Sharma for critical reading of
                 like levels of exon skipping, the chlorarachnio-           the manuscript. M.I. holds a PDF from the Hu-
                 phyte Bigelowiella natans (Curtis et al. 2012).            man Frontiers Science Program Organization.

                 Cite this article as Cold Spring Harb Perspect Biol 2014;6:a016071                                          15
You can also read