Scientific Report 2019 - Italian Institute for Genomic Medicine
The Italian Institute for Genomic Medicine (IIGM), formerly known as the Human Genetics Foundation - Torino, has been an operating body of the Compagnia di San Paolo since 2007. The IIGM is a center of excellence for research and training in human genomics, epigenomics, and immunology. It carries out its activities through a model of efficiency and transparency, aiming to maximize the resources available for research, training, and high-level education. The Institute maintains scientific partnerships and collaborations with the University of Turin (UniTo) and the Polytechnic of Turin (PoliTo).

In December 2018, IIGM and the Piedmont Foundation for Oncology (FPO) - IRCCS created a joint research platform in the FPO-IRCCS building in Candiolo (To), where the IIGM laboratories moved in July 2019. IIGM organized the new laboratories in accordance with new operational procedures and structural methods: the spaces have been arranged according to the criterion of “functionality and sharing”, so as to create more opportunities for interaction between research teams, favoring the sharing of scientific knowledge and the emergence of new ideas and projects. The scientific collaboration between IIGM and FPO-IRCCS will pursue research projects of excellent scientific value, allowing the achievement of the highest international standards in translational medicine, biomedicine, and oncology.

Research activities carried out by PIs at the IIGM address cancer genomics and bioinformatics, epigenetic modifications related to malignant diseases, genomic instability and tumor immunity, immune regulation, genomic epidemiology, and quantitative and computational biology. In June 2019, a new “Genomic Instability and Tumor Immunity” Unit started its research activities at IIGM. Research projects are managed by PIs and are financed partly by the Compagnia di San Paolo and partly by external funders.
The many collaborations in which the IIGM is involved, with both academia and health services, will foster the development of intellectual property that could contribute to the establishment of new biotechnology companies and investments in the Piedmont area. The IIGM and “La Città della Salute e della Scienza” in Torino have cooperated for years on ongoing projects aimed at identifying new molecular markers for the early diagnosis of tumors (in collaboration with the Center for the Oncological Prevention of Piedmont).

The project "Functional genomics applied to pediatric neoplasms: from mutations, to function, to therapy" (Sargen - Sargenita) was set up in 2018 and has been developed since then. It is funded by a grant from the Compagnia di San Paolo to the Italian Foundation for Pediatric Hematology and Oncology (FIEOP) and by a grant from Fondazione Veronesi. The project is carried out at the IIGM under the scientific supervision of Prof. Franca Fagioli (Ospedale Infantile Regina Margherita), with the scientific collaboration of Prof. Salvatore Oliviero and Dr. Matteo Cereda. It aims to deepen the knowledge of the biological mechanisms underlying sarcomas, improve diagnosis, and design new potential therapeutic strategies through a functional genomics platform that integrates nucleic acid sequencing analyses, cell and animal models, and functional studies.

IIGM has activated the following scientific collaborations to foster basic, translational and clinical research through joint research projects involving both clinical and laboratory activities. Within these collaborations, all the partners provide personnel, scientific skills and knowledge, as well as their own instruments and equipment.

• The partnership with Politecnico di Torino sustains the following projects: "Statistical inference and computational biology" and "New algorithms for inference and optimization from large-scale biological data" (INFERNET), a European project funded by the Horizon 2020 Marie Curie RISE program in which IIGM participates as coordinator;
• The partnership with Università Cattolica del Sacro Cuore and Fondazione Policlinico Universitario Agostino Gemelli IRCCS sustains the study of the following topics:
  1. Development of therapeutic antibodies;
  2. Development of new CAR T therapies;
  3. Genomic study of tumors.
  Thanks to this collaboration, starting from 2020 the IIGM laboratories will host a new "Immunotherapy" research group that will focus on the production of monoclonal antibodies for therapies and/or CAR-T, whose Principal Investigator will be Dr. Tobias Longin Haas;
• The partnership with the Department of Biology of the University of Rome "Tor Vergata" led to the creation, in early 2020, of a new research group in IIGM, the "Genomic instability and tumor immunity" Unit, whose Principal Investigator will be Dr. Ilio Vitale.

In 2019, a total of 93 articles with IIGM affiliation were published in peer-reviewed journals, such as Nature Medicine, BMJ, Cancer Discovery, Nature Genetics, Circulation, and Cell Metabolism, for a total IF of 763.489 (mean IF 8.299).
During 2019, the Institute organized scientific workshops and seminars to create opportunities for researchers with different backgrounds to exchange opinions, knowledge, and experimental protocols, and to contribute to the education in and dissemination of science. Worthy of note is the meeting “New frontiers in cancer immunotherapy”, coordinated by a scientific committee formed by the IIGM’s Prof. De Maria, Prof. Basso, and Dr. Pace, and held at Politecnico di Torino with the participation of Italian and international speakers and attendees. The IIGM also pursued its commitment to scientific communication and dissemination through the project “Vivere la scienza”, a hands-on laboratory class for high school students with interactive activities in the field of biology and life sciences.

Two main "facilities" are already operating at the IIGM: the Flow Cytometry and Cell Sorting Facility (Flow Cytometry Facility) and the Genomics and Epigenomics Facility (Genomic Facility). The Flow Cytometry and Cell Sorting Facility expanded its well-established activities, offering services to researchers from the IIGM, the Candiolo Cancer Institute FPO-IRCCS, academia, and other research institutes. In late 2019 the Genomic Facility acquired the Illumina NovaSeq6000 sequencer, which allows a modular approach to sequencing thanks to the availability of sequencing supports (flow cells) in different formats, thus enabling small- to large-scale sequencing projects at moderate cost. As in previous years, the Genomic Facility provided methylation analysis services (microarray methylation analysis) to local and international researchers.

The IIGM pursues a policy of state-of-the-art research, providing its researchers with innovative and up-to-date instrumentation. The recently acquired Seahorse XF Analyzer, a live-cell metabolic analyzer that allows the study of key cellular functions such as mitochondrial respiration and glycolysis, has aroused the interest of several researchers from the Candiolo Cancer Institute FPO-IRCCS and the University of Turin.
IIGM Principal Investigators

Cancer Genomics and Bioinformatics Unit
Dr. Matteo CEREDA, PhD
International experiences: King’s College London, London, UK; IEO, Istituto Europeo di Oncologia, Milan, IT; MRC Laboratory of Molecular Biology, Cambridge, UK

Molecular Epidemiology and Exposome Unit
Dr. Alessio NACCARATI, PhD
International experiences: Institute of Experimental Medicine (IEM), Czech Academy of Sciences, Prague, Czech Republic

Immuno-Regulation Unit
Dr. Luigia PACE, PhD
International experiences: Institut Curie, Paris, France; Helmholtz Centre for Infection Research, Hannover, Germany

Epigenomics Unit
Prof. Salvatore OLIVIERO, PhD
Full Professor in Molecular Biology, University of Turin
International experiences: Albert Einstein College of Medicine, The Bronx, NY, USA; Harvard Medical School, Boston, USA; EMBL, Heidelberg, Germany

Statistical Inference and Computational Biology Unit
Prof. Andrea PAGNANI, PhD
Full Professor (L.240), Polytechnic University of Turin
International experiences: University of Paris-Sud ORSAY, Paris, France

Genomic Instability and Tumor Immunity Unit
Dr. Ilio VITALE, PhD
International experiences: Gustave Roussy Cancer Campus (INSERM U848), Villejuif, France
INDEX

Cancer Genomics and Bioinformatics Unit
  Project 1: “Deciphering alternative splicing regulation in cancer to identify new therapeutic targets”
    Subproject "Characterization of the defects in the angiogenesis process caused by the splicing factor Nova2 and their effect in tumorigenesis"
  Project 2: "Development of algorithms for the characterization of molecular mechanisms responsible for the onset and progression of cancer"
  Project 3: "Analysis of heterogeneity and clonal evolution of tumors to identify new therapeutic intervention points in malignancies"
  Project 4: "Study of the micro immune environment in breast cancer in order to evaluate its effects on therapeutic treatments and identify new vulnerabilities"
  Funds and Grants

Molecular Epidemiology and Exposome Unit
  Project 1: “Research of biomarkers for primary and secondary prevention of tumors”
    Subproject “Evaluation of DNA methylation in leukocyte sub-populations associated with tobacco smoking exposure”
    Subproject "Role of microRNA and gut microbiome as colorectal cancer biomarkers”
    Subproject “Identification and comparison of miRNA expression profiles in plasma and feces and composition of the intestinal microbiome in subjects participating at the colon cancer screening program: how diet and lifestyle can alter the expression of miRNAs and the composition of the intestinal microbiota”
    Subproject “Study of the expression and composition of the intestinal microbiota in relation to different eating habits”
    Subproject “Profiles of miRNA expression in faeces and plasma of subjects affected by celiac disease by Next-Generation-Sequencing”
  Project 2: “Social inequalities, biological pathways and health”
    Subproject "Environmental Measurements and Molecular Footprints of Environmental Exposures (EXPOsOMICs)"
  Funds and Grants

Immuno-Regulation Unit
  Project 1: "Analysis of nuclear dynamics of heterochromatin during T lymphocyte differentiation"
  Project 2: "Study of ontogenesis and heterogeneity of T lymphocytes during the immune responses to tumor antigens and pathogens"
  Funds and Grants

Epigenomics Unit
  Project 1: "Epigenetic modifications involved in cell transformation"
  Project 2: "Analysis of gene alterations in pediatric sarcomas"
  Project 3: "Functional analysis of genes involved in neuroectodermal and endomesodermal differentiation"
  Services carried out by the Unit
  Funds and Grants

Statistical Inference and Computational Biology Unit
  Project 1: “Cancer and Metabolism”
    Subproject “Growth Rates and Metabolism”
    Subproject “Cell Polarity and Compartmentalization”
    Subproject “Biophysical modeling of molecular sorting”
  Project 2: “Optimization method of chemico-physical protein properties”
  Project 3: “Quantitative biology and modelling”
    Subproject “Quantitative approach to the physiology of cancer”
  Funds and Grants

Genomic Instability and Tumor Immunity Unit
  Project: “Exploiting karyotypic aberrations and chromosomal instability in cancer cells and cancer stem cells for precision immunotherapy: Chromosome instability and immunogenicity in CSCs”
  Funds and Grants

FACILITIES
  Flow Cytometry and Cell Sorting Facility
  Metabolic Facility
  Genomic Facility

PUBLICATIONS 2019
Cancer Genomics and Bioinformatics Unit

Research Group
Matteo Cereda, Ph.D, Head of Unit (IIGM)
Marco Del Giudice, Ph.D, Post Doc fellow (IIGM)
Federica Gaudino, Ph.D, Post Doc fellow (IIGM)
Greta Romano, Ph.D, Post Doc fellow (IIGM)
Serena Peirone, Ph.D. student (IIGM and INFN)
Francesca Priante, Eng. Research fellow (IIGM)
Sarah Perrone, Research fellow (IIGM)
Mariachiara Griego, internship (IIGM and UniFi)
The "Cancer Genomics and Bioinformatics" Unit studies the genomics of cancer using next-generation sequencing techniques to characterize the molecular mechanisms underlying the onset and progression of the disease. With the growth of national and international sequencing projects, modern biology is facing new challenges caused by the massive production of genomic data. The main challenge is to extract relevant information from these data while taking into account their intrinsic heterogeneity. Our unit works in this field and addresses a wide range of biological problems arising from large-scale genomic, transcriptomic and epigenetic experiments. Using the most recent bioinformatic, statistical and mathematical approaches, our research aims to disentangle the complexity of genomic data in order to support a personalized medicine that can be incorporated into clinical practice.

The group's activity focuses on identifying somatic alterations, defining the clonal evolution of tumors, and identifying new molecular mechanisms that can serve as points of clinical intervention. In particular, our interest focuses on the identification of new oncogenic alterations in tumors characterized by the absence of mutations, structural variations of the genome and transcriptome. This context includes the study of the transcriptional regulation of splicing proteins and their role in the onset and progression of cancer, in order to discover new potential therapeutic targets. Alternative splicing drives proteome diversity. Alterations of protein-RNA interactions can lead to a variety of diseases, including cancer. It has also recently been shown that, in cancers, the altered proteins (neo-antigens) produced as a result of defects in the splicing mechanism outnumber those produced by somatic mutations; their study can therefore potentially broaden the range of action of precision medicine.
Projects

Project 1: “Deciphering alternative splicing regulation in cancer to identify new therapeutic targets”
M. Cereda (PI), M. Del Giudice, S. Peirone, F. Gaudino

Aims
1. To analyze genomic, transcriptomic and epigenetic data of different tumors in order to identify alterations in tumor transcripts
2. To identify the proteins responsible for the alteration of the splicing mechanism, their regulators, and the damaged isoforms created, in order to select new therapeutic targets

Results
The group's activity has focused on the study of splicing regulation in prostate cancer. Although it is known that altered activity of transcription factors such as FOXA1, ERG, AR and HOXB13 underlies this disease, their effect on the regulation of splicing and on the creation of spurious proteins remains unknown. Through the analysis of hundreds of high-throughput sequencing datasets and experimental validations, we have shown that the expression of FOXA1 is the main regulator of the activity of the proteins that control splicing, both in primary carcinoma and in metastases (Figure 1).

Figure 1. Relative contribution (y-axis: % of R²) of the expression of each transcription factor (FOXA1, HOXB13, ERG, MYC, AR) in the alteration of the gene expression of splicing proteins in primary (black) and metastatic (orange) tumors.

In particular, the high expression of FOXA1 in primary and metastatic prostate cancer controls the expression of a specific set of 16 splicing factors. The unit validated the preliminary results in four cell models that recapitulate prostate cancer (i.e. DU145, PC3, LNCaP and VCaP): FOXA1 silencing and over-expression experiments confirmed its regulatory action on this set of splicing proteins (Figure 2).
Figure 2. A. Splicing effects significantly altered by the high expression of FOXA1 in primary (black) and metastatic (orange) prostate cancers. B. Expression levels of FOXA1 and six candidate splicing factors in the four cell lines under basal conditions. C. FOXA1 silencing. D. FOXA1 overexpression.

The activation of FOXA1 and the consequent alteration of splicing factor expression reduce protein variability towards specific isoforms, significantly impacting oncogenes and tumor suppressor genes (Figure 3). Some of these isoforms are associated with a positive prognosis for patients, while others are associated with a negative one. At the moment, the activity focuses on determining which isoforms are most aggressive for patients and on analyzing their specific regulatory mechanisms, in order to provide new possible therapeutic strategies. Alternative splicing has been shown to be a therapeutic vulnerability for cancers caused by the MYC oncogene. Our results propose for the first time the applicability of the same concepts to prostate tumors, providing new candidates of therapeutic interest.

Figure 3. Percentage of alternative splicing events (svASE, %), due to the high expression of FOXA1, in cancer genes compared to the rest of the genes. Event types shown: all, ES, A3, A5, IR, MEX.

Collaborations
Prof Jernej Ule (Francis Crick Institute, London, UK)
Dr Prabhakar Rajan (Barts Cancer Institute, Queen Mary University of London, London, UK)
Prof Roded Sharan (School of Computer Science, Tel-Aviv University, Israel)
Subproject “Characterization of the defects in the angiogenesis process caused by the splicing factor Nova2 and their effect in tumorigenesis”
M. Cereda (PI), M. Del Giudice

Aims
1. To characterize the regulatory mechanisms of the splicing protein Nova2 in the formation of the vascular lumen and the development of blood vessels
2. To identify the functional targets of Nova2 and its contribution to tumor angiogenesis

Results
The formation of the vascular lumen is a fundamental step during angiogenesis; however, the molecular mechanisms underlying this process remain unknown. Recent studies have shown that the neural and vascular systems share anatomical, functional and molecular similarities. The organization of the endothelial lumen is controlled at the post-transcriptional level by the splicing regulator Nova2, whose depletion disrupts the formation of the vascular lumen. Our unit has identified the functional targets of Nova2, reconstructing its modes of regulation and its cooperation with other known splicing factors. In particular, the unit characterized Nova2 target exons affecting the Par polarity complex and its regulators. Depletion of Nova2 in cultured endothelial cells in fact compromises polarity, a process required for the formation of the vascular lumen. The unit is currently extending these results to the study of tumors by analyzing high-throughput sequencing data.

Collaborations
Dr Claudia Ghigna (Institute of Molecular Genetics, CNR, Pavia)
Project 2: “Development of algorithms for the characterization of molecular mechanisms responsible for the onset and progression of cancer”
M. Cereda (PI), M. Del Giudice, S. Peirone, F. Priante

Aims
To identify new methodologies for extracting relevant information on the origin and progression of diseases from high-throughput sequencing data.

Results
Through the application of artificial intelligence techniques, we have developed a new algorithm capable of identifying the biological processes related to the onset and progression of tumors using sequencing data of the tumor transcriptome. The developed method outperforms the reference algorithms used by the scientific community. Applied to 5,941 samples of 14 types of cancer, our method correctly identified the alterations of the specific signaling pathways of each tumor. Furthermore, our results have highlighted the role of the tumor suppressor PTEN in the modulation of immune processes. In particular, we have clarified that prostate tumors that exhibit PTEN loss have an immunosuppressive microenvironment, due to STAT3 activation.

With the growth of high-throughput sequencing projects, modern biology is facing new bottlenecks due to big data problems. One of the challenges is to extract relevant information from these high-volume data, taking into account their intrinsic heterogeneity. So far, genomic projects have profiled thousands of samples, providing insights into the cell transcriptome. However, disentangling the heterogeneity of these transcriptomic big data to identify defective biological processes remains difficult.
Our Gene Set Enrichment Class Analysis (GSECA) algorithm exploits the bimodal behavior of RNA sequencing gene expression profiles to identify altered gene sets in heterogeneous patient cohorts (Figure 4). We have shown that GSECA outperforms state-of-the-art algorithms in handling gene sets whose genes are activated or repressed more strongly, and heterogeneously, across samples. It can detect functionally related altered cellular mechanisms in a condition of interest, handling more heterogeneous cohorts than other available methods. By increasing the signal-to-noise ratio, GSECA is able to successfully manage the heterogeneity of thousands of samples and provides useful information on the clinical and biological patterns of a phenotype (Figure 5).
Figure 4. Schematic representation of the GSECA algorithm. The algorithm proceeds through three sequential phases: (i) sample-specific finite mixture modeling of the gene expression distribution; (ii) sample-specific discretization of the expression values into seven categorical expression classes; and (iii) statistical identification of altered gene sets (AGS). AGS are displayed as expression class maps.

Figure 5. Evaluation of the performance of the gene set analysis algorithms. Scatter plot of mean absolute FC (Abs. FC) and dispersion (D), averaged over the gene sets detected by each method. The point size represents the average standard deviation of D. The point color indicates the percentage of gene sets that contain both activated and repressed genes, that is, coordinated variability.
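The first two phases listed in the Figure 4 caption can be illustrated with a minimal Python sketch. This is not the GSECA implementation: the two-component Gaussian mixture, the rank-based binning into classes 1 to 6, and all function names and toy values are assumptions made only for illustration. The third phase (the statistical comparison of class frequencies between cohorts) is omitted.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high(x, params):
    """Posterior probability that x belongs to the high-expression component."""
    weight_hi, mean_lo, sd_lo, mean_hi, sd_hi = params
    p_lo = (1.0 - weight_hi) * normal_pdf(x, mean_lo, sd_lo)
    p_hi = weight_hi * normal_pdf(x, mean_hi, sd_hi)
    total = p_lo + p_hi
    return p_hi / total if total > 0.0 else 0.5


def fit_two_component_mixture(values, n_iter=50):
    """Phase (i), simplified: fit a two-component 1D Gaussian mixture by EM."""
    n = len(values)
    ordered = sorted(values)
    mean_lo, mean_hi = ordered[n // 4], ordered[(3 * n) // 4]
    center = sum(values) / n
    spread = math.sqrt(sum((x - center) ** 2 for x in values) / n)
    sd_lo = sd_hi = max(spread, 1e-6)
    weight_hi = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each value.
        params = (weight_hi, mean_lo, sd_lo, mean_hi, sd_hi)
        post = [posterior_high(x, params) for x in values]
        # M-step: update weights, means and standard deviations.
        n_hi = max(sum(post), 1e-9)
        n_lo = max(n - sum(post), 1e-9)
        mean_hi = sum(p * x for p, x in zip(post, values)) / n_hi
        mean_lo = sum((1.0 - p) * x for p, x in zip(post, values)) / n_lo
        var_hi = sum(p * (x - mean_hi) ** 2 for p, x in zip(post, values)) / n_hi
        var_lo = sum((1.0 - p) * (x - mean_lo) ** 2 for p, x in zip(post, values)) / n_lo
        sd_hi = max(math.sqrt(var_hi), 1e-6)
        sd_lo = max(math.sqrt(var_lo), 1e-6)
        weight_hi = n_hi / n
    return weight_hi, mean_lo, sd_lo, mean_hi, sd_hi


def discretize_expression(values, params, n_expressed_classes=6):
    """Phase (ii), simplified: class 0 = likely not expressed; classes 1..6 are
    equal-count rank bins of the expressed values (an illustrative assumption)."""
    expressed = [i for i, x in enumerate(values) if posterior_high(x, params) >= 0.5]
    expressed.sort(key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(expressed):
        classes[i] = 1 + (rank * n_expressed_classes) // len(expressed)
    return classes


# Hypothetical log-expression values for one sample: six silent genes
# followed by twelve expressed genes.
expression = [0.0, 0.1, 0.2, 0.3, 0.1, 0.2] + [8.0 + 0.5 * k for k in range(12)]
params = fit_two_component_mixture(expression)
classes = discretize_expression(expression, params)
print(classes)
```

Because the classes are assigned per sample, a gene falls into the same category in two samples only if it occupies a comparable position in each sample's own expression distribution, which is one way to make heterogeneous samples comparable.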
With this work we introduced the "less is more" paradigm shift in the treatment of large heterogeneous RNA-seq data sets, demonstrating that it improves the detection of altered biological processes in the phenotype of interest. At the same time, we generated a comprehensive evaluation of the effect of PTEN loss in different types of cancer. Our data showed that the impact of PTEN silencing on cell regulation is proportional to the modulation of the PI3K/AKT signaling cascade, with the strongest effect in gliomas, endometrial carcinomas, breast carcinomas, melanomas and sarcomas. GSECA correctly highlighted the role of PTEN in the control of immune-related processes in most types of cancer, in particular in those that show a significant alteration of the tumor immune microenvironment (TIME). These data support the importance of PTEN in modulating the immune system and therapeutic resistance (Figure 6).

Figure 6. Pan-cancer analysis of somatic loss of PTEN. A. Heatmap showing the altered classes of gene sets among types of cancer. Classes are defined in accordance with the KEGG category. Each cell reports the number of AGS. The annotation heatmap indicates the KEGG superclass of biological processes. B. Heatmap showing the number of immune-related gene sets that are altered following PTEN loss across cancer types according to GSECA and other GSA methods. In the right panel, a heatmap similar to an EC map represents the statistically significant alteration of the immune cell populations among types of cancer. Triangle size is the relative change in the percentage of tumor immune infiltrates between PTEN-loss and wild-type samples. The bar chart shows the IS for each type of cancer.
Emerging evidence has suggested that PTEN loss is an immunosuppressive event in prostate cancers. However, the connection between PTEN and the immune system is complex and involves both pro- and anti-tumorigenic immune responses, depending on the cellular phenotype and the tumor microenvironment. Our analyses have shown that prostate tumors with PTEN loss are "cold" tumors, characterized by a low T lymphocyte content. Furthermore, we have shown that the immunosuppressive environment of prostate tumors with PTEN loss could be caused by STAT3 activation. Loss of PTEN shortens the time to disease recurrence in these patients. This underlines the potential role of PTEN expression levels as a biomarker (Figure 7).

Figure 7. Impact of PTEN loss in prostate cancer. A. GSECA EC map showing immune expression signatures altered as a consequence of somatic loss of PTEN in prostate cancer. B. Kaplan-Meier disease-free survival (DFS) curves for patients with PTEN-loss and PTEN wild-type (PTEN-wt) tumors. C. Kaplan-Meier DFS curves for PRAD patients stratified by the optimal PTEN expression level (i.e. TPM = 3.56, maximally selected rank statistic = 2.34) within two years of initial treatment. D. Box plots showing the distributions of normalized expression levels of four genes related to the immune response in PTEN-loss and PTEN-wt samples. E. Box plots of the relative level of STAT3 phosphorylation in PTEN-loss and PTEN-wt PRAD samples.

Collaborations
Prof Michele Caselle (Department of Theoretical Physics, University of Turin)
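For readers unfamiliar with the survival curves referred to in Figure 7, the following minimal Python sketch shows how a Kaplan-Meier estimate is computed from follow-up times and censoring indicators. The function name and the toy follow-up data are hypothetical and unrelated to the report's cohort; the report does not state which software produced its curves.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the disease-free survival function.

    times  : follow-up duration of each patient
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each recurrence time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(order) and order[i][0] == t:
            recurrences += order[i][1]
            leaving += 1
            i += 1
        if recurrences > 0:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring.
follow_up = [2, 3, 3, 5, 8]
recurred = [1, 1, 0, 1, 0]
dfs_curve = kaplan_meier(follow_up, recurred)
print(dfs_curve)
```

In this toy cohort the estimated disease-free survival drops to about 0.8, 0.6 and 0.3 at months 2, 3 and 5. Comparing two such curves (for example PTEN-loss versus PTEN-wt) would additionally require a test such as the log-rank test, which is not shown here.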
Project 3: “Analysis of heterogeneity and clonal evolution of tumors to identify new therapeutic intervention points in malignancies”
M. Cereda (PI), G. Romano, F. Priante, F. Gaudino

Aims
1. To create a catalogue of tumor evolution models that allows the characterization of cancer driver genes, the classification of the clonal and sub-clonal events responsible for drug resistance, and an understanding of the interaction between clonal and sub-clonal events in drug resistance
2. To experimentally evaluate the predictions obtained on samples and tumor models through high-throughput sequencing techniques
3. To identify and quantify the predicted drug resistance effect within and among various types of cancer

Results
The study systematically evaluates clonal evolution within and between different types of cancer using DNA and RNA NGS data from over 20,000 patients from sequencing consortia (TCGA). The unit has catalogued the tumor heterogeneity of these samples into a limited number of evolutionary models, in order to offer information on the different selective advantages acquired by the tumor during its growth. It was therefore possible to group the tumors based on their evolution and to catalogue the cancer genes based on their selective advantage. Currently, the unit is validating the predictions obtained on sarcomas using osteosarcoma samples sequenced within the SARGEN project. The results converge towards clonal and subclonal alterations of the chromosome segregation mechanism during mitosis, common to most of the pediatric patients analyzed. Surprisingly, although pediatric sarcomas are characterized by a limited number of somatic mutations, these tumors are distinguished by a high number of splicing variants (Figure 8). The unit is currently engaged in identifying those pathogenic splicing variants that may be targeted by new precision medicine approaches. In particular, compared to somatic mutations, the new splicing junctions have a greater impact on the genes that code for membrane receptors and their respective ligands. Validation of the pathogenic variants is in progress using third-generation Oxford Nanopore sequencing technology.
Figure 8. A. Number of new splicing variants (NJ) and somatic mutations in pediatric patients with osteosarcoma. B. Distribution of new splicing variants (NJ) and somatic mutations in patients. C. Distribution of the degree of protein connections in the ligand-receptor interaction network. D. Ligands and receptors characterized by splicing alteration. E. Genes recurrently affected by splicing changes in patients with osteosarcoma.

Collaborations
Prof Franca Fagioli (Pediatric Onco-Hematology, Stem Cell Transplantation and Cellular Therapy Division, Regina Margherita Children’s Hospital, Torino)
Project 4: “Study of the micro immune environment in breast cancer in order to evaluate its effects on therapeutic treatments and identify new vulnerabilities”
M. Cereda (PI), S. Peirone

Aims
1. To analyze genomic and transcriptomic data of breast tumors in order to characterize their immune microenvironment
2. To identify gene expression signatures and related markers that explain the type of microenvironment specific to each tumor
3. To experimentally validate the action of the identified markers and their contribution to immunotherapeutic treatments

Results
The unit analyzed the gene expression data produced in Prof. Dazzi's laboratory at King's College London and identified the PTGS2 gene as the main candidate for immune evasion in breast cancer. Currently, the unit is analyzing transcriptomic data from 982 tumor samples and 89 healthy counterparts from international consortia, using statistical and artificial intelligence techniques, in order to evaluate the pro- or anti-tumor activity of PTGS2.

Collaborations
Prof Francesca Ciccarelli (Francis Crick Institute, London, UK)
Prof Francesco Dazzi (King’s College London, London, UK)

Funds and Grants
AIRC MGAF 20566 “Deciphering alternative splicing deregulation in cancer to identify novel therapeutic targets” (PI: M. Cereda)
Molecular Epidemiology and Exposome Unit

Research Group
Alessio Naccarati, PhD, Senior researcher (IIGM), Head of Unit, since Sept. 2019
Paolo Vineis, former Head of the Unit, Unit honorary member (IIGM) and Environmental Epidemiology Chair (Imperial College, London, UK)
Chiara Catalano, Master's degree student (UniTo)
Francesca Cordero, researcher (UniTo) and visiting scientist (IIGM)
Antonio Francavilla, PhD student (UniTo) and Fondazione Celiachia fellow
Amedeo Gagliardi, fellow (IIGM)
Valentina Panero, research technician (IIGM)
Barbara Pardini, PhD, senior researcher (IIGM)
Giulia Beatrice Piaggeschi, PhD student (UniTo and IIGM)
Sonia Tarallo, PhD, junior researcher (IIGM)
Szimonetta Turoczi, student (Erasmus traineeship from Eötvös Loránd University, Budapest, and IIGM fellow)
Alexandru Anton Sabo, visiting scientist (COST Action CA 17118)
Flavia Genua, visiting scientist (COST Action CA 17118)
The Unit`s research program is focused on the integration of environmental and individual (exposures, lifestyles, diseases and intermediate phenotypes) genomic and epigenomic data collected in large prospective studies. The research program has three main objectives: (a) to examine and apply new laboratory techniques for the analysis of “fingerprints” left on RNA, DNA, and proteins by environmental exposure in human population; (b) to identify new markers in blood and other body fluids that allow early diagnosis of diseases and possibly a more effective therapy; (c) to use and apply cutting-edge technologies for the assessment of environmental exposures; d) to analyze the socio-economic status in relation to the healthy status and epigenetic profiles, as in the innovative projects financed by the European Commission "Exposomics" (FP7) and "Lifepath" (Horizon 2020). In 2019 a new European Horizon 2020 project "Oncobiome" has started (A. Naccarati is a WP coordinator) which pursues the following objectives: 1) to identify and validate specific microbial profiles linked to the onset of cancer, prognosis, response to therapy or other specific effects 2) to understand the functional relevance of gut commensal ecosystems associated with cancer 3) to integrate the results with other oncological aspects (clinical, genomic, immune, metabolomics) 4) to design complementary tests using integrated "molecular signatures" to foresee cancer onset and progression. The Unit is among the very few in Italy and Europe which combines population research and the development of molecular tests to be used for early diagnosis and primary prevention of diseases like cancer. The development of high-throughput laboratory technologies and the collaboration with a big network of researchers has made the Unit an important node of the European Research system. 
The main impact on the regional healthcare system derives from the collaboration with the Piedmont Centre for Cancer Prevention, with which we share several research lines.
Projects
Project 1: “Research of biomarkers for primary and secondary prevention of tumors”
Subproject “Evaluation of DNA methylation in leukocyte sub-populations associated with tobacco smoking exposure”
G. Piaggeschi (PI), V. Panero, S. Tarallo, F. Cordero, C. Catalano in collaboration with S. Polidoro
Aims
This project is the natural continuation of the work carried out by our group on the relationship between smoking habits and epigenetic alterations of DNA. So far, the analyses have been carried out on whole blood samples prospectively collected and stored in biobanks. This study aims to investigate the epigenetic signals in more depth by analyzing the separate cell types present in the blood. This will show whether the signals detected are due to the contribution of a single cell type or of a leukocyte lineage, and will help to clarify the kinetics with which markers of smoke exposure return to normal levels. The experimental design is based on a preliminary phase of flow cytometry analyses, which will highlight differences in the proportions of the main cell subpopulations between smokers and non-smokers. In parallel, the expression levels of the membrane receptor GPR15 (GPR15+), known to be significantly elevated in smokers, will be evaluated in the different cell types. In addition, the levels of cotinine (a metabolite of nicotine) will be measured in plasma samples as a direct indicator of smoking exposure. For these analyses, fresh blood samples of smokers, ex-smokers and non-smokers have been collected from healthy volunteers. The data obtained in the first phase will be used to set up the second phase of the project, in which the different cell types will be separated for specific methylation analyses performed by bisulfite sequencing. This technique sequences genomic DNA after conversion with sodium bisulfite in order to reveal DNA methylation patterns at the level of single CpG sites.
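The principle behind bisulfite sequencing mentioned above can be illustrated with a minimal sketch. Sodium bisulfite converts unmethylated cytosines so that they are read as T after amplification, while methylated cytosines are still read as C; the methylation level of a CpG is then estimated as the fraction of reads showing C at that position. The function names, the toy reference sequence and the simplifications (full-length, forward-strand, already aligned reads) are assumptions made for illustration and do not describe the group's actual pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion of one DNA molecule (forward strand).

    Unmethylated C is read as T; a C whose 0-based index is listed in
    methylated_positions is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def cpg_methylation(reference, reads):
    """Estimate the methylation level at each CpG of the reference.

    Simplifying assumption: every read covers the whole reference and is
    already aligned, so read[i] corresponds to reference[i].
    Returns {cytosine_index: C_reads / (C_reads + T_reads)}.
    """
    ref = reference.upper()
    levels = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            c_count = sum(1 for read in reads if read[i] == "C")
            t_count = sum(1 for read in reads if read[i] == "T")
            if c_count + t_count > 0:
                levels[i] = c_count / (c_count + t_count)
    return levels


# Toy example: four molecules with different (hypothetical) methylation states.
reference = "ACGTTCGACCGA"  # CpG cytosines at indices 1, 5 and 9
methylation_states = [{1, 5}, {1}, {1, 9}, {1, 5}]
reads = [bisulfite_convert(reference, state) for state in methylation_states]
levels = cpg_methylation(reference, reads)
print(levels)
```

In real data the reads are short, come from both strands and must first be aligned with a bisulfite-aware aligner; the sketch only shows how the per-CpG methylation fraction is derived.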
Results
Recruitment and sample collection started in April 2017 in collaboration with the Italian Blood Volunteers Association (AVIS) and ended in December 2018, with 300 samples collected. For all samples, nine leukocyte subpopulations (T, T-helper, T-cytotoxic, NK, NKT, B, monocytes, neutrophils and eosinophils) were quantified by FACS, together with the relative measurement of GPR15+ in each cell subtype. In smokers, there was a decrease in GPR15+ in NK cells and an increase of the same marker in B, T, T-helper and T-cytotoxic cells compared to non-smokers (P
For the second phase of the project, an additional 16 healthy subjects (8 smokers and 8 non-smokers) matched by gender and age were recruited. From their samples, the nine cell subpopulations were separated by FACS and the DNA was extracted. Sequencing is currently under way at the Genecore facility at EMBL, where two sequencing techniques (whole genome and targeted) will be tested on the same samples. In order to develop a bioinformatics pipeline for the methylation sequencing data, an experimental and computational study was conducted on 22 whole blood samples from the EPIC-Italy cohort to evaluate whether the Illumina BeadChip array and the targeted sequencing technique give different methylation levels. The methylation levels of 1054 CpG sites measured with the two approaches were highly correlated (correlation coefficients in the range 92-99%), indicating that the two techniques are comparable (Figure 9 B). Furthermore, within this subproject, MethylFASTQ, a tool that simulates the production of NGS methylation data, was developed to build the analysis pipeline and was compared with other tools available in the literature (Piaggeschi G, et al., 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 2019).
Figure 9. A. Box plots showing the percentages of the cell subtypes (y axis) in the smoking categories (x axis). B. Example of correlation between methylation levels obtained by the Illumina BeadChip and those obtained by targeted sequencing for sample ID2463313.
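The comparison between the array and targeted sequencing described above comes down to a per-sample correlation between two vectors of methylation levels measured at the same CpG sites. The following minimal sketch computes a Pearson correlation coefficient in plain Python. The report does not state which correlation coefficient was used, so Pearson is an assumption here, and the methylation values below are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation levels (0 to 1) for the same CpG sites in one sample.
array_beta = [0.05, 0.12, 0.48, 0.51, 0.88, 0.93]  # array beta values
sequencing = [0.03, 0.15, 0.45, 0.55, 0.85, 0.96]  # C / (C + T) read fractions
r = pearson(array_beta, sequencing)
print(f"Pearson r = {r:.3f}")
```

A high correlation shows that the two platforms rank CpG sites consistently; it does not by itself exclude a systematic offset between them, which would need a separate agreement analysis.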
Collaborations
Dr Laura Conti (Department of Molecular Biotechnology, Immunology Laboratory, UniTo) for technical support and FACS flow cytometric analysis
Dr Davide Brusa (Université Catholique de Louvain (UCL), Institute of Experimental and Clinical Research (IREC), Brussels, Belgium) for the experimental design and initial support in setting up the flow cytometric analysis
Dr Mario Falchi and Dr Alessia Visconti (Computational Biology group, King's College London, Dept. Twin Research & Genetic Epidemiology), where G. Piaggeschi spent a PhD period abroad (25/02/2019 to 25/07/2019, 5 months). The project concerned the analysis of the effect of lifestyles on the composition of immune cells through the integration of "omics" data. A publication on this work is in preparation.
Subproject “Role of microRNA and gut microbiome as colorectal cancer biomarkers”
A. Naccarati (PI), S. Tarallo, B. Pardini, A. Francavilla, F. Cordero, S. Turoczi, P. Vineis
Aims
The aim of this project is to analyze miRNA and other small noncoding RNA (sncRNA) expression profiles using Next Generation Sequencing (NGS) technologies in different biological samples and to relate these profiles to the risk of colorectal cancer or precancerous lesions. The hypothesis is that specific miRNA profiles can distinguish patients with cancer or precancerous lesions from healthy subjects, so that using them alongside existing screening methods could improve early diagnosis. A further objective is to examine possible alterations of miRNA expression levels due to diet, lifestyle and other environmental factors, and to evaluate how much these alterations can affect colorectal cancer onset. As part of the collaboration with UniTo, the S. Rita Clinic of Vercelli and the University of Trento (Resp. Dr Nicola Segata), two projects have been funded by Lega Italiana per la Lotta contro i Tumori (LILT 2015, 2018). In addition to the analysis of miRNAs and sncRNAs, the project includes the study of gut microbiome composition in stool samples from the same subjects. For this purpose, high-resolution shotgun metagenomics was performed and the data from the two types of NGS sequencing were integrated. Both research areas will be developed within the H2020 Oncobiome project, funded by the European Community, which started on January 1, 2019.
Results
In our laboratory, a protocol for extracting RNA from stool samples and plasma exosomes has been developed for the optimal preparation of libraries for NGS sequencing.
The discovery phase on 230 samples from subjects with different diagnoses (cancer, precancerous lesions, inflammation or negative colonoscopy), together with the related statistical analyses, has been concluded. At the same time, a collection of samples from subjects in the Czech Republic, taken at the time of colonoscopy, was set up to serve as a validation population for the signals obtained in the Italian cohort. Over 160 samples from patients with CRC, patients with precancerous lesions and healthy subjects were collected and processed. Statistical analyses are currently under way, but some preliminary results are already available. The results of microbial DNA sequencing in the stool samples of the same subjects have been published in Nature Medicine in two back-to-back articles that compared over 900 fecal metagenomes, obtained from our and other cohorts or from available datasets, using different bioinformatics approaches (Thomas AM, et al., Nature Medicine 2019; Wirbel J, et al., Nature Medicine 2019). In the meantime, we have also analyzed sncRNA expression profiles in relation to metagenomics data in the feces of 80 subjects (adenomas and CRC). The bacterial profiles identified from shotgun metagenomics data and from transcriptomics data were highly correlated. Moreover, we observed different bacterial species in the different categories of subjects and, importantly, a set of biomarkers composed of human-derived sncRNAs and microbial organisms achieved 88% disease predictivity for the analyzed subjects (Tarallo S, et al., mSystems 2019) (Figure 10).
Figure 10. Simultaneous analysis of small RNA-sequencing data (human and non-human sncRNAs) and whole metagenome sequencing (microbiome composition). A. Relative abundance of bacterial phyla (above) and bsRNAs (below) obtained respectively from Whole Metagenome Sequencing (WMS) and small RNA sequencing (sRNA-Seq) data. B. Areas Under the Curve (AUCs) obtained by a Random Forest algorithm applied to WMS data (bDNA), sRNA-Seq data (bsRNAs), their combination (bDNA + bsRNAs) or their combination with miRNA expression levels (hsa-miRNAs + bDNA + bsRNAs). C. AUCs obtained with a Random Forest algorithm for the classification of colorectal cancer and healthy subjects, as a function of the number of signatures and of the information provided as input. A panel of 32 signatures (based on miRNAs, small bacterial RNAs and microbial DNA) provides high accuracy in distinguishing patients with colorectal cancer from healthy control subjects. D. Contribution of each of the 32 signatures that provide the best classification accuracy between tumor and healthy samples.
Collaborations
Dr Giulio Ferrero (UniTO) for the bioinformatics analyses
Dr Giuseppe Clerico, Dr Gaetano Gallo (S. Rita di Vercelli Clinic) for the collection of samples and clinical / anamnestic data
Department of Molecular Biology of Cancer (Institute of Exp. Medicine, Czech Academy of Sciences, Prague, Czech Rep.) for the validation cohort from the Czech Republic
Dr Nicola Segata (Center for Integrative Biology, Cibio, University of Trento) for the microbiome study
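The AUC values reported in Figure 10 summarize how well a classifier's scores separate cases from controls. The sketch below does not train a Random Forest; it only shows how an AUC can be computed from predicted scores, using the equivalence between the area under the ROC curve and the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name, labels and scores are illustrative assumptions, not the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = case, 0 = control).

    Computed as the fraction of (case, control) pairs in which the case has
    the higher score; ties count as half a pair.
    """
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical classifier outputs for six subjects (probability of CRC).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.91, 0.72, 0.40, 0.55, 0.20, 0.10]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. In practice the scores should come from held-out or cross-validated predictions, so that the AUC is not inflated by overfitting.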
Subproject "Identification and comparison of miRNA expression profiles in plasma and feces and composition of the intestinal microbiome in subjects participating in the colon cancer screening program: how diet and lifestyle can alter the expression of miRNAs and the composition of the intestinal microbiota"
A. Naccarati (PI), S. Tarallo, A. Francavilla, S. Turoczi, B. Pardini
Colorectal cancer (CRC) is the third most common cancer in men and the second in women in the European Union. Population screening programs aim to reduce its incidence and mortality. To date, the gold standard for screening is colonoscopy, owing to its reliability and sensitivity: the procedure can reduce the incidence of CRC by 60-90%. However, it is invasive and expensive. Several European countries have introduced less invasive screening programs based on the fecal occult blood test (FOBT), the fecal immunochemical test (FIT) or sigmoidoscopy, but these methods have limitations in specificity or sensitivity.
Aims
1. To evaluate the potential role of altered miRNA expression levels and of the composition of the intestinal microbiota as screening biomarkers for CRC. These specific "signatures" will be evaluated in plasma, in feces, or in both, using high-throughput methods, as biomarkers for the screening of adenoma and CRC, comparing subjects who tested positive (FIT+) and negative (FIT-) on the FIT screening test.
2. To test the predictive ability of a risk score that combines information from biomarkers, lifestyle data and the screening test results to identify subgroups with different levels of CRC risk.
Results
The collection of FIT samples for microbiome analysis, and of stool and plasma samples (together with diet and lifestyle data) from subjects who tested positive on the FIT, is still ongoing.
By December 2019, 4406 samples had been collected and an aliquot of each sample was sent to the US National Cancer Institute for microbiome analysis. A dataset of clinical and anamnestic data was also prepared. Of the 1480 FIT+ subjects collected, 340 were recruited and provided plasma and feces for the analysis of miRNA expression and gut microbiome composition. In preliminary analyses, the percentages of subjects found at colonoscopy to have advanced adenomas and CRC match the expected values (25% and 4%, respectively). Subjects who were FIT- were offered a telephone interview on eating habits and lifestyle. Of the 1817 contacted, 968 completed the questionnaire. To evaluate the potential impact of the screening test in following subjects at risk of tumor development, we started enrolling subjects who had a FIT- result at the first round and who are called back after two years to repeat the test.
Collaborations
Dr Carlo Senore (Centro di Riferimento per l'Epidemiologia e la Prevenzione Oncologica in Piemonte, AOU Città della Salute e della Scienza di Torino) for the CRC screening population
Dr Rashmi Sinha (National Cancer Institute Division of Cancer Epidemiology & Genetics, Metabolic Epidemiology Branch) and Dr Marc Gunter (International Agency for Research on Cancer (IARC), Lyon) for the microbiome analysis
The project is also included in the "Collaborative International Network of Microbiome Cohorts Nested within National / Regional Colorectal Cancer Screening Programs" consortium led by IARC.
Subproject “Study of the expression and composition of the intestinal microbiota in relation to different eating habits”
S. Tarallo (PI), A. Naccarati, A. Francavilla, F. Cordero, B. Pardini
Aims
Although the relationship between diet, lifestyle and health status is well established, it is still difficult to correlate specific dietary patterns with health status. Diet can modulate the expression of microRNAs (miRNAs), small RNA molecules that regulate gene expression. Recent studies have shown that fecal miRNAs directly regulate the expression of specific bacterial genes and also microbial growth, making them essential for maintaining the balance of the intestinal microbiome. Altered miRNA expression is involved in the development of several diseases, including cancer. In a previous study, our group showed for the first time that some miRNAs, known in the literature for their altered levels in colon cancer, were differentially expressed in plasma and feces in relation to different dietary habits (Tarallo S, et al., Mutagenesis 2014). However, further studies on a larger number of subjects are needed to better understand possible variations in healthy populations. Furthermore, several factors can positively or negatively influence the composition of the gut microbiome, and these changes play an important role in the development of various chronic diseases. The main purpose of this study is to evaluate, through NGS methods, miRNA expression profiles and gut microbiome composition in plasma and feces of a large group of healthy volunteers with different dietary habits (vegans, vegetarians and omnivores). For future "personalized medicine" approaches, miRNA expression profiles and gut microbiome composition could be useful biomarkers of a healthy nutritional status, which would improve the prevention and treatment of various diseases.
PI Sonia Tarallo received a post-doctoral fellowship (years 2017 and 2018) from the Veronesi Foundation for this project.
Results
Plasma, serum, and stool samples, together with questionnaires on diet and lifestyle, were collected from 120 healthy volunteers with different dietary habits (vegans (V), vegetarians (VG), and omnivores (O)). All the anthropometric measurements (weight, height and abdominal circumference), the serological data and the questionnaire data were included in a dedicated dataset. The anthropometric data show that the BMI of O is significantly higher than that of VG and V. Furthermore, in the serological analyses, ferritin was significantly lower in V and VG than in O, whereas vitamin B12 was significantly lower in VG than in V, who habitually took supplements. In a preliminary analysis of the sequencing data on miRNA expression levels in stool samples, we observed 47 differentially expressed miRNAs (DEmiRNAs) showing a trend across the three categories (Figure 11 A, B).
Figure 11. A. DEmiRNAs in the fecal samples showing a trend of expression across the different types of diet investigated. B. Example of two miRNAs (miR-12121-5p and miR-425-3p), respectively upregulated and downregulated, comparing subjects with an omnivorous, vegetarian and vegan diet.
Gut microbiome composition, preliminarily investigated using the 16S method, showed that subjects with a vegetarian or vegan diet have a higher percentage of Prevotella than omnivores, in whom Bacteroides predominate. Subsequently, in collaboration with the University of Naples, some samples were analyzed using a shotgun method for the validation of a study on the microbiome and diet. In fiber-rich diets, P. copri strains with an ability to metabolize carbohydrates predominate, while omnivores show an increase in strains involved in amino acid biosynthesis; the latter are known risk factors for the development of glucose intolerance and type 2 diabetes. The diversity of P. copri strains is therefore modulated by diet, and the strains present a very varied repertoire of functions depending on their genetic profile. The importance of this diversity has so far been underestimated (De Filippis F, et al., Cell Host Microbe 2019). To date, all the samples of the study have been re-sequenced using a shotgun method, which allows a greater depth of detection of the species present and of microbial functionality. The analyses are in progress.
The analysis of the nutrient indexes obtained from the EPIC questionnaires on food habits has been completed, and the indexes will be related to the miRNA expression and gut microbiome data for the different dietary habits.
Collaborations
Dr. Nicola Segata (Centre for Integrative Biology, Cibio, University of Trento)
Dr. Danilo Ercolini (Dept. of Agricultural Sciences, University of Naples Federico II)
Prof. Vittorio Krogh (Epidemiology and Prevention Unit, IRCCS Foundation National Cancer Institute of Milan)
Subproject “Profiles of miRNA expression in faeces and plasma of subjects affected by celiac disease by Next-Generation-Sequencing”
A. Naccarati (PI), A. Francavilla (PI), S. Tarallo, B. Pardini, F. Cordero, P. Vineis
Aims
Celiac disease occurs in about 1% of the world's population, although most people affected remain unaware of it throughout their lives. New biomarkers could be useful in the diagnosis and monitoring of celiac disease. For this purpose, molecular markers based on new species of non-coding RNAs (ncRNAs) detectable in surrogate tissues, as well as the composition of the gut microbiome, represent an interesting research field. MicroRNAs (miRNAs) are small non-coding RNA molecules, about 22 nucleotides long, whose expression is altered in many gastrointestinal diseases and which are therefore of interest as potential biomarkers. Dysbiosis, a recurrent feature of gastrointestinal diseases including celiac disease, is the pathogenic alteration of the composition of the intestinal microbiota, which can disrupt intestinal homeostasis and promote inflammation. As a consequence, there is growing interest in studying the interactions between microbiome and host, in which miRNAs and other small RNAs seem to be involved. For all these reasons, our study aims to analyze miRNA and microbiome profiles in stool and plasma samples using Next Generation Sequencing (NGS), comparing the effect of the gluten-free diet between celiac and healthy subjects, and comparing patients with a new diagnosis of celiac disease before and one year after the beginning of the gluten-free diet.
For this project, Antonio Francavilla received a three-year scholarship from the Italian Celiac Association (AIC).
Results
To date, 40 celiac patients already on a gluten-free diet, 3 patients with a new diagnosis and 28 healthy controls matched by age and gender have been recruited. For 45 of them, RNA extracted from stool samples was used for the first sequencing experiments. As a preliminary result, we observed a group of miRNAs with lower expression levels in celiac subjects on the diet compared to controls. In parallel, a correlation was observed between the expression levels of some miRNAs and the years of adherence to the gluten-free diet (Figure 12). Interestingly, enrichment analyses of the target genes of the dysregulated miRNAs have shown their involvement in inflammatory processes associated with celiac disease. The research activity planned for the coming months includes the recruitment of further celiac and healthy subjects, the continuation of the sequencing analyses in faeces and plasma, and the metagenomic analyses of stool samples. Finally, the miRNA / microbiome profiles observed in this study will be combined with those of other gastrointestinal diseases studied by our team.
Figure 12. A. Heatmap based on the expression levels of 8 miRNAs significantly downregulated in celiac patients on a gluten-free diet compared to controls with an unrestricted diet. B. Principal Component Analysis (PCA) based on the expression values of the differentially expressed miRNAs, showing the separation between celiac subjects and healthy controls and also a separation among celiac subjects according to years of gluten-free diet.
Collaborations
Dr. Nicola Segata (Centre for Integrative Biology, Cibio, University of Trento)
Dr. Lucia Crocellà, Dr. Cristina Guiotto, Prof. Rodolfo Rocca (Mauritian Order Hospital, Turin)
Dr. Davide Ribaldone, Dr. Gianpiero Caviglia (San Giovanni Antica Sede Hospital – SGAS)
Prof. Mauro Bruno (University Hospital of the City of Health and Science, Turin)
Project 2: “Social inequalities, biological pathways and health”
P. Vineis (PI), A. Gagliardi, V. Panero in collaboration with S. Polidoro and G. Fiorito
Aims
LIFEPATH was a four-year project funded within the Horizon 2020 program, which ended last year. The project was coordinated by Paolo Vineis, and IIGM participated as a partner (WP4, biomarkers). The main objective of the project was to investigate the determinants of divergent aging pathways between individuals belonging to different socio-economic groups. This objective was achieved through an original study design that integrates the social sciences approach with biology (including molecular epidemiology), using existing population cohorts and "omics" analyses (in particular epigenomics and metabolomics). To this end, the project used data from three categories of studies:
1. European or national studies in combination with data from population registers;
2. cohorts with intensive phenotyping and repeated biological samples (total population > 33,000);
3. large cohorts with unrepeated biological samples (total population > 202,000).
The cohorts provide information on aging and health conditions (multi-morbidity) at the various stages of life, based on the paradigm of life-course epidemiology ("build-up and decline"). In 19 European cohorts, existing DNA methylation data, inflammation markers, and metabolomics data (up to 35,000 subjects) were analyzed. IIGM is responsible for the harmonization and integration of these data with new DNA methylation analyses on approximately 3,000 subjects.
Results
Genome-wide methylation analyses on 3000 samples selected from the cohorts (TILDA - Ireland, Generation21 - Portugal, Airwave - UK) have been completed. Various studies using the biomarker data produced within the project are still ongoing. An article examining the association between allostatic load and acceleration of epigenetic aging (AA) in the Irish cohort TILDA has been published in Psychoneuroendocrinology.
The allostatic load is a multidimensional index that uses cardiovascular, neuroendocrine, immunological and metabolomic markers to quantify the biological risk due to different stresses during an individual's life. In this work the allostatic load (AL) was estimated using 16 biomarkers. The results show that the cumulative association of the allostatic load with epigenetic age acceleration (AA) is close to zero, becoming significant only when men alone are considered (McCrory C, et al., J Gerontol A Biol Sci Med Sci. 2020).
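Epigenetic age acceleration is commonly defined as the residual obtained when the epigenetic (DNA methylation) age is regressed on chronological age: a positive residual means that a subject is epigenetically older than expected for their age. The sketch below computes this residual with an ordinary least-squares line in plain Python. It assumes this residual-based definition, which is not spelled out in this report, and the ages are invented for illustration.

```python
def age_acceleration(chronological_age, epigenetic_age):
    """Residuals of epigenetic age regressed on chronological age.

    Fits epigenetic_age = intercept + slope * chronological_age by ordinary
    least squares and returns one residual per subject.
    """
    n = len(chronological_age)
    if n != len(epigenetic_age) or n < 2:
        raise ValueError("need two equally long sequences (at least 2 subjects)")
    mean_x = sum(chronological_age) / n
    mean_y = sum(epigenetic_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_age, epigenetic_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, epigenetic_age)]


# Hypothetical subjects: chronological age and estimated epigenetic age (years).
chronological = [50, 60, 70]
epigenetic = [52, 61, 74]
aa = age_acceleration(chronological, epigenetic)
print([round(value, 2) for value in aa])
```

The residuals can then be related to an exposure such as an allostatic load score, for example in a regression model adjusted for sex and other covariates.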