USING EMBL-EBI CHEMOGENOMICS RESOURCES TO SUPPORT DRUG DISCOVERY RESEARCH - ANNE HERSEY CHEMBL GROUP, EMBL-EBI

Using EMBL-EBI Chemogenomics Resources to Support Drug Discovery Research

Anne Hersey
ChEMBL Group, EMBL-EBI
ahersey@ebi.ac.uk
Outline of Talk
    How can chemogenomics resources help answer:
    • Which compound?
    • Does it get to site of action?
    • Is it safe?

    Focus on:
    •   ChEMBL – Bioactivity database
    •   SureChEMBL – Patent Database
    •   UniChem – compound cross referencing
    •   Examples by ChEMBL Users

Drug Discovery Pipeline
    Target identification → Hit Identification → Lead Optimisation → Pre-Clinical → Clinical

        • Target Identification (What to work on?)
           •   Target to Disease links - Open Targets
        • Hit identification/Lead Optimisation (Which compound?)
           •   Bioactivity Data - ChEMBL, SureChEMBL, UniChem
        • Pre-clinical data (Is it safe, does it get there?)
           •   in-vivo exposure and toxicity data - ChEMBL
        • Clinical Candidates & Approved Drugs (What works?)
           •   Efficacy target, disease annotation - ChEMBL

ChEMBL Database Content
ChEMBL: 1,800,000 compounds

Scientific literature
•   Medicinal chemistry
•   ADME relevant
•   Agrochemicals
•   Reviews etc

Public databases
•   Confirmatory assays

Patents
•   Bioactivities

Clinical research compound sets
•   Black box warning/withdrawals
•   Therapeutic mechanism/indications
•   Molecule features

Deposited data sets
Toxicity reference data sets
Underexplored targets (IDG)
ChEMBL Data and Curation Pipeline
Documents
•   Compounds, Assays, Activities

Compounds
•   Standardisation
    •   Functional groups
    •   Drawing conventions
•   Salt removal
•   Parent structures (linked by compound hierarchy)
•   Property calculation, e.g. aLogP 0.74, HBA 2, HBD 1, MWt 155.2, Ro5 0, RTB 2, HAtoms 11, pKa 7.84, LogD -0.78, QED 0.55, …
•   Compound registration
    •   Molfiles
    •   Depictions
    •   Chemistry cartridge
    •   InChI, e.g.
        InChI=1S/C14H9ClF2N2O2/c15-8-4-6-9…
        InChI=1S/C15H12N2O3/c18-17(19)14-9…
        InChI=1S/C8H13NO2/c1-9-5-3-4-7(6-9)8…
        InChI=1S/C8H13NO2.BrH/c1-9-5-3-4-7(6…

Activities
•   Compounds (Cmpd A, Cmpd B, Cmpd C) linked to assays, e.g. Assay 1: IC50 = 0.01 µM; Assay 2: EC50 = 1 µM; Assay 3: ED50 = 10 mg/kg
•   Activity curation
    •   Unit standardisation
    •   Outlier flagging

Assays
•   Assay description (free text)
•   Ontology annotation
    •   BAO
•   Target assignment, e.g. PDE5; HEK293 cells; Rattus norvegicus

Targets
•   Target registration
    •   UniProt
    •   EBI Complex Portal
    •   IUPHAR
    •   Cell Dictionary and Tissue Dictionary (collated from various existing ontologies)
    •   NCBI taxonomy
•   Ontology annotation
    •   GO terms

Database content
•   Compound information: Molecule dictionary, hierarchy, properties
•   Experimental data: Activities, Assays
•   Target information: Target dictionary
•   Documents
•   Source information
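
The unit standardisation step listed under activity curation can be illustrated with a minimal Python sketch. The conversion table and the function name `to_nanomolar` are illustrative assumptions, not the ChEMBL curation code. Dose units such as mg/kg are deliberately passed through unchanged because they are not concentrations.

```python
# Illustrative sketch only: this is NOT the ChEMBL curation code.
# Molar concentration units are converted to nM; anything else is left as reported.
NM_PER_UNIT = {
    "M": 1e9,
    "mM": 1e6,
    "µM": 1e3,
    "uM": 1e3,  # common ASCII spelling of µM
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value, unit):
    """Return (value, unit) expressed in nM when the unit is a molar concentration."""
    factor = NM_PER_UNIT.get(unit)
    if factor is None:
        # e.g. an ED50 in mg/kg is a dose, not a concentration: keep it as is
        return value, unit
    return value * factor, "nM"


# The three example activities shown on the slide
records = [
    ("Assay 1", "IC50", 0.01, "µM"),
    ("Assay 2", "EC50", 1.0, "µM"),
    ("Assay 3", "ED50", 10.0, "mg/kg"),
]
standardised = [(assay, kind, *to_nanomolar(v, u)) for assay, kind, v, u in records]
```

With these inputs the two concentration values end up in nM, while the ED50 keeps its mg/kg unit.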
SureChEMBL Pipeline
Fully automated pipeline

Patent data feed (IFI claims)
•   WO
•   EP (app. and granted)
•   US (app. and granted)
•   JP (English abstracts)

Pipeline steps
•   Chemical entity recognition and OCR correction
•   Name to structure (5 methods)
•   Image to structure (1 method)
•   Complex work units
•   Chemical registration

Content
•   20 million unique compounds
•   80,000 new compounds per month
UniChem – Compound Mapping across Resources

UniChem
•   156 million compounds
•   35 sources

Access
•   Web Services
•   Web Interface
•   Downloads

Which Target and Compound?
Marketed drugs: ~3,000               Human drug targets: ~800
Research compounds: ~1,800,000       Human targets with bioactive molecules: ~3,500
Which Target and Compound?
[Figure: Targets plotted against Compounds]

Targets axis
•   Human Proteins ~20,000
•   Non protein targets
•   Other species

Compounds axis
•   Patents
•   UniChem (10^8)
•   GDB17 (10^11)*
•   All (10^33)**

Plotted regions
•   Approved Drugs
•   Bioactivity Data

*JL Reymond JCIM 2013, **PG Polishchuk JCAMD 2013
Bioactive Molecule Space
[Figure: matrix of Targets against Compounds]

Data matrix is far from complete
Can we fill the gaps with Predictive Models?
• Complete the matrix with predicted values
• Can be used as:
  • A surrogate when experimental data is not available
  • A biological fingerprint

[Figure: matrix of Targets against Compounds. Key: measured data (active, inactive, not tested); predicted (active, inactive)]

Quantitative Structure-Activity Relationship (QSAR) models
have been built on ~800 ChEMBL targets (~550 human)
Aim to integrate these predictions with ChEMBL
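
The idea of completing a sparse compound-by-target matrix can be sketched in a few lines of Python. Everything here is a toy assumption: the compound and target names, the measured labels, and `predict`, which is only a placeholder for a trained QSAR model.

```python
# Toy sketch: fill untested compound/target pairs with model predictions.
# All identifiers and values are hypothetical; `predict` stands in for a real QSAR model.
compounds = ["cmpd_A", "cmpd_B", "cmpd_C"]
targets = ["target_1", "target_2"]

# Measured data: True = active, False = inactive; a missing key means "not tested".
measured = {
    ("cmpd_A", "target_1"): True,
    ("cmpd_B", "target_1"): False,
    ("cmpd_C", "target_2"): True,
}


def predict(compound, target):
    """Placeholder model: always predicts inactive."""
    return False


def complete_matrix(compounds, targets, measured, predict):
    """Return {(compound, target): (label, source)} covering every pair."""
    full = {}
    for c in compounds:
        for t in targets:
            if (c, t) in measured:
                full[(c, t)] = (measured[(c, t)], "measured")
            else:
                full[(c, t)] = (predict(c, t), "predicted")
    return full


def fingerprint(compound, targets, matrix):
    """Biological fingerprint: one activity bit per target."""
    return [int(matrix[(compound, t)][0]) for t in targets]


# Fraction of the matrix backed by experimental data
coverage = len(measured) / (len(compounds) * len(targets))
matrix = complete_matrix(compounds, targets, measured, predict)
```

Keeping the `measured` or `predicted` source next to each label lets downstream users tell a surrogate value from an experimental one.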
QSAR Prediction Results
•    Single proteins or protein complexes
•    pChEMBL values ('IC50', 'EC50', 'XC50', 'AC50', 'Ki', 'Kd', 'Potency')
•    Data from >2 articles
•    >= 40 unique active and 30 inactive compounds
•    Descriptors - RDKit Morgan fingerprints and physicochemical
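
The selection criteria above can be written down as a small Python sketch. A pChEMBL value is the negative base-10 logarithm of a molar activity value for the listed activity types. The function names are illustrative assumptions, and reading "30 inactive" as "at least 30" is also an assumption.

```python
import math

# Activity types that carry a pChEMBL value, as listed on the slide
PCHEMBL_TYPES = {"IC50", "EC50", "XC50", "AC50", "Ki", "Kd", "Potency"}


def pchembl(value_nm):
    """-log10(value in mol/L), computed as 9 - log10(value in nM)."""
    if value_nm <= 0:
        raise ValueError("activity value must be positive")
    return 9.0 - math.log10(value_nm)


def target_is_modelable(n_articles, n_active, n_inactive):
    """Slide criteria: data from >2 articles, >=40 actives and (assumed) >=30 inactives."""
    return n_articles > 2 and n_active >= 40 and n_inactive >= 30
```

For example, an IC50 of 10 nM (0.01 µM) corresponds to a pChEMBL value of 8, and an EC50 of 1 µM to a value of 6.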

[Figures: Prediction by different methods; Prediction for different target classes]

Also comparing with Conformal Prediction, which includes an inconclusive classification
(Norinder et al., J. Chem. Inf. Model. 2014)
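
How a conformal predictor arrives at an inconclusive call can be shown with a generic class-conditional (Mondrian) inductive conformal classifier. This is a textbook-style sketch, not necessarily the exact setup compared on the slide. The nonconformity scores, the significance level, and the label names returned are assumptions for illustration.

```python
# Generic sketch of class-conditional inductive conformal classification.
# A nonconformity score could be, for example, 1 - predicted probability of the class.

def p_value(calibration_scores, score):
    """Share of calibration examples at least as nonconforming as the new one."""
    n_ge = sum(1 for s in calibration_scores if s >= score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def conformal_label(cal_scores_by_class, new_scores_by_class, significance=0.2):
    """Return 'active', 'inactive', 'inconclusive' (both kept) or 'empty' (none kept)."""
    kept = [
        label
        for label in ("active", "inactive")
        if p_value(cal_scores_by_class[label], new_scores_by_class[label]) > significance
    ]
    if len(kept) == 2:
        return "inconclusive"
    if not kept:
        return "empty"
    return kept[0]


# Hypothetical calibration scores, one list per class
calibration = {
    "active": [0.1, 0.2, 0.3, 0.6],
    "inactive": [0.1, 0.15, 0.4, 0.5],
}
confident = conformal_label(calibration, {"active": 0.1, "inactive": 0.9})
borderline = conformal_label(calibration, {"active": 0.5, "inactive": 0.5})
```

A compound that conforms well to one class only gets a single label, while a compound that is plausible for both classes at the chosen significance level is reported as inconclusive rather than forced into one class.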
Looking for New Compounds and Chemotypes
[Figure: compound sets]
•   Marketed Drugs
•   ChEMBL (bioactive molecules)
•   SureChEMBL (patented molecules)
•   UniChem (synthesized molecules)
Expanding Chemical Space - Example:
Suvorexant, an orexin receptor antagonist
developed by Merck and approved by FDA
in 2014 for treatment of insomnia.
ChEMBL:
• First mention 2010
• IC50/Ki binding to OX1R and OX2R receptors
• PK data in rat and dog
• Anti-insomnia data in animal models
• 36 compounds >=80% similarity

SureChEMBL:
• First mention 2008
• Mentioned in 77 patents
• Mentions of Orexin in the same patents
• 133 compounds >=80% similarity
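
A similarity cut-off such as >=80% is commonly applied to a Tanimoto coefficient between molecular fingerprints. The slide does not state which measure or fingerprint was used, so the sketch below is an assumption. Fingerprints are represented as plain sets of on-bit indices, and the library entries are hypothetical.

```python
# Sketch assuming Tanimoto similarity on fingerprints stored as sets of on-bit indices.
# The slide does not name the similarity measure; the data below are made up.

def tanimoto(fp_a, fp_b):
    """Shared bits divided by the total number of distinct bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similar_compounds(query_fp, library, threshold=0.8):
    """Names of library compounds at or above the similarity threshold."""
    return [name for name, fp in library.items() if tanimoto(query_fp, fp) >= threshold]


query = {1, 2, 3, 4, 5}
library = {
    "analogue_1": {1, 2, 3, 4, 5, 6},  # shares 5 of 6 distinct bits
    "analogue_2": {1, 2, 9},           # shares 2 of 6 distinct bits
}
hits = similar_compounds(query, library)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the same function would be run against each resource to obtain per-database counts.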
Orexin 1 Example: 3 Ring Scaffolds

ChEMBL: 90 scaffolds with bioactivity data for Orexin 1
SureChEMBL: 465 scaffolds in patents mentioning Orexin 1
Can we go a Step Further?
• Suvorexant is in 19 of the UniChem indexed databases
  (including ChEMBL and SureChEMBL)
• Can we search for similar compounds in UniChem?
• Work in progress ….
[Workflow: UniChem → InChIs → Structures → Standardized structures → Structure searchable database → Similarity / SSS search → Compounds matching search → UniChem Lookup]

Similar compounds (>=80%) to Suvorexant:
36 in ChEMBL, 133 in SureChEMBL, 643 in UniChem
ChEMBL Users – Drug Discovery Use Cases
J Med Chem 2018
“we also mine the open access databases ChEMBL and PDB for fragments showing PDE inhibitory activity, as well as SureChEMBL for recent PDE related patents, to provide a wider context for exploring fragment diversity.”

PLOS ONE 2014
“application of the prediction model on external test set composed of more than 160 hH4R antagonists picked from the ChEMBL database gave enrichment factor of 16.4”

J Med Chem 2017
“To validate the ICM VLS models, a small set of 100 known KOR ligands from the ChEMBL database were mixed with a decoy set of 900 compounds and docked into all three receptor models.”

J Med Chem 2013
“we analyzed structures of the D2/5-HT2A ligands available in the ChEMBL database (Ki < 10 nM for both receptors) and identified the arylamine moieties that could bind in the cavity between TMHs 4−6, which had been identified as a tolerant region for chemical expansion”
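
Two of the quotes above rely on retrieving known actives from a mixed set, which is usually summarised as an enrichment factor. The sketch below uses the common definition (hit rate in the top-ranked fraction divided by the hit rate in the whole set). The cited papers may define it differently, and the ranking is hypothetical. Only the 100 actives and 900 decoys come from the quoted text.

```python
# Sketch of a standard enrichment-factor calculation; the ranking is hypothetical.

def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the overall hit rate."""
    n_total = len(ranked_is_active)
    n_top = max(1, int(round(n_total * top_fraction)))
    actives_total = sum(ranked_is_active)
    if actives_total == 0:
        raise ValueError("no actives in the ranked list")
    actives_top = sum(ranked_is_active[:n_top])
    # Written as a single ratio to avoid dividing two rounded fractions
    return (actives_top * n_total) / (n_top * actives_total)


# 100 known actives mixed with 900 decoys, as in the J Med Chem 2017 quote.
# Assumed outcome: 40 of the 50 best-scoring compounds are actives.
ranking = [True] * 40 + [False] * 10 + [True] * 60 + [False] * 890
ef_top_5_percent = enrichment_factor(ranking, 0.05)
```

An enrichment factor of 1 means the ranking is no better than random selection, and larger values mean actives are concentrated near the top of the list.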
Does it get to the Site of Action and is it Safe?
ChEMBL ADME and toxicity data sources
• Scientific Literature
• Prescribing Information
• Toxicity Studies

[Figure labels: Side effects; Therapeutic effect; No effect]
Does it get there?
• ChEMBL PK and PPB Data
• Metabolic routes
• Other disposition relevant data
  • Solubility
  • Permeability
  • Intrinsic clearance
How ChEMBL ADME data is being used
Reasons for Compound Failure
Data from: “An analysis of the attrition of drug candidates from four major pharmaceutical companies”, M Waring et al, NRDD 2015

Is it Safe? - Challenges for Safety Prediction
• Lack of compound interaction data for many targets
     relevant to toxicity
• In-vivo toxicity
     • Wide datasets - A different type of “big data”
       • Multiple measured endpoints
       • Time and concentration effects – “everything is toxic”
     • Fewer compounds = lack of chemical diversity
     • Much data is on well annotated marketed drugs
     • Missing exposure and tox data on the same
       compounds
     • Combining data in an easy to use format

Pre-competitive initiatives are helping
Biological Fingerprints to Predict Toxicity
Experimental Data and Model Predictions → Biological Fingerprints as descriptors → Predict Toxicity Endpoint

Toxicol. Res., 2016: Combined chemical descriptors, protein target descriptors and cytotoxicity data to predict organ toxicity
Chem. Res. Toxicol. 2011: Combined chemical descriptors and gene expression data to predict hepatotoxicity

 Need more data, more curation and use of ontologies to link complex data
Learning from Existing drugs - What Works?
ChEMBL Drug & Clinical Candidate Annotation
Which target is responsible for the therapeutic effect of the drug?
Which disease is the drug used to treat?

                       Brivaracetam

                                                       Reasons for Drug Withdrawal
For the Future …
• 5 of the 10 top selling drugs are
  currently mAbs
• Should we be gathering bioactivity data
  on biologics?

                                            Types of Drugs and
                                            Clinical Candidates
Summary
     Target identification → Hit Identification → Lead Optimisation → Pre-Clinical → Clinical

         • Target Identification (What to work on?)
            •   Target to Disease links - Open Targets
         • Hit identification/Lead Optimisation (Which compound?)
            •   Bioactivity Data - ChEMBL, SureChEMBL, UniChem
         • Pre-clinical data (Is it safe, does it get there?)
            •   in-vivo exposure and toxicity data - ChEMBL
         • Clinical Candidates & Approved Drugs (What works?)
            •   Efficacy target, disease annotation - ChEMBL

Acknowledgements
        EMBL-EBI Chemogenomics Team
•   Andrew Leach         •   Eloy Felix
•   Anne Hersey          •   Juan Mosquera
•   Anna Gaulton         •   Francis Atkinson
•   Jon Chambers         •   Nicolas Bosc
•   Patricia Bento       •   Fiona Hunter
•   Prudence Mutowo      •   Chris Radoux
•   Paula Magarinos      •   Marleen de Veij
•   David Mendez         •   Aldo Segura