Using EMBL-EBI Chemogenomics Resources to Support Drug Discovery Research - Anne Hersey, ChEMBL Group, EMBL-EBI
Using EMBL-EBI Chemogenomics Resources to Support Drug Discovery Research
Anne Hersey, ChEMBL Group, EMBL-EBI
ahersey@ebi.ac.uk
Outline of Talk
How can chemogenomics resources help answer:
• Which compound?
• Does it get to the site of action?
• Is it safe?
Focus on:
• ChEMBL – bioactivity database
• SureChEMBL – patent database
• UniChem – compound cross-referencing
• Examples from ChEMBL users
Drug Discovery Pipeline
Target Identification → Hit Identification → Lead Optimisation → Pre-Clinical → Clinical
• Target Identification (What to work on?)
  • Target to Disease links - Open Targets
• Hit Identification/Lead Optimisation (Which compound?)
  • Bioactivity Data - ChEMBL, SureChEMBL, UniChem
• Pre-clinical data (Is it safe, does it get there?)
  • In-vivo exposure and toxicity data - ChEMBL
• Clinical Candidates & Approved Drugs (What works?)
  • Efficacy target, disease annotation - ChEMBL
ChEMBL Database Content (1,800,000 compounds)
• Scientific literature and public databases: medicinal chemistry, ADME relevant, confirmatory assays, agrochemicals, reviews etc.
• Patents: bioactivities
• Deposited data sets: underexplored targets (IDG), toxicity reference data sets, research compound sets
• Clinical: black box warnings/withdrawals, therapeutic mechanism/indications, molecule features
ChEMBL Data and Curation Pipeline
[Pipeline diagram; the recoverable content is listed below]
• Compound registration (molfiles, depictions, chemistry cartridge)
  • Structure standardisation: functional groups, drawing conventions
  • Salt removal: parent compounds linked to salt forms by the compound hierarchy, e.g. InChI=1S/C8H13NO2/c1-9-5-3-4-7(6-9)8… and InChI=1S/C8H13NO2.BrH/c1-9-5-3-4-7(6…
  • Property calculation, e.g. aLogP 0.74, HBA 2, HBD 1, MWt 155.2, Ro5 0, RTB 2, HAtoms 11, pKa 7.84, LogD -0.78, QED 0.55 …
  • Other example structures: InChI=1S/C14H9ClF2N2O2/c15-8-4-6-9…, InChI=1S/C15H12N2O3/c18-17(19)14-9…
• Activity curation: unit standardisation, outlier flagging
  • Example activities across Assays 1-3 and Compounds A-C: IC50 = 0.01 µM, EC50 = 1 µM, ED50 = 10 mg/kg
• Assay registration: assay description (free text), ontology annotation (BAO)
• Target assignment and registration: UniProt, EBI Complex Portal, GO terms, IUPHAR
  • Example: PDE5, HEK293 cells, Rattus norvegicus
• Cell Dictionary, Tissue Dictionary (collated from various existing ontologies), NCBI taxonomy
• Database tables: documents, source information, compound information, experimental data (activities, assays), molecule dictionary and hierarchy, target dictionary, compound and target properties
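The property panel on this slide reports Ro5 0 for a compound with aLogP 0.74, HBA 2, HBD 1 and MWt 155.2. A minimal sketch of how such a rule-of-five violation count can be derived is given here, assuming the standard Lipinski thresholds (MWt > 500, logP > 5, HBD > 5, HBA > 10). The function name `ro5_violations` is hypothetical and this is not ChEMBL's own property calculation code.

```python
def ro5_violations(mwt: float, alogp: float, hbd: int, hba: int) -> int:
    """Count Lipinski rule-of-five violations (assumed standard thresholds)."""
    checks = [
        mwt > 500,   # molecular weight above 500 Da
        alogp > 5,   # calculated logP above 5
        hbd > 5,     # more than 5 hydrogen-bond donors
        hba > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)  # True counts as 1, False as 0


# Values taken from the example property panel on the slide.
example = ro5_violations(mwt=155.2, alogp=0.74, hbd=1, hba=2)
print(example)  # 0
```

The count of 0 agrees with the "Ro5 0" value shown for the example compound.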
SureChEMBL Pipeline
• Fully automated pipeline
• Patent data feed (IFI Claims): WO, EP (applications and granted), US (applications and granted), JP (English abstracts)
• Chemical entity recognition
• Name to structure and OCR correction (5 methods)
• Image to structure (1 method)
• Chemical registration
• Complex work units
• 20 million unique compounds
• 80,000 new compounds per month
UniChem – Compound Mapping across Resources
• 156 million compounds
• 35 sources
• Access: Web Services, Web Interface, Downloads
Which Target and Compound?
• Marketed drugs: ~3,000
• Human drug targets: ~800
• Research compounds: ~1,800,000 (~1.8 million)
• Human targets with bioactive molecules: ~3,500
Which Target and Compound?
[Figure: targets versus compounds]
• Targets: human proteins (~20,000), other species, non-protein targets
• Compounds: approved drugs, bioactivity data, patents, UniChem (10^8), GDB17 (10^11)*, all (10^33)**
*JL Reymond, JCIM 2013; **PG Polishchuk, JCAMD 2013
Can we fill the gaps with Predictive Models?
• Complete the matrix with predicted values
• Can be used as:
  • A surrogate when experimental data is not available
  • A biological fingerprint
• Quantitative Structure-Activity Relationship (QSAR) models have been built on ~800 ChEMBL targets (~550 human)
• Aim to integrate these predictions with ChEMBL
[Figure: targets × compounds matrix. Key: measured data (active, inactive, not tested); predicted (active, inactive)]
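One way to picture "completing the matrix" is sketched here: measured labels are kept wherever they exist, predictions fill the untested cells, and one compound's row across a fixed list of targets serves as its biological fingerprint. All names and the toy data are hypothetical, and treating a cell with neither a measurement nor a prediction as 0 is a simplifying assumption, not something stated in the talk.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (compound, target)


def fill_matrix(measured: Dict[Key, Optional[bool]],
                predicted: Dict[Key, bool]) -> Dict[Key, Tuple[bool, str]]:
    """Prefer measured labels; fall back to predictions where not tested."""
    filled = {}
    for key in set(measured) | set(predicted):
        value = measured.get(key)  # None means "not tested"
        if value is not None:
            filled[key] = (value, "measured")
        elif key in predicted:
            filled[key] = (predicted[key], "predicted")
    return filled


def biological_fingerprint(filled: Dict[Key, Tuple[bool, str]],
                           compound: str, targets: List[str]) -> List[int]:
    """One matrix row as a 0/1 vector; cells with no data are encoded as 0."""
    return [int(filled.get((compound, t), (False, ""))[0]) for t in targets]


measured = {("cmpd_A", "T1"): True, ("cmpd_A", "T2"): None}
predicted = {("cmpd_A", "T2"): False, ("cmpd_A", "T3"): True}
filled = fill_matrix(measured, predicted)
print(biological_fingerprint(filled, "cmpd_A", ["T1", "T2", "T3"]))  # [1, 0, 1]
```

Keeping the "measured" or "predicted" tag next to each value lets downstream users tell experimental data from surrogates.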
QSAR Prediction Results
Model-building criteria:
• Single proteins or protein complexes
• pChEMBL values ('IC50', 'EC50', 'XC50', 'AC50', 'Ki', 'Kd', 'Potency')
• Data from >2 articles
• >= 40 unique active and 30 inactive compounds
• Descriptors - RDKit Morgan fingerprints and physicochemical
[Figures: Prediction by different methods; Prediction for different target classes]
Also comparing with Conformal Prediction
• Includes inconclusive classification (Norinder et al., J. Chem. Inf. Model. 2014)
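The selection criteria on this slide can be sketched as a simple filter. A pChEMBL value is the negative base-10 logarithm of a molar potency (IC50, Ki and so on), so 0.01 µM corresponds to 8. The activity cutoff that separates actives from inactives is not given in the talk; the `active_cutoff=6.0` default below is purely an illustrative assumption, as are the function names and the record layout.

```python
import math


def pchembl(value_um: float) -> float:
    """-log10 of a molar potency supplied in micromolar units."""
    return -math.log10(value_um * 1e-6)


def target_is_modelable(records, active_cutoff: float = 6.0) -> bool:
    """Apply the slide's criteria to one target's data.

    records: iterable of (compound_id, document_id, pchembl_value).
    active_cutoff is a hypothetical threshold, not taken from the talk.
    A compound measured on both sides of the cutoff is counted in both sets,
    which is a simplification.
    """
    records = list(records)
    documents = {doc for _, doc, _ in records}
    actives = {cmpd for cmpd, _, p in records if p >= active_cutoff}
    inactives = {cmpd for cmpd, _, p in records if p < active_cutoff}
    # Data from >2 articles, >= 40 unique actives and >= 30 unique inactives.
    return len(documents) > 2 and len(actives) >= 40 and len(inactives) >= 30


print(round(pchembl(0.01), 2))  # 8.0
```

A real pipeline would also restrict records to single-protein or protein-complex targets before applying this check.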
Looking for New Compounds and Chemotypes
• Marketed drugs
• ChEMBL (bioactive molecules)
• SureChEMBL (patented molecules)
• UniChem (synthesized molecules)
Expanding Chemical Space
Example: Suvorexant, an orexin receptor antagonist developed by Merck and approved by the FDA in 2014 for the treatment of insomnia.
ChEMBL:
• First mention 2010
• IC50/Ki binding to OX1R and OX2R receptors
• PK data in rat and dog
• Anti-insomnia data in animal models
• 36 compounds with >=80% similarity
SureChEMBL:
• First mention 2008
• Mentioned in 77 patents
• Mentions of Orexin in the same patents
• 133 compounds with >=80% similarity
Orexin 1 Example: 3-Ring Scaffolds
• ChEMBL: 90 scaffolds with bioactivity data for Orexin 1
• SureChEMBL: 465 scaffolds in patents mentioning Orexin 1
Can we go a Step Further?
• Suvorexant is in 19 of the UniChem indexed databases (including ChEMBL and SureChEMBL)
• Can we search for similar compounds in UniChem?
• Work in progress …
[Workflow diagram labels: UniChem InChIs, structures, standardized structures, searchable structures database, similarity/SSS search, compounds matching, UniChem lookup]
Compounds similar (>=80%) to Suvorexant: 36 in ChEMBL, 133 in SureChEMBL, 643 in UniChem
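The ">=80% similarity" filter can be illustrated with a small sketch. The talk does not state which similarity metric is used; the Tanimoto coefficient on fingerprint bit sets is assumed here because it is a common choice for this kind of search. The fingerprints below are toy sets of integers rather than real structure fingerprints, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similar_compounds(query: FrozenSet[int],
                      library: Dict[str, FrozenSet[int]],
                      threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


query_fp = frozenset({1, 2, 3, 4, 5})
toy_library = {
    "identical": frozenset({1, 2, 3, 4, 5}),      # similarity 5/5
    "analogue_1": frozenset({1, 2, 3, 4, 5, 6}),  # similarity 5/6
    "analogue_2": frozenset({1, 2, 3}),           # similarity 3/5
}
hits = similar_compounds(query_fp, toy_library)
print([name for name, _ in hits])  # ['identical', 'analogue_1']
```

In practice the fingerprints would come from a cheminformatics toolkit and the search would run against an indexed structure database rather than an in-memory dictionary.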
ChEMBL Users – Drug Discovery Use Cases
• J Med Chem 2018: “we also mine the open access databases ChEMBL and PDB for fragments showing PDE inhibitory activity, as well as SureChEMBL for recent PDE related patents, to provide a wider context for exploring fragment diversity.”
• PLOS ONE 2014: “application of the prediction model on external test set composed of more than 160 hH4R antagonists picked from the ChEMBL database gave enrichment factor of 16.4”
• J Med Chem 2017: “we analyzed structures of the D2/5-HT2A ligands available in the ChEMBL database (Ki < 10 nM for both receptors) and identified the arylamine moieties that could bind in the cavity between TMHs 4−6, which had been identified as a tolerant region for chemical expansion”
• J Med Chem 2013: “To validate the ICM VLS models, a small set of 100 known KOR ligands from the ChEMBL database were mixed with a decoy set of 900 compounds and docked into all three receptor models.”
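Two of these quotes rely on the enrichment factor, which compares the fraction of known actives in a selected subset with the fraction in the whole library. A minimal sketch of the usual definition follows. The library size mirrors the 100 ligands plus 900 decoys mentioned in the quote, but the selection outcome (40 actives in the top 50) is invented for illustration and does not reproduce any result from the cited papers.

```python
def enrichment_factor(actives_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """EF = (actives_selected / n_selected) / (actives_total / n_total)."""
    if n_selected <= 0 or actives_total <= 0:
        raise ValueError("n_selected and actives_total must be positive")
    # Rearranged to multiply integers first, which avoids rounding noise.
    return (actives_selected * n_total) / (n_selected * actives_total)


# Library: 100 known ligands mixed with 900 decoys (sizes from the quote).
# Hypothetical outcome: the 50 top-ranked compounds contain 40 known ligands.
ef = enrichment_factor(actives_selected=40, n_selected=50,
                       actives_total=100, n_total=1000)
print(ef)  # 8.0
```

For this library the largest possible value at a 50-compound cutoff is 10, reached when every selected compound is a known ligand.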
Does it get to the Site of Action and is it Safe?
ChEMBL ADME and toxicity data sources:
• Scientific Literature
• Prescribing Information
• Toxicity Studies
[Figure labels: side effects, therapeutic effect, no effect]
Does it get there?
• ChEMBL PK and PPB data
• Metabolic routes
• Other disposition relevant data:
  • Solubility
  • Permeability
  • Intrinsic clearance
How ChEMBL ADME Data is Being Used
Reasons for Compound Failure
Data from: “An analysis of the attrition of drug candidates from four major pharmaceutical companies”, M Waring et al, NRDD 2015
Is it Safe? - Challenges for Safety Prediction
• Lack of compound interaction data for many targets relevant to toxicity
• In-vivo toxicity
• Wide datasets - a different type of “big data”
• Multiple measured endpoints
• Time and concentration effects – “everything is toxic”
• Fewer compounds = lack of chemical diversity
• Much data is on well annotated marketed drugs
• Missing exposure and tox data on the same compounds
• Combining data in an easy to use format
Pre-competitive initiatives are helping
Biological Fingerprints to Predict Toxicity
[Diagram: experimental data and model predictions → biological fingerprints as descriptors → predict toxicity endpoint]
• Toxicol. Res., 2016: combined chemical descriptors, protein target descriptors and cytotoxicity data to predict organ toxicity
• Chem. Res. Toxicol. 2011: combined chemical descriptors and gene expression data to predict hepatotoxicity
Need more data, more curation and use of ontologies to link complex data
Learning from Existing Drugs - What Works?
ChEMBL Drug & Clinical Candidate Annotation
• Which target is responsible for the therapeutic effect of the drug?
• Which disease is the drug used to treat?
[Example shown: Brivaracetam]
[Figure: Reasons for Drug Withdrawal]
For the Future …
• 5 of the 10 top selling drugs are currently mAbs
• Should we be gathering bioactivity data on biologics?
[Figure: Types of Drugs and Clinical Candidates]
Summary
Target Identification → Hit Identification → Lead Optimisation → Pre-Clinical → Clinical
• Target Identification (What to work on?)
  • Target to Disease links - Open Targets
• Hit Identification/Lead Optimisation (Which compound?)
  • Bioactivity Data - ChEMBL, SureChEMBL, UniChem
• Pre-clinical data (Is it safe, does it get there?)
  • In-vivo exposure and toxicity data - ChEMBL
• Clinical Candidates & Approved Drugs (What works?)
  • Efficacy target, disease annotation - ChEMBL
Acknowledgements
EMBL-EBI Chemogenomics Team
• Andrew Leach • Eloy Felix • Anne Hersey • Juan Mosquera • Anna Gaulton • Francis Atkinson • Jon Chambers • Nicolas Bosc • Patricia Bento • Fiona Hunter • Prudence Mutowo • Chris Radoux • Paula Magarinos • Marleen de Veij • David Mendez • Aldo Segura