Virtual Screening of C. Sativa Constituents for the Identification of Selective Ligands for Cannabinoid Receptor 2 - MDPI
International Journal of Molecular Sciences
Article
Virtual Screening of C. Sativa Constituents for the Identification of Selective Ligands for Cannabinoid Receptor 2
Mikołaj Mizera 1, Dorota Latek 2 and Judyta Cielecka-Piontek 1,*
1 Department of Pharmacognosy, Poznan University of Medical Sciences, 60-781 Poznań, Poland; mikolajmizera@gmail.com
2 Faculty of Chemistry, University of Warsaw, 02-093 Warsaw, Poland; dlatek@chem.uw.edu.pl
* Correspondence: jpiontek@ump.edu.pl; Tel.: +48-854-67-10
Received: 28 April 2020; Accepted: 24 July 2020; Published: 26 July 2020
Abstract: The selective targeting of the cannabinoid receptor 2 (CB2) is crucial for the development of peripheral system-acting cannabinoid analgesics. This work aimed at computer-assisted identification of prospective CB2-selective compounds among the constituents of Cannabis Sativa. The molecular structures and corresponding binding affinities to CB1 and CB2 receptors were collected from ChEMBL. The molecular structures of Cannabis Sativa constituents were collected from a phytochemical database. The collected records were curated and applied for the development of quantitative structure-activity relationship (QSAR) models with a machine learning approach. The validated models predicted the affinities of Cannabis Sativa constituents. Four structures of CB2 were acquired from the Protein Data Bank (PDB) and the discriminatory ability of CB2-selective ligands and two sets of decoys were tested. We succeeded in developing the QSAR model by achieving Q2 5-CV > 0.62. The QSAR models helped to identify three prospective CB2-selective molecules that are dissimilar to already tested compounds. In a complementary structure-based virtual screening study that used available PDB structures of CB2, the agonist-bound, Cryogenic Electron Microscopy structure of CB2 showed the best statistical performance in discriminating between CB2-active and non-active ligands.
The same structure also performed best in discriminating CB2-selective ligands from non-selective ligands.
Keywords: QSAR; endocannabinoid system; Cannabis Sativa
Int. J. Mol. Sci. 2020, 21, 5308; doi:10.3390/ijms21155308 www.mdpi.com/journal/ijms
1. Introduction
The medical effect of Cannabis sp. bioactive ingredients is the subject of extensive research [1–3]. It has been discovered that Cannabis sp. contains over 500 various compounds, with the cannabinoid group itself having about 110 molecules [4]. Phytocannabinoids present in Cannabis sp. have slight differences in their chemical structures (types: cannabidiol, cannabichromene, cannabitriol, cannabicyclol, cannabinodiol, cannabinol, cannabielsoin, cannabigerol, ∆9-tetrahydrocannabinol, ∆8-tetrahydrocannabinol) and are not selective. The most well-known phytocannabinoids are THC (∆9-tetrahydrocannabinol) and CBD (cannabidiol). Research to date reports that both THC and CBD have an affinity for various types of endocannabinoid system receptors [5–7]. For example, cannabidiol is a non-competitive CB1 antagonist, CB2 inverse agonist, GPR55 and GPR18 antagonist, peroxisome proliferator-activated receptor (PPAR-γ) agonist, α1 and α3 glycine receptor agonist, TRPM8 antagonist, and an agonist and antagonist of various types of serotonin receptors such as 5-HT1A [8–11]. The pharmacological effects of the interaction of active pharmaceutical ingredients (APIs) with CB1 and CB2 receptors are the most investigated. The differences in the density of CB1 and CB2 receptor distribution also determine the activities within the central nervous system [12]. The pharmacological effect of cannabinoids results from modification of the signaling pathway of two G protein-coupled receptors (GPCRs): cannabinoid receptors 1 and 2 (CB1 and CB2). Activation of the CB1 receptor, which is expressed mainly in the central nervous system, is responsible for the psychotropic action of cannabinoids, while CB2 receptors are found mainly in the immune system [13]. This distinct distribution of CB2 receptors in human tissues suggests that selective cannabinoids are promising APIs with anti-inflammatory, analgesic, and anti-neuroinflammatory actions [14].
The discovery of new selective cannabinoids can be streamlined with the application of in silico methods such as structure-based virtual screening (VS) [15–17] or ligand-based (quantitative structure-activity relationship (QSAR)) VS [18]. Machine learning-based QSAR models can be successfully applied in pharmaceutical research related to drug discovery [19], drug formulation [20], and pharmaceutical analysis [21,22]. The applicability of QSAR modeling to predict the selectivity of CB receptors has also been reported in the literature. The CB2-selective activity of 29 benzimidazole and benzothiophene derivatives was investigated with comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) 3D-QSAR models [23] and showed an external predictive performance of R2 > 0.9. In a recent 4D-QSAR study involving molecular dynamics (MD), the modeling was performed on 29 structurally similar CB2 receptor inverse agonists [24]. The 4D-QSAR approach based on partial least squares (PLS) and multiple linear regression (MLR) resulted in Q2 = 0.719 and Q2 = 0.761 for the PLS and MLR models, respectively. The selectivity of 29 arylpyrazole derivatives was investigated with the application of 3D-QSAR/CoMFA analyses [25].
The QSAR model helped to identify the causes of CB1 selectivity by producing contour maps of affinity for both CB receptor subtypes. Nevertheless, the narrow applicability of these models, which results from the similar scaffolds used as training examples, may be a significant limitation when they are used in the prediction of novel chemotypes. In contrast, Floresta et al. conducted a 3D-QSAR study on a diverse dataset containing 312 molecules with reported experimental CB1 affinities and 187 molecules with reported CB2 affinities [26]. These models showed a predictive performance of Q2 = 0.62 and Q2 = 0.72 for the CB1 and CB2 QSAR models, respectively. The diversity of molecular structures in the training set also allowed Floresta et al. to perform VS for novel chemical scaffolds.
The idea of applying QSAR to the identification of bioactive compounds in a plant extract was also used by Labib et al. [27]. This VS study aimed to predict the activities at CB1 and opioid receptors of compounds isolated from Pinus roxburghii bark extract. The model of Labib et al. explained the synergistic anti-inflammatory action of Pinus roxburghii and provided information on the activity of several bioactive molecules identified in the extract.
Our study aimed to predict CB2-selectivity for molecules identified in Cannabis Sativa by using a validated QSAR model. This goal was achieved by the execution of several steps: the collection of diverse ligand structures with experimental data describing CB1 and CB2 affinity; data curation; conducting a QSAR study for CB1 and CB2; and the prediction of the CB1 and CB2 affinity of phytochemicals in Cannabis Sativa.
So far, the lack of crystal structures of cannabinoid receptors has been a major obstacle in searching for new, selective cannabinoids. However, there have been attempts to predict binding modes of well-known actives of CB1 and CB2 using homology models of these receptors and molecular dynamics [17,28,29].
Recent advances in structural studies of cannabinoid receptors, which yielded the structures 6KPC [30], 6KPF [30], 6PT0 [31], and 5ZTY [32], provided new data that supplemented drug discovery studies [30]. In this study, we tested the applicability of available experimental structures of CB2, solved in both active and inactive conformations, in structure-based VS to search for novel or more CB2-selective ligands.
2. Results
2.1. Study Design
Data on the experimentally evaluated affinity of diverse compounds to CB1 and CB2 receptors and data on structures identified in Cannabis Sativa with unknown CB1/CB2 affinity were collected from publicly accessible online resources. Next, the data with known affinity values were curated and subsequently used for the development and validation of CB1 and CB2 QSAR models. The validated models were used for the prediction of the CB1 and CB2 affinity of Cannabis Sativa ingredients and the prediction of prospective CB2-selective Cannabis Sativa ingredients. The dataset collected for the QSAR study was used to evaluate the statistical characteristics of the discriminatory ability of CB1 and CB2 crystal structures in the docking study.
2.2. Data Curation
Datasets of ligands with experimental affinities for CB1 and CB2 were joined with corresponding records of assay metadata from ChEMBL. The resulting dataset was composed of six columns containing: the structure of the molecule in simplified molecular-input line-entry system (SMILES) format, the standard type of the reported value (e.g., Ki, IC50), the reported value, the mathematical relation of the reported value to the experimentally measured value, the ID number of the source document with the reported study, and the confidence score.
The initial dataset described above was subjected to the removal of potentially unreliable data (Figure 1). The initial dataset included 14,126 records for CB1 and 13,506 records for CB2 (Figure 1.1). Records without reported SMILES, which included metal complexes and polymers, were excluded from the dataset (Figure 1.2). For the remaining records, the respective molecular structures were standardized and 2D coordinates were generated. We assumed that assays carried out either on the target protein or on a homologous protein were reliable. Records meeting these criteria were annotated in ChEMBL with a confidence score of 9 and 8, respectively (Figure 1.3). Only activities measured in large and consistent assays were picked from the dataset (Figure 1.4). That is, an assay was considered suitable for selection if it contained a consistent group of 10 or more molecules, all with the affinity measured. These large assays were identified based on the document ID reported in ChEMBL. In the next step, records with a reported affinity in any standard value type related to Ki were kept (Figure 1.5) and subsequently converted to pKi. Duplicates analysis (Figure 1.6) was conducted by comparing the InChIKeys of the records. Duplicates were merged if the standard deviation of duplicate measurements was lower than 10% of the entire range of measurements. As a result, only one record was kept, with the pKi value averaged. Mordred descriptors and Morgan fingerprints were computed for each compound in the dataset and then concatenated to create a feature vector. Records with duplicate feature vectors were removed (Figure 1.7). These included, e.g., compounds differing only by chiral hydrogen atoms and those that could not be differentiated by the descriptors and fingerprints used.
Figure 1. Data records that passed through the subsequent data curation steps: (1) data acquisition, (2) removing records with no SMILES included, (3) removing records with Confidence Score < 8, (4) keeping only records from large assays, (5) keeping only records with included Ki values, (6) merging duplicate records, and (7) removing records with duplicate feature vectors.
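The pKi conversion and the duplicate-merging rule of Section 2.2 can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, Ki values are assumed to be given in nanomolar, the population standard deviation is assumed, and duplicates that fail the 10% criterion are assumed to be discarded, which the text does not state.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ki_nm_to_pki(ki_nm):
    """Convert Ki in nanomolar (assumed unit) to pKi = -log10(Ki in mol/L)."""
    return 9.0 - math.log10(ki_nm)


def merge_duplicates(records, max_rel_sd=0.10):
    """Merge records that share an InChIKey.

    records: list of (inchikey, pKi) pairs.
    A group of duplicates is replaced by its mean pKi when its standard
    deviation is below max_rel_sd times the pKi range of the whole dataset.
    Groups that fail this test are dropped (an assumption of this sketch).
    """
    all_values = [pki for _, pki in records]
    full_range = max(all_values) - min(all_values)

    groups = defaultdict(list)
    for key, pki in records:
        groups[key].append(pki)

    merged = {}
    for key, values in groups.items():
        if len(values) == 1 or pstdev(values) < max_rel_sd * full_range:
            merged[key] = mean(values)
    return merged


if __name__ == "__main__":
    toy = [("A", 7.0), ("A", 7.2), ("B", 5.0), ("C", 9.0), ("C", 6.0)]
    print(merge_duplicates(toy))
```

In the toy input, the two measurements for "A" agree closely and are averaged, while the two measurements for "C" differ by three pKi units and are rejected.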
2.3. Model Description
The independent CB1 and CB2 QSAR models were created according to the architecture presented in Figure 2. The input feature vectors for the machine learning algorithm were molecular descriptors and fingerprints of compounds from the training set (Figure 2.1). The feature vectors along with experimental CB1 and CB2 pKi values were used as a training set for the CB1 and CB2 QSAR models (Figure 2.2).
Figure 2. Machine learning-based quantitative structure-activity relationship (QSAR) model.
The value predicted in each leaf of decision trees in a gradient boosting (GB) ensemble was used for the embedding of a descriptor space (Figure 2.3). The embedded representation of the descriptor space was used to create a separate k-nearest neighbor (kNN) model for each receptor (Figure 2.4). The kNN algorithm was modified to predict an average pKi value of a compound only if all nearest neighbors of the query molecular structure were within a given distance. This maximal distance was used as an applicability domain threshold related to the confidence of prediction (Figure 2.5). Different thresholds were tested to assess their influence on the statistical characteristics of kNN models. We selected 20 thresholds as percentiles 5 to 100 with a step equal to 5% of the distribution of maximal Euclidean distances within kNN clusters obtained for the training set.
2.4. Model Validation
In Figure 3, we present the dependence of cross-validated Q2 on the applicability domain threshold. The predictive performance showed consistent behavior, declining with an increased threshold. The best statistical characteristics, Q2 > 0.8 for the CB1 and CB2 models, were observed for the lowest thresholds ([…] 0.6, as suggested in studies of QSAR best practices [33].
An autocorrelation plot (Figure 4) shows good agreement between out-of-sample predictions and experimental values for the majority of compounds, and importantly, for the CB2 selectivity ranges of pKi (pKi < 6 for CB1 and pKi > 6.5 for CB2). The majority of data points presented in Figure 4 are distributed between 6 and 8. Because machine learning techniques used in this study are interpolating techniques, we expected our model to overestimate pKi predictions at lower values and underestimate predictions at large ones.
Figure 3. Dependence of Q2 on the applicability domain threshold for CB1 and CB2 k-nearest neighbor (kNN) models. Each point on the curves represents a Q2 calculated for external predictions for a cumulative fraction of molecular structures in the training dataset within a step of 5%.
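The distance-limited kNN prediction and the Q2 statistic described in Sections 2.3 and 2.4 can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: points are plain coordinate tuples rather than the gradient boosting leaf embedding, the function names are hypothetical, Q2 is computed against the mean of the evaluated experimental values, each training point is excluded from its own neighbor list, and percentiles use the nearest-rank rule.

```python
import math


def q2(y_true, y_pred):
    """Q2 = 1 - PRESS / TSS, with TSS taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - press / tss


def knn_predict(train_x, train_y, query, k, max_dist):
    """Mean pKi of the k nearest neighbours of `query`.

    Returns None when any of the k neighbours lies farther than max_dist,
    i.e. when the query falls outside the applicability domain.
    """
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    if nearest[-1][0] > max_dist:
        return None
    return sum(y for _, y in nearest) / k


def ad_thresholds(train_x, k, percentiles=range(5, 101, 5)):
    """Candidate thresholds: percentiles of the largest neighbour distance
    found in each training point's k-neighbourhood (self excluded)."""
    max_dists = []
    for i, x in enumerate(train_x):
        d = sorted(math.dist(x, o) for j, o in enumerate(train_x) if j != i)
        max_dists.append(d[k - 1])
    max_dists.sort()
    n = len(max_dists)
    # Nearest-rank percentile: the smallest value covering p% of the sample.
    return [max_dists[max(0, math.ceil(p / 100 * n) - 1)] for p in percentiles]


if __name__ == "__main__":
    xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    ys = [6.0, 6.4, 6.2, 8.0]
    print(knn_predict(xs, ys, (0.2, 0.2), k=3, max_dist=1.5))  # inside domain
    print(knn_predict(xs, ys, (3.0, 3.0), k=3, max_dist=1.5))  # outside domain
    print(ad_thresholds(xs, k=2))
```

Evaluating `q2` only on the predictions that are not None, once per threshold returned by `ad_thresholds`, would give a curve of the kind summarized in Figure 3.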
Figure 4. Autocorrelation of experimental and predicted pKi values for the CB1 model (blue) and the CB2 model (orange).
2.5. Virtual Screening of Cannabis Sativa Phytochemicals
Sixty-eight Cannabis Sativa phytochemicals with a molecular weight between 250 Da and 500 Da were subjected to VS using our validated CB1 and CB2 QSAR models. On average, the selectivity of compounds from Cannabis Sativa was lower than that of the ChEMBL-derived training set. Still, we observed a relatively high correlation between the predicted CB1 and CB2 activity of Cannabis Sativa phytochemicals (see Figure 5) in comparison to the ChEMBL training set. The average absolute dpKi between CB2 and CB1 was 0.69 and 1.26 for molecules in Cannabis Sativa and molecules in the training set, respectively.
Figure 5. The correlation between affinity against CB1 and CB2 for compounds in Cannabis Sativa and compounds reported in ChEMBL.
Despite relatively low average predicted selectivity of compounds in Cannabis Sativa, structures C1–C3 showing dpKi > 1 and a Tanimoto coefficient (TC) < 0.5 were identified (Table 1). Low TC values indicated a significant structural difference in the identified molecular structures in Cannabis Sativa compared to the respective most similar molecular structures in the training set.
Table 1. Compounds from Cannabis Sativa that were predicted as CB2-selective and are also dissimilar to the ChEMBL training set.
Compound | Predicted CB1 pKi | Predicted CB2 pKi | Most similar ChEMBL molecule | Its CB1 pKi | Its CB2 pKi | Tanimoto coefficient
C1 | 5.46 | 6.87 | CHEMBL1201151 | 4.64 | N/A | 0.28
C2 | 5.57 | 6.59 | CHEMBL256753 | 6.95 | 8.03 | 0.18
C3 | 6.41 | 7.29 | […] | 7.33 | 6.04 | 0.41
C1: (4-hydroxy-6-methoxyspiro[1,2-dihydroindene-3,4’-cyclohexane]-1’-yl) acetate
C2: 2-[(2R,4aR,8R,8aR)-8-hydroxy-4a,8-dimethyl-1,2,3,4,5,6,7,8a-octahydronaphthalen-2-yl]prop-2-enoic acid
C3: 6-carboxy-2-methyl-2-(4-methylpent-3-enyl)-7-pentylchromen-5-olate
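The two selection criteria used for Table 1, a predicted selectivity of dpKi > 1 and a Tanimoto coefficient below 0.5 to the most similar training molecule, can be sketched as below. The sketch rests on assumptions: fingerprints are represented as sets of on-bit indices, dpKi is taken as predicted pKi(CB2) minus predicted pKi(CB1), and the function names and toy values are hypothetical rather than taken from the study.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_candidates(compounds, library_fps, min_dpki=1.0, max_tc=0.5):
    """Keep compounds predicted as CB2-selective and dissimilar to the library.

    compounds: list of (name, predicted CB1 pKi, predicted CB2 pKi, fingerprint).
    library_fps: fingerprints of the training-set molecules.
    dpKi is assumed to be pKi(CB2) - pKi(CB1).
    """
    hits = []
    for name, pki_cb1, pki_cb2, fp in compounds:
        dpki = pki_cb2 - pki_cb1
        # Similarity to the single most similar training molecule.
        best_tc = max(tanimoto(fp, ref) for ref in library_fps)
        if dpki > min_dpki and best_tc < max_tc:
            hits.append((name, round(dpki, 2), round(best_tc, 2)))
    return hits


if __name__ == "__main__":
    library = [{1, 2, 3, 4}, {10, 11, 12}]
    toy = [
        ("X1", 5.4, 6.9, {1, 2, 20, 21, 22}),  # selective and dissimilar
        ("X2", 6.0, 6.3, {30, 31}),            # dissimilar but not selective
        ("X3", 5.0, 7.0, {1, 2, 3, 4, 5}),     # selective but too similar
    ]
    print(select_candidates(toy, library))
```

Only the first toy compound passes both filters, which mirrors the intent of the screening step: a hit must be predicted as selective and must also represent a chemotype that differs from the training data.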
CB2 CB2 Structure-Based CB2 CB2 Structure-Based Structure-Based Virtual Structure-Based Structure-Based Screening Virtual Virtual Virtual Virtual Screening Screening Screening Screening Results Results Results Results Results WeWe We performed VS using four PDB structures of the CB2 receptor (PDB id: 6KPF, 6KPC, 6PT0, Weperformed performed VS using VS usingfour PDB four PDBstructures structures of the ofofCB2CB2 the CB2receptor receptor (PDB (PDB id: id:6KPF, id: 6KPF, 6KPF, 6KPC, 6KPC, 6PT0, 6PT0, 6PT0, We Weperformed We performed performed performed VS VSVSusing VSusing usingusing four four four four PDB PDB PDB PDB structures structures structures structures of the of of the the CB2the CB2 CB2 receptor receptor receptor receptor (PDB (PDB (PDB id:(PDB id: 6KPF, 6KPF, id: 6KPC, 6KPC,6KPF, 6KPC, 6PT0, 6PT0, 6KPC, 6PT0, andand 5ZTY) and and 5ZTY) 5ZTY) 5ZTY) and and two and and two two two compounds’ compounds’ compounds’ compounds’ libraries. libraries. libraries. libraries. The The The library librarylibrary for for the for thefirst the first compound compound first compound was wasderived was derived derived from from fromthethe the from and5ZTY) and and 5ZTY) 5ZTY) and and and two two two compounds’ compounds’ compounds’ libraries. TheThe libraries. libraries. 
The library The librarylibrary for the library for for first the the first for first the compound compound compound first waswas compound was derived derived derived from was from from the the derived the curated, curated, curated, ChEMBL ChEMBLChEMBL training training training dataset dataset dataset prepared prepared prepared for for our for our QSAR our QSAR QSAR modelsmodels models and and andit itit included included included CB2-selective CB2-selective CB2-selective curated, curated, curated, the curated, ChEMBL ChEMBL ChEMBL ChEMBL trainingtraining training dataset dataset training dataset prepared prepared prepared dataset for prepared for for ourourfor our QSAR QSAR QSAR our models models QSAR models andand and models it it it included included and included it6KPF CB2-selective CB2-selective CB2-selective included CB2-selective ligands ligands ligands expanded expanded expanded with with with CB2-non-selective CB2-non-selective CB2-non-selective ligands. ligands. ligands. ForFor thisthis For compound compound this compound library, library, library, 6KPF 6KPF andandand5ZTY 5ZTY 5ZTY ligands ligands ligands expanded expanded expanded with with with CB2-non-selective CB2-non-selective CB2-non-selective ligands. ligands. ligands. For For For this this this compound compound compound library, library, library, 6KPF 6KPF 6KPF and and and5ZTY 5ZTY 5ZTY ligands structures expanded structures achieved achieved with the the CB2-non-selective highest highest area area underunder the ligands. 
the receiver receiver For this operating operating compound characteristic characteristic library, curve curve (ROC 6KPF (ROC AUC) AUC)and 5ZTY structures structures structures structures achieved achieved achieved achieved the the the the highest highesthighest highest area area areaarea under under under underthe the the the receiver receiver receiver receiver operating operating operating operating characteristic characteristic characteristic characteristic curve curve curve curve(ROC (ROC (ROC (ROC AUC) AUC) AUC) AUC) structures value value valuein the achieved in the discrimination in discrimination the the discrimination discrimination highest between between area between under CB2-selective CB2-selective CB2-selectivethe receiver ligandsligands ligands versusoperating versus versus characteristic CB2-non-selective CB2-non-selective CB2-non-selective ones onescurve(see (see(see ones Table(ROC Table AUC) Table value value value in the in in the the discrimination discrimination between between between CB2-selective CB2-selective CB2-selective ligands ligands ligands versus versus versus CB2-non-selective CB2-non-selective CB2-non-selective onesones ones (see(see (see Table Table Table value 2). 2). 2). The 2). The inThe the The antagonist-bound, discrimination antagonist-bound, antagonist-bound, 5ZTY5ZTY between 5ZTY structure structure structure slightly CB2-selective slightly slightly outperformed ligands versus outperformed outperformed 6KPF 6KPF 6KPF (agonist-bound), CB2-non-selective (agonist-bound), (agonist-bound), as ones observed as observed (see Table 2). as observed observed 2). The 2). 
The antagonist-bound, antagonist-bound, 5ZTY 5ZTY structure structure slightly slightly outperformed outperformed 6KPF 6KPF (agonist-bound), (agonist-bound), as as observed observed The inin aaantagonist-bound, detailed in antagonist-bound, a detailed detailed analysis analysis analysis of 5ZTY 5ZTY of ROC of ROC structure curves curves structure ROC curves slightly obtained obtained slightly obtained outperformed in in our outperformed in ourenrichment our 6KPF enrichment enrichment 6KPF (agonist-bound), study study study (Figure (Figure (agonist-bound), (Figure 6). as 6). The 6). The second second as The observed second in a in aaindetailed in a detailed detailed analysis analysis analysis of ROC of of ROC ROC curves curves curves obtained obtained obtained in our in in our our enrichment enrichment enrichment study study study (Figure (Figure (Figure 6). The 6). 6). The The second second second compound’s compound’s compound’s library library library used used usedin in our in our our structure-based structure-based structure-based VS VS included included VS included the the same the same same CB2-selective CB2-selective CB2-selective ligands ligandsligands derived derived derived detailed compound’s compound’s analysis compound’s library library of libraryROC usedused used curves in our in in our our obtained structure-based structure-based structure-based in our enrichment VS included VS included VS included the same the study the same same (Figure CB2-selective CB2-selective CB2-selective 6). 
The ligands ligands second ligands derived derived compound’s derived fromfrom thethe from the ChEMBL ChEMBLChEMBL training training training dataset, dataset, dataset, named named named here herewith here with withthethe term the term term “actives”, “actives”, “actives”, andand and general generalgeneral diverse diverse diverse decoys decoysdecoys from library from from the used the the ChEMBL ChEMBL ChEMBL in our training training training dataset, dataset, structure-based dataset, named named named VS here here included here withwith with the the the the term term term same “actives”, “actives”, CB2-selective “actives”, andand and general general general ligandsdiverse diverse diversedecoys derived decoys decoys from the thatthatwere that were were generated generated generated with with DUD-E with DUD-E DUD-E [34],[34], named [34], named named here hereas here as “non-actives”. “non-actives”. as “non-actives”. In In VSIn VS against against VS against this this second this second second library, library, library, thatthat ChEMBL that were were were generated generated training generated withwith dataset, with DUD-E DUD-E DUD-E named [34], [34], [34],here named named named with here here here the asterm as as “non-actives”. “non-actives”. “actives”, “non-actives”. 
Inand In VS VS Inagainst VS against against general this this thisdiverse second second second library, library, decoys library, that were 6KPF6KPF 6KPFandand 5ZTY and 5ZTY 5ZTY structures structures structures performed performed performed similarly, similarly, similarly, discriminating discriminating discriminating CB2-actives CB2-actives CB2-actives from from from non-actives non-actives non-actives at at theat the the 6KPF 6KPF 6KPFand and and 5ZTY 5ZTY 5ZTY structures structures structures performed performed performed similarly, similarly, similarly, discriminating discriminating discriminating CB2-actives CB2-actives CB2-actives from from from non-actives non-actives non-actives at at the theat the generated level ofwith 0.8 andandDUD-E 0.79, [34], named respectively here (see(see as Table 2, “non-actives”. 2, In 6KPF VS6KPF against this second library, 6KPF and level levellevel level of 0.8 level of 0.8 of 0.8ofand of 0.8 0.8 and 0.79, and and 0.79, respectively 0.79, 0.79,0.79, respectively respectively respectively respectively (see(see (see Table (see Table Table Table Table 2, ROC 2, ROC2,ROC ROC 2, ROC ROC AUC AUC AUC AUC AUC AUC values). values). values). values). values). values). 
6KPF 6KPF 6KPF 6KPF slightly slightly slightly slightly slightly slightly outperformed outperformed outperformed outperformed outperformed outperformed 5ZTY 5ZTY 5ZTY 5ZTY 5ZTY 5ZTY 5ZTY structures according according according to to the performed the ROC toROC ROC the ROCROC AUC AUC AUC similarly, characteristics characteristics characteristics discriminating and and6KPC and 6KPC 6KPC performed CB2-actives performed performed the the worst, from worst, the worst, worst, however, non-actives however, however, the the valuesat values the values values the were level were were of 0.8 according according according to the to to the the ROC AUC AUC AUC characteristics characteristics characteristics andand and 6KPC 6KPC 6KPC performed performed performed the the worst, the worst, however, however, however, the the values the values were were were andveryvery 0.79,close very closewith close with respectively no with no explicit (see explicit noexplicit explicit difference Table difference 2, difference ROC between between AUC between activeactive values). activeand and inactive 6KPF inactive and inactive CB2 CB2 slightly structures. outperformed structures. CB2 structures. Our Our Our results results 5ZTY results showed showed according showed to veryvery very close close close withwith with no no no explicit explicit difference difference difference between betweenbetween active active active andand and inactive inactive inactive CB2CB2 CB2 structures. structures. structures. 
OurOur Our results resultsresults showed showed showed thethatthat ROC even even AUC slight slight differences differences characteristics between between andthe the the 6KPC crystal crystal structures structures performed of the of the the same worst, same receptor receptor however, have have the an an impactimpact on on the the that that that that eveneven even even slight slight slight slight differences differences differences differences between betweenbetween between the the crystal the crystal crystal crystal structures structures structures structures of of the the of of the same the same same same receptor receptorreceptor receptor have have anvalues have have an an impact an impact impact impactwere on on onvery on the the the close the VSVS with no VS VS results. results. explicit results. results. A A similar AA similar difference similar similar conclusion conclusionbetween conclusion conclusion waswaswas was derived derived activederived derived from and from from from aa recent a recent inactive a recent recent study CB2 study on on structures. study study on on glucagon glucagon glucagon glucagon Our receptors, receptors, results receptors, receptors, wherewhere showed where where an an that an even VS results. VS results. 
ensemble A similar A similar conclusion conclusion was derived was derived from a from a recent recent study study on glucagon onsimulations glucagon receptors, receptors, where an an an where crystal ensemble ensemble ensemble of ofof receptor receptor of receptor receptor conformations conformations conformations conformations generated generated generated generated in in short in in short short short MDMD MD MD simulations simulations simulations outperformed outperformed outperformed outperformed crystal crystal crystal slight ensemble ensembledifferences of between receptor of receptor the conformations crystal structures generated in of the short same conformations generated in short MD simulations outperformed crystalMD receptor simulations have an outperformed impact on the crystal VS results. structures structures structures in VSin in VS [35,36]. [35,36]. VS [35,36]. structures structures Astructures in similar conclusion VSin VS [35,36]. [35,36]. in VS [35,36].was derived from a recent study on glucagon receptors, where an ensemble of receptor conformations generated in short MD simulations outperformed crystal structures in VS [35,36].
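To make the two enrichment metrics reported in Table 2 concrete, the following is a minimal Python sketch of how a ROC AUC and an enrichment factor (EF) can be computed from a ranked screening list. It is not the Maestro implementation used in this work; the function names, the toy scores, and the convention that higher scores rank first are assumptions made for illustration (docking scores for which lower is better would need to be negated first).

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    scores: higher means "more likely active" (assumed convention).
    labels: 1 for an active, 0 for a decoy.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    hits_top = sum(y for _, y in ranked[:n_top])
    hits_all = sum(labels)
    return (hits_top / n_top) / (hits_all / len(ranked))


# Hypothetical toy screen: 4 actives among 10 compounds.
scores = [9.1, 8.7, 8.5, 7.9, 7.2, 6.8, 6.5, 6.1, 5.4, 5.0]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(round(roc_auc(scores, labels), 3))                 # 0.833
print(round(enrichment_factor(scores, labels, 0.2), 2))  # 2.5
```

An EF of 0, as seen for several structures in Table 2, simply means that no active ligand was ranked within the chosen top fraction of the library.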
Int. J. Mol. Sci. 2020, 21, 5308 7 of 14

Table 2. Results of the enrichment study for CB2-receptor structures.

            CB2-Non-Selective Decoys      General Decoys
Metric      6KPF   6KPC   6PT0   5ZTY     6KPF   6KPC   6PT0   5ZTY
EF 2%       0      0      0      1.7      3.3    0      0      3.3
EF 5%       0.67   0      0      1.3      4.7    1.3    1.3    3.3
EF 10%      1.7    0      0.33   1        4.0    2      1.7    3
ROC AUC     0.6    0.53   0.53   0.6      0.8    0.73   0.77   0.79

Figure 6. The receiver operating characteristic (ROC) curves obtained in the enrichment study using libraries including (A) CB2-selective and CB2-non-selective ligands and (B) CB2-actives and CB2-non-active ligands.

3. Discussion

The main goal of this study was to predict CB2-selective compounds among the phytochemicals that constitute Cannabis Sativa. We achieved this goal through the following steps: collecting a large set of experimental data from ChEMBL for the training set, data curation, model development and validation, and predicting the selectivity of phytochemicals from Cannabis Sativa. In parallel, we tested the applicability of recently released PDB structures of CB2 in structure-based VS by conducting an enrichment study. The dataset collected by our group was curated, which resulted in the removal of more than 90% of unreliable and inconsistent data. Despite the removal of this large fraction of data, the final dataset included 1958 records for CB1 and 2616 records for CB2. Notably, such a large dataset is sufficient for employing the machine learning approach. Much smaller datasets have been successfully used to develop QSAR models for CB1/CB2 studies [23,24]. Nevertheless, focusing on the dataset diversity in developing our QSAR model also created a challenge for predicting the activity of outlier compounds. We solved this problem by determining an applicability domain for each compound individually. Embedding the descriptor space of base GB models made it possible to assess the prediction confidence based on Euclidean distances of k-nearest neighbors. The GB algorithm was also successfully applied by Ancuceanu et al. [37] for cytotoxicity prediction.
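The distance-based applicability domain described above can be illustrated with a short Python sketch. It assumes that each molecule has already been mapped to a numeric embedding vector (here, hypothetical 2D points instead of the tree-output embedding of the GB models) and that the distances to the k nearest training neighbors are averaged; the aggregation rule, the function names, and the threshold value are illustrative assumptions rather than details taken from this work.

```python
import math


def mean_knn_distance(query, reference, k=3):
    """Mean Euclidean distance from `query` to its k nearest reference points."""
    if len(reference) < k:
        raise ValueError("reference set must contain at least k points")
    distances = sorted(math.dist(query, point) for point in reference)
    return sum(distances[:k]) / k


def in_applicability_domain(query, reference, threshold, k=3):
    """A query is inside the domain if its mean kNN distance does not exceed
    the threshold (hypothetical decision rule)."""
    return mean_knn_distance(query, reference, k) <= threshold


# Hypothetical embedded training set and two query molecules.
training_embedding = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
near_query = (0.5, 0.5)  # surrounded by training points
far_query = (5.0, 5.0)   # far from every training point

print(in_applicability_domain(near_query, training_embedding, threshold=1.0))  # True
print(in_applicability_domain(far_query, training_embedding, threshold=1.0))   # False
```

Tightening the threshold keeps only queries that sit close to the training data, which trades coverage for more reliable predictions.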
The performance of GB in the modeling of various QSARs was compared to other ensemble methods in a study by Kwon et al. [38]. Our approach was validated with a five-fold cross-validation protocol, resulting in Q2 > 0.6 for the entire dataset and Q2 > 0.8 for the most confident predictions. For small subsets with the highest-confidence predictions, our model outperformed much more complex 3D and 4D QSAR models created for equally small sets of similar compounds [23,24]. We used our validated QSAR models to predict the CB1 and CB2 affinities of phytochemicals from Cannabis Sativa. As a result, we identified three compounds, C1–C3 (see Table 1), that were dissimilar to the training dataset, with a TC < 0.3 compared to the respective most similar structure, and that had a predicted pKi difference between CB2 and CB1 (dpKi) ≥ 1. The common name of C1 is cannabispirol, a compound isolated from Japanese domestic Cannabis Sativa [39]. C1 has been experimentally evaluated for antimicrobial activity [40] and was also tested in a study on targeting multidrug-resistant mouse lymphoma cells [41]. Ilicic acid (C2) showed activity in G2/M cell cycle arrest of tumor cells [42]. Cannabichromente (C3) is a precursor for the biosynthesis of various cannabinoids [43].
As for structure-based VS using PDB structures of CB2, we observed that these structures were able to discriminate not only CB2-actives from non-active ligands (DUD-E decoys) but also CB2-selective from non-selective ligands (the ChEMBL-derived dataset). The best results were obtained for the 5ZTY and 6KPF structures, regardless of the activation state.
In this study, we identified potential selective cannabinoids among the constituents of Cannabis sativa.
Our QSAR model identified three compounds of potential interest with significant dissimilarity to compounds already evaluated experimentally and reported in ChEMBL. The statistical characteristics of the developed QSAR models suggest a high probability of successful experimental validation, which we hope will attract attention from experimental groups interested in searching for CB2-selective ligands of natural origin.

4. Materials and Methods

4.1. Data Collection

Chemical structures from the ChEMBL database [44] with experimentally measured affinities for the CB1 and CB2 receptors were used as training data. The datasets were composed of 14,126 records for CB1 and 13,506 records for CB2. Compounds constituting the phytochemical profile of Cannabis Sativa were acquired from the Collective Molecular Activities of Useful Plants Database (CMAUP) [45]. The ChEMBL data were exported and downloaded in comma-separated values (CSV) format, while the CMAUP data were downloaded as a Structure Data Format (SDF) file.

4.2. Data Curation

The method of data curation followed a modified approach of Fourches et al. [46,47]. We modified the curation method by adding extra steps relevant for curating data acquired from ChEMBL, which is accompanied by additional metadata specific to this database. Namely, we grouped experimental values by source study and then removed groups with fewer than 10 records. We also assessed data reliability according to the source study confidence score provided by ChEMBL and removed records with a confidence score lower than 8. The final dataset consisted of records with InChIKeys as molecular structure identifiers, associated standardized 2D structures of compounds, and pKi values. RDKit was used for standardization and InChIKey calculation [48]. The curated CB2 and CB1 datasets are available on the website [49].

4.3. Descriptor Calculation

The Mordred Python library [50] was used to calculate 1613 2D standard descriptors and 213 3D descriptors. 3D descriptors were used to include information about chirality in ligand structures.
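As a minimal sketch of the descriptor clean-up applied in this section (dropping descriptors with variance below 0.05 or with invalid values), the following Python function filters a small descriptor table. The table layout, the descriptor names and values, and the use of the population variance on unscaled values are assumptions for illustration only and are not taken from the published pipeline.

```python
import math
from statistics import pvariance


def filter_descriptors(table, min_variance=0.05):
    """Keep descriptors that are fully numeric and sufficiently variable.

    table: dict mapping a descriptor name to its values, one per compound.
    """
    kept = {}
    for name, values in table.items():
        # Treat None, non-numeric entries, NaN, and infinities as invalid.
        has_invalid = any(
            not isinstance(v, (int, float)) or not math.isfinite(v)
            for v in values
        )
        if has_invalid:
            continue
        # Drop near-constant descriptors (population variance is an assumption).
        if pvariance(values) < min_variance:
            continue
        kept[name] = values
    return kept


# Hypothetical descriptor table for three compounds.
descriptors = {
    "MW": [300.1, 314.5, 286.4],         # variable: kept
    "nRing": [3, 3, 3],                  # constant: removed
    "bad": [1.0, float("nan"), 2.0],     # contains NaN: removed
    "logP": [5.10, 5.12, 5.08],          # variance below 0.05: removed
}

print(list(filter_descriptors(descriptors)))  # ['MW']
```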
Descriptors with variance less than 0.05 or containing invalid values were removed. The descriptors were concatenated with 1024-bit Morgan fingerprints with a radius equal to 3.

4.4. Machine Learning

A gradient boosting (GB) algorithm implemented in the Light Gradient Boosting Machines library [51] was used to create base models. The GB algorithm and its parameters were selected in an inner 5-fold cross-validation protocol based on grid search. Other algorithms, including linear regression, partial least squares, and support vector machines from the scikit-learn library [52], were also tested. The GB models for CB1 and CB2 were decision tree ensembles. The output of each tree in a boosted ensemble was used to create an embedding of the descriptor space. A kNN algorithm with k = 3 was used to determine the applicability domain of the model based on this embedding. The applicability domain was defined by a threshold, which is the maximum allowed distance between a query molecular structure and its nearest neighbors. Molecular structures with a Euclidean distance to their nearest neighbors greater than a given threshold were considered to be outside the applicability domain. Python code for the execution of trained models is provided in the Supplementary Materials.

4.5. Model Validation

Models were validated using a 5-fold cross-validation protocol. Out-of-sample predictions were used to calculate Q2 according to the following equation:

\[ Q^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2} \tag{1} \]

Here, $n$ is the number of samples in the dataset, $Y_i$ is the experimental pKi of the ith sample, $\hat{Y}_i$ is the predicted pKi of the ith sample, and $\bar{Y}$ is the mean pKi value. We evaluated Q2 for 20 different applicability domain thresholds and considered Q2 as a confidence measure for a model using a given threshold.

4.6. Prediction of CB2-Selectivity of Cannabis Sativa Ingredients

A curated dataset of Cannabis Sativa constituents reported in the CMAUP database [45] was tested against our QSAR CB1 and CB2 models. In this way, we searched for selective molecules with dpKi ≥ 1, where dpKi was defined as dpKi = pKi(CB2) − pKi(CB1). For each predicted selective compound, a Tanimoto coefficient (TC) was calculated to assess the similarity of the predicted compound to those already tested, and it was used to exclude hits that were not novel with regard to the training data. TC was computed using 1024-bit Morgan fingerprints with a radius equal to 3.

4.7. Structure-Based Virtual Screening

CB2 receptor structures (PDB IDs: 6KPC [30], 6KPF [30], 6PT0 [31], and 5ZTY [32]) were derived from the Protein Data Bank (PDB) [53]. This set included agonist-bound (6KPC, 6KPF, 6PT0) and antagonist-bound (5ZTY) structures corresponding to different activation states of the CB2 receptor (active and inactive, respectively). The CB2 structures used in this study were very similar to each other (average pairwise RMSD = 6.41, with standard deviation = 3.40), yet they differed in the TMH6 conformation (5ZTY and 6KPC vs. others), which bends while interacting with a G protein complex. In all these structures, ligands were located in the orthosteric binding site of CB2, although allosteric modulation has also been observed for these receptors [54]. The four PDB structures have nearly identical binding sites except for a few residues: W258 (W6.48), S285, and to a lesser extent F87 (F2.57), F91 (F2.61), H95 (H2.65), and F183 (EC2) (see Figure 7). The W258 and F117 (F3.36) residues form a well-known Trp-Phe toggle switch that changes its position upon receptor activation [17,30]. Interestingly, the position of
W258 differs slightly not only between the active and inactive conformations but also among the three active conformations, e.g., it is moved away from the ligand (6PT0 vs. the two others).

Figure 7. Comparison of CB2 structures available in the Protein Data Bank (PDB). Here, the inactive CB2 conformation (5ZTY), which performed best in virtual screening (VS), is shown in green, the active conformation (6KPF) is shown in yellow, and the other active CB2 structures are shown in grey. Residues are labeled according to the 5ZTY numbering.

Receptor structures were processed with the Protein Preparation tool 2017-4, Schrodinger LLC, New York, NY, USA [55]. Ligands in the PDB structures were used to determine the position of docking grids spanning the whole CB2 active site. The prepared CB2 structures were used for screening against a CB2 actives library that included CB2-selective and CB2-non-selective compounds to assess the ability of these CB2 structures to discriminate between CB2-selective and CB2-non-selective ligands.
This compound library was extracted from our ChEMBL training set (see Materials and Methods); compounds were labeled CB2-selective using a pKi threshold of 7, while the others were considered CB2-non-selective. The second round of VS was performed to test to what extent the experimental CB2 structures were able to discriminate between CB2-actives and non-actives. Here, we used the compound library that was generated using DUD-E [34], with the ChEMBL-derived CB2-selective ligands (see above) used as CB2 actives. Here, we discarded the CB2-non-selective ligands and did not include them in the compound library. VS was performed with Glide 2017-4, Schrodinger LLC, New York, NY, USA [56], following ligand preparation with LigPrep [57]. We computed enrichment factors (EF) and areas under the curve (AUC) for receiver operating characteristic (ROC) curves using Maestro 2017-4, Schrodinger LLC, New York, NY, USA.

5. Conclusions

We would like to emphasize that our study aimed to identify, in silico, prospective CB2-selective compounds from C. Sativa that could be confirmed by experimental studies in the future.
This study shows the first robust CB1/CB2 QSAR models that are reproducible and applicable to a large variety of chemical scaffolds. Our developed model was trained on a large and diverse set of compounds that surpassed recent studies [26]. We based our method on machine learning, which utilized data from thousands of experimental studies on CB1/CB2 activities. Our study was conducted without making common mistakes such as, e.g., not conducting data curation or its improper execution, lack of a validation protocol, applying inappropriate statistical characteristics for estimation of predictive performance, and finally, lack of applicability domain determination [58–60]. We also showed that the applicability domain estimation based on a gradient boosted model latent space allows the accurate prediction of the model confidence. What is more, we validated the estimation of the model confidence interval, which allows users to make conclusions