• Japanese Webinar Now Available to Watch



    For anyone who wasn't able to attend the Japanese language webinar hosted by Kaz Ikeda, we have provided a link to its recording. This webinar covers the basic use of the ChEMBL database with a particular focus on the interface and searching.

    The YouTube clip can be found here: Japanese Webinar

    Any questions, please feel free to contact chembl-help@ebi.ac.uk

  • International Chemical Biology Society - Free Membership


    On the occasion of the 3rd European Chemical Biology Society (ECBS) in Vienna, Rathnam Chaguturu, founding president of the International Chemical Biology Society (ICBS), announced the launch of this new society.

    You are cordially invited to become founding members by using the online registration at
    http://www.chemical-biology.org/join.html

    ICBS is offering free membership to all chemical biologists until the end of this year (whooooo!).

    Between October 4-5, ICBS is holding its first official conference in Cambridge MA:
    http://chemical-biology.org/images2/ICBS12/ICBS2012_Agenda_Cambridge_MA_USA_FINAL.pdf

  • CINF Session on Bioinf and Chemoinf Data at the ACS National Meeting in New Orleans - April 2013



    Abstract submission is now open for the CINF sessions of the ACS National Meeting in New Orleans, LA, next April. Ian Bruno of the CCDC and I are chairing a session on Linkage of Bioinformatic and Chemoinformatic Data.

    Check it out, and get those abstracts in!

  • Query Privacy in ChEMBL


    We have been asked several times for all the user-generated queries of ChEMBL - i.e. the structures sketched in to the interface that are then searched against the database. We will not (and in fact, physically can't) share these. Sorry. It is against both our institutional privacy policy, and standard Terms of Use, and also we've engineered the app to avoid us 'storing' any of this information where at all possible (e.g. in avoiding /tmp type fluff, minimizing residency time in caches, etc.).

    There are clearly some advantages in pooling or analysing website search data - it highlights interesting trends, and a topic becoming more interesting to a user community can flag emerging events. It can alert to flu outbreaks (there was a Science paper from Google on this, don't have the reference handy though - you may be able to find it with Google though.....). There is a huge interest in many sites that I use in tracking and analysing query terms and usage patterns, and in some contexts this is just the thing to do - like when eBay teases me (and surely of all the tortured obsessive souls on the planet, it is just me and me alone) with a rare phosphor or perforation Machin variant I don't have.

    The types of query that people perform can clearly also be used to develop ways of improving a website, or specifically the performance of search queries - and for algorithm development this information can be like gold-dust. There are now many chemical fingerprint systems available, and adapting the features/structures of these to typical user queries is really valuable in their development.

    There are essentially two distinct aspects to users' expectations/rights of privacy when using a website like ChEMBL.

    • There is a personal privacy issue - 'why is John Overington interested in compounds for the treatment of obesity?'. This is primarily an embarrassment sort of thing ('hey, is this guy a bit chubby?'), or maybe a commercially sensitive thing ('he's interested in obesity stuff; heh, let's raise the price for him', or 'let's show him some adverts for chips', or 'let's contact his rival and let them know he's interested in his weight'). These latter things are behind the feature where you first search for a flight and the price is great, then the next time you look, it's gone up - allegedly.
    • There's a more fundamental IP issue though -  The simple disclosure of a search term can be commercially damaging, and potentially stop the development of life-saving therapies. The simplest case is chemical structure and drug patents. The most important patent claim in drug discovery is to have composition of matter (and don't get all hissy over pharma misusing the patent system, since patents are absolutely essential for the development of new medicines, the treatment of disease, improvement of food supplies, for funding future R&D, for a source of employment, license revenues to Universities, and taxation revenues, etc). This composition of matter is a claim of a novel chemical structure, that no-one has disclosed before, and it is useful for something. If the structure is not novel, then the patent can be readily invalidated.
    Hopefully, you'll understand our reasons for maintaining both user and query privacy.

    To be extra clear, in case you read the above text as being solely about sharing with third parties - we do not, and cannot, examine users' queries ourselves within the development team here at the EBI.

    Your use of ChEMBL is private, and always will be.


  • New Drug Approvals 2012 - Pt. XVIII - Teriflunomide (Aubagio™)






    ATC Code: L04AA13
    Wikipedia: Teriflunomide

    On September 12th the FDA approved Teriflunomide (tradename AUBAGIO, ChEMBL973), an orally administered drug for the treatment of relapsing forms of Multiple Sclerosis (MS). Teriflunomide inhibits pyrimidine synthesis by blocking dihydroorotate dehydrogenase (DHODH, Uniprot: Q02127), but it is not certain whether this explains the effect of the drug on MS lesions. Teriflunomide inhibits rapidly dividing cells, which include the activated T lymphocytes thought to drive the MS disease process. The net effect of the inhibition of DHODH is that lymphocytes cannot accumulate sufficient pyrimidines for DNA synthesis. Additionally, Teriflunomide has been shown to inhibit the activation of nuclear factor kappaB and tyrosine kinases, but at doses higher than needed for the observed anti-inflammatory effects. Teriflunomide is the active metabolite of an already approved drug, Leflunomide (tradename Arava, ChEMBL960), indicated for the treatment of rheumatoid and psoriatic arthritis.

    MS is an inflammatory disease characterised by damage to the myelin sheaths surrounding the axons of the brain and spinal cord. This demyelination results in scarring and a broad range of symptoms. The prevalence ranges between 2 and 150 per 100,000 and the disease onset usually occurs in young adults. MS cannot currently be cured and the prognosis is difficult to predict, depending on the subtype of the disease. The United States National Multiple Sclerosis Society characterised four clinical courses, two of which are classified as relapsing forms of MS, namely 'relapsing remitting' and 'progressive relapsing'.

    Currently there are six other disease-modifying treatments for MS approved by regulatory agencies. These are: Fingolimod (trade name Gilenya, CHEMBL314854), interferon beta-1a (trade names Avonex, CinnoVex, ReciGen and Rebif, CHEMBL1201562) and interferon beta-1b (U.S. trade name Betaseron, in Europe and Japan Betaferon, CHEMBL1201563), glatiramer acetate (trade name Copaxone, CHEMBL1201507), mitoxantrone (trade name Novantrone, CHEMBL58) and natalizumab (trade name Tysabri). Of these drugs, only Fingolimod is orally administered; the others are injected intravenously or subcutaneously. Hence Teriflunomide is the second oral treatment option for MS.




    Teriflunomide is a small molecule drug with a molecular mass of 270.20 g/mol, an AlogP of 2.09, 3 rotatable bonds, and no violations of the rule of 5.
    Canonical SMILES: C\C(=C(/C#N)\C(=O)Nc1ccc(cc1)C(F)(F)F)\O
    InChI: InChI=1S/C12H9F3N2O2/c1-7(18)10(6-16)11(19)17-9-4-2-8(3-5-9)12(13,14)15/h2-5,18H,1H3,(H,17,19)/b10-7-
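    As a quick sanity check on the molecular mass quoted above, the molecular formula can be read straight out of the InChI string and summed up. This is only a rough sketch: the atomic weights are rounded standard values, and the helper names (`formula_from_inchi`, `molar_mass`) are made up for this illustration rather than taken from any ChEMBL tool.

```python
import re

# Rounded standard atomic weights, only for the elements that occur here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "F": 18.998, "N": 14.007, "O": 15.999}

INCHI = ("InChI=1S/C12H9F3N2O2/c1-7(18)10(6-16)11(19)17-9-4-2-8(3-5-9)"
         "12(13,14)15/h2-5,18H,1H3,(H,17,19)/b10-7-")


def formula_from_inchi(inchi: str) -> str:
    """Return the formula layer, i.e. the text between the first two slashes."""
    return inchi.split("/")[1]


def molar_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C12H9F3N2O2'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    formula = formula_from_inchi(INCHI)
    print(formula, round(molar_mass(formula), 2), "g/mol")
```

    The result agrees with the quoted mass to within rounding of the atomic weights.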

    The structure of the drug can interconvert between Z and E stereoisomers with the Z enol being the most stable and the active form.

    DHODH (EC: 1.3.5.2, Uniprot: Q02127, PDB: 1D3G, CHEMBL: ChEMBL1966, IntAct: EBI-3928775 ), is a 395 amino acid monomer located at the mitochondrion inner membrane. The protein is a single-pass membrane protein with the catalytic site located in the mitochondrial inter-membrane space.

    >sp|Q02127|PYRD_HUMAN Dihydroorotate dehydrogenase (quinone)
    MAWRHLKKRAQDAVIILGGGGLLFASYLMATGDERFYAEHLMPTLQGLLDPESAHRLAVR FTSLGLLPRARFQDSDMLEVRVLGHKFRNPVGIAAGFDKHGEAVDGLYKMGFGFVEIGSV TPKPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAKLTEDGLPLGVN LGKNKTSVDAAEDYAEGVRVLGPLADYLVVNVSSPNTAGLRSLQGKAELRRLLTKVLQER DGLRRVHRPAVLVKIAPDLTSQDKEDIASVVKELGIDGLIVTNTTVSRPAGLQGALRSET GGLSGKPLRDLSTQTIREMYALTQGRVPIIGVGGVSSGQDALEKIRAGASLVQLYTALTF WGPPVVGKVKRELEALLKEQGFGGVTDAIGADHRR
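    The quoted length of 395 amino acids can be checked against the FASTA record above. A minimal sketch, assuming the record is pasted in as text (the function name `parse_fasta_record` is just for this example):

```python
RECORD = """>sp|Q02127|PYRD_HUMAN Dihydroorotate dehydrogenase (quinone)
MAWRHLKKRAQDAVIILGGGGLLFASYLMATGDERFYAEHLMPTLQGLLDPESAHRLAVR
FTSLGLLPRARFQDSDMLEVRVLGHKFRNPVGIAAGFDKHGEAVDGLYKMGFGFVEIGSV
TPKPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAKLTEDGLPLGVN
LGKNKTSVDAAEDYAEGVRVLGPLADYLVVNVSSPNTAGLRSLQGKAELRRLLTKVLQER
DGLRRVHRPAVLVKIAPDLTSQDKEDIASVVKELGIDGLIVTNTTVSRPAGLQGALRSET
GGLSGKPLRDLSTQTIREMYALTQGRVPIIGVGGVSSGQDALEKIRAGASLVQLYTALTF
WGPPVVGKVKRELEALLKEQGFGGVTDAIGADHRR
"""


def parse_fasta_record(text: str) -> tuple[str, str]:
    """Split a single FASTA record into (header, sequence).

    Whitespace inside the sequence lines is dropped, so both wrapped
    and space-separated layouts are accepted.
    """
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">").strip()
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return header, sequence


if __name__ == "__main__":
    header, sequence = parse_fasta_record(RECORD)
    accession = header.split("|")[1]  # UniProt-style 'sp|ACCESSION|NAME' header
    print(accession, len(sequence))
```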

    The recommended dose of AUBAGIO is 7 mg or 14 mg orally once daily. AUBAGIO can be taken with or without food.

    The median time to reach maximum plasma concentrations is between 1 and 4 hours post-dose following oral administration. The half-life is approximately 18 and 19 days after repeated doses of 7 mg and 14 mg, respectively. It takes approximately 3 months to reach steady-state concentrations.
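    The link between the long half-life and the roughly 3 months to steady state can be sketched with textbook first-order kinetics. The assumption here (ours, not from the label) is a simple one-compartment model with regular dosing, in which the fraction of steady state reached after time t is 1 - 0.5^(t / half-life), so about five half-lives gets you most of the way there.

```python
def fraction_of_steady_state(elapsed_days: float, half_life_days: float) -> float:
    """Fraction of steady-state concentration reached after `elapsed_days`.

    Assumes simple first-order (one-compartment) accumulation, which is an
    idealisation rather than a model of the actual drug.
    """
    return 1.0 - 0.5 ** (elapsed_days / half_life_days)


if __name__ == "__main__":
    for half_life in (18, 19):  # days, as quoted in the text above
        fraction = fraction_of_steady_state(90, half_life)
        print(f"t1/2 = {half_life} d: {fraction:.1%} of steady state after 90 days")
```

    With a half-life of 18 days, 90 days is exactly five half-lives, which under this model corresponds to a little under 97% of the steady-state level.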

    Teriflunomide is mainly eliminated through direct biliary excretion of unchanged drug and renal excretion of metabolites.

    The drug comes with a boxed warning to alert prescribers to the risk of liver problems, including death, and a risk of birth defects. Physicians are advised to do a blood test for liver function prior to prescribing Teriflunomide and periodically during the course of treatment. Based on animal studies, the drug may cause fetal harm.

    The license holder is the Genzyme Corporation and the full prescribing information can be found here.


  • ChEMBL RESTful Web Service API Release 1.0.5


    We are pleased to announce that we have updated the ChEMBL RESTful Web Service API (application programming interface) with some more of the features that you - the ChEMBL users - have requested. 

    In particular, we have added support for the:
    • Retrieval of compounds by Canonical SMILES string using HTTP POST *.
    • Retrieval of compounds containing a particular substructure, as given by a Canonical SMILES string using HTTP POST *.
    • Retrieval of a list of compounds similar, at a given cutoff percentage Tanimoto similarity, to one represented by a given Canonical SMILES string using HTTP POST *.
    • Retrieval of larger compound images, as given by a compound ChEMBLID. The retrieved image can be easily re-sized using the 'dimensions' attribute of the endpoint. See the example URLs below.
    • Inclusion of a 'synonyms' property on ChEMBL compound resources. This property will be set for compounds for which there are synonyms available.
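    For anyone unfamiliar with the similarity cutoff mentioned in the list above, here is a rough sketch of how a percentage Tanimoto cutoff works. It assumes each compound has already been reduced to a set of fingerprint "on" bits; the toy bit sets and function names are hypothetical, and this is not a description of how the ChEMBL service computes its fingerprints.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits present in either."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def passes_cutoff(bits_a: set[int], bits_b: set[int], cutoff_percent: float) -> bool:
    """True if the similarity is at or above a cutoff given as a percentage."""
    return tanimoto(bits_a, bits_b) * 100.0 >= cutoff_percent


if __name__ == "__main__":
    query = {1, 2, 3}        # toy fingerprint bits, purely illustrative
    candidate = {2, 3, 4}
    print(tanimoto(query, candidate))
    print(passes_cutoff(query, candidate, 70))
```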

    Sample URLs:


    In addition to the API changes we have also updated the ChEMBL Java client to take advantage of the new features provided by the API. These updates include:
    • Methods to invoke the additional HTTP POST API endpoints (searching for compounds based on SMILES matches, common substructures and similarity to a given percentage Tanimoto similarity). Examples of the new client methods in use are available on the Example.java class on the API documentation page.

    As always, your feedback and suggestions for improving the API are most welcome. Please e-mail: chembl-help@ebi.ac.uk.


    Link: https://www.ebi.ac.uk/chembldb/index.php/ws

    *  These additions are in response to a bug in sending SMILES data via the URL - some SMILES instances, such as those containing triple bonds, make use of characters which are reserved characters in the specification for Uniform Resource Locators (URLs). For API requests involving SMILES, API users can choose to either URL encode their SMILES input before submitting the request to the HTTP GET endpoint or use the new HTTP POST endpoint and send the SMILES data in the body of the HTTP request rather than in the URL.
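    To make the footnote concrete, here is a small sketch of the two options using only the Python standard library. The base URL is a placeholder invented for this example (it is not the real ChEMBL endpoint, so check the API documentation for the actual paths), and the plain-text POST body is likewise an assumption. No request is actually sent.

```python
from urllib.parse import quote, urlsplit
from urllib.request import Request

# Placeholder only: NOT the real ChEMBL endpoint.
BASE_URL = "https://example.org/hypothetical-api/compounds/smiles"


def build_get_url(smiles: str) -> str:
    """Percent-encode every reserved character so the SMILES survives in a URL."""
    return f"{BASE_URL}/{quote(smiles, safe='')}"


def build_post_request(smiles: str) -> Request:
    """Put the raw SMILES in the request body, where no URL encoding is needed."""
    return Request(
        BASE_URL,
        data=smiles.encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed content type
        method="POST",
    )


if __name__ == "__main__":
    smiles = "C#N"  # the triple bond symbol '#' starts a URL fragment
    # Unencoded, everything after '#' is parsed as a fragment, not as path:
    print(urlsplit(f"{BASE_URL}/{smiles}").fragment)
    print(build_get_url(smiles))
    print(build_post_request(smiles).get_method())
```

    The first print shows the problem: without encoding, the part of the SMILES after `#` never reaches the server as part of the path.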

  • New Drug Approvals 2012 - Pt. XVII - Linaclotide (Linzess™)



    ATC Code: A03A (incomplete)
    Wikipedia: Linaclotide

    On August 30, the FDA approved Linaclotide (tradename: Linzess; Research Code: MD-1100, ASP-0456), a novel, first-in-class Guanylate Cyclase-C (GC-C) agonist indicated for the treatment in adults of irritable bowel syndrome with constipation (IBS-C), and chronic idiopathic constipation (CIC). CIC is a diagnosis given to people who experience persistent constipation and do not respond to standard treatment. IBS-C is a subtype characterized by chronic abdominal pain, discomfort, bloating and alteration of bowel habits. Linaclotide exerts its therapeutic action by binding to GC-C, resulting in an increase in both intracellular and extracellular concentrations of cyclic guanosine monophosphate (cGMP). Increase in intracellular cGMP stimulates secretion of chloride and bicarbonate into the intestinal lumen, mainly through activation of the cystic fibrosis transmembrane conductance regulator (CFTR) ion channel, resulting in increased intestinal fluid and accelerated transit. Linaclotide has been shown, in animal models, to not only accelerate gastrointestinal (GI) transit, but also to reduce intestinal pain, which is thought to be mediated by increased extracellular cGMP.

    Other treatments for IBS are already on the market. These include antimuscarinic drugs, such as Dicyclomine (approved in 1950; tradename: Bentyl; ChEMBL: CHEMBL1123) and Methantheline (approved in 1951; tradename: Banthine; ChEMBL: CHEMBL1201264); a serotonin agonist, Tegaserod (approved in 2002; tradename: Zelnorm; ChEMBL: CHEMBL1201332); a serotonin antagonist, Alosetron (approved in 2000; tradename: Lotronex; ChEMBL: CHEMBL1110); and Lubiprostone (approved in 2006; tradename: Amitiza; ChEMBL: CHEMBL1201134), a chloride channel activator. While these drugs act by inhibiting the muscarinic action of acetylcholine, by acting on the serotonin receptors of the nervous system in the GI tract, or by activating the chloride channels on the GI epithelial cells, Linaclotide is the first GC-C agonist to reach the market.

    GC-C (ChEMBL: CHEMBL1795197; Uniprot: P25092) is a 1073 amino acid enzyme, which has an extracellular ligand binding domain (PFAM: ANF_receptor), a domain similar to that of protein tyrosine kinases (PFAM: Pkinase_Tyr) and an adenylate and guanylate cyclase catalytic domain (PFAM: Guanylate_cyc).

    >GUC2C_HUMAN Heat-stable enterotoxin receptor
    MKTLLLDLALWSLLFQPGWLSFSSQVSQNCHNGSYEISVLMMGNSAFAEPLKNLEDAVNE
    GLEIVRGRLQNAGLNVTVNATFMYSDGLIHNSGDCRSSTCEGLDLLRKISNAQRMGCVLI
    GPSCTYSTFQMYLDTELSYPMISAGSFGLSCDYKETLTRLMSPARKLMYFLVNFWKTNDL
    PFKTYSWSTSYVYKNGTETEDCFWYLNALEASVSYFSHELGFKVVLRQDKEFQDILMDHN
    RKSNVIIMCGGPEFLYKLKGDRAVAEDIVIILVDLFNDQYFEDNVTAPDYMKNVLVLTLS
    PGNSLLNSSFSRNLSPTKRDFALAYLNGILLFGHMLKIFLENGENITTPKFAHAFRNLTF
    EGYDGPVTLDDWGDVDSTMVLLYTSVDTKKYKVLLTYDTHVNKTYPVDMSPTFTWKNSKL
    PNDITGRGPQILMIAVFTLTGAVVLLLLVALLMLRKYRKDYELRQKKWSHIPPENIFPLE
    TNETNHVSLKIDDDKRRDTIQRLRQCKYDKKRVILKDLKHNDGNFTEKQKIELNKLLQID
    YYNLTKFYGTVKLDTMIFGVIEYCERGSLREVLNDTISYPDGTFMDWEFKISVLYDIAKG
    MSYLHSSKTEVHGRLKSTNCVVDSRMVVKITDFGCNSILPPKKDLWTAPEHLRQANISQK
    GDVYSYGIIAQEIILRKETFYTLSCRDRNEKIFRVENSNGMKPFRPDLFLETAEEKELEV
    YLLVKNCWEEDPEKRPDFKKIETTLAKIFGLFHDQKNESYMDTLIRRLQLYSRNLEHLVE
    ERTQLYKAERDRADRLNFMLLPRLVVKSLKEKGFVEPELYEEVTIYFSDIVGFTTICKYS
    TPMEVVDMLNDIYKSFDHIVDHHDVYKVETIGDAYMVASGLPKRNGNRHAIDIAKMALEI
    LSFMGTFELEHLPGLPIWIRIGVHSGPCAAGVVGIKMPRYCLFGDTVNTASRMESTGLPL
    RIHVSGSTIAILKRTECQFLYEVRGETYLKGRGNETTYWLTGMKDQKFNLPTPPTVENQQ
    RLQAEFSDMIANSLQKRQAAGIRSQKPRRVASYKKGTLEYLQLNTTDKESTYF


    Linaclotide is an oral peptide drug, composed of 14 amino acids, with disulfide bonds between cysteines (1-6), (2-10) and (5-13). Linaclotide has a molecular weight of 1526.8 Da. (Name: L-cysteinyl-L-cysteinyl-L-glutamyl-L-tyrosyl-L-cysteinyl-L-cysteinyl-L-asparaginyl-L-prolyl-L-alanyl-L-cysteinyl-L-threonyl-glycyl-L-cysteinyl-L-tyrosine, cyclic (1-6), (2-10), (5-13)-tris (disulfide); CanonicalSmiles: C[C@@H](O)[C@@H]1NC(=O)[C@@H]2CSSC[C@@H]3NC(=O)[C@@H](N)CSSC[C@H](NC(=O)[C@H](CSSC[C@H](NC(=O)CNC1=O)C(=O)N[C@@H](Cc4ccc(O)cc4)C(=O)O)NC(=O)[C@H](Cc5ccc(O)cc5)NC(=O)[C@H](CCC(=O)O)NC3=O)C(=O)N[C@@H](CC(=O)N)C(=O)N6CCC[C@H]6C(=O)N[C@@H](C)C(=O)N2; InChI: InChI=1S/C59H79N15O21S6/c1-26-47(82)69-41-25-101-99-22-38-52(87)65-33(13-14-45(80)81)49(84)66-34(16-28-5-9-30(76)10-6-28)50(85)71-40(54(89)72-39(23-97-96-20-32(60)48(83)70-38)53(88)67-35(18-43(61)78)58(93)74-15-3-4-42(74)56(91)63-26)24-100-98-21-37(64-44(79)19-62-57(92)46(27(2)75)73-55(41)90)51(86)68-36(59(94)95)17-29-7-11-31(77)12-8-29/h5-12,26-27,32-42,46,75-77H,3-4,13-25,60H2,1-2H3,(H2,61,78)(H,62,92)(H,63,91)(H,64,79)(H,65,87)(H,66,84)(H,67,88)(H,68,86)(H,69,82)(H,70,83)(H,71,85)(H,72,89)(H,73,90)(H,80,81)(H,94,95)/t26-,27+,32-,33-,34-,35-,36-,37-,38-,39-,40-,41-,42-,46-/m0/s1)
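    The disulfide pairing given in the systematic name, (1-6), (2-10), (5-13), can be sanity-checked against the peptide sequence. In this sketch the one-letter sequence is our own transcription of the residues listed in the name, and the function names are made up for the example.

```python
# One-letter transcription of the systematic name:
# Cys-Cys-Glu-Tyr-Cys-Cys-Asn-Pro-Ala-Cys-Thr-Gly-Cys-Tyr
SEQUENCE = "CCEYCCNPACTGCY"


def cysteine_positions(sequence: str) -> list[int]:
    """1-based positions of cysteine residues."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "C"]


def invalid_bridges(sequence: str, bridges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return bridges whose endpoints are out of range or not cysteines."""
    cysteines = set(cysteine_positions(sequence))
    return [pair for pair in bridges if not set(pair) <= cysteines]


if __name__ == "__main__":
    print(len(SEQUENCE), cysteine_positions(SEQUENCE))
    print(invalid_bridges(SEQUENCE, [(1, 6), (2, 10), (5, 13)]))
    # A pair such as (3, 15) cannot be a disulfide here: residue 3 is Glu
    # and a 14-residue peptide has no position 15.
    print(invalid_bridges(SEQUENCE, [(3, 15)]))
```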

    The recommended dosage of Linaclotide is 290 mcg orally once daily for IBS-C, and 145 mcg orally once daily for CIC, taken on an empty stomach at least 30 minutes prior to the first meal of the day.

    Linaclotide is minimally absorbed with low systemic availability following oral administration. Concentrations of Linaclotide and its active metabolite in plasma are below the limit of quantitation after oral doses of 145 mcg and 290 mcg are administered. Therefore Linaclotide is expected to be minimally distributed to tissues. Linaclotide is metabolised within the GI tract to its active metabolite by loss of the terminal tyrosine moiety. Both Linaclotide and the metabolite are proteolytically degraded within the intestinal lumen to smaller peptides and naturally occurring amino acids. Following the daily administration of 290 mcg of Linaclotide for seven days, about 5% and 3% were recovered in the feces of fasted and fed subjects, respectively, and virtually all as the active metabolite.

    The license holder is Ironwood Pharmaceuticals, Inc. and the full prescribing information of Linaclotide can be found here.

  • Antibody Drugs To Have Reached Clinical Trials By Company



    Similar to the previous kinase post, but this time for antibody-containing therapeutics. If you'd like the data, let me know....