• ChEMBL will be at the ISCB ASBCB Meeting in South Africa



    We are attending the ISCB ASBCB Meeting in Cape Town, South Africa, from 9th to 11th March 2011. If you are interested in meeting up, hearing about ChEMBL, getting a demo, asking questions, etc., mail us.

  • ChEMBL Resources will be unavailable 7th thru 9th Jan 2011


    ChEMBL resources will be down for essential scheduled maintenance from 16.30 GMT on Friday, 7th January 2011 to 21.00 GMT on Sunday, 9th January 2011. This will affect all EBI services.

  • Meetings: Prioritisation of Drug Targets for Neglected Diseases, May 16-21, Siena, Italy


    We are speaking at a training school on the prioritisation of drug targets for neglected diseases, running from May 16th to May 21st 2011 at the University of Siena, Italy. It is organised under Working Group 1 of COST Action CM0801. Here is some text from the course brochure (which can be found here):

    This training school was conceived to teach and train researchers on the currently available tools that may lead to drug target identification and prioritization, before moving to their validation in vivo. The course will consist of lectures and practical classes. In guided hands-on computer sessions all participants will analyze sequence, structure, and function of unknown sequences. By that they will learn about the existence as well as the source and availability of bioinformatics software and how to apply these tools and interpret results with confidence. The school will provide computers to all trainees. All lectures will be in English.

    Spaces are limited, and the deadline for applications is March 4th 2011.

  • New Drug Approvals 2010 - Pt. XIX Dienogest (Natazia™)




    ATC code: G03AB08 and G03FA15 (as combinations)


    On May 6th 2010, the FDA approved dienogest as a component of the oral contraceptive drug Natazia (tradename:Natazia, tradename:Qlaira). Dienogest (research code:STS-557) is a progestin, a synthetic analog of the natural product progesterone. Dienogest is one of two active components within Natazia, the other being the 17β-estradiol prodrug estradiol valerate; Natazia is therefore a Combination Oral Contraceptive (or COC). Natazia is the first four-phase combined oral contraceptive marketed in the United States, the four phases referring to the differential dosing of the estrogen/progestin throughout the menstrual cycle. Each monthly pack of Natazia contains pills to be taken in a specific order: 2 dark yellow tablets each containing 3 mg estradiol valerate; then 5 medium red tablets each containing 2 mg estradiol valerate and 2 mg dienogest; then 17 light yellow tablets each containing 2 mg estradiol valerate and 3 mg dienogest; then 2 dark red tablets each containing 1 mg estradiol valerate; and finally 2 white tablets (inert/placebo).
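    The dosing sequence above is essentially a small data structure. A minimal Python sketch, encoding each tablet group as (count, colour, mg estradiol valerate, mg dienogest) and summing over the pack; the names `NATAZIA_PACK`, `pack_totals` and `daily_schedule` are illustrative only, and the figures are simply those listed above:

```python
# Natazia pack as described above: (tablet count, colour, mg estradiol valerate, mg dienogest).
# Placebo tablets carry 0 mg of both components.
NATAZIA_PACK = [
    (2, "dark yellow", 3.0, 0.0),
    (5, "medium red", 2.0, 2.0),
    (17, "light yellow", 2.0, 3.0),
    (2, "dark red", 1.0, 0.0),
    (2, "white", 0.0, 0.0),
]


def pack_totals(pack):
    """Return (total tablets, active tablets, total mg estradiol valerate, total mg dienogest)."""
    tablets = sum(n for n, _, _, _ in pack)
    active = sum(n for n, _, ev, dng in pack if ev > 0 or dng > 0)
    ev_total = sum(n * ev for n, _, ev, _ in pack)
    dng_total = sum(n * dng for n, _, _, dng in pack)
    return tablets, active, ev_total, dng_total


def daily_schedule(pack):
    """Expand the tablet groups into a per-day list of (day, colour, mg EV, mg dienogest)."""
    days = []
    for n, colour, ev, dng in pack:
        for _ in range(n):
            days.append((len(days) + 1, colour, ev, dng))
    return days


if __name__ == "__main__":
    print(pack_totals(NATAZIA_PACK))  # (28, 26, 52.0, 61.0)
```

    Expanding the groups into a day-by-day list makes it easy to check that the pack covers a 28-day cycle, 26 days of which are active tablets.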

    The -gest- INN stem covers progestins, a large pharmacological class of steroid structures, including (alongside dienogest) demegestone, desogestrel, dydrogesterone, etonogestrel, gestodene, gestonorone, gestrinone, hydroxyprogesterone, levonorgestrel, medrogestone, medroxyprogesterone, megestrol, nomegestrol, norelgestromin, norgestimate, norgestrel, norgestrienone, progesterone, promegestone, quingestanol, and trimegestone. These are usually in the G03 ATC class (Sex hormones and modulators of the genital system) and are often dosed in combination with other steroid hormones (estrogens and/or androgens). Some members of this set are in the L02 ATC class (Endocrine therapy, Antineoplastic and immunomodulating agents).
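    ATC codes such as G03AB08 are hierarchical: each of the five levels (anatomical main group, therapeutic subgroup, pharmacological subgroup, chemical subgroup, chemical substance) is a prefix of the full code, which is what makes statements like "usually in the G03 class" easy to compute. A small sketch, assuming well-formed 7-character codes; `atc_levels` and `in_class` are hypothetical helpers, not part of any ChEMBL API:

```python
import re

# Prefix lengths of the five ATC levels: G, G03, G03A, G03AB, G03AB08.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)
# Letter, two digits, two letters, two digits.
ATC_PATTERN = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")


def atc_levels(code):
    """Split a full (level 5) ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a level-5 ATC code: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


def in_class(code, prefix):
    """True if the ATC code falls under the given class prefix (e.g. 'G03' or 'L02')."""
    return code.strip().upper().startswith(prefix.strip().upper())


if __name__ == "__main__":
    print(atc_levels("G03AB08"))
    print(in_class("G03FA15", "G03"), in_class("G03AB08", "L02"))
```

    For example, `atc_levels("G03AB08")` returns `['G', 'G03', 'G03A', 'G03AB', 'G03AB08']`, and a prefix test is all that is needed to ask whether a product sits under G03 or L02.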

    Dienogest is a synthetic derivative of the natural steroid progesterone, and acts via similar targets to the endogenous steroid, primarily through binding to the Progesterone Receptor (PR). Specifically, dienogest acts as a PR agonist, with an EC50 of 3.1 nM for human PR. Progesterone Receptor (Uniprot:P06401, ChEMBL:CHEMBL208, synonym:PGR, synonym:NR3C3) is a homodimeric nuclear hormone receptor (NHR), with the drug binding at a well-defined site within the ligand-binding domain (Pfam:PF00104). The nuclear receptors are a very significant family of drug targets, with the most pharmacologically relevant members of the family acting as ligand-regulated transcription factors. Dienogest is a 19-nortestosterone derivative and displays some androgen-related pharmacology: it acts as an anti-androgen, with an EC50 of 420 nM. Dienogest shows no significant glucocorticoid (GR) or mineralocorticoid (MR) activity.
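    Potencies spanning orders of magnitude are easier to compare on a negative log molar scale, the same convention ChEMBL uses for its pChEMBL values. A rough sketch using the two EC50 values quoted above; note that it compares an agonist EC50 with an anti-androgenic EC50, so the fold ratio is only a crude indication of selectivity, and `p_activity` is an illustrative name:

```python
import math


def p_activity(value_nm):
    """Convert a potency in nM to its negative log10 molar value (e.g. pEC50)."""
    return -math.log10(value_nm * 1e-9)


PR_EC50_NM = 3.1    # PR agonism, from the text
AR_EC50_NM = 420.0  # anti-androgenic activity, from the text

p_pr = p_activity(PR_EC50_NM)
p_ar = p_activity(AR_EC50_NM)
fold = AR_EC50_NM / PR_EC50_NM  # equivalently 10 ** (p_pr - p_ar)

print(f"pEC50 PR = {p_pr:.2f}, AR = {p_ar:.2f}, ~{fold:.0f}-fold")
# pEC50 PR = 8.51, AR = 6.38, ~135-fold
```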

    There is a wealth of structural data available for the molecular target of dienogest, PR; an example structure is PDBE:3d90, which is in complex with the related progestin levonorgestrel. The ligand-binding domain structures of many nuclear receptors are known, and the ligand typically fills a central hydrophobic cavity, altering the surface topology of a ligand-dependent site that can then bind various co-activators/co-repressors. There is a large set of potential co-activators/co-repressors, often expressed in a tissue-specific manner. This offers the potential to pharmacologically differentiate ligands directed against NHRs, and has led to the identification of clinical-stage PR antagonists (USAN stem -pristone, e.g. mifepristone), selective progesterone receptor modulators (SPRMs) (USAN stem -prisnil, e.g. asoprisnil), as well as non-steroidal PR agonists (USAN stem -proget, e.g. tanaproget).
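    The stem conventions above lend themselves to simple name-based classification. A deliberately naive sketch, assuming only the four stems mentioned in this post (suffix matches for -pristone, -prisnil and -proget, a substring match for -gest-); real stem assignment has many exceptions, and the `STEMS` table and `classify_by_stem` function are hypothetical:

```python
# Hypothetical stem table built only from the stems named in this post.
# Suffix stems are listed first so they take priority over the looser infix match.
STEMS = [
    ("pristone", "suffix", "PR antagonist"),
    ("prisnil", "suffix", "selective progesterone receptor modulator (SPRM)"),
    ("proget", "suffix", "non-steroidal PR agonist"),
    ("gest", "infix", "progestin"),
]


def classify_by_stem(name):
    """Return the class implied by the first matching stem, or None if nothing matches."""
    n = name.strip().lower()
    for stem, position, description in STEMS:
        if position == "suffix" and n.endswith(stem):
            return description
        if position == "infix" and stem in n:
            return description
    return None


if __name__ == "__main__":
    for drug in ["dienogest", "mifepristone", "asoprisnil", "tanaproget", "ibuprofen"]:
        print(drug, "->", classify_by_stem(drug))
```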


    Dienogest (IUPAC:(17α)-17-Hydroxy-3-oxo-19-norpregna-4,9-diene-21-nitrile, INCHIKEY:AZFLJNIPTRTECV-FUMNGEBKSA-N, SMILES:CC12CCC3=C4CCC(=O)C=C4CCC3C1CCC2(CC#N)O, PubChem:CID68861, Chemspider:62093) has a molecular weight of 311.42 Da, one hydrogen bond donor, three hydrogen bond acceptors, a calculated logP of 1.8, and a topological polar surface area of 61.1 Å². It is therefore fully compliant with Lipinski's rule of five. There are no ionisable centers (over a biologically relevant pH range), so dienogest is neutral under physiological conditions. The four rings of the classic steroid core are obvious in the structure, and all chiral centers have a defined stereochemistry. Like all steroids, dienogest is also a very rigid molecule, with only one rotatable bond. A notable feature of the structure is the nitrile group, in which the carbon atom is reasonably electrophilic (and therefore susceptible to attack by nucleophiles); some drugs, e.g. saxagliptin, use a nitrile as a reactive group for covalent binding to the target. In dienogest, however, the nitrile does not have a mechanistic role in target modulation. Dienogest was first synthesized in the late 1970s.
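    Several of the descriptors above can be reproduced by hand. A minimal sketch that recomputes the molecular weight from the formula C20H25NO2, sums TPSA fragment contributions for the three polar atoms (nitrile N, hydroxyl O, ketone O), and counts rule-of-five violations. The contribution values are assumed from the published Ertl TPSA fragment table and should be cross-checked against a cheminformatics toolkit such as RDKit before reuse; all names here are illustrative:

```python
# Conventional (abridged) standard atomic weights for the elements in dienogest.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Assumed Ertl TPSA fragment contributions (Å²) for dienogest's polar atoms.
TPSA_CONTRIB = {"nitrile N": 23.79, "hydroxyl O": 20.23, "carbonyl O": 17.07}


def molecular_weight(formula):
    """Sum atomic weights over an element -> count mapping."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula.items())


def rule_of_five_violations(mw, hbd, hba, logp):
    """Count Lipinski violations: MW > 500, HBD > 5, HBA > 10, logP > 5."""
    return sum([mw > 500, hbd > 5, hba > 10, logp > 5])


dienogest = {"C": 20, "H": 25, "N": 1, "O": 2}  # C20H25NO2
mw = molecular_weight(dienogest)
tpsa = sum(TPSA_CONTRIB.values())
violations = rule_of_five_violations(mw, hbd=1, hba=3, logp=1.8)

print(f"MW {mw:.1f} Da, TPSA {tpsa:.1f} Å², Ro5 violations: {violations}")
# MW 311.4 Da, TPSA 61.1 Å², Ro5 violations: 0
```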


    Dienogest is highly bioavailable (91%) when orally dosed, and has a half-life of 12.3 hr. Following i.v. dosing, dienogest has a volume of distribution (Vd) of 46 L and a total clearance of 5.1 L/hr. Of circulating dienogest, 10% is in free form, while 90% is bound to serum albumin. In contrast to many steroid-like drugs, no significant binding to the steroid carrier proteins sex-hormone binding globulin (SHBG) and corticosteroid-binding globulin (CBG) is reported. Dienogest is extensively metabolised by CYP3A4 and by other pathways responsible for endogenous steroid catabolism (leading to hydroxylation and conjugation); clearance is primarily in the form of metabolites via the renal route. This reliance on CYP3A4 for metabolism creates multiple opportunities for drug-drug interactions with both CYP3A4 inhibitors and inducers. At the 3 mg dose level used in Natazia, the daily molar dose is 9.6 µmol. The average steady-state plasma concentration of dienogest (for a p.o. dose of 3 mg o.d.) is 33.7 ng/ml (equivalent to 0.11 µM).
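    The molar figures quoted above follow from simple unit conversions using the molecular weight of 311.42 g/mol. A minimal sketch (function names are illustrative):

```python
DIENOGEST_MW = 311.42  # g/mol, from the text


def mg_to_umol(mg, mw=DIENOGEST_MW):
    """Convert a mass dose in mg to µmol: mg / (g/mol) gives mmol, times 1000 gives µmol."""
    return mg / mw * 1000


def ng_per_ml_to_um(ng_per_ml, mw=DIENOGEST_MW):
    """Convert ng/mL to µM: 1 ng/mL equals 1 µg/L, and µg/L divided by g/mol gives µmol/L."""
    return ng_per_ml / mw


if __name__ == "__main__":
    print(f"{mg_to_umol(3):.1f} µmol, {ng_per_ml_to_um(33.7):.2f} µM")  # 9.6 µmol, 0.11 µM
```

    Keeping the molecular weight as a default parameter means the same helpers can be reused for the other steroids discussed here.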

    Natazia carries a black box warning: it should not be used by women over 35 who smoke, because in this population the risk of serious cardiovascular events is increased.

    The full US prescribing information (for Natazia) can be found here. The same product as Natazia, but branded as Qlaira, has been available within parts of Europe since 2008; the Summary of Product Characteristics (SPC) for Qlaira is here. An earlier product (Climodien) containing dienogest, but indicated for hormone replacement therapy (HRT), has been available within some parts of the EU since 2001. The SPC for Climodien is available here.

    The license holder is Bayer Healthcare and the product website is www.natazia.com.

  • Summary of U.S. New Drugs For 2010

    Here is an initial list of the new drugs approved in the US in 2010 (specifically, New Molecular Entities). The way we count things, there were 19 novel, newly approved drug substances in the US last year.

    #   USAN                  Tradename
    1   Tocilizumab           Actemra / RoActemra
    2   Dalfampridine         Ampyra
    3   Liraglutide           Victoza
    4   Velaglucerase alfa    VPRIV
    5   Carglumic acid        Carbaglu
    6   Polidocanol           Asclera
    7   Denosumab             Prolia
    8   Cabazitaxel           Jevtana
    9   Sipuleucel-T          Provenge
    10  Ulipristal Acetate    Ella
    11  Alcaftadine           Lastacaft
    12  Pegloticase           Krystexxa
    13  Fingolimod            Gilenya
    14  Dabigatran Etexilate  Pradaxa
    15  Lurasidone            Latuda
    16  Ceftaroline Fosamil   Teflaro
    17  Eribulin Mesylate     Halaven
    18  Tesamorelin           Egrifta
    19  Dienogest             Natazia


    12 are small molecule drugs, and 7 are biologicals. Across all 19, 6 (32%) are synthetic small molecule drugs, 6 (32%) are natural product-derived small molecule drugs, 6 (32%) are biologicals (including peptides, enzymes and mAbs), and one (5%) is a cell-based therapy. Also interesting is the fact that the majority (11 of 19, 58%) are parenterally dosed.
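    As a quick sanity check on the percentages, a small sketch recomputing them from the counts stated above (the category labels are paraphrased, and no per-drug assignment is implied):

```python
TOTAL = 19
CATEGORY_COUNTS = {
    "synthetic small molecule": 6,
    "natural product-derived small molecule": 6,
    "biological (peptide, enzyme, mAb)": 6,
    "cell-based therapy": 1,
}
PARENTERAL = 11


def pct(n, total=TOTAL):
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * n / total)


# The four categories should partition the full list.
assert sum(CATEGORY_COUNTS.values()) == TOTAL

for category, n in CATEGORY_COUNTS.items():
    print(f"{category}: {n} ({pct(n)}%)")
print(f"parenterally dosed: {PARENTERAL} of {TOTAL} ({pct(PARENTERAL)}%)")
```

    Rounded to whole numbers, the four categories sum to 101%, which is a rounding artefact rather than a miscount.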


    For details on the icon set used in the table, see this link.

    Following some checking, I've added dienogest to the list (it is part of the combination product Natazia) and updated the analysis accordingly. Some sources state that there were 21 'New Drugs' in 2010; however, a 'new drug' is not necessarily the same as an NME. There are also, at the time of writing, some inconsistencies in the FDA approval tables for 2010 that make counting the year's NMEs problematic: for example, the tables list everolimus (Zortress) as an NME first approved in the US in 2010, but it was actually first approved (as an NME) in 2009 as Afinitor, under a different tradename and for a different indication. The raw approval data from the FDA is in a series of monthly charts, accessible here (unfortunately, there is no easy, web-friendly way to provide a set of useful links, so you'll just have to type in the months). In these tables, look for the 1s, which mark new NMEs; as you will see, quite a few are unassigned, and, as mentioned above, there are some errors.


    UPDATE: One of the potentially new NMEs of last year is incobotulinumtoxinA (trademark:Xeomin). This is a type A botulinum toxin, in the same class as abobotulinumtoxinA (trademark:Dysport, Reloxin, Azzalure) and onabotulinumtoxinA (trademark:BOTOX). These are essentially identical from an active component perspective (the USAN statements are abobotulinumtoxinA, incobotulinumtoxinA, and onabotulinumtoxinA), and their sequences are essentially identical. By convention, because botulinum toxin products are so potent, and because potency differs between production/processing routes, different USANs are assigned to highlight the non-bioequivalence of different products. This is part of a broader issue of establishing the bioequivalence of biological drugs, which has exercised drug producers, regulators, and consumers in recent years. Since we are mostly interested in drugs differentiated by distinct molecular structures, we do not consider these to be distinct NMEs, and so incobotulinumtoxinA is not counted as a new NME in our analysis. A similar issue occurred last year.

    Another interesting case for a new 2010 biological drug is Collagenase Clostridium histolyticum (approved in the US in 2010 as Xiaflex), which is a defined-composition mixture of two bacterial collagenase gene products. Xiaflex is dosed parenterally. In 2004, Santyl was approved as a topical drug for wound debridement; the active ingredient in Santyl is ‘Collagenase clostridium histolyticum’, produced by an entirely different process. From a cursory literature analysis, Santyl appears to have a non-articulated composition (this is not the same as having a variable or non-specific composition; it means only that the components are not described as a defined composition in the easily accessible public regulatory documents). There are clear developmental and safety differences between a topically dosed ‘local’ agent (Santyl) and an agent that has full exposure to the circulatory and immune systems (Xiaflex); they also serve different patient populations, have different indications, etc. They are clearly non-substitutable in a clinical setting.

    So, how does one treat this case? Should Xiaflex count as two new NMEs towards drug approval innovation numbers (the distinct but related ColG and ColH gene products, which are what the USAN actually references), or should it be subsumed under the previous approval of Collagenase Clostridium histolyticum for Santyl? We have taken the view, from the perspective of the approval of ‘new NMEs’, that Xiaflex contains a previously approved active ingredient. Others will take different views.

    More broadly, it is of interest to examine the USAN definition for Xiaflex. It contains two distinct chemical components (the two sequence-related collagenase proteins) in a simple mixture, and there is nothing special about the mixture: for example, they do not form an obligate heterodimer of defined composition, and they would be separable from the drug substance via straightforward routes under native, physiological-like conditions. Some small molecule USANs contain multiple molecules, but these are invariably salts, and in cases where there are two (or more) active ingredients in a small molecule drug, they are typically assigned separate USANs. Furthermore, the convention now is to assign a USAN for the parent small molecule, as well as for each distinct salt, even if the salt is the only component in an approved product. This is in line with the INN model (where salts are not usually assigned distinct INNs). Logically, to us, from an informatics perspective, it would make sense to assign USANs for Xiaflex at the level of the distinct proteins, and then for Xiaflex to be a ‘product’ containing two USANs as a defined mixture, in the same way that many small molecule mixture drugs are defined. Anyway, the informatics representation of biological drugs, the concepts of bioequivalence, and differences in post-translational processing (proteolytic maturation, N- and O-linked glycosylation, etc.) may seem to be a semantic discussion, but they do have important commercial and healthcare implications. This issue will no doubt keep many drug discoverers, regulators, and intellectual property staff employed for some time, and will hopefully eventually bring improved, cheaper and continually innovative healthcare to all.

    Stepping back even further… current drug naming processes and ‘business rules’ were developed at a time when the complexities of biological drugs were not imagined, and before the era of electronic databases, when the benefits of controlled vocabularies, dictionaries and ontologies were not really appreciated. It is interesting to reflect on how it would be done nowadays if starting from scratch. More on this in a future post (maybe).

    In final summary, the number of molecularly novel drugs that were approved in the US last year is between 19 and 22, with the difference being in the way that biological drugs are treated!

  • A local meeting for local people


    Andreas Bender from the Unilever Centre in Cambridge has set up a series of meetings for networking of molecular modellers, chemoinformaticians and allied trades in the Cambridge, UK, area. The meetings alternate between the University of Cambridge and the EBI; they have been great fun so far, and there is now a website for the Cambridge Cheminformatic Network, with a Doodle Poll for dates for the 2011 meetings.

  • UK Research Council Science Funding 2011-2015

    Today, RCUK and the UK research councils announced the budgets and priorities for UK science spending for 2011-2015. A link to the BBSRC delivery plan is here.

  • Can we predict late stage attrition?


    Having seen today's surprise regulatory delay on ticagrelor, here is an interesting question: 'Is it possible to predict late stage attrition?' Specifically, is it possible to predict which potential drugs will be approved from a set of recently disclosed USANs? In 2010, there were 101 new USANs assigned (after removing redundancy through parents/salts, different salts, etc.). Some of the 2010 USANs will themselves be salts of previous USANs (e.g. ibuprofen sodium), but this will be a small number.

    So, given that on average around 20% of these will eventually be approved within the US, is it possible to predict which 20% will become real products?

    If there is interest in a collaborative project (or maybe even a competition ;) ) on this, I will set up a small project area, with InChIs of the structures, etc. Based on historical averages, it will take about four years to get a reasonable readout, but it might be an interesting set of compounds to follow through (a little bit like the Up Series of documentaries, but for drugs, not kids).
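    If such a competition were run, predictions would need scoring against the eventual approvals. One possible scheme, sketched here as an assumption rather than anything planned: precision, recall, and the lift of precision over the observed approval rate. With 101 USANs and a ~20% approval rate, roughly 20 approvals would be expected, so random picks would have a precision of about 0.2 and a lift of about 1. The function name and the identifiers in the toy example are placeholders:

```python
def score_predictions(predicted, approved, n_candidates):
    """Score a set of predicted approvals against the USANs actually approved.

    predicted, approved: iterables of USAN identifiers (e.g. names or InChIKeys).
    n_candidates: size of the original USAN pool (101 for 2010).
    """
    predicted, approved = set(predicted), set(approved)
    hits = len(predicted & approved)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(approved) if approved else 0.0
    base_rate = len(approved) / n_candidates  # expected precision of random picks
    lift = precision / base_rate if base_rate else 0.0
    return {"hits": hits, "precision": precision, "recall": recall, "lift": lift}


# Toy example with placeholder identifiers, not real USANs.
candidates = [f"usan{i}" for i in range(10)]
approved = {"usan0", "usan1"}
predicted = {"usan0", "usan2"}
print(score_predictions(predicted, approved, len(candidates)))
```

    Lift is measured against the observed approval rate of the pool rather than a fixed 20%, so the score stays meaningful if the eventual approval rate drifts from the historical average.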