• Developer position in ChEMBL available



    We now have an immediately available position within the ChEMBL group for an exciting project looking at data integration of ADMET data alongside structural and comparative genomics data. This role requires good general bioinformatics knowledge, programming in Perl (or equivalent), knowledge of SQL, and experience of database querying and data integration. Previous experience with ADMET data would be beneficial.

    More details are available on request.

  • ChEMBL Web Services are now in BioCatalogue



    BioCatalogue is a registry of Web Services in the biosciences. All the ChEMBL web services (including the REST and SOAP PSICQUIC services) are now listed in BioCatalogue.

  • New Drug Approvals 2011 - Pt. VIII Gadobutrol (Gadavist™)








    ATC code : V08CA09

    On March 14th 2011, the FDA approved Gadobutrol (USAN: Gadobutrol; USAN date: 2010; tradename: Gadavist; NDA 201277), a gadolinium-based contrast agent, for intravenous (i.v.) use in diagnostic MRI to detect and visualize areas with disrupted blood-brain barrier and/or abnormal blood supply of the central nervous system. In MRI, when an electromagnetic field is applied, the hydrogen nuclei present in the body flip their spin and align with the direction of the field. Once the field is turned off, the hydrogen nuclei decay to the original spin-down state and release a photon corresponding to the energy difference between the two states. Since hydrogen nuclei in different tissues return to their equilibrium state at different rates, a distinctive image can be obtained. Gadobutrol enhances the contrast in MRI images by decreasing the spin-lattice relaxation time (T1) and the spin-spin relaxation time (T2).

    The gado- USAN/INN stem covers gadolinium derivatives for diagnostic/imaging use; other approved drugs from the V08CA ATC class include gadoteridol, gadoversetamide, gadodiamide, gadobenic acid, gadopentetic acid, gadoxetic acid and gadofosveset.






    Gadobutrol (IUPAC: 2-[4,10-bis(2-oxido-2-oxoethyl)-7-(1,3,4-trihydroxybutan-2-yl)-1,4,7,10-tetrazacyclododec-1-yl]acetate; gadolinium(3+); PubChem: CID 15814656) is a gadolinium(III) (Gd3+) chelate with two chiral centers. It has a molecular weight of 604.7 Da, and contains three hydrogen bond donors and thirteen hydrogen bond acceptors.

    In comparison with the other gadolinium chelates, gadobutrol has a macrocyclic framework and is overall neutral in charge.

    Gadobutrol is dosed intravenously, with a recommended dose of 0.1 mL/kg body weight (0.1 mmol/kg). It is more concentrated than other gadolinium-based contrast agents and should be administered at half of the volume. Gadobutrol is cleared from the plasma after intravenous (i.v.) injection with a mean terminal half-life (t1/2) of 1.81 hr. It is excreted in an unchanged form via the kidneys.
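
    As a rough worked example of the figures above (an illustrative sketch only, not dosing guidance), the recommended dose scales linearly with body weight, and the terminal half-life implies an approximately exponential decline in plasma level. The function names and the single-exponential model are assumptions made for illustration; they are not taken from the prescribing information.

```python
DOSE_ML_PER_KG = 0.1   # recommended dose quoted in the text
HALF_LIFE_HR = 1.81    # mean terminal half-life quoted in the text


def dose_volume_ml(body_weight_kg: float) -> float:
    """Injection volume in mL for a given body weight."""
    return DOSE_ML_PER_KG * body_weight_kg


def fraction_remaining(hours: float) -> float:
    """Fraction of the plasma level left after `hours`.

    Assumes simple single-exponential elimination, which is a
    simplification of real pharmacokinetics.
    """
    return 0.5 ** (hours / HALF_LIFE_HR)


# A hypothetical 70 kg patient would receive about 7 mL.
print(round(dose_volume_ml(70), 2))
# After two half-lives (3.62 hr) a quarter of the plasma level remains.
print(round(fraction_remaining(3.62), 2))
```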

    Gadavist has a black box warning and should not be used in patients with impaired elimination of drugs. It may increase the risk of nephrogenic systemic fibrosis (NSF).

    The license holder for Gadavist is Bayer HealthCare Pharmaceuticals, the product website is here, and the full prescribing information can be found here (Gadobutrol was approved in the EU in 2000 under the tradename Gadovist, the European SPC can be found here).

  • Webinar - ChEMBL Web Services and Schema Walkthrough


    We plan to run some webinars detailing recent progress and changes with some of our services.

    The first will be a webinar on the ChEMBL REST web services and programmatic access at 3 pm BST on 31st March 2011. Please mail to register for this.

    The second will be a webinar on the current ChEMBL schema at 3 pm BST on 30th March 2011. Please mail to register for this.

    Please note that at this time of year, not all countries are synchronised on the transition to Daylight Saving Time (if indeed they do change). Please check the mapping of your local time to BST. Please, please try and resist the temptation to edit the subject lines of the mail links above.
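
    For anyone wanting to check the mapping programmatically, here is a minimal sketch that converts the 3 pm BST start time of the REST web services webinar to UTC and to one other zone. The fixed offsets (BST as UTC+1, US Eastern Daylight Time as UTC-4) are assumptions that hold for late March 2011; substitute the offset for your own location.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed valid for late March 2011.
BST = timezone(timedelta(hours=1), "BST")    # British Summer Time = UTC+1
EDT = timezone(timedelta(hours=-4), "EDT")   # US Eastern Daylight Time = UTC-4

# 3 pm BST on 31st March 2011.
webinar_bst = datetime(2011, 3, 31, 15, 0, tzinfo=BST)

webinar_utc = webinar_bst.astimezone(timezone.utc)
webinar_edt = webinar_bst.astimezone(EDT)

print(webinar_utc.strftime("%H:%M %Z"))  # 14:00 UTC
print(webinar_edt.strftime("%H:%M %Z"))  # 10:00 EDT
```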

  • Visitor Talk - Jens Loesel - MedChem Attractiveness and Redundancy - Looking for value in Compounds and Chemical Space

    We have a visitor to the EBI on 6th April 2011 - Jens Loesel from Pfizer, Sandwich. Jens will give a talk at 1 pm on some large-scale cheminformatics analysis of the Pfizer screening file that he has done. If any off-campus people want to come to the talk, they are very welcome, but I will need a name and affiliation to get them past security - mail me. An abstract for the talk is below.

    MedChem Attractiveness and Redundancy - 
    Looking for value in Compounds and Chemical Space
    Jens Loesel, Pfizer


    A more diverse screening file is a better screening file. A bigger screening file is a better screening file. Are these statements really true? We will critically scrutinize both these questions in the talk.
    In part 1 we will investigate the quality of chemical structures. A good screening file needs to balance quality versus diversity.

    We generated an algorithm that is purely based on structure to achieve this. The algorithm is able to compete with medicinal chemists in ranking the attractiveness of compounds as defined by the consensus opinion of multiple chemists. We called the score MedChem Attractiveness (MCA). The score is an important step towards quantifying the quality of chemical structures. The score complements existing algorithms for novelty and diversity as well as filters like the Ro5.

    In part 2 of the talk we look at the size and economy of the screening file. The value of the whole screening file isn’t simply the sum of all its individual compounds. There is a limit at which a screening file becomes too big and costly for the aim it tries to solve – finding new leads for novel MedChem projects in an efficient manner.

    Primary screens at Pfizer often yield large numbers of very similar hit compounds. These large clusters of active compounds represent limited value for Hit Identification beyond the first few active members. To streamline our screening operation we analysed the probability of finding actives in recent HTS screens based on fingerprint similarity. We combined the results from the HTS analysis with Belief Theory. This allowed us to define the ideal density of neighbours in chemical space for lead identification. Based on that density we defined a new property of the chemical space we call Redundancy. Redundancy represents the fraction of compounds populating chemical space beyond the ideal density for efficient Hit Identification screening.

    This work was no academic exercise. The model resulted in the permanent deletion of >1 million compounds from the screening file. The result is a higher quality and more efficient Pfizer screening file for the future. Both algorithms are very generic and can be applied or adapted to a variety of other uses.
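
    To make the idea of Redundancy in the abstract a little more concrete, here is a toy sketch. It is emphatically not the Pfizer algorithm, whose details (including the Belief Theory component) are not given in the abstract. Everything here is an assumption for illustration: fingerprints are represented as sets of on-bits, similarity is Tanimoto, and the similarity threshold, the neighbour limit and the greedy order-dependent pass are all hypothetical choices.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def redundancy(fingerprints, sim_threshold=0.7, max_neighbours=2):
    """Fraction of compounds lying beyond a chosen neighbour density.

    Compounds are visited in order. A compound is kept only if fewer than
    `max_neighbours` already-kept compounds are similar to it; otherwise it
    is counted as redundant. Both parameters are hypothetical.
    """
    kept = []
    redundant = 0
    for fp in fingerprints:
        neighbours = sum(1 for k in kept if tanimoto(fp, k) >= sim_threshold)
        if neighbours >= max_neighbours:
            redundant += 1
        else:
            kept.append(fp)
    return redundant / len(fingerprints) if fingerprints else 0.0


# Four identical compounds plus one unrelated compound:
# the third and fourth copies are redundant, so the result is 2/5.
library = [frozenset({1, 2, 3, 4})] * 4 + [frozenset({10, 11})]
print(redundancy(library))  # 0.4
```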

  • New Drug Approvals 2011 - Pt. VII Belimumab (Benlysta™)




    ATC code : L04AA26

    On March 9th, 2011, the FDA approved Belimumab (trade name: Benlysta, ATC code L04AA26), an immunosuppressant human monoclonal antibody, for treatment of patients with systemic lupus erythematosus (SLE, OMIM:152700, ICD-10:M32), a systemic autoimmune disease. The prevalence of SLE varies among ethnic groups and countries, e.g. 40 per 100,000 in Northern Europe, 53 per 100,000 in the US, and 159 per 100,000 among people of Afro-Caribbean descent; this translates to about 159,000 cases in the US, among 1.5 million cases of different forms of lupus in general. In SLE, periods of illness alternate with remissions, and symptoms are diverse, comprising fever, malaise, joint pains, myalgias, and fatigue, but also dermatological symptoms (e.g. malar rash), anemia, cardiac, pulmonary and renal impairments as well as neurological and neuropsychiatric syndromes such as headache and depression, rendering diagnosis challenging. SLE is currently incurable, and its symptoms are traditionally treated with powerful agents such as cyclophosphamide, corticosteroids and immunosuppressants.
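
    The US case estimate follows from simple arithmetic on the prevalence figure. The sketch below reproduces it, assuming a round US population of 300 million; that population figure is an assumption, as the text does not state which value was used.

```python
def expected_cases(prevalence_per_100k: float, population: int) -> float:
    """Expected number of cases for a prevalence quoted per 100,000 people."""
    return prevalence_per_100k / 100_000 * population


US_POPULATION = 300_000_000  # assumed round figure, not stated in the text

us_cases = expected_cases(53, US_POPULATION)
print(round(us_cases))  # 159000
```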

    Belimumab acts by binding to the soluble form of B-lymphocyte stimulator (BLyS, a.k.a. BAFF, TNFSF13B, CD257, UniProt:Q9Y275, Pfam:PF00229), a member of the TNF superfamily of proteins. BLyS promotes the survival and development of B-lymphocytes into mature plasma B cells; these key immune system cells produce antibodies, mediating the humoral immune response. BLyS was discovered for its immune stimulant properties in 1999 by Human Genome Sciences (HGS), who jointly with GSK then developed Belimumab as an effective BLyS inhibitor, and ultimately the first new lupus drug since 1955.

    As the name Belimumab implies, it is a human (-u-) immunomodulatory (-lim-) antibody.

    The structure of the soluble form of BLyS is known; a typical PDB entry is 1kxg.


    After reconstitution of the lyophilized powder, Benlysta is diluted to the recommended dosage of 10 mg/kg and administered intravenously at 2-week intervals for the first 3 doses, and at 4-week intervals thereafter. The distribution half-life (t1/2) of Belimumab is 1.75 days and the terminal half-life (t1/2) is 19.4 days, with a steady-state volume of distribution (Vss) of 5.29 L and a clearance (Cl) of 215 mL/day.
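
    The dosing schedule described above can be laid out as day offsets from the first infusion. This is an illustrative sketch only, not clinical guidance; the function names and the 60 kg example weight are hypothetical.

```python
DOSE_MG_PER_KG = 10  # recommended dosage quoted in the text


def dose_mg(body_weight_kg: float) -> float:
    """Dose in mg for a given body weight."""
    return DOSE_MG_PER_KG * body_weight_kg


def infusion_days(n_doses: int) -> list:
    """Day offsets of the first `n_doses` infusions, with the first on day 0."""
    days = []
    day = 0
    for i in range(n_doses):
        days.append(day)
        # The first three doses are two weeks apart; later ones four weeks apart.
        day += 14 if i < 2 else 28
    return days


print(dose_mg(60))        # 600
print(infusion_days(5))   # [0, 14, 28, 56, 84]
```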

    The full prescribing information can be found here, and the product website is here.

    Benlysta is manufactured by HGS and marketed by HGS and GSK.

  • A Taxonomy for Drugs: 2 - Stereochemistry


    The next area for consideration in our descriptive taxonomy for drugs is stereochemistry. There are many differing types of stereoisomers encountered in general chemistry, and the area is complex, but the majority of these are not relevant for drug discovery; for example, atropisomerism, although a notable effect, is not important among drug substances. The most significant category of stereochemistry for drug-like molecules involves chiral centers at sp3 hybridised carbon atoms connected to four chemically distinct atoms (often giving rise to enantiomerism). Another relevant case of stereoisomerism for drugs is diastereoisomerism; diastereoisomers are stereoisomers that are not enantiomers, and include the cis-/trans- (E-/Z-) configurations of alkenes.

    So for Drug_Stereochemistry_Class, a drug can be:
    • Chiral - containing a single defined stereoisomer of the drug substance, and which lacks an internal plane of symmetry.
    • Racemic - containing a mixture of stereoisomers of the drug substance.
    • Achiral - composed of a drug substance that does not display chirality.
    • Other - displaying a physiologically relevant stereochemical property not covered by the classes above.
    The vast majority of biological monomers (e.g. amino acids, nucleotides, sugars) are chiral, and polymers of these are also chiral (and so, therefore, are biological drugs and drug targets within the body). So biological drugs are 'chiral', but since chirality is so ubiquitous for molecules of this class, the convention is to ignore the issue of chirality for biologicals. For small molecule drugs, the issue is more significant, both scientifically and commercially, and several drugs which were initially synthesized and marketed as racemic mixtures have subsequently been developed in a chirally pure form. An example of this is Omeprazole, which was subsequently replaced by the 'active' S-enantiomer Esomeprazole. By convention, USANs and INNs for chirally distinct forms of a molecule have either ar- as a prefix for R-configuration forms or es- as a prefix for S-configuration forms. There is no correlation between the +/- labelling and R/S labelling of optically active molecules. Previously, the USANs/INNs of chirally pure drugs were often denoted with levo- and dextro- prefixes.

    It is important to note that since drugs tend to interact with chiral receptors, enantiomers will have different binding affinities against a target (or set of targets), metabolic routes, side-effects, half-lives, etc., and so in general there is usually more interest in developing a chirally pure or achiral drug. Chiral centers often add significantly to the synthetic complexity and cost of manufacture of a drug, and so again there are pressures to develop achiral drugs where possible. So as a general rule, achiral drugs are 'preferred' over chiral drugs which in turn are preferred over racemic drugs. There is a fuller discussion of isomerism and drug development here.

    Although most chirality in drugs occurs at sp3 carbon atoms, an important and often neglected case is for sulfoxides, where the geometry around the sulphur atom is tetrahedral, and optically active isomers are possible.

    For example:

    Sildenafil is an achiral drug.
    Levodopa is a chiral drug.
    Armodafinil is a chiral drug (containing a chiral sulfoxide).
    Citalopram is a racemic drug.
    Abciximab is a chiral drug.
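
    The four Drug_Stereochemistry_Class values above form a small controlled vocabulary, which can be sketched as an enumeration. The Python names used here are hypothetical and do not reflect the actual ChEMBL schema; the assignments simply restate the examples listed above.

```python
from enum import Enum


class DrugStereochemistryClass(Enum):
    """Hypothetical encoding of the Drug_Stereochemistry_Class vocabulary."""
    CHIRAL = "Chiral"
    RACEMIC = "Racemic"
    ACHIRAL = "Achiral"
    OTHER = "Other"


# The example assignments given in the post.
EXAMPLES = {
    "Sildenafil": DrugStereochemistryClass.ACHIRAL,
    "Levodopa": DrugStereochemistryClass.CHIRAL,
    "Armodafinil": DrugStereochemistryClass.CHIRAL,
    "Citalopram": DrugStereochemistryClass.RACEMIC,
    "Abciximab": DrugStereochemistryClass.CHIRAL,
}

for drug, cls in EXAMPLES.items():
    print(f"{drug}: {cls.value}")
```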

  • A Taxonomy for Drugs: 1 - Drug Class



    It is intuitive to describe what a drug is in natural language - a small molecule, etc., but one problem is that these descriptive terms are context dependent, loosely defined and used very variably across the literature; and so when someone asks 'How many small molecule drugs are there?' - first of all it depends on what is a 'drug', and secondly what is a 'small molecule'. As far as I can tell there is not a descriptive taxonomy for drugs (I use the term taxonomy here as a bridge term between a controlled vocabulary/dictionary and an ontology). For our own purposes within ChEMBL we need such a taxonomy, and so we post our initial thoughts here for comment, and no doubt (and hopefully), significant correction and improvement (use the comment section of the blog, then everyone can see any discussions).

    Drugs are regulated products that are 'intended for use in the diagnosis, cure, mitigation, treatment or prevention of disease' - let's not visit what a disease is, but move swiftly on to trying to subdivide this into obvious and useful categories/classes when thinking about molecular drug structures.

    So for a Drug_Class, categorisation into the following seems intuitive and useful. Most of the action in drug discovery will be connected to the "Therapeutic" class.
    • Therapeutic - A substance with a curative action on a disease.
    • Supplement - A substance used to address a deficiency of that substance (or of that substance's normal function).
    • Imaging Agent - A substance used to image a molecule or structure within the body.
    • Diagnostic Agent - A substance used in the diagnosis of a disease, not involving imaging.
    • Other - A substance not covered by the categories above.
    Drugs are then typically divided into Drug_Types - small molecules and biologicals.
    • Small Molecule - A substance with a molecular weight less than 1500 Da that is otherwise not a Biological.
    • Biological - A substance primarily composed from monomers of naturally occurring substances (e.g. amino acids, sugars, nucleotides, etc.).
    • Other - A substance not covered by the categories above.
    I hate these sorts of self-referential/recursive definitions, but please mail improvements! Within each of these two main classes there are some relevant, pragmatic and useful subdivisions - Drug_Type_Subclass.

    For Small Molecules:
    • Inorganic - A non-organic substance.
    • Natural Product-derived - A substance that is derived from a naturally occurring primary or secondary metabolite.
    • Synthetic - An organic substance that is not derived from a naturally occurring primary or secondary metabolite.
    • Other - A substance that is a Small Molecule which is not covered by the categories above.
    For Biologicals:
    • Monoclonal antibody (mAb) - A substance similar in sequence to an antibody sequence.
    • Vaccine - A substance that acts through eliciting an acquired immune response in the patient.
    • Enzyme - A substance acting as a catalyst for a chemical reaction.
    • Virus - A substance with the biological characteristics of a competent virus.
    • Cell - A substance with the biological characteristics of a competent cell.
    • Peptide - A substance which is a polymer built primarily from amino acids, containing between two and twenty amino acids.
    • Protein - A substance which is a polymer built primarily from amino acids, containing in excess of twenty amino acids, and that is not a monoclonal antibody.
    • Oligosaccharide - A substance which is a polymer built primarily from sugar-like monomers.
    • Oligonucleotide - A substance which is a polymer built primarily from nucleotide-like monomers.
    • Other - A substance which is not covered by the categories above.
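
    The Peptide and Protein definitions above hinge on a length threshold, which can be sketched as a small function. The function name and the `is_monoclonal_antibody` flag are hypothetical; the thresholds (two to twenty amino acids for a peptide, more than twenty for a protein, with antibodies excluded from the Protein subclass) are taken from the definitions in the list.

```python
def amino_acid_polymer_subclass(n_residues: int,
                                is_monoclonal_antibody: bool = False) -> str:
    """Assign a Drug_Type_Subclass to an amino-acid polymer by its length."""
    if n_residues < 2:
        raise ValueError("a polymer needs at least two amino acids")
    if is_monoclonal_antibody:
        # Antibodies are explicitly excluded from the Protein subclass.
        return "Monoclonal antibody"
    return "Peptide" if n_residues <= 20 else "Protein"


print(amino_acid_polymer_subclass(9))    # Peptide
print(amino_acid_polymer_subclass(20))   # Peptide
print(amino_acid_polymer_subclass(21))   # Protein
```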
    So, a consistent, semantically useful description of a specific drug is constructed from a combination of Drug_Type_Subclass, Drug_Type, and Drug_Class.

    For example:

    Sildenafil is a "synthetic small molecule therapeutic".
    Abciximab is a "monoclonal antibody biological therapeutic".
    Vitamin D3 is a "natural product-derived small molecule supplement".
    Ioflupane 123I is a "synthetic small molecule imaging agent".
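
    The way these descriptions are assembled (Drug_Type_Subclass, then Drug_Type, then Drug_Class) can be sketched as a small record type. The class and field names are hypothetical and are not the ChEMBL schema; the allowed values for Drug_Type and Drug_Class are copied from the lists above.

```python
from dataclasses import dataclass

DRUG_CLASSES = {"Therapeutic", "Supplement", "Imaging Agent",
                "Diagnostic Agent", "Other"}
DRUG_TYPES = {"Small Molecule", "Biological", "Other"}


@dataclass(frozen=True)
class DrugDescription:
    """Hypothetical record combining the three taxonomy levels."""
    type_subclass: str
    drug_type: str
    drug_class: str

    def __post_init__(self):
        # Reject terms outside the controlled vocabularies.
        if self.drug_type not in DRUG_TYPES:
            raise ValueError(f"unknown Drug_Type: {self.drug_type}")
        if self.drug_class not in DRUG_CLASSES:
            raise ValueError(f"unknown Drug_Class: {self.drug_class}")

    def describe(self) -> str:
        """Build the natural-language description, most specific term first."""
        return f"{self.type_subclass} {self.drug_type} {self.drug_class}".lower()


sildenafil = DrugDescription("Synthetic", "Small Molecule", "Therapeutic")
vitamin_d3 = DrugDescription("Natural Product-derived", "Small Molecule", "Supplement")
print(sildenafil.describe())  # synthetic small molecule therapeutic
print(vitamin_d3.describe())  # natural product-derived small molecule supplement
```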

    Any feedback, pointers to any existing classifications/taxonomies, etc. would be very welcome.

    Parts 2, 3 and 4 for this Drug taxonomy will be posted shortly.