• canSAR v1.0 launched

    Yesterday, Cancer Research UK issued a press release announcing the launch of the first full version of canSAR, the Institute of Cancer Research's integrated cancer research and drug discovery resource. canSAR integrates large volumes of disparate data covering most aspects of cancer biology and chemistry, and is an example of how to complement the ChEMBL database with therapeutic-area-specific knowledge. It integrates biological annotation, gene expression, RNA interference studies, structural biology and protein interaction network data, as well as chemical and pharmacological data. It contains annotation on the entire human proteome and more than 8 million experimental data points, including RNAi and chemical screening data. For full release notes please see canSAR news. canSAR is updated monthly. As well as the wonderful ChEMBL, the data in canSAR come from a large number of sources, including ArrayExpress, PDBe, ROCK, STRING, Genomics of Drug Sensitivity in Cancer, COSMIC, BindingDB, SCOP and Pfam. We (at the ICR) are grateful to our friends at all these places for their help. In the new year, we will be holding a series of webinars and walkthroughs, and details of these will be posted on the ChEMBL-og.

  • Any old unwanted SGI dial boxes?


    I am doing a surprising amount of molecular graphics work at the moment, and have finally realised I don't have the skills or coordination to use a mouse and keyboard for simple rotations, scaling, clipping and so on. It turns out it's possible to connect a dial box to a MacBook Pro, with a little bit of messing around with drivers. So does anyone have an old, unwanted dial box for an SGI machine? Part numbers 9980991, 9980992, and 9780804 are all apparently OK. If you have one of these, let me know.

  • 2010 New Drug Approvals - Pt. XVIII - Tesamorelin (Egrifta)


    ATC code (partial): H01AC

    Also this month, on November 10th, the FDA approved Tesamorelin under the trade name Egrifta. Tesamorelin (research code:TH-9507) is an analog of the human growth hormone-releasing factor (GRF) (UniProt:P01286, synonym:Somatoliberin, synonym:GRF, synonym:GHRH) indicated for the reduction of excess abdominal fat in HIV-infected patients with lipodystrophy. Lipodystrophy is a condition in which excess fat develops in atypical areas of the body, most notably around the liver, stomach, and other abdominal organs. This condition is observed as a side effect with many antiretroviral drugs used to treat HIV. Tesamorelin is the first FDA-approved treatment specifically for lipodystrophy.

    The -relin INN stem covers prehormones or hormone releasing peptides, a very broad range of targets and pharmacology. The -morelin stem sub-group covers growth hormone-release stimulating peptides including capromorelin, dumorelin, examorelin, ipamorelin, pralmorelin, rismorelin, sermorelin, somatorelin, and tabimorelin.


    Tesamorelin is an N-terminally modified variant of the natural 44-residue peptide Somatoliberin, a hypothalamic peptide that acts on the pituitary somatotroph cells to stimulate the synthesis and pulsatile release of endogenous growth hormone (GH), which is both anabolic and lipolytic. Somatoliberin is a member of the glucagon family (Pfam:PF00123) of endogenous peptide ligands. Tesamorelin exerts its therapeutic effects by binding to, and acting as an agonist of, GHRHr - a type-2 (or class B, or secretin-like) GPCR (UniProt:Q02643, ChEMBL:CHEMBL158, Pfam:PF00002) on pituitary somatotrophs. The released growth hormone (GH) in turn acts on a variety of target cells, including chondrocytes, osteoblasts, myocytes, hepatocytes and adipocytes, resulting in a host of pharmacodynamic effects, which are primarily mediated by insulin-like growth factor 1 (IGF-1) produced in the liver and in peripheral tissues.

    Tesamorelin has a molecular weight of 5135.9 Da. Its absolute bioavailability following s.c. dosing is less than 4%, with a volume of distribution of 10.5 L/kg (in HIV-infected patients) and an elimination half-life (t1/2) of 38 minutes (again in HIV-infected patients). The recommended dosage is 2 mg injected subcutaneously daily (typically in the abdomen); a typical daily dose is therefore 0.39 umol.
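
    The molar dose quoted above follows directly from the mass dose and the molecular weight. A minimal sketch of that arithmetic is below; the function name is just an illustrative choice, and the only inputs are the 2 mg dose and the 5135.9 Da molecular weight given in the text.

```python
def mass_mg_to_umol(mass_mg: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass in milligrams to an amount in micromoles.

    mg divided by g/mol gives mmol; multiplying by 1000 gives umol.
    """
    return mass_mg / mol_weight_g_per_mol * 1000.0


# Values taken from the text: 2 mg daily dose, molecular weight 5135.9 Da.
daily_dose_umol = mass_mg_to_umol(2.0, 5135.9)
print(f"Daily dose: {daily_dose_umol:.2f} umol")
```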



    trans-3-hexenoyl-YADAIFTNSYRKVLGQLSARKLLQDIMSRQQGESNQERGARARL-NH2

    Tesamorelin is produced synthetically and is identical in amino acid sequence to human Somatoliberin/GRF. It is modified by attachment of a 3-hexenoyl moiety via an amide linkage to the N-terminal tyrosine residue. This chemical modification blocks proteolytic degradation by endogenous proteases such as DPP-IV, thus prolonging the half-life of the peptide (the inhibition of DPP-IV is itself the basis of a number of therapies for the treatment of type-II diabetes - the gliptins). A further chemical modification is the C-terminal amidation; this post-translational modification is also found in the naturally produced peptide. Tesamorelin is closely chemically related to a number of other clinical agents, such as Sermorelin (which is a shorter, but still active, version of Somatoliberin/GRF).
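
    As a quick sanity check on the sequence line above, the notation can be split into its N-terminal cap, the peptide sequence, and the C-terminal cap. This is only a sketch: it assumes the notation is always "cap-SEQUENCE-cap" with hyphen separators, and the helper name is hypothetical.

```python
PEPTIDE = "trans-3-hexenoyl-YADAIFTNSYRKVLGQLSARKLLQDIMSRQQGESNQERGARARL-NH2"


def split_peptide(notation: str):
    """Split 'N-cap-SEQUENCE-C-cap' notation into its three parts.

    Assumes the last hyphen-separated token is the C-terminal cap and the
    one before it is the one-letter amino acid sequence; everything earlier
    (which may itself contain hyphens) is the N-terminal cap.
    """
    parts = notation.split("-")
    n_cap = "-".join(parts[:-2])
    sequence = parts[-2]
    c_cap = parts[-1]
    return n_cap, sequence, c_cap


n_cap, sequence, c_cap = split_peptide(PEPTIDE)
print(n_cap)          # N-terminal modification
print(len(sequence))  # number of residues
print(sequence[0])    # N-terminal residue (one-letter code)
print(c_cap)          # C-terminal modification
```

    The residue count agrees with the 44-residue length quoted for Somatoliberin, and the first residue is the tyrosine (Y) that carries the 3-hexenoyl group.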

    The full prescribing information can be found here.

    The license holder is EMD Serono, Inc. and the product website is www.egrifta.com.

  • Domain-level annotation of binding-sites for ligands within ChEMBL

    One big problem with simple sequence searching of ChEMBL using tools like BLAST is the introduction of contextually incorrect target relationships due to matching of irrelevant domains. For example, imagine a protein, X, that contains two domains of types A and B, and a second protein, Y, which also contains two domain types, B and C. If the ligand is known to bind to domain type A, there is no ligand-binding relationship between X and Y; however, if the ligand binds at domain type B in X, then there is a relevant relationship between X and Y. This may sound like a rare example, but it is surprisingly common (and extremely annoying), since the majority of eukaryotic proteins are multi-domain, and the presence of certain domains, such as an EGF-like domain (Pfam:PF00008), can greatly complicate the analysis of sequence searches. What is really needed is a reliable mapping (or more generally a probabilistic score) of the ligand-binding domains within a particular protein.

    Enough of all these Xs and Ys! Here is a real example, for three interesting proteins, Axl, Lck and SOCS3. As you can see, protein kinase domain inhibitors are only 'transferable' between Axl and Lck, while SH2 binders are only 'transferable' between Lck and SOCS3.
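
    The logic can be sketched in a few lines of code. This is an illustrative toy, not the classifier used for the ChEMBL annotation: the domain sets are simplified to the two domain types mentioned above, and the domain labels and function names are assumptions made for the example.

```python
# Simplified domain content, restricted to the two domain types discussed
# in the text (the real proteins contain additional domains).
DOMAINS = {
    "Axl": frozenset({"Pkinase"}),
    "Lck": frozenset({"Pkinase", "SH2"}),
    "SOCS3": frozenset({"SH2"}),
}


def naive_related(source, domains=DOMAINS):
    """Proteins sharing ANY domain with the source (whole-sequence view)."""
    return sorted(
        name for name, doms in domains.items()
        if name != source and doms & domains[source]
    )


def transferable_targets(source, binding_domain, domains=DOMAINS):
    """Proteins sharing the specific domain the ligand actually binds."""
    if binding_domain not in domains[source]:
        raise ValueError(f"{source} has no {binding_domain} domain")
    return sorted(
        name for name, doms in domains.items()
        if name != source and binding_domain in doms
    )


print(naive_related("Lck"))                    # ['Axl', 'SOCS3']
print(transferable_targets("Lck", "Pkinase"))  # ['Axl']
print(transferable_targets("Lck", "SH2"))      # ['SOCS3']
```

    The naive view links Lck to both Axl and SOCS3, whereas knowing the binding domain restricts a kinase-domain inhibitor of Lck to Axl and an SH2 binder to SOCS3.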




    Here is a graph (as a pie chart) of the Pfam domains for the ligand-binding regions of all the protein targets in the current (ChEMBL_08) target dictionary. The annotation was performed by a simple classifier heuristic. We are validating the accuracy of this approach at the moment, but it appears to be largely correct. Once we're happy with the results, we'll add the ligand-binding-domain data to the target dictionary.

  • Do you want to know about the ChEMBL Database User Group meeting?

    The ChEMBL Database User Group on LinkedIn is perilously close to 100 members - in fact we need just one more to make it to the magic century! We have found a well-known industry figure to help organise our first User Group meeting, and we'll start posting details shortly on the LinkedIn group site.

  • What Are The Key Clinical Candidate Disclosure Meetings?


    Here is a call for assistance; all input will end up published on The ChEMBL-og, accessible to one and all. What we're looking for is a pretty comprehensive list of key clinical candidate disclosure meetings, ideally those with disclosure of chemical structure, functional assay, pharmacokinetic and toxicology data - you know the sort of meeting, where the key data on a hot compound are disclosed for the first time.

    I've put together a preliminary list here, from memory and a little bit of googling. It is far, far, far from perfect and woefully incomplete, and I am now looking for extra meetings for the areas not covered, and some highlighting of additional ones. As you will see, I've used the ATC classification for the structure of the list; although this is not perfect for things like anti-microbials, it is actually a pretty good framework to hang this off.

    If you have any suggestions, please mail them in....

    Finally, if you are interested in hearing our plans, and maybe collaborating on some informatics aspect of this, feel free to contact us.

  • 2010 New Drug Approvals - Pt. XVII - Eribulin Mesylate (Halaven)





    ATC code (partial): L01C

    On November 15th, 2010, the FDA approved Eribulin Mesylate (ResearchCode:E-7389) under the trade name Halaven (TradeMark:Halaven). It is indicated for the treatment of patients with late-stage, metastatic breast cancer who have previously received at least two chemotherapeutic regimens for the treatment of metastatic disease. Phase III trials showed that patients survived a median of 2.5 months longer than patients treated with other current alternatives. Eribulin is a synthetic analogue of halichondrin B, a cytotoxic polyether macrolide marine natural product.

    The mechanism of action of Eribulin is anti-mitotic and is mediated via tubulin binding, which leads to a G2/M block in the cell cycle; after prolonged stalling in this state, cells enter apoptosis and are then cleared.


    Eribulin is a large (Mwt 729.9 for Eribulin and 826.0 for the mesylate salt) synthetic compound (an analogue of halichondrin B). An IUPAC name for the structure is 11,15:18,21:24,28-Triepoxy-7,9-ethano-12,15-methano-9H,15H-furo[3,2-i]furo[2',3':5,6]pyrano[4,3-b][1,4]dioxacyclopentacosin-5(4H)-one, 2-[(2S)-3-amino-2-hydroxypropyl]hexacosahydro-3-methoxy-26-methyl-20,27-bis(methylene)-, (2R,3R,3aS,7R,8aS,9S,10aR,11S,12R,13aR,13bS,15S,18S,21S,24S,26R,28R,29aS)-, methanesulfonate (salt). The most striking part of the structure is the highly fused, rigid ring system; as you would expect, the synthesis is complicated. The structure contains many of the classical features of natural products - a high number and fraction of defined chiral centers, a high ratio of oxygens to nitrogens, and a high ring count.

    The recommended dosing is 1.4 mg/m2 as two intravenously delivered doses, separated by seven days, repeated after a further two weeks. An average adult human has a body surface area of ca. 1.8 m2, so this would equate to a single dose of ~3 umol. The mean half-life of Eribulin is ~40 hr, with a mean volume of distribution of ~80 L/m2, and a mean clearance of ~1.8 L/hr/m2. Plasma protein binding is around 58%. Eribulin is metabolically stable and is largely unmetabolised, with the majority of the dosed drug being excreted as the dosed form in the feces.
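
    The per-dose estimate above can be reproduced with a short calculation. This is a sketch under stated assumptions: the text does not say whether the 1.4 mg/m2 figure refers to the mesylate salt or the free base, so both molecular weights quoted earlier are tried, and the 1.8 m2 surface area is the rough adult average used in the text.

```python
DOSE_MG_PER_M2 = 1.4   # recommended dose, from the text
BSA_M2 = 1.8           # approximate adult body surface area, from the text
MW_FREE_BASE = 729.9   # Eribulin, g/mol
MW_MESYLATE = 826.0    # Eribulin mesylate, g/mol


def dose_umol(dose_mg_per_m2: float, bsa_m2: float, mol_weight: float) -> float:
    """Scale a surface-area-based dose to a patient, then convert mg to umol."""
    dose_mg = dose_mg_per_m2 * bsa_m2
    return dose_mg / mol_weight * 1000.0


for label, mw in (("mesylate salt", MW_MESYLATE), ("free base", MW_FREE_BASE)):
    print(f"{label}: {dose_umol(DOSE_MG_PER_M2, BSA_M2, mw):.2f} umol per dose")
```

    Either assumption gives a value that rounds to roughly 3 umol per dose, consistent with the estimate in the text.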

    Eribulin binds at (or near) the vinca domain of tubulin, a region that is located at the interface of two tubulin heterodimers when arranged end to end and overlaps the exchangeable GTP site on β-tubulin (Bai et al). β-tubulin is a small family of related human proteins (Pfam:PF03953, HOMSTRAD:tubulin, and UniProt:P07437 for a specific member) that are key components of microtubules. There are multiple isoforms of β-tubulin, e.g. "tubulin-beta1" (ChEMBLDB ID: CHEMBL1915, canSAR link) and "tubulin-beta5" (ChEMBLDB ID: CHEMBL5444, canSAR link). Multiple 3-D structures are available for alpha-/beta-tubulins, including PDBe:1tub. Tubulins are the target of several other classes of anticancer drugs, such as Paclitaxel (aka taxol) and Vinblastine (both similarly cytotoxic natural products).

     
    NAME="Eribulin Mesylate"
    TRADEMARK_NAME="Halaven"
    ATC_code= L01C
    SMILES="CO[C@@H]([C@@H](C[C@H](O)CN)O1)[C@@H](CC(C[C@@H]2O[C@@]([C@H]3C4[C@@]([C@@H]5[C@@H](C6)O4)([H])O7)([H])[C@]7([H])CC2)=O)[C@@H]1C[C@@H](O[C@@H](CC[C@H]8C(C[C@H](CC[C@]6(O5)O3)O8)=C)C[C@H]9C)C9=C.CS(O)(=O)=O"
    InChI="/C40H59NO11.CH4O3S/c1-19-11-24-5-7-28-20(2)12-26(45-28)9-10-40-17-33-36(51-40)37-38(50-33)39(52-40)35-29(49-37)8-6-25(47-35)13-22(42)14-27-31(16-30(46-24)21(19)3)48-32(34(27)44-4)15-23(43)18-41;1-5(2,3)4/h19,23-39,43H,2-3,5-18,41H2,1,4H3;1H3,(H,2,3,4)/t19-,23+,24+,25-,26+,27+,28+,29+,30-,31+,32-,33-,34-,35+,36+,37+,38?,39+,40+;/m1./s1/i1-12,2-12,3-12,4-12,5-12,6-12,7-12,8-12,9-12,10-12,11-12,12-12,13-12,14-12,15-12,16-12,17-12,18-12,19-12,20-12,21-12,22-12,23-12,24-12,25-12,26-12,27-12,28-12,29-12,30-12,31-12,32-12,33-12,34-12,35-12,36-12,37-12,38-12,39-12,40-12,41-14,42-16,43-16,44-16,45-16,46-16,47-16,48-16,49-16,50-16,51-16,52-16;1-12,2-16,3-16,4-16,5-32"
    ChemDraw=eribulin.cdx
    

    Full prescribing information here. The license holder for Halaven™ is Eisai Inc.

  • GPCR Structures - D3 structure published in Science


    Yet another GPCR structure (when I was younger, these four simple words side by side would have been unbelievable), this time of the human D3 receptor (UniProt:P35462) at 2.9 Angstrom resolution, complexed with the antagonist Eticlopride (ChEMBL:CHEMBL8946). There are two molecules of D3 in the asymmetric unit, and the structure is available from public protein structure databases (PDBe:3pbl).


    This makes a total of 7 distinct rhodopsin-like GPCR structures now known.

    %J Science
    %D 2010 
    %V 330
    %P 1091-1095 
    %T Structure of the Human Dopamine D3 Receptor in Complex with a D2/D3 Selective Antagonist
    %A E.Y.T. Chien
    %A W. Liu
    %A Q. Zhao
    %A V. Katritch
    %A G. Won Han
    %A M.A. Hanson
    %A L. Shi
    %A A. Hauck Newman
    %A J.A. Javitch
    %A V. Cherezov
    %A R.C. Stevens