• TACBAC 2012


    A quick reminder of the TACBAC 2012 conference. Previous conferences in the series have been excellent, so check out the website for some initial conference details.

  • Internship at one of our industry partners

    One of our Industry Programme members, Novartis, has an opportunity for an internship (at Masters / postgraduate level) at their site in Basel in 2012. The project is on data federation, with a focus on chemical and patent-related data, and would include, amongst other things, federation of some of the EMBL-EBI resources.

    If you are interested, please mail a CV and covering mail to this address before 30th November 2011.

  • Molecular Architecture of the Human ADMET System

    Here is an interesting graph: it is the frequency distribution of the functional PFAM domains for the human ADMET system. More specifically, it is the distribution of domain frequencies for the single-domain proteins (the multidomain set is being curated now). The source data comes from the PharmADME site (the graph includes the "extended set").


    So just 10 distinct functional domains cover almost 75% of the domains (there are a total of 46 domains in this set). By far the most frequent domain is, unsurprisingly, the cytochrome P450 domain (PF00067).

  • Annotating ChEMBL With Disease/Indication Data


    A surprisingly difficult thing to do is to search ChEMBL for potential anti-cancer compounds. An obvious place to start is by building a list of genes involved in cancer, and then pulling back all data against these genes, applying activity filters, etc. Alternatively, you could think of searching for keywords like 'xenograft' in the assay descriptions, since these are likely to be linked quite strongly to anticancer indications. However, it is a big pain to do, far harder than it should be, and a surprising number of gotchas get in the way; for example, a target clearly linked to cancer could be an anti-target in a cardiovascular project. The good news is that we are starting to think about fixing this, a little.

    The data in ChEMBL is largely centred around 'depositions from projects'. In this case a project can be the assembled data in a publication, which was either part of, or an entire, 'project' in the authors' lab(s), or it can be a deposition that isn't from the literature, for example the GSK malaria HTS dataset. Each of these projects had an intent: they were making compounds to cure gout, or malaria, etc. Capturing this 'intent' data is the key thing to try and do. It is often pretty simple to do just from the title of the paper. For example,

    New Serotonin 5-HT1A Receptor Agonists with Neuroprotective Effect against Ischemic Cell Damage

    as a title gives a pretty clear clue that the intent was to discover compounds useful for the treatment of stroke. That particular paper is here.

    So how successful is the simple approach of looking for the disease area of a project from the titles of papers? It turns out it is pretty good, largely due to the frequent use of canonically constructed titles in the literature: "X for the treatment of Y", where X is a compound-related term and Y is a disease-related term.
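
    As a rough illustration of the title-based approach, here is a minimal Python sketch that splits a canonically constructed title into its compound and disease parts. The regular expression, the function name `split_title` and the first example title are assumptions made up for illustration; they are not part of any ChEMBL tooling, and real titles use many more phrasings than this one pattern covers.

```python
import re

# Pattern for the canonical "X for the treatment of Y" title form.
# Only this single phrasing is handled; that restriction is an assumption
# made to keep the sketch short.
TITLE_PATTERN = re.compile(
    r"^(?P<compound>.+?)\s+for\s+the\s+treatment\s+of\s+(?P<disease>.+?)\.?$",
    re.IGNORECASE,
)


def split_title(title):
    """Return (compound_term, disease_term), or None if the title does not match."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    return match.group("compound"), match.group("disease")


# Hypothetical title, invented for this example.
print(split_title("Novel quinoline derivatives for the treatment of malaria"))

# The 5-HT1A title quoted in this post does not follow the canonical form,
# so it falls through and would need a different strategy.
print(split_title(
    "New Serotonin 5-HT1A Receptor Agonists with Neuroprotective Effect "
    "against Ischemic Cell Damage"
))
```

    Titles that return None are the ones that still need a synonym list or manual curation.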

    So it sounds like a simple problem: take the titles, tag them up with disease terms and away you go. However, this is where it gets complicated. There is not a good taxonomy/ontology for diseases, or at least not one that maps back into discovery space well. You can get quite a way with a list of synonyms for various diseases, but the world needs a common public dictionary/vocabulary for disease terms. To balance this lack of a fantastic existing standard, there is the ATC classification for drugs. It is a little inconsistent in the way it mixes chemotypes, pathways and targets, but it is stable, accepted and robust, and all new approved drugs will be placed into this taxonomy, so let's see what ATC-style tagging can achieve.
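
    The synonym-list idea can be sketched in a few lines of Python. The dictionary below is a tiny hand-made example and purely an assumption for illustration; it is not a curated vocabulary, and the name `tag_title` is hypothetical.

```python
import re

# Hypothetical synonym lists, invented for this sketch. A real dictionary
# would be far larger and would need careful curation.
DISEASE_SYNONYMS = {
    "malaria": ["malaria", "antimalarial", "plasmodium"],
    "tuberculosis": ["tuberculosis", "antitubercular"],
    "stroke": ["stroke", "ischemic", "ischaemic"],
}


def tag_title(title, synonyms=DISEASE_SYNONYMS):
    """Return the sorted list of disease labels whose synonyms occur in the title."""
    lowered = title.lower()
    tags = set()
    for disease, terms in synonyms.items():
        for term in terms:
            # Whole-word match, so a term does not fire inside a longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                tags.add(disease)
                break
    return sorted(tags)


print(tag_title(
    "New Serotonin 5-HT1A Receptor Agonists with Neuroprotective Effect "
    "against Ischemic Cell Damage"
))
```

    With this toy dictionary the 5-HT1A title picks up the 'stroke' label through the word 'Ischemic', which also shows how much the result depends on whoever wrote the synonym list.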

    What we are specifically trying is to take neglected diseases (defined here as tuberculosis, helminth infections, schistosomiasis, trypanosomiasis, HIV/AIDS and malaria), and manually tag up the assays in ChEMBL with the corresponding ATC codes, at the 'depth' supported by the title. For example, if malaria compounds are artemisinin-based, then the assay can be placed at a deeper level (P01BE) than one from a more general malaria-targeted approach (P01B). Of course, it's then possible to do cool things that group data across the span of the ATC classification.
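
    To make the 'depth' idea concrete, here is a small Python sketch of how tags at different ATC levels could be rolled up. It relies on the standard ATC code layout (codes of 1, 3, 4, 5 and 7 characters for levels one to five). The assay identifiers and the functions `atc_ancestors` and `group_by_level` are hypothetical, and this is not how the tagging is stored in ChEMBL.

```python
from collections import defaultdict

# Number of characters in an ATC code at levels 1 to 5.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return the code truncated at each ATC level it reaches, shallowest first."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"not a valid ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def group_by_level(assay_tags, level_length):
    """Group assay ids by their ATC code truncated to level_length characters.

    Assays tagged at a shallower depth than requested are skipped.
    """
    groups = defaultdict(list)
    for assay_id, code in assay_tags.items():
        if len(code) >= level_length:
            groups[code[:level_length]].append(assay_id)
    return dict(groups)


# Hypothetical assay ids, each tagged at the depth its title supports.
assay_tags = {"assay_1": "P01BE", "assay_2": "P01B", "assay_3": "J04A"}

print(atc_ancestors("P01BE"))
print(group_by_level(assay_tags, 4))
```

    Grouping at four characters puts the artemisinin-based assay (P01BE) and the general malaria assay (P01B) in the same bucket, while grouping at five characters would keep only the artemisinin one.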

    We'll let you know how we get on!

  • Deadline Approaching for Current Recruitment in ChEMBL


    The deadline is approaching for two posts in the ChEMBL group: one a web developer, and the other a data integration position. The posts are three-year fixed-term contracts. The closing date for applications is the 27th November 2011.

    Further details should be available here.
    If you have any questions, please feel free to contact us.

  • Recommendations for a MySQL Chemical Data Cartridge?


    What options are there for a MySQL Chemical Structure Cartridge? The constraint is that the license needs to be Open (to commercial and non-commercial users). Post away in the comments, then everyone can see the answers.

    Update: for a little background on our specific interests, we wish to build a deployable and distributable version (a package or VM) with a preconfigured and loaded current ChEMBL database, capable of full chemical searching. Deployment could be as a Linux-style package, or as an Amazon EC2 instance. Our internal systems here at the EBI are based on Oracle and the MDL (or whatever the current name is :) ) Direct cartridge. This configuration is beyond the reach of many budgets, and so we are interested in exploring a 'free' but useful version of ChEMBL.

    Update 2: So Postgres opens up quite a few more options....

  • Movember Donations from Outside the United Kingdom




    In response to a question from one of the ChEMBL-og readers: I've just checked, and it is possible for non-UK residents to donate to the EBI Movember team, The Bioinformoustachians. Link for donations is above.

  • USAN Watch - November 2011

    The USANs for November 2011 have just been published.

    USAN                    Research Code        Drug Class                        Therapeutic class  Target
    bupivacaine             SKY-0402             synthetic small molecule          therapeutic        sodium channels
    condoliase              SI-6603              enzyme                            therapeutic
    ixazomib                MLN-2238             synthetic small molecule          therapeutic        proteasome
    ixazomib citrate        MLN-9708             synthetic small molecule prodrug  therapeutic        proteasome
    panobinostat            LBH-589              synthetic small molecule          therapeutic        HDACs
    tivantinib              ARQ-197              synthetic small molecule          therapeutic        MET
    trelagliptin succinate  SYR-472, SYR-111472  synthetic small molecule          therapeutic        DPP-IV


    Bupivacaine is a really old compound, with Bupivacaine Hydrochloride assigned a USAN in 1967; I guess it is on the list now as the non-salt form.