• EIPOD postdocs 2012


    Just a reminder of the opening of applications for the EMBL postdoc scheme - EIPOD for 2012. Get in touch if you're interested in our draft project.

  • Public Popular Chemistry Databases and Licensing


    Licensing of data and copyright is a complex thing, and it always gets people hot under the collar! Some time ago, following consultation with our funders, we settled on a CC-BY-SA license for ChEMBL - this does a couple of things, but primarily it places an explicit license on the data so it is clear what you can do with it. There is a lot of hot air and active discussion over how 'public' and 'open' particular licenses are, but the CC-BY-SA 3.0 license made sense to us (for reference, this license is also used by the world's premier open resource, Wikipedia - here is their license).

    We apply this license to each release of the database, and it makes the data freely available and usable. It requires attribution, so that users of derivative works know where the data comes from and can identify the funders and producers of the work - which we think is fair and appropriate. Finally, it applies 'share-alike', so that if you distribute the ChEMBL data further you shouldn't restrict the rights of your users to further give away, use, remix, etc. the data. To be clear though, it does not preclude commercial use of the data.

    The Share Alike clause is the thing that people usually get excited by, but this is meant to ensure that if you distribute the data to others you make sure that you also give those you distribute it to the same right to redistribute. If you have a significant problem with this, then don't redistribute any data you get under a SA license; if you find it difficult to keep track of the provenance of data entering your systems, should you really be building stuff to distribute anyway? ;)

    Chemistry is a relatively odd world compared to bioinformatics, in that users are generally worried about inadvertent disclosure of their ideas and queries - the basis of the concern is that running a search over the Internet, over an open network, can amount to disclosure and 'publication', and could in theory void a patent through prior art disclosure. As you are probably aware, it is easy to monitor and record traffic over public networks, recovering passwords, etc. Secondly, there is a general suspicion over what happens to usage recorded on the servers - do the providers of the web service mine the queries? sell them on? and other sorts of paranoid stuff. Well, maybe it is not as paranoid as you may think, since several large internet sites explicitly state in their Terms and Conditions that your query becomes their intellectual property. Sloppy programming, in particular with advertiser sites, can disclose a whole bunch of query data to advertisers.

    One of the ways of dealing with the former issue is to provide access over the https: protocol. This encrypts the traffic between your client browser and the server, and also prevents impersonation of the server by another machine (to most reasonable intents and purposes this is still true). The same secure http: protocol can be applied to make programmatic web services secure too.
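
    For programmatic clients, the two guarantees mentioned above (encryption and protection against server impersonation) can be sketched with Python's standard library. This is only an illustration, not how any of the resources below are actually accessed; the helper name make_verified_context is made up for this example.

```python
import ssl


def make_verified_context() -> ssl.SSLContext:
    """Return a TLS context that checks the server certificate and hostname."""
    context = ssl.create_default_context()
    # Both settings are already the defaults of create_default_context();
    # they are repeated here only to make the two guarantees explicit.
    context.check_hostname = True            # the certificate must match the host name
    context.verify_mode = ssl.CERT_REQUIRED  # the certificate must be valid and trusted
    return context


context = make_verified_context()
```

    Passing this context to urllib.request.urlopen(url, context=context) means that a server presenting an expired or mismatched certificate is rejected with an error rather than silently trusted.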

    There are a number of large 'free'/Open chemical databases at the moment, and we drew up a little table the other day comparing the license and access. It's not complete, probably contains some errors, so if anything is wrong, please let me know. If there are other Open resources to add, put something in the comments, and I'll add it to the table.

    Update - thanks to Richard of the RSC, I've corrected some ambiguities in the original table.

    Resource | URL (http: protocol form) | License | Download | https:
    ChEMBL | http://www.ebi.ac.uk/chembl | CC-BY-SA | yes | yes
    BindingDB | http://www.bindingdb.org/ | CC-BY-SA | yes | no
    PubChem | http://pubchem.ncbi.nlm.nih.gov/ | No clear license statement | yes | yes
    ZINC | http://zinc.docking.org/ | No redistribution of significant subsets without permission | yes | yes, but certificate expired
    ChemSpider | http://www.chemspider.com/ | No download, and limited to a total 5,000 data entries stored locally. API access free to academic users, others by agreement. | no | yes, but broken

  • Postdoc project in in silico/biochemical target prediction



    We have an interdisciplinary postdoc project available as part of EMBL's EIPOD program (details here). The project is with Matthias Wilmanns, based at EMBL Hamburg, and the appointee will spend time in both labs in a combined computational and experimental project aimed at discovering the mode of action of high-throughput screening hits from an anti-tuberculosis assay.

    Further details of the project are available here. This is deliberately brief, and candidates are meant to flesh out the project design as part of the application process.

    The deadline for applications is 5pm CEST 13th September 2012.

  • New Drug Approvals 2012 - Pt. XIV - Mirabegron (Myrbetriq™)


    ATC Code: G04BD (incomplete)
    Wikipedia: Mirabegron


    On June 28 2012, the FDA approved Mirabegron (tradename: Myrbetriq; Research Code: YM-178), a novel, first-in-class selective β3-adrenergic receptor agonist indicated for the treatment of overactive bladder (OAB) with symptoms of urge urinary incontinence, urgency, and urinary frequency. OAB syndrome is a urological condition defined as urinary urgency, usually accompanied by frequency and nocturia, with or without urge urinary incontinence, in the absence of urinary tract infection or other obvious pathology. Mirabegron acts by relaxing the detrusor smooth muscle during the storage phase of the urinary bladder fill-void cycle through activation of the β3-receptor, which in turn increases bladder capacity.

    Other treatments for OAB are already on the market and these include treatments with antimuscarinic drugs, such as Flavoxate (approved in 1970; tradename: Urispas; ChEMBL: CHEMBL1493), Oxybutynin (approved in 1975; tradenames: Ditropan, Ditropan XL, Oxytrol, Gelnique, Anturol; ChEMBL: CHEMBL1231), Tolterodine (approved in 1998; tradenames: Detrol, Detrol LA; ChEMBL: CHEMBL1382), Trospium (approved in 2004; tradenames: Sanctura, Sanctura XR; ChEMBL: CHEMBL1201344), Solifenacin (approved in 2004; tradename: Vesicare; ChEMBL: CHEMBL1200803), Darifenacin (approved in 2004; tradename: Enablex; ChEMBL: CHEMBL1346) and Fesoterodine (approved in 2008; tradename: Toviaz; ChEMBL: CHEMBL1201764). While these drugs act by inhibiting the muscarinic action of acetylcholine, Mirabegron represents the first β3-receptor agonist to ever reach the market.

    The β3-receptor (ChEMBL: CHEMBL246; Uniprot: P13945) is a 408 amino-acid long G protein-coupled receptor (GPCR), belonging to the Rhodopsin family (PFAM: PF00001; subfamily A17). Crystal structures of the closely related β1- and β2-receptors are known and act as good frameworks for understanding the mode of action of Mirabegron.

    >ADRB3_HUMAN Beta-3 adrenergic receptor
    MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIV
    AIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLC
    VTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRV
    GADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
    ELGRFPPEESPPAPSRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGT
    FTLCWLPFFLANVLRALGGPSLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFRRLL
    CRCGRRLPPEPCAAARPALFPSGVPAARSSPAQPRLCQRLDGASWGVS
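
    As a small sanity check on the sequence above, here is a minimal sketch of reading a single-record FASTA entry and counting its residues. The function name parse_fasta is just for this illustration; a real pipeline would more likely use an established parser.

```python
def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # join the wrapped sequence lines
    return header, sequence


FASTA = """>ADRB3_HUMAN Beta-3 adrenergic receptor
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIV
AIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLC
VTASIETLCALAVDRYLAVTNPLRYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRV
GADAEAQRCHSNPRCCAFASNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQLRLLRG
ELGRFPPEESPPAPSRSLAPAPVGTCAPPEGVPACGRRPARLLPLREHRALCTLGLIMGT
FTLCWLPFFLANVLRALGGPSLVPGPAFLALNWLGYANSAFNPLIYCRSPDFRSAFRRLL
CRCGRRLPPEPCAAARPALFPSGVPAARSSPAQPRLCQRLDGASWGVS
"""

header, sequence = parse_fasta(FASTA)
residue_count = len(sequence)
```

    Six full lines of 60 residues plus a final line of 48 give a residue_count that agrees with the 408 amino acids quoted for the receptor.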


    Mirabegron is a synthetic chiral small-molecule, with a molecular weight of 396.51 Da, an AlogP of 2.26, 4 hydrogen bond donors and 5 hydrogen bond acceptors, and is thus fully rule-of-five compliant. (IUPAC: 2-(2-amino-1,3-thiazol-4-yl)-N-[4-[2-[[(2R)-2-hydroxy-2-phenylethyl]amino]ethyl]phenyl]acetamide; Canonical Smiles: C1=CC=C(C=C1)[C@H](CNCCC2=CC=C(C=C2)NC(=O)CC3=CSC(=N3)N)O; InChI: InChI=1S/C21H24N4O2S/c22-21-25-18(14-28-21)12-20(27)24-17-8-6-15(7-9-17)10-11-23-13-19(26)16-4-2-1-3-5-16/h1-9,14,19,23,26H,10-13H2,(H2,22,25)(H,24,27)/t19-/m0/s1)
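
    The rule-of-five statement can be checked directly from the numbers quoted above. The sketch below is illustrative only: the class and function names are invented for this example, AlogP stands in for logP as it does in the text, and the atomic masses are rounded standard values that I am assuming here. The formula C21H24N4O2S is read off the InChI.

```python
from dataclasses import dataclass

# Rounded standard atomic masses (assumed values for this illustration)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


@dataclass
class Descriptors:
    mol_weight: float
    alogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.alogp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def formula_weight(atom_counts: dict) -> float:
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * n for element, n in atom_counts.items())


# Descriptor values as quoted in the text
mirabegron = Descriptors(mol_weight=396.51, alogp=2.26,
                         h_bond_donors=4, h_bond_acceptors=5)
violations = rule_of_five_violations(mirabegron)

# Formula taken from the InChI: C21H24N4O2S
weight = formula_weight({"C": 21, "H": 24, "N": 4, "O": 2, "S": 1})
```

    With these inputs no limit is exceeded, and the formula weight lands within rounding of the quoted 396.51 Da.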

    The recommended starting dosage of Mirabegron is 25 mg once daily, with or without food, and is effective within 8 weeks. Depending on individual patient efficacy and tolerability, the dose may be increased to 50 mg once daily.

    Mirabegron has a bioavailability of 29% at a dose of 25 mg, which increases to 35% at a dose of 50 mg, a volume of distribution (Vd) of approximately 1670 L and a moderate plasma protein binding of ca. 71%. Mirabegron is metabolized via multiple pathways involving dealkylation, oxidation, glucuronidation and amide hydrolysis. Studies have suggested that although CYP3A4 and CYP2D6 isoenzymes play a role in the oxidative metabolism of Mirabegron, this is a limited role in the overall elimination. In addition to these isoenzymes, the metabolism of Mirabegron may also involve butyrylcholinesterase, uridine diphospho-glucuronosyltransferases and alcohol dehydrogenase. Two major inactive metabolites were observed in human plasma and these represent 16% and 11% of the total exposure. Mirabegron total clearance (CLtot) from plasma is ca. 57 L/h, with a terminal half-life of approximately 50 hours. Renal clearance (CLR) is approximately 13 L/h, which corresponds to nearly 25% of CLtot. The urinary elimination of unchanged Mirabegron is dose-dependent and ranges from ca. 6% after a daily dose of 25 mg to 12.2% after a daily dose of 100 mg.
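
    As a quick check on the clearance figures, the renal share is simply renal clearance divided by total clearance. The variable names below are illustrative, and the inputs are the rounded values quoted above, so the result is only approximate.

```python
# Rounded values as quoted in the text
total_clearance_l_per_h = 57.0   # CLtot
renal_clearance_l_per_h = 13.0   # CLR

# Fraction of total plasma clearance accounted for by the kidneys
renal_fraction = renal_clearance_l_per_h / total_clearance_l_per_h
renal_percent = round(renal_fraction * 100, 1)
```

    With these rounded inputs the renal share comes out at roughly 23%, which is in line with the "nearly 25%" stated above.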

    The license holder is Astellas Pharma Inc. and the full prescribing information of Mirabegron can be found here.

  • ChEMBL Is Alive! Part 1 - posted by Louisa


    The 'ChEMBL Is Alive' series is meant to show that ChEMBLdb is a living database that is constantly being worked on by a number of people. As the Chemical Curator for ChEMBL, I (Louisa Bellis) thought it would be interesting for our Blog readers to find out what goes on behind the scenes at 'ChEMBL Towers' and to get regular updates on what we are doing to the data between releases and in response to user emails sent to chembl-help@ebi.ac.uk.

    As well as being the chemical curator, I also deal with most of the help-desk traffic, where users can email in and let us know of any errors that they may have found, or even to suggest an improvement or enhancement for the interface.

    As an example of the work that is done to ChEMBL on an ongoing basis, I thought it would be good to give a brief summary of some of the chemical curation that occurred during the month of June 2012:

    An external user pointed out to me that they had come across a 'few' compounds that had the same canonical SMILES string, but had different standard InChI strings. I created a spreadsheet of these duplicate SMILES, which came to a whopping 967 lines. Of these, just over 100 lines were due to E/Z isomerism, some needed to be merged for being incorrect and the rest were checked individually to see why the SMILES were the same. It turned out that there was an issue with the molfiles so each of these compounds was redrawn from scratch. This came to 1,112 compound redraws in all which will be loaded into ChEMBL as soon as possible and will be visible to external users in the ChEMBL_15 release (expected end of November 2012).
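
    The kind of check that produces such a list can be sketched as a simple grouping: collect compounds by SMILES and flag any SMILES that maps to more than one distinct InChI. This is not the actual curation workflow; the function name, the record layout and the EXAMPLE-n identifiers are all made up, and the toy records use the E and Z forms of but-2-ene written without stereo in the SMILES.

```python
from collections import defaultdict


def smiles_with_conflicting_inchi(records):
    """Return {smiles: [(compound_id, inchi), ...]} for every SMILES that
    is shared by compounds with more than one distinct InChI."""
    by_smiles = defaultdict(list)
    for compound_id, smiles, inchi in records:
        by_smiles[smiles].append((compound_id, inchi))
    return {
        smiles: entries
        for smiles, entries in by_smiles.items()
        if len({inchi for _, inchi in entries}) > 1
    }


# Hypothetical records: (compound id, SMILES, standard InChI)
records = [
    ("EXAMPLE-1", "CC=CC", "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+"),
    ("EXAMPLE-2", "CC=CC", "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3-"),
    ("EXAMPLE-3", "CCO", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
]

conflicts = smiles_with_conflicting_inchi(records)
```

    Here only the stereo-less but-2-ene SMILES is flagged, which mirrors the E/Z cases described above: the InChI strings carry the double-bond geometry that the SMILES has lost.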

    I also started working on a list of duplicate names in the ChEMBL database. This was to support my own work flow and not suggested by our users - it created a list of 9,952 duplicate names. However, not all duplicate names are actual duplicates that need to be merged together; two compounds can share the same simple chemical name because the name does not reflect that they are enantiomers of each other. This work is still ongoing, but I have been able to redraw and merge about 100 compounds as a direct result of this list. I am only about 10% of the way through this spreadsheet, so I can say that it will keep me busy for a little while yet.

    In June, we also received two emails from users to let us know that they had found what they believed were errors. In one case, the units had been incorrectly extracted from the paper as nM, when they were in fact uM. Upon checking the paper, I could see where the confusion had arisen. I could see that it had one table where they displayed uM and all the rest of the tables were nM, so the extractor had not seen this difference. These have now been fixed and will be visible in ChEMBL_14 (due for release end of July 2012).
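
    To see why this kind of units slip matters, here is a tiny sketch of the conversion involved. The helper name and the reported value of 5.0 are hypothetical and not taken from the paper in question.

```python
# Conversion factors to nanomolar
NM_PER_UNIT = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration to nM; raises KeyError for unknown units."""
    return value * NM_PER_UNIT[unit]


reported_value = 5.0                                  # hypothetical table entry
as_extracted = to_nanomolar(reported_value, "nM")     # units read as nM
as_published = to_nanomolar(reported_value, "uM")     # units actually uM
fold_error = as_published / as_extracted
```

    Reading uM as nM makes a compound look a thousand times more potent than the paper reports, which is why fixes like this are worth making quickly.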

    The other email I'll mention here was to do with target assignment, where we had assigned a target to some data, and the user had read the paper and believed that the data was incorrectly assigned. This is still being checked by our biological curator, but if found to be incorrect, will be changed immediately in the database.

    These are both great examples of users helping us to improve the quality of data in ChEMBL.

    I hope to add more curation information in the future, but if there is anything specific that you would like to see me blog about (relating to curation or error checking) then please let me know.

  • Access to ChEMBL web services via workflow tools



    As some of you may know, besides the ChEMBL web interface and the SQL dumps, there is another way to access and retrieve data from your favourite public database, namely the RESTful web services. We have already provided API examples there using Java, Python and Perl but, as of today, we also provide examples for popular pipelining / workflow tools, such as Pipeline Pilot and KNIME.


    The user base of such tools is certainly growing, as they offer modularity, flexibility, transparency and higher integration and sharing capabilities compared to programming or standard software packages. In fact, demand for tighter integration between ChEMBL and these tools was one of the outcomes of our recent Workflows workshop. Indeed, using the web services via a workflow tool is a seamless way to search, retrieve, integrate and analyse data without having to install, maintain and update local databases or write SQL queries, which some people dread.


    We have posted some useful examples on the Pipeline Pilot and KNIME community fora, which are available to download.
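
    For readers who prefer a script to a workflow tool, the same kind of RESTful lookup can be sketched with Python's standard library. This is not one of the posted examples: the base URL and the /molecule/<id>.json layout are assumptions based on the present-day ChEMBL API and may not match the services as they were at the time of this post, and both function names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL; check the ChEMBL web services documentation before relying on it
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_molecule_url(chembl_id: str) -> str:
    """Build the (assumed) JSON endpoint URL for one compound."""
    if not (chembl_id.startswith("CHEMBL") and chembl_id[6:].isdigit()):
        raise ValueError(f"not a ChEMBL identifier: {chembl_id!r}")
    return f"{BASE_URL}/molecule/{chembl_id}.json"


def fetch_molecule(chembl_id: str, timeout: float = 30.0) -> dict:
    """Fetch one compound record over https and decode the JSON body."""
    url = build_molecule_url(chembl_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

    Calling fetch_molecule("CHEMBL1231") would then request the record for that identifier; nothing is fetched until the function is called, so the URL can be inspected first with build_molecule_url.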

    If there is interest expressed in the comments below, we can also provide a webinar, say on Tuesday 31 July 2012 at 3pm.

  • Seminar: Discovery of Viagra/Revatio (aka sildenafil aka UK-92480)






    A reminder of an open, on-campus seminar from Andy Bell (now at Imperial College), detailing the discovery and development of UK-92,480 (also known as sildenafil and even better known as V1agra and R3vat10). Andy was one of the medicinal chemists and inventors on the PDE-5 inhibitor programme at Pfizer, and the story covers many aspects of drug discovery including, of course, the discovery of the side effect, and also one where the pharmacology led to many new molecular insights into NO signalling and PDE biology.

    There are many myths about the discovery of V1agra, so this is a rare opportunity to hear the exciting story first-hand.


    Here's Andy's abstract....


    "Viagra™ (sildenafil) is a unique example of a chemical tool being used to
    discover the linkage between a biological mechanism and a disease through
    clinical trials. The presentation will describe the discovery of sildenafil
    and its use in defining the role of cGMP phosphodiesterases (particularly
    PDE5) in human diseases such as Male Erectile Dysfunction (MED) and Pulmonary
    Arterial Hypertension (PAH). These clinical studies, combined with the
    discovery of additional PDE isoforms, were used to define a desirable profile
    for subsequent 2nd generation PDE5 inhibitors. The impact of structural
    biology and high throughput screening on the discovery of further clinical
    candidates will also be discussed."



    The seminar is on Tuesday July 17th 2012 at 2pm in room M203 (room change alert for those who put it in their diary earlier) - you will need to mail me in order to get registered with campus security if you don't work on campus. If you do, you can just turn up.


    Andy will also be giving a more detailed technical seminar on screening file analysis, diversity, chemical space, etc - again, let me know if you are interested in attending this too......

  • The Winner of the 900th Follower of @Chembl Competition!



    As Frankie Howerd used to say as one of his catchphrases - A funny thing happened on the way to the forum....

    We ran a twitter competition for the 900th follower of the @chembl twitter feed. Nothing too crazy, just a few sticks of the fabled ChEMBL rock as a prize (I need to get rid of these, since the 'eat by' date is pretty close. If the attendees of our Computational Approaches to Drug Discovery Course, being held next week, return home with any teeth, it will be a marvel).

    I am pretty disciplined in pruning spambots from our twitter followers, and if I had not done this, we probably would have been well over a thousand by now - so getting to 900 'real' people was a pretty big event for us.

    Anyway, I listed the competition online, and the twitterverse was then on tenterhooks - in reality, this sort of thing should slow down followers following, (if the followers are aware, or care), since if you see there are, say, 894 followers, and you really want the stick of rock; then wait - and we did see a slow down in accumulation of followers. Here is the announcement of the competition.....


    I really did see Noel Gallagher at Heathrow, this was pretty cool - he saw a fat guy in front of him in security, struggling with his belt and shoes - I don't think he tweeted that fact though. I'm drifting, so back on topic, in the twitter stream above is the announcement of our competition.

    I regularly monitored the followers as we got closer to 900, and one morning, while I was in Andover NH, I saw someone had won the prize!


    It was only Grant Shapps, a proper Minister of State and full Cabinet Member in the current UK Government. Wow! I imagine, Grant must have been watching, waiting for some fool to become follower 899, then pouncing to claim the honour. All I can say is, "congratulations" to Grant, and the prize is in the post - but when you feel the envelope it does feel a little odd - a little bit like a st*ck of d*nam*te; I'm sure they x-ray everything to the Houses of Parliament, so I hope it doesn't go missing in the post room! Also make sure Willetts or Cable see you in The House with our sweetie!


    We clearly will have something very special planned for the competition of the 1,000th @chembl follower - Lady Gaga and Barack Obama, are you ready?