ChEMBL blog

Amgen Scholars Program
06 Dec 2011

Due to a lot of wasted time in the past, my lab does not host or support applications to the Amgen Scholars Program, so please do not apply to us!
Papers: MIRIAM and Identifiers.org
06 Dec 2011

The NAR Database Issue is currently in full flow, and there are many excellent articles; an important one for ChEMBL is this paper from the group of our good friend Nicolas Le Novere, at the EBI. It addresses a really important problem in biological and chemical data integration: generating unique and stable identifiers for records in a data collection. These are MIRIAM identifiers (MIRIAM stands for Minimum Information Required in the Annotation of Models; the MIRIAM Registry is at http://www.ebi.ac.uk/miriam). Identifiers.org is a new service (http://identifiers.org) that is built upon the information stored in the MIRIAM Registry and provides directly resolvable identifiers in the form of Uniform Resource Locators (URLs). Resources such as this are essential components for ad hoc, distributed queries across disparate data sources, and a core component for semantic web development.

A link to the free pdf of the paper is here.

ChEMBL is in identifiers.org, give it a go!
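
For anyone who wants to try it from a script, here is a minimal Python sketch of how such a resolvable URL can be put together. It assumes the `http://identifiers.org/<namespace>/<accession>` pattern, and the `chembl.compound` namespace and the `CHEMBL25` accession are used purely as illustrative values; the function name is hypothetical, and the MIRIAM Registry remains the place to check the exact namespace for any given collection.

```python
from urllib.parse import quote

# Base address of the resolver, as given in the post.
IDENTIFIERS_ORG = "http://identifiers.org"


def identifiers_org_url(namespace: str, accession: str) -> str:
    """Join a registry namespace and a record accession into one URL.

    Assumption: URLs follow the <base>/<namespace>/<accession> pattern.
    """
    if not namespace or not accession:
        raise ValueError("namespace and accession must both be non-empty")
    # Percent-encode each part so unusual characters cannot break the path.
    return f"{IDENTIFIERS_ORG}/{quote(namespace, safe='')}/{quote(accession, safe='')}"


if __name__ == "__main__":
    # 'chembl.compound' and 'CHEMBL25' are illustrative values only.
    print(identifiers_org_url("chembl.compound", "CHEMBL25"))
```

Pasting the printed URL into a browser is the quickest way to check whether a given namespace and accession actually resolve.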
```
%J Nucleic Acids Research
%D 2011
%O doi:10.1093/nar/gkr1097
%T Identifiers.org and MIRIAM Registry: community resources to provide persistent identification
%A N. Juty
%A N. Le Novere
%A C. Laibe
```
MIPTEC 2012 and EMBO Chemical Biology 2012
05 Dec 2011

A quick post to say that two of the best conferences in Europe next year are coordinating speakers and schedules for the Drug Discovery sessions to allow a really great line-up of talks and speakers, and to allow the best use of your (always too small!) travel budgets for conference attendance in these economically tough times. More later, but pencil both conferences into your diary now!

MIPTEC 2012 - September 24th to 27th 2012, Basel, Switzerland
EMBO Chemical Biology 2012 - September 26th to 29th 2012, Heidelberg, Germany
Cape Town
03 Dec 2011

I've just spent a great week in Cape Town, at UCT, visiting the lab of Kelly Chibale, where there's lots of activity in academic drug discovery, and also at the Institute of Infectious Disease and Molecular Medicine. My first time in Africa, and it won't be my last!
3,788 Thank yous!
02 Dec 2011

So a big and heart-felt three thousand, seven hundred and eighty eight 'Thank Yous' to all the benevolent donors to the EBI Movember Team - The Bioinformoustachians. It was a lot of fun, the place is a lot hairier now, and there are more smiles on faces than at the start of the month.

Mission accomplished by our International (hair)peace keeping force - The Victoria and George Crosses (respectively) are awarded to Francesco Iorio (sgt. 8th Italian light foot) and Remco Loos (cpl. 4th Dutch commando) for their epic struggle and hand-to-hand combat for first place in the fund-raising league table, and a 'mention in despatches' is presented to Nicolas Le Novére (lt. 1st French prancers) for being the 'Top Gun' amongst the faculty. The platoon is now returning home, and will soon be changing back into 'civvies' (much to the relief of their partners and family).

If any other large science data centres want to make a fight of it for Movember 2012 - bring it on, we are waiting!

PS, it's still not too late to donate (up to December 8th 2011) at http://mobro.co/ebi
ChEMBL 12 Released
01 Dec 2011
We are pleased to announce the release of ChEMBL_12. This latest version of the ChEMBL database contains:
- 1,222,969 compound records
- 1,077,189 distinct compounds
- 596,122 assays
- 5,654,847 bioactivities
- 8,703 targets
- 43,418 documents
- 7 data sources
This release includes updates to the manually extracted Medicinal Chemistry literature, a number of published ADMET datasets (for metabolic enzymes and various transporters), updates to Orange Book drug approvals and a partial update from PubChem BioAssay. ChEMBL_12 also contains a newly deposited dataset of malaria liver-stage screening data from Novartis-GNF (for more details on this dataset, please see the ChEMBL-NTD website and the recent publication in Science).

Please refer to the ChEMBL_12 release notes for a more detailed description of all changes included in this release.

You can download the data from the ChEMBL FTP site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/
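
If you would rather script the download than click through, a rough Python sketch for listing what is in that directory follows. It uses only the standard library, and it assumes the server accepts anonymous FTP logins; the function names are hypothetical and the file names in the release directory are not assumed here.

```python
from ftplib import FTP
from urllib.parse import urlparse

# The release directory quoted in the post.
CHEMBL_FTP_URL = "ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/"


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_release_files(url: str = CHEMBL_FTP_URL) -> list[str]:
    """Return the names found in the release directory.

    Assumption: the server allows anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=30) as ftp:
        ftp.login()      # no arguments means an anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_release_files()` needs network access; an individual file can then be fetched with `ftp.retrbinary` in the same way.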
Paper: Drug Repurposing from an Academic Perspective
28 Nov 2011

There is a great paper in Drug Discovery Today: Therapeutic Strategies reviewing experience from a large number of drug repurposing experiments performed at one institute - the University of New Mexico. There is a lot of interest in the potential of drug repurposing, and many papers discuss theoretical approaches, IP and so on, but relatively few present real-world, worked examples. The well-known examples tend to be the successful ones, and have now become almost clichés - Sildenafil and Thalidomide, for example. Discussion of the challenges encountered is far less common. This paper covers a portfolio of projects identified by a mix of empirical broad screening, virtual screening and hypothesis-led research; it summarises the status and progress of each, and then presents some conclusions, the most valuable and sobering of which concern the limiting factors.

A further point the paper brings out is the sheer complexity (and cost) of this sort of work if tackled seriously with a genuine translational science approach, as opposed to simply proposing a list of hits from an in vitro/virtual screen for testing.

For those of you dying to know the 'punchlines' on the current challenges (as one might expect, data availability is a major component behind all of these), these are:
- Dosing and safety
- Lack of integration with pharmaceutical sciences and toxicology
- Appropriate intellectual property coverage
```
%T Drug repurposing from an academic perspective
%J Drug Discovery Today: Therapeutic Strategies
%A T.I. Oprea
%A J.E. Bauman
%A C.G. Bologa
%A T. Buranda
%A A. Chigaev
%A B.S. Edwards 
%A J.W. Jarvik
%A H.D. Gresham
%A M.K. Haynes 
%A B. Hjelle 
%A R. Hromas
%A L. Hudson
%A D.A. Mackenzie
%A C.Y. Muller
%A J.C. Reed
%A P.C. Simons
%A Y. Smagley
%A J. Strouse
%A Z. Surviladze
%A T. Thompson
%A O. Ursu 
%A A. Waller
%A A. Wandinger-Ness
%A S.S. Winter
%A Y. Wu
%A S.M. Young
%A R.S. Larson
%A C. Willman
%A L.A. Sklar
%O doi:10.1016/j.ddstr.2011.10.002
```
Google Citation Wordcloud
27 Nov 2011

I am really in two minds over the importance and relevance of citation statistics - of course there are always great reasons why your citations are lower than they should really be, and why your peers' citations are higher than they should be. However, they are a necessary evil as an academic, and human psychology draws us in, like a single-minded, light-obsessed moth to a candle flame, to comparing ourselves against each other in various ways. I used to use Thomson Reuters' Web of Science for this, but the tools to manage a personal list are odd and clunky, and the system seems to have recently been ported to an old 80286 with 4MB of RAM (the spec of my first personal Unix machine, with a built-like-a-tank American Megatrends Inc motherboard. Ahhh, good times with "Uncle Mark" and "Spock The Vulcan"). To counter this frustration, Google Scholar Citations has recently gone public, and it is great: easy tools to validate and merge publications, and the ability to have private or public views on the data. The only thing that seems snafu'd is adding a link to a home page (it just replies that I need to add a valid link). I won't post a link to my citation page (it is public though), since my citations are, between you and me, shameful, as I wasn't able to publish the best stuff while I was in industry (damn, there I go again!).....

Anyway, Google Scholar Citations can be programmed against, and Google Plus alerted me to an R script that can draw a word cloud for co-authors and key-words for a given author.

The R source is here, but you may need to install some other packages to get it to run. The image above is the word cloud for all of my publishing career - all 23 years of it. It's interesting to see the protein structure and comparative modelling stuff as the strongest concepts, but this has also been the basis of a lot of the more recent work, including the database stuff. At the end of the day, for me, 3-D structure is the most satisfying and stimulating way to think about ligand data and design.
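
For the curious, the counting step that sits behind any word cloud is simple enough to sketch in a few lines. The Python below is not the R script mentioned above; it is only a rough illustration that assumes you already have a list of publication titles as plain strings, and the stop-word list and function name are hypothetical choices of mine.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list; a real one would be longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def term_weights(titles, top_n=50):
    """Count how often each word occurs across a list of titles.

    Returns (word, count) pairs, most frequent first; a word cloud
    would then scale the font size of each word by its count.
    """
    counts = Counter()
    for title in titles:
        # Lower-case, then pull out alphanumeric words (hyphens allowed inside).
        words = re.findall(r"[a-z0-9][a-z0-9-]*", title.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)
```

Feeding it co-author names instead of titles, one name per string and without the stop-word filter, would give the co-author version of the cloud.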

I must admit I didn't know that R did word clouds, so expect a few more gibberish-filled blog posts while I play with this!

PS If this post sounds like a eulogy and advert for Google, it most definitely isn't, since they recently decided to shut down their University Research Programme to Google Search :( boo. hiss.