ChEMBL blog

Integration of a filtered set of PubChem Bioassay data into ChEMBL.
07 Sep 2011

A sub-set of the PubChem Bioassay data has been integrated into ChEMBL.

How is this sub-set defined?

In PubChem, depositors may assign multiple result types to an assay. However, if an assay is deposited as a ‘confirmatory’ assay (defined as an assay where a range of SID concentrations have been tested, with a view to determining a measurement of potency), then one of the result types must be marked up as an ‘Active Concentration’ (AC) result type. Panel assays may contain many ‘AC’ result types, one per panel member. The AC result type is the calculated potency measurement from the data, and is typically an IC50, EC50, AC50, GI50 or Ki. In addition, the PubChem deposition process requires that each SID in an assay must be assigned a single ‘Activity Summary’, from a controlled vocabulary which includes ‘inactive’, ‘active’ and ‘inconclusive’.

Only assays containing ‘AC’ result types have been integrated into ChEMBL, and from these assays, only activity data and SIDs associated with ‘AC’ result types have been integrated. The ‘Activity Summary’ field in PubChem associated with each integrated activity is also captured and shown in the ‘Activity Comment’ field in ChEMBL. Panel assays are divided into separate assays in ChEMBL, one ChEMBL assay for each panel member.
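
The selection rule above can be sketched in a few lines of Python. This is only an illustration of the logic described, not the loader actually used for ChEMBL: the `Result` record, its field names and the `select_for_chembl` function are hypothetical, and real PubChem depositions carry many more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    """One deposited result row (hypothetical, simplified layout)."""
    aid: int                     # PubChem assay identifier
    sid: int                     # PubChem substance identifier
    result_type: str             # "AC" or any other depositor-defined type
    panel_member: Optional[str]  # None for non-panel assays
    value: float
    activity_summary: str        # 'active', 'inactive', 'inconclusive', ...


def select_for_chembl(results):
    """Keep only 'AC' results, one output assay per (AID, panel member)."""
    assays = {}
    for r in results:
        if r.result_type != "AC":
            continue  # non-AC result types are not integrated
        key = (r.aid, r.panel_member)  # panel assays split per member
        assays.setdefault(key, []).append({
            "sid": r.sid,
            "value": r.value,
            # the PubChem Activity Summary becomes the Activity Comment
            "activity_comment": r.activity_summary,
        })
    return assays


demo = [
    Result(1, 10, "AC", "kinase A", 0.5, "active"),
    Result(1, 10, "AC", "kinase B", 12.0, "inactive"),
    Result(1, 10, "percent inhibition", "kinase A", 88.0, "active"),
    Result(2, 11, "percent inhibition", None, 40.0, "inconclusive"),
]
selected = select_for_chembl(demo)
```

In this toy input, assay 1 yields two separate assays (one per panel member) and assay 2, which has no AC result type, is dropped entirely.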

How are structures normalized?

An automatic ‘standardization’ of SID structures downloaded from PubChem is carried out prior to integration (using in-house protocols). Standard InChIs are generated from the standardized mol files and used to normalize against existing ChEMBL structures. SIDs matching exactly on standard InChI to existing ChEMBL structures are assigned to the existing CHEMBLID (and the mol file already associated with the existing ChEMBL structure is used to represent the searchable structure for this CHEMBLID). Where no match on standard InChI is found, the incoming SID is assigned a new CHEMBLID, and the standardized mol file for the SID is used to represent the searchable structure. A very small number of SIDs (<0.1%) with standardized mol files that fail to produce valid standard InChIs, or fail to load into an Oracle Symyx cartridge without errors, are each assigned a new CHEMBLID and associated with a ‘null’ structure (i.e. no mol file is associated with this new CHEMBLID).
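
The three cases described above (exact match, no match, failed structure) amount to a dictionary lookup keyed on standard InChI. The sketch below is a hypothetical illustration only: the function name, the identifier format and the mol-file placeholders are invented, the in-house standardization and InChI generation steps are assumed to have already run, and the choice to let two new SIDs with the same InChI share one new identifier is an assumption rather than something stated in the post.

```python
from itertools import count


def assign_chembl_ids(incoming, registry, new_ids):
    """Assign each SID an identifier and a searchable structure.

    incoming: iterable of (sid, standard_inchi_or_None, molfile)
    registry: dict mapping standard InChI -> (chembl_id, molfile)
    new_ids:  iterator yielding fresh identifiers
    """
    assignments = {}
    for sid, inchi, molfile in incoming:
        if inchi is None:
            # Structure failed InChI generation or cartridge loading:
            # new identifier, 'null' structure.
            assignments[sid] = (next(new_ids), None)
        elif inchi in registry:
            # Exact match: reuse the existing identifier and its mol file.
            assignments[sid] = registry[inchi]
        else:
            # No match: new identifier, incoming standardized mol file.
            # (Assumption: later SIDs with the same InChI reuse this entry.)
            registry[inchi] = (next(new_ids), molfile)
            assignments[sid] = registry[inchi]
    return assignments


registry = {"InChI=1S/CH4/h1H4": ("CHEMBL_EXISTING", "existing.mol")}
new_ids = (f"CHEMBL_NEW_{i}" for i in count(1))
incoming = [
    (101, "InChI=1S/CH4/h1H4", "sid101.mol"),  # matches existing structure
    (102, "InChI=1S/H2O/h1H2", "sid102.mol"),  # new structure
    (103, None, "sid103.mol"),                 # failed standardization
]
assigned = assign_chembl_ids(incoming, registry, new_ids)
```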

How frequently is the integrated data updated?

Updates are carried out every ChEMBL release cycle.

How are targets mapped?

Mappings to ChEMBL targets for each integrated PubChem assay have been automated for the initial load. However, manual review of these mappings by expert curators may result in ongoing changes.

How do I filter my query results to exclude or include various data sources?

Users who prefer to exclude the integrated PubChem data (or any other integrated external data set) from their ChEMBL web-interface searches can do so by clicking ‘Activity Source Filter’ next to the main ChEMBL search bar, and deselecting the sources not required in future searches. Note, however, that these deselections persist between browser sessions. Users querying ChEMBL database dumps directly using SQL, and wishing to achieve this same filtering, should inspect the ‘source’ table, and the foreign keys to this table in the ‘assays’ and ‘compound_records’ tables.
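
For the SQL route, a join from ‘assays’ to ‘source’ is the general shape of the filter. The sketch below runs against a tiny in-memory SQLite stand-in rather than a real dump. The column names (`src_id`, `src_description`, `assay_id`) and the source labels are assumptions made for illustration, so check them against the schema documentation for your release. The same pattern would apply to ‘compound_records’.

```python
import sqlite3


def build_toy_db():
    """Create a minimal stand-in for the relevant tables (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (src_id INTEGER PRIMARY KEY, src_description TEXT);
        CREATE TABLE assays (assay_id INTEGER PRIMARY KEY,
                             src_id INTEGER REFERENCES source(src_id));
        INSERT INTO source VALUES (1, 'Scientific Literature'),
                                  (7, 'PubChem BioAssays');
        INSERT INTO assays VALUES (1, 1), (2, 1), (3, 7);
    """)
    return conn


def assays_excluding_sources(conn, excluded):
    """Return assay_ids whose source description is not in `excluded`."""
    placeholders = ",".join("?" for _ in excluded)
    sql = f"""
        SELECT a.assay_id
        FROM assays a
        JOIN source s ON s.src_id = a.src_id
        WHERE s.src_description NOT IN ({placeholders})
        ORDER BY a.assay_id
    """
    return [row[0] for row in conn.execute(sql, list(excluded))]


conn = build_toy_db()
kept = assays_excluding_sources(conn, ["PubChem BioAssays"])
```

Filtering on the description is readable for a sketch. Against a real dump it may be more robust to look up the numeric source identifier once and filter on that.
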
Visiting Speaker - Lipinski 4th October 2011, Wellcome Trust Genome Campus
06 Sep 2011

Chris Lipinski, of Rule of Five renown, is visiting the Genome Campus on Tuesday 4th October 2011 - he will give a talk entitled 'What is the Chemistry in Chemical Biology?'. The talk will be at 4pm. The presentation is open to all, and if you're from off campus, I'll need to arrange access for you (mail me to arrange this).

Please note - it's not possible to broadcast this talk over the web.
GPCR Structures: A2a Receptor with three inverse agonists
04 Sep 2011

The latest GPCR structure paper, published in Structure, details three ligand complexes: with ZM-241385, a xanthine (XAC) and caffeine. In addition to interpreting the complex structures, the paper describes the use of thermostabilisation as a strategy to generate crystallisable forms of GPCRs. Of more popular interest, for the first time we can see the molecular basis of one of the most widely used stimulant drugs in action - caffeine.

Coordinates are accessible at PDBe:3pwh (ZM-241385 CHEMBL113142), 3rey (xanthine amine conjugate, XAC CHEMBL273094) and 3rfm (caffeine CHEMBL113).
```
%T Structure of the Adenosine A2A Receptor in Complex with ZM241385 and the Xanthines XAC and Caffeine
%A A.S. Dore
%A N. Robertson
%A J.C. Errey
%A I. Ng
%A K. Hollenstein
%A B. Tehan
%A E. Hurrell
%A K. Bennett
%A M. Congreve
%A F. Magnani
%A C.G. Tate
%A M. Weir
%A F.H. Marshall
%J Structure 
%D 2011 
%O doi:10.1016/j.str.2011.06.014
```
Recruitment - Postdoc at the ICR in Computational Biology
03 Sep 2011

From the lab of one of our collaborators comes this job opportunity.

They are seeking a skilled, independent computational biologist with experience of large, multidisciplinary data analysis and programming to develop and apply novel computational techniques to support our cancer drug discovery efforts. They will take the lead in the development and dissemination of the integrative cancer research platform, canSAR.

The position will involve programming and in silico research to identify novel cancer therapeutic targets, as well as the opportunity to mentor PhD students. The successful applicant will likely have a PhD in computational biology or a related discipline, and be adept in programming and large-scale data analysis. Knowledge in the areas of database architecture, statistics or chemogenomics is advantageous.

Further details, including salary and contract term are available here.
Todays Found Natural Product: 3: Naringin/Bergamottin
01 Sep 2011

Your body is just full of carefully evolved and tried and tested systems to stop you taking bad or potentially damaging things into your body - smell receptors for the amines of rotting meat, visual receptors and a fantastic pattern-matching brain to allow you to see maggots, fungus, etc. (blue color, even in food, is usually a warning sign), and taste receptors to warn of potentially toxic molecules. Drugs are highly regulated, quite rightly, and careful testing needs to be done to assess the safety of drugs.

Why, oh, why then, do we sell grapefruits without regulation and a license?

Terfenadine was a drug that was withdrawn following safety concerns; its side effects were sometimes induced by grapefruit consumption. Many, many prescribing information sheets for drugs advise that there may be side effects and drug interactions with grapefruit - due to the prevalence of CYP3A4-mediated metabolism among the majority of current small molecule drugs. There is also an active internet (and scientific) literature on the use of grapefruit (or extracts) to exploit this CYP450 inhibition activity as a way to boost or extend plasma levels, or to allow the use of lower doses of potentially life-saving drugs, and even of drugs in a recreational setting. There is even a biotech established to develop drugs for this effect, but despite the wonders of Google I was unable to find it today, so maybe it is not in business any more.

It is interesting to see grapefruit pitched variously as a 'wonder food' (health food stores, alternative therapists, agricultural organisations) or a potentially dangerous foodstuff to be consumed with care (many physicians and scientists - an example). Certainly, I always tell my mother, who loves the 'devil's fruit', to tell her doctor that she frequently eats grapefruit whenever she gets a new prescription. A search of Google today shows 3,030,000 hits for 'dangerous grapefruit' and 8,110,000 hits for 'healthy grapefruit', so there are a lot of opinions out there.

For the UK readers out there, the Daily Mail newspaper is renowned for publishing conflicting health stories - here's what they say about grapefruit.

There are a number of compounds in grapefruit that are potent CYP inhibitors, including Naringin and Bergamottin.

Well, the Florida Department of Citrus have provided some assurance. However, I'm still a little worried, I think I'll hold off until the 'Industrial Federation of Bergamottin and Naringin Producers' tells me it's fine ;)

The cartoon at the top was harvested from www.xkcd.com which is probably a site that is not safe for work (nsfw) - unless your work involves browsing cartoons with occasional adult humor.
RSS feed of Drug Approval Monographs
31 Aug 2011
An RSS feed has been set up for the ChEMBL-og NME Drug Approval Monographs.
Trade names and USANs to company and compound mapping
31 Aug 2011

The role of non-proprietary names (INNs, USANs etc.) and the maintenance of a synonym list for drugs under development is central in the accurate retrieval of information from the web. The USAN and INN process is well managed and documented, but it is possible to look a little bit further into the future with some simple web searches, as these examples show.

Firstly, USANs under consideration - these are posted here. The therapeutic application of these can often be determined from the grammar of the name. So what compounds are they? Well, one easy way to look for them is a whois lookup on the internet domain for INN_under_consideration.com. Using this approach one can speculate (reasonably) that golvatinib is Eisai's VEGFR2 and cMet inhibitor E-7050, currently in phase 2 trials. Quite a few non-proprietary domains are snaffled up by various domain registry companies, or held anonymously for their true owners - but the approach works for quite a few.
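
The whois lookup described above can be scripted. The sketch below is a minimal, unofficial illustration: `candidate_domain` and `whois` are hypothetical helper names, `examplinib` is a made-up placeholder rather than a real USAN, and the default server (`whois.verisign-grs.com`, the registry WHOIS host for .com) is an assumption that may need changing for other top-level domains. The protocol itself is just a line of text sent to TCP port 43.

```python
import socket


def candidate_domain(name, tld="com"):
    """Turn a non-proprietary name into the domain to look up."""
    return f"{name.strip().lower()}.{tld}"


def whois(domain, server="whois.verisign-grs.com", port=43, timeout=10.0):
    """Send a plain WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


domain = candidate_domain("  Examplinib ")
# To run the lookup (needs network access):
# print(whois(domain))
```

The response is free text, and as noted above the registrant is often a registrar or privacy proxy, so any company name it shows is a hint rather than a definitive assignment.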

Secondly, tradenames - the reference trademark site for the US is the United States Patent and Trademark Office (USPTO). There are some excellent search tools, and one way of using these in the context of drug names is to search for the developer name and then generate a list of potential candidates; again, the field of use often allows an association to be made with a new drug. For example, searching with Pfizer as the "Owner" gives a list of trademarks including Xalkori (the tradename for the newly launched anti-cancer drug Crizotinib), and a number of alternative homonyms (these could be defensively filed, or for different territories). Of course, making a definitive assignment is difficult, and things can change, but these names could become quite useful when chained and mosaiced with other synonyms.

So, those are some simple things; it gets more interesting when you start to look at dates of filing, and so forth, but that is for another day, and a lot more analysis.

Disclaimer - the links above are fragile; they will change over time and may well contain session IDs that won't work for you!

MIABE - Minimum Information about a Bioactive Entity

31 Aug 2011

Minimum Information Standards are an important feature in many aspects of science, and there is a rich history of their success in encouraging data interoperability across scientific resources and data analysis. An opinion paper that discusses bioactivity data has just been published in NRDD. The paper itself seems to be open access (from my hotel room at least) - the link is here.

%T Minimum information about a bioactive entity (MIABE)
%J Nature Reviews Drug Discovery
%V 10
%P 661-669
%D 2011
%A S. Orchard
%A B. Al-Lazikani
%A S. Bryant
%A D. Clark
%A E. Calder
%A I. Dix
%A O. Engkvist
%A M. Forster
%A A. Gaulton
%A M. Gilson
%A R. Glen
%A M. Grigorov
%A K. Hammond-Kosack
%A L. Harland
%A A. Hopkins
%A C. Larminie
%A N. Lynch
%A R. K. Mann
%A P. Murray-Rust
%A E. Lo Piparo
%A C. Southan
%A C. Steinbeck
%A D. Wishart
%A H. Hermjakob
%A J. Overington
%A J. Thornton
%O doi:10.1038/nrd3503