ChEMBL blog

Some forthcoming schema changes in ChEMBL 15
02 Jan 2013

Happy New Year!

We are making some necessary changes to the schema in the forthcoming ChEMBL 15. If you have developed scripts or applications based on the previous ChEMBL schema, these changes will almost certainly break them, or change the search results they return.

If you are likely to be affected by these changes please consider the following actions.
- Review the Release Notes for previous releases of ChEMBL, where these changes were outlined.
- Watch the ChEMBL-og for further announcements. We will run a webinar on the key elements of the schema change and the way it is likely to affect data integration.
- Sign up to the chembl-announce mailing list.
- Let us know that you do develop applications using ChEMBL, and we will try and consider any specific dependencies you may have.
- Consider the likely impact of the changes prior to scrubbing your previous install and cussing us when it doesn't work out-of-the-box ;)
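
One way to gauge the impact before upgrading is a small pre-flight check that compares what your scripts expect with what the database actually contains. The sketch below is only an illustration: the table and column names are made-up placeholders rather than the real ChEMBL schema, and it uses SQLite's in-memory database purely so that it runs anywhere (against another database engine you would query that engine's own catalogue instead).

```python
import sqlite3

# Hypothetical expectations: these table and column names are placeholders,
# not the real ChEMBL schema. List whatever your own scripts depend on.
EXPECTED = {
    "compounds": {"compound_id", "smiles"},
    "activities": {"activity_id", "compound_id", "value"},
}


def missing_schema_items(conn, expected):
    """Return {table: [missing columns]} for everything the scripts rely on."""
    problems = {}
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; row[1] is the column name.
        # A table that does not exist yields no rows at all.
        found = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        missing = columns - found
        if missing:
            problems[table] = sorted(missing)
    return problems


def demo():
    """Build a toy database that lacks one expected column and check it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compounds (compound_id INTEGER, smiles TEXT)")
    conn.execute("CREATE TABLE activities (activity_id INTEGER, compound_id INTEGER)")
    return missing_schema_items(conn, EXPECTED)
```

Running `demo()` reports the one column the toy database lacks; an empty result would mean that everything the scripts expect is still present.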
jpo
Paper: Objective assessment of cancer genes for drug discovery
02 Jan 2013

There's a great paper just published in NRDD on the analysis of potential drug targets using an objective, evidence-based approach. It makes extensive use of ChEMBL data, alongside specific cancer-related datasets, pathway and interaction data, and so forth. There is also some good discussion of drug-repurposing informatics approaches.

A link to the pdf of the paper is here.
```
%T Objective assessment of cancer genes for drug discovery
%A M.N. Patel
%A M.D. Halling-Brown
%A J.E. Tym
%A P. Workman
%A B. Al-Lazikani
%J Nature Reviews Drug Discovery
%V 12
%P 35-50
%D 2013
%O doi:10.1038/nrd3913
```
Disclosure - one of the authors of the above paper (Al-Lazikani) is my wife.

jpo
Structural Annotation of Ligand Environments
30 Dec 2012

Some time ago, I developed a program called joy - its purpose was to map onto a protein sequence the local physical environment of a residue, but the main aim was to build some environment-specific substitution tables for residues that were useful in structure validation, distant fold detection, and so forth. Eventually it led to the generation of a database of curated (at least initially) protein structure/sequence alignments (HOMSTRAD - the paper for this has been cited 415 times now, so at least a few people use this). Some of these alignments were used as early seed profiles for Pfam, so for me it's been pretty satisfying to see these things taken up by the community. Several students and postdocs in the Blundell group extended and maintained this initial approach. Once you get used to looking at joy annotated structures and alignments it really is a great way of identifying structurally important residues - I use it for the regular GPCR posts on the ChEMBL-og for example.

Anyway, enough of living in the past - for it is almost a Brand New Year, and we want new stuff!

So, I was thinking this morning over coffee about what would be the ligand equivalent of this environmental annotation from a 'Ligand's Eye View' - and it's pretty simple I think, but likely to lead to some maybe interesting things - I also think the idea has enough legs to be suitable for a PhD project - and since I'm taking on a student next year (my last one at EMBL - how time flies!) I think this will be one of the ideas floated amongst the applicants.....

Joy stored just a couple of very simple, robust descriptors of residue environment - 1) solvent accessibility, 2) hydrogen bonding (donors and acceptors) and 3) secondary structure. Of these, the first two have straightforward analogies for a small molecule, at an atom level, and also explained most of the variance in residue conservation. The other big advantage is that this sort of approach can be used on a single example - you don't need an active ensemble (or set of evolutionarily related sequences) to build some predictive models.

So the basic idea is to....
1. Build an explicit hydrogen form of all small molecule ligand complexes in PDBe. This is not trivial - there are tautomers, ionisation states (and pH of crystallisation), etc. as complicating factors that need to be handled to do a good job. The right way to do this will be InChIs (as opposed to the normal, for us, Standard InChIs) - this in itself will form the basis of a good set of training data for evidence-backed tautomer identification/prediction.
2. Write the code to annotate the atomic/fragment environments, and adopt a standard to allow interchange of this data, probably in a molfile type format (or maybe as a custom stylee layer in InChI itself). There's some Fortran for this already, but it's probably time to embrace something more modern, like Ada, or some other newfangled language ;)
3. Incorporate this atomic/fragmental environment into fingerprint approaches (either a simple mask filter, or weights, depending on the class/number of interactions an atom or bond-path makes), since these are likely to be the structural features (in the image above, the bits of the orange blob in contact with the blue blob).
4. Train the method on the extensive sets of ligands in ChEMBL - and as a first use case develop an approach to take a protein-ligand 3-D complex - perceive the SAR 'sensitive' and 'relaxed' positions around a ligand and do some virtual screening.
5. Build some fingerprint profiles for targets, based on this (hopefully) enhanced view of the sort of features required for a specific target site (as sublimed into the interaction fingerprints of some clusters of complementary ligands in the first instance).
6. Put some fancy stuff into something like LigPlot - or a program like this, depending on licensing....
7. There's some other stuff that's pretty obvious to those who work in the field once you've got this data.....
Success for me would be to have a method to address the magic methyl problem. If we start on this project, we'll keep you posted.
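
To make step 3 a little more concrete, here is a rough sketch of the two options mentioned there, a mask filter and a weighting, applied to a toy fingerprint. Everything in it is an assumption for illustration only: the function names are hypothetical, the fingerprint is just a mapping from bit index to the atoms that set that bit, and the "contacts" are whatever the annotation from step 2 would provide. It is not an existing implementation.

```python
def masked_fingerprint(bit_atoms, contact_atoms):
    """Mask filter: keep only bits set by at least one atom in contact.

    bit_atoms: dict of bit index -> set of atom indices that set the bit.
    contact_atoms: set of atom indices annotated as contacting the protein.
    """
    return {bit for bit, atoms in bit_atoms.items() if atoms & contact_atoms}


def contact_weights(bit_atoms, contacts_per_atom):
    """Weighting: score each bit by the number of interactions its atoms make."""
    return {
        bit: float(sum(contacts_per_atom.get(atom, 0) for atom in atoms))
        for bit, atoms in bit_atoms.items()
    }


def tanimoto(bits_a, bits_b):
    """Plain Tanimoto coefficient on two sets of on-bits."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


def weighted_tanimoto(weights_a, weights_b):
    """Tanimoto generalised to non-negative weights: sum(min) / sum(max)."""
    bits = set(weights_a) | set(weights_b)
    shared = sum(min(weights_a.get(b, 0.0), weights_b.get(b, 0.0)) for b in bits)
    total = sum(max(weights_a.get(b, 0.0), weights_b.get(b, 0.0)) for b in bits)
    return shared / total if total else 0.0
```

With the mask, two ligands are compared only on the parts that actually touch the site; with the weights, heavily interacting atoms simply count for more. Which of the two works better is exactly the sort of thing the project would have to find out.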

jpo
Visualisation of a Dynamic Hierarchy
28 Dec 2012

Just doing some year end things, before starting on the backlog over the weekend that has just magically accumulated somehow. I can feel so many New Year's resolutions coming on!

Anyway, I came across a cool visualisation on Teh Internets of the growth/changes in an organisation, but the visualisation can probably be applied to a lot of areas where there is some hierarchical organisation - imagine a timeline of scaffold representation in the med. chem. literature compounds, or targets for drugs over time (including first and second generation agents, etc.), or patents, or .....

Here's a link to the YouTube video.

The original source is here, from the great, and new to me, Flowing Data site.

jpo
Pipeline Pilot Cambridgeshire UGM
17 Dec 2012

We will be organising the 2nd Cambridgeshire Pipeline Pilot Users Group meeting on Thursday 17th January 2013, at 3pm here at the ChEMBL HQs. This is provided that the Mayans were actually wrong.

This is a preliminary agenda for the meeting:

1. Welcome and Host talk: George Papadatos + Gerard van Westen:
   Cool things with Pipeline Pilot and ChEMBL
2. Peter Woollard (GSK):
   Using Pipeline Pilot for computational biology capabilities, where it has helped the most and where it is less used
3. Richard Carter (ONT):
   Pipeline Pilot on a memory stick
4. Mike Cherry (Accelrys):
   Repetitive Data Flow
5. Question and Answer session, including:
   - how people have found Next Generation Sequencing components and the Text Analytics components
   - using Pipeline Pilot for running command line software on remote linux servers and retrieving results
6. Adrian Stevens (Accelrys):
   Upcoming chemistry components in PP9.0

If you fancy attending, drop me a line.

George

Paper: Mapping small molecule binding data to structural domains

14 Dec 2012

Our interacting domains paper is out in pdf form. Here's the link.

```
%T Mapping small molecule binding data to structural domains
%A F.A. Kruger
%A R. Rostom
%A J.P. Overington
%J BMC Bioinformatics
%D 2012
%V 13(Suppl 17)
%P S11
%O doi:10.1186/1471-2105-13-S17-S11
```

jpo

USAN Watch: December 2012

13 Dec 2012

The USANs for December 2012 have recently been published.

| USAN | Research Code | Structure | Drug Class | Therapeutic class | Target |
|---|---|---|---|---|---|
| delantercept | ACE-041 | | immunoadhesin | therapeutic | TGFbeta |
| imilecleucel-t | | | cellular immunotherapy | therapeutic | n/a |
| palifosfamide tromethamine | ZIO-201T | | synthetic small molecule | therapeutic | n/a |
| regorafenib | BAY-73-4506 | | synthetic small molecule | therapeutic | VEGFR2 TIE2 |

jpo

Paper: Automated design of ligands to polypharmacological profiles
12 Dec 2012

Another great paper in Nature this week, making extensive use of ChEMBL. It's by our long-term collaborators up at Dundee - Jeremy, Richard and Andrew - well done, great stuff! Basically it combines a knowledge-base of SAR data (ChEMBL), some predictive models for affinity/properties, and extracts a set of reasonable transforms (chemical conversions) from the same knowledge-base. I'll ask Jeremy/Andrew to do a guest post on the ChEMBL-og on the paper - they're probably pretty busy with press-releases, etc. ;)

Here's a link to the paper.

Have a read, it will keep you busy for a few hours.
```
%A J. Besnard
%A G.F. Ruda
%A V. Setola
%A K. Abecassis
%A R.M. Rodriguez
%A X.-P. Huang
%A S. Norval
%A M.F. Sassano
%A A.I. Shin
%A L.A. Webster
%A F.R.C. Simeons
%A L. Stojanovski
%A A. Prat
%A N.G. Seidah
%A D.B. Constam
%A G.R. Bickerton
%A K.D. Read
%A W.C Wetsel
%A I.H. Gilbert
%A B.L. Roth
%A A.L. Hopkins
%T Automated design of ligands to polypharmacological profiles
%J Nature
%D 2012
%V 492
%P 215-220
%O http://dx.doi.org/10.1038/nature11691
```
What will you do today with ChEMBL?