Structural Annotation of Ligand Environments


Some time ago, I developed a program called joy - its purpose was to map the local physical environment of each residue onto a protein sequence, but the main aim was to build environment-specific substitution tables for residues that were useful in structure validation, distant fold detection, and so forth. Eventually it led to the generation of a database of curated (at least initially) protein structure/sequence alignments (HOMSTRAD - the paper for this has been cited 415 times now, so at least a few people use this). Some of these alignments were used as early seed profiles for Pfam, so for me it's been pretty satisfying to see these things taken up by the community. Several students and postdocs in the Blundell group extended and maintained this initial approach. Once you get used to looking at joy-annotated structures and alignments it really is a great way of identifying structurally important residues - I use it for the regular GPCR posts on the ChEMBL-og for example.

Anyway, enough of living in the past - for it is almost a Brand New Year, and we want new stuff!

So, I was thinking this morning over coffee about what would be the ligand equivalent of this environmental annotation from a 'Ligand's Eye View' - and it's pretty simple I think, but likely to lead to some interesting things. I also think the idea has enough legs to be suitable as a PhD project - and since I'm taking on a student next year (my last one at EMBL - how time flies!) I think this will be one of the ideas floated amongst the applicants...

Joy stored just a few very simple, robust descriptors of residue environment - 1) solvent accessibility, 2) hydrogen bonding (donors and acceptors) and 3) secondary structure. Of these, the first two have straightforward analogies for a small molecule at the atom level, and they also explained most of the variance in residue conservation. The other big advantage is that this sort of approach can be used on a single example - you don't need an ensemble of actives (or a set of evolutionarily related sequences) to build some predictive models.
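To make the atom-level analogy a bit more concrete, here is a minimal Python sketch of what a 'Ligand's Eye View' annotation could look like. It is not how joy works, and everything in it is an assumption for illustration: the `Atom` class, the `annotate_ligand_atoms` function and the 4.0 Å and 3.5 Å cutoffs are all made up here. Counting protein atoms within a cutoff is only a crude stand-in for solvent accessibility, and a distance-only N/O check is only a hint of a possible hydrogen bond (a proper job needs explicit hydrogens and angles).

```python
from dataclasses import dataclass
from math import dist

# Hypothetical minimal atom record; a real version would come from a PDB/mmCIF parser.
@dataclass(frozen=True)
class Atom:
    name: str
    element: str
    xyz: tuple

POLAR = {"N", "O"}  # assumption: only N and O are treated as donors/acceptors


def annotate_ligand_atoms(ligand, protein, contact_cutoff=4.0, hbond_cutoff=3.5):
    """Annotate each ligand atom with a contact count and a possible-H-bond flag.

    contacts: number of protein atoms within contact_cutoff (a burial proxy).
    possible_hbond: ligand N/O with a protein N/O within hbond_cutoff
    (distance only, no hydrogens or angles considered).
    """
    annotations = {}
    for lig in ligand:
        distances = [(dist(lig.xyz, p.xyz), p) for p in protein]
        n_contacts = sum(1 for d, _ in distances if d <= contact_cutoff)
        possible_hbond = lig.element in POLAR and any(
            d <= hbond_cutoff and p.element in POLAR for d, p in distances
        )
        annotations[lig.name] = {
            "contacts": n_contacts,
            "possible_hbond": possible_hbond,
        }
    return annotations


# Toy coordinates, invented purely to exercise the function.
ligand = [
    Atom("O1", "O", (0.0, 0.0, 0.0)),
    Atom("C1", "C", (1.4, 0.0, 0.0)),
    Atom("C2", "C", (10.0, 0.0, 0.0)),
]
protein = [
    Atom("N", "N", (0.0, 2.9, 0.0)),
    Atom("CB", "C", (1.4, 3.8, 0.0)),
]

annotations = annotate_ligand_atoms(ligand, protein)
for name, env in annotations.items():
    print(name, env)
```

In this toy case `O1` sits next to a backbone-like nitrogen, `C1` is packed against two protein atoms, and `C2` is far from everything, which is the sort of buried/exposed, bonded/unbonded distinction the annotation is meant to capture.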

So the basic idea is to....


  1. Build an explicit hydrogen form of all small molecule ligand complexes in PDBe. This is not trivial - there are tautomers, ionisation states (and pH of crystallisation), etc. as complicating factors that need handling to do a good job. The right way to do this will be InChIs (as opposed to the normal, for us, Standard InChIs) - this in itself will form the basis of a good set of training data for evidence-backed tautomer identification/prediction.
  2. Write the code to annotate the atomic/fragment environments, and adopt a standard to allow interchange of this data, probably in a molfile type format (or maybe as a custom stylee layer in InChI itself). There's some Fortran for this already, but it's probably time to embrace something more modern, like Ada, or some other newfangled language ;)
  3. Incorporate this atomic/fragmental environment into fingerprint approaches (either a simple mask filter, or weights, depending on the class/number of interactions an atom or bond-path makes), since these are likely to be the structural features that matter (in the image above, the bits of the orange blob in contact with the blue blob).
  4. Train the method on the extensive sets of ligands in ChEMBL - and as a first use case develop an approach to take a protein-ligand 3-D complex - perceive the SAR 'sensitive' and 'relaxed' positions around a ligand and do some virtual screening.
  5. Build some fingerprint profiles for targets, based on this (hopefully) enhanced view of the sort of features required for a specific target site (as sublimed into the interaction fingerprints of some clusters of complementary ligands in the first instance).
  6. Put some fancy stuff into something like LigPlot - or a program like this, depending on licensing....
  7. There's some other stuff that's pretty obvious to those who work in the field once you've got this data.....
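As a rough illustration of the mask/weights idea in step 3, here is a small Python sketch of an environment-weighted Tanimoto similarity. None of this is existing code from the project: the function names, the `bit_to_atoms` mapping (which fingerprint bit is set by which ligand atoms) and the per-atom contact counts are all hypothetical inputs, and giving bits that are absent from the reference ligand a default weight of 1.0 is just one possible choice.

```python
def bit_weights(bit_to_atoms, atom_contacts, mode="mask"):
    """Turn per-atom contact counts into per-bit weights.

    mode="mask":  weight 1.0 if any atom behind the bit touches the protein, else 0.0.
    mode="count": weight is the mean contact count of the atoms behind the bit.
    """
    weights = {}
    for bit, atoms in bit_to_atoms.items():
        counts = [atom_contacts.get(a, 0) for a in atoms]
        if mode == "mask":
            weights[bit] = 1.0 if any(c > 0 for c in counts) else 0.0
        elif mode == "count":
            weights[bit] = sum(counts) / len(counts) if counts else 0.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return weights


def weighted_tanimoto(fp_a, fp_b, weights, default=1.0):
    """Tanimoto over sets of on-bits, with each bit contributing its weight.

    Bits missing from `weights` get `default`; with empty weights this reduces
    to the ordinary Tanimoto coefficient.
    """
    union_weight = sum(weights.get(b, default) for b in fp_a | fp_b)
    if union_weight == 0:
        return 0.0
    shared_weight = sum(weights.get(b, default) for b in fp_a & fp_b)
    return shared_weight / union_weight


# Toy data, invented for illustration only.
bit_to_atoms = {1: ["O1"], 2: ["C1"], 3: ["C2"]}
atom_contacts = {"O1": 1, "C1": 2, "C2": 0}

reference = {1, 2, 3}   # on-bits of the crystallographic ligand
candidate = {1, 3, 4}   # on-bits of a screening compound

plain = weighted_tanimoto(reference, candidate, {})
masked = weighted_tanimoto(reference, candidate, bit_weights(bit_to_atoms, atom_contacts, "mask"))
counted = weighted_tanimoto(reference, candidate, bit_weights(bit_to_atoms, atom_contacts, "count"))
print(plain, masked, counted)
```

The point of the toy example is that sharing bit 3, which comes from an atom making no contacts, stops counting towards similarity once the environment is taken into account, while missing bit 2, which comes from a well-packed atom, costs more in the count-weighted version.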


Success for me would be to have a method to address the magic methyl problem. If we start on this project, we'll keep you posted.

jpo