PPI Library - Part 3

03 May 2012

It turns out that scientists and the rest of the world read 'PPI' as very different acronyms, as the amount of Payment Protection Insurance comment spam I’ve had to delete shows. Anyway, life got in the way of science for a few weeks for me ( :( ), but some more of the PPI work is described here.

A very simple algorithm was applied to build a library of experimental peptide conformers. Firstly, every tetra-peptide from a protein structure was extracted; one of these peptides was then taken as a seed for a conformational cluster, and subsequent tetra-peptides were fitted to this fragment. If the RMSD for the main-chain atoms was lower than a cutoff parameter, the tetra-peptide was added to that cluster, with the original 'seed' fragment taken as the cluster's representative. If the RMSD was greater than the cutoff, then a new cluster was established, and any subsequent tetra-peptides were fitted to both cluster representatives, and so forth. As more unique peptide conformers are seen (defined according to the RMSD cutoff) the number of clusters increases. Of course, the population of each cluster is stored - some conformers are really common (alpha-helix and beta-strand fragments) and others are rare or are experimental errors.
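The clustering loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the analysis: the names `leader_cluster` and `rmsd_no_fit` are made up for this example, a fragment is assigned to the first seed that falls within the cutoff (the post does not say whether first or best match was used), and the distance function here is a plain coordinate RMSD that assumes the fragments are already superposed. A real version would do a least-squares fit of the main-chain atoms before measuring the RMSD.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Fragment = Sequence[Point]  # main-chain atom coordinates of one tetra-peptide


def rmsd_no_fit(a: Fragment, b: Fragment) -> float:
    """Plain coordinate RMSD. ASSUMPTION: fragments are already superposed;
    a real implementation would least-squares fit b onto a first."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def leader_cluster(
    fragments: Sequence[Fragment],
    cutoff: float,
    distance: Callable[[Fragment, Fragment], float] = rmsd_no_fit,
) -> Tuple[List[Fragment], List[int]]:
    """Return (seeds, populations) for a single pass over the fragments."""
    seeds: List[Fragment] = []   # one representative per cluster
    counts: List[int] = []       # population of each cluster
    for frag in fragments:
        for i, seed in enumerate(seeds):
            # ASSUMPTION: first seed within the cutoff wins.
            if distance(frag, seed) < cutoff:
                counts[i] += 1
                break
        else:
            # No existing seed is close enough, so this fragment
            # becomes the seed of a new cluster.
            seeds.append(frag)
            counts.append(1)
    return seeds, counts


# Toy data: two near-identical fragments and one distant one.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
c = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
seeds, counts = leader_cluster([a, b, c], cutoff=0.5)
```

Note that the result of this kind of single-pass clustering depends on the order in which the fragments are seen, since the first fragment of each new conformation becomes the seed.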

At a large cutoff parameter, all tetra-peptides cluster in the same set as the initial seed; at a sufficiently small cutoff, every tetra-peptide is unique.

When applied to 2ptn (bovine trypsin, for deeply rooted reasons my favorite PDB entry ever, and one that contains most features of globular proteins: secondary and super-secondary structure, turns, etc.) the following numbers of representative clusters were found, shown as a function of the RMSD cutoff. One way of thinking about this approach is that the library contains every possible peptide conformation at a given error/variation/resolution. So, it’s a sort of variable ‘resolution’ library. For 2ptn, you can see that the library complexity takes off below about 0.7 Angstrom RMSD. The curve approaches an asymptote at around 220, since this is about the number of residues (and so of tetra-peptides) in 2ptn.

There are a few tricky cases that need handling in the code, primarily the treatment of peptides that span chain breaks in the protein structure - for this analysis, the four residues needed to be covalently contiguous (i.e. no internal chain breaks).
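One way to enforce the 'covalently contiguous' rule is to check the peptide bond between each pair of neighbouring residues before accepting a window of four. The sketch below is an illustration under stated assumptions rather than the code used here: the function names, the dictionary layout for a residue, and the 1.5 Angstrom limit on the C to N distance are all hypothetical choices (a normal peptide C-N bond is about 1.33 Angstrom, so 1.5 leaves some slack). A residue missing either atom is treated as a break.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Residue = Dict[str, Point]  # atom name -> coordinates, e.g. {"N": ..., "C": ...}


def is_bonded(prev: Residue, nxt: Residue, max_cn: float = 1.5) -> bool:
    """True if prev's carbonyl C is within max_cn Angstrom of nxt's amide N.
    ASSUMPTION: 1.5 is an illustrative threshold, not a value from the post."""
    c = prev.get("C")
    n = nxt.get("N")
    return c is not None and n is not None and math.dist(c, n) <= max_cn


def contiguous_windows(
    residues: Sequence[Residue], size: int = 4, max_cn: float = 1.5
) -> List[List[Residue]]:
    """Every run of `size` residues with no internal chain break."""
    windows: List[List[Residue]] = []
    run = 0  # length of the unbroken stretch ending at the current residue
    for i, res in enumerate(residues):
        if i > 0 and is_bonded(residues[i - 1], res, max_cn):
            run += 1
        else:
            run = 1  # chain start or chain break: restart the count
        if run >= size:
            windows.append(list(residues[i - size + 1 : i + 1]))
    return windows


def toy_residue(x: float) -> Residue:
    """Hypothetical residue on the x-axis with N at x and C at x + 2.7."""
    return {"N": (x, 0.0, 0.0), "C": (x + 2.7, 0.0, 0.0)}


# Six residues spaced 4.0 apart, giving C-to-N gaps of 1.3 Angstrom.
intact = [toy_residue(4.0 * i) for i in range(6)]
# Same chain, but with the last two residues shifted to create a break.
broken = intact[:4] + [toy_residue(4.0 * i + 10.0) for i in range(4, 6)]

intact_windows = contiguous_windows(intact)
broken_windows = contiguous_windows(broken)
```

Checking the bond geometry rather than the residue numbers avoids being misled by numbering gaps and insertion codes, which are common in PDB files.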

So, we now have a way of building a representative library of peptide conformers that we can think about using as scaffolds for mimicking in our PPI library (as well as the main-chain donor/acceptor positions, we also have the C-alpha to C-beta vectors).

The next step is to extend this approach to a larger, more representative library of protein structures. Let's use the representative set from a validated (but ancient) paper for this.

%A U. Hobohm
%A C. Sander
%T Enlarged representative set of protein structures
%J Protein Science
%V 3
%P 522-524
%D 1994

Trivia: The photo above is of one of my sons, on mayoral voting day 2012, in a very wet London. You are never too young to learn about politics!

Update: Sorry the figures got barfed by the blogger software with a bad url, and got lost, so I've replaced them.