Comparative Screening File Analysis

Here is another oldie of ours. The basic idea is to use the scaffolds in StARlite (all of which come from known bioactives) to select a physical compound set for screening. Of course, there are important questions over diversity and the number of compounds, but all of that detail is for another day.

Indexing a set of 2D structures by their scaffolds is straightforward, although there are many methods for doing the decomposition. We did this for StARlite, and you can then look at the frequency distribution of scaffolds (a power law again, quelle surprise). We then came across a similar list of ranked scaffolds from Novartis (analysis by Ertl, Koch and Roggio; I haven't seen this in the primary literature, but it was in a lovely book I picked up in the foyer of Novartis on one visit). Why not compare these lists of scaffolds? This would show the enrichment or depletion of the most common scaffolds, and one could imagine using this to select compounds for future purchase to achieve file 'balancing', or to identify areas of biology/pharmacology/targets that the compound file is particularly suitable for. The most frequent scaffolds tend to be simple rings, but further down the frequency list are more complicated systems. Many more applications built on this basic comparison can be imagined.
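To make the indexing and ranking step concrete, here is a minimal sketch in Python. It assumes the scaffold decomposition has already been done by whatever method you prefer, so each compound arrives as a list of scaffold labels. The function name `rank_scaffolds` and the toy compounds are invented for illustration; this is not the code or the data behind the analysis described here.

```python
from collections import Counter


def rank_scaffolds(compound_scaffolds):
    """Rank scaffolds by the number of compounds that contain them.

    compound_scaffolds: iterable of iterables, one inner iterable of
    scaffold labels per compound (decomposition is assumed done upstream).
    Returns a list of (scaffold, compound_count, fraction_of_compounds),
    most frequent first; ties are broken alphabetically.
    """
    counts = Counter()
    n_compounds = 0
    for scaffolds in compound_scaffolds:
        n_compounds += 1
        # Count each scaffold at most once per compound.
        counts.update(set(scaffolds))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count, count / n_compounds) for name, count in ranked]


if __name__ == "__main__":
    # Hypothetical toy input, not real StARlite content.
    toy_file = [
        ["benzene", "pyridine"],
        ["benzene"],
        ["benzene", "piperidine"],
        ["pyridine", "benzene", "benzene"],
    ]
    for rank, (name, count, fraction) in enumerate(rank_scaffolds(toy_file), start=1):
        print(rank, name, count, round(fraction, 2))
```

Counting each scaffold once per compound is a design choice; counting every occurrence instead would favour compounds with repeated rings.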

The ranked scaffold list looks like this (so benzene is the most common, then pyridine, then piperidine, down to isoquinoline in rank position 35).

The comparison of StARlite and the Novartis screening file looks like this.

So, the Novartis file is enriched in pyrimidines, morpholines and pyrazoles relative to StARlite, and is depleted in pyrrolidines, tetrahydrofurans, purines and tetrazoles. StARlite is probably a pretty good representation of the scaffold distribution across the entire industry.
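One simple way to put a number on 'enriched' and 'depleted' is a log2 ratio of scaffold fractions between the screening file and the reference set. The sketch below is one possible scoring under stated assumptions, not necessarily how the comparison above was computed: the function name `log2_enrichment`, the pseudocount of 0.5 and the toy counts are all invented for illustration and are not the Novartis or StARlite figures.

```python
import math


def log2_enrichment(file_counts, ref_counts, pseudocount=0.5):
    """Return {scaffold: log2(fraction in file / fraction in reference)}.

    file_counts, ref_counts: dicts mapping scaffold label -> count.
    A pseudocount (an assumption of this sketch) is added to every count
    so that scaffolds missing from one set still give a finite score.
    Positive values mean enriched in the file, negative mean depleted.
    """
    scaffolds = set(file_counts) | set(ref_counts)
    k = len(scaffolds)
    file_total = sum(file_counts.values()) + pseudocount * k
    ref_total = sum(ref_counts.values()) + pseudocount * k
    scores = {}
    for name in scaffolds:
        file_fraction = (file_counts.get(name, 0) + pseudocount) / file_total
        ref_fraction = (ref_counts.get(name, 0) + pseudocount) / ref_total
        scores[name] = math.log2(file_fraction / ref_fraction)
    return scores


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the sign convention.
    screening_file = {"pyrimidine": 30, "pyrrolidine": 10}
    reference_set = {"pyrimidine": 10, "pyrrolidine": 30}
    scores = log2_enrichment(screening_file, reference_set)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, round(scores[name], 2))
```

Sorting the scores puts the most enriched scaffolds at one end and the most depleted at the other, which is the kind of ordering a file 'balancing' exercise would start from.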