Word Clouds - ChEMBL vs PubMed and some musings


Here are two word clouds: the upper one generated from the titles and abstracts of articles abstracted in ChEMBL, the lower one from the whole of PubMed. It's quite interesting to see how 'molecular' the words in the ChEMBL cloud are (compounds, inhibitors, binding, etc.) and how generally quantitative they are, but also how many of them relate to variation (series, analogue, derivatives, selective, etc.). The PubMed cloud stresses clinical activities (patients, treatment, study, etc.) and arguably reflects more trend-based, qualitative data.



These pictures made me reflect a little on what is needed to make the data we store in ChEMBL more 'translational'. We've made some steps towards this recently, with better coverage and accessibility of clinical-stage and drug compounds (see the ChEMBL 15 release notes for details), but we need to go deeper in describing the effects of compounds on molecular systems in cellular and organism settings, in particular in human patients, and preferably in individuals. This is related to some work we are doing on mapping the assays in ChEMBL to various pathway resources. The trouble is that pharmacology seems to map poorly onto the way we represent pathways, as a series of molecular interactions, when what you are actually interested in are the emergent functional phenomena.

I also now feel more strongly that we need to capture more early human PK, genetic and potentially biomarker data. We can't do this with our current funding, and we certainly can't do this alone, but we will see what we can do towards this...