A brief post of a few thoughts about testing for chemical novelty, especially in the context of patent filing. It's a little bit odd, but interesting.
The concept of 'chemical novelty' is core to the filing of patent protection on pharmaceuticals, and as part of most patent filings, checks are done by the inventor to ensure that the invention is actually novel. People will search patent databases (historically these were largely commercial, but public resources such as PubChem & SureChEMBL also now contain significant amounts of patent-derived data). They will then also need to search non-patent databases, since lots of chemicals are published without patents being filed. There is a good and accessible overview of the field and some databases here.
These datasources, and even more so the workflows, are heterogeneous and fragmented, and the broader the search the more expensive it becomes. As a general rule, the resources that are built from the patent literature are well designed around the date of disclosure/publication of a chemical structure, public resources a lot less so - if at all. Beyond the inventor's own checks, there are two further occasions when this novelty checking is important - secondly, during patent examination, where a patent examiner has a short amount of available time to perform novelty checks, and these checks are of course relative to the filing date of the invention; and thirdly, by lawyers/other scientists who may try and wriggle around the constraints of the patent, trying to invalidate it by showing that the compound wasn't novel after all. Because there are often very large sums of money involved, people can become very determined and creative in looking for such 'prior art'! Publication can be anywhere in the world, so this adds yet further to the cost and complexity.
Imagine now a free, public novelty checker, with 'strong' time-stamping of first 'publication' for all structures, and also great tautomer searching and correct treatment of parents/salts (in the field of patents, salt forms are often central to actual product properties and so are sometimes critically important). Go on, close your eyes and imagine just this, for a moment - feels good, doesn't it?
There are a number of basic problems with implementing such a system, despite the huge cost savings and efficiencies in innovation it would no doubt bring. It would need to be done by a 'trusted third party' (so no one could pay to retrospectively add a compound with a retrospective publication time), and validated in some way (so the timestamps are 'provable' in some way - there are now cool informatics approaches to this). It would need to contain all previously 'published' compound structures, and have great internal provenance tracking (so where and when was the original source of this structure). It would also need to be big, probably of the order of a billion or so structures - this alone would make such a system out of reach for the vast majority of organisations. Of course, I am not considering Markush structures in this discussion - good luck with reliably enumerating these from patents, for the moment at least, but eventually you'll need to consider these as well!
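To make the 'provable timestamp' idea a bit more concrete, here is a minimal sketch of one possible approach: an append-only log in which each entry carries a hash of the previous one, so any retrospective edit breaks the chain. This is only an illustration, not how any existing resource works. The `NoveltyLedger` class, its field names and the placeholder structure keys are all assumptions of mine; in practice the key would be some canonical structure identifier.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # assumed starting value for the hash chain


def _digest(body):
    """SHA-256 over a deterministic JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class NoveltyLedger:
    """Append-only, hash-chained log of first disclosures (illustrative only)."""
    entries: list = field(default_factory=list)
    first_seen: dict = field(default_factory=dict)

    def register(self, structure_key, source, date):
        """Record a structure once; later submissions return the original entry."""
        if structure_key in self.first_seen:
            return self.entries[self.first_seen[structure_key]]
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"key": structure_key, "source": source, "date": date, "prev": prev_hash}
        entry = dict(body, hash=_digest(body))
        self.first_seen[structure_key] = len(self.entries)
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True only if no entry has been altered or reordered."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("key", "source", "date", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this only shows that the log has not been rewritten since it was built; it says nothing about whether the dates fed into it were honest in the first place, which is where the 'trusted third party' comes in.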
Lurking here also are the GDB databases, the elephant in the room, which in a few years could make the discussion of chemical novelty moot.
Remember that nice fuzzy feeling you had a few paragraphs back? It's gone, hasn't it? Welcome to the real, painful world! UniChem has some elements of such a novelty checking system - at least it's possible to establish a snapshot of its local chemical world at some arbitrary time point in the past, according to its own reference frame. Since you are only interested in identical structures, it can already do the required searches - no need for Tanimoto, substructures etc. for this particular use case. Maybe it needs some work on scalability, maybe it needs some work on Proof of Knowledge, etc.; maybe. But it's an interesting place to start thinking. At some point in the future, there will be the ability for you to run your own local instance of UniChem, with regular feeds of SureChEMBL structures, merged PubChem structures, etc. It's interesting to pose the question: just how much of the exemplified chemical universe could be catered for with reasonable investment in this problem?
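As a rough illustration of the 'snapshot at an arbitrary time point' idea, the exact-match check boils down to something like the sketch below. This is not UniChem's API or schema; the record layout, the source names and the structure keys are all hypothetical, and the only assumption is that each source records the date it first saw a given structure.

```python
from datetime import date

# Hypothetical records: (structure key, source name, date first seen in that source).
RECORDS = [
    ("KEY-A", "source-1", date(2009, 3, 1)),
    ("KEY-A", "source-2", date(2011, 7, 15)),
    ("KEY-B", "source-2", date(2013, 1, 10)),
]


def prior_disclosures(records, structure_key, as_of):
    """Return (source, date) pairs that disclosed the exact structure before as_of."""
    hits = [(source, seen) for key, source, seen in records
            if key == structure_key and seen < as_of]
    return sorted(hits, key=lambda hit: hit[1])  # earliest disclosure first


def is_novel(records, structure_key, as_of):
    """A structure is novel at as_of if no source had it before that date."""
    return not prior_disclosures(records, structure_key, as_of)
```

For a filing-date check you would call `is_novel(RECORDS, key, filing_date)`; the hard parts (tautomers, salts, and the provenance of the dates themselves) are all hidden in how the key and the dates are produced.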
For me, there are a lot of really interesting and deep technical challenges here, but also the potential to radically change the cost structure of chemical (and specifically drug) invention, freeing more investment for the discovery process itself (yeah, I am an idealist).
The picture above is of a novel, written by one of my oldest friends (literally!). He writes under a pseudonym but in the interests of attribution, his ORCID is 0000-0001-5528-0087. It would make an excellent holiday read.
The results of the Teach-Discover-Treat (TDT) 2014 challenge were out earlier this week. TDT is an initiative to provide high quality computational chemistry tutorials that impact education and drug discovery for neglected diseases with a special focus on freely available software tools and reproducibility.
We are very happy to announce that Rodrigo Ochoa, a summer intern in the group two years ago, won second place (KNIME award) at the TDT challenge.
Rodrigo’s entry was based on myChEMBL (Open Innovation track) and contains several KNIME and IPython Notebook tutorials within an NTD computational research setting.
ChEMBL Rita just knocked this up using Circos and ChEMBL - I am a great boss, and regularly think of things to provide her with therapeutic breaks (pun intended) from her thesis writing.
The graph shows the ATC classes - so A = alimentary, B = blood, C = cardiovascular etc., and specifically the fourth level of the ATC classification. The colouring of each segment is set by the colour of the most recently approved drug within that class. Of course collapsing over the various ATC levels, linkage to target, etc. is also pretty straightforward using the ChEMBL data. A couple of things stand out - the maturity of the cardiovascular drug area C, and by contrast the recent innovation in the oncology and immunology areas L.
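For anyone wanting to try the collapsing step themselves, here is a minimal sketch of the grouping. It relies on the fact that ATC codes are hierarchical by prefix (level 1 is one character, level 2 three, level 3 four, level 4 five, and the full level 5 code seven). The rows below are invented placeholders that only follow the ATC code pattern; they are not real drugs, codes or approval years, and this is not the script used to draw the figure.

```python
from collections import defaultdict

# Hypothetical (drug, ATC level-5 code, approval year) rows; all values are invented.
DRUGS = [
    ("drug-1", "C90XA01", 1978),
    ("drug-2", "C90XA02", 1994),
    ("drug-3", "L90XB01", 2001),
    ("drug-4", "L90XB02", 2009),
    ("drug-5", "A90XC01", 1995),
]

# Number of leading characters that identify each ATC level.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def latest_approval_by_class(rows, level=4):
    """Map each ATC class at the given level to its most recent approval year."""
    n = ATC_PREFIX_LENGTH[level]
    latest = defaultdict(int)
    for _name, atc_code, year in rows:
        prefix = atc_code[:n]
        latest[prefix] = max(latest[prefix], year)
    return dict(latest)
```

Calling `latest_approval_by_class(DRUGS, level=1)` collapses everything to the top of the hierarchy; the resulting year per class is what would then be mapped onto a colour scale.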
More on this, with some refined views on the way soon....
Update - well here is a better view - a lot clearer. Rita - get back to the thesis writing, right now!!
Another update - thanks for the comments on this - here's another one, at the top of the ATC hierarchy. This time the new approvals in oncology/immunology as a class stand out more, as do the recent lack of progress and the small number of drugs available in the anti-parasitic class.
We got involved in the analysis of a really interesting GWAS/metabolomics study, with a publication just appearing in Nature Genetics. A link to the paper is here.
Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism thus far, comprising 7,824 adult individuals from 2 European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity with more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information on gene expression, heritability and overlap with known loci for complex disorders, inborn errors of metabolism and pharmacological targets. We further developed a database and web-based resources for data mining and results visualization. Our findings provide new insights into the role of inherited variation in blood metabolic diversity and identify potential new opportunities for drug development and for understanding disease.
%A Shin, So-Youn
%A Fauman, Eric B
%A Petersen, Ann-Kristin
%A Krumsiek, Jan
%A Santos, Rita
%A Huang, Jie
%A Arnold, Matthias
%A Erte, Idil
%A Forgetta, Vincenzo
%A Yang, Tsun-Po
%A Walter, Klaudia
%A Menni, Cristina
%A Chen, Lu
%A Vasquez, Louella
%A Valdes, Ana M
%A Hyde, Craig L
%A Wang, Vicky
%A Ziemek, Daniel
%A Roberts, Phoebe
%A Xi, Li
%A Grundberg, Elin
%A The Multiple Tissue Human Expression Resource (MuTHER) Consortium
%A Waldenberger, Melanie
%A Richards, J Brent
%A Mohney, Robert P
%A Milburn, Michael V
%A John, Sally L
%A Trimmer, Jeff
%A Theis, Fabian J
%A Overington, John P
%A Suhre, Karsten
%A Brosnan, M Julia
%A Gieger, Christian
%A Kastenmuller, Gabi
%A Spector, Tim D
%A Soranzo, Nicole
%T An atlas of genetic influences on human blood metabolites
%J Nat Genet
%O http://dx.doi.org/10.1038/ng.2982
There's an Open Access opinion paper out in J. Chem. Biol. on the potential application to the agrochemical area of joint chemo/bio QSAR techniques, which have historically been developed for, and applied primarily to, human pharmaceuticals.
Later in the summer of 2014, there is some very exciting news on agrochemical data for ChEMBL!
%0 Journal Article
%D 2014
%@ 1864-6158
%J Journal of Chemical Biology
%R 10.1007/s12154-014-0112-2
%T Towards predictive resistance models for agrochemicals by combining chemical and protein similarity via proteochemometric modelling
%U http://dx.doi.org/10.1007/s12154-014-0112-2
%I Springer Berlin Heidelberg
%8 2014-05-15
%K Polypharmacology
%K Cheminformatics
%K Machine learning
%K Resistance
%A Westen, Gerard J.P.
%A Bender, Andreas
%A Overington, John P.
%P 1-5
%G English
On April 23rd 2014 the FDA approved Siltuximab (Sylvant™) for the treatment of patients with multicentric Castleman’s disease (MCD) who are human immunodeficiency virus (HIV)-negative and human herpes virus-8 (HHV-8)-negative.
Castleman disease (also known as giant lymph node hyperplasia, lymphoid hamartoma, or angiofollicular lymph node hyperplasia) is an abnormal non-cancerous growth of the lymph node that can resemble lymphomas. Hyperproliferation of cytokine-producing lymphocytes contributes to it. Castleman disease can be unicentric (involving a single lymph node) or multicentric (systemic). Siltuximab is approved for the multicentric disease. Overproduction of IL-6 has been linked to systemic manifestations in patients with MCD.
Siltuximab is a chimeric (human and mouse) anti-IL-6 antibody. It binds human IL-6, thus preventing the binding of IL-6 to both soluble and membrane-bound IL-6 receptors.
The target, Interleukin-6 (IL6; Uniprot = P05231; ChEMBL = CHEMBL1795129; canSAR = P05231), is a pro-inflammatory cytokine produced by T-lymphocytes and macrophages in response to infection or trauma.
Siltuximab is produced in Chinese hamster ovary (CHO) cells and administered as an 11 mg/kg dose given over 1 hour by intravenous infusion every 3 weeks. The maximum serum concentration (Cmax) was observed close to the end of infusion. At steady state, the serum mean Cmax value is 332 mcg/mL (42% CV), and the serum mean predose trough value is 84 mcg/mL (78% CV). The mean terminal half-life (t1/2) in patients after the first intravenous infusion of 11 mg/kg is 20.6 days, and clearance is 0.23 L/day (51% CV).
On April 21, 2014 the FDA approved Ramucirumab (Cyramza™) for the treatment of patients with advanced or metastatic, gastric or gastroesophageal junction (GEJ) adenocarcinoma with disease progression on or after prior treatment with fluoropyrimidine- or platinum-containing chemotherapy.
Gastric cancer has a very poor prognosis, with adenocarcinomas constituting ~95% of all gastric cancers (CRUK).
In a randomized, double-blind, multicenter study of ramucirumab plus best supportive care (BSC) compared with placebo plus BSC in 355 patients with locally advanced or metastatic gastric cancer (including adenocarcinoma of the gastro-esophageal junction [GEJ]), ramucirumab improved overall survival to a median of 5.2 months, compared to 3.8 months in the placebo arm. Progression-free survival (PFS) was improved from a median of 1.3 months in the placebo arm to 2.1 months in the ramucirumab arm.
Cyramza has been issued a boxed warning because of increases in the risk of hemorrhage, including severe and sometimes fatal hemorrhagic events.
The structure of extracellular domains 2 and 3 of KDR (VEGFR2) in blue in complex with its ligand VEGFC in green. PDB=2x1x
The target of ramucirumab is the extracellular ligand-binding domain of the receptor tyrosine kinase, Vascular endothelial growth factor receptor 2 (KDR, also known as VEGFR2; Uniprot = P35968; ChEMBL = CHEMBL279; canSAR = P35968). Ramucirumab specifically binds to KDR (VEGFR2), thus preventing the binding of its ligands VEGF-A, VEGF-C, and VEGF-D. This blockade inhibits ligand-stimulated activation of VEGF Receptor 2 and, consequently, inhibits ligand-induced proliferation and migration of human endothelial cells. Elevated expression of the ligands has been shown to be clinically correlated with survival and with metastasis in gastric cancers.
Ramucirumab is administered intravenously at 8 mg/kg every 2 weeks. The mean minimum concentrations (Cmin) were 50 μg/mL (6-228 μg/mL) after the third dose and 74 μg/mL (14-234 μg/mL) after the sixth dose.
Last year Louisa Bellis toured the UK to present on SMSdrug.net and ChEMBL. The tour was well received, and a large number of people who had not heard of ChEMBL were introduced to it.
Following the success of the previous tour, Gerard van Westen (EMBL-EBI) is going to be doing a 2014 ChEMBL tour. This year’s tour will be going to the BeNeLux. For dates and locations see below. Please feel free to attend and meet up / chat about ChEMBL; Gerard is contactable via email on gerardvw [at] ebi.ac.uk
Current dates are as follows:
19th of May – Maastricht University
Host: Egon Willighagen
Time: 12.00-17.00 ChEMBL + Allosteric modulators
20th of May – Maastricht University
Host: Jos Kleinjans
Time: 09.00 - 11.30 ChEMBL + Advanced ChEMBL
20th of May – KU Leuven
Host: Pieter Annaert
Time: 14.00 - 17.00 ChEMBL
21st of May – KU Leuven
Host: Piet Herdewyn
Time: 09.00 - 10.00 ChEMBL
21st of May – University of Luxembourg
Host: Reinhard Schneider
Time: 13.00 - 17.00 ChEMBL + Advanced ChEMBL
22nd of May – Universiteit Antwerpen
Host: Koen Augusteyns
Time: 15.00 - 17.00 ChEMBL
26th of May – VU University Amsterdam & Universiteit van Amsterdam
Hosts: Chris de Graaf & Willem Stiekema
Time: 10.00 - 15.00 ChEMBL + Allosteric modulators
28th of May – University of Groningen
Host: Alexander Domling
Time: 14.00 - 17.00 ChEMBL + Advanced ChEMBL
6th of June – Utrecht University
Host: Roland Pieters
Time: 09.00 - 12.00 ChEMBL
6th of June – Universiteit Leiden
Host: Ad IJzerman
Time: 15.30 - 17.00 ChEMBL
17th of June – Erasmus Medical Centre, Rotterdam
Host: Roland Kanaar
Time: 14.00 - 17.00 ChEMBL
18th of June – Radboud University Nijmegen
Host: Tina Ritschel
Time: 13.30 - 15.00 ChEMBL + Allosteric modulators