Is Drug Discovery Getting Harder?

Alongside all our scientific interests, we also like to think about some of the financial and business aspects of drug discovery: differing business models, discovery strategies, and so on. I guess these fall under the general tag Operational Research. The OHE is also a great place to browse for a very broad range of health economics issues and ideas. Here is a little toy analysis that may be of some interest.

As some background, probably everyone has seen those time-series graphs comparing discovery costs (from PhRMA, ABPI, etc.) and drug launches. A couple of things spring from this view: firstly, that it is really, really expensive to discover drugs, even with constant currency corrections, and secondly, that the cost per drug launched is inexorably rising.

Here is something a little bit different….

We took a simple list of all INNs, and for each of these, there was an associated date - the year in which the INN/USAN was approved. For background an INN/USAN (the ‘generic name’) is granted for a compound in clinical development when the applicant thinks there is a reasonable chance that the compound will be commercialised, i.e. it is a mark that the applicant is serious about the drug. Typically, but certainly not always, an INN/USAN is granted during phase II. This analysis is pretty easy to do; in fact, you can probably come up with this graph yourself with a soupçon of internet tomfoolery, so I won’t bother presenting that here - ‘an exercise for the reader’ as they say.

What we did next was to map the internal research code to those INNs/USANs. This code is something like LP-12345, where LP is an alphabetic code, by convention assigned to a given company. The number typically indicates the order of registration of the compound in the company's internal compound collection. So, between LP-12,678 and LP-32,129, there would have been 19,451 compounds made and registered. Some companies also use the convention of having a 'dash' then a final integer, and this is often associated with a particular salt (so LP-12,678-1 could be the hydrochloride salt). As an explicit example, Viagra (Tradename) is Sildenafil (USAN/INN) is UK-92480 (Research code). So this would have been the 92,480th compound registered in the compound collection at the UK labs of Pfizer (for more details on research codes and a table of company assignments see the indispensable Merck Index). Remember, we do not have access to the dates the compounds were made, just the link between the USAN date and the research code. The companies that make the compounds of course know the exact day the compound was synthesized and then registered. A further proxy/correlate of the discovery date would be the first time the compound appears in a patent, but this is not useful for a number of reasons that are not relevant for the following discussion.
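To make the research-code convention concrete, here is a minimal Python sketch of how such codes could be split into company prefix, registration number and optional salt suffix. The function names, the `ResearchCode` type and the regular expression are my own assumptions for illustration; real codes vary between companies and will not all fit this pattern.

```python
import re
from typing import NamedTuple, Optional


class ResearchCode(NamedTuple):
    prefix: str           # alphabetic company code, e.g. "UK" or "LP"
    number: int           # registration order in the compound collection
    salt: Optional[int]   # optional trailing integer, often a salt form


# Assumed pattern: letters, dash, digits (commas allowed), optional dash + integer.
_CODE_RE = re.compile(r"^([A-Za-z]+)-(\d[\d,]*)(?:-(\d+))?$")


def parse_research_code(code: str) -> ResearchCode:
    """Split a code such as 'LP-12,678-1' into its parts."""
    match = _CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised research code: {code!r}")
    prefix, number, salt = match.groups()
    return ResearchCode(
        prefix.upper(),
        int(number.replace(",", "")),
        int(salt) if salt is not None else None,
    )


def compounds_between(first: str, second: str) -> int:
    """Number of registrations separating two codes from the same company."""
    a = parse_research_code(first)
    b = parse_research_code(second)
    if a.prefix != b.prefix:
        raise ValueError("codes come from different numbering series")
    return abs(b.number - a.number)


print(parse_research_code("UK-92480"))
print(compounds_between("LP-12,678", "LP-32,129"))
```

The second call reproduces the subtraction in the text (32,129 minus 12,678). Comparing codes across different prefixes is rejected, since each company numbers its own collection independently.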

So what we can plot pretty easily is the USAN date against the order of synthesis/registration. This allows us to come up with some pretty solid measures of the number of compounds that had to be synthesised or purchased to reach each of two key milestones: getting an assigned USAN, and getting an Approved Drug. What would the graph look like? Do we need to make more compounds per drug output now than we used to (Oh, once more for the halcyon days of drug discovery!), and if there is a larger number of compounds per drug output, are we making a more rational selection from a larger pool, and so reducing downstream attrition? Ideally, one would aspire to making and testing a smaller number of compounds per drug (or, more practically, to a significantly reduced cost per compound made per drug).
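One simple way to turn the USAN date and registration number pairs into a "compounds per USAN" measure is to sort by registration number and take the gap between consecutive USAN compounds. The sketch below assumes records are `(usan_year, registration_number)` tuples for a single company; the function name and the numbers in the example are invented for illustration and are not the data behind this post.

```python
from typing import List, Tuple

Record = Tuple[int, int]  # (USAN year, registration number), one company only


def compounds_per_usan(records: List[Record]) -> List[Tuple[int, int]]:
    """For each USAN compound after the first, return its USAN year and the
    number of compounds registered since the previous USAN compound."""
    ordered = sorted(records, key=lambda r: r[1])  # order of registration
    gaps = []
    for (_, prev_number), (year, number) in zip(ordered, ordered[1:]):
        gaps.append((year, number - prev_number))
    return gaps


# Made-up example values, purely to show the shape of the output.
example = [(1985, 1000), (1999, 63000), (1988, 3000)]
for year, gap in compounds_per_usan(example):
    print(year, gap)
```

Plotting the gap against the USAN year (or the cumulative registration number against the USAN year) gives the kind of graph discussed here.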

As an aside, it would be great to have data on the number of research staff per year per company, but what with mergers, out-sourcing, in-licensing, and so on, I guess even the companies themselves would not have this data for more than the previous five or so years. If anyone does have reliable data on this (and they are free to share it), please contact me.

For one large multinational company (it does not matter which company it is, really) the graph looks like the picture above. I chose to plot here data from one large company due to the fact that there will exist differences in business rules on applying for USANs, compound registration and numbering conventions - it removes a bunch of variables. However, every company we have looked at is similar in its general pattern.

I think this is pretty interesting. The graph is clearly bi-phasic, with a break/inflection in 1997. This reflects a very material increase in the number of compounds per drug/USAN between the two parts of the graph, about 30-fold in fact. This means that to get a USAN, roughly thirty times more compounds were required. Remember there will be an offset between the USAN date and the synthesis date, anywhere between four and six years (typically). So what happened in ~1992 on the discovery side of things that changed the world? One potential interpretation of the data (which, unsurprisingly, I think is almost correct, and can stand quite a bit of challenge) is that molecular biology actually made things a lot worse for the industry. We (as an industry) were suddenly ‘target rich and context poor’ and relied on technology and larger resource levels to solve everything: more compounds, more targets, more screening, and so on. It really did make quite a lot of sense, back then. ‘Novelty’ became a very strong, seductive and compelling concept (since it allowed us better patent protection and commercial advantage, it allowed the scientists to do more interesting 'novel' research, and so forth). This ‘target richness’ was actually confounded by the fact that most targets were not tractable using the technology we had at that time (largely small molecules), i.e. they were not druggable, and that we may in fact have mapped out most of ‘pharmacologically modulatable space’ at a pathway/systems level by that time, but not known it.
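For anyone wanting to put a number on the break in their own data, a rough sketch is to compute the average registration-number gap per USAN on either side of a chosen break year and take the ratio. Everything here is an assumption for illustration: the function names, the choice of a simple average rather than a fitted piecewise line, and the synthetic numbers, which were picked only so that the ratio comes out at 30.

```python
from typing import List, Tuple

Record = Tuple[int, int]  # (USAN year, registration number), one company only


def average_gap(records: List[Record]) -> float:
    """Average number of registrations per USAN within one segment."""
    numbers = sorted(number for _, number in records)
    if len(numbers) < 2:
        raise ValueError("need at least two USAN compounds in the segment")
    return (numbers[-1] - numbers[0]) / (len(numbers) - 1)


def fold_change(records: List[Record], break_year: int) -> float:
    """Ratio of compounds-per-USAN after the break year to before it."""
    before = [r for r in records if r[0] < break_year]
    after = [r for r in records if r[0] >= break_year]
    return average_gap(after) / average_gap(before)


# Synthetic data, NOT the company data discussed in the post.
synthetic = [
    (1990, 10_000), (1992, 12_000), (1995, 14_000),      # 2,000 per USAN
    (1997, 100_000), (1999, 160_000), (2002, 220_000),   # 60,000 per USAN
]
print(fold_change(synthetic, break_year=1997))

# Shifting the break back by the typical 4-6 year USAN lag gives the
# approximate synthesis-era window.
print(1997 - 6, 1997 - 4)
```

Note that this ignores the jump in registration numbers across the break itself, so it compares the two regimes rather than describing the transition.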

Just before you say 'Ah-Ah' to yourself: another reason for this increase in compounds per unit output is that the large-scale expansion of compound collections occurred at just around that time. If that expansion was ever going to have an impact on productivity, though, it would have done so by now.

A further way to look at this compound per USAN data is below. Note that the fraction of compounds that make it from USAN stage to launch is roughly constant (about a quarter for this company) for these compound cohorts. This fraction of USAN to launch is an interesting analysis for another time, maybe.

My train is now arriving at the station, so I need to shut the lid of my laptop, and go. However, there are a number of assumptions in the above chain of argument; I will try to outline the ones we have considered in a later post.

As always, if anyone is interested in the underlying data, please contact us. (Blue Obelisk Rules!)