Drugs and Targets
‘HIV reverse transcriptase is the target of aciclovir’ – easy to say, and sort of
correct. It’s the sort of statement that, in the vernacular of drug discovery,
most people would accept without batting an eye. This sentence strikes at
the core of the concept of a target
(HIV reverse transcriptase) for a drug
(aciclovir). However, there is much detail under this simple statement that
captures some of the complexities of the representation and storage of
bioactivity data.
Aciclovir is an inhibitor of HIV replication, so it is
targeted to the virus itself – and indeed this can be a useful way of
thinking about the mechanism and effect of aciclovir (and all other
antiretroviral drugs). We know a lot about HIV replication and infection, of
which reverse transcription is an essential part; it is shared across all
retroviruses, and it is the process that aciclovir
blocks. Due to the intense research on this devastating pathogen, we know a lot
of detail about HIV (there was a striking paper in Nature on this a few years
ago), and this ‘systems-level’ information can also be represented in terms of a
network/pathway, in a resource like Reactome. Being able to tag this pathway
with a drug is a useful thing to do as well – but we are typically interested
in the more molecular and biochemical aspects of how a drug works – the
molecular basis of its action.
Firstly – HIV is the name of a family of viruses, HIV-1 and
HIV-2 being the major forms. Each of these can be further classified into
subtypes/strains, e.g. HIV-1A, and within each of these strains, it’s appropriate to
think of an infected person as containing a constantly changing ensemble of
sub-strains. The entire family is related in sequence, but the key point is
that the sequences differ - between HIV-1 and HIV-2 the differences are
relatively substantial, and between the particular pool of viruses within a
patient they are typically minor differences. So how should you store the
organism/target (and an associated particular sequence) for this case?
A comfort here is that, in most cases, it doesn’t really
matter – any sequence of a native HIV-1 virus is basically OK, since aciclovir
will probably usefully inhibit these,
and the affinity/potency differences will be negligible. In fact, aciclovir is
active as an inhibitor against both the HIV-1 and HIV-2 viruses. A big, big
exception though is for strains of virus that have been under selective
pressure following treatment with aciclovir – clinically resistant sequences
are rapidly selected, and here the most frequent sequence in an infected
individual will have significantly lower binding affinity for aciclovir.
Usually these differences are near the drug-binding site, but not always. So
for cases like this, it makes sense to try and store the sequence of the
resistant strain – but of course each drug will have its own ensemble of
resistant strains, and so it becomes complex. However, in order to understand
selectivity profiles and risks, the management of these differences is crucial
– as it is for intra-human sequence variation.
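One way to keep this manageable is to store each resistant strain as a set of point mutations against a reference sequence, rather than as a free-standing sequence. Here is a minimal sketch of that idea in Python. The reference sequence, the mutations and the names `Variant` and `apply_mutations` are all made up for illustration; none of them are real HIV-1 RT data or part of any existing database.

```python
import re
from dataclasses import dataclass

# Toy reference sequence for illustration only; NOT a real HIV-1 RT sequence.
REFERENCE = "MKVLATGDEW"


@dataclass(frozen=True)
class Variant:
    """A strain stored as point mutations relative to a reference sequence."""
    name: str
    mutations: tuple  # e.g. ("K2R",): wild-type residue, 1-based position, new residue


def apply_mutations(reference, mutations):
    """Return the sequence obtained by applying point mutations to the reference."""
    residues = list(reference)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the reference sequence")
        # Guard against applying a mutation to the wrong reference.
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical resistant strain carrying two made-up mutations.
resistant = Variant("hypothetical resistant strain", ("K2R", "T6A"))
variant_sequence = apply_mutations(REFERENCE, resistant.mutations)
```

Storing the differences rather than whole sequences makes it easy to ask which residues change for which drug, at the cost of having to agree on a reference sequence first.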
So, HIV-1 is a virus; it has a genome, and some sequences.
There are a number of genes within HIV-1 – XXXX of them – and the major ones are
env, gag, and pol (there are also a bunch of others, including tat, rev, vpr,
vif, nef, vpu and tev). These genes were named after the envelope,
group-specific antigen, and polymerase functions early in the study of HIV-1.
It turns out that the reverse transcriptase (RT) is part of the pol gene, and
the pol gene also encodes the integrase and proteinase (both also the ‘targets’
of clinically successful drugs). The key word here is ‘part of’.
RT is part of the pol gene – it requires cleavage from the
precursor polyprotein to become catalytically active (and to be inhibited by
aciclovir). The cleavage from the polyprotein is performed by a specific
proteinase encoded in the HIV-1 genome (called PR) – this proteolytic activity
is essential, and there is a class of drugs targeted against HIV-1 PR. So the
gene sequence itself doesn’t contain all the information to capture the
functional activity of RT – you need to know the sequence of the mature
protein.
It’s a little bit more complex than that though – the
functional RT is actually an obligate dimer of two RT sequences – and, more
complicated still, it isn’t a homodimer (two identical chains)
but a heterodimer made up of two chains of different length, called p66 and p51
(the numbers refer to the approximate sizes of the proteins from early gel
experiments).
So, we’re getting there, slowly. ‘The p51/p66 RT heterodimer of HIV-1A is the target of aciclovir’
is better.
Of course, in an ideal database, we’d need to be able to
store this target information in a usable form, that can then be generalised to
new systems. This isn’t just some nerdery, this detailed representation is
essential for things like docking, understanding the consequences of mutations,
etc.
We know the 3-D structure of the mature dimeric form of
HIV-1 RT and it is in fact composed of a series of distinct structural domains,
and ligand binding is often associated with binding to a specific domain within
these multidomain sequences. So storing the ligand binding domain(s) is a
useful thing too, if you want to be able to generalise the observations across
new data.
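To make the target side concrete, here is a rough sketch of how the mature heterodimer might be represented: two chains cleaved from the same precursor, assembled into a complex, each carrying its own list of structural domains. The class names `Chain` and `ProteinComplex` are hypothetical, and the domain labels are the commonly used names for the RT subdomains, included for illustration rather than taken from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """A mature protein chain produced by cleavage of a precursor polyprotein."""
    name: str
    precursor: str
    domains: tuple = ()  # structural domain labels (illustrative)


@dataclass(frozen=True)
class ProteinComplex:
    """A functional target made up of one or more mature chains."""
    name: str
    organism: str
    chains: tuple

    def is_heterodimer(self):
        """True if the complex has exactly two chains that differ from each other."""
        names = [chain.name for chain in self.chains]
        return len(names) == 2 and names[0] != names[1]

    def chains_with_domain(self, domain):
        """Names of the chains that contain the given structural domain."""
        return [chain.name for chain in self.chains if domain in chain.domains]


# Both chains come from the same pol precursor; the shorter one lacks a domain.
p66 = Chain("p66", "pol polyprotein", ("fingers", "palm", "thumb", "connection", "RNase H"))
p51 = Chain("p51", "pol polyprotein", ("fingers", "palm", "thumb", "connection"))
rt = ProteinComplex("reverse transcriptase", "HIV-1", (p66, p51))
```

The point of separating chains from the complex is that the gene, the mature chain, the assembled complex and the domain are all things a ligand observation could be attached to.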
Enough of the target for now!
Now, let's think about the drug for a moment – aciclovir –
an old drug, rescued from its original application as a potential anti-cancer
agent and repurposed as an anti-viral. Is aciclovir an inhibitor of this functional heterodimer?
No. It isn’t.
What is an inhibitor though is an active metabolite of
aciclovir – specifically the triphosphate form. Aciclovir is an example of a
prodrug – inactive (against its efficacy target) in the dosed form, and
requiring specific metabolic events to occur before it is active against its
target.
‘The p51/p66 RT heterodimer of HIV-1A is the target of the active metabolite of
aciclovir’ is getting there.
More nerdery, you cry – well, no. If you wanted to discover
computationally that aciclovir was useful as a drug for HIV – you’d need to
know (or store) the active triphosphate form (there are also some intermediate
forms on the way to the triphosphate that should probably be considered too).
Of course, the body also ‘sees’ the originally dosed aciclovir, so you may want
to store that too, dock it to host proteins for side effects, etc.
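The dosed-form/active-form distinction can be sketched as a simple chain of linked compound records. This is only an illustration: the `Compound` class, the `metabolite_of` and `active_at_target` fields and the `metabolic_path` function are hypothetical names, and the monophosphate and diphosphate entries are my assumption about what the intermediate forms would be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Compound:
    """A molecular form of a drug, optionally linked to the form it derives from."""
    name: str
    metabolite_of: Optional["Compound"] = None
    active_at_target: bool = False  # active against the efficacy target?


def metabolic_path(compound):
    """Return compound names from the dosed form through to the given compound."""
    path = []
    current = compound
    while current is not None:
        path.append(current.name)
        current = current.metabolite_of
    return list(reversed(path))


# Assumed intermediates (mono- and diphosphate) on the way to the triphosphate.
dosed = Compound("aciclovir")
monophosphate = Compound("aciclovir monophosphate", metabolite_of=dosed)
diphosphate = Compound("aciclovir diphosphate", metabolite_of=monophosphate)
triphosphate = Compound("aciclovir triphosphate", metabolite_of=diphosphate, active_at_target=True)

all_forms = [dosed, monophosphate, diphosphate, triphosphate]
# Forms to dock against the efficacy target, versus the form the body is dosed with.
active_forms = [form.name for form in all_forms if form.active_at_target]
```

Keeping every form as its own record means you can dock the triphosphate against the viral target and the dosed form against host proteins, without losing the link between them.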
At this stage we’ve probably got a detailed enough
representation of the drug-target complex to allow us to do some reliable and
useful things with the data.
It is worth going to a higher level of detail though, since
it illustrates another important point.
Aciclovir triphosphate binds in a specific binding site of
HIV-1 RT, at the catalytic site – this is definitively known from enzymatic and
structural studies. Since aciclovir is a nucleoside analogue, this site is
known as the nucleoside site. Sequence changes around this nucleoside site can
rapidly be selected for, giving rise to resistant variants. Knowing where the
drug binds can aid sequence/resistance analysis and 3-D
modelling, and also helps in docking experiments, since it’s possible to focus
studies on a known functional site.
There’s a second class of drug, NNRTIs – non-nucleoside
reverse transcriptase inhibitors. Prototypical of these is efavirenz. These are
very different in chemical structure to nucleoside analogues, and in fact bind
at a different site – an ‘allosteric’ site, that isn’t formed until the ligand
binds. Resistance can and does arise for this class of inhibitor too, but because
the drug binds at a different site, a different constellation of residues is
involved in resistance. Interestingly, this site doesn’t exist at all in the
closely related HIV-2 enzyme, and so
NNRTIs are essentially inactive against HIV-2.
So this site is allosteric – what does this mean? Well,
since the structure of the protein varies during ligand binding, it is
important to keep track of these different possible conformational states –
essential if one wants to do docking, etc.
At the tip of this target taxonomy we have to think about a particular
conformational state of a protein.
So there are two target sites in HIV RT, the nucleoside and
the NNRTI site, so perhaps we should state….
‘The nucleoside binding site of the p51/p66 RT heterodimer of HIV-1A is the
target of the active metabolite of aciclovir’
Another way to think about this is as a hierarchy.
Aciclovir triphosphate is a....
- Retrovirus replication inhibitor
  - HIV replication inhibitor
    - HIV-1A replication inhibitor
      - Reverse transcriptase inhibitor
        - p51/p66 RT inhibitor
          - nucleoside site binder
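The hierarchy above can be sketched as a small parent-lookup table, so that an annotation at the most specific level automatically implies all the more general ones. This is just an illustration of the idea: the `PARENT` table and the `lineage` and `is_a` functions are hypothetical, and I have added the NNRTI site as a sibling of the nucleoside site to show why a tree is useful here.

```python
# Each term maps to its more general parent; the root has no entry.
PARENT = {
    "HIV replication inhibitor": "Retrovirus replication inhibitor",
    "HIV-1A replication inhibitor": "HIV replication inhibitor",
    "Reverse transcriptase inhibitor": "HIV-1A replication inhibitor",
    "p51/p66 RT inhibitor": "Reverse transcriptase inhibitor",
    "Nucleoside site binder": "p51/p66 RT inhibitor",
    "NNRTI site binder": "p51/p66 RT inhibitor",  # sibling site on the same target
}


def lineage(term):
    """Return the terms from the most general level down to the given term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def is_a(term, ancestor):
    """True if `term` is the same as, or more specific than, `ancestor`."""
    return ancestor in lineage(term)


nucleoside_lineage = lineage("Nucleoside site binder")
```

With something like this, a query for ‘reverse transcriptase inhibitors’ would pick up both nucleoside site and NNRTI site binders, while a query for one site would not return the other.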
Imagine for a second that ‘HIV-3’ is sequenced, and we need
a new drug quickly – we can sequence the genome pretty quickly and cheaply
nowadays, but hopefully the complexity above will show that the transformation
from the gene sequence to a useful object to be analysed as a target is a
complex one, requiring a lot of tacit knowledge of the particular system.
Don’t worry, not everything is as complicated as this
example, and it is one of my favourites, since there are so many twists and
turns in this particular case. But you must, you simply must, now be wondering
how we currently do, and in the future will, cope with this sort of thing in
ChEMBL. Well – that will be the subject of a future post!
If there’s interest, I can add references and some
background links to this post – let me know in the comments if you’d be
interested.