• Webinar on Drug Targets - 27th March 2014


    I'm giving a webinar on Drug Targets and Drug Targeting at 2-3 pm EDT on Thursday March 27th 2014. Please note that Europe and the US will not have aligned their daylight saving times by then, so the time difference will be 4 hours for the UK and Portugal (6pm GMT/WET), 5 hours for most of the rest of western/central Europe (7pm CET), and 6 hours for Finland and Eastern Europe (8pm EET). I plan to cover quite a lot of ground, with quite a lot of new material and analyses.
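
    For anyone who wants to double-check the start times, here is a minimal sketch of the conversion. The UTC offsets are hard-coded assumptions for 27 March 2014 (US already on daylight saving time, Europe still on winter time), and the helper name is purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets in force on 27 March 2014 (hard-coded for illustration):
# the US had already switched to daylight saving time, Europe had not.
EDT = timezone(timedelta(hours=-4), "EDT")
EUROPEAN_ZONES = {
    "GMT/WET": timezone(timedelta(hours=0), "GMT/WET"),
    "CET": timezone(timedelta(hours=1), "CET"),
    "EET": timezone(timedelta(hours=2), "EET"),
}


def local_start_times(start):
    """Return the start time in each European zone as an 'HH:MM' string."""
    return {
        name: start.astimezone(tz).strftime("%H:%M")
        for name, tz in EUROPEAN_ZONES.items()
    }


webinar_start = datetime(2014, 3, 27, 14, 0, tzinfo=EDT)
for zone, clock in local_start_times(webinar_start).items():
    print(zone, clock)
```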

    Registration is free at this link: http://acswebinars.org/drug-discovery, and the slides will be available on the same site after the meeting. Well done ACS!!

    The next in the series, on lead discovery on Thursday 24th April, is by our great friend and collaborator Tudor Oprea, so put that in your diary too.

    jpo


  • Unpacking a GPU computation server...Leviathan unleashed


    What / why?
    As you might know, EMBL-EBI has a very powerful cluster. Yet some time ago we were running into some limitations and were pondering how great it would be to be able to run more concurrent threads on a single machine (avoiding the bottleneck that inevitably appears on the network for some jobs).

    It turns out there is an answer, namely in the form of a GPU (graphics processing unit). This is the same type of chip that creates 3D graphics for games in your home PC / laptop. While the individual calculation cores on a GPU are relatively limited compared to those of a CPU, a GPU has a massive number of them, which is what lets it generate 3D environments at 60 frames per second. Schematically it looks like this (CPU left, GPU right):


    As you can see, the CPU can handle 8 threads concurrently, whereas the GPU can handle 2880 (see also this great YouTube video by the MythBusters). We have all kinds of ideas for calculations we want to run on the GPUs (which have been shown to work well in MD), but first ... the geek tradition that is unboxing!
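
    To make the "many small threads" idea concrete, here is a minimal sketch in plain Python that emulates a CUDA-style launch sequentially. It only illustrates how each thread derives the index of the single element it works on; it is not real GPU code, and the function names are hypothetical.

```python
def launch(kernel, n_blocks, threads_per_block, *args):
    """Sequentially emulate a CUDA-style launch: one kernel call per (block, thread)."""
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, thread_idx, threads_per_block, *args)


def add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Each emulated thread handles exactly one element of the output."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the grid may be larger than the data
        out[i] = a[i] + b[i]


def vector_add(a, b, threads_per_block=256):
    """Add two equal-length lists using the emulated grid of threads."""
    n = len(a)
    out = [0] * n
    n_blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    launch(add_kernel, n_blocks, threads_per_block, a, b, out)
    return out


print(vector_add([1, 2, 3], [10, 20, 30], threads_per_block=2))
```

    On a real GPU the two loops in `launch` disappear: the hardware runs the kernel for all (block, thread) pairs concurrently, which is where the speed-up comes from.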

    Nvidia 
    The guys at Nvidia were very generous and provided us with 5 GPUs (thanks to Mark Berger and Timothy Lanfaer). Tim was also very quick with technical questions concerning the hardware specs needed and software troubleshooting. Thanks again!



    Unpacking
    At EMBL-EBI people typically work with laptops or thin clients, and the cluster consists of blades, so there was no place to put our GPUs. Yet, after a quick investigation we had a list of hardware we wanted, and a big box was delivered two weeks ago!



    Time to unpack...


    So after opening the box and removing the hardware, we had a tower / 4U rackmountable chassis:




    Next up, placement of the GPUs inside the chassis:




    Some tinkering was in order:


    And finally we could boot and install the OS. We chose Ubuntu 12.04 LTS because of its stability and the availability of many packages (with source code).




    Leviathan?
    Just one question remains, why 'Leviathan'? 

    Given the availability of Python-based CUDA packages, we will probably start there. Hence our server will be a very powerful incarnation of Python, and what's more awe-inspiring than the Leviathan?

    CUDA running
    After some trouble getting the drivers to work (we use Ubuntu 12.04 LTS), Michal got everything up and running!



    Potential projects
    Some of the projects we will be starting with are CUDA-based random forests, similarity matrix calculations, and compound clustering. If you have a good idea and would like to collaborate and co-publish, please contact us via email!
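
    As a rough illustration of why a similarity matrix suits a GPU, here is a CPU-only sketch in plain Python. It assumes Tanimoto similarity on fingerprints stored as sets of 'on' bit positions, which is a choice made for this example, not necessarily what we will run; the fingerprints are toy data.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union


def similarity_matrix(fingerprints):
    """Build the full symmetric matrix; every cell is independent of the others."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Toy fingerprints, invented purely for illustration.
toy_fingerprints = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
for row in similarity_matrix(toy_fingerprints):
    print(["%.2f" % value for value in row])
```

    Because each cell depends only on its own pair of fingerprints, the n x n cells can in principle be computed by separate GPU threads, while the number of pairs grows quadratically with the number of compounds.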

    Specs
    The server contains the following hardware:
    Case: Supermicro GPU tower/4U server 
    PSU: 1,620W Redundant PSU
    CPUs: 2*Intel Xeon E5-2603 1.8GHz 4core 
    RAM:  8*8GB Reg ECC DDR3 1600MHz 
    Disk: 1*2TB 3.5” SATA HDD
    GPUs: 1*Tesla K40; 2*Tesla K20 (one extra to be added later)

    Michal & Gerard

  • ChEMBL and Handling of Retracted Papers


    There is much attention paid to retracted data and errors in the literature, and also to resources that use the literature to build knowledge on top of published papers (for example ChEMBL). Sometimes there is a deliberate intent to deceive, and other times an accident in data processing and interpretation. The Retraction Watch blog is great reading on a long train journey if you want to see some of the pre-formal retraction discussion. With advances in text mining (in the broadest sense, so including images as text, etc.) and with more publications becoming Open Access, it is easier to find and flag these errors; for example see some of the pioneering work and ideas of Peter Murray-Rust. We find errors and inconsistencies in the literature really frequently - units that don't make sense, an end point inconsistent with the reported assay, etc. We either fix or flag these sorts of inconsistencies when we curate data. In general, science is pretty robust to these errors, and most errors, to be frank, have little impact in many/most realistic applications of the data (and consequently on literature-derived data sources). What we don't do is contact the editors of the journal or the original authors - and maybe this is something we should start doing.

    Given that we are now running SureChEMBL, which is completely automated in its operation, we are thinking carefully about errors, and how to flag or mark them in some way (for example in the accuracy of text-extracted chemical structures) - I think research into the processing and filtering of such 'big data' is going to be a very active and important field in the near future - and is core to the reproducibility of analyses. I've looked a little at some cases where ChEMBL structures are the odd one out compared to other public chemistry resources - sometimes we've been wrong, and by comparison with other sources, we've then fixed things - sometimes though, we're right and the rest of the community is 'wrong'. For me this is the way that resources like ChEMBL improve, by verifying the data we hold in whatever reasonable way is available to us. Using simple consensus or voting approaches to validate data is often right, and often wrong - the most insidious case is where wrong data is propagated without provenance, and this is especially problematic in integration resources which merge data from many sources. I have a draft blog post on some of the analyses I've done, but this is currently unfinished work, and I will contact the other data providers first to feed back the differences.

    There is one particular type of error though that can be captured semi-automatically, and then included - formal retractions of papers. The PubMed search above (in the picture) shows the retractions recorded for J. Med. Chem. - a small number, you'd agree. It's then straightforward to identify the source papers and flag these in some way. Based on what I've found so far, the issue with the literature extracted in ChEMBL is very minor, but still important if you are basing work on analyses that rely on these particular data.
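
    A search of this kind could be scripted against the NCBI E-utilities `esearch` endpoint. The sketch below is only one way to do it: the exact query term is an assumption (the search in the picture may have been phrased differently), and the function names are hypothetical. Only the URL-building part runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def retraction_query_url(journal, retmax=100):
    """Build an esearch URL for retracted publications in one journal.

    The query term is an assumption made for this sketch, not necessarily
    the search that was used for the post.
    """
    term = f'"{journal}"[Journal] AND "Retracted Publication"[Publication Type]'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_retracted_pmids(journal):
    """Run the search (needs network access) and return the matching PubMed IDs."""
    with urlopen(retraction_query_url(journal)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


print(retraction_query_url("J Med Chem"))
```

    The returned PubMed IDs could then be matched against the document identifiers held for each source paper in order to flag the affected records.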

    We're still deciding what to do in ChEMBL, but when we've settled on exactly what to do, we'll process the data to correct for these retractions and corrections.

    I must acknowledge the fantastic Laura Furlong at IMIM for help with this problem - Laura responded to a twitter post I wrote asking the question of how to link retractions with original papers - so social media does work in science.

  • Software that phones home: Good or bad?


    This is something that's been bugging me for a few days now - probably just triggered by reading all the recent disclosures of NSA/GCHQ surveillance, and trust in software systems in general. The basic issue I'm thinking about is when and where is it 'right' for software to 'phone home'?

    This idea of checking in with base is sometimes a good thing - for example, if I fire up a program and get a little box that tells me a new version is available, then that's a good thing. Or if my computer or phone is stolen, then calling in to let me know where in the world it is, is a good thing. It is probably also a good thing if you are a software vendor, and you want to ensure that your software hasn't been pirated, or run outside of the parameters for which it is properly licensed. For the latter case, it may even be a good idea to encrypt the message pinged back, to prevent l33t hax0rs suppressing license compliance mechanisms.

    But the privacy issues of this sort of thing are very big, especially if you, as a user, don't know it's being done, or if you don't know what is being sent back to base. I have only ever come across one software license (in this case from a commercial vendor) that discusses this (in the context of the licensee not suppressing this communication in any way, as a means of ensuring license compliance - not addressing at all what is sent back - if it's my source IP address and a timestamp, fine; if it is a dump of all my queries, I'd be furious).

    Of course, it's possible to control or spot this sort of activity, and I've just installed Radio Silence as a quick way of seeing if any of my desktop apps do anything behind the scenes I don't know about.

    But, in general, are there any community expectations and standards for this sort of thing, especially for cases where the software will be used explicitly to generate trade secrets and perform confidential research?

  • USANS - February 2014

    Just catching up on some recently published USAN statements.

    USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic class | Target
    asvasiran-sodium | ALN-RSV01 | n/a | RNAi | therapeutic | n/a
    beclabuvir | BMS-791325 | ZTTKEBYSXUCBSE-VSBZUFFNSA-N | synthetic small molecule | therapeutic | HCV NS5B polymerase
    benzhydrocodone, benzhydrocodone-hydrochloride | KP-201 | VPMRSLWWUXNYRY-PJCFOSJUSA-N | natural product derived small molecule | therapeutic | Opioid receptors
    bradanicline, bradanicline-hydrochloride | TC-5619 | OXKRFEWMSWPKKV-GHTZIAJQSA-N | synthetic small molecule | therapeutic | alpha-7 nicotinic acetylcholine receptor
    briciclib, briciclib-sodium | ON-014185 | LXENKEWVEVKKGV-BQYQJAHWSA-N | synthetic small molecule | therapeutic | n/a
    ceritinib | NVP-LDK378-NX | VERWOWGGCGHDQE-UHFFFAOYSA-N | synthetic small molecule | therapeutic | ALK
    dasabuvir | ABT-333 | NBRBXGKOEOGLOI-UHFFFAOYSA-N | synthetic small molecule | therapeutic | HCV NS5B polymerase
    defactinib, defactinib-hydrochloride | VS-6063 | FWLMVFUGMHIOAA-UHFFFAOYSA-N | synthetic small molecule | therapeutic | FAK
    dianhydrogalactitol | VAL-083, NSC-1323313 | AAFJXZWCNVJTMK-UHFFFAOYSA-N | synthetic small molecule | therapeutic | DNA
    dinutuximab | n/a | n/a | monoclonal antibody | therapeutic | GD2
    diridavumab | CR-6261 | n/a | monoclonal antibody | therapeutic | haemagglutinin
    encenicline, encenicline-hydrochloride | EVP-6124 | SSRDSYXGYPJKRR-ZDUSSCGKSA-N | synthetic small molecule | therapeutic | alpha-7 nicotinic acetylcholine receptor
    esuberaprost, esuberaprost-sodium | APS-314d, BPS-314d | CTPOHARTNNSRSR-NOQAJONNSA-N | synthetic small molecule | therapeutic | IP1 receptor
    filociclovir | MBX-400 | KMUNHOKTIVSFRA-KXFIGUGUSA-N | synthetic small molecule | therapeutic | CMV DNA polymerase
    fosdagrocorat | PF-04171327 | n/a | synthetic small molecule | therapeutic | GR
    gedatolisib | PF-05212384, PKI-587 | DWZAEMINVBZMHQ-UHFFFAOYSA-N | synthetic small molecule | therapeutic | PI3K & mTOR
    glasdegib | PF-04449913 | SFNSLLSYNZWZQG-VQIMIIECSA-N | synthetic small molecule | therapeutic | smoothened
    indoximod | D-1MT | n/a | natural product derived small molecule | therapeutic | IDO
    latiglutenase | ALV-003 | n/a | enzyme | therapeutic | n/a
    lulizumab-pegol | BMS-931699 | n/a | monoclonal antibody | therapeutic | CD28
    ombitasvir | ABT-267 | PIDFDZJZLOTZTM-KHVQSSSXSA-N | synthetic small molecule | therapeutic | HCV NS5a
    omega-3-carboxylic-acids | n/a | n/a | natural product derived small molecule | therapeutic | n/a
    peficitinib | ASP-015K | DREIJXJRTLTGJC-UHFFFAOYSA-N | synthetic small molecule | therapeutic | JAK
    pegargiminase | n/a | n/a | enzyme | therapeutic | n/a
    pembrolizumab | n/a | n/a | monoclonal antibody | therapeutic | Programmed cell death 1 (PDCD1)
    polmacoxib | CG-100649 | IJWPAFMIFNSIGD-UHFFFAOYSA-N | synthetic small molecule | therapeutic | COX-2
    sarolaner | PF-6450567 | FLEFKKUZMDEUIP-QFIPXVFZSA-N | synthetic small molecule | therapeutic | n/a
    transcrocetinate-sodium | n/a | n/a | natural product derived small molecule | radiation sensitizer | n/a
    uprosertib | GSK-2141795C | AXTAPYRUEKNRBA-JTQLQIEISA-N | synthetic small molecule | therapeutic | AKT1
    venetoclax | ABT-199 | LQBVNQSMGBZMKD-UHFFFAOYSA-N | synthetic small molecule | therapeutic | BCL-2

  • New Drug Approvals 2013 - Pt. XXIV - Sofosbuvir (Sovaldi ™)





    ATC code (stem): J05AB
    Wikipedia: Sofosbuvir
    ChEMBL: CHEMBL1259059

    On December 6, 2013, the FDA approved sofosbuvir for the treatment of patients with chronic hepatitis C infection. Sofosbuvir is intended for use as a component in combination treatments, depending on the type of hepatitis C either alongside Ribavirin alone, or in combination with both Ribavirin and peginterferon-alpha. Earlier in 2013, the FDA had already approved Simeprevir for the treatment of this condition.

    Hepatitis C is an infectious disease that affects primarily the liver and is caused by the hepatitis C virus (HCV), which belongs to the family of Flaviviridae and has a positive sense single stranded RNA genome of 9,600 nucleotides. Infection is mainly by blood-to-blood contact, through sharing or reuse of syringes or unsterilized medical equipment. Initially, the infection progresses without symptoms, and only becomes apparent in the chronic stages when liver damage leads to symptoms such as bleeding, jaundice, liver cancer and hepatic encephalopathy.

    Sofosbuvir is a nucleotide analog inhibitor of the viral RNA polymerase (NS5b, Uniprot genome polyprotein: P26664, 2421-3011, PDB 3hkw). Viral RNA polymerases differ significantly from eukaryotic and bacterial polymerases both in sequence and three-dimensional structure. Thus, by entering the viral polymerase as a substrate and terminating the transcript chain, sofosbuvir inhibits only the amplification of the viral RNA genome and not endogenous transcription in the host organism. The IC50 measured against NS5b ranged between 0.7 and 2.6 micromolar, depending on the genotype of the HCV isolate.

    Structure of HCV NS5b, genotype 1a generated in pymol from PDB 3hkw.
    Sofosbuvir is a prodrug that is converted to the active form through a mono-phosphorylated intermediate. In contrast to other nucleotide analog inhibitors, the intermediate is formed in a step that cleaves off the groups attached to the phosphate group already present in sofosbuvir. This step is much faster than the enzymatic addition of a phosphate group that is required with other nucleotide analogs. The enzymes catalyzing this initial step include the lysosomal protective protein (Uniprot P10619), liver carboxylesterase 1 (Uniprot P23141) and Hint1 (Uniprot P49773). [1]



     Canonical SMILES: CC(C)OC(=O)[C@H](C)N[P@](=O)(OC[C@H]1O[C@@H](N2C=CC(=O)NC2=O)[C@](C)(F)[C@@H]1O)Oc3ccccc3 
    Std-InChI: InChI=1S/C22H29FN3O9P/c1-13(2)33-19(29)14(3)25-36(31,35-15-8-6-5-7-9-15)32-12-16-18(28)22(4,23)20(34-16)26-11-10-17(27)24-21(26)30/h5-11,13-14,16,18,20,28H,12H2,1-4H3,(H,25,31)(H,24,27,30)/t14-,16+,18+,20+,22+,36-/m0/s1
    Std InChI key: TTZHDVOVKQGIBA-IQWMDFIBSA-N

    Sofosbuvir is an off-white crystalline substance that is slightly soluble in water. The molecular weight and logP are 529.45 Da and 0.92, respectively. Note the relatively low logP characteristic of nucleotide analog compounds.
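
    As a quick sanity check on the quoted molecular weight, the sketch below reads the formula layer from the Std-InChI above and sums average atomic masses. The masses are rounded standard values entered by hand, so the result agrees with 529.45 Da only to within rounding; the function names are illustrative.

```python
import re

# Rounded average atomic masses (assumed values, sufficient for a sanity check).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998, "P": 30.974}

SOFOSBUVIR_INCHI = (
    "InChI=1S/C22H29FN3O9P/c1-13(2)33-19(29)14(3)25-36(31,35-15-8-6-5-7-9-15)"
    "32-12-16-18(28)22(4,23)20(34-16)26-11-10-17(27)24-21(26)30/h5-11,13-14,16,"
    "18,20,28H,12H2,1-4H3,(H,25,31)(H,24,27,30)/t14-,16+,18+,20+,22+,36-/m0/s1"
)


def formula_from_inchi(inchi):
    """The molecular formula is the layer right after the 'InChI=1S' prefix."""
    return inchi.split("/")[1]


def molecular_weight(formula):
    """Sum atomic masses for a simple formula such as 'C22H29FN3O9P'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


formula = formula_from_inchi(SOFOSBUVIR_INCHI)
print(formula, round(molecular_weight(formula), 2))
```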

    The recommended daily dose of sofosbuvir is 400mg in a single tablet. Peak plasma concentrations of the active metabolite are reached 30-120 minutes after administration. Clearance is primarily through the kidney, with a half-life of 0.4 hours for sofosbuvir and 27 hours for its metabolite. Sofosbuvir is a substrate of P-gp, and therefore inducers of P-gp, such as rifampicin and St John's wort, are contraindicated for use with sofosbuvir.
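
    To give a feel for what these half-lives mean with once-daily dosing, here is a small sketch that assumes simple first-order (exponential) elimination. That is a simplification made for illustration, not a pharmacokinetic model of the drug.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of the peak amount left, assuming first-order elimination."""
    return 0.5 ** (hours_elapsed / half_life_hours)


# Half-lives quoted in the text above.
HALF_LIFE_HOURS = {"sofosbuvir": 0.4, "metabolite": 27.0}

for name, half_life in HALF_LIFE_HOURS.items():
    left = fraction_remaining(24, half_life)
    print(f"{name}: {left:.3g} of the peak amount left after 24 h")
```

    Under this assumption the parent compound is essentially gone within a few hours, while more than half of the metabolite is still present when the next daily dose is taken.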

    Reported side effects of sofosbuvir include fatigue, headache, nausea, insomnia and anemia.

    Sofosbuvir is marketed by Gilead under the name Sovaldi.

    References:
    [1] Murakami E, Tolstykh T, Bao H, Niu C, Steuer HMM, Bao D, Chang W, Espiritu C, Bansal S, Lam AM, Otto MJ, Sofia MJ, Furman PA: Mechanism of activation of PSI-7851 and its diastereoisomer PSI-7977. J. Biol. Chem. 2010, 285:34337–47.

  • HELM in ChEMBL



    While the vast majority of molecules in ChEMBL are small molecules, we also have a growing collection of peptide-derived compounds, monoclonal antibodies and other biotherapeutic drugs in the database. Historically, these molecules have been represented by molfiles (for small-medium peptides) or protein sequences (for monoclonal antibodies) in the database.

    However, for many biotherapeutics, these formats are not sufficient to represent the complexities of the molecules. Molfiles and other chemical structure formats are impractical for large molecules, and simple protein sequences cannot adequately capture the non-natural amino acids and other modifications that are commonplace in biotherapeutic drugs.

    We are therefore working to adopt the HELM (Hierarchical Editing Language for Macromolecules) standard, developed by Pfizer and the Pistoia Alliance, within ChEMBL and plan to include HELM notation for all peptide-derived drugs and compounds in release 20 of the database.
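
    To give a flavour of the notation, here is a minimal sketch that writes a linear peptide as a HELM simple-polymer string. It reflects my reading of the HELM 1 layout (monomers separated by dots, multi-character monomer IDs in square brackets, four trailing `$` for the empty remaining sections). The helper names and the example monomer ID `dF` are illustrative assumptions, not ChEMBL code.

```python
def to_helm_monomer(monomer_id):
    """Single-letter monomers are written as-is; longer IDs go in square brackets."""
    return monomer_id if len(monomer_id) == 1 else f"[{monomer_id}]"


def peptide_to_helm(monomers, polymer_id="PEPTIDE1"):
    """Write a linear peptide as a HELM simple polymer with no connections or annotations."""
    chain = ".".join(to_helm_monomer(m) for m in monomers)
    # The four '$' close the (empty) sections that follow the polymer list.
    return f"{polymer_id}{{{chain}}}$$$$"


# 'dF' stands in for a non-natural residue (assumed ID for D-phenylalanine).
print(peptide_to_helm(["A", "G", "dF", "C"]))
```

    The point of the bracketed IDs is that a non-natural residue keeps its identity in the string, which a plain one-letter protein sequence cannot express.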

    See also the recent press release for more information.

  • #ICanHazStructurez


    I needed a PDF for a presentation I was giving this morning, but I was in a hotel, which doesn't have an institutional subscription, so I was stuck. On Twitter, there is a hashtag #ICanHazPDF, which is quite successful, but to be clear I didn't use that route ;{o .

    It got me thinking though: given the reach and immediacy of Twitter, could I use it to get chemical structures? So #ICanHazStructurez was born. It worked well, in fact, and was very quick (see the image above - remember this was 6 am, in Germany at least).

    So a big high five to @Lewis_Lab and @nickholway - FF and all that!