• Reminder: Pipeline Pilot Cambridgeshire UGM



    This is a gentle reminder for the Cambridgeshire Pipeline Pilot Users Group Meeting that will take place on Thursday 17th January 2013 (aka tomorrow), at 3pm here at the ChEMBL HQ.

    This is the agenda for the meeting:


    1. Welcome and Host talk:  George Papadatos + Gerard van Westen
          Cool things with Pipeline Pilot and ChEMBL
    2. Peter Woollard (GSK)
        Using Pipeline Pilot for computational biology capabilities, where it has helped the most and where it is less used.
    3. Richard Carter (Oxford Nanopore Technology):
          PP on a memory stick
    4. Mike Cherry (Accelrys) :
          Repetitive Data Flow
    5. Question and Answer session including:
       - how people have found NGS components  and TAC components
    6. Willem van Hoorn (Accelrys)
          Matched Molecular Pairs
    7. Adrian Stevens (Accelrys)
          Upcoming chemistry components in PP9.0

    There's still time so if you fancy attending, drop us a line.


    George

  • Competition Time - Follow Up

    VX-509?


    So there were two answers posted in comments to the lunchtime competition recently, and a further one sent by email last night. Look at the comments on the post alongside this summary if you are interested.

    The answer is that the compound structure (now known by the INN Adelatinib) is a JAK inhibitor and is almost certainly VX-509.

    Everyone seemed to follow a similar strategy, which I've paraphrased here. It is quite general for the case of a known structure with an unknown research code (the reverse, a known research code with an unknown/proprietary structure, is a lot more common). For the latter, look at some of the excellent guidance over at our buddy Chris Southan's blog.

    • Convert the image with OSRA - the CH3 was mildly misinterpreted for me, but easy to fix.
    • Search PubChem/ChemSpider with the molecular structure. There is a hit in PubChem (http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=59422203), and this is from SureChem.
    • Search patent literature - most people seemed to use SureChem
    • Find a patent with the structure in (the patents are filed by Vertex, and the compounds are claimed as JAK inhibitors).
    • Search the web for "Vertex JAK inhibitor" as a google/bing/whatever search string, and you find that Vertex have one (VX-509) in Phase 2a, which would be a classic time to file for a non-proprietary name.

    There was a CAS number in the original document (here). I didn't include this in the image yesterday, since it didn't find anything on Google (at the time the document was published). A simple Google search for CAS numbers is great for many new compounds, and the hits often lead to Far Eastern compound suppliers, who are clearly pretty good at snaffling up interesting compound structures from the public web.

    It would also have been trivial to have turned the connection diagram into an InChI key and done a Google search (it didn't find anything, at least not yesterday).
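    As a rough sketch of that last step, the snippet below checks that a string has the shape of an InChIKey and builds lookup URLs for it without fetching them. The function names are hypothetical, the Google URL is just an ordinary quoted search, and the PubChem URL follows the PUG REST pattern for InChIKey lookups. The key used in the example is the Apixaban one quoted elsewhere on this blog, since the key for the competition compound isn't given here.

```python
import re
from urllib.parse import quote

# Shape of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_inchikey(text):
    """Return True if text has the shape of an InChIKey (shape check only)."""
    return bool(INCHIKEY_RE.match(text.strip()))


def search_urls(inchikey):
    """Build (but do not fetch) lookup URLs for an InChIKey."""
    key = inchikey.strip()
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return {
        # Quoted so a web search treats the key as one exact string.
        "google": "https://www.google.com/search?q=" + quote(f'"{key}"'),
        # PubChem PUG REST: compound IDs for a given InChIKey.
        "pubchem": (
            "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"
            f"inchikey/{key}/cids/JSON"
        ),
    }


if __name__ == "__main__":
    for name, url in search_urls("QNZCBYKSOIHPEH-UHFFFAOYSA-N").items():
        print(name, url)
```

    An empty result from either URL only means the key wasn't indexed at the time of the search, which is exactly what happened here.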

    Note a few things -
    • The string "VX-509" does not appear in the patent, and so it is an inference that the structure is VX-509 - in reality it would be really really surprising if it wasn't. But you never know. 
    • The patent literature is a pretty good source of interesting stuff, but it's difficult to get, and there is probably also some lag between patent structures being accessible from SureChem and in other databases. SureChem recently announced that they were depositing structures into these sources, which is great, but this example shows that you will probably end up doing the search twice, so maybe it's best to go to SureChem first?
    Also, for me it was a great example of crowd-sourced public annotation - this will end up in ChEMBL_16.

    jpo and George


  • Paper: UniChem


    We have just had a paper published on UniChem - simple name, simple functionality, but we love it, and it has become the way that we map ChEMBL to other data sources and keep things linked in real time, and also keep the ChEMBL molecule tables manageable. It's published in the Open Access Journal of Cheminformatics.

    There is an interface on the above UniChem link, but for most use we anticipate REST web services access - details are on the link above.

    The link to the provisional pdf is here.

    One of the jolly blog pixies is writing a blog post showing some use cases for UniChem - and I have a lovely thing called "Chive" to tell you about in a few weeks!

    %T UniChem: a unified chemical structure cross-referencing and identifier tracking system
    %A J. Chambers
    %A M. Davies
    %A A. Gaulton
    %A A. Hersey
    %A S. Velankar
    %A R. Petryszak
    %A J. Hastings
    %A L. Bellis
    %A S. McGlinchey
    %A J.P. Overington
    %J Journal of Cheminformatics 
    %D 2013
    %V 5
    %O doi:10.1186/1758-2946-5-3
    
    
    

  • Competition Time!

    Something to keep you entertained over lunch.

    So, what is the research code of the compound above (the INN is Adelatinib)?

    Leave the answer in the comments to this post, and there's a wonderful prize for the winner!

  • New Drug Approvals 2012 - Pt. XXXIII - Apixaban (ELIQUIS®)


    ATC code : B01AF02
    Wikipedia : Apixaban

    On December 28, FDA approved Apixaban (Trade Name: ELIQUIS®; ChEMBL: CHEMBL231779; KEGG: D03213; ChemSpider: 8358471; DrugBank: DB07828; PubChem: CID 10182969) as an anticoagulant for prevention of venous thromboembolism and related events, indicated to reduce the risk of stroke and systemic embolism in patients with non-valvular atrial fibrillation.

    Atrial fibrillation (AF) is the most common cardiac arrhythmia (irregular heart beat). The American College of Cardiology (ACC), American Heart Association (AHA) and European Society of Cardiology (ESC) divide AF into many classes, one of which is non-valvular AF: AF in the absence of rheumatic mitral valve disease, a prosthetic heart valve, or mitral valve repair (that is, AF which is not caused by a heart valve problem). AF increases the risk of stroke, which can be up to seven times that of the average population, and is one of the major cardiogenic risk factors for stroke. For instance, inappropriate or abnormal blood clotting (a coagulation disorder) can lead to clots forming in the heart, which can easily find their way into the brain, resulting in stroke.

    Coagulation (thrombogenesis) is the process by which blood forms clots. The coagulation cascade has two pathways that lead to fibrin formation: the intrinsic pathway and the extrinsic pathway. The pathways are a series of reactions, in which a zymogen of a serine protease and its glycoprotein co-factor are activated to become active components that then catalyze the next reaction in the cascade, ultimately resulting in cross-linked fibrin. Apixaban belongs to the direct factor Xa inhibitor ('xaban') class of anticoagulant drugs, which act directly on Factor X (FX) in the coagulation cascade without antithrombin as a mediator.

    Apixaban is a reversible and selective active site inhibitor of Factor Xa (FXa). It does not require antithrombin III for antithrombotic activity. Apixaban inhibits free and clot-bound FXa, and prothrombinase activity. Apixaban has no direct effect on platelet aggregation, but indirectly inhibits platelet aggregation induced by thrombin. By inhibiting FXa, apixaban decreases thrombin generation and thrombus development.


    The PDBe entry (PDBe : 2p16) for the crystal structure for human Factor X (chain A & chain L) in complex with Apixaban (blue-green - molecule shaped) is shown above.


    IUPAC Name : 1-(4-methoxyphenyl)-7-oxo-6-[4-(2-oxopiperidin-1-yl)phenyl]-4,5,6,7-tetrahydro-1H-pyrazolo[3,4-c]pyridine-3-carboxamide
    Canonical SMILES : COc1ccc(cc1)n2nc(C(=O)N)c3CCN(C(=O)c23)c4ccc(cc4)N5CCCCC5=O
    Standard InChI : 1S/C25H25N5O4/c1-34-19-11-9-18(10-12-19)30-23-20(22(27-30)24(26)32)13-15-29(25(23)33)17-7-5-16(6-8-17)28-14-3-2-4-21(28)31/h5-12H,2-4,13-15H2,1H3,(H2,26,32)
    Standard InChI Key : QNZCBYKSOIHPEH-UHFFFAOYSA-N
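
    The formula layer of the Standard InChI above (C25H25N5O4) is enough to work out an average molecular weight. The sketch below is a minimal illustration only: the function names are hypothetical, the atomic weights are rounded standard values, and it assumes a single-component formula built from the handful of elements listed.

```python
import re

# Rounded standard atomic weights; extend the table for other elements.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45,
}


def formula_from_inchi(inchi):
    """Return the formula layer, e.g. 'C25H25N5O4' from '1S/C25H25N5O4/c...'."""
    if inchi.startswith("InChI="):
        inchi = inchi[len("InChI="):]
    # The first layer is the version ('1S'); the second is the formula.
    return inchi.split("/")[1]


def element_counts(formula):
    """Count atoms per element in a simple, single-component formula."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in element_counts(formula).items())


if __name__ == "__main__":
    apixaban = (
        "1S/C25H25N5O4/c1-34-19-11-9-18(10-12-19)30-23-20(22(27-30)24(26)32)"
        "13-15-29(25(23)33)17-7-5-16(6-8-17)28-14-3-2-4-21(28)31/"
        "h5-12H,2-4,13-15H2,1H3,(H2,26,32)"
    )
    formula = formula_from_inchi(apixaban)
    print(formula, round(molecular_weight(formula), 1))
```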

    Apixaban is available for oral administration at doses of 2.5 mg and 5 mg. It displays prolonged absorption with bioavailability of ~50% for doses up to 10 mg. Plasma protein binding was estimated to be ~87% and Vss is ~21 liters. Apixaban is metabolized mainly via CYP3A4, with minor contributions from CYP1A2, CYP2C8, CYP2C9, CYP2C19 and CYP2J2. Approximately 25% of Apixaban is recovered in urine and faeces. Despite a short clearance half-life of about 6 hrs, the apparent half-life is 12 hrs, due to the prolonged absorption phase; renal excretion accounts for 27% of the clearance.

    Apixaban comes with a boxed warning on the risks of discontinuing the drug and how to manage them. One other direct factor Xa inhibitor, Rivaroxaban (ChEMBL: CHEMBL198362, ATC code: B01AX06, PubChem: CID6433119), was approved by the FDA in 2011; it was the "first in class" FXa inhibitor (covered in one of our old blog posts, here) and carries a similar boxed warning, along with one for spinal/epidural hematoma in surgical settings.

    The license holder is Bristol-Myers Squibb, and the product website is www.eliquisglobal.com.

    Full prescribing information can be found here.

    Ramesh

  • Privacy and the ChEMBL Database


    Privacy is pretty important - for example, in the picture above I have protected the privacy of two colleagues, as I think I should ;) In fact I've even made sure that the black box securing their identities is not a layer on the image that can be trivially removed.....

    Chemistry is a little different to some other areas of life-science research, and there is a little more caution applied typically in the use of 'public' database systems by people working on chemical structures - primarily because of patenting and novelty. There are probably similar privacy/security concerns over sequence data too - and in ChEMBL we've covered that too. I'm not going to drift on to what constitutes a 'publication', and all that sort of stuff since 1) I'm not qualified, 2) I don't have the time (and 1) anyway), and 3) it attracts trolls (and 1) and 2) anyway).

    I have been asked for a talk through on the usage and query privacy of ChEMBL as part of the great OpenPHACTS project for some time; so here it is - to make it clear - I'm not an expert, but I do worry about these things, and I read a lot. Any feedback or suggestions would be great in the comment section.

    ChEMBL https://www.ebi.ac.uk/chembl is hosted on production machines at a pair of physically separated load-balanced Class 3 data centers in London. These are pretty close to one of the main Internet backbones in the UK, so reliability, latency and throughput are pretty good. The ChEMBL database and application are automatically loaded from a staging system at Hinxton. Once it leaves our staging area, we can't access the production data/server at all; in fact only a small number of named staff, using all sorts of access control and logging mechanisms, can get into the machine rooms.

    You may have noted that we use https: on the ChEMBL url above - even if you try and force use of http: to access the server, it will switch you over to https: (go on try it, I told you so). This ensures 1) that the server you access really is the genuine ChEMBL server (you should see a little lock in the corner of your browser), and 2) that the traffic between your client and our server is encrypted, and so no one can simply sit on the same network as you, listening to all your queries. So this is pretty secure; the TLS standard used by https: is relied on by essentially everyone who implements secure and private web sites. It takes a little care to actually get https: to work properly - a common reason for non-validation (so the little padlock doesn't appear) is the use of http: links on the nominally https: source page, or http: links to third party sites such as advertisers, etc.
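
    To make that last point concrete, here is a small sketch that scans a page's HTML for embedded resources still fetched over http:. It is only an illustration: the class and function names are hypothetical, and it assumes the links that matter are those that pull in sub-resources (images, scripts, frames, stylesheets) rather than ordinary hyperlinks.

```python
from html.parser import HTMLParser

# Tag -> attribute through which a page pulls in a sub-resource.
RESOURCE_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}


class MixedContentFinder(HTMLParser):
    """Collect http: sub-resource URLs referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a href=...>: a plain hyperlink, not an embedded resource
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html):
    """Return (tag, url) pairs for resources loaded over plain http."""
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


if __name__ == "__main__":
    page = (
        '<html><head><script src="http://ads.example.com/t.js"></script></head>'
        '<body><img src="/logo.png"></body></html>'
    )
    for tag, url in find_mixed_content(page):
        print(f"insecure <{tag}>: {url}")
```

    Relative URLs such as /logo.png inherit the page's own scheme, so they are left alone.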

    We don't (currently) have a green bar in the browser for this https: service - the green bar (or something similar depending on your browser) comes from the use of an Extended Validation Certificate (EVC). For these, you and your Certification Authority need to do a little more paperwork, and then spend a little more money. There is no difference in the technical security - the little padlock is the mark of security, not the green bar; it is just that the certificate authority has done some more work to validate that you really physically are who you say you are and so on. At the moment, sites like PayPal and so forth have EVCs, but they will no doubt spread, as the public starts to associate only sites with a green url bar with 'enhanced' security, and assume that the green thing is The Mark of website safety.

    We do not use accounts to access the ChEMBL website - there is no need for the things we do - any personalisation is done via cookies saved on your machine in the cookies folder (we have an Institutional cookie policy too, that describes what cookies we will write on your computer). It is not straightforward to implement good password systems, as many large professional internet companies have amply shown (LinkedIn - I'm thinking of you!), and for us we don't need them for ChEMBL, so we haven't bothered.

    There is also an Institutional Privacy Policy which covers a broad range of personal type data across all our activities (including recruitment, etc).

    There is an Institutional Terms of Use for all institute resources. There is usage logging performed on the servers for internally reviewing the use of our services, or for spotting of problems (like DOS attacks, innocent scripting that can look like a DOS (Ben ;) )) and to collect statistics (like total usage, distribution of users, etc), to track enhanced usage following interface/data addition (this makes us feel good sometimes, it's nice to know our things are used). This data is all private, and is forbidden from being shared other than at aggregate level with third parties/collaborators.

    The ChEMBL web application is written to not store any user queries (chemical structures or sequences, or text queries), other than storage required for application and database performance - so for example some automatically flushed, short-lifetime caches that are part of Oracle, and as I've said above, we don't have access to these anyway on the production servers.

    We do not run Google Analytics on our ChEMBL application (but some of the Institute's services do, and we do on the ChEMBL-og) - it is tempting to use GA for the fancy plots and maps, but what it means is that a third party (Google for GA) will be seeing all the query IP source addresses and url strings. Google already know enough about me, they don't also need to know I have a late night penchant for 4-amino-anilines as well.

    So, if I was to extract some general principles from the above:
    • Use https: for everything - there's no real cost over http:, and make sure it validates!
    • Have a clear and easy to find Terms of Use.
    • Have a clear and easy to find license for any data.
    • Have a cookie policy and explain to users what the cookies you use are.
    • Have a privacy policy.
    • Keep your security certificates up to date.
    • Do not store any user queries for later analysis.
    • Think carefully before placing a user account system on your software - does it really need one? If you do need to implement one (for example, your application has user uploaded data, has complex long running queries, or stores intermediate results), read widely and plan defensively before you do.
    • If you use third party analytics tools, make sure that your users know this, and if privacy is a concern to you, make sure you're also familiar with their ToU.
    • If you deploy things 'on the cloud' - read the agreement and T&Cs that you have with the company for your use of their services. Usually they do a very good job of dodging any responsibility, and sometimes grant themselves rights you would not expect. (We don't use third party cloud provision for any of our ebi.ac.uk/chembl services - but we do use the cloud for some data entry portals. For these we're not doing anything that really requires great privacy, since once we've entered the data, we give it away anyway). And once you've read the T&Cs, read them again.
    • ChEMBL is typically "tighter" than our Institute policies, but I think it's too confusing to make this specifically clear.....
    Update - two things, 1) we do have a privacy policy specific to ChEMBL on our page http://www.ebi.ac.uk and 2) The readers of the ChEMBL-og are very smart people, really you are. My attempts at protecting the privacy of one of the fellas above were woeful - I left his name badge in plain view! Doh! Sorry.
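
    For the "keep your security certificates up to date" principle, a scheduled check along the following lines can help. This is a sketch using only the Python standard library, not a description of how the ChEMBL servers are monitored; the function names and the 30-day threshold are hypothetical.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect over TLS and return the certificate's notAfter string.

    Needs network access. The default context also validates the
    certificate chain and hostname, so failures raise ssl.SSLError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires (negative if already expired).

    not_after uses the format returned by getpeercert(),
    e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


if __name__ == "__main__":
    remaining = days_until_expiry(fetch_not_after("www.ebi.ac.uk"))
    # 30 days is an arbitrary warning threshold chosen for this sketch.
    print("renew soon" if remaining < 30 else "ok", round(remaining))
```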



  • Paper: Fuelling Open-Source Drug Discovery: 177 Small-Molecule Leads against Tuberculosis


    As was announced last year, some of our collaborators in GSK Tres Cantos have just published the results of a large antimycobacterial phenotypic screening campaign against Mycobacterium bovis BCG with hit confirmation in M. tuberculosis H37Rv. After the screening and in silico cascade, a set of 177 potent non-cytotoxic H37Rv hits was identified, providing a plethora of diverse potential starting points for new synthetic lead-generation activities to the global scientific community.

    The dataset is hosted in ChEMBL and can be downloaded from here with a short description here.

    %T Fueling Open-Source Drug Discovery: 177 Small-Molecule Leads against Tuberculosis
    %A L. Ballell
    %A R.H. Bates
    %A R.J. Young
    %A D. Alvarez-Gomez
    %A E. Alvarez-Ruiz
    %A V. Barroso
    %A D. Blanco
    %A B. Crespo
    %A J. Escribano
    %A R. González
    %A S. Lozano
    %A S. Huss
    %A A. Santos-Villarejo
    %A J.J. Martín-Plaza
    %A A. Mendoza
    %A M.J. Rebollo-Lopez
    %A M. Remuiñan-Blanco
    %A J.L. Lavandera
    %A E. Pérez-Herran
    %A F.J. Gamo-Benito
    %A J.F. García-Bustos
    %A D. Barros
    %A J.P. Castro
    %A N. Cammack
    %J ChemMedChem
    %O http://dx.doi.org/10.1002/cmdc.201200428

    George

  • New Drug Approvals 2012 - Pt. XXXI - Lomitapide (Juxtapid™)




    ATC Code: C10AX12
    Wikipedia: Lomitapide

    On December 21st, the FDA approved Lomitapide (Tradename: Juxtapid; Research Codes: BMS-201038-04, BMS-201038, AEGR-733), a Microsomal triglyceride transfer protein (MTP) inhibitor, as a complement to a low-fat diet and other lipid-lowering treatments, in patients with homozygous familial hypercholesterolemia (HoFH).

    Familial hypercholesterolemia is a genetic disorder, characterised by high levels of cholesterol rich low-density lipoproteins (LDL-C) in the blood. This genetic condition is generally attributed to a mutation in the LDL receptor (LDLR) gene, whose product mediates the endocytosis of LDL-C.

    Lomitapide, through the inhibition of the microsomal triglyceride transfer protein in the liver, prevents the assembly of Apolipoprotein B-containing lipoproteins, which is required for the formation of LDLs, thus helping to lower the circulating LDL-C levels.

    The Microsomal triglyceride transfer protein, which resides in the lumen of the endoplasmic reticulum, is a heterodimer composed of the microsomal triglyceride transfer protein large subunit (Uniprot: P55157; ChEMBL: CHEMBL2569), and the protein disulfide isomerase. Lomitapide binds to the large subunit.

    >MTP_HUMAN Microsomal triglyceride transfer protein large subunit
    MILLAVLFLCFISSYSASVKGHTTGLSLNNDRLYKLTYSTEVLLDRGKGKLQDSVGYRIS
    SNVDVALLWRNPDGDDDQLIQITMKDVNVENVNQQRGEKSIFKGKSPSKIMGKENLEALQ
    RPTLLHLIHGKVKEFYSYQNEAVAIENIKRGLASLFQTQLSSGTTNEVDISGNCKVTYQA
    HQDKVIKIKALDSCKIARSGFTTPNQVLGVSSKATSVTTYKIEDSFVIAVLAEETHNFGL
    NFLQTIKGKIVSKQKLELKTTEAGPRLMSGKQAAAIIKAVDSKYTAIPIVGQVFQSHCKG
    CPSLSELWRSTRKYLQPDNLSKAEAVRNFLAFIQHLRTAKKEEILQILKMENKEVLPQLV
    DAVTSAQTSDSLEAILDFLDFKSDSSIILQERFLYACGFASHPNEELLRALISKFKGSIG
    SSDIRETVMIITGTLVRKLCQNEGCKLKAVVEAKKLILGGLEKAEKKEDTRMYLLALKNA
    LLPEGIPSLLKYAEAGEGPISHLATTALQRYDLPFITDEVKKTLNRIYHQNRKVHEKTVR
    TAAAAIILNNNPSYMDVKNILLSIGELPQEMNKYMLAIVQDILRFEMPASKIVRRVLKEM
    VAHNYDRFSRSGSSSAYTGYIERSPRSASTYSLDILYSGSGILRRSNLNIFQYIGKAGLH
    GSQVVIEAQGLEALIAATPDEGEENLDSYAGMSAILFDVQLRPVTFFNGYSDLMSKMLSA
    SGDPISVVKGLILLIDHSQELQLQSGLKANIEVQGGLAIDISGAMEFSLWYRESKTRVKN
    RVTVVITTDITVDSSFVKAGLETSTETEAGLEFISTVQFSQYPFLVCMQMDKDEAPFRQF
    EKKYERLSTGRGYVSQKRKESVLAGCEFPLHQENSEMCKVVFAPQPDSTSSGWF
    

    There are no known 3D structures for this protein.
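
    The sequence above is in FASTA format: a header line starting with ">" followed by the residues wrapped over several lines. A minimal reader might look like the sketch below; parse_fasta is a hypothetical helper, and the example record is deliberately truncated to the first two lines of the sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank or whitespace-only lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are simply concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Truncated example: only the first two lines of the MTP sequence.
    example = """>MTP_HUMAN Microsomal triglyceride transfer protein large subunit
    MILLAVLFLCFISSYSASVKGHTTGLSLNNDRLYKLTYSTEVLLDRGKGKLQDSVGYRIS
    SNVDVALLWRNPDGDDDQLIQITMKDVNVENVNQQRGEKSIFKGKSPSKIMGKENLEALQ
    """
    for header, sequence in parse_fasta(example):
        print(header.split()[0], len(sequence))
```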


    Lomitapide (IUPAC: N-(2,2,2-trifluoroethyl)-9-{4-[4-({[4'-(trifluoromethyl)biphenyl-2-yl]carbonyl}amino)piperidin-1-yl]butyl}-9H-fluorene-9-carboxamide; Canonical SMILES: FC(F)(F)CNC(=O)C1(CCCCN2CCC(CC2)NC(=O)c3ccccc3c4ccc(cc4)C(F)(F)F)c5ccccc5c6ccccc16; PubChem: 9853053; ChemSpider: 8028764; ChEMBL: CHEMBL354541; Standard InChI Key: MBBCVAKAJPKAKM-UHFFFAOYSA-N) is a synthetic compound with a molecular weight of 693.7 Da, nine hydrogen bond acceptors, two hydrogen bond donors, and has an ALogP of 7.79. The compound is therefore not compliant with the rule of five.
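
    The rule-of-five conclusion can be checked directly from the numbers quoted above. The sketch below uses the usual Lipinski cut-offs (molecular weight over 500 Da, logP over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The function name is hypothetical, and using ALogP in place of the CLogP of the original rule is an assumption of this example.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return the list of Lipinski rule-of-five criteria a compound violates."""
    checks = [
        ("molecular weight > 500 Da", mol_weight > 500),
        ("logP > 5", log_p > 5),
        ("H-bond donors > 5", h_donors > 5),
        ("H-bond acceptors > 10", h_acceptors > 10),
    ]
    return [label for label, violated in checks if violated]


if __name__ == "__main__":
    # Values for lomitapide as quoted in the text (ALogP stands in for logP).
    violations = rule_of_five_violations(
        mol_weight=693.7, log_p=7.79, h_donors=2, h_acceptors=9
    )
    print(len(violations), violations)
```

    Lomitapide breaks the molecular weight and logP criteria while staying within the donor and acceptor limits.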

    Lomitapide is available in capsule form and the recommended starting daily dose is 5 mg, with the possibility of gradually increasing it, based on acceptable safety and tolerability, up to a maximum of 60 mg. It has an apparent volume of distribution of 985-1292 L upon oral administration of a single 60 mg dose, and its absolute bioavailability is 7%. Lomitapide binds extensively to plasma proteins (99.8%). The mean terminal half-life (t1/2) of lomitapide is 39.7 hours, and the drug is mainly metabolised by CYP3A4. This reliance on CYP3A4 for metabolism leads to multiple opportunities for drug-drug interactions with both CYP3A4 inhibitors and inducers; therefore, when combining lomitapide with other lipid-lowering therapies, e.g. statins, a dose adjustment is required.

    Lomitapide has been given a black box warning due to an increase in transaminases (alanine aminotransferase [ALT] and/or aspartate aminotransferase [AST]) levels after exposure to the drug.

    The license holder for Juxtapid™ is Aegerion Pharmaceuticals, and the full prescribing information can be found here.