• Books and Papers - 5 - Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison

    You know what it's like: you have a deadline and really important stuff to do, you get your work area ready, and then you browse your bookcase for something interesting. Four hours later, it's time to do something else. Well, that was yesterday, and I spent those hours re-reading this old classic, which was fairly recently reissued in a cheaper reprint. In my opinion, this is one of the best books on sequence comparison: it is full of interesting ideas, has good coverage of related fields of computer science, and covers the algorithms in enough depth to allow you to go away and start messing around with code.

    %D 2000
    %E David Sankoff & Joseph Kruskal
    %T Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
    %I Cambridge University Press
    %O ISBN 978-1575862170
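
    As a taste of the kind of code you might start messing around with, here is a minimal sketch of the classic dynamic-programming string edit distance. It is not taken from the book; the function name and the unit substitution/indel costs are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Edit distance between a and b by dynamic programming.

    Keeps only one previous row of the table, so memory is O(len(b)).
    The default unit costs are an illustrative assumption.
    """
    # Row 0: turning the empty prefix of a into b[:j] takes j insertions.
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                         # delete ca
                curr[j - 1] + indel_cost,                     # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

    Changing the two cost parameters is the easiest way to experiment with different scoring schemes.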
    

  • EMBL-EBI Industry Programme

    I thought I would post a note on the EMBL-EBI's Industry Programme. This is a forum open to all life-science companies to network and understand the tools, resources and direction of the EMBL-EBI's activities, and also to encourage discussion of pre-competitive and collaborative activities between life-science companies. Given the increasing demands for cost-reduction, data integration and knowledge representation within the sector, this activity is becoming increasingly important. Details of how to join can be found on the EMBL-EBI web-site (link above). An analogous group exists for smaller companies (SME/SMBs).

  • ChEMBL Specialist Blogs

    We have set up some specialist sub-blogs to the ChEMBL blog: ChEMBL-grants and ChEMBL-curators. These are for open exchange of information, and will cover research proposals, beta-testing of functionality, some of the minutiae of the curation pipeline, methods and curation strategies, etc.

  • PhD Student Projects within ChEMBL

    Here are details of some of the PhD studentship project ideas for the ChEMBL group. If you are interested in studying in any of these areas at the EMBL-EBI outstation in Hinxton, please contact us. The underlying theme of the projects is to provide infrastructure components built upon the ChEMBL databases for applied and translational drug discovery. They all offer the opportunity to study and develop cutting-edge Open Source drug discovery technologies in a highly collaborative, diverse and international setting.

    1. Non-human Secreted Proteins as New Therapies.

    Surprisingly, several classes of very important human biological therapeutics are derived from non-human sources; for example, hirudin from medicinal leeches. This project will apply data-mining approaches to characterise the features for immunotolerance of non-host proteins, and then develop a series of approaches to identify potential candidate therapeutic proteins from a wide variety of organisms (e.g. Mammals, Ticks, Flukes, Nematodes, etc.). The project will integrate sequence and structure-based data mining methods along with KDD/data-mining methods to predict function, pharmacokinetic and affinity properties.

    2. Monoclonal Antibody Drug ‘Rescue’

    Monoclonal antibody drugs (mAbs) are viewed as highly specific therapies, with low clinical failure rates. However, attrition for mAb therapies often occurs at a late and expensive stage in their clinical progression, with success often crucially dependent on choice of disease model and trial design. As part of our CandiStore project, we have accumulated a unique set of clinical stage mAbs, and their target ligands. This project will apply modelling, docking, text-mining and KDD approaches to understand and predict new clinical applications of previously failed mAbs. The project will also attempt to discover general features for success/failure of this important class of therapy.

    3. Automated Drug Design (‘Robot-Chemist’)

    The discovery and optimisation of novel, well tolerated drugs is becoming an increasingly important commercial challenge. We have assembled a large training set of SAR data in our StARlite database, and have produced proof-of-principle applications in areas such as bioisostere discovery, affinity optimisation, etc. This project will build a catalog of empirically observed and synthetically tractable transformations from a given chemical starting point, and attempt to objectively score their likely effect on bioactivity, and in particular on drug-like properties such as metabolism, frank toxicity, absorption, etc. This project will provide an excellent introduction to the principles of medicinal chemistry from a compound design perspective, and also a firm grounding in a broad variety of KDD approaches.

    4. Drug Design Strategies for Robustness to Acquired Resistance.

    Acquired resistance, the selection of mutant forms of a target under the selective pressure of a cidal drug, is of increasing importance in both anti-microbial and anti-cancer targeted therapies. In the vast majority of cases, this resistance can be understood at a structural level, once the 3-D structure of the drug-target complex is established. This project will provide an integrative approach for the prediction of alternate functional forms of a target, under simulated evolutionary pressure of drug binding. Techniques such as comparative modelling, sequence analysis, Monte-Carlo simulation and QSAR approaches will be applied, and the methods then used at genome scale to identify targets and compound design strategies likely to be robust to acquired resistance. Depending on progress made during the major part of the project, the methods could be applied to understand differential drug response caused by genetic diversity/cSNPs in the human genome for currently approved therapies.

  • CandiStore - An Overview

    A few people have mailed me about getting hold of information on 'failed' drugs, or drugs in clinical development. This is one of those things that sounds really simple and should be readily available, but isn't. However, such a list would be really useful in many applications in life science informatics: for example, to understand or anticipate the chemotypes of drug failure, to provide data for second or third generation agents with differentiated profiles (MeToos are not all bad, you know!), or to perform 'drug rescue' studies (i.e. in this context, to find a new clinical indication for a drug outside of its originally anticipated use). There are some interesting commercial options for this type of research, even for 'off-patent' drugs, that can lead to both commercial gain and significant patient benefit.

    For drugs that have been withdrawn post approval, it is quite trivial to get hold of a list; it is, however, more difficult to get hold of reasons or data for the withdrawal. There are arguably two fundamental reasons for withdrawal, Safety or Commercial, and both of these words have potent meaning in the Pharmaceutical Business. Do not even think of getting a clear view on the reasons for failure for compounds in development.

    Some time ago we started to analyse affinity data of drugs against targets other than their conventional target, and this led to the collection, on an ad hoc basis, of clinical development candidates against particular targets or target classes. Doing this on a couple of different systems led to a search strategy that makes, in our view, the collection of a complete list of clinical stage development candidates for deposition in the public domain a possibility. We call this database CandiStore; it attempts to capture:

  • Chemical structure or sequence (it also covers peptides, monoclonal antibodies, aptamers, etc.)
  • Trade names
  • Formal name
  • Research codes
  • Molecular target(s)
  • Mode of action (inhibitor, activator, etc.)
  • Highest development stage (phase 1 thru 4, with 4 being a launched drug)
  • Lead development company
  • CAS number
  • A few other things primarily involved with database consistency and construction
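
    To make the field list above concrete, here is a minimal sketch of how one record might be modelled. It is an illustration only and not the actual CandiStore schema; the class name, attribute names and the placeholder values in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateRecord:
    """One clinical candidate, mirroring the fields listed above (hypothetical names)."""

    formal_name: str
    structure: Optional[str] = None          # chemical structure (e.g. SMILES/InChI) or sequence
    trade_names: List[str] = field(default_factory=list)
    research_codes: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)
    mode_of_action: Optional[str] = None     # e.g. "inhibitor", "activator"
    highest_phase: int = 1                   # 1 to 4, with 4 meaning a launched drug
    lead_company: Optional[str] = None
    cas_number: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject development stages outside phase 1 to 4.
        if self.highest_phase not in (1, 2, 3, 4):
            raise ValueError(f"highest_phase must be 1-4, got {self.highest_phase}")

    @property
    def is_launched(self) -> bool:
        return self.highest_phase == 4


if __name__ == "__main__":
    # Placeholder values only; this is not a real compound.
    example = CandidateRecord(
        formal_name="examplamab",
        research_codes=["EX-0001"],
        targets=["Example target"],
        mode_of_action="inhibitor",
        highest_phase=2,
    )
    print(example.is_launched)
```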

    CandiStore is a work in progress. We have about 11,000 structures in total at the moment, but the addition of further data, and digging into earlier development compounds, is still patchy. We currently estimate that CandiStore will contain ca. 45,000 compounds when complete. It is an active part of our curation efforts within ChEMBL, but we would be delighted to have collaborators and contributors to this effort, maybe as part of an 'open source' style project. That way, more people could benefit from the data quicker, but any data you submit would need to be yours to submit! As in all our work at the EMBL-EBI, this will be freely available to one and all.

    CANDiStore is pretty cool too!

    There are, of course, several commercial and proprietary databases that address different aspects of this need. Some of them are truly excellent.

  • InChIs/SMILES for StARlite database

    The InChIs and SMILES for almost all of the current StARlite (release 31) set of compounds are now available. Please do not upload the structures into public domain search engines or registries, since it is bound to lead to confusion later on, when the full ChEMBL data is registered. Thanks to the white-hatted Jeremy Besnard for the file conversion, while I get my PipeLine Pilot properly set up.....

    Please, please, please, if you click the mail link above, do not edit the subject string of the message ;)

  • Recruitment Closing Date Approaching

    Posting from my sick bed, a reminder that the closing deadline for applications for most of the positions currently available is Sunday the 8th of February.

  • Fantasy Pharma - The Game

    The Drug Discovery industry is in a 'bit of a state', as they say. A common complaint is that the fault lies with current management, and that if only the right decisions were taken, everything would be fine. So why don't we run an experiment, but in a game format? The basic idea is that you become the CEO of a virtual pharma, select some drugs, and track them through to the market. You will start in a pretty good position, with two launched revenue-generating products; let's see how you get on. The winner will be the person with the largest pile of cash at the end of the game. This cash is virtual, of course; it is just a game.

    Here are some preliminary rules.


  • Any person, or group, can register as a Player.
  • Entry shall be free, and completely void of obligations from either party. It is a game.
  • A Player can select two current Launched Drugs, three Phase 3 Development Candidates, six Phase 2 Development Candidates, and ten Phase 1 Development Candidates, as their Pipeline. The highest current development stage of a compound will be used in all cases. Compounds currently in Registration will be considered to be a Launched Drug.

  • Revenues from Global Sales of the drugs will be accrued annually. The starting Cash Balance for each Player is $0. Relevant currency conversions to $US will be performed at year end, using current finance.yahoo.com conversion rates. Accrued Revenues will be gross, and as reported in relevant company financial reports.

  • Any Legal Damages incurred by a Launched Drug during the game period will be deducted from the Player's Cash Balance.

  • Players must maintain a zero or positive Cash Balance at all times. If they fail to do so, they will leave the game.

  • The game will run for ten years.

  • At the end of years three and six, players can buy new candidates from their accrued Cash Balance, at a cost of $50M for a phase 1 Development Candidate, $250M for a phase 2 Development Candidate, and $800M for a phase 3 Development Candidate. A Player can spend up to 20% of their Cash Balance on such purchases. The total number of Development Candidates at any particular phase cannot be larger than the allocations at the start of the game (see above). You cannot buy a Launched Drug, beyond your initial selection of two currently Launched Drugs.

  • As Development Candidates progress from phase 1 to phase 2, etc., a Player can refill their pipeline by purchasing a new Development Candidate to repopulate the preceding phase, up to the total development stage allocations (at the prices listed above).

  • You will be charged $1000M p.a. for R&D costs; this will be deducted from your Cash Balance.

  • Players can remain anonymous if they wish. But it would be great to have some well known industry personalities. Anonymous players cannot claim the prize (details of the prize are below). Please note, some form of identity validation will be performed in cases where a clearly identifiable individual is named.

  • Closing date for entries will be April 1st 2009. If there are fewer than ten entries, the game will not proceed.

  • This rule list is preliminary, and subject to change!

  • I will do a little bit of web searching over the next few days to see if any other related games are already underway, and get back with a follow-on post with an update. As an update, I could not find a similar game discussed, or underway.
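
    The cash-flow rules above can be sketched in a few lines of code. This is only an illustration of one reading of the preliminary rules, not an official scoring engine: the function names are hypothetical, all amounts are in $M, and the order of operations within a year (add revenue, deduct damages and R&D, then check the balance) is an assumption.

```python
from typing import List, Optional, Tuple

RND_COST_M = 1000                       # annual R&D charge, in $M
GAME_YEARS = 10                         # the game runs for ten years
PRICES_M = {1: 50, 2: 250, 3: 800}      # price per Development Candidate, by phase
PURCHASE_YEARS = (3, 6)                 # purchases allowed at the end of these years
SPEND_PERCENT = 20                      # at most 20% of Cash Balance may be spent


def purchase_budget(balance_m: int) -> int:
    """Maximum spend allowed at a purchase point (20% of a non-negative balance)."""
    return max(balance_m, 0) * SPEND_PERCENT // 100


def max_candidates(balance_m: int, phase: int) -> int:
    """How many candidates of one phase the budget covers, ignoring pipeline caps."""
    return purchase_budget(balance_m) // PRICES_M[phase]


def simulate(revenues_m: List[int],
             damages_m: Optional[List[int]] = None) -> Tuple[List[int], Optional[int]]:
    """Return (year-end balances, year of elimination or None).

    Assumed order within a year: add revenue, deduct legal damages and R&D,
    then eliminate the Player if the balance has gone negative.
    """
    if damages_m is None:
        damages_m = [0] * len(revenues_m)
    balance = 0                          # every Player starts at zero
    history: List[int] = []
    years = zip(revenues_m[:GAME_YEARS], damages_m[:GAME_YEARS])
    for year, (revenue, damages) in enumerate(years, start=1):
        balance += revenue - damages - RND_COST_M
        history.append(balance)
        if balance < 0:
            return history, year         # Player leaves the game
    return history, None


if __name__ == "__main__":
    balances, out_year = simulate([1500] * 10)
    print(balances[-1], out_year)
    print(max_candidates(balances[2], phase=1))
```

    Note that with a starting balance of zero and a $1000M annual R&D charge, a Player in this reading needs at least $1000M of revenue in the first year just to stay in the game.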

    Given the length of the game (ten years!) it is not one for extreme excitement and interaction - you have been warned!

    Finally, if there are more than 20 entries I will set up some visually satisfying content on the game web site (http://www.fantasypharmagame.com, at the moment the domain is simply held). I will also put in place some alternate pipeline selection strategies, based on random selection of drugs/development candidates, 'wisdom of the commons' etc., these will join the game as virtual players.

    There may be something interesting in comparing these alternate strategies. A further set of interesting selections would be to run the current top 10 pharma as Players, with their own current pipelines, or best subsets thereof.

    I have looked into identifying a suitable prize: it will be an engraved cup and a case (for clarity, six 750ml bottles) of a quality 2009 champagne, which by the time the competition ends could be an excellent vintage, depending on the summer we have in France this year. That is another gamble, which seems appropriate given the game itself. For those of you that do not drink alcohol, at least you get the cup.