ChEMBL blog

ChEMBL Compound Clean Up
27 Feb 2013

For the last three months, I've been busy working my way through a 9,000-strong (sometimes headache-inducing) set of ChEMBL compound ids. These had been highlighted for curation because, for each ChEMBL_id in the list, there were two or more compound keys from the same paper. This implied either that the paper described two compounds that are indistinguishable using the InChI representation, or that different compounds had somehow been merged together in the database.

Each ChEMBL_id was individually checked against the data in the original paper to see if there were indeed two compound keys for the same structure.
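
As a rough illustration of the selection criterion described above (two or more compound keys for one ChEMBL_id within the same paper), here is a minimal Python sketch. The record tuples, ids and the function name `flag_for_curation` are all made up for the example and are not taken from ChEMBL or from the actual curation tooling.

```python
from collections import defaultdict

# Hypothetical records: (chembl_id, document_id, compound_key).
# These values are illustrative only and do not come from ChEMBL.
records = [
    ("CHEMBL_A", "doc1", "12a"),
    ("CHEMBL_A", "doc1", "12b"),
    ("CHEMBL_B", "doc1", "7"),
    ("CHEMBL_C", "doc2", "3"),
    ("CHEMBL_C", "doc3", "5"),
]

def flag_for_curation(rows):
    """Return ids that have two or more distinct compound keys within one document."""
    keys_by_pair = defaultdict(set)
    for chembl_id, doc_id, compound_key in rows:
        keys_by_pair[(chembl_id, doc_id)].add(compound_key)
    # Keep only ids where a single (id, document) pair carries several keys.
    return sorted({cid for (cid, _doc), keys in keys_by_pair.items() if len(keys) >= 2})

flagged = flag_for_curation(records)
print(flagged)
```

In this toy data only CHEMBL_A is flagged; CHEMBL_C has two compound keys, but they come from different documents, so it would not have been on the list.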

The outcome of this check gave rise to one of four cases:
- The structure(s) was found to be incorrect and was redrawn.
- The structure was correct for some records but not others, so a new compound was created for those selected records.
- The structure required the definition of stereochemistry or a salt.
- The structure was left alone - either the stereochemistry could not be shown or it was indeed a currently indistinguishable compound with separate compound keys. An example of this case is where chemists have separated enantiomers, and know that a pair of compounds differ only by stereochemistry, but they don't know the absolute configuration, just that they are 'opposite'.

It was a laborious but satisfying job to complete, allowing me to make use of my pedantic and geek-like tendencies. It has shown that there are a fairly significant number of papers where the authors have given identical structures two different compound keys. In some cases these are duplicates and probably should have been merged in the original publication; it also highlights some of the problems of representing relative stereoisomers and sometimes atropisomers. These are difficult things.

It has definitely been an interesting project to get through, with over 3,800 compounds redrawn, altered, or having their records moved or merged. These changes will be available with the release of ChEMBL_16, further enhancing the data you have and need!

Any questions or queries, please feel free to contact ChEMBL Help at the usual address.

Louisa
New Drug Approvals 2013 - Pt. III - Pomalidomide (Pomalyst™)
24 Feb 2013

ATC Code: L04A (partial)
Wikipedia: Pomalidomide

On February 8th, the FDA approved Pomalidomide (Trade name: Pomalyst; Research Code: CC-4047, IMiD 3), a thalidomide analogue, indicated for the treatment of multiple myeloma in patients who failed to respond to previous therapies (e.g. lenalidomide and bortezomib).

Multiple myeloma is a form of blood cancer that primarily affects older adults, and arises from the accumulation of abnormal plasma cells in the bone marrow. These abnormal plasma cells produce large amounts of unneeded antibodies, which are then deposited in various organs, causing renal failure, polyneuropathy and other myeloma-associated symptoms.

Pomalidomide, an analogue of thalidomide, is an immunomodulatory agent with antineoplastic activity. As with other thalidomide analogues, the exact mechanism of action is not yet fully understood; however, in vitro assays demonstrated that pomalidomide inhibited proliferation and induced apoptosis of hematopoietic tumor cells, including lenalidomide-resistant multiple myeloma cell lines. It has also been shown that pomalidomide enhanced T cell and natural killer (NK) cell-mediated immunity and inhibited production of pro-inflammatory cytokines (e.g., TNF-α and IL-6). For more information take a look at this review.

Pomalidomide, like other thalidomide derivatives, belongs to the -domide USAN/INN stem. Other members of this class include thalidomide and lenalidomide (both approved drugs licensed by Celgene Corporation), as well as mitindomide and endomide. Pomalidomide is a result of a quest for safer analogues of thalidomide, and has a higher potency than any of its predecessors.

Pomalidomide (IUPAC Name: 4-amino-2-(2,6-dioxopiperidin-3-yl)isoindole-1,3-dione; Canonical SMILES: Nc1cccc2C(=O)N(C3CCC(=O)NC3=O)C(=O)c12 ; ChEMBL: CHEMBL43452; PubChem: 134780; ChemSpider: 118785; Standard InChI Key: UVSMNLNDYGZFPF-UHFFFAOYSA-N) is a derivative of thalidomide with a molecular weight of 273.2 Da, 5 hydrogen bond acceptors, 2 hydrogen bond donors, and an ALogP of -0.65. The compound is therefore fully compliant with the rule of five.
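
The rule-of-five statement can be checked by hand from the quoted properties. The sketch below uses the usual thresholds (molecular weight no more than 500 Da, no more than 10 hydrogen bond acceptors, no more than 5 donors, logP no more than 5). It takes the ALogP value from the post as the logP estimate, which is an assumption, since other logP calculators may give different numbers. The function name is made up for this example.

```python
def rule_of_five_violations(mol_weight, h_acceptors, h_donors, logp):
    """Return a list of the rule-of-five criteria that a compound breaks."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if logp > 5:
        violations.append("logP > 5")
    return violations

# Property values as quoted in the post; ALogP stands in for logP here.
pomalidomide = {"mol_weight": 273.2, "h_acceptors": 5, "h_donors": 2, "logp": -0.65}

violations = rule_of_five_violations(**pomalidomide)
print("violations:", violations)
```

With the values above the list of violations is empty, which is what 'fully compliant' means here.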

Pomalidomide is available in capsule form, and the recommended daily dose is 4 mg on days 1-21 of repeated 28-day cycles until disease progression. Following administration of single oral doses in patients with multiple myeloma, the systemic exposure was characterized by an AUC(T) of 400 ng.hr/mL and maximum plasma concentration (C_max) of 75 ng/mL. At steady state, the mean apparent volume of distribution (Vd/F) was 62-138 L. Pomalidomide is weakly bound to human plasma proteins (12-44%).

Pomalidomide is primarily metabolized in the liver by CYP1A2 and CYP3A4, with additional minor contributions from CYP2C19 and CYP2D6. Pomalidomide is also a substrate for P-glycoprotein (P-gp). The median plasma elimination half-life (t_1/2) for pomalidomide is approximately 9.5 hours in healthy subjects and 7.5 hours in patients with multiple myeloma. Pomalidomide has a mean total body clearance (CL/F) of 7-10 L/hr.
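
As a back-of-the-envelope consistency check on these numbers, a half-life can be estimated from the volume of distribution and clearance as t_1/2 = ln(2) x Vd / CL. This assumes a simple one-compartment model with first-order elimination, which the post does not claim, so treat it as a sketch only. Because both quoted values are 'apparent' (divided by bioavailability F), the F cancels in the ratio. The function name is made up for this example.

```python
import math

def half_life_hours(volume_l, clearance_l_per_hr):
    """Half-life under a one-compartment, first-order model: ln(2) * V / CL."""
    return math.log(2) * volume_l / clearance_l_per_hr

# Ranges quoted in the post (apparent values, Vd/F and CL/F).
vd_range = (62.0, 138.0)  # litres
cl_range = (7.0, 10.0)    # litres per hour

# Smallest volume with fastest clearance gives the shortest half-life, and vice versa.
shortest = half_life_hours(vd_range[0], cl_range[1])
longest = half_life_hours(vd_range[1], cl_range[0])
print(f"Implied half-life range: {shortest:.1f} to {longest:.1f} hours")
```

The reported median half-lives of 7.5 and 9.5 hours both fall inside the range implied by the quoted Vd/F and CL/F values.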

Pomalidomide has been issued with a black box warning due to its teratogenic profile, i.e., it can cause severe life-threatening birth defects, and also due to its higher risk for venous thromboembolism in patients exposed to the drug. Because of Pomalyst’s embryo-fetal risk, it is available only through the Pomalyst Risk Evaluation and Mitigation Strategy (REMS) Program.

The license holder for Pomalyst™ is Celgene Corporation, and the full prescribing information can be found here.
Photos of Chemical Structures Wanted
24 Feb 2013

We are interested in benchmarking recovery of 2-D chemical structures from 'real-world' photos. So if anyone has things that they have captured at conferences from slides or posters, or fancies taking pictures of a poster, journal, PowerPoint projection, etc. that they'd be willing to donate as samples, that would be great.

Specifically, we are investigating building a pipeline that uses some image processing to condition the images and then the wonderful OSRA for the image to 2-D conversion. Having a broad range of cameras, lighting conditions and so forth would be great (especially from a range of portable devices). We'll also put together the correct structures for this set, so it may be useful to a broader community. This will all be available under a CC0 license if we gather sufficient material.

So post away in the comments, or mail us directly with any images you may have.

Update - thanks for the photos you've mailed in so far. Good start - we'll update you in the future as to how we've got on.
Word Clouds - ChEMBL vs PubMed and some musing
24 Feb 2013

Here are two word clouds: the upper one generated from the titles and abstracts of articles abstracted in ChEMBL, the lower one from the whole of PubMed. It's quite interesting to see how 'molecular' and generally quantitative the words in the ChEMBL cloud are (compounds, inhibitors, binding, etc.), but also how many words there are to do with variation (series, analogue, derivatives, selective, etc.). The PubMed cloud stresses clinical activities (patients, treatment, study, etc.) and arguably more trend/qualitative data.
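
A word cloud is essentially a picture of word frequencies, so the counting step behind clouds like these can be sketched in a few lines of Python. This is not the tool used to make the images above; the stopword list, the sample titles and the function name are all invented for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "of", "for", "with", "from"}

def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lower-cased words across texts, skipping stopwords and very short words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts

# Made-up titles, not real ChEMBL or PubMed records.
sample = [
    "Synthesis of selective kinase inhibitors",
    "A series of inhibitors and binding studies",
]

freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

The resulting counts would then be handed to whatever drawing tool sizes each word by its frequency.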

These pictures made me reflect a little on what is needed to make the data we store in ChEMBL more 'translational' - we've made some steps towards this recently, with better coverage and accessibility of clinical stage and drug compounds (see the ChEMBL 15 release notes for details) - but we need to go deeper in describing the effects of compounds on molecular systems in cellular and organism settings, in particular human patients - preferably individuals. This is related to some work we are doing in trying to map the assays in ChEMBL to various pathway resources - the trouble is that pharmacology seems to map poorly to the ways we represent pathways as a series of molecular interactions, when you are actually interested in the emergent functional phenomena. I also now feel more strongly that we need to capture more early human PK, genetic and potentially biomarker data. We can't do this with our current funding, and we certainly can't do this alone, but we will see what we can do towards this.....
Save the date: 2nd RDKit UGM, 2-4 October
21 Feb 2013

We'll be organising the 2nd RDKit Users Group Meeting which will be held from the 2nd until the 4th of October 2013 here at the EMBL-EBI in Hinxton. In addition to two days of talks, tutorials and discussions, the last day will be dedicated to a coding/documentation sprint.

Stay tuned for more information, as well as a call for presenters, which will come over the next few weeks, but, in the meantime, please go ahead and block the dates in your busy calendars!

George
Personal Genomics, Phenotyping and Life Choices
20 Feb 2013

There are many companies that now perform various types of direct-to-consumer (DTC) genome/exome sequencing and genotyping. Given my professional interests this is something that regularly crops up in discussions at work, and I have had an interest in getting myself genotyped. On campus in Hinxton we had a project where staff could get genotyped for a small number of known trait linkages – but this project deliberately avoided some of the more ‘interesting’ traits and was also limited in the total number of traits profiled – I didn’t take part. So at Christmas I thought a cool gift would be to get my wife and me genotyped (she provided consent of course, so the present wasn’t exactly a surprise, in fact I don’t think she really thought of it as a ‘present’). The process was easy, and was overall very cheap, if a little bit laboured when it came to physically shipping the saliva sample to the States from the UK – do not try to drop the samples off at a local DHL franchise desk – they will freak out at the ‘biohazard’ labels, and then be unable to process the pre-paid shipping label; go to one of the big DHL warehouses instead.

Anyway, my impression of the results of the genotyping is very positive – it identified known family health risks, and confirmed that I had known risk alleles associated with these too; generally though it delivered excellent news/lower risks on some serious diseases – and also led to post-genotyping diagnosis of a sub-clinical chronic disease for which I have a very significantly raised genetic risk (it turns out), and that I’ve actually had the pathology of for thirty-odd years, but never at the level where it impacted my life, or even warranted a visit to a doctor. Of course, these genetic components are simply one side of the coin w.r.t. personal health. Environment, life-style, diet, and so forth are important too, and finally, the genotyping results are simply indications of risks, derived from current data and known SNP associations – but for me the insights have been accurate and useful. For others though, they may get news of higher health risks, surprises over their parentage, so caveat emptor.

The process got my elder kids (currently 25, 23, 21, 18 year olds), who all followed a ‘liberal arts’ undergraduate path, interested in genetics and their health too, which is a good thing, and they are now also keen to get genotyped (if I pay, so nothing changes there!). Supporting people in interpreting the results though is complex – ‘so I will get this disease then?’ - ‘not necessarily’ – ‘well what use is it all then’…. Education, at many levels/ages is obviously the answer and for a UK audience, the time is right for a ‘Brian Cox’-like figure to make a decent popular TV series on personalised health, essential statistics, genetics and ancestry – but for God’s sake, please, please, do not get Brian Cox to do this. If only there was a respected, camera friendly, trendy, good-looking, young geneticist available – answers on a postcard to ‘BBC Television Centre, 201 Wood Lane, London W12 7RJ’…. So in summary - ‘We need to make understanding of genetics encoded in our cultural and educational DNA’ (Sorry Ewan – it was too tempting to write this sentence).

Specifically, I was genotyped by 23andMe, and their website is pretty useful for visual overviews of the complex data underlying the big data of genetics – they also encourage people to take part in a series of lifestyle, preference and health surveys, and this post is primarily in connection with the results of one of these, for me. I must also note, that their website is superbly engineered too, great session and security management, great layouts, and it does a great job of presenting complex scientific data to a non-specialist but interested audience. The only areas to avoid are the forums, where like anywhere on Teh Internets, trolling and bigotry are potentially only a few keystrokes away.

Anyway, there is a multiple-choice online test on ‘Systemizing Quotient’ that I recently took. This is one of a number of surveys 23andMe have, and for these the website gathers a set of scores from genotyped people, and then later attempts to find genetic loci that may associate with this trait (in this case a trend to ‘organise’ stuff). The questions in the survey are things like ‘Did you collect stamps as a child?’ (to which I immediately thought ‘as a child? - why is age important? don’t most adults collect stamps as well?’). I scored highly in this test – since I love organising, collating and sorting things, I really do. (The image at the top of the post is a screenshot of my score, click for larger.) I love the way they described the Systemizing Quotient – ‘assess drive to construct systems or to predict the behavior of a system’. To me this is the distilled essence of my job and you may well see this phrase in future recruitment job descriptions that I write ;)

So I seem to have ended up in a career that aligns to some existing personality traits – this makes me feel a little bit good, and is far better than the other way round. Of course, now, it is tantalising to know if there is an underlying single genetic polymorphism, or set of variants, that are actually associated with the Systemizing Quotient trait. So come on 23andMe – get on and do the research!
Sign Up Now For Our Webinars!!
18 Feb 2013

A couple of weeks ago, I created a Doodle Poll to gauge interest in hosting another series of Webinars, after the success of the ones we hosted last year.

After a good response, these Webinars have now been organised, and those who are interested in signing up for them can do so here.

Most of the webinars will only take 45 minutes and will give a good overview of the topic they cover. You can watch and listen to them from the comfort of your own desk.

The Doodle Poll signup section is hidden so only we here at ChEMBL Towers can see your personal details. However, I must stress that it is important that you leave your name and email address on the poll when you sign up, so that I can send on the connection details for the Webinar. Without this information, you will not be able to take part as I won't know where to send the connection details.

Any issues or queries, please do not hesitate to contact ChEMBL Help.

USAN Watch: February 2013
17 Feb 2013

The USANs for February 2013 have recently been published.

USAN	Research Code	Structure	Drug Class	Therapeutic class	Target
avarofloxacin	JNJ-Q2, JNJ-32729463-AAA		synthetic small molecule	therapeutic	topo II
dianexin	ASP-8597		protein	therapeutic
eldelumab	BMS-936557, MDX-1100		monoclonal antibody	therapeutic	CXCL10
eluxadoline	JNJ-27018966		synthetic small molecule	therapeutic	mu Opioid R, delta Opioid R
formofilcon A			polymer	contact lens polymer	n/a
galunisertib	LY-2157299		synthetic small molecule	therapeutic	TGFBetaR1 TGFBetaR2
guselkumab	CNTO-1959		monoclonal antibody	therapeutic	IL23
ledipasvir	GS-5885		synthetic small molecule	therapeutic	HCV NS5A
liafensine	BMS-820836		synthetic small molecule	therapeutic
margetuximab	MGAH22		monoclonal antibody	therapeutic	ERBB2
mavatrep	JNJ-39439335		synthetic small molecule	therapeutic
methylsamidorphan	ALKS-37, RDC-1036-00		natural product-derived small molecule	therapeutic	Opioid receptors
palbociclib	PD-0332991		synthetic small molecule	therapeutic	CDK4 CDK6
pegbovigrastim	LY-2953726		protein	therapeutic	CSFR
pevonedistat	MLN-4924		synthetic small molecule	therapeutic	NAE
quilizumab	RG-7449, MEP-1972A, Anti-M1’, Anti-M1 prime		monoclonal antibody	therapeutic	IgE M1'
sisapronil	PF-0241851		synthetic small molecule	therapeutic
technetium Tc 99m trofolastat	MIP-Tc-1404, MIP-99mTc-1404		synthetic small molecule	imaging agent	PSMA
tovetumab	MEDI-575		monoclonal antibody	therapeutic	PDGFRa
vantictumab	OMP-18R5		monoclonal antibody	therapeutic	Frizzled
vatiquinone	EPI-743		synthetic small molecule	therapeutic	NQO1
vedroprevir	GS-9451		synthetic small molecule	therapeutic	HCV NS3 PR

jpo