For the last three months, I've been busy working my way through a (sometimes headache-inducing) list of 9,000 ChEMBL compound ids. These had been highlighted for curation because, for each ChEMBL_id in the list, there were two or more compound keys
from the same paper. This implied that either the paper described two compounds that are indistinguishable using the InChI representation, or that different compounds had
somehow been merged together in the database.
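The selection criterion above can be illustrated with a minimal sketch. The record layout (ChEMBL id, document id, compound key) and all the values below are made up for illustration; this is not the actual ChEMBL schema or the query that was used to build the curation list.

```python
from collections import defaultdict

# Hypothetical compound records: (chembl_id, document_id, compound_key).
# The ids and keys are placeholders, not real ChEMBL data.
records = [
    ("CHEMBL_A", "DOC_1", "12a"),
    ("CHEMBL_A", "DOC_1", "12b"),
    ("CHEMBL_B", "DOC_1", "13"),
    ("CHEMBL_C", "DOC_2", "4"),
    ("CHEMBL_C", "DOC_3", "7"),
]


def flag_for_curation(records):
    """Return ChEMBL ids that have two or more compound keys in one paper."""
    keys_by_pair = defaultdict(set)
    for chembl_id, doc_id, compound_key in records:
        keys_by_pair[(chembl_id, doc_id)].add(compound_key)
    # A ChEMBL id is flagged if any single document gives it >= 2 keys.
    flagged = {cid for (cid, _doc), keys in keys_by_pair.items() if len(keys) >= 2}
    return sorted(flagged)


flagged = flag_for_curation(records)
print(flagged)
```

Note that CHEMBL_C is not flagged: its two compound keys come from different papers, which is not the situation described here.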
Each ChEMBL_id was individually checked against the data in the original paper to see if there were indeed two compound keys for the
same structure.
The outcome of this check gave rise to one of four cases:
The structure(s) was found to be incorrect and was redrawn.
The structure was correct for some records but not others, so a new compound was created for those selected records.
The structure required the definition of stereochemistry or a salt.
The structure was left alone - either the stereochemistry could not be shown or it was indeed a currently indistinguishable compound with separate compound keys. An example of this case is where chemists have separated enantiomers, and know that a pair of compounds only differ by stereochemistry, but they don't know the absolute configuration, just that they are 'opposite'.
It was a laborious but satisfying job to complete, allowing me
to make use of my pedantic and geek-like tendencies. The exercise has shown that there is a fairly significant number of papers where the authors
have given identical structures two different compound keys. In some cases these are duplicates that probably should have been merged in the original publication; in others they highlight the problems of representing relative stereoisomers and, sometimes, atropisomers, which are difficult to capture in current structure representations.
It has definitely been an interesting project to get through, with over
3,800 compounds being redrawn or altered, or having records moved/merged. These changes will
be available with the release of ChEMBL_16, further enhancing the data you have and need!
On February 8th, the FDA approved Pomalidomide (Tradename: Pomalyst; Research Code: CC-4047, IMiD 3), a thalidomide analogue, indicated for the treatment of multiple myeloma in patients who failed to respond to previous therapies (e.g. lenalidomide and bortezomib).
Multiple myeloma is a form of blood cancer that primarily affects older adults, and arises from the accumulation of abnormal plasma cells in the bone marrow. These abnormal plasma cells produce large amounts of unneeded antibodies, which are then deposited in various organs, causing renal failure, polyneuropathy and other myeloma-associated symptoms.
Pomalidomide, an analogue of thalidomide, is an immunomodulatory agent with antineoplastic activity. As with other thalidomide analogs, its exact mechanism of action is not yet fully understood; however, in vitro assays demonstrated that pomalidomide inhibited proliferation and induced apoptosis of hematopoietic tumor cells, including lenalidomide-resistant multiple myeloma cell lines. It has also been shown that pomalidomide enhanced T cell and natural killer (NK) cell-mediated immunity and inhibited production of pro-inflammatory cytokines (e.g., TNF-α and IL-6). For more information take a look at this review.
Pomalidomide, like other thalidomide derivatives, belongs to the -domide USAN/INN stem. Other members of this class include thalidomide and lenalidomide (both approved drugs licensed by Celgene Corporation), as well as mitindomide and endomide. Pomalidomide is a result of a quest for safer analogs of thalidomide, and has a higher potency than any of its predecessors.
Pomalidomide (IUPAC Name: 4-amino-2-(2,6-dioxopiperidin-3-yl)isoindole-1,3-dione; Canonical SMILES: Nc1cccc2C(=O)N(C3CCC(=O)NC3=O)C(=O)c12; ChEMBL: CHEMBL43452; PubChem: 134780; ChemSpider: 118785; Standard InChI Key: UVSMNLNDYGZFPF-UHFFFAOYSA-N) is a derivative of thalidomide, with a molecular weight of 273.2 Da, 5 hydrogen bond acceptors, 2 hydrogen bond donors, and an ALogP of -0.65. The compound is therefore fully compliant with the rule of five.
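As a quick illustration of the rule-of-five statement, here is a minimal sketch that checks the values quoted above against the commonly cited Lipinski thresholds (molecular weight no more than 500 Da, no more than 10 hydrogen bond acceptors, no more than 5 hydrogen bond donors, logP no more than 5). The function name is made up for this example, and using ALogP in place of the calculated logP of the original rule is an assumption.

```python
def rule_of_five_violations(mol_weight, h_acceptors, h_donors, logp):
    """List which of the four rule-of-five criteria a compound breaks."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if logp > 5:
        violations.append("logP > 5")
    return violations


# Values quoted in the text for pomalidomide.
pomalidomide_violations = rule_of_five_violations(
    mol_weight=273.2, h_acceptors=5, h_donors=2, logp=-0.65
)
print(pomalidomide_violations or "no rule-of-five violations")
```

With the values from the text, the list of violations comes back empty, which matches the statement that the compound is fully compliant.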
Pomalidomide is available in capsule form, and the recommended daily dose is 4 mg on days 1-21 of repeated 28-day cycles until disease progression. Following administration of single oral doses in patients with multiple myeloma, the systemic exposure was characterized by an AUC(Τ) of 400 ng.hr/mL and a maximum plasma concentration (Cmax) of 75 ng/mL. At steady state, the mean apparent volume of distribution (Vd/F) was 62-138 L. Pomalidomide is weakly bound to human plasma proteins (12-44%).
Pomalidomide is primarily metabolized in the liver by CYP1A2 and CYP3A4, with additional minor contributions from CYP2C19 and CYP2D6. Pomalidomide is also a substrate for P-glycoprotein (P-gp). The median plasma elimination half-life (t1/2) of pomalidomide is approximately 9.5 hours in healthy subjects and 7.5 hours in patients with multiple myeloma. Pomalidomide has a mean total body clearance (CL/F) of 7-10 L/hr.
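The pharmacokinetic figures above can be roughly cross-checked against each other. The sketch below assumes a simple one-compartment model, in which t1/2 = ln(2) x Vd / CL; that model is an assumption made here for illustration, not something stated in the prescribing information. Because both the volume and the clearance are apparent values (divided by F), the bioavailability term cancels.

```python
import math


def half_life_hours(volume_l, clearance_l_per_hr):
    """One-compartment estimate: t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * volume_l / clearance_l_per_hr


# Ranges quoted in the text: Vd/F of 62-138 L, CL/F of 7-10 L/hr.
shortest = half_life_hours(volume_l=62, clearance_l_per_hr=10)
longest = half_life_hours(volume_l=138, clearance_l_per_hr=7)

print(f"estimated half-life range: {shortest:.1f} to {longest:.1f} hours")

# Reported median half-lives (hours) for patients and healthy subjects.
for reported in (7.5, 9.5):
    print(reported, "within estimated range:", shortest <= reported <= longest)
```

Under this simplified model, both reported half-lives fall inside the range implied by the quoted volume of distribution and clearance, so the numbers are at least mutually consistent.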
Pomalidomide has been issued with a black box warning due to its teratogenic profile, i.e., it can cause severe life-threatening birth defects, and also due to its higher risk for venous thromboembolism in patients exposed to the drug. Because of Pomalyst’s embryo-fetal risk, it is available only through the Pomalyst Risk Evaluation and Mitigation Strategy (REMS) Program.
The license holder for Pomalyst™ is Celgene Corporation, and the full prescribing information can be found here.
We are interested in benchmarking recovery of 2-D chemical structures from 'real-world' photos. So if anyone has images that they have captured at conferences from slides or posters, or fancies taking pictures of a poster, journal, PowerPoint projection, etc., and would be willing to donate them as samples, that would be great.
Specifically, we are looking into building a pipeline that uses some image processing to condition the images and then the wonderful OSRA for the image to 2-D conversion. Having a broad range of cameras, lighting conditions and so forth would be great (especially from a range of portable devices). We'll also put together the correct structures for this set, so it may be useful to a broader community. This will all be available under a CC0 license if we gather sufficient material.
So post away in the comments, or mail us directly with any images you may have.
Update - thanks for the photos you've mailed in so far. Good start - we'll update you in the future as to how we've got on.
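For the benchmarking itself, one simple score would be the fraction of photos for which the recovered structure matches the correct one. The sketch below is only an illustration of that idea: the function name, the file names and the placeholder identifiers are all hypothetical, and it assumes each structure has already been reduced to a canonical identifier (for example a Standard InChI Key) by some other tool, so that an exact string comparison is meaningful.

```python
def recovery_rate(recovered, correct):
    """Fraction of benchmark images whose structure was recovered exactly.

    Both arguments map an image name to a canonical structure identifier.
    Images with no recovered structure count as misses.
    """
    if not correct:
        raise ValueError("the benchmark set is empty")
    hits = sum(
        1 for image, identifier in correct.items()
        if recovered.get(image) == identifier
    )
    return hits / len(correct)


# Placeholder data: three photos, two of which were recovered correctly.
correct = {"poster_01.jpg": "KEY_A", "slide_02.jpg": "KEY_B", "journal_03.jpg": "KEY_C"}
recovered = {"poster_01.jpg": "KEY_A", "slide_02.jpg": "KEY_X", "journal_03.jpg": "KEY_C"}

rate = recovery_rate(recovered, correct)
print(f"recovered {rate:.0%} of structures")
```

Exact matching is a deliberately strict choice; a partial-credit score based on structural similarity would be another option, but it would need a cheminformatics toolkit.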
Here are two word clouds: the upper one generated from the titles and abstracts of articles abstracted in ChEMBL, the lower one from the whole of PubMed. It's quite interesting to see how 'molecular' and generally quantitative the words in the ChEMBL cloud are (compounds, inhibitors, binding, etc.), but also how many words there are to do with variation (series, analogue, derivatives, selective, etc.). The PubMed cloud stresses clinical activities (patients, treatment, study, etc.) and arguably more trend/qualitative data.
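A word cloud is essentially a picture of term frequencies, so the counting behind it can be sketched in a few lines. The example titles and the small stopword list below are invented for illustration; they are not the ChEMBL or PubMed data, and this is not necessarily how the clouds above were produced.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def term_frequencies(texts, top_n=5):
    """Count lower-cased words across texts, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up titles in the style of medicinal chemistry papers.
titles = [
    "Synthesis of selective kinase inhibitors",
    "A series of inhibitors with improved binding",
    "Binding of inhibitors to the kinase",
]

top_terms = term_frequencies(titles, top_n=3)
print(top_terms)
```

In a word cloud, each term would then be drawn at a size proportional to its count.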
These pictures made me reflect a little on what is needed to make the data we store in ChEMBL more 'translational'. We've made some steps towards this recently, with better coverage and accessibility of clinical stage and drug compounds (see the ChEMBL 15 release notes for details), but we need to go deeper in describing the effects of compounds in molecular systems in cellular and organism settings, in particular human patients, preferably individuals. This is related to some work we are doing in trying to map the assays in ChEMBL to various pathway resources. The trouble is that pharmacology seems to map poorly to the ways we represent pathways as a series of molecular interactions, when what you are actually interested in is the emergent functional phenomena. I also now feel more strongly that we need to capture more early human PK, genetic and potentially biomarker data. We can't do this with our current funding, and we certainly can't do this alone, but we will see what we can do towards this...
We'll be organising the 2nd RDKit Users Group Meeting which will be held from the 2nd until the 4th of October 2013 here at the EMBL-EBI in Hinxton. In addition to two days of talks, tutorials and discussions, the last day will be dedicated to a coding/documentation sprint.
Stay tuned for more information, as well as a call for presenters, which will come over the next few weeks, but, in the meantime, please go ahead and block the dates in your busy calendars!
There are many companies that now perform various types of direct
to consumer (DTC) genome/exome sequencing and genotyping. Given my professional
interests this is something that regularly crops up in discussions at work, and
I have had an interest in getting myself genotyped. On campus in Hinxton we had
a project where staff could get genotyped for a small number of known trait
linkages – but this project deliberately avoided some of the more ‘interesting’
traits and was also limited in the total number of traits profiled – I didn’t
take part. So at Christmas I thought a cool gift would be to get my wife and me
genotyped (she provided consent of course, so the present wasn’t exactly a
surprise, in fact I don’t think she really thought of it as a ‘present’). The
process was easy, and was overall very cheap, although physically shipping
the saliva sample to the States from the UK was a little laboured. Do not try to drop the
samples off at a local DHL franchise desk – they will freak out at the ‘biohazard’
labels, and then be unable to process the pre-paid shipping label; go to one of
the big DHL warehouses instead.
Anyway, my impression of the results of the genotyping is
very positive – it identified known family health risks, and confirmed that I
had known risk alleles associated with these too; generally though it delivered
excellent news/lower risks on some serious diseases – and also led to post-genotyping
diagnosis of a sub-clinical chronic disease for which I have a very
significantly raised genetic risk (it turns out), and that I’ve actually had
the pathology of for thirty-odd years, but never at the level where it impacted
my life, or even warranted a visit to a doctor. Of course, these genetic
components are simply one side of the coin w.r.t. personal health. Environment, life-style, diet, and so forth
are important too, and finally, the genotyping results are simply indications
of risks, derived from current data and known SNP associations – but for me the
insights have been accurate and useful. Others, though, may get news of
higher health risks, or surprises over their parentage, so caveat emptor.
The process got my elder kids (currently 25, 23, 21 and 18 years
old), who all followed a ‘liberal arts’ undergraduate path, interested in
genetics and their health too, which is a good thing, and they are now also
keen to get genotyped (if I pay, so nothing changes there!). Supporting people
in interpreting the results though is complex – ‘so I will get this disease
then?’ - ‘not necessarily’ – ‘well what use is it all then’…. Education, at
many levels/ages is obviously the answer and for a UK audience, the time is
right for a ‘Brian Cox’-like figure to make a decent popular TV series on
personalised health, essential statistics, genetics and ancestry – but for God’s
sake, please, please, do not get Brian Cox to do this. If only there was a
respected, camera friendly, trendy, good-looking, young geneticist available –
answers on a postcard to ‘BBC Television Centre, 201 Wood Lane, London W12
7RJ’…. So in summary - ‘We need to make understanding of genetics encoded in our
cultural and educational DNA’ (Sorry Ewan – it was too tempting to write this
sentence).
Specifically, I was genotyped by 23andMe, and their website
is pretty useful for visual overviews of the complex data underlying the big
data of genetics – they also encourage people to take part in a series of
lifestyle, preference and health surveys, and this post is primarily in
connection with the results of one of these, for me. I must also note, that
their website is superbly engineered too, great session and security
management, great layouts, and it does a great job of presenting complex
scientific data to a non-specialist but interested audience. The only areas to
avoid are the forums, where like anywhere on Teh Internets, trolling and
bigotry are potentially only a few keystrokes away.
Anyway, there is a multiple-choice test on
‘Systemizing Quotient’ that I recently took online. This is one of a number of
surveys 23andMe have, and for these the website gathers a set of scores from genotyped
people, and then later attempts to find genetic loci that may associate with
this trait (in this case a tendency to ‘organise’ stuff). The types of question in
the survey are things like ‘Did you collect stamps as a child?’ (to which I immediately
thought ‘as a child? - why is age important? don’t most adults collect stamps
as well?’). I scored highly in this test – since I love organising, collating
and sorting things, I really do. (The image at the top of the post is a
screenshot of my score; click for a larger version.) I love the way they described the
Systemizing Quotient – ‘assess drive to construct systems or to predict the behavior
of a system’. To me this is the distilled essence of my job and you may well
see this phrase in future recruitment job descriptions that I write ;)
So I seem to have ended up in a career that aligns to some existing
personality traits – this makes me feel a little bit good, and is far better
than the other way round. Of course, now, it is tantalising to know if there is
an underlying single genetic polymorphism, or set of variants, that are actually
associated with the Systemizing Quotient trait. So come on 23andMe – get on
and do the research!
A couple of weeks ago, I created a Doodle Poll to gauge interest in hosting another series of Webinars, after the success of the ones we hosted last year.
After a good response, these Webinars have now been organised, and those who are interested can sign up for them here.
Most of the webinars will take only 45 minutes and will give a good overview of their topic. You can watch and listen to them from the comfort of your own desk.
The Doodle Poll signup section is hidden, so only we here at ChEMBL Towers can see your personal details. However, I must stress that it is important that you leave your name and email address on the poll when you sign up, so that I can send on the connection details for the Webinar. Without this information, you will not be able to take part, as I won't know where to send the connection details.
If you have any issues or queries, please do not hesitate to contact ChEMBL Help.