ChEMBL blog

Registration for diXa course "Microarray Analysis using R and Bioconductor" is now open
13 Mar 2013

We are partners in the diXa FP7 infrastructure grant for chemical safety 'omics data, and as part of this, there is a course aimed at people who could benefit from an introduction to microarray data analysis. This will take place at the EMBL-EBI from 14-16 May 2013. No prior R or Bioconductor experience is required. Registration closes on 21st April 2013.

This course is aimed at researchers and scientists (PhD students, post-docs, staff scientists) who will benefit from an introduction to microarray data analysis and training in how to perform simple analyses using R/Bioconductor. All sessions are a combination of lectures and hands-on work. Prerequisites are a life science degree or equivalent experience, a basic understanding of microarray techniques, and a basic understanding of biostatistics. No prior knowledge of R or Bioconductor is assumed.

Participants will gain a basic understanding of R syntax and the ability to manipulate R objects. After this course students should feel comfortable with the R/Bioconductor environment and be in a position to continue their own explorations of the functionality of R and start using R for their basic biostatistics needs. You will understand why quality control of microarray data is necessary, run a QC workflow and be able to interpret the results correctly. A range of data exploration methods will be reviewed (PCA, hierarchical clustering, KNN and k-means, scatter plots).

For more information and a full programme:

http://www.dixa-fp7.eu/dixa-training/dixa-training-agenda/dixa-microarray-training

The "Data infrastructure for chemical safety" (diXa) project aims to support the EU Toxicogenomics Research Community in developing non-animal assays in vitro/in silico for chemical safety, which better predict human toxicity in vivo. The diXa project will design a robust and openly accessible data infrastructure for capturing toxicogenomics data produced by past, current and future EU research projects.

As part of the project we will organise a range of training courses over the next 2 years; this is the first diXa training course open to the general scientific community. diXa training courses will focus on hands-on training using the consortium's unique combination of knowledge and expertise.

jpo

USAN Watch - March 2013

13 Mar 2013

The USANs for March 2013 have recently been published.

USAN	Research Code	InChIKey (Parent)	Drug Class	Therapeutic class	Target
avarofloxacin hydrochloride	JNJ-32729463	UAJMNUKJPPPJJT-CLNHMMGSSA-N	synthetic small molecule	therapeutic	topo II
lubabegron, lubabegron fumarate	LY-488756	WIKLBKJBYSBYLL-QHCPKHFHSA-N	synthetic small molecule	therapeutic	beta 3 receptor
methylsamidorphan chloride	RDC-1036-03		natural product-derived small molecule	therapeutic	opioid receptors
palbociclib isethionate	PD-332991, PF-00080665	AHJRHEGDXFFMBM-UHFFFAOYSA-N	synthetic small molecule	therapeutic	CDK4 CDK6
pevonedistat hydrochloride	MLN-4924	UPOLJMQMGITQAT-MNDZRTMASA-N	synthetic small molecule	therapeutic	NAE
rivipansel sodium	GMI-1070, PF-06460031		natural product-derived small molecule	therapeutic	selectins
sarecycline, sarecycline hydrochloride	P-005672	APPRLAGZQKOUFL-UHFFFAOYSA-N	natural product-derived small molecule	therapeutic	30S ribosome
tenofovir alafenamide fumarate	GS-7340	LDEKQSIMHVQZJK-CAQYMETFSA-N	synthetic small molecule	therapeutic prodrug	HIV RT
velcalcetide	KAI-4169	C.dCdAdRdRdRdAdR	peptide	therapeutic	CaSr

USAN stem searching within ChEMBL
13 Mar 2013

Here's a little tip that may be of some use to you. The compound name search feature will also match substrings, so it is easy to type in a USAN/INN stem, and then retrieve all matching compounds - for example, "gliptin" will retrieve all DPP-IV inhibitors with non-proprietary (formal) names. This is pretty useful - of course the substring search functionality is not restricted to USANs/INNs.

A list of the current USAN stems is here. Note that they were never designed to be orthogonal, so the low-complexity ones will give a lot of false positives.
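
To see why short stems get noisy, here is a minimal sketch of substring matching over a few names from the March 2013 USAN table. It is plain Python written for illustration only, not the code behind the ChEMBL search box; the function name `stem_search` is made up for this example, and "al" is just a deliberately short string, not a claimed USAN stem.

```python
# Illustrative only: a case-insensitive substring match over a handful of
# names from the March 2013 USAN list. This is NOT ChEMBL's search code.
NAMES = [
    "avarofloxacin hydrochloride",
    "lubabegron fumarate",
    "palbociclib isethionate",
    "pevonedistat hydrochloride",
    "rivipansel sodium",
    "sarecycline hydrochloride",
    "tenofovir alafenamide fumarate",
    "velcalcetide",
]


def stem_search(stem, names=NAMES):
    """Return every name that contains the stem anywhere, ignoring case."""
    needle = stem.lower()
    return [name for name in names if needle in name.lower()]


# A specific stem hits one name; a short, low-complexity string hits several.
for stem in ("cycline", "vir", "al"):
    print(stem, "->", stem_search(stem))
```

Here "cycline" and "vir" each pick out a single name, while "al" matches three unrelated ones, which is the false-positive problem in miniature.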
Group Leader/Postdoc positions in selective kinase inhibitor design
05 Mar 2013
One of our Industry Programme members passed on a listing for some positions they are looking to fill at the new center in Heidelberg - BioMed X. It looks like it may be of interest to many of the ChEMBL-og readers, so I thought I would post it here.

They are looking for a Computational Chemistry/Drug Design group leader for a project "Development of a design software of SELECTIVE kinase inhibitors". The BioMed X Innovation Center in Heidelberg, Germany, constitutes a new class of incubator at the interface between academia and industry where top life science talents from all over the world are jointly working on biomedical innovation outside the pharma box. Young talents from leading academic institutions world-wide are selected in annual assessment centers based on their scientific expertise, creative energy, and passion for product-oriented pre-clinical research & development. Interdisciplinary project teams are collaborating in an open-innovation lab facility in Heidelberg with guidance of experienced mentors from academia and industry while expanding their scientific network and receiving an intensive entrepreneurship and leadership training.

Application details are to be found in the link above.

jpo
To Remove Or Not To Remove - That Is The Question
04 Mar 2013

During the course of standard compound curation, I come across problematic inorganic compounds. Examples are Cisplatin and Transplatin. These compounds differ only in the orientation of their complex bonds, but complex bonds cannot be drawn in a standard molfile without causing InChI issues. At the moment, they are kept separate by showing standard bonds between the Pt, Cl and NH3 in Cisplatin, while we have removed the bonds altogether for Transplatin. This is neither an ideal situation nor an accurate structural representation.

Another example is the compound, below left, and how it should look as a complex, right, from the paper:

At the moment, there are approximately 1,800 cases like this, which account for only 0.15% of the entire ChEMBL compound set.

What we are proposing to do is to remove the structures for these complex compounds and to keep only their names and all of the associated biological data. This would then treat them in a similar way to the antibodies and large peptides that we store in ChEMBL.

So, we have set up an online private Doodle Poll for you, our users, to have your vote on whether we should remove the structures and keep the biological data, or leave them as they are.

All comments are welcome.

Louisa
Images of Chemical Structures from a Large Scientific Meeting
03 Mar 2013

As you may have picked up from a couple of recent posts, one of my current "skunk works" projects is trying to use social media to open up chemical data. Given my interest in naming stuff, I've called this chinterest - a sort of Pinterest for chemistry (please don't register the Internet domain name, we may want to use it in the future). So I've set up an album on Picasa that will be open, with all the images of interesting chemical structures I come across at the forthcoming ACS National Meeting in New Orleans. I'm personally interested in drug discovery, so it will tend to be med. chem. type things that are covered, and it will just be me, so I won't get everything, but it will be a real-world test of the approach of grabbing images and putting stuff out there in sort of real time. The ACS is not a meeting renowned for first-time disclosures, so there won't be a lot of "extra fresh" stuff, I'm sure.

I have an iPhone 5, so the camera is pretty good on this, but I am old, so shake a little. We will see what the images are like. Closer to the meeting, I will post links to the image stream.

I will try and encourage some of the "ChEMBL Elves"^* to process the images into real 2D structures, InChIs, etc. and we'll look at the error rate.

I will need to check the terms and conditions of the meeting to make sure that I'm allowed to take photos - but I could draw stuff in a notebook and photograph that, and attendance at the meeting isn't covered by an NDA - so I'm pretty sure it will be fine - the action will also not be problematic wrt copyright. But if I'm expelled from the meeting I'll make sure my one call is to you all on Twitter!

I looked at a number of picture upload sites, and settled on Picasa due to the ability to open everything up, flexible annotation, and reasonable privacy terms. Pinterest, to which the chinterest name is a homage, is quite shocking in its terms and conditions.

jpo

^* "ChEMBL Elves" are small beings that seem to live on our campus - you leave some unfinished scientific work on your desk when you leave, and in the morning it's magically done. One problem with them though is that they are unfortunately useless at financial reporting, writing activity reports, etc.
Recovering useful data from images of graphs
03 Mar 2013

Lots of data is only ever reported as graphs, and for human readers this is a pretty sensible thing to do - we're pretty good at pattern matching and making sense of what we see. But to reuse data, to fit it to a different interpretative model, or even just to compare two plots, we need the underlying data. We're particularly interested in PK data at the moment, and one of the cases where data is usually only available as an image is drug plasma concentration time-course data.

So, it turns out it's pretty quick to build a data table from time-course data like this, and then play around with fitting new functions to the data, etc. Above is the curve for a typical drug - the time taken (for me, you may be quicker or slower than this) from opening the publication to a data set is about 7 minutes. So this is about six curves an hour (with some YouTube kitten video time thrown in to relax the eyes) and so about 50 a working day - so in a month it would be possible for one person to capture the published time-course data for all approved oral drugs. That's pretty cool, don't you think? Which means four people would be able to do this in a week, and twenty people could do it in a day.

Here's the digitised data from the graph above. First column is time in hours, and second is plasma concentration in ng/mL. The data relates to a 100 mg single dose of DEIYFTQMQPDXOT-UHFFFAOYSA-N.

0.0 0.0
0.36 56.7
0.60 327.0
1.1 445.0
1.6 413.0
2.0 355.0
3.0 250.0
4.0 178.0
6.0 80.6
8.0 50.8
10.0 35.9
12.0 22.8
18.0 12.3
24.0 9.3
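
Once the points are in a table, the usual non-compartmental summaries fall out in a few lines. The sketch below is plain Python with the values copied from the table above; the function names are made up for this example, and the linear trapezoidal rule is just one common choice for the area under the curve, not necessarily what the original paper used.

```python
# Digitised plasma concentration time-course from the table above:
# (time in hours, concentration in ng/mL).
DATA = [
    (0.0, 0.0), (0.36, 56.7), (0.60, 327.0), (1.1, 445.0), (1.6, 413.0),
    (2.0, 355.0), (3.0, 250.0), (4.0, 178.0), (6.0, 80.6), (8.0, 50.8),
    (10.0, 35.9), (12.0, 22.8), (18.0, 12.3), (24.0, 9.3),
]


def cmax_tmax(points):
    """Return (Cmax, Tmax): the highest observed concentration and its time."""
    t, c = max(points, key=lambda p: p[1])
    return c, t


def auc_trapezoid(points):
    """Area under the curve by the linear trapezoidal rule, in ng.h/mL."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )


cmax, tmax = cmax_tmax(DATA)
auc = auc_trapezoid(DATA)
print(f"Cmax = {cmax} ng/mL at Tmax = {tmax} h")
print(f"AUC(0-24 h) = {auc:.0f} ng.h/mL")
```

For this curve the observed C_max is 445 ng/mL at 1.1 hours, and the trapezoidal AUC over 0-24 hours comes out at about 1839 ng.h/mL.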

Of course, there are big problems with single vs multiple dosing, dose accumulation, population variance and confidence intervals. There are also some errors in the digitizing process, but there are ways to estimate what these are, and they are probably smaller than the variance in the underlying data in this case.

Would anyone want to help me with this task? We can pool the datasets and no doubt find one or two useful things for publications (e.g. what is the distribution of C_max for once-a-day dosed drugs, in uM?). All the data would need to end up in the public domain, of course.
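
Pooling C_max values in uM needs one unit conversion per drug: 1 ng/mL is 1 ug/L, and dividing ug/L by the molecular weight in g/mol gives umol/L. A minimal sketch follows; the function name is made up for this example, and the molecular weight of 400 g/mol is a placeholder chosen for the arithmetic, not the weight of the compound above.

```python
def ng_per_ml_to_micromolar(conc_ng_per_ml, mol_weight_g_per_mol):
    """Convert a concentration from ng/mL to uM.

    1 ng/mL equals 1 ug/L, and ug/L divided by g/mol gives umol/L (uM).
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ng_per_ml / mol_weight_g_per_mol


# Placeholder molecular weight for illustration only (an assumption,
# not a property of the compound in the post).
HYPOTHETICAL_MW = 400.0  # g/mol

print(ng_per_ml_to_micromolar(445.0, HYPOTHETICAL_MW), "uM")
```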

Yay crowdsourcing!

jpo

Update - with the data above, it's now possible to feed it to sites like the excellent http://sbpkpd.org and explore fit against a number of canonical models.
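
If you just want a quick number before trying proper models, a log-linear fit to the tail of the curve gives a rough terminal half-life. This is a simple stand-in written in plain Python, not what sbpkpd.org does; the function name is made up for this example, and treating the last three points (12, 18 and 24 hours) as the terminal phase is an assumption made for illustration.

```python
import math

# Last three digitised points (time in h, concentration in ng/mL).
# Treating these as the terminal phase is an assumption for this sketch.
TERMINAL_POINTS = [(12.0, 22.8), (18.0, 12.3), (24.0, 9.3)]


def terminal_half_life(points):
    """Estimate half-life (h) from a least-squares line through ln(conc) vs time."""
    times = [t for t, _ in points]
    logs = [math.log(c) for _, c in points]
    n = len(points)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx  # per hour; negative while the drug is being eliminated
    if slope >= 0:
        raise ValueError("concentrations are not declining over these points")
    return math.log(2) / -slope


print(f"terminal half-life ~ {terminal_half_life(TERMINAL_POINTS):.1f} h")
```

With these three points the estimate is a little over nine hours, but it moves around quite a bit depending on which points you count as the tail.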
New Malaria-Data release
01 Mar 2013
We are very pleased to announce that a new release of the malaria-data resource (MMV_2) is now freely available here.

The release was prepared on 1st March 2013 and contains:
- 362,845 compound records
- 280,985 compounds
- 3,288,801 activities
- 190,243 assays
- 5,431 targets
- 24,200 documents
The database contains several new datasets, including OSDD, Harvard and WHO-TDR Malaria screening data. Furthermore, the new interface has adopted the new look and feel features recently introduced in the main ChEMBL interface, such as the redesigned search hits tables and document report card.

As usual, the interface provides compound, assay and target keyword search capabilities, as well as structure-based and sequence-based search functionality for compounds and protein targets respectively. Finally, structure look-ups are offered out-of-the-box via the UniChem cross-references.

Please see MMV_2_release_notes.txt for full details of all changes in this release.

This is probably the most comprehensive public malaria data resource available but we aspire to broaden the coverage even more. If you or your academic group would like to deposit malaria screening data, please get in touch!

We gratefully acknowledge the support of, and collaboration with, the MMV.

George and Shaun