UniChem - An EBI compound structure cross-referencing resource


For some time we have faced issues with compound integration in ChEMBL, specifically with loading compound sets into ChEMBL for cross-referencing between, for example, ChEBI and PDBe compounds. The ChEMBL update cycle is relatively slow compared with some other resources, so there is inevitable thrash from compounds not yet being present, especially for exciting new data. Without a different approach to compound integration, we were heading towards a compound table holding many millions of compounds with no bioactivity data, and with it the inevitable slowdown in searching and similar operations.

We also faced some issues around curating other people's primary data, such as changing compound structures or their rendering.

So we decided to set up an external system to resolve cross-references between various databases. It is a very simple Standard InChI lookup, containing compounds from resources such as ChEMBL, ChEBI, PDBe, DrugBank, KEGG, BindingDB and PubChem. UniChem can also handle versioning of the contained resources. We will be migrating various components of the current ChEMBL interface across to use web services on UniChem; this way, the cross-links will always be fresh and correct, and we can focus on curation and optimisation of ChEMBL content. There are some other resources, such as ZINC, STITCH and ChemSpider, that would be great to integrate if we can get hold of the required data.
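To make the lookup idea concrete, here is a minimal sketch of cross-referencing by Standard InChI. It is an illustration only, not UniChem's actual implementation: the class name `InChILookup`, the source names and the identifiers are all made up for the example. Two entries from different sources are treated as the same compound exactly when their Standard InChI strings are identical.

```python
from collections import defaultdict


class InChILookup:
    """Toy cross-reference index keyed on Standard InChI (hypothetical)."""

    def __init__(self):
        # Standard InChI -> set of (source, identifier) pairs
        self._by_inchi = defaultdict(set)
        # (source, identifier) -> Standard InChI
        self._by_id = {}

    def add(self, source, identifier, inchi):
        """Register one compound entry from one source."""
        self._by_inchi[inchi].add((source, identifier))
        self._by_id[(source, identifier)] = inchi

    def cross_references(self, source, identifier):
        """Return entries from other records that share the same Standard InChI."""
        inchi = self._by_id.get((source, identifier))
        if inchi is None:
            return []
        return sorted(
            entry
            for entry in self._by_inchi[inchi]
            if entry != (source, identifier)
        )


WATER = "InChI=1S/H2O/h1H2"
METHANE = "InChI=1S/CH4/h1H4"

index = InChILookup()
# Source names and identifiers below are placeholders, not real accessions.
index.add("source_a", "A-0001", WATER)
index.add("source_b", "B-42", WATER)
index.add("source_c", "C-7", WATER)
index.add("source_b", "B-43", METHANE)

print(index.cross_references("source_a", "A-0001"))
```

Because the match is an exact string comparison, nothing about the structure itself has to be interpreted at lookup time, which is what keeps this kind of resolver simple.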

The easiest way for us to handle deposition into UniChem is to take an FTP feed of a simple table with three columns: resource_id, standard_InChI, and standard_InChI_key.
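As a sketch of what reading such a feed might look like, the snippet below parses a three-column table and does some basic sanity checks. The details are assumptions, since the post does not specify a file format: tab-delimited text, no header row, and the function name `parse_feed` are all hypothetical. The checks rely only on the general shape of Standard InChI strings (prefix `InChI=1S/`) and InChIKeys (blocks of 14, 10 and 1 uppercase letters).

```python
import csv
import io
import re

# General shape of an InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def parse_feed(handle):
    """Yield (resource_id, inchi, inchi_key) tuples from a tab-delimited feed.

    Assumes no header row; raises ValueError on malformed lines.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        resource_id, inchi, inchi_key = (field.strip() for field in row)
        if not inchi.startswith("InChI=1S/"):
            raise ValueError(f"line {line_no}: not a Standard InChI")
        if not INCHIKEY_RE.match(inchi_key):
            raise ValueError(f"line {line_no}: malformed InChIKey")
        yield resource_id, inchi, inchi_key


# The resource identifiers here are placeholders, not real accessions.
sample = io.StringIO(
    "A-0001\tInChI=1S/H2O/h1H2\tXLYOFNOQVPJJNP-UHFFFAOYSA-N\n"
    "A-0002\tInChI=1S/CH4/h1H4\tVNWKTOKETHGBQD-UHFFFAOYSA-N\n"
)
rows = list(parse_feed(sample))
print(len(rows))
```

Rejecting malformed rows up front keeps non-standard InChI strings out of the lookup, where they would otherwise silently fail to match anything.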

At the moment, UniChem sits behind our firewall, but if people want to have a play, let us know.

We will write something more specific and detailed, but would welcome thoughts on whether this resolver should be externally facing, and on what other resources would be good to integrate.

The image above may or may not be the UniChem logo.