Phinterest - A More Sketched Out Idea For An App To Cover Conferences
Returning for a moment to something we've covered on the blog before: the capture and open sharing of timely data to help drug discovery. The basic question is whether it is possible to rapidly capture and share key disclosure data (compound structures, toxicology, efficacy, ADME, etc.) so that accurate, timely data can be incorporated into your own experiments. At the moment, this area is very active commercially, with large corporations providing the needed data to people who can pay, who are not necessarily the best consumers of such data. There has also been some experimentation by professional bodies: a few years ago C&EN live-blogged some of the key Med Chem talks from a National ACS meeting, which hasn't been repeated, despite being well received.
To me this seems a great opportunity for citizen science - attendees at key conferences sharing results openly, in real time, for the benefit of all - introducing knowledge and data 'liquidity' to research.
Let's now suspend reality for a few seconds, and especially ignore two things: the likely tightening of rules on reporting/sharing data if this Citizen Science impacts valuable commercial streams for middlemen/conference organizers, and the copyright of the original slide producer. We may return to these in a future post.
So the basic idea is
- go to conference
- write down stuff
- share it with the world
There can't be anything wrong with that surely?
I tweeted a few months ago asking if there was an app that could take photos of, say, a poster, then do structure conversion. There was not a lot out there at all (the one thing out there was pretty duff, but then the developers said it was pretty duff), but there was quite some interest in the idea, judging by the replies, retweets, etc. This has led to a little spare-time thinking, and the following now seems to be technically possible.
Names for stuff are important to me, so we'll call this 'Phinterest' - named in homage to the online pinboard website Pinterest. Pinterest has a really simple paradigm: upload some pictures and provide some tags. Which is where we start.
1) You go to the conference of interest, and either cruise the posters, or attend talks. Almost everybody has a smartphone now, with a camera capable of capturing good pictures. You'd then take pictures of whatever captures your interest and upload them with a single click to phinterest.org. There is often built-in location tagging, so phinterest.org would capture the time and location of the upload - this could support provenance of the uploaded photo, and allow auto-tagging with conference name, etc. There would be the ability to tag the photos yourself if you wanted, but it's not really needed.
2) The pictures could then be bundled automatically into sets from a conference - a stream - and would be visible to all as they were uploaded. The crappy out-of-focus ones could be down-voted, so the stream would be auto-curated for quality and the community sorts out the interesting stuff. If you really cared about the structure of the selective MEK inhibitor RG-7421, you could read it off the photo, and get what you need right there and then.
3) The photos could then go through auto-OCR, which is pretty simple to do and set up - for example, the website http://www.free-ocr.com takes pictures and does OCR. Some simple technical and semantic rules to enhance the OCR for things like Vd, IC50, gene names, research codes, etc. would be pretty easy to add, as would pairing an IC50 with a numeric value and a unit. This sort of thing is now pretty standard for biomedical literature, so no big shakes. This would then add some useful tags to the photos, and a simple search functionality would allow useful searches if you knew what you were looking for. Regardless of how accurate this would be, you'd always have the original evidence photo to check against.
4) A special feature would be to perform useful OCR on molecular objects. DNA and protein sequences are pretty simple to extract and OCR, and then tag with parent genes, patents, etc. More technically challenging is performing OCR on chemical structures. OSRA is great, and there are already some web services that allow upload of images and extraction of structures. On phinterest.org, the extracted structures could be displayed alongside the parent photo, then confirmed/flagged by the community of experts.
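As a rough illustration of the 'simple technical and semantic rules' in steps 3) and 4), here is a minimal Python sketch that works on text that has already come out of OCR. The patterns, the function names, the list of parameters and units, and the example strings are all assumptions made up for illustration, not part of any existing tool, and real OCR output would need far more forgiving rules.

```python
import re

# Hypothetical rule: a parameter name, an optional separator, a number, a unit.
MEASUREMENT = re.compile(
    r"\b(?P<param>IC\s?50|EC\s?50|Ki|Kd|Vd)"
    r"\s*(?:=|:|of)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>nM|uM|µM|μM|mM|pM|L/kg)"
)

# Hypothetical rule: a run of DNA letters, possibly broken up by spaces,
# line breaks or position numbers, as sequences often are on posters.
DNA_SPAN = re.compile(r"[ACGTN][ACGTN\s\d]*[ACGTN]")


def extract_measurements(ocr_text):
    """Return (parameter, value, unit) tuples found in OCR text."""
    results = []
    for m in MEASUREMENT.finditer(ocr_text):
        param = m.group("param").replace(" ", "")  # "IC 50" -> "IC50"
        # Normalise both micro signs to a plain "u" so tags are searchable.
        unit = m.group("unit").replace("µ", "u").replace("μ", "u")
        results.append((param, float(m.group("value")), unit))
    return results


def extract_dna_sequences(ocr_text, min_len=12):
    """Return candidate DNA sequences of at least min_len bases."""
    sequences = []
    for m in DNA_SPAN.finditer(ocr_text):
        cleaned = re.sub(r"[^ACGTN]", "", m.group(0))  # drop spaces and digits
        if len(cleaned) >= min_len:  # short hits are usually ordinary words
            sequences.append(cleaned)
    return sequences


# Invented poster text, purely for demonstration.
print(extract_measurements("Cmpd 7 IC50 = 12 nM, Vd 1.5 L/kg"))
# [('IC50', 12.0, 'nM'), ('Vd', 1.5, 'L/kg')]
```

The extracted tuples and sequences would only ever be suggested tags; the original photo stays alongside them as the evidence to check against.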
Basic workflow is therefore....
Photograph - Upload - OCR - Tagging - Sharing
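To make the Upload and Sharing ends of this a bit more concrete, here is a minimal Python sketch of steps 1) and 2): guessing the conference from the time and location attached to a photo, and ordering each conference stream by votes. The Conference record, the 2 km radius, the dictionary keys and the idea of a list of known meetings are all hypothetical choices for illustration; nothing here is an existing phinterest.org API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Conference:
    # Hypothetical record of a known meeting: venue position and dates.
    name: str
    lat: float
    lon: float
    start: datetime
    end: datetime


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def guess_conference(lat, lon, taken_at, conferences, max_km=2.0):
    """Return the first conference matching place and time, else None."""
    for conf in conferences:
        in_dates = conf.start <= taken_at <= conf.end
        if in_dates and distance_km(lat, lon, conf.lat, conf.lon) <= max_km:
            return conf.name
    return None


def build_streams(photos):
    """Group photos by conference, best net vote score first."""
    streams = defaultdict(list)
    for photo in photos:
        streams[photo["conference"]].append(photo)
    for stream in streams.values():
        stream.sort(key=lambda p: p["up"] - p["down"], reverse=True)
    return dict(streams)
```

A photo that matches no known meeting simply stays untagged (the function returns None), which fits the point that manual tagging is possible but not really needed.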
Even if you only got to stage 1) it would be useful to the community of drug discoverers. To me the challenges are...
- Dealing with segmenting the images, and selecting the high quality useful ones, but with enough sources of photos of the same thing, it's likely that a couple of useful ones could be found.
- It is the way of the world that idiots would abuse the facility, load rude or offensive photos, or use it for inappropriate marketing.
- Of course, direct real-time upload of slides would be great, but in the real world this doesn't happen. How many times have you asked the speaker for a copy of the slides, been told 'yeah, of course buddy', then got nothing? For conference organisers, the slides are in some cases part of the overall commercial package/benefits. So it just ain't gonna happen - at least not without a significant nudge - and anyway, steps 2 to 4 would still be required; you'd just have a higher resolution starting point.
- Legally, it's an interesting area. How different is it from me writing something in a notebook and using it in my research, or sharing it with a colleague - just set in an Internet age? Sharing data is exactly what conferences are for. However, there are all sorts of concerns, image copyright, recording permissions, etc. As I said at the start, it's likely that conference organisers and publishers would try and strangle this idea at birth, primarily for the commercial interests of their shareholders and highly paid management.....
This would form a great project for an intern, so if you're interested in coming to the lab to work on this, let me know.