There were a lot of people
It was my first time at this particular meeting and it struck me that there were a lot of people in attendance, somewhere around 100. I was surprised, though perhaps I should not have been, as the community has been steadily growing over the last few years. Still, this suggests it is no longer just a niche activity and has many more paid-up members than ever before. OK, I accept that being in Paris might have swayed a few hearts and minds, but still.
There was a lot of interesting work…
I was pleasantly surprised at how good a lot of the work presented was, and much of it spoke to what we have been doing over the past six months. To pick out one, I was impressed with a talk given by Matthew Hindle on data federation in ecotoxicology using SADI, in which he outlined how they use the services to start answering what I would call proper biological questions. It was a nice example of a group picking up some of the existing approaches and resources and applying them successfully to their own data and problems, and it supports the notion that the field has matured somewhat over the last few years. Of course I've read this claim numerous times in the past, but largely as anecdote; this was at least evidence that, to some extent, tools exist that can be used to solve problems in the life sciences. There was inevitably still some work to do, and they did not find everything as an out-of-the-box solution, but at least the components were there.
I also liked the SPARQL R package that has recently been published and that we've been using in an MSc project with one of our students. It's been very useful, and we've written a package for analysing our Gene Expression RDF data in more intuitive ways, allowing simple R-like function calls to be made against the RDF with the SPARQL hidden behind them. I think this sort of tool is important because it exposes the technology to an important, existing bioinformatics community in ways that aid rather than hinder their work. We'll be releasing this tool early next year.
UniProt also presented their work on using rules within their RDF to detect inconsistencies in the annotations. I like this a lot, and it's something we have also started exploring with our own RDF, such as looking for disease annotations made to ontology classes that were not subclasses of disease; we found a couple. This nicely demonstrates the advantage of having your data in a format that is native to the ontology. It lets one ask meaningful ontology-type questions (subclasses of x, classes part_of y, etc.) rather than having to cobble together some hack between a database query and an OWL query and then do a textual comparison.
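To make the disease-annotation check a little more concrete, here is a minimal sketch of the idea in plain Python: walk the subClassOf hierarchy transitively and flag any annotation whose class never reaches "disease". The class names, samples and function names are all hypothetical toy stand-ins, not our actual data or tooling; in a triple store the same question would more typically be asked with a SPARQL 1.1 query using the `rdfs:subClassOf*` property path.

```python
from collections import defaultdict

# Hypothetical toy ontology: (subclass, superclass) edges, using plain
# strings in place of real ontology URIs.
SUBCLASS_OF = [
    ("lung_carcinoma", "carcinoma"),
    ("carcinoma", "cancer"),
    ("cancer", "disease"),
    ("asthma", "disease"),
    ("lung", "organ"),
]

# Hypothetical disease annotations: (sample, annotated ontology class).
DISEASE_ANNOTATIONS = [
    ("sample1", "lung_carcinoma"),
    ("sample2", "asthma"),
    ("sample3", "lung"),  # an organ, not a disease: should be flagged
]


def build_parents(edges):
    """Index the subClassOf edges as child -> set of direct parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def ancestors(cls, parents):
    """Return every superclass of cls, following subClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inconsistent_annotations(annotations, edges, root="disease"):
    """Return annotations whose class is neither root nor a subclass of it."""
    parents = build_parents(edges)
    return [
        (sample, cls)
        for sample, cls in annotations
        if cls != root and root not in ancestors(cls, parents)
    ]


if __name__ == "__main__":
    print(inconsistent_annotations(DISEASE_ANNOTATIONS, SUBCLASS_OF))
    # → [('sample3', 'lung')]
```

The point of the sketch is the one made above: because the annotations point directly at ontology classes, the check is a simple structural question about the hierarchy rather than a string comparison between a database export and an OWL query result.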
…but nothing that got me really, really excited
At no point did I have that epiphany that made me think "we're there". I've read this sort of hyperbole far too often in publications (the number of triples alone does not equate to good science), and at the moment it still remains just that: hyperbole. I do think the field is maturing, and I think it is now becoming important to those working in the life sciences. Certainly here at EBI we've been working a lot on RDF representations of data over the last six months, and I see this from others too. But it doesn't underpin the basic science we do. It's not that important to our databases, our curation, our applications and, most importantly, our users. It may become so (I think it probably will), but it's not there yet, so for now read those aforementioned publications with skepticism.
There is much more to be done; opportunity, hard work and money
I see a lot of interesting opportunities in this area but, to my mind, more engineering methods need to be applied if we want to see fewer bespoke solutions (SPARQL endpoints in particular) that live and die within the time it takes a paper to be written, accepted and published. I've said this many times about engineering ontologies and I think it applies equally here. We need better update models: updating RDF once it's in a triple store seems too onerous a task at present. And documentation on how to do this stuff is really lacking. The W3C HCLS group we have been working with has been drafting a note on representing gene expression data in RDF, but it's slow work and I still don't know if we're on the right lines. This also suggests we need better ways of evaluating this stuff. What does it all mean once it's out there, in RDF? Can I use it and integrate with it, and if so, how? Most importantly, is it correct? It's not just about sticking a URI on everything and making it RDF; that's too simple if we want to use this in more computational approaches to analysing and solving real biological problems.
One of the big problems I've found in this area is convincing funding agencies that these approaches can lead to discoveries in biology when the evidence for this is conceptually strong but practically weak. As is often the case, we have a scenario in which waiting for discoveries before giving more funding means the discoveries never come, because the work is never funded. If I had a single wish, it would be some focused calls for this stuff across the likes of the BBSRC and EU Framework 7, with specific biological objectives. There is hope on this front: Douglas Kell gave a plenary talk at last year's SWAT4LS, so there is some recognition of the importance of this area.
My comrade-in-arms Simon Jupp recently gave a talk on the RDF work we have been doing (mainly Simon's work), to which I added a subtitle: ‘after the hype, the biology’. The hype may be fading and the biology may be surfacing, but there is still much to do.
*Congratulations to Tomasz Adamusiak on winning an iPad in the Nanopublication competition. He celebrated Tomasz-style: an evening at the Moulin Rouge.*