Friday, 1 June 2012

Common Ontology Questions #1: what is it you do again?

I've always maintained that the best ideas start life as a problem framed as a question. How can we stop people from catching polio? What is the Moon really made of? What happens if I push that red flashing button marked 'Never Push Me'? Sometimes, of course, there aren't good answers.

The question people ask me most is: what is it you do again? That includes colleagues in bioinformatics, my family and the Student Loan Company, and it is, of course, a good question. I hope my answer demonstrates a solution to an important problem. So here goes.

[Image: The web comes in many different flavours]

The problem is really one of words, and it has existed for a very long time. We give names to everything (me, you, this blog) and we often reuse those same words for other things. The good thing is that when I talk to someone about you, I usually put it in context, or it's obvious I mean you because I'm talking to someone who knows that I know you. And when I say you're small, they also know I mean you're thin and not short (because you are, of course, tall): they've seen you before, so I obviously can't mean height. Similarly, when I say I met you in The Flying Pig, they know I mean the pub down the road, because that's where we both like to drink, and not some new creature that crossed a pig with wings or some such abomination.

So that's clear then. The problem is that if I sent the same information (your name and that you are small) to some other people and asked them to point out exactly which one you are, they'd probably struggle. In the wider world, names are not uniquely given to objects. There is at least one other James Malone in this world - I know because I regularly receive emails intended for him - and there are probably thousands. But I am unique. Similarly, saying I'm small because I'm thin is fine if you know me, but a lot of people use small to mean short. So the description doesn't mean what was intended.

OK, that's trivial, but why am I actually employed, you may ask. Well, in biology, like many other sciences, we have millions of objects of interest: different animals, diseases, types of cells, you name it. To make sense of the data we produce from experiments, we really need to know what that data is about. And a mouse is not just a mouse - though that is another blog post.

It gets worse. Humans are quite good at guessing and disambiguating because they have tacit knowledge about the world and, more often than not, context. Someone might guess I mean you because they know both of us, but a computer? It wouldn't have a clue, no matter how many times you strike it and curse at it. Believe me.

This is where I come in. I use a method of writing this stuff down in a way that is (at least a bit) less ambiguous, and that method is to use an ontology. Ontologies, ironically, have been defined in a hundred different ways, but people mostly mean the same thing: an ontology is a way of describing the objects we are interested in explicitly and, in addition, describing how those objects relate to one another. To go back to the example, one such object is me, along with my tallness and my thinness. All of these can be considered useful in an ontology about people generally. We might capture the kind of thing I am (human) and the things describing me (tall) as a concept, a class or a type - which all mean roughly the same thing. The human class covers everything that is a human, so I'm an instance of that class. Relations also exist, between me and tall for example. The relationship there might be called something like has height or, more generally, has physical characteristic, i.e. James has physical characteristic tall. There, our first bit of ontology done.
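
As a rough sketch of the idea, the classes, instances and relation from the example can be written down as plain Python data. This is only an illustration, not a real ontology library: the names `Triple`, `instance_of`, `characteristics_of` and `has_physical_characteristic` are made up here, and for simplicity tall and thin are treated as instances of a `PhysicalCharacteristic` class, which is one modelling choice among several.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement of the form: subject - relation - object."""
    subject: str
    relation: str
    obj: str


# Which class each instance belongs to (hypothetical names for this sketch).
instance_of = {
    "James": "Human",
    "tall": "PhysicalCharacteristic",
    "thin": "PhysicalCharacteristic",
}

# Relations between instances.
triples = [
    Triple("James", "has_physical_characteristic", "tall"),
    Triple("James", "has_physical_characteristic", "thin"),
]


def characteristics_of(name):
    """Return everything linked to `name` by has_physical_characteristic."""
    return [
        t.obj
        for t in triples
        if t.subject == name and t.relation == "has_physical_characteristic"
    ]


print(instance_of["James"])         # the class James belongs to
print(characteristics_of("James"))  # the characteristics recorded for James
```

The point is that nothing is left to context: the class of each thing and the relation between things are stated explicitly, so a program can look them up rather than guess.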

Of course it's not that simple, as we have millions of things in biology, but fortunately some of these things are the same and some are closely related. Genes, for example, might all be instances of a gene class, just as humans sit under a human class. What adds complexity is making this readable by a computer, which is critical when you have terabytes of data (that's a lot, I believe). Fortunately, languages such as the Web Ontology Language (OWL) help us, as they provide the syntax needed to specify classes, instances and relations in a way that you, I and our stupid computers can all understand.
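
To give a flavour of what that looks like, here is a minimal sketch that prints the James example as OWL statements in Turtle, one of the text syntaxes OWL can be written in. The `http://example.org/people#` namespace and the names `:Human`, `:tall` and `:hasPhysicalCharacteristic` are invented for illustration; the `owl:` and `rdf:` prefixes are the standard W3C ones. A real project would more likely use an ontology editor or library than build strings by hand.

```python
# Prefix declarations: short names for the namespaces used below.
PREFIXES = {
    "": "http://example.org/people#",  # made-up namespace for this sketch
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Each statement is (subject, predicate, object).
STATEMENTS = [
    (":Human", "rdf:type", "owl:Class"),
    (":PhysicalCharacteristic", "rdf:type", "owl:Class"),
    (":hasPhysicalCharacteristic", "rdf:type", "owl:ObjectProperty"),
    (":tall", "rdf:type", ":PhysicalCharacteristic"),
    (":James", "rdf:type", ":Human"),
    (":James", ":hasPhysicalCharacteristic", ":tall"),
]


def to_turtle(prefixes, statements):
    """Render the prefixes and statements as a Turtle document string."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in statements]
    return "\n".join(lines)


print(to_turtle(PREFIXES, STATEMENTS))
```

Each line of the output is one statement: the first few declare a class or a relation, and the last two say that James is a Human and that James has the physical characteristic tall.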

What we really want, in a futuristic (but somewhat unlikely) scenario, is for all of the data available on the web to be described in this way, so that computers can ask sensible questions of it and get back sensible answers because they understand what they're looking at in the same way you and I do. This is hard, and I say unlikely because the job is never-ending, although doing some of it is achievable and useful. It is the long-term vision of the Semantic Web, and ontologies clearly play an important role here: they tell us what the data actually means, and whether that house I'm buying online is really a house for me and not a cage for a rabbit. That matters to me, but to the wider bioinformatics world what really matters is that when someone says this gene is somehow linked to this sample with cancer, we know we're talking about the same type of cancer and the same gene. Fortunately, work is well underway in this area; for example, the Gene Ontology has been producing descriptions of genes and related properties for over a decade now.
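
A small sketch of why shared terms help: if every free-text label is mapped to one agreed identifier, a program can check whether two records mean the same disease even when they are worded differently. The identifiers `EX:0001` and `EX:0002` and the label table are invented for this example and do not come from any real ontology.

```python
# Hypothetical mapping from free-text labels to shared term identifiers.
LABEL_TO_TERM = {
    "breast cancer": "EX:0001",
    "cancer of the breast": "EX:0001",
    "lung cancer": "EX:0002",
}


def term_for(label):
    """Look up the shared identifier for a label, ignoring case and padding."""
    return LABEL_TO_TERM.get(label.strip().lower())


def same_disease(label_a, label_b):
    """True only if both labels are known and map to the same identifier."""
    term_a = term_for(label_a)
    term_b = term_for(label_b)
    return term_a is not None and term_a == term_b


print(same_disease("Breast cancer", "cancer of the breast"))
print(same_disease("breast cancer", "lung cancer"))
```

Unknown labels are deliberately treated as not matching anything, since guessing is exactly what the shared identifiers are meant to avoid.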

Anyway, I hope that helps to explain a bit about what I do and why. In the future I'll be writing about things we do here, thoughts and ideas I have (sometimes even good ones), problems I face and probably general rants.
