Wednesday, 6 June 2012

The Apprentice: A Lesson in Ontology

I finally got to see the final of The Apprentice today and was interested to hear Lord Sugar describe one of the proposed business plans as requiring 'a trillion hours of software development'. The idea came from Nick Holzherr (I won't say whether he won, for anyone who cares about such a thing) and it was a fairly simple one: when anyone visits an online recipe site and finds a recipe they wish to cook, they click a button and his software loads a trolley of the ingredients at the online supermarket (any supermarket) of their choice.

Image: The Apprentice illustration from around 1882 by S. Barth
Caption: The original Sorcerer's Apprentice creating his first App. iPhones were much different in the nineteenth century.

So this is not revolutionary and I think some of his claims are a little overplayed, not least because all of the main supermarkets already allow you to do this with their own recipe sites - the bridge between them all is what Nick is proposing. But what piqued my interest was how one would go about writing software to do this, because the problems are not dissimilar to those we face in bioinformatics: lots of data (though food is more limited in scope) and a desire to consume and integrate it in a meaningful way. And since this is a blog about ontologies and the semantic web and not reality TV, I should probably get to the point.

My immediate thought was that this is an ideal showcase for semantic web technology (if it were ubiquitous, as per the original vision). In such a scenario this is incredibly easy to do (well, easier anyway). All of the data on the web is semantically described, and ontologies of food products are just another part of this web of data. I can go to any online supermarket web service, ask for the ontology of all of their products, and it tells me exactly what they have. If they are using similar ontologies, Nick's application is trivial - I have access to the exact ingredients I need for every supermarket, and if we assume recipe sites do the same then the connection is made between them all. Very simple.
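
To make that concrete, here is a minimal sketch of what 'trivial' would look like. Everything in it is an assumption for illustration: the `food:` identifiers, the two supermarkets and their product lines are all made up, and no real supermarket publishes data like this. The point is only that once the recipe and the catalogues share identifiers, building the trolley is a lookup.

```python
# Hypothetical shared ontology identifiers (the "food:" terms are invented).
recipe = {
    "name": "Victoria sponge",
    "ingredients": ["food:PlainFlour", "food:Egg", "food:Butter"],
}

# Hypothetical catalogues: each supermarket maps the shared terms
# to its own product lines.
catalogues = {
    "SupermarketA": {
        "food:PlainFlour": "A-1001 Plain flour 1kg",
        "food:Egg": "A-2040 Free range eggs x6",
        "food:Butter": "A-3300 Unsalted butter 250g",
    },
    "SupermarketB": {
        "food:PlainFlour": "B-77 Plain flour 1.5kg",
        "food:Egg": "B-12 Medium eggs x12",
    },
}


def build_trolley(recipe, catalogue):
    """Return (trolley, missing) for one recipe at one supermarket."""
    trolley = {}
    missing = []
    for ingredient in recipe["ingredients"]:
        if ingredient in catalogue:
            trolley[ingredient] = catalogue[ingredient]
        else:
            missing.append(ingredient)
    return trolley, missing


trolley_a, missing_a = build_trolley(recipe, catalogues["SupermarketA"])
trolley_b, missing_b = build_trolley(recipe, catalogues["SupermarketB"])
```

Here SupermarketA can fill the whole trolley and SupermarketB is missing the butter, which is exactly the point where a substitution rule would have to step in.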

Of course that reality does not exist, and you could speculate whether or not it ever will. Instead, the problems you have to overcome are those that everyone working in data integration faces - text mining to find relevant data, NLP to try to identify concepts and meaningfully map them between sources, and probably some machine learning to work out rules of interest (when people say x they really mean y, so map to y).

The advantages for external applications such as Nick's are clear, but for a supermarket the buy-in is perhaps more difficult. So why would they bother? Here are a few reasons:
  1. Better searching. Some of the supermarket searching is sophisticated and some is less so (I shan't name names) but at the very least I should be able to search for egg and get eggs, and a search for jam should probably return me conserves as well. Synonyms and a good inheritance hierarchy would help with this. 
  2. Managing classification - The Vegetarian. This is a really good example of when asserting classification gives you poor results. I searched for 'vegetarian' in one of the most widely used supermarket internet sites and I got back 70 products. 70!? They only sell 70 products that are suitable for vegetarians?? Of course the answer is no - a search for vegetable brings back 625 alone, so this tells you the search is very simplistic - it's bringing back a small subset tagged as vegetarian. If we define vegetarian in an ontology as something that does not contain an ingredient derived from an animal then we are getting somewhere. You should get all of the results automatically.
  3. Allergy checking. Filtering out products that contain certain ingredients (nuts, spices, wheat) in a simple and consistent way would be very useful for allergy sufferers and this is more than just saying 'contains nuts' in a text description in the ingredients blurb. Certain food ingredients are themselves derived from foods that someone could be allergic to, for example some curries contain curry powder which in turn contains wheat to prevent clumping. Transitive relations in the ontology would enable this.
  4. Intelligent substitution. At the moment there seems to be a simple system whereby, if something is out of stock, I am offered something matching the same word (a different make of bread, for instance). But could axioms (rules) coded in ontologies offer more? If there is no plain flour then self-raising flour would be of no use for a specific recipe, but in contrast if it requires bacon, gammon or ham might suffice since they are closely related cuts of cured pork. Disjoints and explicit axioms between concepts would help with this.
  5. Consistency checking. As per the vegetarian example, an animal-based product can't be a vegetarian product - they should be disjoint in ontology parlance.
  6. Linking to your data. This becomes much easier and apps like Nick's could be readily deployed.
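
A few of these points (vegetarian classification, allergy checking and consistency checking) can be sketched in a handful of lines. This is a toy, not a real reasoner: in practice you would express this in OWL and let a reasoner do the work. The products, ingredients and tags below are all invented for illustration.

```python
# Hypothetical 'contains' assertions between products and ingredients.
CONTAINS = {
    "ChickenCurry": {"Chicken", "CurryPowder"},
    "VegetableCurry": {"Carrot", "CurryPowder"},
    "CurryPowder": {"Turmeric", "Wheat"},
    "CarrotSoup": {"Carrot", "Water"},
}

# Hypothetical set of things derived from an animal.
ANIMAL_DERIVED = {"Chicken", "Bacon", "Gelatine"}

# Hand-asserted tags, as a supermarket might store them today.
# The ChickenCurry tag is deliberately wrong.
TAGGED_VEGETARIAN = {"CarrotSoup", "ChickenCurry"}


def all_ingredients(item):
    """Follow 'contains' transitively and return every ingredient reached."""
    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for part in CONTAINS.get(current, ()):
            if part not in found:
                found.add(part)
                stack.append(part)
    return found


def is_vegetarian(product):
    """Vegetarian = neither the product nor any ingredient is animal derived."""
    parts = all_ingredients(product) | {product}
    return not (parts & ANIMAL_DERIVED)


def contains_allergen(product, allergen):
    """True if the allergen appears anywhere in the ingredient chain."""
    return allergen in all_ingredients(product)


def inconsistent_tags():
    """Products tagged vegetarian that the definition says cannot be."""
    return {p for p in TAGGED_VEGETARIAN if not is_vegetarian(p)}
```

With this data, `is_vegetarian("VegetableCurry")` holds even though nobody tagged it, `contains_allergen("VegetableCurry", "Wheat")` finds the wheat hidden inside the curry powder, and `inconsistent_tags()` flags the mis-tagged chicken curry.
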
Most of this is not about making money directly, but about making your results more meaningful and correct, and therefore your customer experience better. That is also our aim in serving bioinformatics data (where our products are genes, proteins, etc.) with ontologies. In the field I work in, we want to ensure that when someone searches for disease samples they don't get healthy samples, and that when they ask for cancer they get leukaemia as well, and so on. The problems are the same, only the words change. And we don't have a trillion hours, but that's lucky because this becomes just a couple when we use these technologies properly.
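
As a rough illustration of the 'ask for cancer, get leukaemia' behaviour, here is a small sketch of a search that expands the query over a subclass hierarchy. The hierarchy, the synonym table and the sample records are all hypothetical, and a real system would load these from an ontology rather than hard-code them.

```python
# Hypothetical is-a hierarchy: child term -> parent term.
SUBCLASS_OF = {
    "Leukaemia": "Cancer",
    "Lymphoma": "Cancer",
    "Cancer": "Disease",
    "Conserve": "Jam",
}

# Hypothetical synonyms: what people type -> ontology term.
SYNONYMS = {
    "egg": "Egg",
    "eggs": "Egg",
    "jam": "Jam",
    "cancer": "Cancer",
    "leukaemia": "Leukaemia",
}


def descendants(term):
    """Return the term together with everything below it in the hierarchy."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(query, records):
    """Return names of records annotated with the query term or a subclass."""
    term = SYNONYMS.get(query.lower())
    if term is None:
        return []
    wanted = descendants(term)
    return [name for name, annotation in records if annotation in wanted]


samples = [
    ("sample-1", "Leukaemia"),
    ("sample-2", "Healthy"),
    ("sample-3", "Lymphoma"),
]
hits = search("cancer", samples)
```

The healthy sample is left out and both kinds of cancer come back, and the same function would return conserves for a search on 'jam'.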
