A blog about Web 3.0, Ontologies, RDF, OWL, Prolog and the Semantic Web

A Prolog view of the Semantic Web (101)

One of my favourite blogs about the Semantic Web is “dbtune”. It is the blog of Yves Raimond, a very talented Semantic Web / Prolog programmer (and PhD student) who has created (among other goodies) an ontology of music and musicians as his PhD thesis. His personal site is “http://moustaki.org”, which made me wonder whether he is related to the… Greek musician Georges Moustaki, who lives in France (probably not).

Well, since I consider myself a Prolog veteran but (more or less) a novice in the Semantic Web, I couldn’t help admiring the simplicity of the following piece of code from dbtune’s 2007 post “Henry: A small N3 parser/reasoner for SWI-Prolog”:

rdf(C,'http://example.org/uncle',U):-
    rdf(C,'http://example.org/parent',F),
    rdf(F,'http://example.org/brother',U).

In more typical Prolog fashion, this is the same as:

uncle(C,U):- parent(C,F), brother(F,U).

In other words (i.e. in natural language):

Someone(C) has an uncle(U) if:
    He(C) has a parent(F),
    who(F) is a brother of this uncle(U).
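To try the rule out, here is a minimal sketch with a few made-up facts. The names carol, dave, frank and bob are hypothetical and not taken from dbtune's post; the argument order follows the reading above, where parent(C,F) means "C has parent F" and brother(F,U) means "F is a brother of U".

```prolog
% Hypothetical family facts, for illustration only.
% parent(Child, Parent): Child has Parent as a parent.
parent(carol, frank).
parent(dave,  frank).

% brother(F, U): F is a brother of U.
brother(frank, bob).

% Same rule as above, just laid out over several lines.
uncle(C, U) :-
    parent(C, F),
    brother(F, U).
```

With these facts loaded, the query ?- uncle(carol, U). binds U to bob, and ?- uncle(C, bob). enumerates carol and then dave on backtracking.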

Now, suppose that instead of a binary relation (uncle/2) we wanted to check the unary relation of whether or not someone (U) “is an uncle”. In Prolog, this might be:

is_uncle(U):- parent_of(C,F), brother_of(F,U).

In other words (i.e. in natural language):

someone(U) is an "uncle", if:
    there exists someone(C) whose parent(F)
    ...is a brother of this guy(U).

Unfortunately, the modified code above will probably… run forever on a large database! To understand why this is so, you don’t have to be a Prolog programmer, or even a programmer in general:

  • Bear in mind that Prolog programs are executed sequentially, just like every other piece of code, despite their “logical” semantics. A program that first searches the entire database to retrieve every parent of everyone before using the specific fact you’ve already supplied (the person U) is likely to take much longer to respond than a program that uses U immediately and only then combines it with more general information (such as checking whether a much smaller set of individuals, the brothers of U, are parents).

So, the necessary optimisation here is to reverse the order of the two calls (‘parent_of’ and ‘brother_of’), in order to check first whether U has a brother (F), and then whether this brother (F) is also a parent (of anyone else, C):

is_uncle(U):-
   brother_of(F,U),
   parent_of(_,F).

In other words, someone (U) is an “uncle” if he has a brother (F) who also happens to be a parent of someone else (_).
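The difference between the two orderings can be made visible with a rough experiment. The sketch below assumes SWI-Prolog (it relies on the inferences key of statistics/2); the predicate names populate/1, is_uncle_slow/1, is_uncle_fast/1 and inferences/2, as well as all the data, are hypothetical helpers introduced only for this illustration.

```prolog
:- dynamic parent_of/2, brother_of/2.

% Hypothetical data: N unrelated child/parent pairs, followed by
% one real family in which bob is carol's uncle.
populate(N) :-
    retractall(parent_of(_, _)),
    retractall(brother_of(_, _)),
    forall(between(1, N, I),
           assertz(parent_of(child(I), parent(I)))),
    assertz(parent_of(carol, frank)),
    assertz(brother_of(frank, bob)).

% Original ordering: enumerate every parent first, test U last.
is_uncle_slow(U) :-
    parent_of(_, F),
    brother_of(F, U).

% Reordered: start from the bound U, then check for a child of F.
is_uncle_fast(U) :-
    brother_of(F, U),
    parent_of(_, F).

% Count the inferences spent on Goal, whether it succeeds or fails.
inferences(Goal, Count) :-
    statistics(inferences, Before),
    ( call(Goal) -> true ; true ),
    statistics(inferences, After),
    Count is After - Before.
```

After ?- populate(100000)., comparing inferences(is_uncle_slow(bob), S) with inferences(is_uncle_fast(bob), F) should show S growing with the size of the database while F stays small. Both versions give the same answer; only the amount of work differs.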

Which brings me… to the realisation that this may not be so obvious to a lot of people who aren’t acquainted with Prolog and simple optimisations like this one. Suppose you are a programmer and you write a program to access, say, dbpedia (the downloadable Semantic version of Wikipedia); a program much more complicated than the previous 3-line code, operating on a database of millions (or even billions) of “triples” (relations of the form subject-predicate-object).

  • In this case, chances are high that you’ll make mistakes like the one mentioned. As a result, you shouldn’t feel surprised if your program runs (almost) forever, while… “impatient customers” start blaming the… Semantic Web’s “innate inefficiency”! :)
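The same ordering applies when the facts live in a triple store rather than in plain Prolog clauses. As a minimal sketch, assuming SWI-Prolog's library(semweb/rdf_db) is installed: the example.org resources and the load_example/0 helper are hypothetical, and a real dbpedia dump would be loaded from its files rather than asserted by hand.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical triples: carol has parent frank, frank has brother bob.
load_example :-
    rdf_assert('http://example.org/carol',
               'http://example.org/parent',
               'http://example.org/frank'),
    rdf_assert('http://example.org/frank',
               'http://example.org/brother',
               'http://example.org/bob').

% Start from the bound U, and only then look for a child of F.
is_uncle(U) :-
    rdf(F, 'http://example.org/brother', U),
    rdf(_, 'http://example.org/parent', F).
```

After ?- load_example., the query ?- is_uncle('http://example.org/bob'). succeeds. The first rdf/3 call already has two of its three arguments bound, so it touches only the triples about U before the parent lookup runs.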

1) SWI-Prolog (open source Prolog compiler for WinXP/Vista/Linux):

2) All the SWI-Prolog code in dbtune’s blog:

3) The SmartWeb Integrated Ontology (only 1 Mb)

4) The YAGO ontology:

5) Freebase downloads:

6) dbpedia (the Semantic version of Wikipedia):

Recommended (English) dbpedia Core Datasets for easier access:

Titles nt csv
Short Abstracts nt csv
Extended Abstracts nt csv
Images nt csv
Links to Wikipedia Article nt csv
Articles Categories nt csv
External Links nt csv
Ontology Infoboxes nt
Ontology Types nt
DBpedia Ontology owl
Wikipedia Infoboxes nt csv
Properties nt csv
Homepages nt csv
Geographic Coordinates nt csv
Pagelinks nt csv
Persondata nt csv
Redirects nt csv
Disambiguation Links nt csv
WordNet Classes nt csv
Categories (Labels) nt csv
Categories (Skos) nt csv

Extended dbpedia Datasets:

Dataset en
Links to Geonames nt -
Links to RDF Bookmashup nt -
Links to DBLP nt -
Links to Eurostat nt -
Links to CIA Factbook nt -
Links to Project Gutenberg nt -
Links to Musicbrainz nt -
Links to Revyu nt -
Links to US Census nt -
Links to flickr wrappr nt -
Links to WikiCompany nt -
Links to Cyc nt -
Links to Freebase nt -
YAGO Classes nt -
YAGO Links nt -

7) Other Downloads, suggested in my bookmarks’ collection:
