Linked data & RDF : draft notes for comment

I started to try and write an email about Linked Data, which I was planning to send to some staff in the Library I work in.

I felt that it was a topic that will greatly impact the Library / Information management world and wanted to sow the seeds until I could do something more. However, after defining linked data, I felt I should also mention RDF and the Semantic Web, and try to define these too. And then, in a nutshell, try to explain what these were all about and why they are good. And then add some links.

Quickly it became the ultimate email: one that no one would read, that would confuse the hell out of most people, and that was just a bit too random to come out of the blue.

So I think I will turn it into a talk instead.

This will be quite a challenge. I’ve read a bit about all this and get the general gist of it, though I don’t think I have a firm grasp of what RDF should be used for, how exactly it should be structured, or which bits of all this fall under the term ‘Semantic Web’. There’s a big difference between hearing/reading various introductions to a subject and being able to talk about it with confidence.

Anyway, below is the draft of the email, with various links and explanations, which turns into just a list of links and notes as the realisation sinks in that this will never be coherent enough to send.

If you know about this stuff (and there’s a good chance you know more than me), please comment on anything you think I should add or change, or where I have got things wrong.

I will probably regret putting a quick draft email, mistakes and all, on the web, but nevertheless, here is the email:

You have probably heard the phrase ‘linked data’, and if not, you will in the future.

It’s used in conjunction with RDF (a data format) and the Semantic Web.

“Linked Data is a term used to describe a method of exposing, sharing, and connecting data on the Web via dereferenceable URIs.”

en.wikipedia.org/wiki/Linked_Data

It’s the idea of web pages (URLs) describing specific things. E.g. a page on our Repository describes (is about) a particular article. A page on the BBC describes the program ‘Dr Who’, another describes the Today program. A page on our new reading list system describes a list.

I know what these pages contain because I can look at them. But what if this was formalised so that systems (computers) could make use of these things, and extract information and links from within them?

Two companies prominent in this area are Talis and the BBC. The University of Southampton is also doing work and research around the semantic web (no coincidence Tim Berners-Lee worked there).

The following link is a diagram which is being used a lot in presentations and articles (and there’s a good chance you will see it crop up in the future):

http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05_colored.png

Why should we care about all this?

Look at the image: Pubmed, IEEE, RAE, eprints, BBC and more. These are services we – and our users – use. This isn’t some distant technology thing, it’s happening in our domain.

Why should we care? (part 2)

Because when I searched for RDF I came across a result on the site of the Department for Innovation, Universities and Skills (the one that funds us)… It was a page called ‘library2.0’ (that sounds relevant)… The first link was to *Sussex’s* Resource list system… we’re already part of this.

http://sn.im/glr45

(For those confused: the DIUS site simply shows things bookmarked on Delicious that match certain tags. Someone from Eduserv, while at UKSG, had bookmarked our Resource List system because it was being used as an example of RDF/linked data. As UKSG was quite recent, this link is at the top.)

As I mention our Resource lists…

For each of our lists on Talis Aspire, there is the human readable version (HTML) and computer readable versions, e.g.:

In HTML:

http://liblists.sussex.ac.uk/lists/p3030.html

In RDF/XML:

http://liblists.sussex.ac.uk/lists/p3030.rdf

In JSON (don’t worry what these mean, they are just computer readable versions of the same info):

http://liblists.sussex.ac.uk/lists/p3030.json

And in the same way, for each BBC program/episode/series/person there is a webpage, and also a computer readable (RDF) version of that page, e.g. the Today program:

http://www.bbc.co.uk/programmes/b006qj9z

http://www.bbc.co.uk/programmes/b006qj9z.rdf
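To make the "same thing, different versions" idea concrete, here is a minimal sketch in Python. It assumes, purely from the example links above, that a computer readable version is reached by adding a format suffix to the page's address. The function name representation_url is made up for illustration, and nothing here actually fetches anything from the web.

```python
def representation_url(resource_url: str, fmt: str) -> str:
    """Build the address of one version of a resource.

    Assumption: the site exposes each format at '<resource>.<format>',
    as the example links in this post suggest. Other sites may use
    different patterns (or content negotiation) instead.
    """
    return f"{resource_url}.{fmt}"


# One reading list, three versions of the same information.
reading_list = "http://liblists.sussex.ac.uk/lists/p3030"
list_versions = {fmt: representation_url(reading_list, fmt)
                 for fmt in ("html", "rdf", "json")}

# One BBC programme page and its RDF counterpart.
today_programme = "http://www.bbc.co.uk/programmes/b006qj9z"
today_rdf = representation_url(today_programme, "rdf")
```

The point is only that a person and a program can start from the same address and each ask for the version that suits them.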

There is tons of stuff I could point to about the BBC effort. Here are some good introductions:

http://www.bbc.co.uk/blogs/radiolabs/2009/04/brands_series_categories_and_t.shtml

http://blogs.talis.com/nodalities/2009/01/building-coherence-at-bbccouk.php

I saw Tom Scott give this presentation at the ‘confused’ Open Knowledge Conference:

http://derivadow.com/2009/03/31/linking-bbccouk-to-the-linked-data-cloud/

http://www.bbc.co.uk/blogs/radiolabs/2008/07/music_beta_and_linked_data.shtml

A paper on BBC/DBpedia with a good introduction to the background of /programmes:

http://www.georgikobilarov.com/publications/2009/eswc2009-bbcdbpedia.pdf

RDF
===

RDF is just a way of asserting facts (knowledge).

“RDF is designed to represent knowledge in a distributed world.”

(from http://rdfabout.com/quickintro.xpd)

“RDF is very simple. It is no more than a way to express and process a series of simple assertions. For example: This article is authored by Uche Ogbuji. This is called a statement in RDF and has three structural parts: a subject (“this article”), a predicate (“is authored by”), and an object (“Uche Ogbuji”).”

(from http://www.ibm.com/developerworks/library/w-rdf/)
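The three-part statement in that quote can be sketched in a few lines of Python. This is only an illustration of the subject / predicate / object shape, not real RDF: proper RDF would normally use URIs rather than plain phrases, and the names Triple and describe are my own.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# The example statement from the quote above, split into its three parts.
statement = Triple(
    subject="this article",
    predicate="is authored by",
    object="Uche Ogbuji",
)


def describe(triple: Triple) -> str:
    """Read a triple back as a plain sentence."""
    return f"{triple.subject} {triple.predicate} {triple.object}"
```

Calling describe(statement) puts the sentence back together, which is really all a triple is: a sentence cut into three labelled pieces so a computer can tell which piece is which.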

Example:

Imagine a book (referred to by its ISBN).

Our catalogue asserts who wrote it, and also asserts what copies we have.

Amazon assert what price they offer for that book.

OCLC assert what other ISBNs are related to that item.

LibraryThing assert what tags have been given to that item.

All these assertions are distributed across the web.

But what if one system could use them all to display relevant information on one page, and create appropriate links to other pages?
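As a rough sketch of that "one system using them all" idea, the Python below pretends that four sources each publish a few statements about the same book and then merges them into a single description. Everything in it is invented for illustration: the identifier is a placeholder rather than a real ISBN, the values are dummy data, the predicate names are not from any real vocabulary, and a real system would fetch and parse RDF from each source rather than use hard-coded lists.

```python
from collections import defaultdict

# Placeholder identifier shared by every source (not a real ISBN).
BOOK = "urn:isbn:0000000000"

# Each source asserts its own facts as (subject, predicate, object) triples.
# All values below are dummy data.
catalogue = [(BOOK, "author", "Example Author"), (BOOK, "copies_held", "2")]
bookshop = [(BOOK, "price", "9.99 GBP")]
isbn_service = [(BOOK, "related_isbn", "urn:isbn:1111111111")]
tagging_site = [(BOOK, "tag", "semantic web"), (BOOK, "tag", "libraries")]


def merge(*sources):
    """Combine triples from several sources, grouped by subject and predicate."""
    description = defaultdict(lambda: defaultdict(list))
    for source in sources:
        for subject, predicate, obj in source:
            description[subject][predicate].append(obj)
    return description


combined = merge(catalogue, bookshop, isbn_service, tagging_site)

# Everything any source said about the book, gathered in one place.
for predicate, values in combined[BOOK].items():
    print(predicate, "->", ", ".join(values))
```

The merge only works because every source uses the same identifier for the book, which is the whole argument for giving things shared, linkable identifiers in the first place.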

#######################

Things to include…

http://en.wikipedia.org/wiki/Ontology_(computer_science)

http://www.slideshare.net/iandavis/30-minute-guide-to-rdf-and-linked-data

http://www.slideshare.net/ostephens/the-semantic-web-1336258

http://vocab.org/aiiso/schema

(created for Talis Aspire, I think)

Tower and the cloud (not really linked data):

http://www.worldcat.org/oclc/265381796

Semantic Web for the Working Ontologist (we have a copy):

http://beta.lib.sussex.ac.uk/ABL/?itemid=|library/marc/talis|991152

Freebase http://www.vimeo.com/1513562

Fantastic YouTube video from Davos by Tom Ilube:

http://www.youtube.com/watch?v=k_zoEeWOBuo

Any pointers to good descriptions/explanations of RDF? (I think this is the most difficult area.) Clearly this is mainly a set of links, and not a talk, but I will probably use it as the basis of what I will try and say.
All comments welcome.