An Introduction to SPARQL and Queries over Linked Data

A growing number of datasets are published on the Web in accordance with the Linked Data principles. The availability of this data, together with the data-level connections between datasets, presents exciting opportunities for the next generation of Web-based applications. Consuming Linked Data is therefore a highly relevant topic in Web engineering. This introductory tutorial aims to give participants an understanding of one basic aspect of Linked Data consumption: querying Linked Data.

The tutorial consists of three main parts. First, we briefly introduce the concept of Linked Data and its underlying data model, the Resource Description Framework (RDF). The second and largest part provides a comprehensive introduction to SPARQL, the de facto query language for RDF; participants will learn how to express basic queries and how to use the more complex features of the language. Finally, the third part discusses several approaches for executing SPARQL queries over multiple, interlinked datasets. The tutorial is intended as a beginners' introduction. The prerequisites are a broad technical understanding of querying databases and a basic conceptual understanding of the architecture of the World Wide Web.

Slides

Hands-on Exercises

The exercises use a SPARQL endpoint that answers queries over data about the World Wide Web Conference 2012. The data was copied from the Semantic Web Dog Food dataset. You may want to take a look at an RDF dump of the data (in Turtle format, ca. 2.5 MB).

SPARQL editor: http://linkeddata.informatik.hu-berlin.de/sparqleditor/sparqleditor.html

  1. Ask for the title (property: 'dc:title') of each workshop (class: 'swc:WorkshopEvent').
  2. Ask for an ordered list of the subjects (property: 'dc:subject') of all workshops; each subject must appear at most once.
  3. Ask for the title of workshops that have "Linked Data" as their subject.
  4. Does the result for the previous query contain the USEWOD workshop?
    • If not, investigate why (using SPARQL queries similar to the previous ones) and adjust the query so that the USEWOD workshop becomes part of the result.
    • If yes, adjust the query so that the result does not include the USEWOD workshop.
  5. Each workshop took place in a particular location (property: 'swc:hasLocation'). Ask for the label (property: 'rdfs:label') of these locations; pair them with the title of the corresponding workshop.
  6. Ask for the URI of any event that took place in the same location as the USEWOD workshop (URI: 'http://data.semanticweb.org/workshop/usewod/2012'); the USEWOD workshop must not be listed.
  7. Ask for events that took place before the USEWOD workshop, in the same location as the USEWOD workshop. (Hint: events have properties 'ical:dtstart' and 'ical:dtend')
  8. While all events that took place in the same location as the USEWOD workshop have a label (property: 'rdfs:label'), only some of them have a title (property: 'dc:title'). List these labels and if the corresponding event also has a title, then show this title in addition to the label. (Hint: use OPTIONAL)
  9. Adjust the previous query so that it lists only the labels of those events that do not have a title (and that took place in the same location as the USEWOD workshop). (Hint: use the negation-as-failure pattern)
  10. Adjust the previous query so that it lists the title of each event that has a title (and a label) and the label of each event that does not have a title; in both cases use the same variable for the output, that is, the query result should consist of a single column only.
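
For readers who prefer to script the exercises rather than use the editor, the following is a minimal sketch of how a query could be prepared for an endpoint that accepts queries via HTTP GET with a 'query' parameter. The endpoint URL is a placeholder (this page names only the editor, not the endpoint), the prefix URIs are assumptions that should be checked against the RDF dump, and the helper names with_prefixes and request_url are made up for illustration. The sample query is a generic exploration query, not a solution to any of the exercises.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL: replace it with the endpoint used in the tutorial.
ENDPOINT = "http://example.org/sparql"

# Assumed namespace URIs for the prefixes used in the exercises;
# verify them against the RDF dump before relying on them.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "swc": "http://data.semanticweb.org/ns/swc/ontology#",
    "ical": "http://www.w3.org/2002/12/cal/ical#",
}


def with_prefixes(body, prefixes=PREFIXES):
    """Prepend one PREFIX declaration per entry to a SPARQL query body."""
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())
    return header + "\n" + body


def request_url(endpoint, query):
    """Build a GET URL that carries the query in a URL-encoded 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# A generic exploration query: which classes occur in the data?
EXPLORE = "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 10"

query = with_prefixes(EXPLORE)
url = request_url(ENDPOINT, query)

print(query)
print(url)
```

The resulting URL can be fetched with any HTTP client. Which result formats the endpoint offers is not stated here, so that would need to be checked separately.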