RDF elementary guide part 3: SPARQL, querying knowledge graphs and datasets with semantic meaning

Resource Description Framework (RDF) is an open W3C standard for describing concepts and resources digitally with semantic meaning. The SPARQL standard defines a syntax and a protocol for querying and manipulating datasets in RDF format. The standard is comprehensive, covering searching, updating, exporting and maintaining RDF datasets, and can be compared with the SQL standard for relational databases. The difference is that SPARQL is applied to datasets built from triples (subject, predicate, object) that carry semantic meaning. The guide is based on the previously presented knowledge graph with artists and paintings.

Select

The keyword SELECT is used to search for data in triple format, and the result is presented in tabular form with rows and columns. SELECT binds matching values from the graph to the variables defined in the query. The WHERE clause specifies which resources to search for, written as triple patterns of subject, predicate and object.

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?artist ?painting
WHERE {
  ?artist rdf:type     cb:Artist.
  ?artist cb:creatorOf ?painting.
}

The example defines two variables, ?artist and ?painting, which are bound to data from the dataset that matches the triple patterns in the WHERE clause. The first pattern looks for subjects of the type Artist. The second pattern looks for the objects that those same subjects point to with the predicate creatorOf. Each triple pattern ends with a full stop.
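How the two patterns cooperate can be illustrated outside SPARQL. The following is a minimal Python sketch, not how a SPARQL engine is actually implemented: the small TRIPLES list and the helper names match and select are assumptions made for the illustration, and prefixed names are kept as plain strings.

```python
# Hypothetical in-memory sample; the real dataset lives on the SPARQL server.
TRIPLES = [
    ("cb:Picasso", "rdf:type", "cb:Artist"),
    ("cb:Picasso", "cb:creatorOf", "cb:guernica"),
    ("cb:VanGogh", "rdf:type", "cb:Artist"),
    ("cb:VanGogh", "cb:creatorOf", "cb:starryNight"),
    ("cb:guernica", "rdf:type", "cb:Painting"),
]


def match(pattern, triples, binding):
    """Yield every extension of `binding` under which `pattern` equals a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def select(variables, patterns, triples):
    """Apply the patterns one after another and project the chosen variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, triples, s)]
    return [tuple(s[v] for v in variables) for s in solutions]


rows = select(
    ["?artist", "?painting"],
    [("?artist", "rdf:type", "cb:Artist"),
     ("?artist", "cb:creatorOf", "?painting")],
    TRIPLES,
)
print(rows)
```

Because ?artist appears in both patterns, only subjects that satisfy both end up in the result, which is why cb:guernica (a Painting, not an Artist) never shows up in the first column.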

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?artist ?painting
WHERE {
  ?artist rdf:type cb:Artist;
  cb:creatorOf ?painting.
}

What distinguishes this query is the semicolon, which indicates that the next pattern reuses the same subject, so ?artist only needs to be written once. The result is the same and is presented in tabular form. Apache Fuseki exports tables in JSON, XML, CSV and TSV formats. Tips on how to install Fuseki can be found at the end of the article.

?artist                              ?painting
<https://www.clearbyte.org/Picasso>  <https://www.clearbyte.org/guernica>
<https://www.clearbyte.org/Picasso>  <https://www.clearbyte.org/maJolie>
<https://www.clearbyte.org/Picasso>  <https://www.clearbyte.org/crossedArms>
<https://www.clearbyte.org/VanGogh>  <https://www.clearbyte.org/starryNight>
<https://www.clearbyte.org/VanGogh>  <https://www.clearbyte.org/sunflowers>
<https://www.clearbyte.org/VanGogh>  <https://www.clearbyte.org/potatoEaters>
<https://www.clearbyte.org/VanGogh>  <https://www.clearbyte.org/sundayEindhoven>
<https://www.clearbyte.org/VanGogh>  <https://www.clearbyte.org/minersInTheSnow>
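When the table is exported as JSON, it can be turned back into rows with a few lines of code. The sketch below assumes the response follows the standard SPARQL JSON results layout (a head with the variable names and a list of bindings); the SAMPLE document and the function name to_rows are made up for the illustration and are not output copied from Fuseki.

```python
import json

# Hypothetical response body, shaped like the SPARQL JSON results format.
SAMPLE = """
{
  "head": {"vars": ["artist", "painting"]},
  "results": {"bindings": [
    {"artist":   {"type": "uri", "value": "https://www.clearbyte.org/Picasso"},
     "painting": {"type": "uri", "value": "https://www.clearbyte.org/guernica"}},
    {"artist":   {"type": "uri", "value": "https://www.clearbyte.org/VanGogh"},
     "painting": {"type": "uri", "value": "https://www.clearbyte.org/sunflowers"}}
  ]}
}
"""


def to_rows(document):
    """Return (variable names, list of row tuples) from a JSON results string."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable without a value is simply absent from the binding.
        rows.append(tuple(binding.get(v, {}).get("value") for v in variables))
    return variables, rows


variables, rows = to_rows(SAMPLE)
for row in rows:
    print(dict(zip(variables, row)))
```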

The next example counts the number of paintings each painter has in the dataset by aggregating the result with GROUP BY. To make the result easier to read, first name and surname are combined into one variable with BIND and CONCAT, and the value from COUNT is converted from an integer to a string with STR.

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (STR(COUNT(?painting)) AS ?painting_count)
WHERE {
  ?artist rdf:type       cb:Artist.
  ?artist foaf:firstName ?fn.
  ?artist foaf:surname   ?sn.
  ?artist cb:creatorOf   ?painting.
  BIND(CONCAT(?fn, " ", ?sn) AS ?name)
}
GROUP BY ?artist ?name

?name             ?painting_count
Pablo Picasso     3
Vincent van Gogh  5

GROUP BY and COUNT also make it easy to find out which predicates occur in a dataset and how often.

SELECT ?predicate (STR(COUNT(?predicate)) AS ?pTotal)
WHERE {
  ?subject ?predicate ?object.
}
GROUP BY ?predicate

?predicate                                            ?pTotal
<https://www.clearbyte.org/paintingTechnique>         8
<https://www.clearbyte.org/city>                      1
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>     19
<http://www.w3.org/2000/01/rdf-schema#range>          6
<http://xmlns.com/foaf/0.1/surname>                   3
<https://www.clearbyte.org/street>                    1
<https://www.clearbyte.org/country>                   1
<https://www.clearbyte.org/createdBy>                 5
<https://www.clearbyte.org/homeAddress>               1
<http://www.w3.org/2000/01/rdf-schema#label>          17
<http://xmlns.com/foaf/0.1/firstName>                 3
<https://www.clearbyte.org/createBy>                  3
<http://www.w3.org/2000/01/rdf-schema#domain>         3
<http://www.w3.org/2000/01/rdf-schema#comment>        6
<https://www.clearbyte.org/creatorOf>                 8
<http://www.w3.org/2000/01/rdf-schema#seeAlso>        12
<http://www.w3.org/2000/01/rdf-schema#subPropertyOf>  3
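The grouping step itself is simple counting. As a rough Python sketch of what GROUP BY ?predicate with COUNT computes, assuming a tiny made-up list of triples rather than the real dataset:

```python
from collections import Counter

# Hypothetical sample triples; prefixed names are kept as plain strings.
TRIPLES = [
    ("cb:Picasso", "rdf:type", "cb:Artist"),
    ("cb:Picasso", "cb:creatorOf", "cb:guernica"),
    ("cb:Picasso", "cb:creatorOf", "cb:maJolie"),
    ("cb:guernica", "rdf:type", "cb:Painting"),
    ("cb:guernica", "rdfs:label", "Guernica"),
]

# One group per distinct predicate, one count per group.
counts = Counter(predicate for _, predicate, _ in TRIPLES)

for predicate, total in counts.items():
    # str() mirrors the STR(...) call in the query.
    print(predicate, str(total))
```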

Optional

The previous query about the number of paintings each painter has created contains the pattern ?artist cb:creatorOf ?painting. This means that painters who do not have any paintings in the dataset are not included in the result. The keyword OPTIONAL marks a pattern that may match but does not have to: when it has no match, the rest of the solution is kept and the variables of the optional pattern are left unbound. With OPTIONAL, painters who do not have any paintings in the dataset are included as well, and COUNT reports 0 for them.

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (STR(COUNT(?painting)) AS ?painting_count)
WHERE {
  ?artist rdf:type       cb:Artist.
  ?artist foaf:firstName ?fn.
  ?artist foaf:surname   ?sn.
  BIND(CONCAT(?fn, " ", ?sn) AS ?name)
  OPTIONAL {?artist cb:creatorOf ?painting.}
}
GROUP BY ?artist ?name

?name             ?painting_count
Carl Larsson      0
Vincent van Gogh  5
Pablo Picasso     3
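The difference between a required and an optional pattern resembles the difference between an inner join and a left join. The following Python sketch illustrates this under simplifying assumptions: the ARTISTS and CREATOR_OF samples, the identifier cb:Larsson and the function names are invented for the illustration, and None stands in for an unbound ?painting.

```python
# Hypothetical sample data; cb:Larsson is an assumed identifier for Carl Larsson.
ARTISTS = ["cb:Larsson", "cb:Picasso", "cb:VanGogh"]
CREATOR_OF = [
    ("cb:Picasso", "cb:guernica"),
    ("cb:Picasso", "cb:maJolie"),
    ("cb:VanGogh", "cb:sunflowers"),
]


def required(artists, creator_of):
    """Plain pattern: an artist without a painting produces no row at all."""
    return [(a, p) for a in artists for (s, p) in creator_of if s == a]


def optional(artists, creator_of):
    """OPTIONAL pattern: keep the artist and leave ?painting unbound (None)."""
    rows = []
    for a in artists:
        paintings = [p for (s, p) in creator_of if s == a]
        if paintings:
            rows.extend((a, p) for p in paintings)
        else:
            rows.append((a, None))
    return rows


def count_paintings(rows):
    """COUNT(?painting) per artist: unbound values are not counted."""
    counts = {}
    for artist, painting in rows:
        counts[artist] = counts.get(artist, 0) + (painting is not None)
    return counts


print(count_paintings(required(ARTISTS, CREATOR_OF)))
print(count_paintings(optional(ARTISTS, CREATOR_OF)))
```

Only the second call contains an entry for the painter without paintings, with a count of 0.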

Union

UNION is useful for migrating and integrating data from different datasets. It enables interconnection of datasets with similar or closely related semantic descriptions of resources. This elementary guide covers queries against datasets that are available on one and the same SPARQL server. Integration and interconnection of datasets from different external data sources and servers, so-called federated queries, are not described in this guide.

In the case of artists and paintings, there was an earlier version of the model that used the Friend of a Friend (FOAF) vocabulary, with the prefix foaf, to describe artists. The early model had two paintings that were not migrated into the newer version. The query below shows that the paintings weepingWoman by Picasso and lumberSale by VanGogh are not included in the newer model.

PREFIX cb: <https://www.clearbyte.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?artist ?painting
WHERE {
  ?artist rdf:type     foaf:Person.
  ?artist cb:creatorOf ?painting.
}

?artist                        ?painting
<http://examples.org/Picasso>  <http://examples.org/guernica>
<http://examples.org/Picasso>  <http://examples.org/weepingWoman>
<http://examples.org/VanGogh>  <http://examples.org/lumberSale>

The keyword UNION combines the results of alternative groups of triple patterns, here painters defined with two different object types, Artist and Person. For the sake of the example, both the older and the newer definitions of artists are in the same dataset, and the resources are distinguished by a URI fragment (#).

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?artist ?painting
WHERE {
  {
      ?artist rdf:type     cb:Artist.
      ?artist cb:creatorOf ?painting.
  }
  UNION
  {
      ?artist rdf:type     foaf:Person.
      ?artist cb:creatorOf ?painting.
  }
}

?artist                                     ?painting
<https://www.clearbyte.org/Artist#Picasso>  <https://www.clearbyte.org/guernica>
<https://www.clearbyte.org/Artist#Picasso>  <https://www.clearbyte.org/maJolie>
<https://www.clearbyte.org/Artist#Picasso>  <https://www.clearbyte.org/crossedArms>
<https://www.clearbyte.org/Artist#VanGogh>  <https://www.clearbyte.org/starryNight>
<https://www.clearbyte.org/Artist#VanGogh>  <https://www.clearbyte.org/sunflowers>
<https://www.clearbyte.org/Artist#VanGogh>  <https://www.clearbyte.org/potatoEaters>
<https://www.clearbyte.org/Artist#VanGogh>  <https://www.clearbyte.org/sundayEindhoven>
<https://www.clearbyte.org/Artist#VanGogh>  <https://www.clearbyte.org/minersInTheSnow>
<https://www.clearbyte.org/Picasso>         <https://www.clearbyte.org/guernica>
<https://www.clearbyte.org/Picasso>         <https://www.clearbyte.org/weepingWoman>
<https://www.clearbyte.org/VanGogh>         <https://www.clearbyte.org/lumberSale>
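Conceptually, each branch of a UNION is evaluated on its own and the two lists of solutions are then placed one after the other, which matches the order of the rows above. A small Python sketch of that idea, assuming a made-up sample of triples and a hypothetical helper creators_of_type:

```python
# Hypothetical sample holding both the newer (cb:Artist) and older (foaf:Person) model.
TRIPLES = [
    ("cb:Artist#Picasso", "rdf:type", "cb:Artist"),
    ("cb:Artist#Picasso", "cb:creatorOf", "cb:maJolie"),
    ("cb:Picasso", "rdf:type", "foaf:Person"),
    ("cb:Picasso", "cb:creatorOf", "cb:weepingWoman"),
    ("cb:weepingWoman", "rdf:type", "cb:Painting"),
]


def creators_of_type(triples, type_name):
    """Solutions of: ?artist rdf:type <type_name> . ?artist cb:creatorOf ?painting ."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == type_name}
    return [(s, o) for s, p, o in triples if p == "cb:creatorOf" and s in typed]


# UNION: solutions of the first branch followed by solutions of the second.
rows = creators_of_type(TRIPLES, "cb:Artist") + creators_of_type(TRIPLES, "foaf:Person")
for artist, painting in rows:
    print(artist, painting)
```

Nothing is merged or deduplicated in this sketch: a resource described in both models would appear once per branch.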

Construct

CONSTRUCT returns triples, that is, a graph, as the result. This is useful for searching, copying, transforming and exporting all or parts of a dataset with retained semantic meaning. The CONSTRUCT clause holds a template, written as one or more triple patterns within curly brackets, that is filled in for every solution of the WHERE clause.

PREFIX cb: <https://www.clearbyte.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {?s ?p ?o.}
WHERE {
  # "a" is shorthand for rdf:type
  ?s a cb:Painting;
     ?p ?o.
  ?o a cb:Artist.
}

A CONSTRUCT query results in a graph, shown here in Turtle syntax, including the associated prefixes and IRI paths to vocabularies and taxonomies used in the dataset.

@prefix schema: <https://schema.org/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd:    <http://www.wikidata.org/entity/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix cb:    <https://www.clearbyte.org/> .

cb:Guernica  cb:createBy  cb:Picasso .
cb:PotatoEaters  cb:createdBy  cb:VanGogh .
cb:StarryNight  cb:createdBy  cb:VanGogh .
cb:SundayEindhoven  cb:createdBy  cb:VanGogh .
cb:MaJolie  cb:createBy  cb:Picasso .
cb:Sunflowers  cb:createdBy  cb:VanGogh .
cb:MinersInTheSnow  cb:createdBy  cb:VanGogh .
cb:CrossedArms  cb:createBy  cb:Picasso .
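Why only the createBy and createdBy triples survive can be seen by replaying the query by hand: ?s must be a Painting and ?o must be an Artist, so type and label triples are filtered out. The Python sketch below mimics this for a small invented sample; the triples, including the label, are assumptions for the illustration and not an extract of the real dataset.

```python
# Hypothetical sample triples; prefixed names are kept as plain strings.
TRIPLES = [
    ("cb:Guernica", "rdf:type", "cb:Painting"),
    ("cb:Guernica", "rdfs:label", "Guernica"),
    ("cb:Guernica", "cb:createBy", "cb:Picasso"),
    ("cb:Picasso", "rdf:type", "cb:Artist"),
    ("cb:Picasso", "cb:creatorOf", "cb:Guernica"),
]


def construct(triples):
    """Fill the template {?s ?p ?o} for every solution of the WHERE clause."""
    paintings = {s for s, p, o in triples if p == "rdf:type" and o == "cb:Painting"}
    artists = {s for s, p, o in triples if p == "rdf:type" and o == "cb:Artist"}
    # The result is a graph, so it is modelled as a set of triples.
    return {(s, p, o) for s, p, o in triples if s in paintings and o in artists}


for triple in sorted(construct(TRIPLES)):
    print(*triple, ".")
```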

The following query lists all triples in a dataset.

CONSTRUCT {?subject ?predicate ?object.}
WHERE {
  ?subject ?predicate ?object.
}

The SPARQL standard is comprehensive and contains much more than can be described in an introduction. Querying data with semantic meaning concludes the elementary guides on getting started with RDF. Feel free to write in the comments field if you have suggestions for in-depth articles that would be of interest to you.

Apache Fuseki SPARQL server – installation tips

  1. Install Apache Tomcat Web Server (Ubuntu)
  2. Install Apache Jena Fuseki as a web application

The content is created by Clear Byte and is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
