RDF elementary guide part 3: SPARQL querying knowledge graphs and datasets with semantic meaning

Resource Description Framework (RDF) is an open standard by W3C for describing concepts and resources digitally with semantic meaning. The SPARQL standard defines a syntax and protocol for querying and manipulating datasets in RDF format. The standard is comprehensive, covering searching, updating, exporting and maintaining RDF datasets, and can be compared with the SQL standard for relational databases. The difference is that SPARQL is applied to datasets that follow the triple pattern (subject, predicate, object) and carry semantic meaning. The guide is based on the previously presented knowledge graph with artists and paintings.

RDF elementary guide part 2: Creating Ontologies and Knowledge Graphs with RDF-Schema

The Resource Description Framework (RDF) is an open standard by W3C to describe concepts and resources digitally with semantic meaning. Data described in RDF format can be exchanged and reused with retained conceptual understanding of concepts between businesses, industries and countries. This is the second article in a series that introduces the basics of describing digital resources with semantic meaning. The model in the article interlinks its resource descriptions with equivalent concepts from Wikidata (Wikipedia) to create context. The previous article describes how classes and properties are defined with RDF-Schema (RDFS).

Prefix, relative and absolute paths

The Turtle format uses prefixes to abbreviate IRI addresses, which makes the syntax easier to read. Prefixes are defined at the beginning of the file and represent IRI paths to the terminologies and classification systems (taxonomies) used in the model. The prefixes in the model refer to external resources.

@base <http://www.clearbyte.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix schema: <http://schema.org/> .

To create relative paths to resources declared in the model, the notation @base is used. The declaration below shows the relative path to the Guernica resource.

<Guernica> a <Painting> ;
    rdfs:label "Guernica" .

Resources declared within <> are assigned the address declared with @base. This makes it easy to change address if need be. The path to the resource above becomes http://www.clearbyte.org/Guernica. If @base and @prefix are left blank, the resources in the model will get the same address as the server where the RDF resources are published. It is possible to omit the prefix name which makes the syntax shorter but more difficult to read.

@prefix : <http://www.clearbyte.org/> .

:Guernica a :Painting ;
    rdfs:label "Guernica" .
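
How a relative path is combined with the @base address can be sketched in Python. This is only an illustration: it uses urljoin from the standard library, which follows the same general reference resolution rules (RFC 3986) that Turtle relies on. The helper name resolve is an assumption made for this example and is not part of any RDF tool.

```python
from urllib.parse import urljoin

# Base address taken from the @base declaration in the model.
BASE = "http://www.clearbyte.org/"

def resolve(relative_iri: str, base: str = BASE) -> str:
    """Resolve a relative IRI such as <Guernica> against the base address."""
    return urljoin(base, relative_iri)

guernica = resolve("Guernica")
painting = resolve("Painting")
print(guernica)
print(painting)
```

Changing BASE in one place changes the address of every relative resource, which is the convenience that @base provides.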

Semantic model

The model below describes concepts concerning artists and paintings. All resources are described through statements which consist of subject, predicate and object (explained in the previous article). Subjects (Artist and Painting) are assigned attributes by declaring predicate and object in pairs. Artist has attributes for label (rdfs:label) and description (rdfs:comment), with the @en notation indicating that the description is in English.

# Class definition

<Artist> a rdfs:Class ;
    rdfs:label "Artist" ;
    rdfs:comment "Creator of art artifacts"@en ;
    rdfs:seeAlso wd:Q483501 ;
    rdfs:seeAlso foaf:Person .

<Painting> a rdfs:Class ;
    rdfs:label "Painting";
    rdfs:comment "Art artifact on canvas"@en ;
    rdfs:seeAlso wd:Q3305213 ;
    rdfs:seeAlso schema:Painting .

Objects may consist of text strings, numeric values, or other resources with their own attributes declared in another statement. When the object is the subject of another statement, resources are linked together and form the basis of a graph. For example, the object wd:Q483501 is a resource defined by Wikidata. The model uses the predicate rdfs:seeAlso to associate Artist in the model with the Wikidata description of the concept. The association with the Wikidata definition of Artist (Q483501) and the FOAF (friend-of-a-friend) description of Person shows how resources are interlinked with external ones to create context and conceptual understanding.

To create instances of classes, a unique IRI is needed for each resource. A second, different resource cannot also be declared as <VanGogh> in the same model, because the path would no longer be unique.

<VanGogh> a <Artist> ;
    rdfs:label "VanGogh" ;
    foaf:firstName "Vincent" ;
    foaf:surname "van Gogh";
    rdfs:seeAlso wd:Q5582 ;
    <creatorOf> <StarryNight>, <Sunflowers>, <PotatoEaters>, <SundayEindhoven>, <MinersInTheSnow> .

<StarryNight> a <Painting> ;
    rdfs:label "StarryNight" ;
    <paintingTechnique> <oil> .
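
The comma-separated list of paintings is shorthand for one statement per object. A minimal sketch of that expansion in Python follows. The function name expand is hypothetical, resources are written as plain strings rather than full IRIs, and only three of the paintings are included.

```python
# Hypothetical helper: expand one subject, one predicate and several objects
# into individual (subject, predicate, object) triples.
def expand(subject, predicate, objects):
    return [(subject, predicate, obj) for obj in objects]

# Only three of the paintings are listed to keep the example short.
triples = expand("VanGogh", "creatorOf", ["StarryNight", "Sunflowers", "PotatoEaters"])
for triple in triples:
    print(triple)
```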

If there are already classes and properties defined in standards or commonly used taxonomies, it is advantageous to use these, for example by using Person from FOAF. This interlinks models and simplifies queries on data that use common descriptions of resources. In some cases, it may be justified to create your own definitions of resources if the naming convention differs from existing models. This is the case with Wikidata, which uses its own set of codes to identify resources, such as Q5582 for VanGogh and Q3305213 for Painting. Therefore, the predicate rdfs:seeAlso is used to link equivalent resources when the same naming convention is not used.

<Painting#Guernica> a <Painting> ;
    rdfs:label "Guernica";
    rdfs:seeAlso wd:Q175036 ;
    <createdBy> <Picasso> ;
    <paintingTechnique> <oil> .

It is possible to use IRI / URI fragments in the path to subordinate resources in relation to each other. By using # fragments, one resource can be subordinated to another. The resource above gets the path http://www.clearbyte.org/Painting#Guernica, which indicates that Guernica is subordinate to the painting resource. Note that fragments are part of the IRI / URI protocol standard on which the RDF framework is based.
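
As a small illustration of how a fragment relates to its parent path, the address can be split with urldefrag from the Python standard library. This is a sketch only: the split is a property of the IRI itself and says nothing about how an RDF tool interprets the two parts.

```python
from urllib.parse import urldefrag

iri = "http://www.clearbyte.org/Painting#Guernica"

# urldefrag separates the part before '#' from the fragment after it.
parent, fragment = urldefrag(iri)
print(parent)
print(fragment)
```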

The model declares the new properties <creatorOf> and <createdBy>. To declare properties, the class rdf:Property is used. The definition of properties adds semantic significance to how subjects and objects are interlinked. The observant reader may have noted earlier that several objects can be listed for one predicate, as in the declaration that VanGogh is the <creatorOf> several paintings.

<creatorOf> a rdf:Property ;
    rdfs:domain <Artist> ;
    rdfs:range <Painting> .

<createdBy> a rdf:Property ;
    rdfs:domain <Painting> ;
    rdfs:range <Artist> .

The image below shows the link between subject and object, where the properties (predicates) are depicted as arrows. The direction of the arrows is determined by domain and range. The createdBy property has the Painting class as its domain (for example StarryNight) and the Artist class as its range (for example VanGogh), so the arrow points from the painting to the artist. Depending on the visualization tool used for RDF graphs, the user can choose which classes and properties to display.

Knowledge graph and semantic model of artists, paintings and painting techniques

The graph in the image is a simple knowledge graph that uses properties to describe the semantic relationship between classes. The RDF framework offers more features and possibilities for more detailed definitions of knowledge graphs with semantic meaning, which will be explained in later articles.

RDF elementary guide part 1: Class and property definition in RDF-Schema

The Resource Description Framework (RDF) is an open standard from W3C for describing digital resources with semantic meaning. Data described in RDF format can be exchanged and reused with retained conceptual understanding of resources between businesses, industries and countries. This is the first article in a series of guides to get started describing data with semantic meaning. To get the most out of the series, the reader is recommended to read the W3C guide RDF 1.1 Primer, which gives a good introduction to what the RDF framework consists of. The first article describes how resources are defined in RDF-Schema. The next article presents a semantic model and explains how resources from Wikidata and other sources can be linked to enrich the description.

RDF-Schema (RDFS)

The RDF standard describes digital resources by defining and using classes and properties. RDF-Schema (RDFS) is one of several languages that offer notation rules for describing resources with semantic meaning by creating vocabularies and taxonomies of closely related concepts. Common to all languages following the RDF standard is the way they describe resources in the form of a statement that consists of subject, predicate and object, which together constitute a triple. All resources in RDF are identified by an IRI (Internationalized Resource Identifier), which is a generalization of URI. This enables resources and triples to be linked globally, since IRIs and URIs are a fundamental part of how web resources are identified over HTTP. There are other languages that enable more advanced classification systems in the form of ontologies that can describe rules as well as temporal and dynamic relationships among resources, such as OWL 2 (Web Ontology Language).

RDF Format

To define a statement in RDF, there are a number of different formats that serve different purposes. The Turtle (.ttl) format is a syntax with abbreviations and prefixes that is easier for humans to read, and it is used throughout this article series unless otherwise stated. Other formats in the Turtle family, such as N-Triples and N-Quads, are simpler line-based formats optimized for machine reading.

Classes and properties in RDFS

Classes (rdfs:Class) are used to classify resources. An instance of an rdfs:Class is defined using the predicate rdf:type. For example, we can define that Artist is a class and that Picasso is an instance of the Artist class. Note that ex, rdf and rdfs are IRI prefixes, which are abbreviations used instead of writing the entire path to a resource (http://www.clearbyte.org/example/Artist).

ex:Artist rdf:type rdfs:Class . 
ex:Picasso rdf:type ex:Artist .

Properties (rdf:Property) are used to add attributes to resources. Similar to how we define classes, we can define instances of properties and use them as predicates in statements. In the example from earlier we add the properties (predicates) ex:name and ex:creatorOf. The name is given as a text string (literal) "Pablo Picasso", while ex:creatorOf points to another resource, ex:Guernica. Note that both the name and the painting are objects in statements about Picasso: the name is a text string, whereas the painting is a resource that is itself the subject of other statements. This could be a resource in the same model, or, if we want to use Wikidata's (Wikipedia) definition of the painting Guernica, an IRI prefix can be used to refer to the resource wd:Q175036 (https://www.wikidata.org/wiki/Q175036). Turtle offers the letter a as an abbreviation of rdf:type, which keeps the syntax short and easy to read.

ex:name a rdf:Property .
ex:creatorOf a rdf:Property .

ex:Picasso a ex:Artist ;
    ex:name "Pablo Picasso" ;
    ex:creatorOf ex:Guernica .

The predicate interconnects the subject and the object in a statement, which forms the basis of a graph. Subjects and objects can be seen as nodes and the predicate as a meaningful link that describes the relationship between nodes.
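
The view of statements as nodes and links can be sketched with plain Python data structures. This is a simplified illustration, not how an RDF store is implemented: triples are written as tuples of strings, and the variable name graph is an assumption for this example.

```python
from collections import defaultdict

# The statements about Picasso, written as (subject, predicate, object) tuples.
triples = [
    ("ex:Picasso", "rdf:type", "ex:Artist"),
    ("ex:Picasso", "ex:name", "Pablo Picasso"),
    ("ex:Picasso", "ex:creatorOf", "ex:Guernica"),
]

# Each subject node maps to its outgoing links: (predicate, object) pairs.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

for p, o in graph["ex:Picasso"]:
    print(f"ex:Picasso --{p}--> {o}")
```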

Note that the names of properties usually start with a lowercase letter and the names of classes with an uppercase letter.

Domain & Range

To semantically describe and derive relationships between subjects and objects, rdfs:domain and rdfs:range are used. The predicate rdfs:domain declares that a property belongs to one or more classes. For example, we can define that property P belongs to class D.

P rdfs:domain D .

ex:hasMother rdfs:domain ex:Person .
ex:frank ex:hasMother ex:frances .

The example implicitly derives that ex:frank belongs to the ex:Person class, because ex:frank is the subject of a statement using ex:hasMother, whose domain is ex:Person.
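
The derivation can be sketched as a small rule in Python. This is a simplified illustration of the domain rule only, operating on tuples of strings; it is not a complete RDFS reasoner, and the function name infer_domain_types is hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

def infer_domain_types(triples):
    """For every 'P rdfs:domain D' and every statement 's P o', derive 's rdf:type D'."""
    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for s, p, o in triples:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred

triples = [
    ("ex:hasMother", RDFS_DOMAIN, "ex:Person"),
    ("ex:frank", "ex:hasMother", "ex:frances"),
]
inferred = infer_domain_types(triples)
print(inferred)
```

Note that nothing is derived about ex:frances here, since the domain only says something about the subject of the statement.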

To deduce that the value (object) of a property belongs to one or more classes, the predicate rdfs:range is used. For example, we can define that the values of P are instances of class R.

P rdfs:range R .

The difference between domain and range is that domain declares which class the subject of a statement using the property belongs to, whereas range declares which class the value (object) belongs to. The following statements illustrate the difference. The example defines two classes, Book and Person, and the author property. The author property has Book as its domain, so anything that has an author is a book. Its range is Person, so the author named when instantiating a book is an instance of the class Person.

ex:Book a rdfs:Class .
ex:Person a rdfs:Class .
ex:author a rdf:Property .

ex:author rdfs:domain ex:Book .
ex:author rdfs:range ex:Person .
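
A minimal sketch of how the two declarations work together, again in Python with tuples of strings. The instances ex:book1 and ex:alice are hypothetical and only added for illustration, and the function infer_types covers just the domain and range rules, not full RDFS inference.

```python
def infer_types(triples):
    """Apply the domain rule to subjects and the range rule to objects."""
    inferred = set()
    for prop, rule, cls in triples:
        if rule not in ("rdfs:domain", "rdfs:range"):
            continue
        for s, p, o in triples:
            if p != prop:
                continue
            # Domain types the subject of the statement, range types the object.
            target = s if rule == "rdfs:domain" else o
            inferred.add((target, "rdf:type", cls))
    return inferred

triples = [
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
    ("ex:book1", "ex:author", "ex:alice"),  # hypothetical instance data
]
inferred = infer_types(triples)
for triple in sorted(inferred):
    print(triple)
```

From a single statement using ex:author, both ends receive a class: the subject through the domain and the object through the range.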

The example below derives that the value of the ex:hasMother property belongs to both the ex:Female and ex:Person class.

ex:hasMother rdfs:range ex:Female .
ex:hasMother rdfs:range ex:Person .

The following statement describes that Eva is the mother of Pete and implicitly also a woman and a person, since range applies to the object of the statement.

ex:Pete ex:hasMother ex:Eva .

Definitions of properties using rdfs:range can also be used to describe the data type of a value, such as integers or decimal numbers, when the value is not a plain text string, which is the default.

ex:age rdf:type rdf:Property . 
ex:age rdfs:range xsd:integer .

The ability to declare that a property applies to a specific class (domain) and that its values belong to a specific class (range) makes it possible to draw conclusions (inference) through implicit relations between resources. Deriving implicit connections between resources enables logical reasoning and is a powerful feature of the RDF framework. The next RDF elementary guide presents a semantic data model that describes classes and properties of artists and paintings, and interlinks resources from Wikidata, which also uses the RDF standard to describe resources.

Congruent organisational structure – digital value creation part 2

In order for businesses to gain added value from partnerships, they need to be structured and adapted to collaborate, which facilitates quick and efficient implementation of decision-making processes, project changes, knowledge transfer et cetera. Congruent structures are the second general mechanism identified in the study of data and information exchange within a partnership between train operators and the Swedish Transport Administration. The first mechanism highlights interoperability for digital resources and infrastructure.

Interoperability for digital resource and infrastructure – Digital value creation part 1

A recently completed study examined railway operators' perspectives on digital value creation within the partnership between the Swedish Transport Administration and national railway operators for data and information exchange. The first of three identified mechanisms highlights value creation through data and information exchange and alignment of the information systems used within the collaboration.

Review of Trafikverket open API for traffic information

The Swedish Transport Administration, Trafikverket, offers several open data services. One of these is the API for traffic information, which contains data and information for nationwide train and road traffic. The API began as an information service for trains and was later expanded to include road data. Our reviews of open APIs are part of an effort to highlight barriers and requirements from a user perspective. We hope that the reviews provide constructive feedback to the data owners and inspire others by showing examples and solutions. Read more about the background to why we are reviewing open APIs and open data sources.

Review of Västtrafik open API for public transport

Västtrafik handles and coordinates all public transport in West Sweden, with the city of Gothenburg as its main transit hub. Västtrafik offers several APIs to search and plan journeys by train, tram, ferry and bus in West Sweden. The APIs can be found at Västtrafik's development portal (in Swedish), which serves as a focal point for their open API service. The aim of our reviews of open APIs is to shed light on common obstacles and requirements from a user perspective. Note that the portal around the API is in Swedish, but the documentation and the API syntax are in English. The review will try to guide users with no knowledge of Swedish on how to get started.

User demand driven and machine-readable open data

Open data is undergoing a paradigm change where the focus is shifting to user demand driven publication of data in machine-readable formats, with open standards and licenses that are appropriate for the application area. This is often referred to as “liquid information” or “liquid data”, which can be read about in this 2013 report from McKinsey. The report addresses the potential value that can be achieved if standards, formats and metadata are functional for their intended use. Open data 2.0 is another emerging term, which refers to data that is made available based on demand and provides means for participation and collaboration, where users can report suggestions for improvement and provide feedback on flawed data.