Linked Open Data

Introduction

The aim of Open Data initiatives like Open Data Euskadi is to publish government data in the richest and most interoperable way possible, so that citizens and other institutions can build interesting applications and perform deep analyses with the data. Linked Data offers a suitable technology for doing so, in the form of so-called Linked Open Data.

Five star Linked Open Data

The idea behind Linked Data is to publish data directly on the Web, using current technologies, with standards like RDF, OWL, SHACL, and SPARQL. In order for such data to be useful, it must be identified with URIs, accessible through HTTP, and most importantly, linked to external Linked Data resources, to be a part of the Linked Open Data Cloud.

Linked Open Data cloud

By publishing Open Data as Linked Data, third parties (humans or programs) can browse the data by following links and run queries over the integrated data.

Technical documentation

In Open Data Euskadi, we have chosen data from different sources (Open Data Euskadi catalog, Legegunea, web content, etc.) and converted it to Linked Data. This documentation is provided to make such data easier to consume for developers, citizens, journalists, etc.

Linked Data

Linked Data is based on the following four principles:

  1. Identify every data item (entity or relationship) with a URI.

  2. Make those URIs resolvable over HTTP: when a URI is requested, a document containing information about the entity can be obtained.

  3. When an entity is requested by HTTP, provide its information using an open formatting standard. The format provided should be determined by HTTP content negotiation between the client and the server (e.g. RDF for an automatic agent, or HTML for a human user), so that the entity and its representations are decoupled. Importantly, the RDF format should always be available.

  4. Ensure, to the greatest extent possible, that the information provided by URI resolution contains typed relations to other entities, so that the agent can traverse those relations to discover new information, analogously to how humans browse the web.

Resource Description Framework (RDF)

RDF can be described as the "HTML for data": a shared language for representing data on the Web. RDF represents data as subject-predicate-object triples ("Bob"-"is interested in"-"Mona Lisa"); gluing these triples together yields a graph. Subjects and predicates are identified by URIs; objects are identified by URIs too, unless they are literals (numeric data, strings, etc.; see XSD Datatypes). RDF is stored in Triple Stores: we use GraphDB, accessed through the RDF4J Java framework.
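
As a rough illustration of the triple model, the following Python sketch represents the Bob example as plain tuples and serializes them in N-Triples syntax. The http://example.org/ URIs, the extra "name" and "age" statements, and the helper names are hypothetical and chosen only for illustration; a real application would more likely rely on an RDF library.

```python
# Minimal sketch: RDF triples as plain Python tuples (no RDF library).
# All http://example.org/ URIs and values below are hypothetical.

EX = "http://example.org/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal as an N-Triples term, optionally with a datatype."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'


# In this sketch subjects and predicates are URIs; objects are URIs or literals.
triples = [
    (uri(EX + "Bob"), uri(EX + "isInterestedIn"), uri(EX + "MonaLisa")),
    (uri(EX + "Bob"), uri(EX + "name"), literal("Bob")),
    (uri(EX + "Bob"), uri(EX + "age"), literal(42, XSD_INTEGER)),
]


def to_ntriples(graph):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in graph)


print(to_ntriples(triples))
```

Because the three statements share the same subject, they already form a small graph with Bob at its centre.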

SPARQL

SPARQL is a query language for RDF and also a standard protocol (API) for accessing SPARQL endpoints. A Triple Store exposes a SPARQL endpoint.
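
To make the idea of an endpoint more concrete, here is a minimal Python sketch that builds (but does not send) a SPARQL Protocol query request. The endpoint URL is the one that appears in the DCAT excerpt of this documentation; the query and the function name are illustrative assumptions, and the endpoint's actual behaviour is not verified here.

```python
# Minimal sketch: build (but do not send) a SPARQL Protocol query request.
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as it appears in the DCAT excerpt of this documentation.
ENDPOINT = "http://api.euskadi.eus/sparql/"

# Illustrative query: fetch any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Return a GET request that passes the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# To actually run the query, pass the request to urllib.request.urlopen().
```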

Web Ontology Language (OWL)

OWL is a Knowledge Representation language for building ontologies. An ontology is a vocabulary we use to describe, through axioms, general properties of the data we publish. All the entities in an OWL Ontology are identified by URIs, except literals.

Shapes Constraint Language (SHACL)

SHACL is a language for validating RDF data. With SHACL, constraints can be defined and an RDF graph can be checked for compliance with them.

In Linked Data, resources are identified by URIs. This means that URIs should be persistent and well defined (see the "Linked Data Best Practices" references below).

The resource URIs at Open Data Euskadi always follow the pattern:

http://id.euskadi.eus/{resource}

Usually, the {resource} part adheres to the NTI scheme for URIs (NTI stands for the Spanish Norma Técnica de Interoperabilidad):

http://id.euskadi.eus/{Sector}/{Domain}/{ClassName}/{Identifier}

where:

  • Sector: one of the sectors provided by the NTI (e.g. environment), translated from Spanish to English. The SKOS file with the sector names can be found here.
  • Domain: the realm to which the resource belongs, defined by Open Data Euskadi (e.g. air-quality). The SKOS file with the domain names can be found here.
  • ClassName: the name of the class to which this resource belongs. In other words, the name of the resource at the other end of the rdf:type predicate (e.g. observation, from http://purl.org/linked-data/cube#Observation). See section "Ontologies used" below.
  • Identifier: a unique identifier, generated from the original data (e.g. AV-GASTEIZ-2017-01-26).

Therefore a real URI, identifying an observation of air quality that follows the Data Cube model, looks like: http://data.euskadi.eus/id/environment/air-quality/observation/AV-GASTEIZ-2017-01-26.

Where:

  • Sector: environment.
  • Domain: air-quality.
  • ClassName: observation.
  • Identifier: AV-GASTEIZ-2017-01-26.
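
The NTI pattern can be sketched in Python as follows. The function names are hypothetical, and reading the components from the last four path segments is an assumption made so that the example URI above can be split regardless of its host and prefix.

```python
# Minimal sketch: build and split resource URIs that follow the NTI pattern
# {Sector}/{Domain}/{ClassName}/{Identifier}. Function names are hypothetical.
from urllib.parse import quote, unquote, urlsplit

BASE = "http://id.euskadi.eus/"
FIELDS = ("sector", "domain", "class_name", "identifier")


def build_resource_uri(sector, domain, class_name, identifier, base=BASE):
    """Join the four NTI components into a resource URI."""
    parts = (sector, domain, class_name, identifier)
    return base + "/".join(quote(part, safe="") for part in parts)


def parse_resource_uri(uri):
    """Read the four NTI components from the last four path segments."""
    segments = [unquote(s) for s in urlsplit(uri).path.split("/") if s]
    if len(segments) < len(FIELDS):
        raise ValueError(f"not an NTI-style resource URI: {uri}")
    return dict(zip(FIELDS, segments[-len(FIELDS):]))


uri = build_resource_uri(
    "environment", "air-quality", "observation", "AV-GASTEIZ-2017-01-26"
)
print(uri)
print(parse_resource_uri(uri))
```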

Another large set of resource URIs comes from the Legegunea service and follows the URI pattern defined by the European Legislation Identifier (ELI) project:

http://id.euskadi.eus/eli/{jurisdiction}/{type}/{year}/{month}/{day}/{naturalidentifier}/{version}/{pointintime}/{language}/{format}

Finally, apart from resources, the following entities also have URI schemes defined:

  • OWL Classes: http://id.euskadi.eus/def/{OntologyName}#{ClassName}.
  • OWL properties: http://id.euskadi.eus/def/{OntologyName}#{PropertyName}.
  • OWL Ontology: http://id.euskadi.eus/def/{OntologyName}.
  • SKOS Concept: http://id.euskadi.eus/kos/{ConceptName}.
  • Dataset in a DCAT file: http://id.euskadi.eus/dataset/{NamedGraph}.
  • Distribution in a DCAT file: http://id.euskadi.eus/distribution/{NamedGraph}/[lang]/{format}. The lang segment is optional.
  • Named Graph in a DCAT file or Triple Store: http://id.euskadi.eus/graph/{NamedGraph}.

Euskadi.eus URI policy test

An important notion of Linked Data is that a URI identifies a resource, but a resource can have different representations of the same content:

  • An HTML page describing the resource
  • RDF data describing the resource
  • XML, CSV, or other data formats

Content negotiation is the process by which the server provides the appropriate representation for each client, according to the MIME type of the Accept header provided by the client (text/html for a web browser, application/rdf+xml for an RDF agent, etc.).

The content negotiation process at Open Data Euskadi is designed in the same way as in DBpedia. Given a URI, the content negotiation process redirects the client with HTTP 303 codes to the appropriate URLs containing representations (page or data URLs).

Open Data Euskadi adds one consideration: some resources have an associated web page (i.e. a formal, human-oriented web page describing the resource), while other resources do NOT have a web page (they are only data). So, if the client requests HTML content for the resource URI (http://id.euskadi.eus/{resource}):

  • If the {resource} has an associated formal web page, the client is redirected to that web page (i.e. www.euskadi.eus/{resource}).
  • If the {resource} does NOT have an associated formal web page, the client is redirected to http://doc.euskadi.eus/{resource}, where Epimorphics' ELDA is used to return an HTML representation of the resource's data.

Internally, when the client requests HTML for a {resource}, the [URI handler] (the Open Data Euskadi module in charge of content negotiation) issues a [triple-store] query to determine whether that {resource} has a main-entity-of-page property pointing to the formal web page. If it does, a 303 redirect to that page is returned; if not, a 303 redirect to http://doc.euskadi.eus/{resource} is returned.

If the client requests a non-HTML content type (e.g. RDF, Turtle, etc.), the [URI handler] redirects the client to http://data.euskadi.eus/{resource}.

So the URI API is consistent:

  • {resources} are identified by URIs like http://id.euskadi.eus/{resource}.
  • Requesting a {resource} URI (http://id.euskadi.eus/{resource}) with an HTML content type results in a 303 redirect to http://doc.euskadi.eus/{resource} (or to the formal web page, if the resource has one).
  • Requesting a {resource} URI (http://id.euskadi.eus/{resource}) with any other content type results in a 303 redirect to http://data.euskadi.eus/{resource}.

If the client knows beforehand that it needs the HTML representation of the resource, it can call http://doc.euskadi.eus/{resource} directly; in the same way, if it needs the data, it can call http://data.euskadi.eus/{resource} with the appropriate Accept header.
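
The redirect rules described above can be sketched as a small Python function. This is not the actual [URI handler]: the triple-store lookup is replaced by a plain argument, the Accept header parsing is simplified, treating application/xhtml+xml as HTML is an assumption, and all function names are hypothetical.

```python
# Minimal sketch of the redirect decision; not the actual [URI handler].
# The triple-store lookup is replaced by the web_page_url argument.

# Assumption: XHTML requests are treated like HTML requests.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def preferred_media_type(accept_header):
    """Return the media type with the highest q-value in an Accept header."""
    best_type, best_q = None, -1.0
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > best_q:
            best_type, best_q = media_type.lower(), q
    return best_type


def redirect_for(resource, accept_header, web_page_url=None):
    """Return the (status, Location) pair for http://id.euskadi.eus/{resource}.

    web_page_url stands in for the result of the main-entity-of-page lookup:
    the formal web page URL, or None when the resource is data only.
    """
    if preferred_media_type(accept_header) in HTML_TYPES:
        if web_page_url is not None:
            return 303, web_page_url
        return 303, f"http://doc.euskadi.eus/{resource}"
    return 303, f"http://data.euskadi.eus/{resource}"


resource = "environment/air-quality/observation/AV-GASTEIZ-2017-01-26"
print(redirect_for(resource, "text/html,application/xml;q=0.9"))
print(redirect_for(resource, "application/rdf+xml"))
```

Keeping the lookup result as an argument makes the decision logic testable without a Triple Store; a real handler would obtain it from the query mentioned above.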

The following diagram outlines the [URI handler] (the Open Data Euskadi module in charge of content negotiation):


         ^
         |                                                  +--------------------------+
         +                                                  | http://idsite/{resource} |
     Resource                                               +-------------+------------+
       URIs                                                               |
         +                             Is main entity                     |
         |                           +---+ of page? +-----+MIME=HTML+-----+-----+MIME=RDF+--------+
         |                           |                                                            |
         |              +------------+-------------+                                              |
         |              |                          |                                              |
         |              |                          |                                              |
         |              |                          |                                              |
     +---------------------------------------------------+[***** CLIENT REDIR ****]+----------------------------------+
         |              |                          |                                              |
                        |                          |                                              |
         |   +----------v--------------+ +---------v---------------+                 +------------v-------------+
         |   |http://docsite/{resource}| |http://website/{resource}|                 |http://datasite/{resource}|
         +   +----------+--------------+ +---------+---------------+                 +------------+-------------+
                        |                          |                                              |
   Representation       |                          |                                              |
       URLs             |                          |                                              |
         +       +------v-------+        +---------v--------+                            +--------v--------+
         |       |              |        |                  |                            |                 |
         |       |     ELDA     |        |        Web       |                            |   Triple+Store  |
         |       |              |        |                  |                            |                 |
         v       +--------------+        +------------------+                            +-----------------+

The list of supported MIME types can be found in the RDF4J REST API documentation.

Named Graphs provide a mechanism to organise RDF triples into meaningful groups, since a Named Graph is a collection of RDF statements identified by a URI. Named Graphs are useful to organise the data in a Triple Store, and they also offer the possibility of recording the provenance of data in triples, since the URI of the named Graph can be the subject of more (metadata) triples.

Named Graphs at Open Data Euskadi

The Named Graph and metadata mechanism is used at Open Data Euskadi to add provenance information to the RDF datasets generated from Open Data Euskadi datasets. Those datasets already have DCAT metadata, so the DCAT is "recycled" to obtain the metadata of the datasets that are created by converting existing Open Data Euskadi datasets to RDF (normally from CSV).

Named Graphs and DCAT at Open Data Euskadi

This excerpt from a DCAT file shows the relevant triples (not all the triples are shown):

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

# The main dataset has two distributions: a CSV file and the RDF stored in the Triple Store
<http://id.euskadi.eus/dataset/calidad-aire-en-euskadi-2017/> dcat:distribution 
	<http://id.euskadi.eus/distribution/calidad-aire-en-euskadi-2017/es/csv>, 
	<http://id.euskadi.eus/distribution/calidad-aire-en-euskadi-2017/lod> .

# CSV distribution	
<http://id.euskadi.eus/distribution/calidad-aire-en-euskadi-2017/es/csv>
  a dcat:Distribution ;
  dc:format [
    a dc:IMT ;
    rdfs:label "CSV" ;
    rdf:value "text/csv"
  ] ;
  dcat:byteSize 0.0 ;
  dcat:accessURL "http://id.euskadi.eus/contenidos/ds_informes_estudios/calidad_aire_2017/es_def/adjuntos/datos_diarios_csv.zip"^^xsd:anyURI ;
  dc:title "Calidad del aire"@es .	
	
# Linked Data distribution
<http://id.euskadi.eus/distribution/calidad-aire-en-euskadi-2017/lod>
  a dcat:Distribution, void:Dataset, schema:Distribution ;
  sd:namedGraph <http://id.euskadi.eus/graph/calidad-aire-en-euskadi-2017> ;
  void:sparqlEndpoint <http://api.euskadi.eus/sparql/> ;
  dc:modified "2008-11-17"^^xsd:date ;
  dc:title "Calidad del aire Linked Data"@es .

This means that SPARQL can be used to query data and metadata.
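
As a hedged illustration, the Python sketch below assembles one SPARQL query that first follows the DCAT metadata from the excerpt above to the Named Graph and then reads triples stored in that graph. The query shape is an assumption: it presumes the DCAT triples are visible in the endpoint's default graph, which is not confirmed here, and the function name is hypothetical.

```python
# Minimal sketch: one SPARQL query that combines DCAT metadata with data.
# Assumption: the DCAT triples are visible in the endpoint's default graph.

# Dataset URI as written in the DCAT excerpt.
DATASET = "http://id.euskadi.eus/dataset/calidad-aire-en-euskadi-2017/"

QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>

SELECT ?graph ?s ?p ?o
WHERE {{
  # Metadata: find the Named Graph behind the dataset's distributions.
  <{dataset}> dcat:distribution ?distribution .
  ?distribution sd:namedGraph ?graph .
  # Data: read triples stored in that Named Graph.
  GRAPH ?graph {{ ?s ?p ?o }}
}}
LIMIT {limit}
"""


def metadata_and_data_query(dataset, limit=10):
    """Fill the template with a dataset URI and a result limit."""
    return QUERY_TEMPLATE.format(dataset=dataset, limit=int(limit))


print(metadata_and_data_query(DATASET))
```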

Many web pages in the euskadi.eus domain have been annotated with terms from the Schema.org vocabulary by adding JSON-LD snippets to them. Since JSON-LD is an RDF serialization, the content created for those pages is also stored in the Triple Store.
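
For readers unfamiliar with JSON-LD, the following Python sketch shows what such a snippet might look like when embedded in a page. The resource, its name, and both URLs are hypothetical placeholders, not actual euskadi.eus annotations; only the schema.org terms (Thing, name, mainEntityOfPage) are real vocabulary entries.

```python
# Minimal sketch: a hypothetical schema.org annotation as a JSON-LD snippet.
import json

# All identifiers and values below are illustrative placeholders.
annotation = {
    "@context": "http://schema.org/",
    "@id": "http://id.euskadi.eus/example-resource",
    "@type": "Thing",
    "name": "Example resource",
    "mainEntityOfPage": "http://www.euskadi.eus/example-resource",
}


def jsonld_script(data):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(jsonld_script(annotation))
```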

Ontologies used

Even though different ontologies have been used in the RDF data, the URIs of the classes and properties of such ontologies can be found in Euskadipedia, the OWL ontology we maintain with external and internal entities.

Specifications, standards, and general purpose vocabularies

Tools used in the project

Online tools

Other Linked Open Data projects

Interesting articles and posts

Linked Data Best Practices
