Ontologies define the structure of the data that the RDF triple store can hold. They define the possible resource classes, the properties these classes may have, and the relations between the different classes as expressed by these properties.
Base ontology
The base ontology is the seed for defining application-specific ontologies. It defines the building blocks that other ontologies are built upon, such as the definitions of classes, properties, and literal types themselves. The base ontology is based on RDF Schema.
It is made up of several components:
- XML Schema (XSD) defines the basic literal types.
- Resource Description Framework (RDF) defines properties, lists and language-tagged strings.
- RDF Schema (RDFS) defines classes and inheritance.
- Nepomuk Resource Language (NRL) defines resource cardinality and database-level indexes.
- Dublin Core Metadata (DC) defines a common set of document-oriented superproperties for RDF resources.
Nepomuk
Nepomuk is the Swiss Army knife of the semantic desktop, similar in scope to Schema.org. It defines data structures for almost any kind of data you might want to store on a personal computer.
It is split into several domains:
- Nepomuk Information Element (NIE) is the core of Nepomuk. It establishes the basic principles, like the split between “container” and “content”, and defines the base nie:DataObject and nie:InformationElement objects that represent this split.
- Nepomuk File Ontology (NFO) describes the basic filesystem-oriented objects.
- Nepomuk Multimedia (NMM) describes multi-media data.
- Nepomuk Contacts Ontology (NCO) describes contacts and addresses.
- Libosinfo ontology describes OS images.
- Maemo Feeds Ontology (MFO) describes feeds.
- Simplified Location Ontology (SLO) extends metadata with geolocation tagging.
- Nepomuk Annotation Ontology (NAO) extends metadata with annotations.
- Other Tracker extensions to further annotate data and link to external services.
Creating custom ontologies
Tracker also allows developers to define ontologies that are tailored to their use case.
Ontologies are themselves made of RDF data in the Turtle format, stored in files with the .ontology extension. Custom ontologies build upon the base ontology provided for this purpose.
Ontologies may be split into multiple documents in the same directory. The individual ontology files do not need to be self-consistent (e.g. they may use definitions from other files), but the ontology files as a whole must be. Tracker will not open or create an RDF triple store if the ontology is not consistent, and will roll back any change if necessary.
Tracker loads the ontology files in alphanumeric order. It is advisable to give them a numbered prefix so that they load in a consistent order despite future additions.
Defining a namespace
A namespace is the topmost layer of an individual ontology; it contains all classes and properties defined by it. A namespace can be defined as follows:
# These prefixes will be used in the definition of the ontology,
# thus must be explicitly defined
@prefix nrl: <http://tracker.api.gnome.org/ontology/v3/nrl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# This is our example namespace
@prefix ex: <http://example.org/#> .
ex: a nrl:Namespace, nrl:Ontology;
nrl:prefix "ex";
rdfs:comment "example ontology";
nrl:lastModified "2017-01-01T15:00:00Z".
Defining classes
Classes are the base of an ontology; all stored resources must declare themselves as “being” at least one of these classes. They all derive from the base rdfs:Resource type. For example, to define classes representing animals and plants, you can do:
ex:Eukaryote a rdfs:Class;
rdfs:subClassOf rdfs:Resource;
rdfs:comment "An eukaryote".
By convention all classes use CamelCase names, although class names are not restricted. The allowed charset is UTF-8.
Declaring subclasses is possible:
ex:Animal a rdfs:Class;
rdfs:subClassOf ex:Eukaryote;
rdfs:comment "An animal".
ex:Plant a rdfs:Class;
rdfs:subClassOf ex:Eukaryote;
rdfs:comment "A plant".
ex:Mammal a rdfs:Class;
rdfs:subClassOf ex:Animal;
rdfs:comment "A mammal".
With such classes defined, resources may be inserted into the endpoint, e.g. with the following SPARQL:
INSERT DATA { <merry> a ex:Mammal }
INSERT DATA { <treebeard> a ex:Animal, ex:Plant }
Note that multiple inheritance is possible; resources inherit all properties from all of their classes and superclasses.
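As a sketch of how this inheritance shows up in queries, and assuming that the two insertions above have been applied and that the endpoint treats a resource of a subclass as a member of its superclasses, a query on the topmost class would be expected to match both resources:

```sparql
PREFIX ex: <http://example.org/#>

# <merry> is an ex:Mammal, hence also an ex:Animal and an ex:Eukaryote.
# <treebeard> is an ex:Animal and an ex:Plant, hence also an ex:Eukaryote.
SELECT ?resource { ?resource a ex:Eukaryote }
```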
Defining properties
Properties belong to a class, so all resources of that class can define values for them.
ex:cromosomes a rdf:Property;
rdfs:domain ex:Eukaryote;
rdfs:range xsd:integer.
ex:unicellular a rdf:Property;
rdfs:domain ex:Eukaryote;
rdfs:range xsd:boolean.
ex:dateOfBirth a rdf:Property;
rdfs:domain ex:Mammal;
rdfs:range xsd:dateTime.
The class the property belongs to is defined by rdfs:domain, while the data type contained is defined by rdfs:range. By convention all properties use dromedaryCase names, although property names are not restricted. The allowed charset is UTF-8.
The following basic types are supported:
- xsd:boolean
- xsd:string and rdf:langString
- xsd:integer, ranging from -2^63 to 2^63-1.
- xsd:double, able to store an 8 byte IEEE floating point number.
- xsd:date and xsd:dateTime, able to store dates and times since January 1st 1 AD, with microsecond resolution.
Of course, properties can also point to resources of the same or other classes, so stored resources can form a graph:
ex:parent a rdf:Property;
rdfs:domain ex:Mammal;
rdfs:range ex:Mammal.
ex:pets a rdf:Property;
rdfs:domain ex:Mammal;
rdfs:range ex:Eukaryote.
Properties can also be inherited. For example, a property in a subclass can make a more generic property from a superclass concrete:
ex:geneticInformation a rdf:Property;
rdfs:domain ex:Eukaryote;
rdfs:range xsd:string.
ex:dna a rdf:Property;
rdfs:domain ex:Mammal;
rdfs:range xsd:string;
rdfs:subPropertyOf ex:geneticInformation.
SPARQL queries are expected to provide the same result when queried for a property or one of its superproperties.
# These two queries should provide the exact same result(s)
SELECT ?animal { ?animal a ex:Animal;
ex:geneticInformation "AGCT" }
SELECT ?animal { ?animal a ex:Animal;
ex:dna "AGCT" }
Defining cardinality of properties
By default, properties are multivalued; there is no restriction on the number of values a property can store.
INSERT DATA {
<cat> a ex:Mammal .
<dog> a ex:Mammal .
<peter> a ex:Mammal ;
ex:pets <cat>, <dog>
}
Wherever this is not desirable, cardinality can be limited on properties through nrl:maxCardinality.
ex:cromosomes a rdf:Property;
rdfs:domain ex:Eukaryote;
rdfs:range xsd:integer;
nrl:maxCardinality 1.
An error is raised if a SPARQL update would leave the property with more than one value.
# This will fail
INSERT DATA { <cat> a ex:Mammal;
ex:cromosomes 38;
ex:cromosomes 42 }
# This will succeed
INSERT DATA { <donald> a ex:Mammal;
ex:cromosomes 47 }
Tracker does not support maximum cardinalities other than 1.
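Since a second value cannot be added to such a property, changing it means replacing the stored value. A minimal sketch using the standard SPARQL 1.1 DELETE/INSERT form, assuming <cat> already exists as an ex:Mammal:

```sparql
PREFIX ex: <http://example.org/#>

# Delete whatever value is stored (if any) and insert the new one
# in a single operation, so the property never holds two values.
DELETE { <cat> ex:cromosomes ?old }
INSERT { <cat> ex:cromosomes 38 }
WHERE { OPTIONAL { <cat> ex:cromosomes ?old } }
```

The OPTIONAL pattern keeps the WHERE clause matching when no previous value is stored, so the insertion still happens in that case.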
Defining uniqueness
It is desirable for certain properties to keep their values unique across all resources. This can be expressed by declaring the property as an nrl:InverseFunctionalProperty.
ex:geneticInformation a rdf:Property, nrl:InverseFunctionalProperty;
rdfs:domain ex:Eukaryote;
rdfs:range xsd:string.
With that in place, no two resources can have the same value for the property.
# First insertion, this will succeed
INSERT DATA { <drosophila> a ex:Eukaryote;
ex:geneticInformation "AGCT" }
# This will fail
INSERT DATA { <melanogaster> a ex:Eukaryote;
ex:geneticInformation "AGCT" }
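An application that prefers to avoid the error can check for an existing value first. The following is only a sketch using a standard SPARQL ASK query; whether the check and the later insertion need to be guarded against concurrent writers depends on the application:

```sparql
PREFIX ex: <http://example.org/#>

# Evaluates to true if some resource already holds this value,
# in which case inserting it on another resource would fail.
ASK { ?resource ex:geneticInformation "AGCT" }
```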
Defining indexes
It may be the case that SPARQL queries performed on the endpoint are known to match, sort, or filter on certain properties more often than others. In this case, the ontology may use nrl:domainIndex in the class definition:
# Make queries on ex:dateOfBirth faster
ex:Mammal a rdfs:Class;
rdfs:subClassOf ex:Animal;
rdfs:comment "A mammal";
nrl:domainIndex ex:dateOfBirth.
Classes may define multiple domain indexes.
Note: Be frugal with indexes; do not add them proactively. An index in the wrong place might not improve query performance, but every index comes at a cost in disk size.
Defining full-text search properties
Tracker provides nonstandard full-text search capabilities. To use them, string properties can set nrl:fulltextIndexed:
ex:name a rdf:Property;
rdfs:domain ex:Mammal;
rdfs:range xsd:string;
nrl:fulltextIndexed true;
nrl:weight 10.
Weighting can also be applied through nrl:weight, so certain properties rank higher than others in full-text search queries. With nrl:fulltextIndexed in place, SPARQL queries may use full-text search capabilities:
SELECT ?mammal { ?mammal a ex:Mammal;
fts:match "timmy" }
Predefined elements
It may be desirable for the ontology to offer predefined elements of a certain class, which can then be used by the endpoint.
ex:self a ex:Mammal.
These elements are used just like any other element of the same class inserted in the endpoint.
INSERT DATA { ex:self ex:pets <cat> .
<cat> ex:pets ex:self }
Updating an ontology
As software evolves, sometimes changes in the ontology are unavoidable. Tracker can transparently handle the following ontology changes on existing databases:
- Adding a class.
- Removing a class. All resources will be removed from this class, and all related properties will disappear.
- Adding a property.
- Removing a property. The property will disappear from all elements pertaining to the class in domain of the property.
- Changing rdfs:range of a property. The following conversions are allowed:
  - xsd:integer to xsd:boolean, xsd:double and xsd:string
  - xsd:double to xsd:boolean, xsd:integer and xsd:string
  - xsd:string to xsd:boolean, xsd:integer and xsd:double
- Adding and removing nrl:domainIndex from a class.
- Adding and removing nrl:fulltextIndexed from a property.
- Changing the nrl:weight on a property.
- Removing nrl:maxCardinality from a property.
However, there are certain ontology changes that Tracker considers incompatible, either because they are incoherent or because they result in situations where it cannot deterministically apply the change to the stored data. Tracker will error out and refuse to make any data changes in these situations:
- Properties with rdfs:range being xsd:boolean, xsd:date, xsd:dateTime, or any custom class are not convertible. Only the conversions covered in the list above are accepted.
- You cannot add rdfs:subClassOf to classes that are not being newly added. You cannot remove rdfs:subClassOf from classes. The only allowed change to rdfs:subClassOf is to correct subclasses when deleting a class, so they point to a common superclass.
- You cannot add rdfs:subPropertyOf to properties that are not being newly added. You cannot change an existing rdfs:subPropertyOf unless it is made to point to a common superproperty. You can however remove rdfs:subPropertyOf from non-new properties.
- Properties cannot move across classes, thus any change in rdfs:domain is forbidden.
- You cannot add nrl:maxCardinality restrictions on properties that are not being newly added.
- You can neither add nor remove nrl:InverseFunctionalProperty on a property that is not being newly added.
The recommendation to work around these situations is the same in all cases: use different property and class names, and use SPARQL to manually migrate the old data to the new format if necessary.
High-level code is in a better position to resolve possible inconsistencies (e.g. picking a single value if a property changes from multivalued to single-valued). After the manual data migration has been completed, the old classes and properties can be dropped.
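As an illustration of such a manual migration, the sketch below copies data from a multivalued property to a new single-valued one. Both property names are hypothetical: ex:weight stands for an old multivalued xsd:integer property, and ex:currentWeight for a newly added property declared with nrl:maxCardinality 1. Picking the largest value is an arbitrary choice made for this example:

```sparql
PREFIX ex: <http://example.org/#>

# ex:weight and ex:currentWeight are hypothetical names, not part of
# the ontology defined in this document.
# The subquery reduces the old values to a single one per resource,
# so the single-valued property receives at most one value.
INSERT { ?mammal ex:currentWeight ?value }
WHERE {
  { SELECT ?mammal (MAX(?weight) AS ?value)
    { ?mammal ex:weight ?weight }
    GROUP BY ?mammal }
}
```

Once this update has run, ex:weight can be removed from the ontology.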
Once changes are made, the nrl:lastModified value should be updated so Tracker knows to reprocess the ontology.