Configuration

Configuration of the RDF Server #

SemSpect can be configured in more detail through a variety of parameters passed to the semspect-spring.sh or semspect-spring.bat script. In general, there are two ways to specify parameters for SemSpect:

  • Using Spring application parameters:

    ./semspect-spring.sh --server.port=8080 --semspect.rdf.mode=load [...]

  • Using an external configuration file (.yaml or .properties):

    ./semspect-spring.sh --spring.config.additional-location="path/to/config_file"
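
As a minimal sketch, the parameters from the first command line could be written in an external YAML file as shown below. The file name config.yaml and the indices path are placeholders for illustration; the property names are the ones used on this page.

```yaml
# config.yaml (hypothetical file name)
# Corresponds to: --server.port=8080 --semspect.rdf.mode=load
server.port: 8080
semspect.rdf.mode: load
# load mode additionally requires the indices directory (placeholder path)
semspect.rdf.indicesDirectory: "path/to/indices/directory"
```

Such a file would then be passed with --spring.config.additional-location="path/to/config.yaml". In a .properties file, the same keys are written as key=value pairs, for example server.port=8080.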

The following sections describe all available configuration options for generating indices, loading indices, and starting SemSpect, each shown as a YAML configuration file.

Index Generation #

################################
# required generation parameters
################################

semspect.rdf.mode: generate

# rdfDataSources
#
# Supported data sources:
# - plain files: *.ttl, *.ttls, *.owl, *.nt, *.n3, *.rdf, *.hdt, *.jsonld,
#   *.brf, *.rj, *.trig, *.trigs,
# - compressed files: *.bz2, *.gz
# - archives: *.zip
# - directories
# - URLs: plain files or compressed files (no archives).
#   - Note: The HTTP Content-Type header must indicate the correct RDF format;
#     otherwise Turtle is assumed.
# - Graph store content URLs:
#   - GraphDB (tested with v10.5.1):
#       http://localhost:7200/repositories/some-store/statements
#   - RDFox (works with >= v6.3a):
#      http://localhost:12110/datastores/some-store/content?default
semspect.rdf.indexing.rdfDataSources:
  - DATA_SOURCE_1
  - ...
  - DATA_SOURCE_N
  
################################
# optional generation parameters
################################

# indicesDirectory
#
# storage folder for the SemStore indices
#
# default is a new folder in JVM working directory
semspect.rdf.indicesDirectory: "path/to/directory"

# parsingStrategy
#
# options:
# - ONE_PASS (default):
#   - generate base structures (triples, dictionary) in a single pass
#     over the provided RDF datasets
#   - consumes more main memory because uncompressed dictionary and triples
#     are loaded simultaneously into memory
# - TWO_PASS:
#   - generate base structures (triples, dictionary) in two iterations
#     over the provided RDF datasets
#   - dictionary is generated in first pass, triples in second
#   - consumes less main memory since dictionary is compressed during
#     generation on demand
semspect.rdf.indexing.parsingStrategy: ONE_PASS

# numberOfThreads
#
# default: available processors of machine
semspect.rdf.indexing.numberOfThreads: 4

# terminateAfterIndexing
#
# default: false
semspect.rdf.indexing.terminateAfterIndexing: false

# validateParsedResources
#
# default: false
semspect.rdf.indexing.validateParsedResources: false

# iriDictionarySectionType
#
# Which type of dictionary section should be used for all IRIs.
# An uncompressed type requires more space but might lead to a higher
# performance when applying string filters.
#
# options:
# - PLAIN_FRONT_CODING (default)
# - UNCOMPRESSED_STRINGS
semspect.rdf.indexing.iriDictionarySectionType: PLAIN_FRONT_CODING

# stringLiteralDictionarySectionType
#
# Which type of dictionary section should be used for all string literals.
# An uncompressed type requires more space but might lead to a higher
# performance when applying string filters.
#
# options:
# - PLAIN_FRONT_CODING (default)
# - UNCOMPRESSED_STRINGS
semspect.rdf.indexing.stringLiteralDictionarySectionType: PLAIN_FRONT_CODING

# disableRDFSDomainRangeEntailment
#
# When set to true all entailments implied by rdfs:domain as well as rdfs:range
# are ignored.
#
# default: false
semspect.rdf.indexing.disableRDFSDomainRangeEntailment: false

#########################
# experimental features
#########################

# translateSKOSToRDFS
#
# This option introduces a class hierarchy to reflect the SKOS concept
# taxonomy. A concept class is generated for each SKOS concept that has a
# narrower concept, and each skos:broader, skos:narrower, and skos:exactMatch
# relation is translated to the corresponding rdfs:subClassOf axioms.
# Moreover, all root and leaf concepts of the derived hierarchy are assigned
# the new classes "Root Concept" and "Leaf Concept", respectively.
# Example:
#   For ":super-c skos:narrower :sub-c", we derive ":sub-c rdfs:subClassOf :super-c".
#   For ":sub-c skos:broader :super-c", we derive ":sub-c a :super-c".
#
# default: false
semspect.rdf.indexing.translateSKOSToRDFS: false
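
To illustrate how the generation options combine, the following sketch indexes a local Turtle file, a directory, and a GraphDB repository in a single run. All file paths, the indices directory, and the thread count are placeholder values chosen for this example; the option names are those documented above.

```yaml
semspect.rdf.mode: generate

semspect.rdf.indexing.rdfDataSources:
  - "data/products.ttl"        # hypothetical plain Turtle file
  - "data/vocabularies/"       # hypothetical directory with RDF files
  - "http://localhost:7200/repositories/some-store/statements"  # GraphDB store

# hypothetical target folder for the generated SemStore indices
semspect.rdf.indicesDirectory: "indices/products"

# TWO_PASS trades a second pass over the data for lower memory consumption
semspect.rdf.indexing.parsingStrategy: TWO_PASS
semspect.rdf.indexing.numberOfThreads: 8
```

ONE_PASS remains the default; TWO_PASS is chosen here only to show how a memory-constrained run might be configured.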

Load Indices and Start SemSpect #

#############################
# required loading parameters
#############################

semspect.rdf.mode: load
semspect.rdf.indicesDirectory: "path/to/indices/directory"

Additional Initialization Parameters (All Optional) #

Server Settings #

##################################
# additional server settings
##################################

# port
#
# default: 8080
server.port: 8080

# x-frame support
# The x-frame functionality combines web-based documents on a single webpage
# through frames (HTML elements that display content from another source,
# such as another webpage or a video). Spring Boot disables x-frame support
# by default. The following parameter defines frame-ancestors so that
# SemSpect can be included in other HTML pages, see
# https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/frame-ancestors
#
# default: disabled; below are some examples
server.frame-ancestors:
  # local HTML files are allowed to integrate SemSpect in an x-frame
  - "file://*"
  # any subdomain of "example.org" is allowed to integrate SemSpect in an x-frame
  - "https://www.*.example.org"

# context path
#
# Determines at which context path the content will be served by the server.
# For instance, if it is set to "/dataset-x", SemSpect is served at "localhost:8080/dataset-x".
# This option might be helpful to distinguish different running instances of
# SemSpect not merely by their port.
#
# default: /
server.servlet.context-path: /
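
For example, two SemSpect instances running on the same host could be told apart by both port and context path. The sketch below shows a possible configuration for a second instance; the port 8081 and the path /dataset-x are illustrative values only.

```yaml
# second instance, expected to be reachable at localhost:8081/dataset-x
server.port: 8081
server.servlet.context-path: /dataset-x
```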

Exploration Settings #

##################################
# additional exploration settings
##################################

# explorationMenuWithoutCount
#
# If enabled, the exploration menu does not show the distinct number of connected
# resources. It only shows whether at least one resource of a particular class is
# connected to the given group (increases performance).
#
# default: false
semspect.rdf.exploration.explorationMenuWithoutCount: false

# numberOfThreads
#
# default: number of processors of machine
semspect.rdf.exploration.numberOfThreads: 4

# cacheGroups
#
# default: true
semspect.rdf.exploration.cacheGroups: true

# showClassesAndPropertiesAsResources
#
# If set to true, rdf:Property and rdfs:Class are shown in the class tree.
#
# default: false
semspect.rdf.exploration.showClassesAndPropertiesAsResources: false

# showTopClassInTree
#
# If set to true, the top class rdfs:Resource is shown in the class tree.
#
# default: false
semspect.rdf.exploration.showTopClassInTree: false

# logMemoryUsage
#
# Logs the used main memory to the file
# INDICES_DIRECTORY/exploration/log/memoryConsumption.csv
#
# default: false
semspect.rdf.exploration.logMemoryUsage: false

# explorationMenuComputationMethod
#
# options:
# - ROARING_BITMAPS_PER_CLASS
# - SORTED_ITERATION_PER_CLASS
# - INDIVIDUAL_QUERIES_PER_CLASS
# - INDIVIDUAL_QUERIES
# - ROARING_BITMAPS
# - DYNAMICALLY_DETERMINED_PER_CLASS
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.explorationMenuComputationMethod: DYNAMICALLY_DETERMINED

# explorationMenuWithoutCountComputationMethod
#
# options:
# - PER_CLASS
# - GLOBAL
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.explorationMenuWithoutCountComputationMethod: DYNAMICALLY_DETERMINED

# predecessorCountComputationMethod
#
# options:
# - SORTED_ITERATION
# - HASH_SET
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.predecessorCountComputationMethod: DYNAMICALLY_DETERMINED

# filterComputationMethod
#
# options:
# - QUERY_PER_INDIVIDUAL
# - INDEX_ITERATION
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.filterComputationMethod: DYNAMICALLY_DETERMINED

# sortingMethod
#
# options:
# - INDEX_ITERATION
# - SORTING_ON_THE_FLY
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.sortingMethod: DYNAMICALLY_DETERMINED
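
As a sketch of how these options might be combined, the following fragment favours responsiveness on a large dataset by dropping the resource counts from the exploration menu and records memory usage for later inspection. Whether this is beneficial depends on the dataset; the selection of options is an assumption made for illustration.

```yaml
# show only whether connected resources exist, not how many
semspect.rdf.exploration.explorationMenuWithoutCount: true

# write INDICES_DIRECTORY/exploration/log/memoryConsumption.csv
semspect.rdf.exploration.logMemoryUsage: true

# leave the computation methods at their dynamically determined defaults
semspect.rdf.exploration.explorationMenuWithoutCountComputationMethod: DYNAMICALLY_DETERMINED
```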

IRI Prefix Configuration #

To shorten resource IRIs in the UI, SemStore collects all prefixes that are defined in the provided RDF datasets and adds a list of commonly used RDF prefixes by default. Prefixes given explicitly in the RDF datasets take priority over the defaults. To examine or modify the IRI-to-prefix map, edit the file INDICES_DIRECTORY/exploration/config/iriToPrefixMap.yaml. Changes are applied at the next startup of SemSpect.

Script Configuration #

All scripts can be configured through environment variables:

  • Path of the java command: SEMSPECT_JAVA_PATH
    (default: "java")
  • SemSpect specific JDK options: SEMSPECT_JDK_OPTIONS
    (default: <empty>, overrides JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS)
  • Installation directory: SEMSPECT_HOME
    (default: <script location>)

In addition, the smart scripts (semspect.sh and semspect.bat) read the following settings for configuration and output paths:

  • Location of the SemSpect configuration: SEMSPECT_CONFIG_PATH
    (default: <SEMSPECT_HOME>/semspect-config/semspect-config.yaml)
  • Location of the SemStore configuration: SEMSTORE_CONFIG_PATH
    (default: <SEMSPECT_HOME>/semspect-config/default-semstore-config.yaml)
  • Location of the indices: SEMSTORE_INDICES_DIR
    (default: <SEMSPECT_HOME>/semspect-indices/)

Dossier Configuration #

For each dataset, the rendering type of data property values can be defined so that, for example, URLs are rendered as links and image URLs show the respective image. This is done in a separate dossier configuration.

As an example, consider the following data:

@prefix :     <http://www.example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Company
  a rdfs:Class .

:derivo
  a      :Company ;
  :name  "Derivo" ;
  :image "https://www.derivo.de/fileadmin/externe_websites/ext.derivo/images/layout/logo_deutsch_01.png" ;
  :email "info@derivo.de" ;
  :url   "https://www.derivo.de/home/" .

Now, to specify the UI types of the respective properties (called attributes in the configuration), a YAML dossier configuration is required:

version: 1
databases:
  - database: default
    attributes:
      - id: 'http://www.example.org/image'
        type: 'IMAGE' # images or GIFs
      - id: 'http://www.example.org/url'
        type: 'URL'
      - id: 'http://www.example.org/email'
        type: 'EMAIL'

Depending on the startup script, SemSpect reads the dossier configuration from one of the following default paths:

  • semspect.sh: SEMSPECT_HOME/config/datasets/SOME_DATASET/dossier.yaml
  • semspect-spring.sh: INDICES_DIRECTORY/exploration/config/dossier.yaml

Alternatively, a different file can be specified using the following JVM parameter (not available as a Spring parameter):

-Dde.derivo.semspect.server.configuration.dossier.path

Example:

#!/bin/bash
export SEMSPECT_JDK_OPTIONS=-Dde.derivo.semspect.server.configuration.dossier.path="path/to/dossier-config.yaml"
./semspect.sh run test-data.ttl

[Figure: Result in SemSpect]

Facet Configuration (Experimental) #

In RDF SemSpect, facets are configurable filters for groups, shown in separate sections of the dossier on the right-hand side. They behave in the same way as the class filters and, in the case of a hierarchy, can be collapsed or expanded. In fact, facets display classes from the class schema. A facet configuration defines which parts of the class schema are shown as separate facets and in which combination. This is useful to distinguish between the backbone class schema and classes that provide a “second perspective” on the data. As an example, consider a data set that deals with clothes, their sales status, and their usage type (the RDF data is listed further below).

If the main focus is on the type of clothing (shirts, trousers, etc.), it can be useful to declare sales and seasonal information as facets. This shortens the class tree and the exploration menu, because facets are not shown there. The facet classes nevertheless remain available for filtering on those groups that contain individuals of those classes.

Please note that all specified classes of a facet no longer appear in the class hierarchy shown on the left, unless they also occur in a branch that is not declared as a facet.

Facets are defined in a facet configuration file. Each facet definition consists of one or more so-called facet values and a facet name. There are two types of facets:

  • simple (default): only the listed classes are used as facet values, not their subclasses (unless they are also listed). This setting is mostly relevant for Neo4j SemSpect where the label hierarchies are extrapolated from the data and might not be semantically meaningful (this does not happen in RDF SemSpect).
    Remark: unlisted subclasses of a simple facet class will be moved up one level in the class tree and histograms (or hidden if they appear in a neighbour branch).

  • subtree: the listed classes and all their subclasses are used as facet values.
    Remark: subclasses of a subtree facet class that appear in other non-facet branches of the class hierarchy can still be found there.

In the example above, the following facet definition was given:

version: 1
databases:
  - database: default
    facets:
      - name: 'Product type'
        values: [ 'http://www.semspect.de/test#Special_offer',
                  'http://www.semspect.de/test#Remaining_item' ]
        type: simple
      - name: 'Seasonal classification'
        values: [ 'http://www.semspect.de/test#Seasonal_type' ]
        type: subtree

The matching RDF data of this example:

@prefix : <http://www.semspect.de/test#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Textiles rdf:type rdfs:Class .
:Trousers rdf:type rdfs:Class ; rdfs:subClassOf :Textiles .
:Pullover rdf:type rdfs:Class ; rdfs:subClassOf :Textiles .
:Shirts rdf:type rdfs:Class ; rdfs:subClassOf :Textiles .
:Dresses rdf:type rdfs:Class ; rdfs:subClassOf :Textiles .
:Suits rdf:type rdfs:Class ; rdfs:subClassOf :Textiles .

:Special_offer rdf:type rdfs:Class .
:Remaining_item rdf:type rdfs:Class .

:Seasonal_type rdf:type rdfs:Class .
:Summer rdf:type rdfs:Class ; rdfs:subClassOf :Seasonal_type .
:Winter rdf:type rdfs:Class ; rdfs:subClassOf :Seasonal_type .
:Transitional rdf:type rdfs:Class ; rdfs:subClassOf :Seasonal_type .
:Spring rdf:type rdfs:Class ; rdfs:subClassOf :Transitional .
:Autumn rdf:type rdfs:Class ; rdfs:subClassOf :Transitional .

:trouser1 rdf:type :Trousers ; rdf:type :Winter .
:trouser2 rdf:type :Trousers ; rdf:type :Winter ; rdf:type :Special_offer .
:trouser3 rdf:type :Trousers ; rdf:type :Autumn .
:trouser4 rdf:type :Trousers ; rdf:type :Spring .
:trouser5 rdf:type :Trousers ; rdf:type :Winter .
:short1 rdf:type :Trousers ; rdf:type :Summer ; rdf:type :Special_offer .
:short2 rdf:type :Trousers ; rdf:type :Summer .

:pullover1 rdf:type :Pullover ; rdf:type :Autumn .
:pullover2 rdf:type :Pullover ; rdf:type :Autumn .
:pullover3 rdf:type :Pullover ; rdf:type :Winter ; rdf:type :Remaining_item .
:pullover4 rdf:type :Pullover ; rdf:type :Winter .

:shirt1 rdf:type :Shirts ; rdf:type :Autumn ; rdf:type :Special_offer .
:shirt2 rdf:type :Shirts ; rdf:type :Winter .
:shirt3 rdf:type :Shirts ; rdf:type :Spring .
:tshirt1 rdf:type :Shirts ; rdf:type :Summer .
:tshirt2 rdf:type :Shirts ; rdf:type :Summer .

:dress1 rdf:type :Dresses ; rdf:type :Autumn .
:dress2 rdf:type :Dresses ; rdf:type :Winter .
:dress3 rdf:type :Dresses ; rdf:type :Autumn ; rdf:type :Remaining_item .

:suit1 rdf:type :Suits ; rdf:type :Autumn .
:suit2 rdf:type :Suits ; rdf:type :Winter ; rdf:type :Special_offer .
:suit3 rdf:type :Suits ; rdf:type :Summer .
:suit4 rdf:type :Suits ; rdf:type :Summer .

:matches rdf:type rdf:Property .

:trouser1 :matches :pullover1 .
:trouser1 :matches :shirt1 .
:trouser2 :matches :pullover1 .
:trouser2 :matches :pullover2 .
:trouser3 :matches :pullover3 .
:trouser4 :matches :pullover4 .
:trouser4 :matches :shirt3 .
:trouser4 :matches :shirt2 .
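
To make the difference between the two facet types concrete: going by the definitions above, a subtree facet on :Seasonal_type uses that class and all of its subclasses as facet values. The following sketch spells out the same set of values as a simple facet by listing every class of the subtree explicitly. It is meant as an illustration of the definitions, not as a verified equivalent; the display in the class tree may differ in detail.

```yaml
version: 1
databases:
  - database: default
    facets:
      - name: 'Seasonal classification'
        # all classes of the Seasonal_type subtree, listed explicitly
        values: [ 'http://www.semspect.de/test#Seasonal_type',
                  'http://www.semspect.de/test#Summer',
                  'http://www.semspect.de/test#Winter',
                  'http://www.semspect.de/test#Transitional',
                  'http://www.semspect.de/test#Spring',
                  'http://www.semspect.de/test#Autumn' ]
        type: simple
```

The subtree form is shorter and picks up subclasses added to the schema later, which is why it was used in the example configuration.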

Depending on the startup script, SemSpect reads the facet definitions from one of the following paths (if the file exists):

  • semspect.sh: SEMSPECT_HOME/config/datasets/SOME_DATASET/facets.yaml
  • semspect-spring.sh: INDICES_DIRECTORY/exploration/config/facets.yaml

Alternatively, a different file can be specified by a JVM parameter:

-Dde.derivo.semspect.server.configuration.facets.path=/path/to/semspect_facets.yaml

SemStore Statistics #

SemStore collects statistics both while generating the indices and during the exploration of the corresponding dataset. These statistics are stored in a subdirectory of the specified indices folder and can be visualized with the SemStore statistics Python application, which is available on Docker Hub. To run the application with Docker, see the scripts in the tools/semstore-statisics/ folder of the SemSpect installation directory.

To generate plots for a single indices directory:

./semstore-eval.sh ./path/to/indices/directory

To carry out the meta-evaluation for several index directories, the shell script ./semstore-meta-eval.sh must be adapted accordingly.