Configuration

Configuration of the RDF Server #

A more fine-grained configuration of SemSpect is available through a variety of parameters passed to the semspect-spring.sh or semspect-spring.bat script. There are two ways to specify these parameters:

  • Using Spring application parameters:

    ./semspect-spring.sh --server.port=8080 --semspect.rdf.mode=load [...]

  • Using an external configuration file (.yaml or .properties):

    ./semspect-spring.sh --spring.config.additional-location="path/to/config_file"
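
As a minimal sketch of the second option, the following script writes a small YAML file and passes it to the start script. The file name semspect-config.yaml and the chosen values are illustrative assumptions; the property keys are the ones documented on this page. The check for the start script is only there so that the sketch does nothing harmful when run outside a SemSpect installation directory.

```shell
#!/bin/sh
# Hypothetical file name; per the text above, a .yaml or .properties file can be used.
CONFIG_FILE="semspect-config.yaml"

# Write a small configuration file (all values are placeholders).
cat > "$CONFIG_FILE" <<'EOF'
server.port: 8080
semspect.rdf.mode: load
semspect.rdf.indicesDirectory: path/to/indices/directory
EOF

# Start SemSpect with the external file if the start script is present.
if [ -x ./semspect-spring.sh ]; then
  ./semspect-spring.sh --spring.config.additional-location="$CONFIG_FILE"
else
  echo "semspect-spring.sh not found; run this from the SemSpect installation directory" >&2
fi
```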

The following sections describe all available configuration options for generating indices, loading indices, and starting SemSpect, using YAML configuration files.

Index Generation #

################################
# required generation parameters
################################

semspect.rdf.mode: generate

# rdfDataSources
#
# Supported data sources:
# - plain files: *.ttl, *.ttls, *.owl, *.nt, *.n3, *.rdf, *.hdt, *.jsonld,
#   *.brf, *.rj, *.trig, *.trigs
# - compressed files: *.bz2, *.gz
# - archives: *.zip
# - directories
# - URLs: plain files or compressed files are supported (no archives).
#   Note: The HTTP content type must indicate the correct RDF format;
#   otherwise, Turtle is assumed.
semspect.rdf.indexing.rdfDataSources:
  - path/to/data_source_1
  - ...
  - path/to/data_source_n
  
################################
# optional generation parameters
################################

# indicesDirectory
#
# storage folder for the SemStore indices
#
# default: a new folder in the JVM working directory
semspect.rdf.indicesDirectory: path/to/directory 

# parsingStrategy
#
# Options:
# - ONE_PASS (default):
#   - generate base structures (triples, dictionary) in a single pass
#     over the provided RDF datasets
#   - consumes more main memory because uncompressed dictionary and triples 
#     are loaded simultaneously into memory
# - TWO_PASS: 
#   - generate base structures (triples, dictionary) in two iterations 
#     over the provided RDF datasets
#   - dictionary is generated in first pass, triples in second
#   - consumes less main memory since dictionary is compressed during 
#     generation on demand
semspect.rdf.indexing.parsingStrategy: ONE_PASS 

# numberOfThreads
#
# default: number of available processors of the machine
semspect.rdf.indexing.numberOfThreads: 4 

# terminateAfterIndexing
#
# default: false
semspect.rdf.indexing.terminateAfterIndexing: false 

# validateParsedResources
#
# default: false
semspect.rdf.indexing.validateParsedResources: false 

# iriDictionarySectionType
#
# Which type of dictionary section should be used for all IRIs. 
# An uncompressed type requires more space but might lead to a higher 
# performance when applying string filters.
#
# Options:
# - PLAIN_FRONT_CODING (default)
# - UNCOMPRESSED_STRINGS
semspect.rdf.indexing.iriDictionarySectionType: PLAIN_FRONT_CODING

# stringLiteralDictionarySectionType
#
# Which type of dictionary section should be used for all string literals.
# An uncompressed type requires more space but might lead to a higher
# performance when applying string filters.
#
# Options:
# - PLAIN_FRONT_CODING (default)
# - UNCOMPRESSED_STRINGS
semspect.rdf.indexing.stringLiteralDictionarySectionType: PLAIN_FRONT_CODING

Load Indices and Start SemSpect #

#############################
# required loading parameters
#############################

semspect.rdf.mode: load
semspect.rdf.indicesDirectory: path/to/indices/directory    
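
The two modes can be combined into a two-step workflow: build the indices once, then start SemSpect on them. The sketch below assumes that terminateAfterIndexing: true makes the process exit once indexing has finished (the option name suggests this, but the behavior is not described here) and that the directory given at generation time is the one to load afterwards. The file name generate.yaml and all paths are placeholders.

```shell
#!/bin/sh
INDICES_DIR="path/to/indices/directory"   # placeholder

# Step 1: write the configuration for index generation.
cat > generate.yaml <<EOF
semspect.rdf.mode: generate
semspect.rdf.indicesDirectory: $INDICES_DIR
semspect.rdf.indexing.rdfDataSources:
  - path/to/data_source_1
semspect.rdf.indexing.terminateAfterIndexing: true
EOF

if [ -x ./semspect-spring.sh ]; then
  # Step 2: build the indices (assumed to exit afterwards), then load them.
  ./semspect-spring.sh --spring.config.additional-location=generate.yaml &&
  ./semspect-spring.sh --semspect.rdf.mode=load \
                       --semspect.rdf.indicesDirectory="$INDICES_DIR"
else
  echo "semspect-spring.sh not found; skipping start" >&2
fi
```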

Additional Settings #

#####################
# additional settings
#####################

# port
#
# default: 8080
server.port: 8080

# context path
#
# Determines at which context path the content will be served by the server. 
# For instance, if it is set to "/dataset-x", SemSpect gets hosted on "localhost:8080/dataset-x".
# This option might be helpful to distinguish different running instances of SemSpect not merely by their port.  
#
# default: /
server.servlet.context-path: /

# numberOfThreads
# 
# default: number of available processors of the machine
semspect.rdf.exploration.numberOfThreads: 4 

# cacheGroups
#
# default: true
semspect.rdf.exploration.cacheGroups: true

# showClassesAndPropertiesAsResources
# 
# If set to true, rdf:Property and rdfs:Class are shown in the class tree.
# 
# default: false
semspect.rdf.exploration.showClassesAndPropertiesAsResources: false

# showTopClassInTree
# 
# If set to true, the top class rdfs:Resource is shown in the class tree.
# 
# default: false
semspect.rdf.exploration.showTopClassInTree: false

# logMemoryUsage
#
# Logs the used main memory to the file 
# INDICES-DIRECTORY/exploration/log/memoryConsumption.csv
#
# default: false
semspect.rdf.exploration.logMemoryUsage: false    

# explorationMenuComputationMethod
#
# Options:     
# - ROARING_BITMAPS_PER_CLASS  
# - SORTED_ITERATION_PER_CLASS  
# - INDIVIDUAL_QUERIES_PER_CLASS  
# - INDIVIDUAL_QUERIES  
# - ROARING_BITMAPS  
# - DYNAMICALLY_DETERMINED_PER_CLASS  
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.explorationMenuComputationMethod: DYNAMICALLY_DETERMINED

# predecessorCountComputationMethod
#
# Options: 
# - SORTED_ITERATION  
# - HASH_SET  
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.predecessorCountComputationMethod: DYNAMICALLY_DETERMINED

# filterComputationMethod
#
# Options: 
# - QUERY_PER_INDIVIDUAL  
# - INDEX_ITERATION  
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.filterComputationMethod: DYNAMICALLY_DETERMINED

# sortingMethod
#
# Options: 
# - INDEX_ITERATION  
# - SORTING_ON_THE_FLY  
# - DYNAMICALLY_DETERMINED (default)
semspect.rdf.exploration.sortingMethod: DYNAMICALLY_DETERMINED    
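
Because these settings can also be passed as Spring application parameters, a second instance for another dataset can be started without a separate configuration file. In this sketch the port 8081, the context path /dataset-x, and the indices path are illustrative values, not defaults.

```shell
#!/bin/sh
# Illustrative values for a second SemSpect instance.
PORT=8081
CONTEXT_PATH="/dataset-x"
INDICES_DIR="path/to/other/indices"

# Expected address, following the scheme described for the context path.
URL="localhost:${PORT}${CONTEXT_PATH}"
echo "SemSpect should be reachable at ${URL}"

if [ -x ./semspect-spring.sh ]; then
  ./semspect-spring.sh --semspect.rdf.mode=load \
                       --semspect.rdf.indicesDirectory="$INDICES_DIR" \
                       --server.port="$PORT" \
                       --server.servlet.context-path="$CONTEXT_PATH"
else
  echo "semspect-spring.sh not found; skipping start" >&2
fi
```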

IRI Prefix Configuration #

To shorten resource IRIs in the UI, SemStore collects all prefixes that are defined in the provided RDF datasets. In addition, a list of commonly used RDF prefixes is added by default. Prefixes given explicitly in the RDF datasets take priority over these defaults. To examine or modify the IRI-to-prefix map, edit the file INDICES-DIRECTORY/exploration/config/iriToPrefixMap.yaml. Changes take effect the next time SemSpect is started.

SemSpect JDK Options #

To set Java options when using SemSpect, the standard environment variables JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS can be used. If some JVM options should apply only to SemSpect (e.g., the maximum heap size), the user-defined environment variable SEMSPECT_JDK_OPTIONS can be used instead.
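
For example, the maximum heap size could be raised for SemSpect only. The sketch assumes that SEMSPECT_JDK_OPTIONS accepts standard JVM flags in the same form as JDK_JAVA_OPTIONS; the 8 GB value and the indices path are arbitrary placeholders.

```shell
#!/bin/sh
# -Xmx is the standard JVM flag for the maximum heap size; 8g is an arbitrary example.
export SEMSPECT_JDK_OPTIONS="-Xmx8g"

if [ -x ./semspect-spring.sh ]; then
  ./semspect-spring.sh --semspect.rdf.mode=load \
                       --semspect.rdf.indicesDirectory=path/to/indices/directory
else
  echo "semspect-spring.sh not found; skipping start" >&2
fi
```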

SemStore Statistics #

SemStore collects statistics while generating indices and during the exploration of the corresponding dataset. These statistics are stored in a subdirectory of the specified indices folder and can be visualized with our SemStore statistics Python application, which is available on Docker Hub. To run the application with Docker, see the scripts in the semstore-statisics/ folder of the SemSpect installation directory.

To generate plots for a single indices directory:

./semstore-eval.sh ./path/to/indices/directory

To run the meta-evaluation across several indices directories, adapt the shell script ./semstore-meta-eval.sh accordingly.