Installation and Quickstart

Installation & Quickstart of RDF SemSpect Server #

System Requirements #

  • RDF SemSpect was tested with the following operating systems: OS X, Linux, Windows
  • It requires Java 17 or later (parsing is approx. 50% slower with Java >=19)
  • It requires a valid license (except for the Beta versions)

Supported RDF Formats #

RDF SemSpect can load one or more files or all files from a given directory or archive (zip, gzip, bz2) in the following formats:

  • Turtle (*.ttl)
  • Turtle with RDF-Star extension (*.ttls)
  • OWL (*.owl)
  • N-Triples (*.nt)
  • N-Quads (*.nq)*
  • Notation3 (*.n3)
  • RDF/XML (*.rdf)
  • HDT (*.hdt)
  • JSON-LD (*.jsonld)
  • BinaryRDF (*.brf)
  • RDF/JSON (*.rj)
  • TriG (*.trig)*
  • TriG with RDF-Star extension (*.trigs)*

IRIs of the parsed resources are not validated by default. Validation can, however, be enabled in the SemSpect configuration.

* Note that for the N-Quads and TriG formats, named graphs are ignored and everything is loaded into one default graph.

Installation of the RDF SemSpect Server #

Download the RDF SemSpect BETA server package from:

https://www.semspect.de/rdf-rest-distribution-14.0.0-beta.zip

Note that this is a full-featured, time-limited beta of RDF SemSpect that is valid until March 31st, 2024.

Unpack this archive into your preferred target directory. The resulting directory contains scripts for starting SemSpect. In case you want to call these from any directory, adjust your PATH environment variable accordingly.

If you want to stay informed about new versions of RDF SemSpect and related announcements, please send an email to info@semspect.de.

RDF SemSpect Quickstart #

Start RDF SemSpect #

To start SemSpect, run the semspect.sh or semspect.bat script (depending on your operating system) and supply your RDF input as a list of arguments. For instance:

> semspect.sh turtle-file.ttl ./data/n-triple-files.zip

This will start the SemSpect server. If the given input data has been supplied to SemSpect before, the server loads the existing index structures. If no index structures are available for the given data, SemSpect generates them first. Once the SemSpect server is initialized and ready, it displays the following terminal output:

... INFO StartupInfoLogger SemSpect:  Started SemSpectServer in x.yz seconds ...

Now switch to your preferred web browser and open http://localhost:8080/ or http://127.0.0.1:8080/ to load the SemSpect UI.

If you need to change the port, you will have to use one of the semspect-spring scripts (see the SemSpect configuration).
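When SemSpect is started from a script, it can be convenient to wait until the UI is reachable instead of watching the terminal output. The following is a minimal sketch, assuming that curl is installed and that the UI root URL answers with a successful HTTP status once the server is ready. The function name wait_for_url is hypothetical and not part of SemSpect.

```shell
# wait_for_url URL [ATTEMPTS]
# Polls URL once per second until it answers successfully
# or ATTEMPTS (default: 30) tries have been used up.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return a non-zero status on HTTP errors (>= 400).
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (assumes the default port 8080):
#   wait_for_url http://localhost:8080/ 120 && echo "SemSpect UI is reachable"
```

Index generation for large inputs can take much longer than 30 seconds, so the number of attempts should be chosen generously.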

Stop RDF SemSpect #

To stop SemSpect, you can abort the SemSpect server in the terminal with Ctrl-C.

Because processing is multithreaded, Ctrl-C may not stop SemSpect right away, depending on your OS and on the moment at which you interrupt it. In that case, use Ctrl-Z to suspend the application and then terminate it with the appropriate command.
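On Linux and OS X, terminating a suspended job could look like the following sketch. It relies only on the standard shell commands jobs and kill. The helper name terminate_job is hypothetical, and the job number %1 is an assumption (use the number that jobs reports).

```shell
# terminate_job TARGET
# TARGET is a job spec such as %1 (as listed by `jobs`) or a process ID.
terminate_job() {
  # Request a normal shutdown first.
  kill -TERM "$1" || return 1
  # A suspended process only acts on SIGTERM once it has been resumed.
  kill -CONT "$1" 2>/dev/null
  return 0
}

# Interactive use after pressing Ctrl-Z:
#   jobs               # shows the suspended job and its number
#   terminate_job %1
#   kill -KILL %1      # last resort if the job is still listed afterwards
```

Sending SIGTERM before SIGKILL gives the JVM a chance to shut down in an orderly way.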

Other Commands #

To list all available commands of the terminal application enter:

> semspect.sh --help

The following commands might be useful after aborting a run:

  • --clean: Deletes the indices directory of the given RDF data sources if it already exists (before generating the indices anew).
  • --purge: Recursively deletes the directory that contains all indexed datasets.

Memory Setting #

The heap size necessary for generating or loading the indexes varies depending on your data. Based on our experience, the one-pass generation may require 1.5 to 4 times the size of the uncompressed input data in N-Triples format, while 0.5 to 2 times may be sufficient for the two-pass variant.

The maximum heap size can be set using the -Xmx JVM parameter (example: -Xmx16G for 16 GB). To set JVM parameters specifically for the SemSpect script, use the SEMSPECT_JDK_OPTIONS environment variable (examples: export SEMSPECT_JDK_OPTIONS=-Xmx16G under Linux/OS X or set SEMSPECT_JDK_OPTIONS=-Xmx16G under Windows).

If the maximum heap size is not set in SEMSPECT_JDK_OPTIONS, the standard Java settings are used: the environment variables JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS, or the JDK defaults if these are not set. For OpenJDK 17, the default maximum heap size is 25% of the available physical memory.

We recommend changing this setting to the highest acceptable value: the more memory is available, the fewer intermediate reorganization and compression steps are necessary. Moreover, the memory released after the generation is used for caching, resulting in a smoother user experience.
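The rule of thumb for the heap size can be turned into a quick calculation. The sketch below assumes that awk is available. The function name heap_estimate_gb is hypothetical, and its argument is the size of the uncompressed input in N-Triples format, in GB. The factors are the experience values given in this section, rounded up to whole gigabytes.

```shell
# heap_estimate_gb SIZE_GB
# Prints suggested -Xmx ranges for the one-pass and two-pass generation.
heap_estimate_gb() {
  awk -v s="$1" '
    # Round up to the next whole number.
    function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
    BEGIN {
      printf "one-pass: -Xmx%dG to -Xmx%dG\n", ceil(1.5 * s), ceil(4 * s)
      printf "two-pass: -Xmx%dG to -Xmx%dG\n", ceil(0.5 * s), ceil(2 * s)
    }'
}

# Example for 10 GB of uncompressed N-Triples data:
#   heap_estimate_gb 10
#   one-pass: -Xmx15G to -Xmx40G
#   two-pass: -Xmx5G to -Xmx20G
```

These ranges are only a starting point, since the actual requirement depends on the data.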

General Settings #

The root directory of the SemSpect installation can be set through an environment variable:

  • Installation directory: SEMSPECT_HOME (default: <script location>)

Furthermore, the configuration and output paths can be set through environment variables (semspect.sh & semspect.bat):

  • Location of the semspect configuration: SEMSPECT_CONFIG_PATH (default: <SEMSPECT_HOME>/semspect-config/semspect-config.yaml)
  • Location of the semstore configuration: SEMSTORE_CONFIG_PATH (default: <SEMSPECT_HOME>/semspect-config/default-semstore-config.yaml)
  • Location of the indices: SEMSTORE_INDICES_DIR (default: <SEMSPECT_HOME>/semspect-indices/)
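As an illustration, these variables could be set in a Linux/OS X shell before starting SemSpect. The variable names are the ones listed above. All directory paths in this sketch are hypothetical and need to be adapted to your installation.

```shell
# Hypothetical installation directory.
export SEMSPECT_HOME=/opt/semspect

# Hypothetical configuration file outside the installation directory.
export SEMSPECT_CONFIG_PATH=/etc/semspect/semspect-config.yaml

# Hypothetical directory on a larger disk for the generated indices.
export SEMSTORE_INDICES_DIR=/data/semspect-indices/

# Afterwards, start SemSpect as usual, for example:
#   "$SEMSPECT_HOME/semspect.sh" turtle-file.ttl
```

Under Windows, the same variables can be set with the set command before calling semspect.bat.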

Uninstallation #

SemSpect stores data on disk as well as in the web browser you have used for the SemSpect UI (see Data Privacy for details). To remove all user-provided data, proceed as follows:

  1. Start SemSpect and open the UI
  2. In the top menu select SemSpect / Settings / Reset local data (repeat this with all browsers in which you used SemSpect)
  3. Stop SemSpect
  4. Remove all files from the installation directory (If you have set other data directories via the configuration, also delete these directories)

Current Limitations #

  • In theory, RDF SemSpect supports datasets with up to 2.14 billion triples (due to the size limitation of Java collections). The largest dataset tested so far had about 500 million triples.
  • Currently, no languages other than English are supported for the labels of classes, properties and resources. The priority order for selecting the label shown in the UI (from high to low) is:
    1. literal with @en language tag
    2. literal without language tag
    3. IRI
  • Filtering and sorting of properties with RDF collections as values (rdf:List, rdf:Seq, rdf:Bag, rdf:Alt) might in some cases lead to unexpected behavior.