Installation and Quickstart

Installation & Quickstart of RDF SemSpect Server #

System Requirements #

  • RDF SemSpect was tested with the following operating systems: OS X, Linux, Windows
  • It requires Java 17 or later (parsing is approx. 50% slower with Java >=19)
  • It requires a valid license (except for the free version)

Supported RDF Formats #

RDF SemSpect can load one or more files or all files from a given directory or archive (zip, gzip, bz2) in the following formats:

  • Turtle (*.ttl)
  • Turtle with RDF-Star extension (*.ttls)
  • OWL (*.owl)
  • N-Triples (*.nt)
  • N-Quads (*.nq)*
  • Notation3 (*.n3)
  • RDF/XML (*.rdf)
  • HDT (*.hdt)
  • JSON-LD (*.jsonld)
  • BinaryRDF (*.brf)
  • RDF/JSON (*.rj)
  • TriG (*.trig)*
  • TriG with RDF-Star extension (*.trigs)*

IRIs of the parsed resources are not validated by default. Validation can, however, be enabled in the SemSpect configuration.

* Note that for the N-Quads and TriG formats, named graphs are ignored and everything is loaded into one default graph.

Installation of the RDF SemSpect Server #

Download the RDF SemSpect server package from:

https://www.semspect.de/rdf-semspect-distribution-17.0.0.zip

Note that this is a full-featured, time-limited free version of RDF SemSpect that is valid until December 31, 2024.

Unpack this archive into your preferred target directory. The resulting directory contains scripts for starting SemSpect. In case you want to call these from any directory, adjust your PATH environment variable accordingly.
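As a minimal sketch of the PATH adjustment on Linux or OS X, assuming the archive was unpacked to the hypothetical directory /opt/semspect (substitute the directory that actually contains semspect.sh):

```shell
# Hypothetical install location; replace it with the directory you unpacked into.
SEMSPECT_DIR="/opt/semspect"

# Append the directory to PATH for the current shell session only.
export PATH="$PATH:$SEMSPECT_DIR"
```

To make the change permanent, the same export line would typically go into your shell profile (for example ~/.profile).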

If you want to stay informed about new versions of RDF SemSpect and related announcements, please send an email to info@semspect.de.

RDF SemSpect Quickstart #

Start RDF SemSpect #

To start SemSpect, execute the semspect.sh script (or semspect.bat under Windows) with the run command and supply your RDF input as a list of arguments. For instance:

> semspect.sh run turtle-file.ttl ./data/n-triple-files.zip

This will start the SemSpect server. If the given input data was previously supplied to SemSpect, the server loads the existing index structures. If no indices are available for the given data, SemSpect generates them first. Once the SemSpect server is initialized and ready, it displays the following terminal output:

... INFO StartupInfoLogger SemSpect:  Started SemSpectServer in x.yz seconds ...

Now switch to your preferred web browser and open http://localhost:8080/ or http://127.0.0.1:8080/ to load the SemSpect UI.
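If SemSpect is started from a script, it can help to wait until the UI responds before opening the browser. The helper below is a hedged sketch, not part of SemSpect: the function name wait_for_semspect is hypothetical, it relies on curl being installed, and it assumes that the root URL answers with a successful HTTP status once the server is ready.

```shell
# Hypothetical helper: poll a URL once per second until it answers or the
# number of attempts is used up.
# Usage: wait_for_semspect [url] [attempts]
wait_for_semspect() {
  url="${1:-http://localhost:8080/}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP errors as well as on
    # connection failures.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A call such as `wait_for_semspect http://localhost:8080/ 60 && echo "SemSpect UI is reachable"` would then report success once the server answers.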

If you need to change the port, modify config/default-semstore-config.yaml (see SemSpect configuration for more information).

Stop RDF SemSpect #

To stop SemSpect, you can abort the SemSpect server in the terminal with Ctrl-C.

Because processing is multithreaded, Ctrl-C may not take effect immediately, depending on your OS and on the moment at which you stop SemSpect. In that case, use Ctrl-Z to suspend the application and then terminate it with the appropriate command.
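The suspend-then-terminate sequence can be sketched on Linux or OS X as follows. The example uses a `sleep` process as a stand-in for the SemSpect server and SIGSTOP as a stand-in for Ctrl-Z (which sends SIGTSTP), so that it can be run anywhere; the variable names are illustrative.

```shell
# Stand-in for a suspended server: start a background process and stop it,
# which leaves it in the same state as a foreground job after Ctrl-Z.
sleep 300 &
pid=$!
kill -STOP "$pid"

# A stopped process only acts on SIGTERM once it is continued,
# so send SIGTERM followed by SIGCONT.
kill -TERM "$pid"
kill -CONT "$pid"

# Collect the exit status; 128 + 15 (SIGTERM) indicates termination by signal.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

In an interactive shell, the job suspended with Ctrl-Z can usually be addressed by its job number instead, for example `kill %1`, or `kill -9 %1` if the process does not react.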

Other Commands #

To list all available commands of the terminal application enter:

> semspect.sh help

The following commands might be useful after aborting a run:

  • semspect.sh run --clean <SOURCE-1> <SOURCE-2> ...: Deletes the indices directory of the given RDF data sources if it already exists (before generating the indices anew). Example: semspect.sh run data-1.ttl data-2.xml --clean
  • semspect.sh purge: Recursively deletes the directory that contains all indexed datasets.

Memory Setting #

The heap size necessary for generating or loading the indices varies depending on your data. Based on our experience, the one-pass generation may require 1.5 to 4 times the size of the uncompressed input data in N-Triples format, while 0.5 to 2 times may be sufficient for the two-pass variant.

The maximum heap size can be set using the -Xmx JVM parameter (example: -Xmx16G for 16 GB). To set the JVM parameters specifically for the SemSpect script, use the SEMSPECT_JDK_OPTIONS environment variable (examples: export SEMSPECT_JDK_OPTIONS=-Xmx16G under Linux/OS X or set SEMSPECT_JDK_OPTIONS=-Xmx16G under Windows).
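As a worked example of the sizing factors given in this section, the following sketch computes the heap ranges for a hypothetical input of 8192 MB of uncompressed N-Triples data and exports one of the values. The input size and the variable names are assumptions for illustration; the factors and the SEMSPECT_JDK_OPTIONS variable are taken from the text.

```shell
# Hypothetical input: 8192 MB of uncompressed N-Triples data.
input_mb=8192

# Factors from the text: 1.5x to 4x (one-pass), 0.5x to 2x (two-pass).
one_pass_min_mb=$(( input_mb * 3 / 2 ))
one_pass_max_mb=$(( input_mb * 4 ))
two_pass_min_mb=$(( input_mb / 2 ))
two_pass_max_mb=$(( input_mb * 2 ))

echo "one-pass: ${one_pass_min_mb}-${one_pass_max_mb} MB"
echo "two-pass: ${two_pass_min_mb}-${two_pass_max_mb} MB"

# Example choice: the upper one-pass bound, provided the machine has that much RAM.
export SEMSPECT_JDK_OPTIONS="-Xmx${one_pass_max_mb}m"
```

For this input the estimate is 12288 to 32768 MB for one-pass generation and 4096 to 16384 MB for the two-pass variant.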

If the maximum heap size is not set in SEMSPECT_JDK_OPTIONS, the standard Java settings will be used: the environment variables JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS or, if these are not set, the JDK defaults. For OpenJDK 17, the default maximum heap size is 25% of the available physical memory.

We recommend changing the memory setting to the highest acceptable value: The more memory, the fewer intermediate reorganization and compression steps will be necessary. Moreover, the memory released after the generation will be used for caching, resulting in a smoother user experience.

General Settings #

The root directory of the SemSpect installation can be set by an environment variable:

  • Installation directory: SEMSPECT_HOME (default: <script location>)

Furthermore, the following environment variables set the configuration and output paths (semspect.sh & semspect.bat):

  • Location of the semspect configuration: SEMSPECT_CONFIG_PATH (default: <SEMSPECT_HOME>/config/semspect-config.yaml)
  • Location of the semstore configuration: SEMSTORE_CONFIG_PATH (default: <SEMSPECT_HOME>/config/default-semstore-config.yaml)
  • Location of the indices: SEMSTORE_INDICES_DIR (default: <SEMSPECT_HOME>/indices/)
  • Location of dataset configurations: SEMSPECT_CONFIGS_DIR (default: <SEMSPECT_HOME>/config/datasets/)
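A minimal sketch of overriding some of these settings under Linux or OS X before starting SemSpect. All paths are hypothetical placeholders; only the variable names come from the list above.

```shell
# Hypothetical locations; adjust them to your environment.
export SEMSPECT_HOME="/opt/semspect"

# Keep the indices on a separate (for example larger or faster) disk.
export SEMSTORE_INDICES_DIR="/data/semspect/indices/"

# Keep the dataset configurations in their default place below SEMSPECT_HOME.
export SEMSPECT_CONFIGS_DIR="$SEMSPECT_HOME/config/datasets/"
```

Variables that are not set keep the defaults listed above.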

Uninstallation #

SemSpect stores data on disk as well as in the web browser you have used for the SemSpect UI (see Data Privacy for details). To remove all user-provided data, you have to:

  1. Start SemSpect and open the UI
  2. In the top menu select SemSpect / Settings / Reset local data (repeat this with all browsers in which you used SemSpect)
  3. Stop SemSpect
  4. Remove all files from the installation directory. If you have set other data directories via the configuration, delete these directories as well.

Current Limitations #

  • In theory, RDF SemSpect supports datasets with up to 2.14 billion triples (due to the size limitation of Java collections); the largest dataset tested so far had about 500 million triples.
  • Currently, English is the only language supported for labels of classes, properties and resources. The priority order for selecting the label shown in the UI is (from high to low):
    1. literal with @en language tag
    2. literal without language tag
    3. IRI
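
The priority order above can be illustrated with a small sketch. This is not SemSpect's implementation: the function name pick_label and the tab-separated input format (language tag, then label text, with an empty tag for untagged literals) are assumptions made for this example, and only the exact tag "en" is treated as English.

```shell
# Hypothetical helper: choose a display label for a resource.
# Argument 1: the IRI of the resource (used as the fallback).
# Standard input: one candidate literal per line as "<lang-tag><TAB><text>".
pick_label() {
  awk -F '\t' -v iri="$1" '
    $1 == "en" && en == ""    { en = $2 }      # first literal tagged @en
    $1 == ""   && plain == "" { plain = $2 }   # first literal without a tag
    END {
      if (en != "")         print en
      else if (plain != "") print plain
      else                  print iri
    }'
}

# Candidates: a German label, an untagged label and an English label.
printf 'de\tHund\n\tdog (untagged)\nen\tDog\n' | pick_label "http://example.org/Dog"
```

With these candidates the English literal "Dog" is selected; without it, the untagged literal would be used, and with neither, the IRI itself.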