Installation & Quickstart of the RDF Server #
System Requirements #
- Java 17 or later (parsing is approx. 50% slower with Java >=19)
  The RDF SemSpect server was tested under macOS, Linux, and Windows.
- A recent web browser
  WARNING: We do not recommend using Firefox under Windows; some versions have issues in the table view.
Installation of the RDF SemSpect Server #
https://www.semspect.de/rdf-semspect-distribution-19.1.0.zip
This archive consists of a single directory that contains the scripts for starting SemSpect. Unpack the archive into your preferred target directory. If you want to call the scripts from any directory, add that directory to your PATH environment variable.
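As a minimal sketch of the PATH adjustment on Linux or macOS: the install location `$HOME/tools/rdf-semspect` below is purely hypothetical and stands for whatever directory the unpacked archive ends up in.

```shell
# Hypothetical install location: replace it with the directory that
# contains semspect-smart.sh after unpacking the archive.
SEMSPECT_HOME="$HOME/tools/rdf-semspect"

# Append the directory to PATH for the current shell session.
export PATH="$PATH:$SEMSPECT_HOME"
```

To make the change permanent, the same two lines can be added to your shell profile (for example `~/.profile`).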
Without a license, the usage of SemSpect is restricted. See here for an overview of the restrictions and license options.
If you want to stay informed about new versions of and announcements about RDF SemSpect, please send an e-mail to info@semspect.de.
RDF SemSpect Quickstart #
Start RDF SemSpect #
For this quickstart, we will use the single-database “smart” mode of SemSpect (for the multi-database “server” mode,
see the configuration page). Execute the semspect-smart.sh or semspect-smart.bat script (depending on your platform) with the run
command and supply your RDF input as a list of arguments.
Example:
% ./semspect-smart.sh run turtle-file.ttl ./data/n-triple-files.zip
In “smart” mode, if the given input data was previously supplied to SemSpect, the server loads the existing index structures. If no index is available for the given data, SemSpect generates it first. Once the SemSpect server is initialized and ready, it displays the following terminal output:
... INFO StartupInfoLogger SemSpect: Started SemSpectServer in x.yz seconds ...
Now switch to your preferred web browser and open http://localhost:8080/ or http://127.0.0.1:8080/ to load the
SemSpect UI.
If necessary, you can modify the port to your liking in smart-config/semstore_config.yaml or set the output paths
for the indexes via an environment variable: see SemSpect configuration for more information.
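When SemSpect is started from a script, readiness can also be polled instead of watching the terminal output. The following is only a sketch: it assumes that `curl` is installed and that the UI root URL answers with a successful HTTP status once the server is up, neither of which is stated on this page. The helper name `wait_for_http` is made up for this example.

```shell
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper: polls URL until it answers successfully or until
# TRIES attempts have been made. Returns 0 on success, 1 otherwise.
wait_for_http() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure,
    # -o /dev/null: discard the body, --max-time: bound each attempt
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (default port from the text above):
#   wait_for_http http://localhost:8080/ && echo "SemSpect UI is reachable"
```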
Stop RDF SemSpect #
To stop SemSpect, abort the SemSpect server in the terminal with Ctrl-C.
Because processing is multithreaded, Ctrl-C may not take effect immediately, depending on your OS and on when you stop SemSpect. In that case, use
Ctrl-Z to suspend the application and then terminate it with the appropriate command.
Other Commands #
To list all available commands of the terminal application enter:
% ./semspect-smart.sh help
The following commands might be useful after aborting a run:
- % ./semspect-smart.sh run --clean <SOURCE-1> <SOURCE-2> ...: Deletes the indices directory of the given RDF data sources if it already exists (before generating the indices anew). Example:
  % ./semspect-smart.sh run --clean data-1.ttl data-2.xml
- % ./semspect-smart.sh purge: Recursively deletes the directory that contains all indexed datasets.
Memory Setting #
The heap size required for generating or loading the indexes varies depending on your data. Based on our experience, the one-pass generation may require 1.5 to 4 times the size of the uncompressed input data in N-Triples format, while 0.5 to 2 times may be sufficient for the two-pass variant.
The maximal heap size can be set using the -Xmx JVM parameter (example: -Xmx16G for 16 GB). To set the JVM
parameters specifically for the SemSpect script, use the SEMSPECT_JDK_OPTIONS environment variable
(examples: export SEMSPECT_JDK_OPTIONS=-Xmx16G under Linux/OSX or set SEMSPECT_JDK_OPTIONS=-Xmx16G under Windows).
If the maximal heap size is not set in SEMSPECT_JDK_OPTIONS, the standard Java settings are used: the environment
variables JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS or, if these are not set, the JDK default settings. The default
heap size is 25% of the available physical memory for OpenJDK 17.
We recommend changing the memory setting to the highest acceptable value: The more memory, the fewer intermediate reorganization and compression steps will be necessary. Moreover, the memory released after the generation will be used for caching, resulting in a smoother user experience.
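The rules of thumb above can be turned into a rough estimate. The sketch below simply applies the stated factors (1.5 to 4 for one-pass, 0.5 to 2 for two-pass) to an input size in megabytes; the function name `estimate_heap_mb` is made up for this example, and the result is only a starting point since the actual requirement depends on your data.

```shell
# estimate_heap_mb SIZE_MB
# SIZE_MB: size of the uncompressed input in N-Triples format, in megabytes.
# Hypothetical helper that applies the experience values from the text.
estimate_heap_mb() {
  size_mb=$1
  # One-pass generation: 1.5x to 4x the input size.
  echo "one-pass: $((size_mb * 3 / 2))-$((size_mb * 4)) MB"
  # Two-pass variant: 0.5x to 2x the input size.
  echo "two-pass: $((size_mb / 2))-$((size_mb * 2)) MB"
}

estimate_heap_mb 2000
# prints:
#   one-pass: 3000-8000 MB
#   two-pass: 1000-4000 MB
```

For this 2000 MB example, a setting such as `export SEMSPECT_JDK_OPTIONS=-Xmx8G` would cover the upper end of the one-pass range.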
Supported RDF Formats #
RDF SemSpect can load one or more files or all files from a given directory or archive (zip, gzip, bz2) in the following formats:
- Turtle (*.ttl)
- Turtle with RDF-Star extension (*.ttls)
- OWL (*.owl)
- N-Triple (*.nt)
- N-Quad (*.nq)
- Notation3 (*.n3)
- RDF/XML (*.rdf)
- HDT (*.hdt)
- JSON-LD (*.jsonld)
- BinaryRDF (*.brf)
- RDF/JSON (*.rj)
- TriG (*.trig)*
- TriG with RDF-Star extension (*.trigs)*
IRIs of the parsed resources are not validated by default. The validation can however be activated in the SemSpect configuration.
* Note that in case of the N-Quad and TriG formats, named graphs are ignored and everything is loaded into one default graph.
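Before passing a long list of files to the run command, it can help to check their extensions against the list above. The following sketch only mirrors that list: the helper name `is_supported_rdf` is made up, the match is case-sensitive by assumption, and archives are not handled. SemSpect itself may accept other file names (the --clean example above uses a .xml file), so a negative result here is a hint, not a verdict.

```shell
# is_supported_rdf FILE
# Hypothetical helper: succeeds if FILE ends in one of the extensions
# listed in this section.
is_supported_rdf() {
  case "$1" in
    *.ttl|*.ttls|*.owl|*.nt|*.nq|*.n3|*.rdf|*.hdt|*.jsonld|*.brf|*.rj|*.trig|*.trigs)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

for f in data.ttl notes.txt graph.trig; do
  if is_supported_rdf "$f"; then
    echo "$f: listed format"
  else
    echo "$f: not in the list"
  fi
done
```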
Uninstallation #
SemSpect stores data on disk as well as in the web browser you have used for the SemSpect UI (see Data Privacy for details). To remove all user provided data you have to:
- Start SemSpect and open the UI,
- In the top menu select SemSpect / Settings / Reset local data (repeat this with all browsers in which you used SemSpect),
- Stop SemSpect,
- Remove all files from the installation directory (if you have set other data directories via the configuration, also delete these directories).
Current Limitations #
- In theory, RDF SemSpect supports datasets with up to 2.14 billion triples (due to the size limitation of Java collections; maximum number of triples tested: ~500M).
- Invisible null characters (\u0000) inside text entries are automatically removed to simplify processing in our compressed dictionary format.
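For data in N-Triples format, which holds one triple per line, the size of a dataset can be compared with the first limit before loading. This is a sketch under assumptions: the exact internal limit is not given on this page, so the value below uses Java's Integer.MAX_VALUE (2^31 - 1), which matches the "2.14 billion" figure; the helper names are made up, and duplicate triples are counted as separate lines.

```shell
# Assumed limit: Java's Integer.MAX_VALUE (2^31 - 1). The page only says
# "2.14 billion", so treat this exact value as an assumption.
MAX_TRIPLES=2147483647

# count_triples FILE
# Counts the lines of an N-Triples file that are neither blank nor comments.
count_triples() {
  grep -c -v -E '^[[:space:]]*(#.*)?$' "$1"
}

# check_triple_limit FILE
# Prints the count and returns 1 if it exceeds the assumed limit.
check_triple_limit() {
  n=$(count_triples "$1")
  if [ "$n" -le "$MAX_TRIPLES" ]; then
    echo "$1: $n triples, within the theoretical limit"
  else
    echo "$1: $n triples, above the theoretical limit"
    return 1
  fi
}

# Usage:
#   check_triple_limit data.nt
```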