Split large RDF files into reasonably sized N-Quads files
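N-Quads is a line-based format: each non-empty, non-comment line is one complete statement (subject, predicate, object and an optional graph name). That is what makes it a convenient target for splitting, since any run of whole lines is itself valid N-Quads. The sketch below illustrates that property with the standard `split` utility on a file that is already N-Quads. It is not how rdfsplit works internally (rdfsplit also reads other RDF syntaxes such as Turtle and is built on Apache Jena), and the file names, URIs and chunk size are hypothetical.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory (hypothetical example data, not rdfsplit output)
workdir=$(mktemp -d)
cd "$workdir"

# Write five quads, one per line, into input.nq
for i in 1 2 3 4 5; do
  printf '<http://example.com/s%d> <http://example.com/p> "o%d" <http://example.com/g> .\n' "$i" "$i"
done > input.nq

# Split into chunks of at most 2 lines (2 quads): part-aa, part-ab, part-ac
split -l 2 input.nq part-

# Each chunk is valid N-Quads on its own, because no quad spans more than one line
wc -l part-*
```

rdfsplit itself takes care of parsing the input syntax and choosing chunk boundaries; the sketch only shows why line-aligned N-Quads chunks need no further processing.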
This tool is also available as the Docker image stain/rdfsplit.
Mount your RDF directory as the volume /data, which will be the current directory within the Docker container.
Note that for the relative paths produced by your shell's glob expansion (e.g. *.ttl) to resolve inside the container, you should run docker from within the mounted directory.
To use:
docker run stain/rdfsplit rdfsplit --help
cd /home/johndoe/rdfstuff
mkdir split
docker run -v /home/johndoe/rdfstuff:/data stain/rdfsplit rdfsplit --output split/ *.ttl
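The reason for running docker from the mounted directory is that the host shell, not Docker, expands *.ttl, so the container only ever receives the resulting relative file names. The sketch below makes that visible by prefixing the command with echo instead of running it. The temporary directory stands in for /home/johndoe/rdfstuff, and using "$PWD" for the mount source is an assumed convenience (standard shell behaviour) that keeps the mount and the glob pointing at the same directory.

```shell
#!/bin/sh
set -e
# Hypothetical stand-in for /home/johndoe/rdfstuff containing two Turtle files
rdfdir=$(mktemp -d)
touch "$rdfdir/a.ttl" "$rdfdir/b.ttl"
cd "$rdfdir"

# echo prints the arguments docker would receive: *.ttl has already become
# the relative names a.ttl and b.ttl, which resolve against /data in the container
cmd=$(echo docker run -v "$PWD":/data stain/rdfsplit rdfsplit --output split/ *.ttl)
echo "$cmd"
```

If you ran docker from another directory, the glob would expand to names relative to that directory (or not match at all), and they would no longer correspond to files under /data.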
To build from source, first ensure you have Java 8 (or higher) and Leiningen installed, then run:
lein uberjar
You will find the standalone executable JAR in the equivalent of
target/uberjar/rdfsplit-0.1.0-SNAPSHOT-standalone.jar
$ java -jar rdfsplit-0.1.0-standalone.jar --help
Options:
  -o, --output OUTPUTDIR  .  Output directory (default is current directory)
  -r, --recursive            Recurse into subdirectories
  -f, --force                Overwrite any existing output files. Make output directory if missing.
  -v, --verbose              Verbose log output
  -h, --help
For example, to split all Turtle files in the current directory into the split/ directory:
java -jar rdfsplit-0.1.0-standalone.jar --output split/ *.ttl
Please feel free to contribute and report any issues at the rdfsplit GitHub repository.
Copyright © 2015-2016 Stian Soiland-Reyes
Distributed under Apache License, version 2.0.
Dependencies bundled in the produced standalone JAR include Apache Jena (Apache License) and Clojure (Eclipse Public License).