DGST Configuration

Guanghui.Zhu edited this page Feb 7, 2018 · 5 revisions

Configuration File

The default configuration file of DGST is conf/conf.properties.

Core Parameters

The core parameters required by DGST are:

| Parameter Name | Default | Meaning |
| --- | --- | --- |
| alphabet.num | 128 | Size of the alphabet |
| div.start | 2 | Initial count window size in sub-tree partitioning |
| div.step | 4 | Count window step size in sub-tree partitioning |
| root.max.count | 2000000 | Maximum sub-tree size (i.e., maximum S-prefix frequency) |
| fs.extra.len | 1024 | Tail length of an input split |
| first.buffer | 10 | Number of symbols in the first element of the local LCP-Range array |
| lcp.range | 128 | Size of a range in the LCP-Range structure |
| sorting.method | java | Element-wise sorting method |
| grouping.method | bfhg | Task allocation strategy for sub-tree construction |
| spark.partitions | 96 | Computation parallelism on Spark |
| input.dir | hdfs://master:9000/input | Input data path on HDFS or the local file system |
| output.location | hdfs://master:9000/output | Output data path on HDFS or the local file system |
| working.dir | hdfs://master:9000/tmp | Temporary data path on HDFS or the local file system |
| merged.filename | merged | Name of the merged string |
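
For reference, a conf/conf.properties that spells out every default listed above might look like the sketch below. The key=value layout is an assumption based on the standard Java properties format implied by the file extension; the comments are illustrative only, and the master:9000 host comes from the listed defaults and would need to match your own cluster.

```properties
# Sketch of conf/conf.properties using the documented defaults.
# Assumes standard Java properties syntax (key=value, '#' comments).

# Alphabet and sub-tree partitioning
alphabet.num=128
div.start=2
div.step=4
root.max.count=2000000

# Input splits and the LCP-Range structure
fs.extra.len=1024
first.buffer=10
lcp.range=128

# Sorting and task allocation
sorting.method=java
grouping.method=bfhg

# Spark parallelism
spark.partitions=96

# Data locations (HDFS or local file system)
input.dir=hdfs://master:9000/input
output.location=hdfs://master:9000/output
working.dir=hdfs://master:9000/tmp
merged.filename=merged
```

Note that the input and working paths use keys ending in .dir, while the output path uses output.location.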