Performance Comparison on Spark

Guanghui.Zhu edited this page Jan 31, 2018 · 1 revision

Performance Comparison

DGST outperforms the state-of-the-art ERa algorithm, achieving a speedup of about 3 times on both the DNA and English text datasets.

DNA Dataset

We first compare the performance of DGST with ERa on the DNA dataset. We extract strings of different lengths from the pine genome (total length of 12 Gbp). The performance comparison is shown below. DGST achieves a speedup of 3 times on average.

English Text Dataset

We also compare the performance of DGST with ERa on the English text dataset. We extract strings of different lengths from Wikipedia (total length of 10G characters). The performance comparison is shown below. DGST achieves a speedup of 2.6 times on average.
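For readers reproducing these comparisons, the following is a minimal sketch of how an average speedup figure could be computed from paired running times. It assumes that speedup is the baseline (ERa) running time divided by the DGST running time for the same input, and that "on average" means the arithmetic mean over the tested string lengths; the page does not state either definition. The function names and the timings are hypothetical placeholders, not measurements from this page.

```python
from statistics import mean


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("running times must be positive")
    return baseline_seconds / candidate_seconds


def average_speedup(timings) -> float:
    """Arithmetic mean of per-input speedups.

    `timings` is an iterable of (baseline_seconds, candidate_seconds) pairs,
    one pair per input string length.
    """
    return mean(speedup(b, c) for b, c in timings)


# Placeholder timings in seconds, one pair per input length.
# These are illustrative values only, NOT results reported on this page.
placeholder_timings = [(200.0, 100.0), (400.0, 100.0)]
print(average_speedup(placeholder_timings))
```

If the per-length speedups vary widely, a geometric mean may be a more representative summary than the arithmetic mean used in this sketch.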
