(How to?) Improve performance when parsing many strings in the same format #17

robin-xyzt-ai · 2021-09-28T18:01:28Z

I was wondering whether there is an option to improve performance even further when parsing many strings that all share the same format.
My use case is parsing timestamps from a CSV file with millions of rows, where every timestamp is in the same format.
Ideally I could just tell the parser: "Remember the format you detected for the previous string. I'm fairly sure this string is in the same format, so try that one first when parsing it."
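The "remember the last format" idea can be approximated outside the library with a small wrapper. The sketch below is a minimal illustration using `java.time`, assuming the set of candidate formats is known up front. `LastFormatParser` is a hypothetical name, not part of dateparser, and it does not reuse dateparser's own format detection. It simply retries the formatter that matched the previous string before scanning the others.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/**
 * Hypothetical helper (not a dateparser API): tries a fixed list of candidate
 * formats, but starts with whichever one matched the previous input.
 * Not thread-safe: use one instance per thread.
 */
public class LastFormatParser {
    private final List<DateTimeFormatter> candidates;
    // Index of the formatter that matched the previous input; -1 if none yet.
    private int lastHit = -1;

    public LastFormatParser(List<DateTimeFormatter> candidates) {
        this.candidates = List.copyOf(candidates);
    }

    public LocalDateTime parse(String text) {
        // Fast path: retry the format that worked for the previous string.
        if (lastHit >= 0) {
            try {
                return LocalDateTime.parse(text, candidates.get(lastHit));
            } catch (DateTimeParseException ignored) {
                // The format changed; fall through to the full scan.
            }
        }
        // Slow path: try every other candidate and remember the first match.
        for (int i = 0; i < candidates.size(); i++) {
            if (i == lastHit) {
                continue; // already tried above
            }
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidates.get(i));
                lastHit = i;
                return result;
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        throw new DateTimeParseException("No candidate format matched", text, 0);
    }

    public int lastHitIndex() {
        return lastHit;
    }

    public static void main(String[] args) {
        LastFormatParser p = new LastFormatParser(List.of(
                DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss"),
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss 'UTC'")));
        // The first call scans both candidates; the second hits the cached one.
        System.out.println(p.parse("2020-05-13 00:27:00 UTC"));
        System.out.println(p.parse("2020-03-11 00:04:00 UTC"));
    }
}
```

When every row shares one layout, only the first string pays for the scan. The cost of a wrong guess is one thrown `DateTimeParseException`, which is acceptable only if format changes are rare.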

To illustrate, my situation is similar to this benchmark:

```java
package com.github.sisyphsu.dateparser.benchmark;

import com.github.sisyphsu.dateparser.DateParser;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 2, time = 2)
@BenchmarkMode(Mode.AverageTime)
@Fork(2)
@Measurement(iterations = 3, time = 3)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MultiSameBenchmark {

    // 10 million timestamps, all in the same "yyyy-MM-dd HH:mm:ss UTC" layout.
    private static final String[] TEXTS;

    static {
        Random random = new Random(123456789L);
        TEXTS = new String[10000000];
        for (int i = 0; i < TEXTS.length; i++) {
            TEXTS[i] = String.format("2020-0%d-1%d 00:%d%d:00 UTC",
                    random.nextInt(8) + 1,  // month 01-08
                    random.nextInt(8) + 1,  // day 11-18
                    random.nextInt(5),      // minute tens digit 0-4
                    random.nextInt(9));     // minute units digit 0-8
        }
    }

    @Benchmark
    public void parser(Blackhole bh) {
        DateParser parser = DateParser.newBuilder().build();
        for (String text : TEXTS) {
            // Consume the result so JMH cannot eliminate the parse as dead code.
            bh.consume(parser.parseDate(text));
        }
    }
}
```
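A related workaround, when a column is known to use exactly one layout, is to skip detection for the common case: parse with one precompiled `DateTimeFormatter` and hand only unexpected strings to a general-purpose parser. The sketch below assumes the fallback is supplied as a `Function<String, LocalDateTime>`; wiring it to a dateparser instance is left as a hypothetical adapter, and `FixedFormatFastPath` is an illustrative name, not a library class. The `main` method reuses the benchmark's generator (scaled down) to check that every generated string takes the fast path.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Random;
import java.util.function.Function;

/**
 * Hypothetical fast path (not a dateparser feature): parse with one fixed
 * formatter and only send non-matching strings to a fallback parser.
 */
public class FixedFormatFastPath {
    // Matches strings such as "2020-05-13 00:27:00 UTC" from the benchmark.
    static final DateTimeFormatter FIXED =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss 'UTC'");

    private final Function<String, LocalDateTime> fallback;
    private long fallbackCount = 0;

    public FixedFormatFastPath(Function<String, LocalDateTime> fallback) {
        this.fallback = fallback;
    }

    public LocalDateTime parse(String text) {
        try {
            return LocalDateTime.parse(text, FIXED);
        } catch (DateTimeParseException e) {
            // Rare case: the string is not in the expected layout.
            fallbackCount++;
            return fallback.apply(text);
        }
    }

    public long fallbackCount() {
        return fallbackCount;
    }

    public static void main(String[] args) {
        // Same generator as the benchmark, scaled down to 1,000 strings.
        Random random = new Random(123456789L);
        String[] texts = new String[1000];
        for (int i = 0; i < texts.length; i++) {
            texts[i] = String.format(Locale.ROOT, "2020-0%d-1%d 00:%d%d:00 UTC",
                    random.nextInt(8) + 1, random.nextInt(8) + 1,
                    random.nextInt(5), random.nextInt(9));
        }
        // A fallback that fails loudly, to show it is never needed here.
        FixedFormatFastPath p = new FixedFormatFastPath(s -> {
            throw new IllegalArgumentException("unexpected format: " + s);
        });
        for (String text : texts) {
            p.parse(text);
        }
        System.out.println("fallbacks: " + p.fallbackCount());
    }
}
```

This trades generality for speed: it is only worthwhile when the layout really is fixed, since every mismatch costs an exception plus a full fallback parse.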

Is there already such an option on the parser that I overlooked?

sisyphsu added the `enhancement` (New feature or request) and `help wanted` (Extra attention is needed) labels on Nov 27, 2021.

robin-xyzt-ai mentioned this issue on Jan 11, 2023 in #28 (Open): Improve performance when parsing many strings in the same format.
