(How to?) Improve performance when parsing many strings in the same format #17

Open
robin-xyzt-ai opened this issue Sep 28, 2021 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@robin-xyzt-ai
I was wondering if there is an option to improve the performance even further when parsing many strings that are all in the same format.
My use case is parsing timestamps from a CSV file that has millions of rows, where every timestamp is in the same format.
It would be ideal if I could just say to the parser: "remember that format you detected for the previous string. I'm pretty sure this string is in the same format, so try that first when parsing this string".

To illustrate this, my situation is similar to this benchmark:

package com.github.sisyphsu.dateparser.benchmark;

import com.github.sisyphsu.dateparser.DateParser;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 2, time = 2)
@BenchmarkMode(Mode.AverageTime)
@Fork(2)
@Measurement(iterations = 3, time = 3)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MultiSameBenchmark {

    // 10 million timestamps that all share one format, e.g. "2020-03-14 00:27:00 UTC".
    private static final String[] TEXTS;

    static {
        // Fixed seed so every run parses the same input.
        Random random = new Random(123456789L);
        TEXTS = new String[10000000];
        for (int i = 0; i < TEXTS.length; i++) {
            TEXTS[i] = String.format("2020-0%d-1%d 00:%d%d:00 UTC",
                    random.nextInt(8) + 1,
                    random.nextInt(8) + 1,
                    random.nextInt(5),
                    random.nextInt(9));
        }
    }

    @Benchmark
    public void parser(Blackhole blackhole) {
        DateParser parser = DateParser.newBuilder().build();
        for (String text : TEXTS) {
            // Consume the result so the JIT cannot eliminate the parse as dead code.
            blackhole.consume(parser.parseDate(text));
        }
    }
}

Is there already such an option on the parser that I overlooked?
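
To make the request concrete, here is a minimal sketch of the "try the previous format first" behaviour. It deliberately uses plain `java.time` rather than this library's API, since I do not know whether such a hook exists here. The class name `CachingTimestampParser`, the candidate list, and the choice to return a `LocalDateTime` are all hypothetical and only for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: tries the format that matched last time before scanning all candidates. */
public class CachingTimestampParser {

    private final List<DateTimeFormatter> candidates;

    // Formatter that parsed the previous input successfully; null until the first match.
    private DateTimeFormatter lastMatch;

    public CachingTimestampParser(List<DateTimeFormatter> candidates) {
        this.candidates = new ArrayList<>(candidates);
    }

    public LocalDateTime parse(String text) {
        // Fast path: assume this string has the same format as the previous one.
        if (lastMatch != null) {
            try {
                return LocalDateTime.parse(text, lastMatch);
            } catch (DateTimeParseException e) {
                // The format changed; fall through to the full scan.
            }
        }
        // Slow path: try every other candidate and remember the one that matches.
        for (DateTimeFormatter candidate : candidates) {
            if (candidate == lastMatch) {
                continue; // already tried above
            }
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidate);
                lastMatch = candidate;
                return result;
            } catch (DateTimeParseException e) {
                // Not this format; try the next candidate.
            }
        }
        throw new DateTimeParseException("No candidate format matched", text, 0);
    }
}
```

With candidates such as `DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'")` (which treats the zone as a literal, a simplification), only the first row of the CSV would pay for the scan over all formats. The sketch keeps mutable state, so it is not thread-safe; each thread would need its own instance.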
