
[feature request] Cache predictions in the evaluation pipeline #14

Open
NISH1001 opened this issue Mar 30, 2023 · 2 comments

Comments

@NISH1001
Collaborator

NISH1001 commented Mar 30, 2023

What

Currently, evalem.pipelines.SimpleEvaluationPipeline is stateless: forward passes (both inference outputs and evaluation results) aren't cached within the pipeline object. This is fine for inference + evaluation on a small sample size. However, for a bigger one, say the full SQuAD v2 train split of ~86k samples, re-running inference just to get predictions is time-consuming whenever we want to switch the Evaluator object.

Why

To speed up evaluation without re-running the forward pass on a huge dataset.
It would also help with debugging on such large sample sets, since it's painful to catch runtime errors (say, a tokenization error caused by weird texts) at a late stage in the pipeline.

How

Maybe we can have a new CachedSimpleEvaluationPipeline (or something like that) that can load predictions from external files (text, JSON, etc.). A rough sketch of the idea is below.
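
Something along these lines, perhaps (all names here — `model`, `evaluators`, the JSON cache path — are placeholders for illustration, not evalem's actual API):

```python
import json
import os


class CachedSimpleEvaluationPipeline:
    """
    Hypothetical sketch: wraps a model + evaluators and caches predictions
    on disk, so swapping the Evaluator doesn't re-run the forward pass.
    """

    def __init__(self, model, evaluators, cache_path="predictions.json"):
        self.model = model
        self.evaluators = evaluators
        self.cache_path = cache_path

    def _load_cached_predictions(self):
        # Return previously saved predictions if the cache file exists.
        if os.path.exists(self.cache_path):
            with open(self.cache_path) as f:
                return json.load(f)
        return None

    def _save_predictions(self, predictions):
        with open(self.cache_path, "w") as f:
            json.dump(predictions, f)

    def __call__(self, inputs, references):
        predictions = self._load_cached_predictions()
        if predictions is None:
            # The expensive forward pass only happens on a cache miss.
            predictions = self.model(inputs)
            self._save_predictions(predictions)
        # Evaluators can now be swapped freely without re-running inference.
        return [
            evaluator(predictions=predictions, references=references)
            for evaluator in self.evaluators
        ]
```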


cc: @muthukumaranR

@NISH1001 NISH1001 changed the title [feature request] Cache predictions in the Pipeline [feature request] Cache predictions in the evaluation pipeline Mar 30, 2023
@muthukumaranR
Collaborator

As far as data consistency goes, would it be possible to enforce checks (for tokenization) on the DTOs during, say, pipeline.build()? That way you're guaranteed not to hit any of those errors in pipeline.run(). Essentially, split the execution of the pipeline into two parts: in the first part you handle all the checks, and in the final part you run the pipeline.

As for caching, I think the mechanism will be useful nevertheless.
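
Roughly, the build/run split could look like this (a minimal sketch with placeholder names, not evalem's actual API):

```python
class TwoPhaseEvaluationPipeline:
    """
    Hypothetical sketch of the build()/run() split: cheap validation
    (e.g. tokenization checks on the input DTOs) happens in build(),
    so run() only does the expensive forward pass + evaluation.
    """

    def __init__(self, model, evaluators, tokenizer):
        self.model = model
        self.evaluators = evaluators
        self.tokenizer = tokenizer
        self._validated_inputs = None

    def build(self, inputs):
        # Fail fast: surface tokenization errors (weird texts, etc.) here,
        # before any time is spent on inference.
        for text in inputs:
            self.tokenizer(text)
        self._validated_inputs = inputs
        return self

    def run(self, references):
        if self._validated_inputs is None:
            raise RuntimeError("Call build(inputs) before run().")
        predictions = self.model(self._validated_inputs)
        return [
            evaluator(predictions=predictions, references=references)
            for evaluator in self.evaluators
        ]
```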

@NISH1001
Collaborator Author

NISH1001 commented Apr 3, 2023

I like the build(...) mechanism. Will add it to my to-do list. Right now, what we're basically doing is passing the texts as-is to transformers.pipeline(...), which implicitly handles all the tokenization, forward pass, etc.
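
For reference, that current flow is essentially this (the model name is picked purely for illustration):

```python
from transformers import pipeline

# Texts go straight into a transformers pipeline, which handles tokenization
# and the forward pass internally — so any tokenization error only surfaces
# here, mid-run, rather than in an up-front build/validation step.
qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)
prediction = qa(
    question="What does evalem cache?",
    context="Nothing yet; predictions are recomputed on every run.",
)
print(prediction)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```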
