What
Currently, evalem.pipelines.SimpleEvaluationPipeline is stateless. That means forward-pass results (both inference outputs and evaluation results) aren't cached within the pipeline object. This is fine for inference + evaluation on a small sample size. However, for a bigger one, say the full SQuAD v2 train set of ~86k samples, re-running inference to get predictions is time-consuming whenever we want to switch the Evaluator object.
Why
To speed up evaluation without re-running the forward pass on a huge dataset.
This can also help with debugging on such large samples, because it's a bummer to catch runtime errors (say, a tokenization error on weird texts) at a late stage of the pipeline.
How
Maybe we can have a new CachedSimpleEvaluationPipeline or something like that, able to load predictions from external files (text, JSON, etc.).
cc: @muthukumaranR
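A rough sketch of what that could look like (the class name, the model callable, and the evaluator interface below are all hypothetical, not the actual evalem API): predictions are computed once, persisted to JSON, and reloaded on later runs, so evaluators can be swapped without another forward pass.

```python
# Hypothetical sketch -- not the actual evalem API. A pipeline variant that
# persists predictions to a JSON file and reuses them across evaluator swaps.
import json
import os
from typing import Callable, Iterable, List, Sequence


class CachedSimpleEvaluationPipeline:
    def __init__(self, model: Callable[[Sequence[str]], List[str]], cache_path: str):
        self.model = model            # any callable: texts -> predictions
        self.cache_path = cache_path  # e.g., "predictions.json"

    def predict(self, texts: Sequence[str]) -> List[str]:
        # Reuse cached predictions if present; otherwise run the (expensive)
        # forward pass once and persist the result.
        if os.path.exists(self.cache_path):
            with open(self.cache_path) as f:
                return json.load(f)
        predictions = list(self.model(texts))
        with open(self.cache_path, "w") as f:
            json.dump(predictions, f)
        return predictions

    def run(self, texts: Sequence[str], references: Sequence[str],
            evaluators: Iterable[Callable]):
        predictions = self.predict(texts)
        # Evaluators can be switched freely; no re-inference needed.
        return [evaluate(predictions, references) for evaluate in evaluators]
```

Switching the Evaluator then just means calling run(...) again with a different evaluator list; the JSON cache short-circuits the model call.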
NISH1001 changed the title from "[feature request] Cache predictions in the Pipeline" to "[feature request] Cache predictions in the evaluation pipeline" on Mar 30, 2023.
As far as data consistency goes, would it be possible to enforce checks (for tokenization) on the DTOs during, say, pipeline.build()? That way you're guaranteed not to hit any in pipeline.run(). Essentially, split the execution of the pipeline into two phases: in the first, handle all the checks; in the final one, run the pipeline.
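A minimal sketch of that two-phase idea (the tokenizer and model callables here are placeholders, not evalem's actual interfaces):

```python
# Illustrative sketch only: validate inputs in build() so run() never hits a
# late tokenization error. The callables below are placeholders, not evalem's API.
from typing import Callable, List, Optional


class TwoPhaseEvaluationPipeline:
    def __init__(self, tokenizer: Callable[[str], list], model: Callable[[List[str]], list]):
        self.tokenizer = tokenizer
        self.model = model
        self._checked_texts: Optional[List[str]] = None

    def build(self, texts: List[str]) -> "TwoPhaseEvaluationPipeline":
        # Fail fast: try to tokenize every input up front and collect failures.
        failures = []
        for idx, text in enumerate(texts):
            try:
                self.tokenizer(text)
            except Exception as exc:
                failures.append((idx, repr(exc)))
        if failures:
            raise ValueError(f"{len(failures)} texts failed tokenization checks, e.g. {failures[:3]}")
        self._checked_texts = texts
        return self

    def run(self) -> list:
        if self._checked_texts is None:
            raise RuntimeError("Call build(texts) before run().")
        # Inputs were already validated, so no tokenization surprises here.
        return self.model(self._checked_texts)
```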
As for caching, I think the mechanism will be useful regardless.
I like the build(...) mechanism. Will add it to my to-do list. Right now, what we're basically doing is passing texts as-is to transformers.pipeline(...), which implicitly handles all the tokenization, forward pass, etc.
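For context, the current flow is roughly like the following (the task and model name here are just illustrative):

```python
# Roughly the current flow: raw texts go straight into transformers.pipeline,
# which handles tokenization and the forward pass internally.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="Where is the Eiffel Tower?",
    context="The Eiffel Tower is located in Paris, France.",
)
print(result["answer"])  # the extracted answer span, e.g. "Paris, France"
```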