Releases: NASA-IMPACT/evalem
nlp and cv namespace segregation
Disclaimer: This introduces breaking changes, but only at the namespace level. All the previous `evalem.models`, `evalem.metrics`, etc. now reside at `evalem.nlp.models`, `evalem.nlp.metrics`, etc.
With this new release, `evalem` now has both NLP and CV namespace segregation:

- `evalem.nlp`
- `evalem.cv`
Both of these have:
- models
- metrics
- evaluation pipeline
All of these are derived from the bases at `evalem._base`.
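As a quick orientation, the old top-level imports map onto the new NLP namespace roughly as follows (a minimal sketch; the exact class locations under `evalem.nlp` are assumptions based on the note above):

```python
# Before this release (old top-level namespace):
#   from evalem.models import TextClassificationHFPipelineWrapper
#   from evalem.metrics import AccuracyMetric

# After this release (assumed paths, following the evalem.nlp.models /
# evalem.nlp.metrics layout described above):
from evalem.nlp.models import TextClassificationHFPipelineWrapper
from evalem.nlp.metrics import AccuracyMetric
```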
v0.0.3-alpha.1
This release fixes a few setup-related misconfigurations. See #16
v0.0.3-alpha
This release adds a simple pipeline abstraction for the existing `ModelWrapper`, `Metric`, and `Evaluator`.
Changelog
Major
- `evalem.pipelines.SimpleEvaluationPipeline` is added, wrapping the existing model wrappers, metrics, and evaluators into a single coherent abstraction. See PR
- More semantic metrics such as BLEU, ROUGE, and METEOR are added. See PR
Minor
- Test suites are refactored. For example, the model and pipeline test suites are parameterized through the `conftest.py` paradigm.
Usage
```python
from evalem.pipelines import SimpleEvaluationPipeline
from evalem.models import TextClassificationHFPipelineWrapper
from evalem.evaluators import TextClassificationEvaluator

# can switch to any implemented wrapper
model = TextClassificationHFPipelineWrapper()

# can switch to other evaluator implementations
evaluator = TextClassificationEvaluator()

# initialize the pipeline
eval_pipe = SimpleEvaluationPipeline(model=model, evaluators=evaluator)

# `inputs` and `references` hold the evaluation texts and ground truths
results = eval_pipe(inputs, references)
# or, equivalently
results = eval_pipe.run(inputs, references)
```
[alpha] Initial release
This release adds the initial metric and model components:
1) Metrics
We can import various metrics from `evalem.metrics`. Both `BasicMetrics` and `SemanticMetrics` can be used.

- Basic metrics are:
  - `F1Metric`
  - `RecallMetric`
  - `PrecisionMetric`
  - `ConfusionMatrix`
  - `AccuracyMetric`
  - `ExactMatchMetric`
- Semantic metrics include `BertScore` and `BartScore`.
These metrics can be used independently to evaluate the predictions from upstream models using references/ground-truths.
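For instance, a single metric can be applied on its own (a minimal sketch; the keyword names `predictions` and `references` are assumptions about the metric interface):

```python
# Minimal sketch: using one metric independently. The call signature
# (keyword names, callable vs. a dedicated method) is an assumption.
from evalem.metrics import ExactMatchMetric

metric = ExactMatchMetric()
result = metric(
    predictions=["Paris", "42"],        # upstream model outputs
    references=["Paris", "forty-two"],  # ground truths
)
# Under a standard exact-match definition, this toy data would score 0.5.
print(result)
```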
2) ModelWrapper
`evalem.models` includes various model wrapper implementations. See PRs this and this
- `evalem.models.QuestionAnsweringHFPipelineWrapper` and `evalem.models.TextClassificationHFPipelineWrapper` are now the main wrappers for QA and Text Classification tasks, respectively.
- These also have better parameter initialization, allowing any suitable models and tokenizers to be used, along with device types. An `hf_params` dict is also provided as a parameter that is used to initialize the HF pipeline.
- The model wrappers utilize two distinct processing parameters (one for pre-processing and one for post-processing), each of which should be a `Callable` (a lambda function, an external callable module, etc.) and can be swapped to customize pre/post-processing. See the sketch after this list.
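A construction sketch is shown below; `hf_params` is documented above, while the model name and the commented-out processor keyword names are illustrative assumptions rather than confirmed API:

```python
from evalem.models import QuestionAnsweringHFPipelineWrapper

# `hf_params` is forwarded to the underlying HF pipeline; `model` and
# `device` are standard HF pipeline kwargs.
wrapper = QuestionAnsweringHFPipelineWrapper(
    hf_params=dict(
        model="distilbert-base-cased-distilled-squad",  # any suitable QA model
        device=-1,                                      # CPU; 0 for first GPU
    ),
    # The pre/post-processing hooks are plain Callables; these keyword
    # names are HYPOTHETICAL placeholders, not the confirmed parameters:
    # preprocessor=lambda inputs: inputs,
    # postprocessor=lambda outputs: outputs,
)
```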
3) Evaluator
Evaluators provide an abstraction/containerization of metrics so they can be evaluated as a group.
See PRs this, this and this
We have 2 different evaluator implementations:

- `evalem.evaluators.QAEvaluator` for evaluating QA metrics
- `evalem.evaluators.TextClassificationEvaluator` for text classification
We can also directly use `evalem.evaluators._base.Evaluator` to create our own custom evaluator object.
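A custom evaluator could then look like the sketch below; that the base `Evaluator` accepts a list of metric instances at construction is an assumption based on the "container of metrics" description above:

```python
# Minimal sketch of a custom evaluator. The `metrics=[...]` constructor
# argument is an ASSUMPTION about the base class interface.
from evalem.evaluators._base import Evaluator
from evalem.metrics import AccuracyMetric, F1Metric

class SentimentEvaluator(Evaluator):
    """Bundles accuracy and F1 for a sentiment-classification task."""

    def __init__(self) -> None:
        super().__init__(metrics=[AccuracyMetric(), F1Metric()])
```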