Obtain confusion matrix from spacy evaluate command #9055
-
Hi, I have a pipeline on the spaCy CLI in which one of the commands runs `spacy evaluate`. Is there a way to also obtain a confusion matrix from the evaluation?
-
We don't currently have that feature, no. It would definitely be useful though. We recently added the ability to specify custom scorers (#8766, #8929), so I don't think it'd be too hard to implement a custom one to do this, depending on which component you're doing it for.
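For anyone who goes that route, here is a minimal sketch of registering a custom scorer and pointing a component at it from the config. The scorer name `confusion_matrix_scorer.v1` and the extra metric key are placeholders of mine, not an existing spaCy API:

```python
import spacy
from spacy.scorer import get_ner_prf


@spacy.registry.scorers("confusion_matrix_scorer.v1")  # placeholder name
def make_confusion_matrix_scorer():
    def score(examples, **kwargs):
        examples = list(examples)  # may be a generator; we iterate twice
        scores = get_ner_prf(examples)  # standard NER precision/recall/F
        # Attach any extra values you want to see in the metrics output,
        # e.g. the total number of gold entities in the evaluation data:
        scores["ents_gold_count"] = sum(len(eg.reference.ents) for eg in examples)
        return scores

    return score
```

In the training config the NER component would then be pointed at it via a `[components.ner.scorer]` block with `@scorers = "confusion_matrix_scorer.v1"`.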
-
I was able to find the variables for true positives (tp), false positives (fp) and false negatives (fn) in the `spacy/scorer.py` file, and then include them in the exported metrics dictionary.
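Those counts are kept on the `PRFScore` objects that `scorer.py` builds internally. If you'd rather not patch spaCy, a small standalone sketch that derives the same per-label counts from a list of `spacy.training.Example` objects (the function name is mine):

```python
from collections import defaultdict


def count_tp_fp_fn(examples):
    # Compare gold (reference) and predicted entity spans per Example,
    # counting exact span matches as true positives
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for eg in examples:
        gold = {(e.start_char, e.end_char, e.label_) for e in eg.reference.ents}
        pred = {(e.start_char, e.end_char, e.label_) for e in eg.predicted.ents}
        for *_, label in pred & gold:  # predicted and gold agree
            counts[label]["tp"] += 1
        for *_, label in pred - gold:  # predicted but not in the gold data
            counts[label]["fp"] += 1
        for *_, label in gold - pred:  # in the gold data but missed
            counts[label]["fn"] += 1
    return dict(counts)
```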
-
A simple CLI solution can be built quite easily from the solutions already posted. Here is a simple script with mostly the same usage; the source code:

```python
import srsly
import typer
import warnings
from pathlib import Path
import spacy
import numpy
import os
import pandas as pd
from matplotlib import pyplot
from sklearn.metrics import confusion_matrix
from tqdm import tqdm
from spacy.training import offsets_to_biluo_tags


def _load_data(file_path):
    # Expects JSONL with a "text" key and optional "spans" (start/end/label)
    samples, entities_count = [], 0
    for line in srsly.read_jsonl(file_path):
        sample = {
            "text": line["text"],
            "entities": []
        }
        if "spans" in line.keys():
            entities = [(s["start"], s["end"], s["label"]) for s in line["spans"]]
            sample["entities"] = entities
            entities_count += len(entities)
        else:
            warnings.warn("Sample without entities!")
        samples.append(sample)
    return samples, entities_count


def _get_cleaned_label(label: str):
    # Strip the BILUO prefix, e.g. "B-PERSON" -> "PERSON"
    if "-" in label:
        return label.split("-")[1]
    else:
        return label


def _create_total_target_vector(nlp, samples):
    # Convert the gold entity offsets into one long per-token label vector
    target_vector = []
    for sample in samples:
        doc = nlp.make_doc(sample["text"])
        ents = sample["entities"]
        bilou_ents = offsets_to_biluo_tags(doc, ents)
        vec = [_get_cleaned_label(label) for label in bilou_ents]
        target_vector.extend(vec)
    return target_vector


def _get_all_ner_predictions(nlp, text):
    # Run the pipeline and convert the predicted entities to BILUO tags
    doc = nlp(text)
    entities = [(e.start_char, e.end_char, e.label_) for e in doc.ents]
    bilou_entities = offsets_to_biluo_tags(doc, entities)
    return bilou_entities


def _create_prediction_vector(nlp, text):
    return [_get_cleaned_label(prediction) for prediction in _get_all_ner_predictions(nlp, text)]


def _create_total_prediction_vector(nlp, samples):
    prediction_vector = []
    for sample in tqdm(samples):
        prediction_vector.extend(_create_prediction_vector(nlp, sample["text"]))
    return prediction_vector


def _plot_confusion_matrix(cm, classes, normalize=False, text=True, cmap=pyplot.cm.Blues):
    """
    Plot the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    title = "Confusion Matrix"
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, numpy.newaxis]
    fig, ax = pyplot.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # Show all ticks and label them with the class names
    ax.set(xticks=numpy.arange(cm.shape[1]),
           yticks=numpy.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')
    # Rotate the tick labels and set their alignment
    pyplot.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
    # Loop over data dimensions and create text annotations
    if text:
        fmt = '.2f' if normalize else 'd'
        thresh = cm.max() / 2.
        for i in range(cm.shape[0]):
            for j in range(cm.shape[1]):
                ax.text(j, i, format(cm[i, j], fmt),
                        ha="center", va="center",
                        color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax, pyplot


def get_confusion_matrix(model_path: Path, data_path: Path, output_dir: Path):
    spacy.prefer_gpu()
    nlp = spacy.load(model_path)
    print("Loaded spaCy pipeline.")
    samples, entities_count = _load_data(data_path)
    print(f"Loaded {len(samples)} samples including {entities_count} entities.")
    y_true = _create_total_target_vector(nlp, samples)
    print("Computed target vector!")
    classes = sorted(set(y_true))
    print(f"Identified {len(classes)} classes: {', '.join(classes)}")
    print("Computing prediction vector...")
    y_pred = _create_total_prediction_vector(nlp, samples)
    matrix = confusion_matrix(y_true, y_pred, labels=classes)
    print("Generated confusion matrix!")
    cm_df = pd.DataFrame(matrix, columns=classes)
    cm_df.insert(0, "TARGETS", classes)
    ax, plot = _plot_confusion_matrix(matrix, classes, normalize=True, text=False)
    print("Plotted confusion matrix!")
    os.makedirs(output_dir, exist_ok=True)
    print(f"Saving rendered image to: {output_dir}/confusion.png")
    pyplot.savefig(f"{output_dir}/confusion.png")
    print(f"Saving confusion matrix data to: {output_dir}/confusion.csv")
    cm_df.to_csv(f"{output_dir}/confusion.csv")
    print("Finished!")


if __name__ == "__main__":
    typer.run(get_confusion_matrix)
```

Run it with the model path, the evaluation data path, and an output directory as positional arguments. It will generate a `confusion.png` rendering of the normalized matrix and a `confusion.csv` with the raw counts in the output directory.
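For example, assuming the script is saved as `confusion_matrix.py` (the file name and paths below are placeholders):

```bash
python confusion_matrix.py ./training/model-best ./corpus/dev.jsonl ./output
```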
-
For everyone wanting to load the evaluation data from a DocBin file, here's an adjusted version of the _load_data() function.
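A minimal sketch of such an adjustment, assuming the evaluation data is a serialized `DocBin` (`.spacy`) file; the blank `en` vocab is an assumption, adjust it to your pipeline's language:

```python
import warnings

import spacy
from spacy.tokens import DocBin


def _load_data(file_path):
    # Read docs from a serialized DocBin (.spacy) file instead of JSONL
    nlp = spacy.blank("en")  # assumption: change to your pipeline's language
    doc_bin = DocBin().from_disk(file_path)
    samples, entities_count = [], 0
    for doc in doc_bin.get_docs(nlp.vocab):
        entities = [(e.start_char, e.end_char, e.label_) for e in doc.ents]
        if not entities:
            warnings.warn("Sample without entities!")
        samples.append({"text": doc.text, "entities": entities})
        entities_count += len(entities)
    return samples, entities_count
```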
-
I asked Gemini 1.5 Pro to make a patch that reports tp, fp, fn and support values, which I call from the command line. It also calculates overall accuracy (that part was patched by ChatGPT 4o).
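The same values can also be derived directly from the aligned token-level `y_true`/`y_pred` vectors that the script above builds; a rough sketch under that assumption (the helper name `label_counts` is mine):

```python
from sklearn.metrics import accuracy_score


def label_counts(y_true, y_pred, labels):
    # Per-label tp/fp/fn/support from aligned token-level label vectors,
    # plus the overall token-level accuracy
    counts = {label: {"tp": 0, "fp": 0, "fn": 0, "support": 0} for label in labels}
    for true, pred in zip(y_true, y_pred):
        if true in counts:
            counts[true]["support"] += 1
        if true == pred:
            if true in counts:
                counts[true]["tp"] += 1
        else:
            if pred in counts:
                counts[pred]["fp"] += 1
            if true in counts:
                counts[true]["fn"] += 1
    return counts, accuracy_score(y_true, y_pred)
```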