guides/yolo-performance-metrics/ #8790
Replies: 24 comments 52 replies
-
We wish there were illustrative charts for each scale.
-
Hello, I've noticed that in the precision curve, beyond the maximum confidence predicted by the model, precision is set to 1. I'm curious why it is not set to 0 instead: when the threshold is higher than any prediction's confidence, the model makes no predictions at all, so setting precision to 0 seems more logical to me. Is there any literature discussing how the precision curve is defined? Thank you.
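A minimal sketch (not Ultralytics' actual implementation) of the convention being asked about: once the threshold exceeds every prediction's confidence, precision becomes 0/0, and many plotting implementations fall back to 1 rather than 0, on the reasoning that zero predictions means zero false positives.

```python
import numpy as np

def precision_confidence_curve(confidences, is_tp, thresholds=np.linspace(0, 1, 101)):
    """Compute precision at each confidence threshold.

    confidences: per-prediction confidence scores
    is_tp: per-prediction booleans, True if the prediction matched a ground-truth box
    """
    precisions = []
    for t in thresholds:
        keep = confidences >= t
        n_kept = keep.sum()
        if n_kept == 0:
            # No predictions above this threshold: precision is 0/0 (undefined).
            # Convention shown here: no predictions -> no false positives -> 1.0
            precisions.append(1.0)
        else:
            precisions.append(is_tp[keep].sum() / n_kept)
    return thresholds, np.array(precisions)

# Toy example: three detections, the low-confidence one is a false positive
conf = np.array([0.9, 0.7, 0.3])
tp = np.array([True, True, False])
t, p = precision_confidence_curve(conf, tp)
print(p[0], p[95])  # precision at threshold 0.0 vs 0.95 (beyond max confidence -> 1.0)
```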
-
The train/dfl_loss chart is also generated among the training results. What is the meaning of this chart?
-
Does it provide plotting for accuracy, sensitivity, and specificity?
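These are not plotted out of the box for detection, but for a classification or per-class one-vs-rest setup they can be derived from confusion-matrix counts. A minimal sketch, assuming you have already extracted the TP, FP, FN, TN counts for the class of interest:

```python
def basic_classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, sensitivity (recall of the positive class) and specificity
    (recall of the negative class) from raw confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # a.k.a. recall / TPR
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # a.k.a. TNR
    }

print(basic_classification_metrics(tp=80, fp=10, fn=20, tn=90))
```

Note that specificity needs a true-negative count, which is not well defined for object detection (there is no fixed set of "negative boxes"), so it mainly applies to classification tasks.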
-
Firstly, thank you so much for providing such comprehensive documentation; it is honestly so helpful.
-
I want to know: when I use the val command, does the mAP50 output for a single class in fact refer to AP50 for that class? Because mAP50 theoretically averages over all classes.
-
Hello! Why can mAP50 be used to evaluate a YOLOv8 model, and what does it represent in reality? Thank you.
-
Request for Additional Classification Metrics
Dear YOLOv8 Team, thank you for your excellent work on YOLOv8! Currently, the classification module provides top-1 and top-5 accuracy metrics, which are very useful. However, I believe the inclusion of additional standard classification metrics would further enhance the framework's utility. Including these metrics would provide a more comprehensive understanding of model performance, aiding in better model evaluation and tuning.
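A minimal sketch of computing such metrics outside the framework, assuming you have already run a YOLO classification model over labelled validation images and collected the true and predicted class indices (`y_true` and `y_pred` below are placeholders):

```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_recall_fscore_support)

# Placeholder arrays: collect these yourself from model.predict() on labelled images
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)
print("per-class precision:", precision)
print("per-class recall:   ", recall)
print("per-class F1:       ", f1)
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, zero_division=0))
```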
-
I am currently working on my dissertation, for which I have trained ten YOLOv8x models using camera trap data. I have a few questions regarding the results.csv file that is automatically generated after model training.

Could you please clarify whether the precision, recall, mAP50, and mAP50-95 values recorded in the results.csv file pertain to the training set or the validation set? As I am preparing to present the results of the models I have trained, I would appreciate your guidance on which values to report as the final results. Given that I trained the models for 100 epochs, there are multiple values for these parameters across the epochs. Could you advise on how best to interpret and present these results?
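For reference, a minimal sketch of inspecting results.csv with pandas and picking the epoch with the best mAP50-95. The metric columns are computed on the validation split after each epoch; the exact column names used below (e.g. `metrics/mAP50-95(B)`) are what recent Ultralytics versions write, so adjust them if your file differs:

```python
import pandas as pd

df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # older versions pad column names with spaces

metric_cols = ["metrics/precision(B)", "metrics/recall(B)",
               "metrics/mAP50(B)", "metrics/mAP50-95(B)"]

best = df.loc[df["metrics/mAP50-95(B)"].idxmax()]
print("Best epoch:", int(best["epoch"]))
print(best[metric_cols])

# These per-epoch metrics come from validation, which is why best.pt
# (the checkpoint with the highest fitness) is usually what gets reported.
```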
-
Hello Ultralytics! I was wondering if you have the training curves for the pretrained YOLOv8 and YOLOv10 weights (which, if I understand correctly, were trained on COCO)? I am particularly interested in comparing the o2m components with the o2o components during v10 training, as well as with their equivalents in v8. I can try to reproduce them myself, but thought they might be readily available? Cheers!
-
How are you calculating class-wise precision? In my case, I have 1 true positive (TP) and 2 false positives (FP). Using the formula for precision (TP / (TP + FP)), I expect the precision to be 0.33. However, I am getting a precision value of 0.734 instead. Could you explain why?
-
I am sharing my confusion matrix with you. My IoU confidence threshold is set to 0.5. Based on the confusion matrix for the class 'Car', there is 1 true positive and 2 false positives, so the precision should be 0.33. However, when I evaluate it using Ultralytics, the precision is reported as 0.734. I am passing only 1 image in val. Could you explain the discrepancy?
On Mon, Sep 9, 2024 at 2:14 AM Glenn Jocher wrote: @Shubham77saini the discrepancy in your precision calculation might be due to how the model handles confidence thresholds or other internal settings. Precision is calculated as TP / (TP + FP), but factors like confidence thresholds can affect which detections are considered true positives or false positives. You might want to check the confidence threshold settings or review the detailed metrics output for more insights. For further details, you can refer to the YOLOv8 documentation on performance metrics.
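To make the confidence-threshold point concrete, here is a small illustrative sketch (not Ultralytics' code). The confusion matrix is built at one fixed confidence threshold, while the reported Box P comes from the precision-recall analysis at a different operating point, so the two numbers can legitimately disagree:

```python
import numpy as np

# Hypothetical detections for one class, with a flag saying whether each one
# matched a ground-truth box at IoU >= 0.5
conf  = np.array([0.92, 0.40, 0.31])
is_tp = np.array([True, False, False])

def precision_at(threshold):
    keep = conf >= threshold
    return is_tp[keep].sum() / max(keep.sum(), 1)

print(precision_at(0.25))  # low threshold keeps all 3 detections -> 1/3 = 0.33
print(precision_at(0.50))  # higher threshold keeps only the true positive -> 1.0
```

As far as I understand, the confusion matrix and the reported P/R values are computed at different confidence operating points, which is a common source of exactly this kind of mismatch.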
-
Hi everyone! I'd like to get your thoughts on the approach we use for evaluating machine learning models. We start by calculating the F1 score for each class, and then we take the harmonic mean of these scores. This method aims to provide a more accurate measure for model evaluation; interestingly, the harmonic mean is more sensitive to lower scores, meaning that if any one score is low, it significantly affects the final result. Next, we again use the harmonic mean to calculate the final score, this time combining the F1 mean and the mAP. However, it's important to note that mAP50-95 doesn't provide very high accuracy for evaluation. In my dataset, I also need to ensure that recall and precision are thoroughly assessed. What do you think about this method? Do you find it suitable?
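A minimal sketch of the scoring scheme described above, assuming the per-class F1 scores and the mAP50-95 value have already been obtained from your own validation run (values below are placeholders):

```python
from statistics import harmonic_mean

# Placeholder values: per-class F1 scores and mAP50-95 from validation
per_class_f1 = [0.91, 0.84, 0.62]
map50_95 = 0.71

f1_hmean = harmonic_mean(per_class_f1)             # harmonic mean punishes the weakest class
final_score = harmonic_mean([f1_hmean, map50_95])  # combine F1 mean with mAP the same way

print(f"F1 harmonic mean: {f1_hmean:.3f}")
print(f"Final score:      {final_score:.3f}")
```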
-
Hello, my model's mAP0.5 is 95.5%, with precision and recall of 89.7% and 89.6% respectively (iou=0.7), which is a significant difference. I adjusted the IoU down to 0.6, but the gap remains the same. How should I adjust? Why is mAP0.5 so much larger than precision and recall?
-
Question: Clarification on AP vs mAP in YOLOv8 metrics output

Hello, I'm using YOLOv8 for object detection, and I have some questions about the metrics output, particularly regarding the use of mAP50 and mAP50-95 in the results. In the output, these metrics are reported for individual classes as well as for an "all" category that aggregates the performance. My understanding is that AP (Average Precision) is calculated for each individual class, while mAP (mean Average Precision) is the average of the AP values across all classes. However, in the output, mAP50 and mAP50-95 are listed for each individual class. Should these values be interpreted as AP for each class rather than mAP, which I would expect to be the mean across all classes? Additionally, the "all" row appears to give the mAP value. Could you confirm this interpretation? Thanks in advance for your clarification!
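For what it's worth, the per-class values can also be pulled out programmatically after validation; a minimal sketch, assuming the attribute names below (`box.ap50`, `box.ap_class_index`, `box.map50`) behave as in recent Ultralytics releases and using coco128.yaml purely as an example dataset:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")              # or your own trained weights
metrics = model.val(data="coco128.yaml")

# Per-class AP@0.5 (one value per class that appears in the val set)
for class_idx, ap50 in zip(metrics.box.ap_class_index, metrics.box.ap50):
    print(f"{metrics.names[class_idx]:>15}: AP50 = {ap50:.3f}")

# The "all" row corresponds to the mean over classes, i.e. mAP
print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
```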
-
It would be useful to also have Average Recall in the output, in addition to the Average Precision. Please take this into consideration! In many tasks, it is more important to detect as many objects as possible rather than detect fewer but more accurately. And as you know, the Recall metric alone is not enough. Thank you :)
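This is not COCO-style Average Recall, but a mean and per-class recall is already computed during validation and can be read from the metrics object; a minimal sketch, assuming `metrics.box.mr` and `metrics.box.r` exist as in recent Ultralytics versions (the weights path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("path/to/best.pt")   # hypothetical path to your trained weights
metrics = model.val()

print(f"Mean precision: {metrics.box.mp:.3f}")
print(f"Mean recall:    {metrics.box.mr:.3f}")

# Per-class recall at the reported operating point
for class_idx, r in zip(metrics.box.ap_class_index, metrics.box.r):
    print(f"{metrics.names[class_idx]:>15}: recall = {r:.3f}")
```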
-
Hi folks! Thanks for the framework! I last used the YOLOv5 source code, and I can say the newly architected framework is way more convenient. I am wondering, what is the strategy if I want to step away from the standard algorithm for selecting the "best" model? Say, to change the fitness() function somehow, or to choose based on F1 / ROC AUC?
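One lightweight option, without touching the trainer, is to re-validate the saved epoch checkpoints after training and pick the one that maximizes your own criterion. A minimal sketch, assuming training was run with `save_period` so per-epoch checkpoints exist, that `data.yaml` is your dataset file, and that `metrics.box.f1` holds per-class F1 scores as in recent Ultralytics versions:

```python
from pathlib import Path
import numpy as np
from ultralytics import YOLO

ckpt_dir = Path("runs/detect/train/weights")   # adjust to your run directory
best_ckpt, best_score = None, -1.0

for ckpt in sorted(ckpt_dir.glob("epoch*.pt")):
    metrics = YOLO(ckpt).val(data="data.yaml", verbose=False)
    # Custom selection criterion: mean F1 over classes instead of the default
    # fitness (which, to my understanding, weights mAP50 and mAP50-95)
    score = float(np.mean(metrics.box.f1)) if len(metrics.box.f1) else 0.0
    print(f"{ckpt.name}: mean F1 = {score:.3f}")
    if score > best_score:
        best_ckpt, best_score = ckpt, score

print(f"Selected {best_ckpt} with mean F1 {best_score:.3f}")
```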
-
Hello, may I ask how to calculate FPS? My method is, for example: run model.val(split="test"), which outputs "0.5ms pre-processing, 3.2ms inference, 0.0ms loss, 1.2ms post-processing per image", and then calculate FPS = 1000 / (0.5 + 3.2 + 1.2). Is this correct? If so, why might I get different values after measuring twice in a row?
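That arithmetic (1000 divided by the total milliseconds per image) is the usual way to turn the per-image timings into FPS. A minimal sketch that reads the same numbers programmatically, assuming `metrics.speed` is a dict of per-image milliseconds as in recent Ultralytics versions; timings naturally fluctuate between runs due to warm-up, caching, and hardware load, which would explain the differing values:

```python
from ultralytics import YOLO

model = YOLO("path/to/best.pt")          # hypothetical path to your weights
metrics = model.val(split="test")

speed = metrics.speed  # per-image ms, e.g. {'preprocess': 0.5, 'inference': 3.2, ...}
total_ms = speed["preprocess"] + speed["inference"] + speed["postprocess"]

print(f"Per-image latency: {total_ms:.2f} ms")
print(f"End-to-end FPS:    {1000.0 / total_ms:.1f}")
print(f"Inference-only FPS: {1000.0 / speed['inference']:.1f}")
```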
-
Question: Suppose we have annotated 10 objects in an image, but there are actually 20 objects present. When we use a validation set to calculate metrics, if the model detects objects that have not been annotated, will this affect the model's measured performance? Which metrics will be impacted?
-
I've successfully trained a YOLOv8 model. Now I want to evaluate its performance on a separate labeled test dataset. I'm interested in metrics like precision-recall curves. I remember seeing graphs generated during training. Can I produce similar graphs for my test dataset? (Is there any callable method?)
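A minimal sketch of one way to do this, assuming your data YAML defines a `test:` split: `model.val()` accepts a `split` argument and, with `plots=True`, writes the PR/F1/confidence curves and the confusion matrix into its run directory.

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")   # your trained weights

# Validate on the held-out test split and save the evaluation plots
metrics = model.val(data="data.yaml", split="test", plots=True)

print(metrics.box.map50, metrics.box.map)   # mAP50 and mAP50-95 on the test split
print(metrics.save_dir)                     # run directory where the plots are saved
```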
-
The question I want to ask here is whether the Precision and Recall values are the bounding-box Precision and Recall (i.e. the overall box precision across all classes), or whether they represent the class-specific Precision calculated as TP / (TP + FP) for each class separately.
-
Hello YOLO Ultralytics team, I have my model ready: I trained it on 30,000 objects across 3,165 images for 300 epochs, and the validation statistics are as follows. As you can see, the false positive rate is still high; what are some ways to fix it? Also, when should I stop training and lock in my final trained model, so that I can start deploying it on the whole dataset? In other words, at which stage can I trust that my model is well trained?
-
I want to ask: when running validation with the following command, why is the result different from (lower than) the output confusion_matrix.png? And what does Box P mean in the validation output? Look at the pistol class: why are the results different?
-
Hello Team,
-
guides/yolo-performance-metrics/
A comprehensive guide on various performance metrics related to YOLOv8, their significance, and how to interpret them.
https://docs.ultralytics.com/guides/yolo-performance-metrics/