evaluation-metrics

Here are 398 public repositories matching this topic...

confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Updated Jul 5, 2024
Python

xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

tracking machine-learning real-time computer-vision robotics evaluation evaluation-metrics multi-object-tracking kitti 3d-tracking 3d-multi-object-tracking 2d-mot-evaluation 3d-mot 3d-multi kitti-3d

Updated Apr 3, 2024
Python

AgentOps-AI / agentops

Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

agent ai openai evaluation-metrics mistral cost-estimation autogen groq agentops llm langchain anthropic evals ollama crewai

Updated Jul 6, 2024
Python

google-research / rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

benchmarking machine-learning google reinforcement-learning rl evaluation-metrics

Updated Jul 4, 2024
Jupyter Notebook

MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

nlp natural-language-processing hyperparameter-optimization topic-modeling nlp-library bayesian-optimization hyperparameter-tuning latent-dirichlet-allocation evaluation-metrics neural-topic-models latent-semantic-analysis topic-models hyperparameter-search non-negative-matrix-factorization nlproc

Updated Jun 15, 2024
Python

jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

python3 automatic-speech-recognition speech-to-text evaluation-metrics wer word-error-rate

Updated May 6, 2024
Python
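
As a rough illustration of the word error rate named in the entry above, here is a minimal sketch in plain Python. It does not use or reproduce the jiwer API; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example. WER is taken here as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split, which a real evaluation would likely refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sat on mat" involves one deletion over six reference words.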

up42 / image-similarity-measures

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

processing machine-learning image metrics evaluation-metrics p1

Updated Jul 6, 2024
Python
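
Two of the metrics listed in the entry above, RMSE and PSNR, can be sketched in a few lines of plain Python. This is not the image-similarity-measures package's API; the function names, the flat-list image representation, and the default `max_value` of 255 (8-bit pixels) are assumptions made for this example.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error between two equally sized pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared_error / len(a))


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels.

    max_value is the largest possible pixel value (assumed 255 here).
    Identical inputs have zero error, so the ratio is reported as infinity.
    """
    error = rmse(a, b)
    if error == 0:
        return math.inf
    return 20 * math.log10(max_value / error)
```

Lower RMSE and higher PSNR both indicate that the two images are closer to each other.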

proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Mor…

python nlp machine-learning natural-language-processing library linguistics computational-linguistics text-processing nlp-library search-algorithms evaluation-metrics folia language-modelling

Updated Sep 14, 2023
Python

huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

evaluation evaluation-metrics evaluation-framework huggingface

Updated Jul 5, 2024
Python

Unbabel / COMET

A Neural Framework for MT Evaluation

nlp machine-learning natural-language-processing machine-translation artificial-intelligence evaluation-metrics

Updated Jun 30, 2024
Python

AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

python information-retrieval evaluation comparison numba recommender-systems evaluation-metrics metasearch data-fusion score-fusion ranking-metrics information-retrieval-evaluation information-retrieval-metrics rank-fusion

Updated Jul 1, 2024
Python

relari-ai / continuous-eval

Open-Source Evaluation for LLM Application Pipelines

information-retrieval evaluation-metrics evaluation-framework rag llmops retrieval-augmented-generation llm-evaluation

Updated Jul 4, 2024
Python

v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

audio video pytorch transformer gan multi-modal evaluation-metrics video-understanding vas video-features vqvae bmvc melgan audio-generation vggsound

Updated Jun 6, 2023
Jupyter Notebook

salesforce / factCC

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

text-summarization evaluation-metrics

Updated Jul 22, 2023
Python

bheinzerling / pyrouge

A Python wrapper for the ROUGE summarization evaluation package

nlp summarization rouge evaluation-metrics

Updated Feb 10, 2021
Python

clovaai / generative-evaluation-prdc

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

diversity machine-learning deep-learning evaluation generative-adversarial-network generative-model recall precision evaluation-metrics fidelity icml icml-2020 icml2020

Updated Jan 9, 2023
Python

FuxiaoLiu / LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

evaluation vision vqa llama object-detection gpt evaluation-metrics iclr multimodal vision-and-language hallucination vicuna gpt-4 foundation-models prompt-engineering chatgpt llava iclr2024

Updated Mar 13, 2024
Python

TonicAI / tonic_validate

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

evaluation-metrics evaluation-framework rag large-language-models llm llms llmops retrieval-augmented-generation

Updated Jul 6, 2024
Python

sharmaroshan / Twitter-Sentiment-Analysis

A natural language processing problem in which sentiment analysis is performed by separating positive tweets from negative tweets using machine learning classification models, along with text mining, text analysis, data analysis, and data visualization

nlp machine-learning sentiment-analysis cross-validation eda data-visualization wordcloud classification data-analysis bag-of-words hashtags evaluation-metrics count-vectorizer datacleaning

Updated Nov 3, 2023
Jupyter Notebook

davidsbatista / NER-Evaluation

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity

named-entity-recognition semeval ner crfsuite evaluation-metrics notebook-jupyter semeval-2013 ner-evaluation

Updated Jul 2, 2024
Python
