evaluation
Here are 1,147 public repositories matching this topic...
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Updated Jul 9, 2024 - TypeScript
🤖 Build AI applications with confidence ✅ DSPy Visualizer ✅ Understand how your users are using your LLM app ✅ Get a full picture of the quality and performance of your LLM app ✅ Collaborate with your stakeholders in ONE platform ✅ Iterate towards the most valuable & reliable LLM app.
Updated Jul 9, 2024 - TypeScript
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Updated Jul 9, 2024 - Go
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
Updated Jul 9, 2024 - Python
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released data processing library datatrove and the LLM training library nanotron.
Updated Jul 9, 2024 - Python
Chess engine
Updated Jul 9, 2024 - C++
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ Hugging Face models, and 20+ benchmarks
Updated Jul 9, 2024 - Python
Pip-installable CodeBLEU metric implementation available for Linux/macOS/Windows
Updated Jul 9, 2024 - Python
Machine Learning Experiment Management Platform
Updated Jul 9, 2024 - Python
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Updated Jul 9, 2024 - TypeScript
R Package for preprocessing, normalizing, and analyzing proteomics data
Updated Jul 9, 2024 - R
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Updated Jul 9, 2024 - Python
Official implementation of the ACL 2024 paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs" (https://arxiv.org/abs/2402.11199).
Updated Jul 9, 2024 - Python
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Updated Jul 9, 2024
A task generation and model evaluation system.
Updated Jul 9, 2024 - Python
Moodle plugin for running evaluations within Moodle; this is the evaluation activity plugin.
Updated Jul 9, 2024 - PHP
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs. It focuses on foundation-model evaluation and on probing the technical boundaries of generative AI.
Updated Jul 9, 2024