mirror of
https://github.com/langchain-ai/agent-evals.git
Agent Evals
This repo is a collection of scripts for evaluating agents on a set of tasks.
Repo Structure
Each folder in the repo contains:
- README.md: A description of the evaluation (dataset, metrics, how to run the eval)
- run_eval.py: A script to run the evaluation
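Given the layout described above, the eval folders in a local checkout can be listed by looking for the two expected files. This is a minimal sketch, not part of the repo: the helper name `find_eval_dirs` and the `REQUIRED_FILES` constant are hypothetical, and it assumes the eval folders sit directly under the repo root.

```python
from pathlib import Path

# Files that each eval folder is described as containing.
REQUIRED_FILES = ("README.md", "run_eval.py")


def find_eval_dirs(repo_root):
    """Return sorted names of direct subfolders that contain every required file.

    Hypothetical helper for illustration; the repo itself may not ship one.
    """
    root = Path(repo_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in REQUIRED_FILES)
    )
```

Calling `find_eval_dirs(".")` from the root of a checkout would return the folder names that match this layout. How to run each eval is described in that folder's README.md.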
Available evals
Below is the list of currently available evals:
| Task | Dataset(s) | Description | Input Example | Output Example |
|---|---|---|---|---|
| Math | Math Problems | Solve math problems and return numerical answers | {"Question": "Find the second derivative of f(x)=ln(x) and evaluate it at x=0.5."} | {"Answer": "-4"} |
| Public Company Data Enrichment | Public Companies | Extract structured company information like CEO, headquarters, employee count etc. | {"company": "Nvidia", "extraction_schema": {...}} | {"info": {"ceo": "Jensen Huang", "name": "Nvidia Corporation", ...}} |
| Startup Data Enrichment | Startups | Extract structured company information like latest round, total funding, year founded etc. | {"company": "LangChain", "extraction_schema": {...}} | {"info": {"latest_round": "Series A", ...}} |
| People Data Enrichment | People Dataset | Extract structured information about people like work experience, role, company etc. | {"person": {"name": "Erick Friis", "email": "erick@langchain.dev", ...}, "extraction_schema": {...}} | {"extracted_information": {"Years-Experience": 10, "Company": "LangChain", ...}} |
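To make the input/output shapes in the table concrete, here is one way an agent's output could be compared against an expected output. This is only a sketch under stated assumptions: `exact_match` and `field_accuracy` are hypothetical helpers, not the repo's scoring code, and the metrics each eval actually uses are the ones described in that eval's README.md.

```python
def exact_match(expected, actual, key="Answer"):
    """Return True when both dicts hold the same value under `key`.

    Values are compared as stripped strings. Hypothetical scorer for the
    Math-style {"Answer": ...} output shape.
    """
    if key not in expected or key not in actual:
        return False
    return str(expected[key]).strip() == str(actual[key]).strip()


def field_accuracy(expected, actual):
    """Return the fraction of expected fields that `actual` reproduces exactly.

    Hypothetical scorer for the enrichment-style outputs, applied to the
    inner dict (for example the value under "info").
    """
    if not expected:
        return 0.0
    hits = sum(1 for field, value in expected.items() if actual.get(field) == value)
    return hits / len(expected)


# Math example from the table, with stray whitespace in the agent's answer.
print(exact_match({"Answer": "-4"}, {"Answer": " -4 "}))  # True

# Illustrative enrichment comparison; the "actual" values are made up.
expected_info = {"ceo": "Jensen Huang", "name": "Nvidia Corporation"}
actual_info = {"ceo": "Jensen Huang", "name": "NVIDIA"}
print(field_accuracy(expected_info, actual_info))  # 0.5
```

Exact matching is strict by design here. Free-text fields such as company names would likely need normalization or a more lenient comparison.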