
Agent Evals

This repo is a collection of scripts for evaluating agents.

Repo Structure

Each folder in the repo contains:

  • README.md: A description of the evaluation (dataset, metrics, how to run the eval)
  • run_eval.py: A script to run the evaluation
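The layout described above can be checked mechanically. The following is a minimal sketch, assuming each eval lives in its own top-level folder; the helper name `find_evals` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# Files that every eval folder is expected to contain, per the list above.
REQUIRED_FILES = ("README.md", "run_eval.py")


def find_evals(repo_root):
    """Return the sorted names of folders that contain every required file.

    Hypothetical helper for illustration only; the repo does not ship it.
    """
    root = Path(repo_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in REQUIRED_FILES)
    )
```

Calling `find_evals(".")` from the repo root would list the folders that follow this convention and skip any that are missing either file.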

Available evals

Below is the list of currently available evals:

| Task | Dataset(s) | Description | Input Example | Output Example |
|------|------------|-------------|---------------|----------------|
| Math | Math Problems | Solve math problems and return numerical answers | {"Question": "Find the second derivative of f(x)=ln(x) and evaluate it at x=0.5."} | {"Answer": "-4"} |
| Public Company Data Enrichment | Public Companies | Extract structured company information like CEO, headquarters, employee count etc. | {"company": "Nvidia", "extraction_schema": {...}} | {"info": {"ceo": "Jensen Huang", "name": "Nvidia Corporation", ...}} |
| Startup Data Enrichment | Startups | Extract structured company information like latest round, total funding, year founded etc. | {"company": "LangChain", "extraction_schema": {...}} | {"info": {"latest_round": "Series A", ...}} |
| People Data Enrichment | People Dataset | Extract structured information about people like work experience, role, company etc. | {"person": {"name": "Erick Friis", "email": "erick@langchain.dev", ...}, "extraction_schema": {...}} | {"extracted_information": {"Years-Experience": 10, "Company": "LangChain", ...}} |
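
To make the input and output shapes above concrete, here is a rough sketch of how outputs of this form could be compared against reference outputs. It is not the scoring used by this repo (each eval's README describes its own metrics); the function names `math_answer_matches` and `field_accuracy`, the tolerance, and the exact-match rule are all assumptions made for illustration.

```python
import math


def math_answer_matches(expected, actual, rel_tol=1e-6):
    """Compare two {"Answer": ...} outputs numerically (hypothetical metric).

    Answers are parsed as floats, so "-4" and "-4.0" count as equal.
    Missing or non-numeric answers are treated as a mismatch.
    """
    try:
        return math.isclose(
            float(expected["Answer"]), float(actual["Answer"]), rel_tol=rel_tol
        )
    except (KeyError, TypeError, ValueError):
        return False


def field_accuracy(expected_info, actual_info):
    """Fraction of expected fields reproduced exactly (hypothetical metric)."""
    if not expected_info:
        return 0.0
    hits = sum(
        1 for key, value in expected_info.items() if actual_info.get(key) == value
    )
    return hits / len(expected_info)
```

For the Math row, `math_answer_matches({"Answer": "-4"}, {"Answer": "-4.0"})` returns `True`. Exact field matching is the simplest possible choice for the enrichment tasks and would penalize harmless variations such as "Nvidia" versus "Nvidia Corporation", which is one reason a real eval might use a looser comparison.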