🦜🛠 LangSmith SDK Benchmarks
Pre-requisites
0. Install Python 3.11 and Poetry
If you use Homebrew, you can install poetry with:
brew install poetry
1. Install Dependencies
poetry install
2. Set Environment Variables
After installing dependencies, copy the .env.example file contents into .env and set the required values:
cp .env.example .env
Running Benchmarks
This package provides an interactive script to run benchmarks end-to-end. Simply run:
python run_benchmarks.py
This will present you with a menu to choose between:
- Tracing Benchmarks - Benchmarks trace ingestion performance
- Evaluation Benchmarks - Benchmarks evaluation performance
- Exit
You can customize defaults for each benchmark type, or press Enter to use the defaults.
Non-Interactive Mode
To run benchmarks without prompts, using the defaults (this runs the evaluation benchmarks):
python run_benchmarks.py --non-interactive
Tracing Benchmarks
Overview
Tracing benchmarks measure the performance of ingesting traces into LangSmith. The script automatically:
- Prepares trace files (replaces UUIDs and updates dates)
- Runs the flat tracing benchmark (each run gets its own trace)
- Runs the nested tracing benchmark (runs are nested under their parents)
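To make the difference between the flat and nested modes concrete, here is a minimal sketch. The run field names (`id`, `parent_run_id`, `trace_id`) and the helper `flatten_runs` are assumptions made for illustration; the actual benchmark scripts may represent runs differently.

```python
from copy import deepcopy


def flatten_runs(runs):
    """Return copies of the runs where each run is the root of its own trace.

    Hypothetical helper: the field names are assumed, not taken from the
    benchmark code.
    """
    flat = []
    for run in runs:
        run = deepcopy(run)
        run["parent_run_id"] = None  # no parent, so the run becomes a root
        run["trace_id"] = run["id"]  # a root run's trace carries its own id
        flat.append(run)
    return flat


# Nested layout: three runs chained under one root, all in the same trace.
nested = [
    {"id": "a", "parent_run_id": None, "trace_id": "a"},
    {"id": "b", "parent_run_id": "a", "trace_id": "a"},
    {"id": "c", "parent_run_id": "b", "trace_id": "a"},
]

flat = flatten_runs(nested)
print(len({r["trace_id"] for r in nested}))  # 1 trace in the nested layout
print(len({r["trace_id"] for r in flat}))    # 3 traces in the flat layout
```

The same three runs produce one trace in the nested layout and three in the flat layout, which is why the two benchmarks can stress ingestion differently.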
Requirements
- Trace data files in JSONL format (processed_run_ops_*.jsonl) in the specified data directory
- Default data directory: tracing/data
Running Tracing Benchmarks
Via interactive script:
python run_benchmarks.py
# Select option 1 (Tracing Benchmarks)
# Enter data directory (default: data)
Directly:
cd tracing
poetry run python benchmark_flat.py [data_dir]
poetry run python benchmark_nested.py [data_dir]
Results
Results are printed to the terminal and saved to:
- tracing/benchmark_results_flat.txt
- tracing/benchmark_results_nested.txt
Evaluation Benchmarks
Overview
Evaluation benchmarks measure the performance of running evaluations on LangSmith datasets. The script automatically:
- Benchmarks data upload performance (uploads CSV data to LangSmith)
- Runs evaluation benchmarks on the uploaded dataset
Note: If the dataset already exists in LangSmith, the upload step will be skipped and the script will proceed directly to running evaluations.
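The skip-if-exists behavior described in the note can be sketched as follows. This is not the script's actual code: `upload_if_missing` is a hypothetical helper, and the existence check and the upload step are passed in as plain callables so the example runs without a LangSmith client or network access.

```python
from typing import Callable


def upload_if_missing(
    dataset_name: str,
    dataset_exists: Callable[[str], bool],
    upload: Callable[[str], None],
) -> bool:
    """Upload the dataset unless it already exists.

    Returns True if an upload ran, False if it was skipped.
    """
    if dataset_exists(dataset_name):
        print(f"Dataset '{dataset_name}' already exists; skipping upload.")
        return False
    upload(dataset_name)
    return True


# In-memory stand-ins for the real existence check and upload step.
existing = {"10k-long-emails-example"}
uploaded = []


def fake_upload(name: str) -> None:
    uploaded.append(name)
    existing.add(name)


upload_if_missing("10k-long-emails-example", existing.__contains__, fake_upload)
upload_if_missing("my-new-dataset", existing.__contains__, fake_upload)
print(uploaded)
```

In a real run, the two callables would wrap whatever the LangSmith client offers for looking up and creating datasets.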
Requirements
- CSV data file in evals/data/ directory
- Dataset configuration in evals/config.json
- Default dataset: 10k-long-emails-example
- Default data directory: evals/data
1. Prepare Your Data
Place your CSV file in the evals/data/ directory. The CSV file must be named {dataset_name}.csv where {dataset_name} matches the name you'll use in config.json.
2. Configure Dataset Mapping
You must specify in the evals/config.json file which CSV columns should be mapped to dataset inputs, and which columns should map to dataset outputs.
Configuration Details:
- inputs: A list of CSV column names that will be extracted from each row and set as the input data for each example in the LangSmith dataset. These columns will be converted to dictionaries (one per row) and passed to client.create_examples(inputs=...).
- outputs: A list of CSV column names that will be extracted from each row and set as the expected outputs (ground truth) for each example in the LangSmith dataset. These columns will be converted to dictionaries (one per row) and passed to client.create_examples(outputs=...). If empty ([]), no outputs will be uploaded.
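As an illustration of the "one dictionary per row" conversion, here is a minimal sketch using only the standard library. The helper name `split_rows` is hypothetical, and returning `None` when no output columns are configured is an assumption about how "no outputs" might be represented; the upload script may do this differently.

```python
import csv
import io


def split_rows(csv_text, input_cols, output_cols):
    """Build one inputs dict (and optionally one outputs dict) per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    inputs, outputs = [], []
    for row in reader:
        inputs.append({col: row[col] for col in input_cols})
        outputs.append({col: row[col] for col in output_cols})
    # Assumption: an empty outputs list in the config means "upload no outputs".
    return inputs, (outputs if output_cols else None)


sample = (
    "column1,column2,expected_output\n"
    "hello,world,greeting\n"
    "foo,bar,placeholder\n"
)
inputs, outputs = split_rows(sample, ["column1", "column2"], ["expected_output"])
print(inputs[0])   # {'column1': 'hello', 'column2': 'world'}
print(outputs[0])  # {'expected_output': 'greeting'}
```

The two resulting lists line up by index, so the first inputs dictionary and the first outputs dictionary belong to the same example.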
Example config.json structure:
{
"_instructions": "This configuration file maps CSV datasets to LangSmith dataset structure...",
"data_files": {
"your-dataset-name": {
"inputs": ["column1", "column2"],
"outputs": ["expected_output"]
}
}
}
The column names in inputs and outputs must match the column headers in your CSV file.
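A quick way to catch a mismatch between the configuration and the CSV header before running a benchmark is sketched below. The helper `missing_columns` is hypothetical and works on in-memory strings so that it is self-contained; in practice you would read evals/config.json and the matching CSV file from disk.

```python
import csv
import io
import json


def missing_columns(config_text, dataset_name, csv_text):
    """Return configured column names that are absent from the CSV header."""
    mapping = json.loads(config_text)["data_files"][dataset_name]
    # The first CSV row is the header; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    wanted = mapping.get("inputs", []) + mapping.get("outputs", [])
    return [col for col in wanted if col not in header]


config_text = json.dumps(
    {
        "data_files": {
            "your-dataset-name": {
                "inputs": ["column1", "column2"],
                "outputs": ["expected_output"],
            }
        }
    }
)
csv_text = "column1,expected_output\nhello,greeting\n"

print(missing_columns(config_text, "your-dataset-name", csv_text))  # ['column2']
```

An empty list means every configured column was found in the header.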
3. Run Evaluation Benchmarks
Via interactive script:
python run_benchmarks.py
# Select option 2 (Evaluation Benchmarks)
# Enter dataset name (default: 10k-long-emails-example)
# Enter data directory (default: data)
Directly:
cd evals
# First, benchmark data upload
poetry run python benchmark_upload.py [data_dir] [dataset_name]
# Then, run evaluation benchmarks
poetry run python benchmark_evals.py [dataset_name]
Results
Results are printed to the terminal and saved to:
- evals/benchmark_results_upload_data.txt (upload benchmark results)
- evals/benchmark_results_evals.txt (evaluation benchmark results)
Notes
- Dataset Upload: Data will be uploaded to LangSmith as part of the evaluation benchmarks workflow. If a dataset with the same name already exists in LangSmith, the upload step will be automatically skipped and the script will proceed directly to running evaluations.
- Data Directory: Both tracing and evaluation benchmarks allow you to specify custom data directories. Defaults are data for tracing and evals/data for evaluations.
- Trace Data Preparation: For tracing benchmarks, UUID replacement and date updates are automatically handled before running benchmarks. These steps run silently in the background.
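For readers curious what UUID replacement can look like, the sketch below rewrites every UUID in a set of JSONL lines while keeping references consistent, so a child run still points at its (renamed) parent. It is an illustration under assumptions, not the preparation code this package uses: `remap_uuids` and the field names in the sample lines are hypothetical, and date updates are not covered.

```python
import json
import re
import uuid

# Matches UUIDs in the canonical 8-4-4-4-12 hexadecimal form.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def remap_uuids(lines):
    """Replace every UUID in the lines with a fresh one, consistently."""
    mapping = {}

    def swap(match):
        old = match.group(0).lower()
        if old not in mapping:
            mapping[old] = str(uuid.uuid4())
        return mapping[old]

    return [UUID_RE.sub(swap, line) for line in lines]


# Two sample runs; the second references the first as its parent.
parent_id = "11111111-1111-1111-1111-111111111111"
child_id = "22222222-2222-2222-2222-222222222222"
lines = [
    json.dumps({"id": parent_id, "parent_run_id": None}),
    json.dumps({"id": child_id, "parent_run_id": parent_id}),
]

new_parent, new_child = (json.loads(line) for line in remap_uuids(lines))
print(new_child["parent_run_id"] == new_parent["id"])  # True
```

Sharing one mapping across all lines is what preserves the parent/child links; generating a fresh UUID per occurrence would break them.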