Init commit

2026-07-01 13:14:41 -04:00 · 2025-11-07 09:35:52 -05:00
commit 53a6754881
68 changed files with 11394 additions and 0 deletions
@@ -0,0 +1,7 @@
+# langsmith
+LANGSMITH_TRACING=true
+LANGCHAIN_ENDPOINT="https://api.smith.langchain.com" # Replace with your instance!
+LANGSMITH_API_KEY="<langsmith_api_key>"
+LANGSMITH_PROJECT="<project_name>"
+
+OTEL_BSP_MAX_QUEUE_SIZE=10000 # default is 2048, increase if you are benchmarking a lot of data and see `Queue is full, likely spans will be dropped.` in the logs.
@@ -0,0 +1,53 @@
+evals/data/*
+!evals/data/*-example.csv
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual Environment
+venv/
+env/
+ENV/
+.env
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+.DS_Store
+*.log
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# Coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+coverage.xml
+*.cover
+
+# Node
+node_modules/
@@ -0,0 +1 @@
+3.11
@@ -0,0 +1,160 @@
+# 🦜🛠 LangSmith SDK Benchmarks
+
+## Pre-requisites
+
+### 0. Install Python 3.11 and Poetry
+
+If you use Homebrew, you can install poetry with:
+
+```commandline
+brew install poetry
+```
+
+### 1. Install Dependencies
+```commandline
+poetry install
+```
+### 2. Set Environment Variables
+After installing dependencies, copy the `.env.example` file contents into `.env` and set the required values:
+```commandline
+cp .env.example .env
+```
+
+<br>
+
+## Running Benchmarks
+
+This package provides an interactive script to run benchmarks end-to-end. Simply run:
+
+```commandline
+python run_benchmarks.py
+```
+
+This will present you with a menu to choose between:
+1. **Tracing Benchmarks** - Benchmarks trace ingestion performance
+2. **Evaluation Benchmarks** - Benchmarks evaluation performance
+3. **Exit**
+
+You can customize defaults for each benchmark type, or press Enter to use the defaults.
+
+### Non-Interactive Mode
+
+To run benchmarks without prompts (this runs the evaluation benchmarks with default settings):
+
+```commandline
+python run_benchmarks.py --non-interactive
+```
+
+<br>
+
+## Tracing Benchmarks
+
+### Overview
+Tracing benchmarks measure the performance of ingesting traces into LangSmith. The script automatically:
+1. Prepares trace files (replaces UUIDs and updates dates)
+2. Runs flat tracing benchmark (runs get their own traces)
+3. Runs nested tracing benchmark (runs properly nested under parents)
+
+### Requirements
+- Trace data files in JSONL format (`processed_run_ops_*.jsonl`) in the specified data directory
+- Default data directory: `tracing/data`
+
+### Running Tracing Benchmarks
+
+**Via interactive script:**
+```commandline
+python run_benchmarks.py
+# Select option 1 (Tracing Benchmarks)
+# Enter data directory (default: data)
+```
+
+**Directly:**
+```commandline
+cd tracing
+poetry run python benchmark_flat.py [data_dir]
+poetry run python benchmark_nested.py [data_dir]
+```
+
+### Results
+Results are printed to the terminal and saved to:
+- `tracing/benchmark_results_flat.txt`
+- `tracing/benchmark_results_nested.txt`
+
+<br>
+
+## Evaluation Benchmarks
+
+### Overview
+Evaluation benchmarks measure the performance of running evaluations on LangSmith datasets. The script automatically:
+1. Benchmarks data upload performance (uploads CSV data to LangSmith)
+2. Runs evaluation benchmarks on the uploaded dataset
+
+**Note:** If the dataset already exists in LangSmith, the upload step will be skipped and the script will proceed directly to running evaluations.
+
+### Requirements
+- CSV data file in `evals/data/` directory
+- Dataset configuration in `evals/config.json`
+- Default dataset: `10k-long-emails-example`
+- Default data directory: `evals/data`
+
+### 1. Prepare Your Data
+
+Place your CSV file in the `evals/data/` directory. The CSV file must be named `{dataset_name}.csv` where `{dataset_name}` matches the name you'll use in `config.json`.
+
+### 2. Configure Dataset Mapping
+
+In `evals/config.json`, specify which CSV columns map to dataset inputs and which map to dataset outputs.
+
+**Configuration Details:**
+* **`inputs`**: A list of CSV column names that will be extracted from each row and set as the input data for each example in the LangSmith dataset. These columns will be converted to dictionaries (one per row) and passed to `client.create_examples(inputs=...)`.
+* **`outputs`**: A list of CSV column names that will be extracted from each row and set as the expected outputs (ground truth) for each example in the LangSmith dataset. These columns will be converted to dictionaries (one per row) and passed to `client.create_examples(outputs=...)`. If empty (`[]`), no outputs will be uploaded.
+
+**Example `config.json` structure:**
+```json
+{
+    "_instructions": "This configuration file maps CSV datasets to LangSmith dataset structure...",
+    "data_files": {
+        "your-dataset-name": {
+            "inputs": ["column1", "column2"],
+            "outputs": ["expected_output"]
+        }
+    }
+}
+```
+
+The key under `data_files` is the dataset name, so the CSV file must be named `{dataset_name}.csv` and placed in the `evals/data/` directory. The column names in `inputs` and `outputs` must match the column headers in your CSV file.
+
+### 3. Run Evaluation Benchmarks
+
+**Via interactive script:**
+```commandline
+python run_benchmarks.py
+# Select option 2 (Evaluation Benchmarks)
+# Enter dataset name (default: 10k-long-emails-example)
+# Enter data directory (default: data)
+```
+
+**Directly:**
+```commandline
+cd evals
+# First, benchmark data upload (both arguments are required)
+poetry run python benchmark_upload.py <data_dir> <dataset_name>
+
+# Then, run evaluation benchmarks (the dataset name is required)
+poetry run python benchmark_evals.py <dataset_name>
+```
+
+### Results
+Results are printed to the terminal and saved to:
+- `evals/benchmark_results_upload_data.txt` (upload benchmark results)
+- `evals/benchmark_results_evals.txt` (evaluation benchmark results)
+
+<br>
+
+## Notes
+
+- **Dataset Upload**: Data will be uploaded to LangSmith as part of the evaluation benchmarks workflow. If a dataset with the same name already exists in LangSmith, the upload step will be automatically skipped and the script will proceed directly to running evaluations.
+
+- **Data Directory**: Both tracing and evaluation benchmarks allow you to specify custom data directories. Defaults are `tracing/data` for tracing and `evals/data` for evaluations, i.e. `data` relative to the `tracing/` and `evals/` directories.
+
+- **Trace Data Preparation**: For tracing benchmarks, UUID replacement and date updates are automatically handled before running benchmarks. These steps run silently in the background.
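The `inputs`/`outputs` mapping described in the README above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `split_rows` is a hypothetical helper, and returning `None` when `outputs` is empty is an assumption about how "no outputs will be uploaded" might be represented. The actual upload logic lives in `upload_data.py`, which is not shown in this part of the diff.

```python
import csv
import io


def split_rows(csv_text, input_cols, output_cols):
    """Build one dict per CSV row for the input columns and the output columns.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    inputs, outputs = [], []
    for row in reader:
        # Keep only the configured columns for each side of the example.
        inputs.append({col: row[col] for col in input_cols})
        outputs.append({col: row[col] for col in output_cols})
    # Assumption: an empty "outputs" list means no outputs are uploaded at all.
    return inputs, (outputs if output_cols else None)


sample = 'email,label\nhello,spam\n"multi\nline",ham\n'
inputs, outputs = split_rows(sample, ["email"], ["label"])
print(inputs)
print(outputs)
```

The two lists are the shape the README describes for `client.create_examples(inputs=..., outputs=...)`: one dictionary per row, keyed by column name. Quoted multi-line fields, as in the example emails, are handled by the `csv` module.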
@@ -0,0 +1,55 @@
+from typing import Tuple
+import asyncio
+import argparse
+from eval_data import run_eval
+
+
+def format_results(ls_results: Tuple[float, str, int]) -> str:
+    """Format benchmark results."""
+    ls_time, _, ls_examples = ls_results
+    
+    # Use the number of examples from the results
+    num_examples = ls_examples
+    
+    avg_ls = ls_time / num_examples if num_examples else 0
+    
+    return f"""\
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time      {ls_time:7.3f}s
+
+Performance:
+    Total Examples    {num_examples}
+    Avg time/example  {avg_ls:7.3f}s
+"""
+
+async def run_benchmark(dataset_name: str):
+    print("Running Langsmith benchmark...")
+    langsmith_results = await run_eval(dataset_name)
+    table = format_results(langsmith_results)
+    
+    # Print to console
+    print("\nBenchmark Results:\n")
+    print(table)
+    
+    # Save results to a file
+    with open("benchmark_results_evals.txt", "w") as f:
+        f.write(table)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Benchmark evaluation performance on a LangSmith dataset",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python benchmark_evals.py 10k-long-emails
+        """
+    )
+    parser.add_argument(
+        "dataset_name",
+        type=str,
+        help="Name of the LangSmith dataset to benchmark"
+    )
+    
+    args = parser.parse_args()
+    asyncio.run(run_benchmark(args.dataset_name))
@@ -0,0 +1,8 @@
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time        9.583s
+
+Performance:
+    Total Examples    2
+    Avg time/example    4.791s
@@ -0,0 +1,9 @@
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time        2.970s
+
+Performance:
+    Total Examples    2
+    Total Size      23.9 kB
+    Avg time/example    1.485s
@@ -0,0 +1,89 @@
+import os
+import argparse
+import json
+import humanize
+from pathlib import Path
+from typing import Tuple
+from upload_data import langsmith_init_data
+
+def get_directory_size(data_dir: str, csv_file: str) -> int:
+    """Return the size in bytes of `<data_dir>/<csv_file>.csv`, or 0 if it does not exist."""
+    csv_path = Path(data_dir) / f"{csv_file}.csv"
+    if csv_path.exists():
+        return csv_path.stat().st_size
+    return 0
+
+
+def format_results(ls_results: Tuple[float, str, int],
+                   data_dir: str,
+                   csv_file: str) -> str:
+    """Format benchmark results."""
+    ls_time, _, ls_examples = ls_results
+    
+    # Use the number of examples from the results
+    num_examples = ls_examples
+    
+    total_size = get_directory_size(data_dir, csv_file)
+    size_human = humanize.naturalsize(total_size)
+    
+    avg_ls = ls_time / num_examples if num_examples else 0
+    
+    return f"""\
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time      {ls_time:7.3f}s
+
+Performance:
+    Total Examples    {num_examples}
+    Total Size      {size_human}
+    Avg time/example  {avg_ls:7.3f}s
+"""
+
+def run_benchmark(data_dir: str, csv_file: str):
+    config_path = os.path.join(os.path.dirname(__file__), "config.json")
+    with open(config_path, 'r') as f:
+        config = json.load(f)
+    
+    # Get dataset configuration
+    if csv_file not in config["data_files"]:
+        raise ValueError(f"Dataset '{csv_file}' not found in config.json")
+    if "inputs" not in config["data_files"][csv_file] or "outputs" not in config["data_files"][csv_file]:
+        raise ValueError(f"Dataset '{csv_file}' does not have inputs or outputs in config.json")
+    
+    dataset_config = config["data_files"][csv_file]
+    print(f"Dataset config: {dataset_config}")
+    print("Running Langsmith benchmark...")
+    langsmith_results = langsmith_init_data(csv_file, dataset_config["inputs"], dataset_config["outputs"], data_dir)
+    table = format_results(langsmith_results, data_dir, csv_file)
+    
+    # Print to console
+    print("\nBenchmark Results:\n")
+    print(table)
+    
+    # Save results to a file
+    with open("benchmark_results_upload_data.txt", "w") as f:
+        f.write(table)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Benchmark dataset upload performance to LangSmith",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python benchmark_upload.py data 10k-long-emails
+        """
+    )
+    parser.add_argument(
+        "data_dir",
+        type=str,
+        help="Directory containing the CSV data file"
+    )
+    parser.add_argument(
+        "csv_file",
+        type=str,
+        help="Name of the CSV file (without .csv extension, must exist in config.json)"
+    )
+    
+    args = parser.parse_args()
+    run_benchmark(args.data_dir, args.csv_file)
@@ -0,0 +1,13 @@
+{
+    "_instructions": "This configuration file maps CSV datasets to LangSmith dataset structure. For each dataset: 'inputs' are CSV column names that will be set as inputs in the LangSmith dataset, and 'outputs' are CSV column names that will be recorded as outputs (expected outputs) in the LangSmith dataset.",
+    "data_files": {
+        "10k-long-emails": {
+            "inputs": ["email"],
+            "outputs": []
+        },
+        "10k-long-emails-example": {
+            "inputs": ["email"],
+            "outputs": []
+        }
+    }
+}
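Since the README requires the configured column names to match the CSV headers, a pre-flight check along these lines may help catch mismatches before an upload. This is a sketch under stated assumptions: `missing_columns` is a hypothetical helper that is not part of this repository, and it only inspects the first row of the CSV text.

```python
import csv
import io


def missing_columns(csv_text, dataset_config):
    """Return configured column names that are absent from the CSV header.

    Hypothetical helper for illustration only.
    """
    # The first CSV record is treated as the header row.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    wanted = dataset_config.get("inputs", []) + dataset_config.get("outputs", [])
    return [col for col in wanted if col not in header]


config_entry = {"inputs": ["email"], "outputs": []}
print(missing_columns("email\nhello\n", config_entry))
```

An empty list means every configured column is present in the header.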
@@ -0,0 +1,111 @@
+email
+"Subject: Urgent: Reclaim Your Ancestral Land Rights – Limited Time Offer!
+
+Body:
+
+Dear Mr. and Mrs. Bartholomew Higgins,
+
+It is with a profound sense of urgency and, frankly, a considerable degree of historical injustice that we reach out to you today. For generations, your family, the Higgins, have been the rightful custodians of the sprawling, fertile valley known as “Whispering Pines” – a land tragically lost through a series of unfortunate, and frankly, bewildering, legal maneuvers perpetrated by the notoriously unscrupulous Silas Blackwood and his descendants. We understand that the passage of time can obscure details, and perhaps the memory of this historical wrong has faded, but we are here to reignite the flame of rightful ownership and guide you through the process of reclaiming what is, undeniably, yours.
+
+Let’s be brutally honest. The situation is complex. It’s not merely a simple matter of finding a forgotten deed. Silas Blackwood, a man who made his fortune in dubious dealings – primarily the lucrative trade in exotic bird feathers during the Victorian era (a period we find particularly fascinating, though irrelevant to your immediate concern) – systematically dismantled your family’s claim through a series of meticulously crafted legal arguments, leveraging loopholes in antiquated land laws, and employing a network of compliant (and, we suspect, bribed) local magistrates. He wasn’t a malicious man, per se, though his ethical compass appeared to be permanently pointing south. He simply operated according to the prevailing norms of the time – norms that prioritized wealth and power over heritage and justice.
+
+The evidence, meticulously compiled over decades by our team of dedicated historical researchers (led by the brilliant and utterly relentless Professor Alistair Finch – a man who once spent three months meticulously cataloging the button collections of a retired naval admiral – a testament to his dedication), demonstrates a clear and irrefutable case of fraudulent land acquisition. We possess original correspondence, including letters from Silas Blackwood himself, outlining his strategy and detailing his efforts to discredit your lineage. We have unearthed previously unknown contracts, signed in code (a particularly clever tactic employed by Blackwood, who was a staunch advocate of secrecy and obfuscation), and even photographs depicting Blackwood himself surveying the valley, a smug expression on his face, clearly anticipating the eventual dismantling of your family’s claim.
+
+But the important thing to understand is that Blackwood’s actions were not isolated. They were part of a broader pattern of exploitation that characterized much of the late 19th and early 20th centuries – a period we often refer to as “The Age of Calculated Neglect,” a time when vast swathes of land were systematically stripped from indigenous populations and marginalized communities in the name of progress and economic development. The Higgins family, due to a series of unfortunate circumstances – including a sudden and unexpected inheritance of a prize-winning marmoset named Bartholomew (a detail we’ve included for context and to illustrate the sheer randomness of historical events) and a regrettable investment in a steam-powered rhubarb harvester – were particularly vulnerable to Blackwood's predatory tactics.
+
+Our work isn’t simply about nostalgia. It’s about justice. It’s about recognizing the enduring consequences of historical inequity and taking concrete steps to rectify it. The “Whispering Pines” valley isn’t just a beautiful expanse of land; it’s a symbol of your family’s heritage, a testament to your resilience, and a rightful claim that has been unjustly denied for over a century.  We understand that the idea of reclaiming land, particularly at your age, might seem daunting. Perhaps you’ve heard stories of legal battles, of endless paperwork, of insurmountable obstacles. We’re here to tell you that it doesn’t have to be. We’ve streamlined the entire process, leveraging cutting-edge technology and a team of legal experts who specialize in historical land disputes.  We’ve built a system that’s not only efficient but also empathetic, recognizing the emotional weight attached to this claim.
+
+Let’s delve into the specifics of what we offer. Our comprehensive “Restoration Package,” as we call it, includes the following:
+
+**Phase 1: Historical Verification & Authentication (Approximately 6-8 Weeks)**
+
+This initial phase is the cornerstone of our entire operation. We don’t simply rely on anecdotal evidence or hearsay. We operate on a foundation of irrefutable facts. This phase involves a meticulous examination of all relevant historical documents, including:
+
+*   **Deciphering the Blackwood Code:** Professor Finch and his team have developed a proprietary algorithm – utilizing advanced linguistic analysis and a surprisingly sophisticated understanding of Victorian-era cryptography – to translate the Blackwood code. This is no simple task. The code isn’t just based on simple substitution ciphers. Blackwood was a master of creating layered codes, incorporating nautical terminology, botanical references, and even subtle variations in penmanship to create a truly impenetrable system.  We’ve spent countless hours cracking this code, and we’re now confident that we can unlock the full extent of Blackwood’s deception.  The translation process alone will take approximately 4-6 weeks. We anticipate discovering a significant number of previously unknown documents, including Blackwood’s personal diaries (which, according to preliminary assessments, are filled with surprisingly candid observations about his business dealings and his profound disdain for the Higgins family) and a series of meticulously crafted maps depicting the valley at various points in time.
+*   **Genealogical Deep Dive:** We'll meticulously trace your family lineage back to its earliest documented roots, meticulously researching every branch of the family tree. We'll be contacting historical societies, genealogical databases, and even conducting on-site interviews with long-standing families in the region to piece together the full story of your ancestors.  We’re particularly interested in discovering any hidden connections to the indigenous tribes who originally inhabited the valley.  There’s a strong possibility that your family’s claim is intertwined with a treaty – a treaty that was, of course, subsequently violated by Silas Blackwood.  This process will involve DNA analysis, comparing your genetic profile to that of known family members and to samples taken from the soil in the valley. We’re exploring the possibility of utilizing ancient DNA techniques, analyzing pollen samples to determine the original composition of the valley’s ecosystem and comparing this data to the genetic makeup of the original inhabitants.
+*   **Title Examination Review:** Our team of legal experts will conduct a thorough examination of all existing land titles, identifying any discrepancies or inconsistencies. We'll be working with the local county recorder’s office, challenging any claims that are not supported by evidence. We’ll be employing sophisticated legal research tools, scouring historical records for any indications of a wrongful transfer of ownership.  We’re looking for evidence of fraud, duress, or undue influence.  We’re prepared to pursue legal action against any entities that may have benefited from the wrongful transfer of ownership.
+*   **Environmental Impact Assessment:**  We will conduct a comprehensive environmental impact assessment of the “Whispering Pines” valley, documenting any changes that have occurred since Blackwood’s acquisition. This will include assessing the impact of agricultural practices, logging activities, and residential development.  We're particularly concerned about the impact of a recent development project – a luxury golf course – which we believe has significantly degraded the valley's natural beauty and disrupted the ecosystem.
+
+**Phase 2: Legal Strategy & Litigation (Approximately 8-12 Weeks)**
+
+Once we’ve completed the historical verification and authentication phase, we’ll move into the legal strategy and litigation phase. This is where we formally assert your claim and begin the process of seeking legal redress. This phase will involve:
+
+*   **Formal Legal Complaint Filing:** We will file a formal legal complaint with the county court, asserting your claim to the “Whispering Pines” valley. The complaint will be supported by all of the evidence we’ve gathered during the historical verification and authentication phase.
+*   **Discovery Process:** We will engage in the discovery process, formally requesting documents and information from the defendants – primarily the current owner of the property, a shell corporation controlled by the Blackwood family’s descendants (a surprisingly persistent lineage, we might add) and the developer of the luxury golf course. This will involve serving subpoenas, taking depositions, and conducting interrogatories.
+*   **Negotiation & Mediation:** We will actively pursue negotiation and mediation with the defendants, seeking a fair and equitable resolution to the dispute. We believe that a negotiated settlement is the most efficient and cost-effective way to resolve the case. However, we are fully prepared to litigate the case in court if necessary.
+*   **Expert Witness Testimony:** We will retain expert witnesses – historians, genealogists, environmental scientists, and legal experts – to provide testimony in support of your claim. These experts will be called upon to testify about the historical context of the case, the scientific evidence, and the legal principles involved.
+
+**Phase 3: Property Restoration & Stewardship (Ongoing)**
+
+This phase is about more than just reclaiming legal ownership. It's about ensuring the long-term stewardship of the “Whispering Pines” valley. This phase will involve:
+
+*   **Valley Restoration Project:** We will implement a comprehensive valley restoration project, aimed at reversing the damage caused by agricultural practices, logging activities, and residential development. This will include reforestation efforts, the restoration of native plant species, and the implementation of sustainable land management practices.
+*   **Community Engagement:** We will actively engage with the local community, seeking their input and support for the valley restoration project. We believe that the valley should be a valuable asset for the entire community.
+*   **Establishment of a Conservation Trust:** We will establish a conservation trust to ensure the long-term protection of the “Whispering Pines” valley. The trust will be governed by a board of directors, comprised of local residents, environmental experts, and legal professionals.
+
+**Investment Opportunity:**
+
+We are currently seeking investors to help fund this ambitious project. Your investment will not only help us reclaim your rightful ownership of the “Whispering Pines” valley, but it will also contribute to the preservation of this valuable natural resource for generations to come. We are offering a variety of investment options, ranging from equity financing to philanthropic donations.
+
+**Disclaimer:** *This is a hypothetical scenario and does not constitute legal advice. Actual outcomes may vary depending on the specific circumstances of the case.*
+
+**Contact Us:**
+
+[Contact Information]
+
+We understand that this is a complex and potentially overwhelming undertaking. However, we are confident that with your support, we can successfully reclaim the “Whispering Pines” valley and ensure its long-term protection. We look forward to hearing from you. iteration 23"
+"Subject: Urgent: Reclaim Your Ancestral Land Rights – Limited Time Offer!
+
+Body:
+
+Dear Mr. and Mrs. Bartholomew Higgins,
+
+It is with a profound sense of urgency and, frankly, a considerable degree of historical injustice that we reach out to you today. For generations, your family, the Higgins, have been the rightful custodians of the sprawling, fertile valley known as “Whispering Pines” – a land tragically lost through a series of unfortunate, and frankly, bewildering, legal maneuvers perpetrated by the notoriously unscrupulous Silas Blackwood and his descendants. We understand that the passage of time can obscure details, and perhaps the memory of this historical wrong has faded, but we are here to reignite the flame of rightful ownership and guide you through the process of reclaiming what is, undeniably, yours.
+
+Let’s be brutally honest. The situation is complex. It’s not merely a simple matter of finding a forgotten deed. Silas Blackwood, a man who made his fortune in dubious dealings – primarily the lucrative trade in exotic bird feathers during the Victorian era (a period we find particularly fascinating, though irrelevant to your immediate concern) – systematically dismantled your family’s claim through a series of meticulously crafted legal arguments, leveraging loopholes in antiquated land laws, and employing a network of compliant (and, we suspect, bribed) local magistrates. He wasn’t a malicious man, per se, though his ethical compass appeared to be permanently pointing south. He simply operated according to the prevailing norms of the time – norms that prioritized wealth and power over heritage and justice.
+
+The evidence, meticulously compiled over decades by our team of dedicated historical researchers (led by the brilliant and utterly relentless Professor Alistair Finch – a man who once spent three months meticulously cataloging the button collections of a retired naval admiral – a testament to his dedication), demonstrates a clear and irrefutable case of fraudulent land acquisition. We possess original correspondence, including letters from Silas Blackwood himself, outlining his strategy and detailing his efforts to discredit your lineage. We have unearthed previously unknown contracts, signed in code (a particularly clever tactic employed by Blackwood, who was a staunch advocate of secrecy and obfuscation), and even photographs depicting Blackwood himself surveying the valley, a smug expression on his face, clearly anticipating the eventual dismantling of your family’s claim.
+
+But the important thing to understand is that Blackwood’s actions were not isolated. They were part of a broader pattern of exploitation that characterized much of the late 19th and early 20th centuries – a period we often refer to as “The Age of Calculated Neglect,” a time when vast swathes of land were systematically stripped from indigenous populations and marginalized communities in the name of progress and economic development. The Higgins family, due to a series of unfortunate circumstances – including a sudden and unexpected inheritance of a prize-winning marmoset named Bartholomew (a detail we’ve included for context and to illustrate the sheer randomness of historical events) and a regrettable investment in a steam-powered rhubarb harvester – were particularly vulnerable to Blackwood's predatory tactics.
+
+Our work isn’t simply about nostalgia. It’s about justice. It’s about recognizing the enduring consequences of historical inequity and taking concrete steps to rectify it. The “Whispering Pines” valley isn’t just a beautiful expanse of land; it’s a symbol of your family’s heritage, a testament to your resilience, and a rightful claim that has been unjustly denied for over a century.  We understand that the idea of reclaiming land, particularly at your age, might seem daunting. Perhaps you’ve heard stories of legal battles, of endless paperwork, of insurmountable obstacles. We’re here to tell you that it doesn’t have to be. We’ve streamlined the entire process, leveraging cutting-edge technology and a team of legal experts who specialize in historical land disputes.  We’ve built a system that’s not only efficient but also empathetic, recognizing the emotional weight attached to this claim.
+
+Let’s delve into the specifics of what we offer. Our comprehensive “Restoration Package,” as we call it, includes the following:
+
+**Phase 1: Historical Verification & Authentication (Approximately 6-8 Weeks)**
+
+This initial phase is the cornerstone of our entire operation. We don’t simply rely on anecdotal evidence or hearsay. We operate on a foundation of irrefutable facts. This phase involves a meticulous examination of all relevant historical documents, including:
+
+*   **Deciphering the Blackwood Code:** Professor Finch and his team have developed a proprietary algorithm – utilizing advanced linguistic analysis and a surprisingly sophisticated understanding of Victorian-era cryptography – to translate the Blackwood code. This is no simple task. The code isn’t just based on simple substitution ciphers. Blackwood was a master of creating layered codes, incorporating nautical terminology, botanical references, and even subtle variations in penmanship to create a truly impenetrable system.  We’ve spent countless hours cracking this code, and we’re now confident that we can unlock the full extent of Blackwood’s deception.  The translation process alone will take approximately 4-6 weeks. We anticipate discovering a significant number of previously unknown documents, including Blackwood’s personal diaries (which, according to preliminary assessments, are filled with surprisingly candid observations about his business dealings and his profound disdain for the Higgins family) and a series of meticulously crafted maps depicting the valley at various points in time.
+*   **Genealogical Deep Dive:** We'll meticulously trace your family lineage back to its earliest documented roots, meticulously researching every branch of the family tree. We'll be contacting historical societies, genealogical databases, and even conducting on-site interviews with long-standing families in the region to piece together the full story of your ancestors.  We’re particularly interested in discovering any hidden connections to the indigenous tribes who originally inhabited the valley.  There’s a strong possibility that your family’s claim is intertwined with a treaty – a treaty that was, of course, subsequently violated by Silas Blackwood.  This process will involve DNA analysis, comparing your genetic profile to that of known family members and to samples taken from the soil in the valley. We’re exploring the possibility of utilizing ancient DNA techniques, analyzing pollen samples to determine the original composition of the valley’s ecosystem and comparing this data to the genetic makeup of the original inhabitants.
+*   **Title Examination Review:** Our team of legal experts will conduct a thorough examination of all existing land titles, identifying any discrepancies or inconsistencies. We'll be working with the local county recorder’s office, challenging any claims that are not supported by evidence. We’ll be employing sophisticated legal research tools, scouring historical records for any indications of a wrongful transfer of ownership.  We’re looking for evidence of fraud, duress, or undue influence.  We’re prepared to pursue legal action against any entities that may have benefited from the wrongful transfer of ownership.
+*   **Environmental Impact Assessment:**  We will conduct a comprehensive environmental impact assessment of the “Whispering Pines” valley, documenting any changes that have occurred since Blackwood’s acquisition. This will include assessing the impact of agricultural practices, logging activities, and residential development.  We're particularly concerned about the impact of a recent development project – a luxury golf course – which we believe has significantly degraded the valley's natural beauty and disrupted the ecosystem.
+
+**Phase 2: Legal Strategy & Litigation (Approximately 8-12 Weeks)**
+
+Once we’ve completed the historical verification and authentication phase, we’ll move into the legal strategy and litigation phase. This is where we formally assert your claim and begin the process of seeking legal redress. This phase will involve:
+
+*   **Formal Legal Complaint Filing:** We will file a formal legal complaint with the county court, asserting your claim to the “Whispering Pines” valley. The complaint will be supported by all of the evidence we’ve gathered during the historical verification and authentication phase.
+*   **Discovery Process:** We will engage in the discovery process, formally requesting documents and information from the defendants – primarily the current owner of the property, a shell corporation controlled by the Blackwood family’s descendants (a surprisingly persistent lineage, we might add) and the developer of the luxury golf course. This will involve serving subpoenas, taking depositions, and conducting interrogatories.
+*   **Negotiation & Mediation:** We will actively pursue negotiation and mediation with the defendants, seeking a fair and equitable resolution to the dispute. We believe that a negotiated settlement is the most efficient and cost-effective way to resolve the case. However, we are fully prepared to litigate the case in court if necessary.
+*   **Expert Witness Testimony:** We will retain expert witnesses – historians, genealogists, environmental scientists, and legal experts – to provide testimony in support of your claim. These experts will be called upon to testify about the historical context of the case, the scientific evidence, and the legal principles involved.
+
+**Phase 3: Property Restoration & Stewardship (Ongoing)**
+
+This phase is about more than just reclaiming legal ownership. It's about ensuring the long-term stewardship of the “Whispering Pines” valley. This phase will involve:
+
+*   **Valley Restoration Project:** We will implement a comprehensive valley restoration project, aimed at reversing the damage caused by agricultural practices, logging activities, and residential development. This will include reforestation efforts, the restoration of native plant species, and the implementation of sustainable land management practices.
+*   **Community Engagement:** We will actively engage with the local community, seeking their input and support for the valley restoration project. We believe that the valley should be a valuable asset for the entire community.
+*   **Establishment of a Conservation Trust:** We will establish a conservation trust to ensure the long-term protection of the “Whispering Pines” valley. The trust will be governed by a board of directors, comprised of local residents, environmental experts, and legal professionals.
+
+**Investment Opportunity:**
+
+We are currently seeking investors to help fund this ambitious project. Your investment will not only help us reclaim your rightful ownership of the “Whispering Pines” valley, but it will also contribute to the preservation of this valuable natural resource for generations to come. We are offering a variety of investment options, ranging from equity financing to philanthropic donations.
+
+**Disclaimer:** *This is a hypothetical scenario and does not constitute legal advice. Actual outcomes may vary depending on the specific circumstances of the case.*
+
+**Contact Us:**
+
+[Contact Information]
+
+We understand that this is a complex and potentially overwhelming undertaking. However, we are confident that with your support, we can successfully reclaim the “Whispering Pines” valley and ensure its long-term protection. We look forward to hearing from you. iteration 24"
@@ -0,0 +1,127 @@
+import asyncio
+import argparse
+import random
+import dotenv
+import time
+from langsmith import traceable, Client
+
+dotenv.load_dotenv()
+client = Client()
+
+@traceable(run_type="llm", metadata={"ls_provider": "openai", "ls_model_name": "gpt-4o-mini"})
+async def mock_chat_completion(*, model, messages):
+    # Sleep for 3 seconds each time
+    await asyncio.sleep(3)
+    input_tokens = random.randint(10000, 12000)
+    output_tokens = random.randint(1000, 2000)
+    return {
+        "role": "assistant",
+        "content": "This is a summary of the information provided.",
+        "usage_metadata": {
+            "input_tokens": input_tokens,
+            "output_tokens": output_tokens,
+            "total_tokens": input_tokens + output_tokens,
+        },
+    }
+
+# Will be traced by default
+async def target(inputs: dict) -> dict:
+    messages = [
+        {
+            "role": "system",
+            "content": "You are an expert summarizer."
+        },
+        # This dataset has inputs as a dict with an "email" key
+        {"role": "user", "content": "Summarize this information:\n\n" + str(inputs)},
+    ]
+    res = await mock_chat_completion(
+        model="gpt-4o-mini",
+        messages=messages
+    )
+
+    return { "summary": res }
+
+
+@traceable(run_type="llm", metadata={"ls_provider": "openai", "ls_model_name": "o3-mini"})
+async def mock_evaluator_chat_completion(*, model, messages):
+    await asyncio.sleep(2)
+    # Mock return value
+    input_tokens = random.randint(10000, 12000)
+    output_tokens = random.randint(10, 20)
+    return {
+        "role": "assistant",
+        "content": str(random.random()),
+        "usage_metadata": {
+            "input_tokens": input_tokens,
+            "output_tokens": output_tokens,
+            "total_tokens": input_tokens + output_tokens,
+        },
+    }
+
+async def mock_quality_evaluator(inputs: dict, outputs: dict):
+    messages = [
+        {
+            "role": "system",
+            "content": "Assign a quality score for the generated summary of an email."
+        },
+        {
+            "role": "user",
+            "content": f"""
+Input info: {str(inputs)}
+output: {outputs["summary"]}
+"""
+        },
+    ]
+    res = await mock_evaluator_chat_completion(
+        model="o3-mini",
+        messages=messages
+    )
+    return {
+        "key": "quality",
+        "score": float(res["content"]),
+        "comment": "Score justification or other comments can go here.",
+    }
+
+
+
+async def run_eval(dataset_name: str):
+    print("Starting LangSmith experiment!")
+    start = time.perf_counter()
+
+    experiment_results = await client.aevaluate(
+        target,
+        # dataset with 10,000 examples
+        data=dataset_name, #"10k-long-emails"
+        evaluators=[
+            mock_quality_evaluator,
+            # can add multiple evaluators here
+        ],
+        max_concurrency=1000,
+    )
+
+    finish_time = time.perf_counter()
+    print(f"Experiment finished in {finish_time - start} seconds")
+    client.flush()
+    flush_time = time.perf_counter()
+    print(f"All runs flushed to LangSmith in {flush_time - finish_time} seconds")
+    return (finish_time - start, dataset_name, len(experiment_results))
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Run evaluation on a LangSmith dataset",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python eval_data.py 10k-long-emails
+        """
+    )
+    parser.add_argument(
+        "dataset_name",
+        type=str,
+        help="Name of the LangSmith dataset to evaluate"
+    )
+    
+    args = parser.parse_args()
+    results = asyncio.run(run_eval(args.dataset_name))
+    print(results)
@@ -0,0 +1,102 @@
+import os
+import time
+import argparse
+import dotenv
+import json
+import pandas as pd
+from pathlib import Path
+from langsmith import Client
+
+dotenv.load_dotenv()
+
+client = Client()
+
+def langsmith_init_data(csv_file: str, input_keys: list[str], output_keys: list[str], data_dir: str = "data"):
+    path = Path(data_dir) / f"{csv_file}.csv"
+    start = time.perf_counter()
+    
+    # Read the CSV file
+    df = pd.read_csv(path)
+    total_rows = len(df)
+    
+    # Check if dataset exists, otherwise create it
+    try:
+        dataset = client.read_dataset(dataset_name=csv_file)
+        raise ValueError(f"Dataset '{csv_file}' already exists in LangSmith. Please delete it first or use a different dataset name.")
+    except ValueError:
+        # Re-raise ValueError (dataset exists)
+        raise
+    except Exception:
+        # Dataset doesn't exist, create it
+        dataset = client.create_dataset(
+            dataset_name=csv_file,
+            description=f"Dataset created from {csv_file} CSV file with {total_rows} rows"
+        )
+        print(f"Created dataset: {dataset.name}")
+    
+    # Calibrate chunk size to avoid sending too much at once
+    chunk_size = 1000
+    num_chunks = (total_rows + chunk_size - 1) // chunk_size
+    
+    print(f"Uploading {total_rows} rows in {num_chunks} chunks to dataset {dataset.name}...")
+    
+    for i in range(num_chunks):
+        start_idx = i * chunk_size
+        end_idx = min((i + 1) * chunk_size, total_rows)
+        
+        # Create chunk dataframe
+        chunk_df = df.iloc[start_idx:end_idx]
+        
+        # Prepare lists of inputs and outputs
+        inputs_list = [{key: row[key] for key in input_keys if key in row} for _, row in chunk_df.iterrows()]
+        outputs_list = [{key: row[key] for key in output_keys if key in row} for _, row in chunk_df.iterrows()] if output_keys else None
+        
+        # Upload chunk to the dataset
+        client.create_examples(
+            inputs=inputs_list,
+            outputs=outputs_list,
+            dataset_id=dataset.id
+        )
+        print(f"Uploaded chunk {i+1}/{num_chunks}: {len(inputs_list)} examples")
+    
+    end = time.perf_counter()
+    print(f"LangSmith dataset {dataset.name} uploaded in {end - start} seconds")
+    print(f"Total examples uploaded: {total_rows}")
+    return (end - start, dataset.name, total_rows)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Upload CSV data to LangSmith dataset",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python upload_data.py 10k-long-emails --data-dir data
+        """
+    )
+    parser.add_argument(
+        "dataset_name",
+        type=str,
+        help="Name of the dataset to upload (must exist in config.json)"
+    )
+    parser.add_argument(
+        "--data-dir",
+        type=str,
+        default="data",
+        help="Directory containing the CSV data file (default: data)"
+    )
+    
+    args = parser.parse_args()
+    
+    config_path = os.path.join(os.path.dirname(__file__), "config.json")
+    with open(config_path, 'r') as f:
+        config = json.load(f)
+    
+    if args.dataset_name not in config["data_files"]:
+        raise ValueError(f"Dataset '{args.dataset_name}' not found in config.json")
+    if "inputs" not in config["data_files"][args.dataset_name] or "outputs" not in config["data_files"][args.dataset_name]:
+        raise ValueError(f"Dataset '{args.dataset_name}' is missing 'inputs' or 'outputs' in config.json")
+    
+    dataset_config = config["data_files"][args.dataset_name]
+    langsmith_results = langsmith_init_data(args.dataset_name, dataset_config["inputs"], dataset_config["outputs"], args.data_dir)
+    print(f"LangSmith results: {langsmith_results}")
@@ -0,0 +1,24 @@
+[tool.poetry]
+name = "sdk-load-test"
+version = "0.1.0"
+description = "Package for load testing the langsmith sdk"
+authors = ["Robert Xu <xuro@langchain.dev>"]
+readme = "README.md"
+package-mode = false
+
+[tool.poetry.dependencies]
+python = ">=3.11,<3.13"
+langsmith = "^0.4.1"
+langchain = "^1.0.0"
+langchain_openai = "^1.0.0"
+orjson = "^3.10.14"
+python-dotenv = "^1.0.1"
+pathlib = "^1.0.1"
+humanize = "^4.11.0"
+pandas = "^2.3.0"
+packaging = "^25.0"
+
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
@@ -0,0 +1,287 @@
+#!/usr/bin/env python3
+"""
+Interactive script to run benchmarks end-to-end.
+
+This script provides a menu interface to:
+- Run tracing benchmarks (with UUID replacement and date updates)
+- Run evaluation benchmarks (with data upload)
+- Customize defaults for each benchmark type
+"""
+import argparse
+import subprocess
+import sys
+from pathlib import Path
+import dotenv
+
+dotenv.load_dotenv()
+
+def run_command(cmd, cwd, description, check=True, verbose=True):
+    """Run a command and handle errors."""
+    if verbose:
+        print(f"\n{'='*60}")
+        print(f"Running: {description}")
+        print(f"{'='*60}")
+        print(f"Command: {' '.join(cmd)}")
+        print(f"Working directory: {cwd}\n")
+    
+    # Always capture output so we can check for specific errors
+    result = subprocess.run(cmd, cwd=cwd, check=False, capture_output=True, text=True)
+    
+    # Check if this is an "already exists" error before printing output
+    error_output = ""
+    if result.stdout:
+        error_output += result.stdout
+    if result.stderr:
+        error_output += result.stderr
+    
+    is_already_exists_error = "already exists" in error_output.lower()
+    
+    if verbose and not is_already_exists_error:
+        # Print the captured output once the command has finished (unless it's an expected error)
+        if result.stdout:
+            print(result.stdout)
+        if result.stderr:
+            print(result.stderr, file=sys.stderr)
+    
+    if check and result.returncode != 0:
+        if not verbose:
+            print(f"\nError: {description} failed with exit code {result.returncode}")
+            if result.stdout:
+                print(result.stdout)
+            if result.stderr:
+                print(result.stderr, file=sys.stderr)
+        elif not is_already_exists_error:
+            # Only print error details if it's not an expected "already exists" error
+            print(f"\nError: {description} failed with exit code {result.returncode}")
+        return False, result
+    return True, result
+
+def run_tracing_benchmarks(data_dir="data"):
+    """Run tracing benchmarks end-to-end."""
+    project_root = Path(__file__).parent.absolute()
+    tracing_dir = project_root / "tracing"
+    
+    if not tracing_dir.exists():
+        print(f"Error: Tracing directory not found at {tracing_dir}")
+        return False
+    
+    print("\n" + "="*60)
+    print("TRACING BENCHMARKS")
+    print("="*60)
+    
+    # Step 1: Run replace_uuids (silent)
+    success, _ = run_command(
+        [sys.executable, "utils/replace_uuids.py"],
+        cwd=str(tracing_dir),
+        description="Preparing trace files",
+        check=True,
+        verbose=False
+    )
+    if not success:
+        return False
+    
+    # Step 2: Run update_dates (silent)
+    success, _ = run_command(
+        [sys.executable, "utils/update_dates.py"],
+        cwd=str(tracing_dir),
+        description="Preparing trace files",
+        check=True,
+        verbose=False
+    )
+    if not success:
+        return False
+    
+    # Step 3: Run flat benchmark
+    print("\nSTEP 1: Running flat tracing benchmark")
+    success, _ = run_command(
+        [sys.executable, "benchmark_flat.py", data_dir],
+        cwd=str(tracing_dir),
+        description="Flat tracing benchmark"
+    )
+    if not success:
+        return False
+    
+    # Step 4: Run nested benchmark
+    print("\nSTEP 2: Running nested tracing benchmark")
+    success, _ = run_command(
+        [sys.executable, "benchmark_nested.py", data_dir],
+        cwd=str(tracing_dir),
+        description="Nested tracing benchmark"
+    )
+    if not success:
+        return False
+    
+    print("\n" + "="*60)
+    print("SUCCESS: All tracing benchmarks completed!")
+    print("="*60)
+    print("\nResults saved to:")
+    print(f"  - {tracing_dir}/benchmark_results_flat.txt")
+    print(f"  - {tracing_dir}/benchmark_results_nested.txt")
+    return True
+
+def run_evals_benchmarks(dataset="10k-long-emails-example", data_dir="data"):
+    """Run evaluation benchmarks end-to-end."""
+    project_root = Path(__file__).parent.absolute()
+    evals_dir = project_root / "evals"
+    
+    if not evals_dir.exists():
+        print(f"Error: Evals directory not found at {evals_dir}")
+        return False
+    
+    print("\n" + "="*60)
+    print("EVALUATION BENCHMARKS")
+    print("="*60)
+    
+    # Step 1: Benchmark data upload
+    print("\nSTEP 1: Benchmarking data upload")
+    
+    # Run benchmark_upload.py
+    _, result = run_command(
+        [sys.executable, "benchmark_upload.py", data_dir, dataset],
+        cwd=str(evals_dir),
+        description=f"Benchmark upload for dataset '{dataset}'",
+        check=False
+    )
+    
+    if result.returncode != 0:
+        # check=False means run_command always reports success, so inspect the exit code directly
+        error_output = ""
+        if result.stdout:
+            error_output += result.stdout
+        if result.stderr:
+            error_output += result.stderr
+        
+        if "already exists" in error_output.lower():
+            print(f"\nDataset '{dataset}' already exists - skipping upload benchmark")
+            print("Moving directly to evaluation benchmarks...\n")
+        else:
+            # Different error, fail
+            return False
+    
+    # Step 2: Run evaluation benchmarks
+    print("\nSTEP 2: Running evaluation benchmarks")
+    success, _ = run_command(
+        [sys.executable, "benchmark_evals.py", dataset],
+        cwd=str(evals_dir),
+        description=f"Evaluation benchmark for dataset '{dataset}'"
+    )
+    if not success:
+        return False
+    
+    print("\n" + "="*60)
+    print("SUCCESS: All evaluation benchmarks completed!")
+    print("="*60)
+    print("\nResults saved to:")
+    print(f"  - {evals_dir}/benchmark_results_evals.txt")
+    return True
+
+def get_user_input(prompt, default=None, input_type=str):
+    """Get user input with optional default."""
+    if default is not None:
+        full_prompt = f"{prompt} [{default}]: "
+    else:
+        full_prompt = f"{prompt}: "
+    
+    user_input = input(full_prompt).strip()
+    
+    if not user_input and default is not None:
+        return default
+    
+    if not user_input:
+        return None
+    
+    try:
+        return input_type(user_input)
+    except ValueError:
+        print(f"Invalid input. Please enter a valid {input_type.__name__}.")
+        return get_user_input(prompt, default, input_type)
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Interactive script to run benchmarks end-to-end",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+This script provides an interactive menu to run benchmarks:
+- Tracing benchmarks: Runs UUID replacement, date updates, and both flat/nested benchmarks
+- Evaluation benchmarks: Uploads data and runs evaluation benchmarks
+
+Example:
+  python run_benchmarks.py
+        """
+    )
+    parser.add_argument(
+        "--non-interactive",
+        action="store_true",
+        help="Run in non-interactive mode (use defaults)"
+    )
+    
+    args = parser.parse_args()
+    
+    if args.non_interactive:
+        # Non-interactive mode: run all evals by default
+        print("Running in non-interactive mode - executing all evaluation benchmarks...")
+        success = run_evals_benchmarks()
+        sys.exit(0 if success else 1)
+    
+    # Interactive mode
+    print("\n" + "="*60)
+    print("LangSmith SDK Benchmarks")
+    print("="*60)
+    print("\nSelect benchmark type:")
+    print("1. Tracing Benchmarks")
+    print("2. Evaluation Benchmarks")
+    print("3. Exit")
+    
+    choice = get_user_input("\nEnter your choice", default=2, input_type=int)
+    
+    if choice == 1:
+        # Tracing benchmarks
+        print("\n" + "-"*60)
+        print("Tracing Benchmarks Configuration")
+        print("-"*60)
+        data_dir = get_user_input(
+            "Enter data directory containing trace files",
+            default="data"
+        )
+        
+        if not data_dir:
+            print("Error: Data directory is required")
+            sys.exit(1)
+        
+        success = run_tracing_benchmarks(data_dir)
+        sys.exit(0 if success else 1)
+        
+    elif choice == 2:
+        # Evaluation benchmarks
+        print("\n" + "-"*60)
+        print("Evaluation Benchmarks Configuration")
+        print("-"*60)
+        dataset = get_user_input(
+            "Enter dataset name (must exist in config.json)",
+            default="10k-long-emails-example"
+        )
+        data_dir = get_user_input(
+            "Enter data directory containing CSV files",
+            default="data"
+        )
+        
+        if not dataset:
+            print("Error: Dataset name is required")
+            sys.exit(1)
+        if not data_dir:
+            print("Error: Data directory is required")
+            sys.exit(1)
+        
+        success = run_evals_benchmarks(dataset, data_dir)
+        sys.exit(0 if success else 1)
+        
+    elif choice == 3:
+        print("\nExiting...")
+        sys.exit(0)
+    else:
+        print("\nInvalid choice. Exiting...")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
+
@@ -0,0 +1,163 @@
+import argparse
+import sys
+import time
+from pathlib import Path
+from datetime import datetime
+from typing import Tuple
+
+import orjson
+import dotenv
+import humanize
+
+from langsmith import Client
+
+# Load environment variables
+dotenv.load_dotenv()
+
+
+########################################
+# Helper Functions
+########################################
+
+def get_directory_size(directory: str) -> int:
+    """Calculate total size of all JSONL files in a directory."""
+    total_size = 0
+    for path in Path(directory).glob("*.jsonl"):
+        total_size += path.stat().st_size
+    return total_size
+
+
+########################################
+# Replay Class
+########################################
+
+class LangsmithReplay:
+    @staticmethod
+    def replay_trace(run_ops_file: Path, logger: Client) -> None:
+        with open(run_ops_file, 'rb') as f:
+            for line in f:
+                operation = orjson.loads(line)
+                
+                if operation["operation"] == "post":
+                    create_params = {
+                        "name": operation.get("name"),
+                        "start_time": operation.get("start_time"),
+                        "inputs": operation.get("inputs", {}),
+                        "run_type": operation.get("run_type"),
+                        "serialized": operation.get("serialized", {}),
+                        "extra": operation.get("extra", {}),
+                        "tags": operation.get("tags", []),
+                        "trace_id": operation.get("trace_id"),
+                        "dotted_order": operation.get("dotted_order"),
+                        "parent_run_id": operation.get("parent_run_id"),
+                        "id": operation.get('id'),
+                    }
+                    # Remove any keys with a None value.
+                    create_params = {k: v for k, v in create_params.items() if v is not None}
+                    logger.create_run(**create_params)
+                    
+                elif operation["operation"] == "patch":
+                    end_time = operation.get("end_time")
+                    if end_time and isinstance(end_time, str):
+                        try:
+                            end_time = datetime.fromisoformat(end_time.replace('Z', '+00:00'))
+                        except ValueError:
+                            end_time = None
+                    
+                    update_params = {
+                        "run_id": operation.get("id"),
+                        "end_time": end_time,
+                        "outputs": operation.get("outputs", {}),
+                        "error": operation.get("error"),
+                        "trace_id": operation.get("trace_id"),
+                        "dotted_order": operation.get("dotted_order"),
+                        "parent_run_id": operation.get("parent_run_id"),
+                    }
+                    update_params = {k: v for k, v in update_params.items() if v is not None}
+                    logger.update_run(**update_params)
+
+
+########################################
+# Benchmark Runner
+########################################
+
+def run_ls_benchmark(data_dir: str) -> Tuple[float, float, float, int]:
+    """Run the Langsmith benchmark."""
+    logger = Client()
+    
+    data_path = Path(data_dir)
+    run_ops_files = list(data_path.glob("processed_run_ops_*.jsonl"))
+    run_ops_files.sort()
+    num_traces = len(run_ops_files)
+    
+    user_perceived_start = time.perf_counter()
+    for run_ops_file in run_ops_files:
+        try:
+            LangsmithReplay.replay_trace(run_ops_file, logger)
+        except Exception as e:
+            print(f"Error replaying {run_ops_file}: {str(e)}", file=sys.stderr)
+    
+    user_perceived_time = time.perf_counter() - user_perceived_start
+    
+    flush_start = time.perf_counter()
+    logger.flush()
+    flush_time = time.perf_counter() - flush_start
+    
+    total_time = user_perceived_time + flush_time
+    return user_perceived_time, flush_time, total_time, num_traces
+
+
+def format_results(ls_results: Tuple[float, float, float, int],
+                   data_dir: str) -> str:
+    """Format benchmark results."""
+    _, _, ls_total, num_traces = ls_results
+    
+    total_size = get_directory_size(data_dir)
+    size_human = humanize.naturalsize(total_size)
+    
+    avg_ls = ls_total / num_traces if num_traces else 0
+    
+    return f"""\
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time      {ls_total:7.3f}s
+
+Performance:
+    Total Traces    {num_traces}
+    Total Size      {size_human}
+    Avg time/trace  {avg_ls:7.3f}s
+"""
+
+def main(data_dir: str):
+    print("Running Langsmith benchmark...")
+    ls_results = run_ls_benchmark(data_dir)
+    
+    results = format_results(ls_results, data_dir)
+    
+    # Print to console
+    print("\nBenchmark Results:\n")
+    print(results)
+    
+    # Save results to a file
+    with open("benchmark_results_flat.txt", "w") as f:
+        f.write(results)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Benchmark flat tracing performance (runs get their own traces)",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python benchmark_flat.py data
+        """
+    )
+    parser.add_argument(
+        "data_dir",
+        type=str,
+        help="Directory containing processed_run_ops_*.jsonl trace files"
+    )
+    
+    args = parser.parse_args()
+    main(args.data_dir)
@@ -0,0 +1,142 @@
+import argparse
+import sys
+import time
+from pathlib import Path
+from datetime import datetime
+from typing import Tuple
+
+import orjson
+import dotenv
+import humanize
+
+from langsmith import Client
+
+dotenv.load_dotenv()
+
+def get_directory_size(directory: str) -> int:
+    """Calculate total size of all JSONL files in directory."""
+    total_size = 0
+    for path in Path(directory).glob("*.jsonl"):
+        total_size += path.stat().st_size
+    return total_size
+
+class LangsmithReplay:
+    @staticmethod
+    def replay_trace(run_ops_file: Path, logger: Client) -> None:
+        with open(run_ops_file, 'rb') as f:
+            for line in f:
+                operation = orjson.loads(line)
+                
+                if operation["operation"] == "post":
+                    create_params = {
+                        "name": operation.get("name"),
+                        "start_time": operation.get("start_time"),
+                        "inputs": operation.get("inputs", {}),
+                        "run_type": operation.get("run_type"),
+                        "serialized": operation.get("serialized", {}),
+                        "extra": operation.get("extra", {}),
+                        "tags": operation.get("tags", []),
+                        "trace_id": operation.get("trace_id"),
+                        "dotted_order": operation.get("dotted_order"),
+                        "parent_run_id": operation.get("parent_run_id"),
+                        "id": operation.get('id'),
+                    }
+                    create_params = {k: v for k, v in create_params.items() if v is not None}
+                    logger.create_run(**create_params)
+                    
+                elif operation["operation"] == "patch":
+                    end_time = operation.get("end_time")
+                    if end_time and isinstance(end_time, str):
+                        try:
+                            end_time = datetime.fromisoformat(end_time.replace('Z', '+00:00'))
+                        except ValueError:
+                            end_time = None
+                    
+                    update_params = {
+                        "run_id": operation.get("id"),
+                        "end_time": end_time,
+                        "outputs": operation.get("outputs", {}),
+                        "error": operation.get("error"),
+                        "trace_id": operation.get("trace_id"),
+                        "dotted_order": operation.get("dotted_order"),
+                        "parent_run_id": operation.get("parent_run_id"),
+                    }
+                    update_params = {k: v for k, v in update_params.items() if v is not None}
+                    logger.update_run(**update_params)
+
+
+def run_ls_benchmark(data_dir: str) -> Tuple[float, float, float, int]:
+    """Run the Langsmith benchmark."""
+    logger = Client()
+    
+    data_path = Path(data_dir)
+    run_ops_files = list(data_path.glob("processed_run_ops_*.jsonl"))
+    run_ops_files.sort()
+    num_traces = len(run_ops_files)
+    
+    user_perceived_start = time.perf_counter()
+    for run_ops_file in run_ops_files:
+        try:
+            LangsmithReplay.replay_trace(run_ops_file, logger)
+        except Exception as e:
+            print(f"Error replaying {run_ops_file}: {str(e)}", file=sys.stderr)
+    
+    user_perceived_time = time.perf_counter() - user_perceived_start
+    
+    flush_start = time.perf_counter()
+    logger.flush()
+    flush_time = time.perf_counter() - flush_start
+    
+    total_time = user_perceived_time + flush_time
+    return user_perceived_time, flush_time, total_time, num_traces
+
+def format_results(ls_results: Tuple[float, float, float, int],
+                   data_dir: str) -> str:
+    """Format benchmark results."""
+    _, _, ls_total, num_traces = ls_results
+    
+    total_size = get_directory_size(data_dir)
+    size_human = humanize.naturalsize(total_size)
+    
+    avg_ls = ls_total / num_traces if num_traces else 0
+    
+    return f"""\
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time      {ls_total:7.3f}s
+
+Performance:
+    Total Traces    {num_traces}
+    Total Size      {size_human}
+    Avg time/trace  {avg_ls:7.3f}s
+"""
+
+
+def main(data_dir: str):
+    print("Running Langsmith benchmark...")
+    ls_results = run_ls_benchmark(data_dir)
+    
+    results = format_results(ls_results, data_dir)
+    print(results)
+    
+    with open("benchmark_results_nested.txt", "w") as f:
+        f.write(results)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Benchmark nested tracing performance (runs properly nested under parents)",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example:
+  python benchmark_nested.py data
+        """
+    )
+    parser.add_argument(
+        "data_dir",
+        type=str,
+        help="Directory containing processed_run_ops_*.jsonl trace files"
+    )
+    
+    args = parser.parse_args()
+    main(args.data_dir)
@@ -0,0 +1,9 @@
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time        2.899s
+
+Performance:
+    Total Traces    45
+    Total Size      145.6 MB
+    Avg time/trace    0.064s
@@ -0,0 +1,9 @@
+Langsmith Benchmark Results
+===========================
+Time Breakdown:
+    Total time        1.660s
+
+Performance:
+    Total Traces    45
+    Total Size      145.6 MB
+    Avg time/trace    0.037s
@@ -0,0 +1,85 @@
+from langsmith import Client
+import orjson
+import dotenv
+
+
+trace_ids = { # Replace with your trace ids
+    "1efcf279-c279-613f-b5c6-4705c507f3e7",
+    "1efcf279-6c93-6b57-8468-788f550a4171",
+    "1efcf237-d278-65a7-a2da-d75c39571b51",
+    "1efcf1f3-3ee1-6643-9513-f19ec6ea38cd",
+    "1efcf164-fcce-66e1-bb63-0673854ee9f4",
+    "1efcf15f-521f-642b-92b3-9cc39ad0e414",
+    "1efcf15f-2725-61b3-a8fc-ad30facd29bd",
+    "1efcf12d-d85d-6780-9300-cf174385be8d",
+    "1efcf134-212d-6350-af0e-01e79b5c5b02",
+    "1efcf115-bfd6-6945-a028-11213eca6861",
+    "1efcf108-9282-6966-8ef1-d0c5f16d23da",
+    "1efcf102-e767-6629-bd02-16bafbb1bfb4",
+    "1efcf0d6-43bf-6ff8-b12a-47a79c940b6e",
+    "1efcf0d4-2007-6727-be94-9884ee1c0cbe",
+    "1efcf0d2-5e61-6a0d-b6d3-e9666f398dfa",
+    "1efcf0cf-d5e7-6b82-9635-7ee2bb254b18",
+    "1efcf0cd-a49f-665b-9894-52d508078fc1",
+    "1efcf0be-fc26-6bee-9361-51e9631e9f99",
+    "1efcf09b-9bcb-68e6-8cc5-da2673832877",
+    "1efcf072-10ee-684d-ab63-51194ccc7955",
+    "1efcf06c-ec82-63c2-a2ec-1e69717454c1",
+    "1efcf05a-5bac-6bba-98d1-978315c98f98",
+    "1efcf04a-a9ae-60a6-b899-19ba02b2c8ea",
+    "1efcf01f-fb89-61bd-a8ba-1123f9f48458",
+    "1efcefe2-7fe1-6ff4-9ea8-300f343970e1",
+    "1efcef47-59e3-6165-bac0-1a6524c25453",
+    "1efceee0-a5ba-6a89-b1eb-181a839b2a28",
+    "1efceed5-37db-6423-a76f-974355024b85",
+    "1efceecb-b89e-6a62-8140-0d9edfbe5225",
+    "1efceec9-8695-6c1b-a4eb-eaab38afce6c",
+    "1efcee40-3791-6296-a156-5207b6f9291c",
+}
+
+def produce_run_ops_jsonl_files():
+    client = Client()
+    for trace_id in trace_ids:
+        results = client.list_runs(
+            project_name='example-project', # Replace with your project name
+            trace_id=trace_id,
+        )
+        results = list(results)
+        results.sort(key=lambda x: x.dotted_order)
+        with open(f'data/processed_run_ops_{trace_id}.jsonl', 'wb') as run_ops_file:
+            for run in results:
+                run_dict = dict(run)
+                post = {
+                    "operation": "post",
+                    "id": run_dict["id"],
+                    "name": run_dict["name"],
+                    "start_time": run_dict["start_time"],
+                    "serialized": run_dict["serialized"],
+                    "events": run_dict["events"],
+                    "inputs": run_dict["inputs"],
+                    "run_type": run_dict["run_type"],
+                    "extra": run_dict["extra"],
+                    "tags": run_dict["tags"],
+                    "trace_id": run_dict["trace_id"],
+                    "dotted_order": run_dict["dotted_order"],
+                    "parent_run_id": run_dict["parent_run_id"],
+                }
+                run_ops_file.write(orjson.dumps(post))
+                run_ops_file.write(b'\n')
+                patch = {
+                    "operation": "patch",
+                    "id": run_dict["id"],
+                    "name": run_dict["name"],
+                    "end_time": run_dict["end_time"],
+                    "error": run_dict["error"],
+                    "outputs": run_dict["outputs"],
+                    "trace_id": run_dict["trace_id"],
+                    "dotted_order": run_dict["dotted_order"],
+                    "parent_run_id": run_dict["parent_run_id"],
+                }
+                run_ops_file.write(orjson.dumps(patch))
+                run_ops_file.write(b'\n')
+
+if __name__ == "__main__":
+    dotenv.load_dotenv()
+    produce_run_ops_jsonl_files()
@@ -0,0 +1,83 @@
+import orjson
+from pathlib import Path
+import dotenv
+import time
+from datetime import datetime
+
+from langsmith import Client
+
+dotenv.load_dotenv()
+
+def replay_trace(run_ops_file: Path, logger: Client) -> None:
+    with open(run_ops_file, 'rb') as f:
+        for line in f:
+            operation = orjson.loads(line)
+            
+            if operation["operation"] == "post":
+                create_params = {
+                    "name": operation.get("name"),
+                    "start_time": operation.get("start_time"),
+                    "inputs": operation.get("inputs", {}),
+                    "run_type": operation.get("run_type"),
+                    "serialized": operation.get("serialized", {}),
+                    "extra": operation.get("extra", {}),
+                    "tags": operation.get("tags", []),
+                    "trace_id": operation.get("trace_id"),
+                    "dotted_order": operation.get("dotted_order"),
+                    "parent_run_id": operation.get("parent_run_id"),
+                    "id": operation.get('id'),
+                }
+                create_params = {k: v for k, v in create_params.items() if v is not None}
+                logger.create_run(**create_params)
+                
+            elif operation["operation"] == "patch":
+                end_time = operation.get("end_time")
+                if end_time and isinstance(end_time, str):
+                    try:
+                        end_time = datetime.fromisoformat(end_time.replace('Z', '+00:00'))
+                    except ValueError:
+                        end_time = None
+                
+                update_params = {
+                    "run_id": operation.get("id"),
+                    "end_time": end_time,
+                    "outputs": operation.get("outputs", {}),
+                    "error": operation.get("error"),
+                    "trace_id": operation.get("trace_id"),
+                    "dotted_order": operation.get("dotted_order"),
+                    "parent_run_id": operation.get("parent_run_id"),
+                }
+                update_params = {k: v for k, v in update_params.items() if v is not None}
+                logger.update_run(**update_params)
+
+
+def replay_all_traces(data_dir: str = "data") -> None:
+    logger = Client()
+
+    data_path = Path(data_dir)
+    run_ops_files = list(data_path.glob("processed_run_ops_*.jsonl"))
+    run_ops_files.sort()
+
+    user_perceived_start = time.perf_counter()
+    for run_ops_file in run_ops_files:
+        try:
+            replay_trace(run_ops_file, logger)
+        except Exception as e:
+            print(f"Error replaying {run_ops_file}: {str(e)}")
+
+    user_perceived_time = time.perf_counter() - user_perceived_start
+    print(f"User perceived time taken: {user_perceived_time} seconds")
+
+    flush_start = time.perf_counter()
+    logger.flush()
+    flush_time = time.perf_counter() - flush_start
+    print(f"Flush time: {flush_time} seconds")
+    
+
+if __name__ == "__main__":
+    start_time = time.perf_counter()
+    try:
+        replay_all_traces()
+    finally:
+        end_time = time.perf_counter()
+        print(f"Total time taken: {end_time - start_time} seconds")
@@ -0,0 +1,50 @@
+import uuid
+import re
+from pathlib import Path
+
+def generate_uuid_mapping(content):
+    uuid_pattern = r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
+    old_uuids = set(re.findall(uuid_pattern, content))
+    return {old: str(uuid.uuid4()) for old in old_uuids}
+
+def update_json_content(content, uuid_mapping):
+    """Swap each old UUID in content for its replacement from uuid_mapping."""
+    uuid_pattern = r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
+    # Replace in a single pass so that a freshly generated UUID can never be
+    # matched and rewritten again by a later mapping entry.
+    def swap(match):
+        return uuid_mapping.get(match.group(0), match.group(0))
+
+    return re.sub(uuid_pattern, swap, content)
+
+def process_json_file(input_path, output_path):
+    """Process the JSON file and write updated content to output file."""
+    try:
+        with open(input_path, 'r') as file:
+            content = file.read()
+        
+        uuid_mapping = generate_uuid_mapping(content)
+        
+        updated_content = update_json_content(content, uuid_mapping)
+        
+        with open(output_path, 'w') as file:
+            file.write(updated_content)
+        
+        print(f"Successfully processed file. Output written to: {output_path}")
+        print("UUID mappings:")
+        for old, new in uuid_mapping.items():
+            print(f"Old: {old}")
+            print(f"New: {new}")
+            print("-" * 50)
+            
+    except Exception as e:
+        print(f"Error processing file: {str(e)}")
+
+# Process all files in data directory
+data_path = Path("data")
+run_ops_files = list(data_path.glob("*.jsonl"))
+run_ops_files.sort()
+
+for input_file in run_ops_files:
+    output_file = input_file  # Each file is rewritten in place
+    process_json_file(str(input_file), str(output_file))
@@ -0,0 +1,116 @@
+import re
+from pathlib import Path
+from datetime import date, timedelta
+
+
+def update_json_dates(content):
+    today = date.today()
+    
+    # Collect all unique dates from the content
+    all_dates = set()
+    
+    # Find all dates in start_time fields
+    start_time_pattern = r'("start_time":\s*")(\d{4}-\d{2}-\d{2})T'
+    start_time_matches = re.findall(start_time_pattern, content)
+    for match in start_time_matches:
+        all_dates.add(match[1])
+    
+    # Find all dates in end_time fields
+    end_time_pattern = r'("end_time":\s*")(\d{4}-\d{2}-\d{2})T'
+    end_time_matches = re.findall(end_time_pattern, content)
+    for match in end_time_matches:
+        all_dates.add(match[1])
+    
+    # Find all dates in start events
+    start_event_pattern = r'("name":"start","time":")(\d{4}-\d{2}-\d{2})T'
+    start_event_matches = re.findall(start_event_pattern, content)
+    for match in start_event_matches:
+        all_dates.add(match[1])
+    
+    # Find all dates in end events
+    end_event_pattern = r'("name":"end","time":")(\d{4}-\d{2}-\d{2})T'
+    end_event_matches = re.findall(end_event_pattern, content)
+    for match in end_event_matches:
+        all_dates.add(match[1])
+    
+    if not all_dates:
+        print("No dates found to update.")
+        return content
+    
+    # Create date mapping: latest date -> today, second latest -> yesterday, etc.
+    sorted_dates = sorted(all_dates)
+    date_mapping = {}
+    for i, old_date in enumerate(sorted_dates):
+        days_back = len(sorted_dates) - 1 - i
+        new_date = today - timedelta(days=days_back)
+        date_mapping[old_date] = new_date.isoformat()
+    
+    print("Date mapping:")
+    for old_date, new_date in date_mapping.items():
+        print(f"  {old_date} -> {new_date}")
+    
+    # Apply the mapping to every field kind in a single pass. Substituting one
+    # date at a time can rewrite the same timestamp twice when a new date equals
+    # an old date that is handled later (for example, when the recorded traces
+    # are only a few days old or the script is re-run on consecutive days).
+    date_prefixes = (
+        r'"start_time":\s*"',
+        r'"end_time":\s*"',
+        r'"name":"start","time":"',
+        r'"name":"end","time":"',
+    )
+    date_pattern = re.compile(
+        '(' + '|'.join(date_prefixes) + r')(\d{4}-\d{2}-\d{2})T'
+    )
+
+    def swap_date(match):
+        # group(1) is the field prefix, group(2) is the original date
+        new_date = date_mapping.get(match.group(2), match.group(2))
+        return f'{match.group(1)}{new_date}T'
+
+    return date_pattern.sub(swap_date, content)
+
+
+def process_json_file(input_path, output_path):
+    """Process the JSON file and write updated content to output file."""
+    try:
+        with open(input_path, 'r') as file:
+            content = file.read()
+        
+        updated_content = update_json_dates(content)
+        
+        with open(output_path, 'w') as file:
+            file.write(updated_content)
+        
+        print(f"Successfully processed file. Output written to: {output_path}")
+            
+    except Exception as e:
+        print(f"Error processing file: {str(e)}")
+
+
+# Process all files in data directory
+data_path = Path("data")
+if data_path.exists():
+    run_ops_files = list(data_path.glob("*.jsonl"))
+    run_ops_files.sort()
+
+    for input_file in run_ops_files:
+        output_file = input_file  # Each file is rewritten in place
+        process_json_file(str(input_file), str(output_file))
+else:
+    print(f"Directory not found: {data_path}. Make sure to run this script from the root directory of the project.")