langsmith-cookbook/testing-examples/evaluate-existing-test-project/evaluate_runs.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1ddb1a3b-eaf7-4755-8bfe-4d9178c7927a",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Add Metrics to Existing Tests\n",
    "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langsmith-cookbook/blob/main/testing-examples/evaluate-existing-test-project/evaluate_runs.ipynb)\n",
    "\n",
    "At times, you may want to apply an evaluator post-hoc. This is useful if you have a new evaluator (or version of an evaluator) and want to add the metrics without re-running your model. \n",
    "\n",
    "You can do this like so:\n",
    "\n",
    "```python\n",
    "from langsmith.beta import compute_test_metrics\n",
    "\n",
    "def my_evaluator(run, example):\n",
    "     score = \"foo\" in run.outputs['output']\n",
    "    return {\"key\": \"is_foo\", \"score\": score}\n",
    "\n",
    "# The name of the test you have already run.\n",
    "# This is DISTINCT from the dataset name\n",
    "test_project = \"test-abc123\"\n",
    "compute_test_metrics(test_project, evaluators=[my_evaluator])\n",
    "```\n",
    "\n",
    "Within the `compute_test_metrics` function, we list the runs in the test and apply the provided evaluators to each one.\n",
    "\n",
    "Below, we will share a quick example."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c7e62f7-5f6d-40c7-9efc-e5cd76321fda",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "Install the requisite packages, and generate the initial test results. In reality, you will already have a dataset + test results.\n",
    "\n",
    "This utility function expects `langsmith>=0.1.31`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "03d82d6f-67a3-4a2d-9b86-604bc48b5820",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install -U langsmith langchain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ee6bfe5b-9736-4a8b-85e7-2b749ee747fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import uuid\n",
    "\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = \"YOUR API KEY\"\n",
    "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "# Update if you are self-hosted\n",
    "os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "be0ff7e9-41f6-463e-943f-f9e77b92cdc0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "View the evaluation results for project 'puzzled-cloud-96' at:\n",
      "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/cbdb128b-a725-4662-a515-dfe0009cb15c/compare?selectedSessions=28f2c88e-3091-4fcc-bac7-c1dbd8a6a43b\n",
      "\n",
      "View all tests for Dataset My Example Dataset 512ee7 at:\n",
      "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/cbdb128b-a725-4662-a515-dfe0009cb15c\n",
      "[------------------------------------------------->] 10/10"
     ]
    }
   ],
   "source": [
    "from langsmith import Client\n",
    "\n",
    "client = Client()\n",
    "dataset_name = \"My Example Dataset \" + uuid.uuid4().hex[:6]\n",
    "\n",
    "ds = client.create_dataset(dataset_name=dataset_name)\n",
    "client.create_examples(\n",
    "    inputs=[{\"input\": i} for i in range(10)],\n",
    "    outputs=[{\"output\": i * (3 % (i + 1))} for i in range(10)],\n",
    "    dataset_id=ds.id,\n",
    ")\n",
    "\n",
    "\n",
    "def my_chain(example_input: dict):\n",
    "    # The input to the llm_or_chain_factory is\n",
    "    # the example.inputs\n",
    "    return {\"output\": example_input[\"input\"] * 3}\n",
    "\n",
    "\n",
    "results = client.run_on_dataset(\n",
    "    dataset_name=dataset_name, llm_or_chain_factory=my_chain\n",
    ")\n",
    "\n",
    "test_name = results[\"project_name\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "996a3fb4-ae21-4b18-8ba6-d12c4fa73356",
   "metadata": {},
   "source": [
    "## Add Evaluation Metrics\n",
    "\n",
    "Now that we have existing test results, we can apply new evaluators to this project using the `compute_test_metrics` utility function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ae6f9459-51fa-468c-bc65-0b965f5ba628",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/gf/6rnp_mbx5914kx7qmmh7xzmw0000gn/T/ipykernel_80329/988510393.py:14: UserWarning: Function compute_test_metrics is in beta.\n",
      "  compute_test_metrics(test_name, evaluators=[exact_match])\n"
     ]
    }
   ],
   "source": [
    "from langsmith.beta._evals import compute_test_metrics\n",
    "from langsmith.schemas import Example, Run\n",
    "\n",
    "\n",
    "def exact_match(run: Run, example: Example):\n",
    "    # \"output\" is the key we assigned in the create_examples step above\n",
    "    expected = example.outputs[\"output\"]\n",
    "    predicted = run.outputs[\"output\"]\n",
    "    return {\"key\": \"exact_match\", \"score\": predicted == expected}\n",
    "\n",
    "\n",
    "# The name of the test you have already run.\n",
    "# This is DISTINCT from the dataset name\n",
    "compute_test_metrics(test_name, evaluators=[exact_match])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cdb41ef-3892-4385-8830-c6decfbf8f5c",
   "metadata": {},
   "source": [
    "Now you can check out the test results in the above link.\n",
    "\n",
    "## Conclusion\n",
    "\n",
    "Congrats! You've run evals on an existing test. This makes it easy to backfill evaluation results on old test results."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}