mirror of
https://github.com/langchain-ai/langsmith-cookbook.git
synced 2026-07-01 08:12:02 -04:00
198 lines
6.1 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"id": "1ddb1a3b-eaf7-4755-8bfe-4d9178c7927a",
"metadata": {
"tags": []
},
"source": [
"# Add Metrics to Existing Tests\n",
"[Open in Colab](https://colab.research.google.com/github/langchain-ai/langsmith-cookbook/blob/main/testing-examples/evaluate-existing-test-project/evaluate_runs.ipynb)\n",
"\n",
"Sometimes you may want to apply an evaluator after the fact. This is useful when you have a new evaluator (or a new version of an existing one) and want to add its metrics to a test you have already run, without re-running your model.\n",
"\n",
"You can do this like so:\n",
"\n",
"```python\n",
"from langsmith.beta import compute_test_metrics\n",
"\n",
"def my_evaluator(run, example):\n",
"    score = \"foo\" in run.outputs[\"output\"]\n",
"    return {\"key\": \"is_foo\", \"score\": score}\n",
"\n",
"# The name of the test you have already run.\n",
"# This is DISTINCT from the dataset name\n",
"test_project = \"test-abc123\"\n",
"compute_test_metrics(test_project, evaluators=[my_evaluator])\n",
"```\n",
"\n",
"Under the hood, `compute_test_metrics` lists the runs in the test project and applies each of the provided evaluators to every run.\n",
"\n",
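"As a rough mental model, that loop can be sketched in plain Python. This is a sketch under assumptions, not the library's implementation. `FakeRun`, `FakeExample`, and `apply_evaluators` are hypothetical stand-ins that mimic only the `outputs` field of `langsmith.schemas.Run` and `Example`. The real utility fetches runs and reference examples from LangSmith and, as far as we can tell, stores each score as feedback on the run. The hypothetical evaluator below also scores a failed run (no outputs) as `False` instead of raising.\n",
"\n",
"```python\n",
"from dataclasses import dataclass\n",
"from typing import Optional\n",
"\n",
"\n",
"# Hypothetical stand-ins: they mimic only the `outputs` field of Run / Example.\n",
"@dataclass\n",
"class FakeRun:\n",
"    outputs: Optional[dict]\n",
"\n",
"\n",
"@dataclass\n",
"class FakeExample:\n",
"    outputs: Optional[dict]\n",
"\n",
"\n",
"def is_foo(run, example):\n",
"    # A run that errored may have no outputs; score it False instead of raising.\n",
"    output = (run.outputs or {}).get(\"output\", \"\")\n",
"    return {\"key\": \"is_foo\", \"score\": \"foo\" in str(output)}\n",
"\n",
"\n",
"def apply_evaluators(pairs, evaluators):\n",
"    # Hypothetical helper: apply every evaluator to every (run, example) pair.\n",
"    return [evaluator(run, example) for run, example in pairs for evaluator in evaluators]\n",
"\n",
"\n",
"pairs = [\n",
"    (FakeRun({\"output\": \"foo bar\"}), FakeExample({\"output\": \"foo\"})),\n",
"    (FakeRun(None), FakeExample({\"output\": \"foo\"})),\n",
"]\n",
"print(apply_evaluators(pairs, [is_foo]))\n",
"# [{'key': 'is_foo', 'score': True}, {'key': 'is_foo', 'score': False}]\n",
"```\n",
"\n",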
"Below is a quick end-to-end example."
]
},
{
"cell_type": "markdown",
"id": "9c7e62f7-5f6d-40c7-9efc-e5cd76321fda",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Install the required packages and generate the initial test results. In practice, you will already have a dataset and test results.\n",
"\n",
"This utility function expects `langsmith>=0.1.31`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "03d82d6f-67a3-4a2d-9b86-604bc48b5820",
"metadata": {},
"outputs": [],
"source": [
"# %pip install -U langsmith langchain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ee6bfe5b-9736-4a8b-85e7-2b749ee747fc",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import uuid\n",
"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"YOUR API KEY\"\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# Update if you are self-hosted\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "be0ff7e9-41f6-463e-943f-f9e77b92cdc0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"View the evaluation results for project 'puzzled-cloud-96' at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/cbdb128b-a725-4662-a515-dfe0009cb15c/compare?selectedSessions=28f2c88e-3091-4fcc-bac7-c1dbd8a6a43b\n",
"\n",
"View all tests for Dataset My Example Dataset 512ee7 at:\n",
"https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/cbdb128b-a725-4662-a515-dfe0009cb15c\n",
"[------------------------------------------------->] 10/10"
]
}
],
"source": [
"from langsmith import Client\n",
"\n",
"client = Client()\n",
"dataset_name = \"My Example Dataset \" + uuid.uuid4().hex[:6]\n",
"\n",
"ds = client.create_dataset(dataset_name=dataset_name)\n",
"client.create_examples(\n",
"    inputs=[{\"input\": i} for i in range(10)],\n",
"    outputs=[{\"output\": i * (3 % (i + 1))} for i in range(10)],\n",
"    dataset_id=ds.id,\n",
")\n",
"\n",
"\n",
"def my_chain(example_input: dict):\n",
"    # The input to the llm_or_chain_factory is\n",
"    # the example.inputs\n",
"    return {\"output\": example_input[\"input\"] * 3}\n",
"\n",
"\n",
"results = client.run_on_dataset(\n",
"    dataset_name=dataset_name, llm_or_chain_factory=my_chain\n",
")\n",
"\n",
"test_name = results[\"project_name\"]"
]
},
{
"cell_type": "markdown",
"id": "996a3fb4-ae21-4b18-8ba6-d12c4fa73356",
"metadata": {},
"source": [
"## Add Evaluation Metrics\n",
"\n",
"Now that we have existing test results, we can apply new evaluators to this project using the `compute_test_metrics` utility function."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ae6f9459-51fa-468c-bc65-0b965f5ba628",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/gf/6rnp_mbx5914kx7qmmh7xzmw0000gn/T/ipykernel_80329/988510393.py:14: UserWarning: Function compute_test_metrics is in beta.\n",
" compute_test_metrics(test_name, evaluators=[exact_match])\n"
]
}
],
"source": [
"from langsmith.beta import compute_test_metrics\n",
"from langsmith.schemas import Example, Run\n",
"\n",
"\n",
"def exact_match(run: Run, example: Example):\n",
"    # \"output\" is the key we assigned in the create_examples step above\n",
"    expected = example.outputs[\"output\"]\n",
"    predicted = run.outputs[\"output\"]\n",
"    return {\"key\": \"exact_match\", \"score\": predicted == expected}\n",
"\n",
"\n",
"# The name of the test you have already run.\n",
"# This is DISTINCT from the dataset name\n",
"compute_test_metrics(test_name, evaluators=[exact_match])"
]
},
{
"cell_type": "markdown",
"id": "1cdb41ef-3892-4385-8830-c6decfbf8f5c",
"metadata": {},
"source": [
"You can now open the test results link printed above to see the new `exact_match` scores.\n",
"\n",
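"As a quick local sanity check (plain Python, no LangSmith calls), you can recompute which examples should receive `exact_match = False`. The toy dataset was built so that some examples fail on purpose: the reference output is `i * (3 % (i + 1))`, while `my_chain` returns `i * 3`. This sketch assumes every run in the test completed without errors.\n",
"\n",
"```python\n",
"# Recreate the toy dataset and the chain's outputs from the cells above.\n",
"inputs = list(range(10))\n",
"expected = [i * (3 % (i + 1)) for i in inputs]  # reference outputs stored in the dataset\n",
"predicted = [i * 3 for i in inputs]  # what my_chain returns for each input\n",
"\n",
"scores = [p == e for p, e in zip(predicted, expected)]\n",
"mismatches = [i for i, ok in zip(inputs, scores) if not ok]\n",
"print(f\"{sum(scores)}/{len(scores)} exact matches; mismatched inputs: {mismatches}\")\n",
"# 8/10 exact matches; mismatched inputs: [1, 2]\n",
"```\n",
"\n",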
"## Conclusion\n",
"\n",
"Congrats! You've run evaluators on an existing test. This makes it easy to backfill metrics on older test results."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}