Pin litellm version constraints
Recipe Chatbot - AI Evaluations Course with LangSmith
This repository is a modified version of evaluations course repo. It's set up to walk you through the homework assignments using LangSmith, a platform that provides best-in-class tooling for observability, evals, and more.
Quick Start
-
Clone & Setup
git clone https://github.com/langchain-ai/recipe-chatbot.git cd recipe-chatbot python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate pip install -r requirements.txt -
Configure Environment Navigate to LangSmith and sign up for an account if you don't already have one. You'll need to create an API key by pressing
Settingsin the sidebar.Then, copy the
env.examplefile to a.envfile using the command below and paste the required secrets there, including your newly created LangSmith key:cp env.example .env # Edit .env to add your model and API keys -
Run the Chatbot
uvicorn backend.main:app --reload # Open http://127.0.0.1:8000
The only differences between the recipe chatbot code in this repo and the main course repo are wrapping the LiteLLM call so that it traces to LangSmith:
@traceable(name="LiteLLM", run_type="llm")
def litellm_completion(model: str, messages: List[Dict[str, str]], **kwargs: Any):
completion = litellm.completion(
model=model,
messages=messages,
**kwargs,
)
return completion
And then importing and using this wrapped method instead of calling litellm.completion directly.
Course Overview
Homework Progression
This repo contains modified homework instructions that take advantage of LangSmith's platform.
Follow along using the updated README.md for each assignment below:
-
HW1: Basic Prompt Engineering (
homeworks/hw1/README.md)- Write system prompts and expand test queries
-
HW2: Error Analysis & Failure Taxonomy (
homeworks/hw2/README.md) -
HW3: LLM-as-Judge Evaluation (
homeworks/hw3/README.md)- Automated evaluation using LangSmith
Key Features
- Backend: FastAPI with LiteLLM (multi-provider LLM support)
- Frontend: Simple chat interface with conversation history
- Annotation Tool: FastHTML-based interface for manual evaluation (
annotation/) - Retrieval: BM25-based recipe search (
backend/retrieval.py) - Query Rewriting: LLM-powered query optimization (
backend/query_rewrite_agent.py) - Evaluation Tools: Automated metrics, bias correction, and analysis scripts
Project Structure
recipe-chatbot/
├── backend/ # FastAPI app & core logic
├── frontend/ # Chat UI (HTML/CSS/JS)
├── homeworks/ # 5 progressive assignments
│ ├── hw1/ # Prompt engineering
│ ├── hw2/ # Error analysis (with walkthrough)
│ └── hw3/ # LLM-as-Judge
├── annotation/ # Manual annotation tools
├── scripts/ # Utility scripts
├── data/ # Datasets and queries
└── results/ # Evaluation outputs
Environment Variables
Configure your .env file with:
MODEL_NAME: LLM model (e.g.,openai/gpt-4,anthropic/claude-3-haiku-20240307)- API keys:
OPENAI_API_KEY,ANTHROPIC_API_KEY, etc.
See LiteLLM docs for supported providers.
Course Philosophy
This course emphasizes:
- Practical experience over theory
- Systematic evaluation over "vibes"
- Progressive complexity - each homework builds on previous work
- Industry-standard techniques for real-world AI evaluation