# Open WebUI Benchmark Suite A comprehensive benchmarking framework for testing Open WebUI performance under various load conditions. ## Overview This benchmark suite is designed to: 1. **Measure concurrent user capacity** - Test how many users can simultaneously use features like Channels 2. **Identify performance limits** - Find the point where response times degrade 3. **Compare compute profiles** - Test performance across different resource configurations 4. **Generate actionable reports** - Provide detailed metrics and recommendations ## Quick Start ### Prerequisites - Python 3.11+ - Docker and Docker Compose - A running Open WebUI instance (or use the provided Docker setup) - Chromium browser (installed automatically via Playwright for UI benchmarks) ### Installation ```bash cd benchmark python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate pip install -e . # Install Playwright browsers (required for UI benchmarks) playwright install chromium ``` ### Configuration Copy the example environment file and configure your admin credentials: ```bash cp .env.example .env ``` Edit `.env` with your Open WebUI admin credentials: ```dotenv OPEN_WEBUI_URL=http://localhost:8080 ADMIN_USER_EMAIL=your-admin@example.com ADMIN_USER_PASSWORD=your-password ``` ### Running Benchmarks 1. **Start Open WebUI with benchmark configuration:** ```bash cd docker ./run.sh default # Use the default compute profile (2 CPU, 8GB RAM) ``` 2. **Run the benchmark:** ```bash # Run the default benchmark (chat-ui with auto-scaling), which automatically finds max sustainable users based on P95 response time owb run # Set a custom response time threshold (default: 1000ms) owb run --response-threshold 2000 # Run with a fixed number of users (disables auto-scaling) owb run -m 50 # Run with visible browsers for debugging owb run --headed owb run --headed --slow-mo 500 # Slow down for visual inspection ``` 3. **List available benchmarks:** ```bash owb list ``` 4. **Run other benchmarks:** ```bash # API-based chat benchmark (no browser) owb run chat-api -m 50 # Channel API concurrency owb run channels-api -m 50 # Channel WebSocket benchmark owb run channels-ws -m 50 # Run all benchmarks owb run all ``` 5. **View results:** Results are organized by benchmark name and timestamp: ``` results/ └── chat_ui_concurrency/ └── 20260126_014205/ ├── result.json # Detailed benchmark data ├── results.csv # Tabular results └── summary.txt # Human-readable summary ``` ## Compute Profiles Compute profiles define the resource constraints for the Open WebUI container: | Profile | CPUs | Memory | Use Case | |---------|------|--------|----------| | `default` | 2 | 8GB | Local MacBook testing | | `minimal` | 1 | 4GB | Testing lower bounds | | `cloud_small` | 2 | 4GB | Small cloud VM | | `cloud_medium` | 4 | 8GB | Medium cloud VM | | `cloud_large` | 8 | 16GB | Large cloud VM | List available profiles: ```bash owb profiles ``` ## Available Benchmarks ### Chat API Concurrency (`chat-api`) Tests concurrent AI chat performance via the OpenAI-compatible API: - Creates test users and makes a model publicly available - Each user sends chat requests via the `/api/chat` endpoint - Measures response times, throughput, and error rates - Tests the backend's ability to handle concurrent LLM requests **Usage:** ```bash owb run chat-api -m 50 --model gpt-4o-mini ``` ### Chat UI Concurrency (`chat-ui`) - Default Tests concurrent AI chat performance through actual browser UI using Playwright. **This is the default benchmark** and runs in auto-scale mode by default. **Auto-scale mode (default):** - Progressively adds users until P95 response time exceeds threshold - Automatically finds maximum sustainable concurrent users - Reports performance at each level tested **Fixed mode:** - Test with a specific number of concurrent users - Enabled by specifying `--max-users` / `-m` **How it works:** - Launches real Chromium browser instances (or contexts) - Each browser logs in as a different user - Sends chat messages and waits for streaming responses - Measures actual user-experienced response times including rendering - Tests full stack performance: UI, backend, and LLM together **Usage:** ```bash # Auto-scale mode (default) - finds max sustainable users owb run owb run --response-threshold 2000 # Custom threshold (default: 1000ms) # Fixed mode - test specific user count owb run -m 50 owb run -m 50 --model gpt-4o-mini # Debugging options owb run --headed # Visible browsers owb run --headed --slow-mo 500 # Slow down for inspection ``` **Configuration:** ```yaml chat_ui: headless: true # Run browsers in headless mode slow_mo: 0 # Slow down operations by ms (debugging) viewport_width: 1280 # Browser viewport width viewport_height: 720 # Browser viewport height browser_timeout: 30000 # Default timeout in ms screenshot_on_error: true # Capture screenshots on failure use_isolated_browsers: false # Use separate browser instances vs contexts ``` **Notes:** - Browser benchmarks require more resources than API benchmarks - For high concurrency (50+), use headless mode and browser contexts - Headed mode is useful for debugging UI issues - The benchmark measures actual streaming response detection ### Channel Concurrency (`channels-api`) Tests concurrent user capacity in Open WebUI Channels: - Creates a test channel - Progressively adds users (10, 20, 30, ... up to max) - Each user sends messages at a configured rate - Measures response times and error rates - Identifies the maximum sustainable user count **Configuration options:** ```yaml channels: max_concurrent_users: 100 # Maximum users to test user_step_size: 10 # Increment users by this amount sustain_time: 30 # Seconds to run at each level message_frequency: 0.5 # Messages per second per user ``` ### Channel WebSocket (`channels-ws`) Tests WebSocket scalability for real-time message delivery in Channels: - Establishes WebSocket connections for multiple users - Tests real-time message broadcasting - Measures message delivery latency - Identifies WebSocket connection limits ## Configuration Configuration files are located in `config/`: - `benchmark_config.yaml` - Main benchmark settings - `compute_profiles.yaml` - Resource profiles for Docker containers ### Environment Variables All configuration can be set via environment variables (loaded from `.env` file): | Variable | Description | Default | |----------|-------------|---------| | `OPEN_WEBUI_URL` | Open WebUI URL for benchmarking | `http://localhost:8080` | | Variable | Description | Default | |----------|-------------|---------| | `OPEN_WEBUI_URL` | Open WebUI URL | `http://localhost:8080` | | `OLLAMA_BASE_URL` | Ollama API URL | `http://host.docker.internal:11434` | | `ENABLE_CHANNELS` | Enable Channels feature | `true` | | `ADMIN_USER_EMAIL` | Admin email | - | | `ADMIN_USER_PASSWORD` | Admin password | - | | `MAX_CONCURRENT_USERS` | Max concurrent users | `50` | | `USER_STEP_SIZE` | User increment step | `10` | | `SUSTAIN_TIME_SECONDS` | Test duration per level | `30` | | `MESSAGE_FREQUENCY` | Messages/sec per user | `0.5` | | `OPEN_WEBUI_PORT` | Container port | `8080` | | `CPU_LIMIT` | CPU limit | `2.0` | | `MEMORY_LIMIT` | Memory limit | `8g` | 1. Create a new file in `benchmark/scenarios/`: ```python from benchmark.core.base import BaseBenchmark from benchmark.core.metrics import BenchmarkResult class MyNewBenchmark(BaseBenchmark): name = "My New Benchmark" description = "Tests something new" version = "1.0.0" async def setup(self) -> None: # Set up test environment pass async def run(self) -> BenchmarkResult: # Execute the benchmark # Use self.metrics to record timings return self.metrics.get_result(self.name) async def teardown(self) -> None: # Clean up pass ``` 2. Register the benchmark in `benchmark/cli.py` 3. Add configuration options if needed in `config/benchmark_config.yaml` ### Custom Metrics Collection ```python from benchmark.core.metrics import MetricsCollector metrics = MetricsCollector() metrics.start() # Time individual operations with metrics.time_operation("my_operation"): await do_something() # Or record manually metrics.record_timing( operation="api_call", duration_ms=150.5, success=True, ) metrics.stop() result = metrics.get_result("My Benchmark") ``` ## Understanding Results ### Key Metrics | Metric | Description | Good Threshold | |--------|-------------|----------------| | `avg_response_time_ms` | Average response time | < 2000ms | | `p95_response_time_ms` | 95th percentile response time | < 3000ms | | `error_rate_percent` | Percentage of failed requests | < 1% | | `requests_per_second` | Throughput | > 10 | ### Result Files - `*.json` - Detailed results for each benchmark run - `benchmark_results_*.csv` - Combined results in CSV format - `summary_*.txt` - Human-readable summary ### Interpreting Chat UI Benchmark Results The chat-ui benchmark in auto-scale mode reports: - **max_sustainable_users**: Maximum users where P95 stays under threshold - **levels_tested**: Performance data at each user count level - **% of Threshold**: How close P95 is to the configured limit Example auto-scale result: ``` Auto-Scale Results ┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┃ Users ┃ P95 (ms) ┃ Avg (ms) ┃ % of Threshold ┃ Errors ┃ ┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━┩ │ 10 │ 731 │ 662 │ 37% │ 0.0% │ │ 30 │ 881 │ 748 │ 44% │ 0.0% │ │ 50 │ 1178 │ 1064 │ 59% │ 0.0% │ │ 70 │ 2133 │ 1854 │ 107% │ 0.8% │ └───────┴──────────┴──────────┴────────────────┴────────┘ P95 Threshold: 2000ms Maximum Sustainable Users: 50 ``` ### Interpreting Channel Benchmark Results The channel benchmark reports: - **max_sustainable_users**: Maximum users where performance thresholds are met - **results_by_level**: Performance at each user count level - **tested_levels**: All user counts that were tested Example result analysis: ``` Users: 10 | P95: 150ms | Errors: 0% | ✓ PASS Users: 20 | P95: 280ms | Errors: 0.1% | ✓ PASS Users: 30 | P95: 520ms | Errors: 0.3% | ✓ PASS Users: 40 | P95: 1200ms | Errors: 0.8% | ✓ PASS Users: 50 | P95: 3500ms | Errors: 2.1% | ✗ FAIL Maximum sustainable users: 40 ``` ## Architecture ``` benchmark/ ├── benchmark/ │ ├── core/ # Core framework │ │ ├── base.py # Base benchmark class │ │ ├── config.py # Configuration management │ │ ├── metrics.py # Metrics collection │ │ └── runner.py # Benchmark orchestration │ ├── clients/ # API clients │ │ ├── http_client.py # HTTP/REST client │ │ ├── websocket_client.py # WebSocket client │ │ └── browser_client.py # Playwright browser automation │ ├── scenarios/ # Benchmark implementations │ │ ├── channels.py # Channel benchmarks │ │ └── chat_ui.py # Browser-based chat benchmark │ ├── utils/ # Utilities │ │ └── docker.py # Docker management │ └── cli.py # Command-line interface ├── config/ # Configuration files ├── docker/ # Docker Compose for benchmarking └── results/ # Benchmark output organized by {benchmark}/{timestamp}/ ``` ## Dependencies The benchmark suite reuses Open WebUI dependencies where possible: **From Open WebUI:** - `httpx` - HTTP client - `aiohttp` - Async HTTP - `python-socketio` - WebSocket client - `pydantic` - Data validation - `pandas` - Data analysis **Benchmark-specific:** - `playwright` - Browser automation for UI testing - `locust` - Load testing (optional, for advanced scenarios) - `rich` - Terminal output - `docker` - Docker SDK - `matplotlib` - Plotting results ## Troubleshooting ### Common Issues 1. **Connection refused**: Ensure Open WebUI is running and accessible 2. **Authentication errors**: Check admin credentials in config 3. **Docker resource errors**: Ensure Docker has enough resources allocated 4. **WebSocket timeout**: Increase `websocket_timeout` in config 5. **Browser launch failures**: Run `playwright install chromium` to install browsers 6. **Login timeout in browser tests**: Check that `.env` has correct `ADMIN_USER_NAME` (with quotes if it contains spaces) 7. **High browser concurrency fails**: Use `--headless` mode and ensure sufficient system resources ### Debug Mode Set logging level to DEBUG: ```bash export BENCHMARK_LOG_LEVEL=DEBUG owb run channels ``` ## Contributing When adding new benchmarks: 1. Follow the `BaseBenchmark` interface 2. Add tests for the new benchmark 3. Update configuration schema if needed 4. Add documentation to this README ## License MIT License - See LICENSE file