litellm

mirror of https://github.com/onyx-dot-app/litellm.git synced 2026-06-30 20:47:56 -04:00

Author	SHA1	Message	Date
YutaSaito	39641e7e68	chore: rename GraySwan to Gray Swan (#15771)	2025-10-21 15:18:55 -07:00
Vinod Singh	d4aadda692	Auth Header Fix for MCP Tool Call (#15736) * fixed the Auth header for MCP Tool Call * Final fix for Auth header * testcase for mcp_auth_header_extraction, insensitive_alias_matching, insensitive_servername_matching added	2025-10-21 13:58:03 -07:00
Krrish Dholakia	1e0368521e	refactor: cleanup	2025-10-21 13:46:19 -07:00
Ishaan Jaffer	3741c43396	docs fix	2025-10-21 13:21:59 -07:00
Ishaan Jaff	8ad9bbbd02	[Docs] Add Azure AI - OCR to docs (#15768) * add Azure OCR to docs * docs fix * docs fix * docs fix * docs OCR	2025-10-21 13:10:45 -07:00
Ishaan Jaffer	185182bebc	Revert "add Azure OCR to docs" This reverts commit `a3699e28a4`.	2025-10-21 13:02:31 -07:00
Ishaan Jaffer	a3699e28a4	add Azure OCR to docs	2025-10-21 13:02:21 -07:00
Ishaan Jaffer	6605aba307	docs grayswan	2025-10-21 11:16:25 -07:00
YutaSaito	d79bdd491f	feat: add GraySwan Guardrails support (#15756)	2025-10-21 11:13:50 -07:00
Talal	46d55bd92a	fix: Add response_type + PKCE parameters to OAuth authorization endpoint (#15720 ) * fix: Add response_type parameter to OAuth authorization endpoint Fixes #15684 OAuth providers like Google require the response_type parameter during the authorization flow. This commit adds response_type=code to the authorization redirect parameters, which is required by the OAuth 2.0 specification (RFC 6749 Section 4.1.1). Changes: - Added response_type=code to authorization params in discoverable_endpoints.py - Added test coverage for the response_type parameter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix oauth flow by forwarding code_challenge and forwarding code_verifier --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-21 09:43:19 -07:00
Tom Haynes	98f1d63508	use correct otel logger, and normalise otel paths (#15645)	2025-10-21 09:16:03 -07:00
Ishaan Jaffer	8b522d88a2	is_llm_api_route	2025-10-20 18:05:35 -07:00
Ishaan Jaff	92335d991c	[Feat] Add Azure AVA (Speech AI) Cost Tracking (#15754) * add azure/speech/ cost tracking * test_azure_ava_tts_async * add azure/speech to model cost map * docs cost tracking * docs tts AVA * add azure/speech/azure-tts	2025-10-20 18:01:51 -07:00
Ishaan Jaffer	60fab591db	rename test files	2025-10-20 18:00:17 -07:00
Ishaan Jaff	157739da01	[Bug]: Fix Incorrect status value in responses api with gemini (#15753) * _map_chat_completion_finish_reason_to_responses_status * test_transform_chat_completion_response_with_reasoning_content * test_transform_chat_completion_response_output_item_status	2025-10-20 17:58:56 -07:00
Ishaan Jaffer	5ce2be732e	get_provider_text_to_speech_config	2025-10-20 17:10:09 -07:00
Ishaan Jaffer	9a25eeccb2	docs fix	2025-10-20 17:02:38 -07:00
Ishaan Jaffer	c9152003bd	bump V	2025-10-20 16:55:03 -07:00
Ishaan Jaff	73a23a6c78	[Feat] Add Azure AVA TTS integration (#15749 ) * add AzureBaseIssueTokenHandler * add BaseTextToSpeechConfig * async_text_to_speech_handler * add AzureAVATextToSpeechConfig * add get_provider_text_to_speech_config * add AzureAVATextToSpeechConfig * fixes for base_llm_http_handler * fix transform_text_to_speech_request * test_azure_ava_tts_async * test_azure_ava_tts_async * fix TextToSpeechRequestData * fix transform_text_to_speech_request * add text_to_speech_handler in LLMHttpHandler * remove old file * fix transform_text_to_speech_request * fix dispatch_text_to_speech * fix azure TTS * fix AVA TTS * fix transform * fix linting * ci/cd - use one job for audio testing * fix tests * fix llm http handler debugging * unit tests azure tts * docs Azure speech * docs fix * docs azure AVA * docs azure AVA * fix handlers * test_async_realtime_uses_max_size_parameter	2025-10-20 16:52:23 -07:00
akraines	41a6ecd5b6	Change max_tokens value to match max_output_tokens for claude sonnet 4.5: 64000 (#15715) See https://github.com/RooCodeInc/Roo-Code/issues/8454	2025-10-20 16:11:36 -07:00
Ishaan Jaff	0c25b1a256	[Fix] OpenAI Realtime API integration fails due to websockets.exceptions.PayloadTooBig error (#15751) * fix REALTIME_WEBSOCKET_MAX_MESSAGE_SIZE_BYTES * edit max_size for websockets * fix AzureOpenAIRealtime	2025-10-20 15:54:14 -07:00
Sameer Kankute	1fb798f81d	(Bug) Fix JSON serialization error in Helicone logging by removing OpenTelemetry span from metadata (#15728 ) * remove span object from helicon metadata * Add test	2025-10-20 08:53:22 -07:00
Sameer Kankute	3955a3de5d	fix the wrong request body in json mode doc (#15729)	2025-10-20 08:44:14 -07:00
Timothée Lecomte	3ef9b2015a	feat: read from custom-llm-provider header (#15528)	2025-10-18 22:04:53 -07:00
jlan-nl	4a74190c12	Fix: Add gpt 4.1 pricing for response endpoint (#15593 ) * Add gpt41, gpt-41-mini, and gpt-41-nano to pricing and context window json * Add gpt-41s to azure_llms dict * Undo json changes --------- Co-authored-by: IQHL (Hans Jacob Landelius) <iqhl@novnordisk.com>	2025-10-18 22:04:14 -07:00
Lucas Sugi	ae86862e74	fix: Add function responsible to call precall (#15636) * fix: Add function responsible to call precall * fix: Set correct route_type	2025-10-18 22:01:14 -07:00
Lucas Sugi	ce9e22688d	fix: Add pre and post call for list batches (#15673)	2025-10-18 21:52:35 -07:00
Ishaan Jaff	f55745fc5e	[Fix] Forward anthropic-beta headers to Bedrock, VertexAI (#15700 ) * [Fix] Forward anthropic-beta headers to Bedrock and other cross-provider scenarios (#15623) * add_provider_specific_headers_to_request * fix add_provider_specific_headers_to_request * test_provider_specific_header_multi_provider * test_provider_specific_header_in_request --------- Co-authored-by: Jack Venberg <jack.venberg@rover.com>	2025-10-18 16:26:32 -07:00
Alexsander Hamir	441aed2c87	fix: update worker recommendation (#15702)	2025-10-18 16:24:32 -07:00
Ishaan Jaffer	3fc49a029f	docs fix	2025-10-18 15:28:49 -07:00
Ishaan Jaffer	eed1ddba49	docs v1.78.5-stable	2025-10-18 15:28:07 -07:00
Ishaan Jaff	6ab9b0af9f	[Fix] Anthropic cache_control incorrectly applied to all content items instead of last item only (#15699 ) * fix: _safe_insert_cache_control_in_message * test_anthropic_cache_control_hook_system_message * docs prompt cache injection * docs fix	2025-10-18 15:18:08 -07:00
Jason Roberts	c471bf1f16	feat(guardrails): Add content masking and streaming support to PANW Prisma AIRS guardrail (#15666 ) * feat(guardrails): Add content masking and streaming support to PANW Prisma AIRS - Add mask_request_content and mask_response_content parameters - Implement content masking for prompts and responses - Add streaming support with real-time masking - Add comprehensive test coverage (28 tests) - Update documentation with masking examples and security notes * fix(guardrails): Fix PANW Prisma AIRS env var fallback and text completion support	2025-10-18 13:57:51 -07:00
YutaSaito	645f84c02e	fix: add imagePullSecrets to migrations-job (#15681)	2025-10-18 13:56:31 -07:00
katsuhiro muto	d5e686b3e8	[Fix] Support service_tier in chat completion (#15693) * Support service_tier * fix test	2025-10-18 13:55:54 -07:00
Krish Dholakia	c1355e92dc	fix(proxy_server.py): re-encrypt env var on config save + use original value on decrypt error (#15671 ) * fix(proxy_server.py): re-encrypt env var on config save + use original value on decrypt error Closes https://github.com/BerriAI/litellm/issues/14854 Fixes https://github.com/BerriAI/litellm/issues/13406 * docs: email.md document PROXY_BASE_URL param * fix(proxy_server.py): pop model list before writing to db	2025-10-18 13:39:25 -07:00
Ishaan Jaffer	a91e3f1873	docs fix	2025-10-18 13:36:48 -07:00
Ishaan Jaff	2ec7ed2990	[Docs] v1.78.5 notes (#15698) * stash changes * docs fix * docs fix	2025-10-18 13:35:42 -07:00
Ishaan Jaffer	c9875bfd52	bump: version 1.78.4 → 1.78.5	2025-10-18 13:31:39 -07:00
Ishaan Jaffer	f35a286f64	fix update_team	2025-10-18 13:23:51 -07:00
Ishaan Jaff	f92ddb1c05	fix: Successfully added rout (#15697)	2025-10-18 13:20:04 -07:00
Krish Dholakia	4e141df03a	(feat) Team level model-specific tpm/rpm limits + working key-level validation of tpm/rpm limit when assigned to team (#15513 ) * fix(support-model-specific-tpm/rpm-limits): Allows setting rate limits by tpm/rpm for models by team * fix(key_management_endpoints.py): enforce guaranteed throughput with key-level model tpm/rpm limits, when team-level tpm/rpm limits are set * test: add unit testing * fix: fix minor linting errors * fix: refactor	2025-10-18 13:14:04 -07:00
Ishaan Jaffer	46d754a0f9	fix workflow	2025-10-18 11:14:18 -07:00
Ishaan Jaff	b1b96ff3cf	[Perf] Alexsander fixes round 2 - Oct 18th (#15695 ) * perf(router): Optimize prompt management model check with early exit Add early return for models without '/' to avoid expensive get_model_list() calls for 99% of standard model requests (gpt-4, claude-3, etc). - Refactor _is_prompt_management_model() with "/" check before model lookup - Add unit tests to verify optimization doesn't break detection * perf(caching): optimize Redis batch cache operations and reduce unnecessary queries This commit introduces several performance optimizations to the Redis caching layer: DualCache Improvements (dual_cache.py): 1. Increase batch cache size limit from 100 to 1000 - Allows for larger batch operations, reducing Redis round-trips 2. Throttle repeated Redis queries for cache misses - Update last_redis_batch_access_time for ALL queried keys, including those with None values - Prevents excessive Redis queries for frequently-accessed non-existent keys 3. Add early exit optimization - Short-circuit when redis_result is None or contains only None values - Avoids unnecessary processing when no cache hits are found 4. Optimize key lookup performance - Replace O(n) keys.index() calls with O(1) dict lookup via key_to_index mapping - Reduces algorithmic complexity in batch operations 5. Streamline cache updates - Combine result updates and in-memory cache updates in single loop - Only cache non-None values to avoid polluting in-memory cache CooldownCache Improvements (cooldown_cache.py): 1. 
Enhanced early return logic - Check if all values in results are None, not just if results is None - Prevents unnecessary iteration when no valid cooldown data exists These changes significantly improve Redis caching performance, especially for: - High-throughput batch operations - Scenarios with frequent cache misses - Large-scale deployments with many concurrent requests * fix: remove unnecessary test * refactor: move default_max_redis_batch_cache_size to constants - Add DEFAULT_MAX_REDIS_BATCH_CACHE_SIZE constant (default: 1000) - Update DualCache to use constant from constants.py - Document new environment variable in config_settings.md * fix: only use in memory cache when set * fix(router): improve prompt management model detection with smart early return The previous early return optimization in _is_prompt_management_model() was checking if the model name parameter contained '/' and returning False if it didn't. This broke detection for model aliases (e.g., 'chatbot_actions') that don't have '/' in their name but map to prompt management models (e.g., 'langfuse/openai-gpt-3.5-turbo'). Changed the early return logic to only exit early when: - Model name contains '/' AND - The prefix is NOT a known prompt management provider This maintains the performance optimization for 99% of direct model calls (avoiding expensive get_model_list lookups) while correctly handling: - Direct prompt management calls (e.g., 'langfuse/model') - Model aliases without '/' (e.g., 'chatbot_actions') - Regular models with/without '/' (e.g., 'gpt-3.5-turbo', 'openai/gpt-4') Fixes test: test_router_prompt_management_factory * perf(router): optimize _pre_call_checks with shallow copy (1400x faster) Replace deepcopy with list() in _pre_call_checks - runs on every request. Only pops from list, never modifies deployment dicts, so shallow copy is safe. 
Performance: 1400x faster on hot path Impact: 2-5x overall throughput improvement for routing workloads Tests: Added regression test to ensure no mutation + filtering works * perf(router): replace deepcopy with shallow copy for default deployment Replace expensive copy.deepcopy() with shallow copy for default_deployment in _common_checks_available_deployment() hot path. Changes: - Use dict.copy() for top-level deployment dict - Use dict.copy() for nested litellm_params dict - Only the 'model' field is modified, so deep recursion is unnecessary Impact: - 100x+ faster for default deployment path (every request when used) - deepcopy recursively traverses entire object tree - Shallow copy only copies two dict levels (exactly what's needed) Test coverage: - Added regression test to verify deployment isolation - Ensures returned deployments don't mutate original default_deployment - Validates multiple concurrent requests get independent copies * perf(router): remove unnecessary dict copy in completion hot paths Remove unnecessary deployment['litellm_params'].copy() in _completion and _acompletion functions. The dict is only read and spread into a new dict, never modified, making the defensive copy wasteful. Changes: - Remove .copy() in _completion (sync hot path) - Remove .copy() in _acompletion (async hot path) Impact: - Every completion request (highest traffic endpoints) - Eliminates unnecessary dict allocation and copy on every call - Dict spreading already creates new dict, so no mutation possible Test coverage: - Added tests verifying deployment params unchanged after calls - Tests both sync and async completion paths - Validates optimization doesn't introduce mutations * perf(router): optimize deployment filtering in pre-call checks Replace O(n²) list pop pattern with O(n) set-based filtering in _pre_call_checks() to improve routing performance under high load. 
Changes: - Use set() instead of list for invalid_model_indices tracking - Replace reversed list.pop() loop with single-pass list comprehension - Eliminate redundant list→set conversion overhead Impact: - Hot path optimization: runs on every request through the router - ~2-5x faster filtering when many deployments fail validation - Most beneficial with 50+ deployments per model group or high invalidation rates (rate limits, context window exceeded) Technical details: Old: O(k²) where k = invalid deployments (pop shifts remaining elements) New: O(n) single pass with O(1) set membership checks * add: memory profiler feat(proxy): Add configurable GC thresholds and enhance memory debugging endpoints - Add PYTHON_GC_THRESHOLD env var to configure garbage collection thresholds - Add POST /debug/memory/gc/configure endpoint for runtime GC tuning - Enhance memory debugging endpoints with better structure and explanations - Add comprehensive router and cache memory tracking - Include worker PID in all debug responses for multi-worker debugging * refactor: reduce complexity in get_memory_details endpoint Extract 6 helper functions from get_memory_details to fix linter error PLR0915 (too many statements). Improves maintainability while preserving functionality. * fix(router): remove incorrect early exit in _is_prompt_management_model Removes early exit optimization that checked model_name prefix instead of the actual litellm_params model. This incorrectly returned False for custom model aliases that map to prompt management providers. Example: "my-langfuse-prompt/test_id" -> "langfuse_prompt/actual_id" The method now correctly checks the underlying model's prefix. Fixes test_is_prompt_management_model_optimization * fix(proxy): add explicit type annotations to debug_utils dictionaries Resolved 6 mypy type errors in proxy/common_utils/debug_utils.py by adding explicit Dict[str, Any] annotations to dictionary variables where mypy was incorrectly inferring narrow types. 
This allows the dictionaries to accept different value types (strings, nested dicts) for error handling and various return structures. Fixed: - Line 246: caches dictionary in get_memory_summary() - Line 371: cache_stats dictionary in _get_cache_memory_stats() - Line 439: litellm_router_memory dictionary in _get_router_memory_stats() * fix(proxy): fix Python 3.8 compatibility in debug_utils type annotations - Replace tuple[...], list[...] with Tuple[...], List[...] from typing - Replace Dict \| None with Optional[Dict] for Python 3.8 compatibility - Add missing imports: List, Optional, Tuple to typing imports Fixes TypeError: 'type' object is not subscriptable in Python 3.8 --------- Co-authored-by: AlexsanderHamir <alexsanderhamirgomesbaptista@gmail.com>	2025-10-18 11:12:00 -07:00
Krrish Dholakia	68d4f69a17	build(ui/): new ui build	2025-10-18 10:57:20 -07:00
Krish Dholakia	302f55c7db	Bedrock + MCP - working MCP calls to bedrock via Responses API + Log hidden params for OTEL calls (#15677 ) * fix: minor fixes to mcp streaming with bedrock * fix(bedrock/): working bedrock with mcp tools handle empty description * test: add unit test * test: test fixes * fix(vector_store_registry.py): load vector store with litellm params from config.yaml fixes minor issue where litellm params weren't being loaded in from config.yaml * docs(knowledgebase.md): document azure vector store current limitation * fix(opentelemetry.py): add hidden params to otel logs Fixes LIT-1274 * fix: fix test	2025-10-18 10:39:28 -07:00
Krrish Dholakia	bbb7b75672	docs: sidebars.js add vertex ai live api to docs	2025-10-17 23:55:44 -07:00
yuneng-jiang	8d1d400d32	litellm_Key Settings Max Budget Removal Error Fix (#15669 ) * Key Settings Max Budget Removal Fix * Add responses mode to health check * test fix * Key Settings Max Budget Removal Error Fix --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>	2025-10-17 19:41:29 -07:00
Nagailic Sergiu (Nikro)	6842d705d5	fix(token-counter): extract model_info from deployment for custom_tokenizer (#15657) (#15680)	2025-10-17 19:38:45 -07:00
Teddy Amkie	21c3720732	docs: improve budget clarity (#15682 ) * docs: improve budget documentation with clear setup options * Update user budget documentation Clarify budget application rules for team keys. --------- Co-authored-by: berri-teddy <teddy@berri.ai>	2025-10-17 19:38:04 -07:00
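
Commit b1b96ff3cf describes replacing a reversed `list.pop()` loop with a single-pass, set-based filter in the router's pre-call checks. The following is a minimal sketch of that general technique, not the actual litellm code: the function name `filter_valid_deployments` and the sample data are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Set


def filter_valid_deployments(
    deployments: List[Dict[str, Any]], invalid_indices: Set[int]
) -> List[Dict[str, Any]]:
    """Keep deployments whose index is not flagged invalid.

    One pass over the list with O(1) set membership checks, instead of
    popping flagged indices one by one (each pop shifts later elements).
    """
    return [d for i, d in enumerate(deployments) if i not in invalid_indices]


# Hypothetical data: indices 1 and 3 failed validation.
deployments = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]
invalid = {1, 3}
healthy = filter_valid_deployments(deployments, invalid)
```

The comprehension builds a new list, so the input list keeps its original length and order.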


26683 Commits
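
The same commit (b1b96ff3cf) also mentions swapping `copy.deepcopy()` for two `dict.copy()` calls when only the nested `model` field changes. A minimal sketch of that idea follows, assuming a deployment shaped as a dict with a nested `litellm_params` dict. The helper name `copy_default_deployment` and the sample values are hypothetical, not litellm's own.

```python
from typing import Any, Dict


def copy_default_deployment(
    default_deployment: Dict[str, Any], model: str
) -> Dict[str, Any]:
    """Copy only the two dict levels that are about to be modified."""
    deployment = default_deployment.copy()  # top-level dict
    # Copy the nested dict too, so setting 'model' does not leak back.
    deployment["litellm_params"] = default_deployment["litellm_params"].copy()
    deployment["litellm_params"]["model"] = model
    return deployment


# Hypothetical default deployment.
default_deployment = {
    "model_name": "*",
    "litellm_params": {"model": "openai/*", "api_key": "sk-placeholder"},
}
result = copy_default_deployment(default_deployment, "openai/gpt-4o")
```

This is only safe while nothing deeper than `litellm_params` is mutated. Values nested further down are still shared between the copy and the original.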