4538 Commits

Author SHA1 Message Date
YutaSaito 39641e7e68 chore: rename GraySwan to Gray Swan (#15771) 2025-10-21 15:18:55 -07:00
Krrish Dholakia 1e0368521e refactor: cleanup 2025-10-21 13:46:19 -07:00
Ishaan Jaffer 3741c43396 docs fix 2025-10-21 13:21:59 -07:00
Ishaan Jaff 8ad9bbbd02 [Docs] Add Azure AI - OCR to docs (#15768)
* add Azure OCR to docs

* docs fix

* docs fix

* docs fix

* docs OCR
2025-10-21 13:10:45 -07:00
Ishaan Jaffer 185182bebc Revert "add Azure OCR to docs"
This reverts commit a3699e28a4.
2025-10-21 13:02:31 -07:00
Ishaan Jaffer a3699e28a4 add Azure OCR to docs 2025-10-21 13:02:21 -07:00
Ishaan Jaffer 6605aba307 docs grayswan 2025-10-21 11:16:25 -07:00
YutaSaito d79bdd491f feat: add GraySwan Guardrails support (#15756) 2025-10-21 11:13:50 -07:00
Ishaan Jaff 92335d991c [Feat] Add Azure AVA (Speech AI) Cost Tracking (#15754)
* add azure/speech/ cost tracking

* test_azure_ava_tts_async

* add azure/speech to model cost map

* docs cost tracking

* docs tts AVA

* add azure/speech/azure-tts
2025-10-20 18:01:51 -07:00
Ishaan Jaffer 9a25eeccb2 docs fix 2025-10-20 17:02:38 -07:00
Ishaan Jaff 73a23a6c78 [Feat] Add Azure AVA TTS integration (#15749)
* add AzureBaseIssueTokenHandler

* add BaseTextToSpeechConfig

* async_text_to_speech_handler

* add AzureAVATextToSpeechConfig

* add get_provider_text_to_speech_config

* add AzureAVATextToSpeechConfig

* fixes for base_llm_http_handler

* fix transform_text_to_speech_request

* test_azure_ava_tts_async

* test_azure_ava_tts_async

* fix TextToSpeechRequestData

* fix transform_text_to_speech_request

* add text_to_speech_handler in LLMHttpHandler

* remove old file

* fix transform_text_to_speech_request

* fix dispatch_text_to_speech

* fix azure TTS

* fix AVA TTS

* fix transform

* fix linting

* ci/cd - use one job for audio testing

* fix tests

* fix llm http handler debugging

* unit tests azure tts

* docs Azure speech

* docs fix

* docs azure AVA

* docs azure AVA

* fix handlers

* test_async_realtime_uses_max_size_parameter
2025-10-20 16:52:23 -07:00
Sameer Kankute 3955a3de5d fix the wrong request body in json mode doc (#15729) 2025-10-20 08:44:14 -07:00
Alexsander Hamir 441aed2c87 fix: update worker recommendation (#15702) 2025-10-18 16:24:32 -07:00
Ishaan Jaffer 3fc49a029f docs fix 2025-10-18 15:28:49 -07:00
Ishaan Jaffer eed1ddba49 docs v1.78.5-stable 2025-10-18 15:28:07 -07:00
Ishaan Jaff 6ab9b0af9f [Fix] Anthropic cache_control incorrectly applied to all content items instead of last item only (#15699)
* fix: _safe_insert_cache_control_in_message

* test_anthropic_cache_control_hook_system_message

* docs prompt cache injection

* docs fix
2025-10-18 15:18:08 -07:00
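The cache_control fix in 6ab9b0af9f can be pictured with a minimal sketch. It assumes Anthropic-style messages whose `content` is a list of blocks; `mark_last_item_cacheable` is a hypothetical helper for illustration, not the `_safe_insert_cache_control_in_message` function named in the commit.

```python
from typing import Any, Dict


def mark_last_item_cacheable(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of message with cache_control set on the last content item only."""
    content = message.get("content")
    # Hypothetical simplification: only list-style content is handled here.
    if not isinstance(content, list) or not content:
        return message
    # Copy each block so the caller's message is not mutated.
    new_content = [dict(item) for item in content]
    new_content[-1]["cache_control"] = {"type": "ephemeral"}
    return {**message, "content": new_content}
```

Marking only the final block matches the behaviour described in the commit title, where earlier blocks are left without a `cache_control` entry.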
Jason Roberts c471bf1f16 feat(guardrails): Add content masking and streaming support to PANW Prisma AIRS guardrail (#15666)
* feat(guardrails): Add content masking and streaming support to PANW Prisma AIRS

- Add mask_request_content and mask_response_content parameters
- Implement content masking for prompts and responses
- Add streaming support with real-time masking
- Add comprehensive test coverage (28 tests)
- Update documentation with masking examples and security notes

* fix(guardrails): Fix PANW Prisma AIRS env var fallback and text completion support
2025-10-18 13:57:51 -07:00
Krish Dholakia c1355e92dc fix(proxy_server.py): re-encrypt env var on config save + use original value on decrypt error (#15671)
* fix(proxy_server.py): re-encrypt env var on config save + use original value on decrypt error

Closes https://github.com/BerriAI/litellm/issues/14854

Fixes https://github.com/BerriAI/litellm/issues/13406

* docs: email.md

document PROXY_BASE_URL param

* fix(proxy_server.py): pop model list before writing to db
2025-10-18 13:39:25 -07:00
Ishaan Jaffer a91e3f1873 docs fix 2025-10-18 13:36:48 -07:00
Ishaan Jaff 2ec7ed2990 [Docs] v1.78.5 notes (#15698)
* stash changes

* docs fix

* docs fix
2025-10-18 13:35:42 -07:00
Ishaan Jaff b1b96ff3cf [Perf] Alexsander fixes round 2 - Oct 18th (#15695)
* perf(router): Optimize prompt management model check with early exit

Add early return for models without '/' to avoid expensive get_model_list()
calls for 99% of standard model requests (gpt-4, claude-3, etc).

- Refactor _is_prompt_management_model() with "/" check before model lookup
- Add unit tests to verify optimization doesn't break detection

* perf(caching): optimize Redis batch cache operations and reduce unnecessary queries

This commit introduces several performance optimizations to the Redis caching layer:

**DualCache Improvements (dual_cache.py):**

1. Increase batch cache size limit from 100 to 1000
   - Allows for larger batch operations, reducing Redis round-trips

2. Throttle repeated Redis queries for cache misses
   - Update last_redis_batch_access_time for ALL queried keys, including those
     with None values
   - Prevents excessive Redis queries for frequently-accessed non-existent keys

3. Add early exit optimization
   - Short-circuit when redis_result is None or contains only None values
   - Avoids unnecessary processing when no cache hits are found

4. Optimize key lookup performance
   - Replace O(n) keys.index() calls with O(1) dict lookup via key_to_index mapping
   - Reduces algorithmic complexity in batch operations

5. Streamline cache updates
   - Combine result updates and in-memory cache updates in single loop
   - Only cache non-None values to avoid polluting in-memory cache

**CooldownCache Improvements (cooldown_cache.py):**

1. Enhanced early return logic
   - Check if all values in results are None, not just if results is None
   - Prevents unnecessary iteration when no valid cooldown data exists

These changes significantly improve Redis caching performance, especially for:
- High-throughput batch operations
- Scenarios with frequent cache misses
- Large-scale deployments with many concurrent requests
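
The key lookup and early exit changes listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code in dual_cache.py: `merge_batch_results` and its parameters are hypothetical names, and the keys are assumed to be unique.

```python
from typing import Any, Dict, List, Optional


def merge_batch_results(
    keys: List[str],
    local_hits: List[Optional[Any]],
    redis_hits: Dict[str, Optional[Any]],
) -> List[Optional[Any]]:
    """Fill gaps in local_hits with values fetched from a second-tier cache."""
    result = list(local_hits)
    # Early exit: nothing came back, or every returned value is a miss.
    if not redis_hits or all(v is None for v in redis_hits.values()):
        return result
    # Build the position map once instead of calling keys.index() per key.
    key_to_index = {key: i for i, key in enumerate(keys)}
    for key, value in redis_hits.items():
        if value is None:
            continue  # misses are not written back
        result[key_to_index[key]] = value
    return result
```

Each `keys.index()` call scans the list, so a batch of n keys costs O(n) per lookup; the dict is built once and answers each lookup in constant time.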

* fix: remove unnecessary test

* refactor: move default_max_redis_batch_cache_size to constants

- Add DEFAULT_MAX_REDIS_BATCH_CACHE_SIZE constant (default: 1000)
- Update DualCache to use constant from constants.py
- Document new environment variable in config_settings.md

* fix: only use in memory cache when set

* fix(router): improve prompt management model detection with smart early return

The previous early return optimization in _is_prompt_management_model() was
checking if the model name parameter contained '/' and returning False if it
didn't. This broke detection for model aliases (e.g., 'chatbot_actions') that
don't have '/' in their name but map to prompt management models
(e.g., 'langfuse/openai-gpt-3.5-turbo').

Changed the early return logic to only exit early when:
- Model name contains '/' AND
- The prefix is NOT a known prompt management provider

This maintains the performance optimization for 99% of direct model calls
(avoiding expensive get_model_list lookups) while correctly handling:
- Direct prompt management calls (e.g., 'langfuse/model')
- Model aliases without '/' (e.g., 'chatbot_actions')
- Regular models with/without '/' (e.g., 'gpt-3.5-turbo', 'openai/gpt-4')

Fixes test: test_router_prompt_management_factory

* perf(router): optimize _pre_call_checks with shallow copy (1400x faster)

Replace deepcopy with list() in _pre_call_checks - runs on every request.
Only pops from list, never modifies deployment dicts, so shallow copy is safe.

Performance: 1400x faster on hot path
Impact: 2-5x overall throughput improvement for routing workloads
Tests: Added regression test to ensure no mutation + filtering works
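
Why a shallow copy is enough here can be shown with a small sketch. The deployment dicts below are made-up sample data, and the snippet only illustrates the copy semantics, not the router's actual code.

```python
import copy

# Made-up sample data standing in for a router's deployment list.
deployments = [
    {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}},
]

deep = copy.deepcopy(deployments)  # clones the list and every nested dict
shallow = list(deployments)        # new list object, same dict objects

# Popping from the shallow copy changes only the copy's length.
shallow.pop(0)
```

As long as the code only removes entries from the copied list and never writes into the dicts, the original list and its elements stay as they were, and the per-request cost of recursively cloning every deployment is avoided.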

* perf(router): replace deepcopy with shallow copy for default deployment

Replace expensive copy.deepcopy() with shallow copy for default_deployment
in _common_checks_available_deployment() hot path.

Changes:
- Use dict.copy() for top-level deployment dict
- Use dict.copy() for nested litellm_params dict
- Only the 'model' field is modified, so deep recursion is unnecessary

Impact:
- 100x+ faster for default deployment path (every request when used)
- deepcopy recursively traverses entire object tree
- Shallow copy only copies two dict levels (exactly what's needed)

Test coverage:
- Added regression test to verify deployment isolation
- Ensures returned deployments don't mutate original default_deployment
- Validates multiple concurrent requests get independent copies
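
The two-level copy described above might look like the sketch below. `with_model` is a hypothetical helper name, and the assumption is that only `litellm_params["model"]` is ever overwritten.

```python
from typing import Any, Dict


def with_model(default_deployment: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Return a per-request deployment without mutating the shared default."""
    updated = default_deployment.copy()  # copy the top-level dict
    # Copy the one nested dict that is about to be modified.
    updated["litellm_params"] = default_deployment["litellm_params"].copy()
    updated["litellm_params"]["model"] = model
    return updated
```

Any other nested values remain shared between the copies, which is safe only while they are treated as read-only.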

* perf(router): remove unnecessary dict copy in completion hot paths

Remove unnecessary deployment['litellm_params'].copy() in _completion
and _acompletion functions. The dict is only read and spread into a new
dict, never modified, making the defensive copy wasteful.

Changes:
- Remove .copy() in _completion (sync hot path)
- Remove .copy() in _acompletion (async hot path)

Impact:
- Every completion request (highest traffic endpoints)
- Eliminates unnecessary dict allocation and copy on every call
- Dict spreading already creates new dict, so no mutation possible

Test coverage:
- Added tests verifying deployment params unchanged after calls
- Tests both sync and async completion paths
- Validates optimization doesn't introduce mutations

* perf(router): optimize deployment filtering in pre-call checks

Replace O(n²) list pop pattern with O(n) set-based filtering in
_pre_call_checks() to improve routing performance under high load.

Changes:
- Use set() instead of list for invalid_model_indices tracking
- Replace reversed list.pop() loop with single-pass list comprehension
- Eliminate redundant list→set conversion overhead

Impact:
- Hot path optimization: runs on every request through the router
- ~2-5x faster filtering when many deployments fail validation
- Most beneficial with 50+ deployments per model group or high
  invalidation rates (rate limits, context window exceeded)

Technical details:
Old: O(k²) where k = invalid deployments (pop shifts remaining elements)
New: O(n) single pass with O(1) set membership checks
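
The old and new filtering patterns can be compared in a short sketch. Both function names are hypothetical and the deployments are treated as opaque items.

```python
from typing import Any, List, Set


def drop_invalid_with_pop(deployments: List[Any], invalid_indices: List[int]) -> List[Any]:
    """Old pattern: pop from the back so earlier indices stay valid."""
    remaining = list(deployments)
    for i in sorted(invalid_indices, reverse=True):
        remaining.pop(i)  # each pop shifts every later element
    return remaining


def drop_invalid_with_set(deployments: List[Any], invalid_indices: Set[int]) -> List[Any]:
    """New pattern: one pass with constant-time set membership checks."""
    return [d for i, d in enumerate(deployments) if i not in invalid_indices]
```

Both return the same list; the comprehension avoids the repeated element shifting that `list.pop(i)` performs.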

* add: memory profiler

feat(proxy): Add configurable GC thresholds and enhance memory debugging endpoints

- Add PYTHON_GC_THRESHOLD env var to configure garbage collection thresholds
- Add POST /debug/memory/gc/configure endpoint for runtime GC tuning
- Enhance memory debugging endpoints with better structure and explanations
- Add comprehensive router and cache memory tracking
- Include worker PID in all debug responses for multi-worker debugging
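
A rough sketch of applying GC thresholds from a string follows. The comma-separated value format is an assumption (the commit does not state how PYTHON_GC_THRESHOLD is parsed), and both helper names are hypothetical; `gc.set_threshold` and `gc.get_threshold` are the standard library calls.

```python
import gc
from typing import Tuple


def parse_gc_threshold(raw: str) -> Tuple[int, ...]:
    """Parse a value such as "1000,50,50" (assumed format) into integers."""
    parts = tuple(int(p.strip()) for p in raw.split(",") if p.strip())
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 integers, got {raw!r}")
    return parts


def apply_gc_threshold(raw: str) -> Tuple[int, int, int]:
    """Apply the parsed thresholds and return the values now in effect."""
    gc.set_threshold(*parse_gc_threshold(raw))
    return gc.get_threshold()
```

In a proxy process the raw string would presumably come from `os.environ`; that wiring is left out here.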

* refactor: reduce complexity in get_memory_details endpoint

Extract 6 helper functions from get_memory_details to fix linter
error PLR0915 (too many statements). Improves maintainability
while preserving functionality.

* fix(router): remove incorrect early exit in _is_prompt_management_model

Removes early exit optimization that checked model_name prefix instead
of the actual litellm_params model. This incorrectly returned False for
custom model aliases that map to prompt management providers.

Example: "my-langfuse-prompt/test_id" -> "langfuse_prompt/actual_id"

The method now correctly checks the underlying model's prefix.

Fixes test_is_prompt_management_model_optimization

* fix(proxy): add explicit type annotations to debug_utils dictionaries

Resolved 6 mypy type errors in proxy/common_utils/debug_utils.py by adding
explicit Dict[str, Any] annotations to dictionary variables where mypy was
incorrectly inferring narrow types. This allows the dictionaries to accept
different value types (strings, nested dicts) for error handling and various
return structures.

Fixed:
- Line 246: caches dictionary in get_memory_summary()
- Line 371: cache_stats dictionary in _get_cache_memory_stats()
- Line 439: litellm_router_memory dictionary in _get_router_memory_stats()

* fix(proxy): fix Python 3.8 compatibility in debug_utils type annotations

- Replace tuple[...], list[...] with Tuple[...], List[...] from typing
- Replace Dict | None with Optional[Dict] for Python 3.8 compatibility
- Add missing imports: List, Optional, Tuple to typing imports

Fixes TypeError: 'type' object is not subscriptable in Python 3.8
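
The annotation style that works on Python 3.8 can be illustrated with a small sketch; `summarize` is a hypothetical function, not one from debug_utils.py.

```python
from typing import Any, Dict, List, Optional, Tuple

# On Python 3.8, `def f(x: list[int]) -> dict | None` fails when the function
# is defined, because built-in generics and the `|` union syntax arrived later.
# The typing-module spellings below work on 3.8 and on newer versions.


def summarize(sizes: List[int]) -> Tuple[int, Optional[Dict[str, Any]]]:
    """Return the item count and, when non-empty, a small stats dict."""
    if not sizes:
        return 0, None
    return len(sizes), {"total": sum(sizes), "largest": max(sizes)}
```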

---------

Co-authored-by: AlexsanderHamir <alexsanderhamirgomesbaptista@gmail.com>
2025-10-18 11:12:00 -07:00
Krish Dholakia 302f55c7db Bedrock + MCP - working MCP calls to bedrock via Responses API + Log hidden params for OTEL calls (#15677)
* fix: minor fixes to mcp streaming with bedrock

* fix(bedrock/): working bedrock with mcp tools

handle empty description

* test: add unit test

* test: test fixes

* fix(vector_store_registry.py): load vector store with litellm params from config.yaml

fixes minor issue where litellm params weren't being loaded in from config.yaml

* docs(knowledgebase.md): document azure vector store current limitation

* fix(opentelemetry.py): add hidden params to otel logs

Fixes LIT-1274

* fix: fix test
2025-10-18 10:39:28 -07:00
Krrish Dholakia bbb7b75672 docs: sidebars.js
add vertex ai live api to docs
2025-10-17 23:55:44 -07:00
Teddy Amkie 21c3720732 docs: improve budget clarity (#15682)
* docs: improve budget documentation with clear setup options

* Update user budget documentation

Clarify budget application rules for team keys.

---------

Co-authored-by: berri-teddy <teddy@berri.ai>
2025-10-17 19:38:04 -07:00
Ishaan Jaff cea318330e [Feat] Add Guardrails for /v1/messages and /v1/responses API (#15686)
* add get_guardrails_messages_for_call_type

* fix call type for /messages

* add anthropic endpoints

* fix bedrock guardrails

* fix config.yaml

* fix types

* fix async_pre_call_hook

* ruff fix

* fix guard

* fix test bedrock guardrail

* fix linting

* fix linting

* docs guardrails

* fix mypy linting
2025-10-17 18:09:00 -07:00
Ishaan Jaffer 6784326ec0 fix DYNAMIC_RATE_LIMIT_ERROR_THRESHOLD_PER_MINUTE 2025-10-17 18:07:12 -07:00
Ishaan Jaff 3852fc96c1 [Oct Staging Branch] (#15460)
* Implement fix for thinking_blocks and converse API calls

This fixes Claude's models via the Converse API, which should also fix
Claude Code.

* Add thinking literal

* Fix mypy issues

* Type fix for redacted thinking

* Add voyage model integration in sagemaker

* Add config file logic

* Use already existing voyage transformation

* refactor code as per comments

* fix merge error

* refactor code as per comments

* refactor code as per comments

* UI new build

* [Fix] router - regression when adding/removing models  (#15451)

* fix(router): update model_name_to_deployment_indices on deployment removal

When a deployment is deleted, the model_name_to_deployment_indices map
was not being updated, causing stale index references. This could lead
to incorrect routing behavior when deployments with the same model_name
were dynamically removed.

Changes:
- Update _update_deployment_indices_after_removal to maintain
  model_name_to_deployment_indices mapping
- Remove deleted indices and decrement indices greater than removed index
- Clean up empty entries when no deployments remain for a model name
- Update test to verify proper index shifting and cleanup behavior
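
The index maintenance described in the list above can be sketched as follows. This is an illustration under assumptions, not the body of `_update_deployment_indices_after_removal`; the function name and the plain dict argument are hypothetical.

```python
from typing import Dict, List


def update_indices_after_removal(
    name_to_indices: Dict[str, List[int]], removed_idx: int
) -> None:
    """Drop removed_idx, shift later indices down by one, prune empty names."""
    for name in list(name_to_indices):  # copy keys: entries may be deleted
        shifted = [
            i - 1 if i > removed_idx else i
            for i in name_to_indices[name]
            if i != removed_idx
        ]
        if shifted:
            name_to_indices[name] = shifted
        else:
            del name_to_indices[name]
```

Removing list position 1 from a three-element model list moves the old position 2 to position 1, so every stored index above the removed one has to be decremented to keep pointing at the same deployment.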

* fix(router): remove redundant index building during initialization

Remove duplicate index building operations that were causing unnecessary
work during router initialization:

1. Removed redundant `_build_model_id_to_deployment_index_map` call in
   __init__ - `set_model_list` already builds all indices from scratch

2. Removed redundant `_build_model_name_index` call at end of
   `set_model_list` - the index is already built incrementally via
   `_create_deployment` -> `_add_model_to_list_and_index_map`

Both indices (model_id_to_deployment_index_map and
model_name_to_deployment_indices) are properly maintained as lookup
indexes through existing helper methods. This change eliminates O(N)
duplicate work during initialization without any behavioral changes.

The indices continue to be correctly synchronized with model_list on
all operations (add/remove/upsert).

* fix(prometheus): Fix Prometheus metric collection in a multi-workers environment (#14929)

Co-authored-by: sotazhang <sotazhang@tencent.com>

* Add tiered pricing and cost calculation for xai

* Use generic cost calculator

* Resolve conflicts in generated HTML files

* Remove penalty params as supported params for gemini preview model (#15503)

* fix conversion of thinking block

* add application level encryption in SQS (#15512)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* build: bump version

* bump: version 1.78.0 → 1.78.1

* add application level encryption in SQS

* add application level encryption in SQS

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>

* [Feat] Bedrock Knowledgebase - return search_response when using /chat/completions API with LiteLLM (#15509)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* add AnthropicCitation

* fix async_post_call_success_deployment_hook

* fix add vector_store_custom_logger to global callbacks

* test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call

* async_post_call_success_deployment_hook

* add async_post_call_streaming_deployment_hook

* async def test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call_streaming(setup_vector_store_registry):

* fix _call_post_streaming_deployment_hook

* fix async_post_call_streaming_deployment_hook

* test update

* docs: Accessing Search Results

* docs KB

* fix chatUI

* fix searchResults

* fix onSearchResults

* fix kb

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* [Feat] Add dynamic rate limits on LiteLLM Gateway  (#15518)

* docs: fix doc

* docs(index.md): bump rc

* [Fix] GEMINI - CLI -  add google_routes to llm_api_routes (#15500)

* fix: add google_routes to llm_api_routes

* test: test_virtual_key_llm_api_routes_allows_google_routes

* build: bump version

* bump: version 1.78.0 → 1.78.1

* fix: KeyRequestBase

* fix rpm_limit_type

* fix dynamic rate limits

* fix use dynamic limits here

* fix _should_enforce_rate_limit

* fix _should_enforce_rate_limit

* fix counter

* test_dynamic_rate_limiting_v3

* use _create_rate_limit_descriptors

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>

* Add google rerank endpoint

* Add docs

* fix mypy error

* fix mypy and lint errors

* Add haiku 4.5 integration

* Add haiku 4.5 integration for other regions as well

* Handle citation field correctly

* Fix filtering headers for signature calcs

* Add haiku 4.5 integration (#15650)

---------

Co-authored-by: Leslie Cheng <leslie.cheng5@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Lucas <10226902+LoadingZhang@users.noreply.github.com>
Co-authored-by: sotazhang <sotazhang@tencent.com>
Co-authored-by: Deepanshu Lulla <deepanshu.lulla@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: deepanshu <deepanshu.lulla@hq.bill.com>
2025-10-17 17:52:25 -07:00
Alexsander Hamir eee28a7af8 fix: add missing context (#15688) 2025-10-17 17:39:21 -07:00
Ishaan Jaff a8be3ae412 [Feat] Add Cost Tracking for /ocr endpoints (#15678)
* mistral/mistral-ocr-latest

* fix: add _hidden_params to OCRResponses

* test: hidden params exists

* feat: add mistral/mistral-ocr-2505-completion

* fix test

* add ModelInfoBase fields

* fix get OCR cost

* check response cost from OCR

* add handling for OCR costs

* add mistral-document-ai-2505

* docs OCR

* ruff check fix
2025-10-17 15:54:10 -07:00
Krish Dholakia 3ff073c811 UI - add arize on ui, LLMs - clarifai refactor to openai compatible route, added azure ai/grok-4 model family
* added oauth mcp to docs

* added azure ai/grok-4 model family

* Revert "added oauth mcp to docs"

This reverts commit 950b7cef44f14b2db1429f6fbd32548a7c95d325.

* fix: arize ui integration

* need to remove a file

This reverts commit d6c877b73ac763464f204b77135f3786342373b7.

* fix: add arize from ui

* updated clarifai functions to openai compatible (#15615)

* fix: npm build errors

* Snowflake provider support: added embeddings, PAT, account_id (#15372)

* snowflake support PAT, account_id and embeddings

* format

* test embeddings

* format

* complete test

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* Revert "Snowflake provider support: added embeddings, PAT, account_id (#15372)" (#15632)

This reverts commit c6d58e5b4af8493c020fa519d72ec6ebc90c896b.

---------

Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Mubashir Osmani <ilikewafflesomcuh@gmail.com>
Co-authored-by: mogith-pn <143642606+mogith-pn@users.noreply.github.com>
Co-authored-by: Andrey <elkin.andr@gmail.com>
2025-10-16 20:39:15 -07:00
Ishaan Jaffer ea69f4547d Merge branch 'main' into litellm_oct_staging2 2025-10-16 17:06:29 -07:00
Ishaan Jaff f69f7d101b Merge pull request #15618 from BerriAI/litellm_bedrock_invoke_support
[Feat] Allow calling /invoke, /converse routes through AI Gateway + models on config.yaml
2025-10-16 16:56:23 -07:00
Ishaan Jaffer 0bef295b15 doc fix 2025-10-16 16:36:24 -07:00
Ishaan Jaffer bbf9d6a57b docs boto3 instructions 2025-10-16 16:34:56 -07:00
Ishaan Jaffer 79516563a9 docs add /invoke and /converse routes 2025-10-16 16:25:27 -07:00
Ishaan Jaffer 8c5d6891d4 docs pt 2025-10-16 16:18:13 -07:00
Ishaan Jaffer fe5dd23a84 docs boto3 pass throughs 2025-10-16 16:17:02 -07:00
Ishaan Jaffer c2b69e371c docs pass through 2025-10-16 16:13:08 -07:00
Ishaan Jaffer f87c86806c docs invoke 2025-10-16 16:13:01 -07:00
Ishaan Jaff f27f2d4803 Merge branch 'main' into litellm_sso_add_pkce 2025-10-16 15:48:01 -07:00
Ishaan Jaff f98f299854 Merge pull request #15617 from BerriAI/litellm_october_alexsander_stanging
[OCT] Alexsander PERF improvements
2025-10-16 15:10:02 -07:00
AlexsanderHamir 220fa3feeb Add missing env key 2025-10-16 14:42:27 -07:00
Ishaan Jaffer 1c8b7abddb GENERIC_CLIENT_USE_PKCE 2025-10-16 13:32:12 -07:00
Ishaan Jaffer 193fa19552 fix Okta PKCE 2025-10-16 13:26:14 -07:00
Krish Dholakia 3bf32e8e5c feature: update pillar security integration to support no persistence mode in litellm proxy
2025-10-16 12:13:37 -07:00
Ariel Fogel b075cf4a6c PLR-2400: support no persistence in litellm proxy 2025-10-16 17:22:58 +03:00
TensorNull 55a6dd3a8b feat(cometapi): Add CometAPI provider support (embeddings, image generation, docs)
- Add CometAPI embedding and image generation transformations and configs
- Add image cost calculator and export/init files
- Register provider in constants, utils, main (embedding path) and sidebars
- Add CometAPI docs page and cookbook notebook (Colab) for usage examples
2025-10-16 13:08:14 +08:00
Ishaan Jaff bc26845ec4 [Feat] Native /ocr endpoint support (#15573)
* [Feat] Add native litellm.ocr() functions (#15567)

* fix get_supported_ocr_params

* add get_provider_ocr_config

* init OCR

* init ocr functions

* add OCRResponse Base Model

* add ocr to llm http handlers

* add main.py for OCR

* fix linting for OCR

* TestMistralOCR

* update to use DocumentType for Mistral

* fix _prepare_ocr_request

* fix transform

* add main.py for OCR

* add spec to init

* fix OCR

* TestMistralOCR

* ruff fix

* Potential fix for code scanning alert no. 3521: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* [Feat] Add /ocr route on LiteLLM AI Gateway - Adds support for native mistral ocr calling (#15571)

* fix get_supported_ocr_params

* add get_provider_ocr_config

* init OCR

* init ocr functions

* add OCRResponse Base Model

* add ocr to llm http handlers

* add main.py for OCR

* fix linting for OCR

* TestMistralOCR

* update to use DocumentType for Mistral

* fix _prepare_ocr_request

* fix transform

* add main.py for OCR

* add spec to init

* fix OCR

* TestMistralOCR

* ruff fix

* add router.ocr() methods

* add OCR routes

* feat add ocr routes

* add OCR routes

* feat: add OCR routes in proxy server

* working /ocr routes

* test_router_aocr_with_mistral

* docs Mistral OCR

* docs OCR

* [Feat] Add Azure AI Mistral OCR Integration  (#15572)

* fix get_supported_ocr_params

* add get_provider_ocr_config

* init OCR

* init ocr functions

* add OCRResponse Base Model

* add ocr to llm http handlers

* add main.py for OCR

* fix linting for OCR

* TestMistralOCR

* update to use DocumentType for Mistral

* fix _prepare_ocr_request

* fix transform

* add main.py for OCR

* add spec to init

* fix OCR

* TestMistralOCR

* ruff fix

* add router.ocr() methods

* add OCR routes

* feat add ocr routes

* add OCR routes

* feat: add OCR routes in proxy server

* working /ocr routes

* test_router_aocr_with_mistral

* docs Mistral OCR

* docs OCR

* add azure ai to get_provider_ocr_config

* add AzureAIOCRConfig

* TestAzureAIOCR

* TestAzureAIOCR

* test fixes for azure ai ocr

* fix async OCR transform for Azure

* fix transform_ocr_request

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-10-15 17:20:01 -07:00
Krrish Dholakia 0526334d9c docs: fix url 2025-10-15 08:33:28 -07:00
Ishaan Jaff a6c57cb5bd [Feat] Cost Tracking - specify a global vendor discount for costs. (#15546)
* fix cost_discount_config

* add CostBreakdown

* fix: set_cost_breakdown

* test_cost_discount_vertex_ai

* docs fix

* docs fix discounts

* docs fix

* docs custom pricing

* docs fix

* fixes for getting cost breakdown in response headers

* test - response headers with discount
2025-10-14 20:07:04 -07:00