Release MLX Knife 1.1.0-beta1 - Dynamic Token Limits & Enhanced Web Client

Issues Resolved:
  • Issue #15: Race condition between token limits and natural stop tokens - FIXED
  • Issue #16: Interactive vs server token limit policies - FIXED

Major Improvements:
  • Automatic optimal token limits - no configuration needed
  • Manual --max-tokens control still available when desired
  • Eliminates the old hardcoded 500/2000-token limits
  • Performance gains: up to 524x improvement for large-context models
  • Enhanced web client with model capabilities display and better UX
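The notes above do not say how the automatic limit is chosen. The following is only a hypothetical sketch. It assumes the limit is derived from the model's context window and that an explicit `--max-tokens` value takes precedence. `resolve_max_tokens` and its parameters are illustrative names, not MLX Knife's API.

```python
from typing import Optional


def resolve_max_tokens(
    context_length: int,
    prompt_tokens: int,
    user_max_tokens: Optional[int] = None,
) -> int:
    """Hypothetical token-limit policy (NOT MLX Knife's actual code).

    - If the user passed an explicit limit (e.g. via --max-tokens), use it,
      capped at the space left in the context window.
    - Otherwise, default to all context remaining after the prompt.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Tokens still available once the prompt occupies part of the window.
    remaining = max(context_length - prompt_tokens, 0)
    if user_max_tokens is not None:
        return min(user_max_tokens, remaining)
    return remaining


# The result is only an upper bound: generation still ends earlier
# whenever the model emits a natural stop token.
print(resolve_max_tokens(4096, 1000))       # 3096
print(resolve_max_tokens(4096, 1000, 256))  # 256
print(resolve_max_tokens(4096, 4000, 256))  # 96
```

Capping an explicit value at the remaining window is a design choice of this sketch. A real implementation might reject or warn instead.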

Additional Enhancements:
  • Enhanced /v1/models API with context_length field
  • Comprehensive test expansion: 114 → 131 tests (131/131 passing)
  • Python 3.9-3.13 compatibility verified
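A client can read the new `context_length` field to size its requests. The sketch below makes two assumptions: the server returns an OpenAI-style list (`{"object": "list", "data": [...]}`), and `context_length` is an integer on each model entry. The JSON is an illustrative sample, not captured server output.

```python
import json
from typing import Dict


def context_lengths(models_json: str) -> Dict[str, int]:
    """Map model id -> context_length from a /v1/models response body.

    Assumes an OpenAI-style payload where each entry in "data" may carry
    an integer "context_length"; entries without one are skipped.
    """
    payload = json.loads(models_json)
    result: Dict[str, int] = {}
    for entry in payload.get("data", []):
        length = entry.get("context_length")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(length, int) and not isinstance(length, bool):
            result[entry["id"]] = length
    return result


# Illustrative sample body (values assumed, not real server output).
SAMPLE_BODY = """{"object": "list", "data": [
  {"id": "Phi-3-mini-4k-instruct-4bit", "object": "model", "context_length": 4096},
  {"id": "model-without-field", "object": "model"}
]}"""

print(context_lengths(SAMPLE_BODY))  # {'Phi-3-mini-4k-instruct-4bit': 4096}
```

Against a running server, you could pass the decoded body of a GET request to `/v1/models` (for example via `urllib.request.urlopen`) to the same function.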

Known Issues (Beta Status):
  • Possible server deadlock under extreme concurrent model-loading stress
  • Workaround: avoid running heavy model operations simultaneously
The BROKE Team
2025-08-21 17:36:44 +02:00
parent 6117e571ca
commit 74239c4e43
12 changed files with 993 additions and 42 deletions
+11 -9
@@ -2,10 +2,10 @@
## Current Status
-**114/114 tests passing** (August 2025)
+**131/131 tests passing** (August 2025)
**Apple Silicon verified** (M1/M2/M3)
**Python 3.9-3.13 compatible**
-**Production ready** - real model execution validated
+**Beta ready** - comprehensive testing with real model execution
## Quick Start
@@ -42,13 +42,15 @@ This approach ensures our tests reflect real-world usage, not mocked behavior.
```
tests/
├── conftest.py # Shared fixtures and utilities
-├── integration/ # System-level integration tests (62 tests)
+├── integration/ # System-level integration tests (85+ tests)
│ ├── test_core_functionality.py # Basic CLI operations
│ ├── test_health_checks.py # Model corruption detection
│ ├── test_issue_14.py # Issue #14: Chat self-conversation fix
+│ ├── test_issue_15_16.py # Issues #15/#16: Dynamic token limits
│ ├── test_process_lifecycle.py # Process management & cleanup
│ ├── test_run_command_advanced.py # Run command edge cases
│ └── test_server_functionality.py # OpenAI API server tests
-└── unit/ # Module-level unit tests (52 tests)
+└── unit/ # Module-level unit tests (45+ tests)
├── test_cache_utils.py # Cache management functions
├── test_cli.py # CLI argument parsing
└── test_mlx_runner_memory.py # Memory management tests
@@ -158,11 +160,11 @@ pytest -n auto
| Python Version | Status | Tests Passing |
|----------------|--------|---------------|
-| 3.9.6 (macOS) | ✅ Verified | 114/114 |
-| 3.10.x | ✅ Verified | 114/114 |
-| 3.11.x | ✅ Verified | 114/114 |
-| 3.12.x | ✅ Verified | 114/114 |
-| 3.13.x | ✅ Verified | 114/114 |
+| 3.9.6 (macOS) | ✅ Verified | 131/131 |
+| 3.10.x | ✅ Verified | 131/131 |
+| 3.11.x | ✅ Verified | 131/131 |
+| 3.12.x | ✅ Verified | 131/131 |
+| 3.13.x | ✅ Verified | 131/131 |
All versions tested with real MLX model execution (Phi-3-mini-4k-instruct-4bit).