From d4cd89fab0e97bc7662ad6ed975257484537dc9c Mon Sep 17 00:00:00 2001
From: The BROKE Cluster Team
Date: Wed, 11 Feb 2026 15:05:09 +0100
Subject: [PATCH] Release 2.0.4 stable - see CHANGELOG.md for details

---
 CHANGELOG.md | 6 +-
 README.md | 49 ++--
 TESTING-DETAILS.md | 18 +-
 TESTING.md | 6 +-
 benchmarks/README.md | 8 +-
 benchmarks/TESTING.md | 4 +-
 benchmarks/generate_benchmark_report.py | 185 ++++++--------
 ...4-2026-02-11-wet-benchmark-2.0.4-stable.md | 235 ++++++++++++++++++
 mlxk2/__init__.py | 2 +-
 pyproject.toml | 29 +--
 requirements.txt | 19 --
 11 files changed, 364 insertions(+), 197 deletions(-)
 create mode 100644 benchmarks/reports/BENCHMARK-v1.1-2.0.4-2026-02-11-wet-benchmark-2.0.4-stable.md
 delete mode 100644 requirements.txt

diff --git a/CHANGELOG.md b/CHANGELOG.md
index bf1afea..bd82ff1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog

-## [2.0.4] - 2026-XX-XX (WIP)
+## [2.0.4] - 2026-02-11

 > **First stable release with Audio/STT support.** This release consolidates beta.9 and beta.10 improvements into a production-ready package.

@@ -18,8 +18,12 @@
 - `test_whisper_tokenizer.py`: 47 tests for tiktoken workaround (get_encoding, get_tokenizer, Tokenizer class)
 - `TestEmbeddingGate`: 3 tests for embedding model runtime blocking

+- **Test Infrastructure:** Test suite now runs consistently with `unset HF_HOME` (uses fallback models)
+
 ### Fixed (since beta.10)

+- **`serve --model` Pre-Validation:** Model ambiguity and not-found errors now detected before server starts. Previously showed stacktrace; now shows clean error message with suggestions.
+
 - **Run Preflight Consistency:** `run.py` now passes `probe` and `framework` to `audio_runtime_compatibility()`. STT model_type and tekken.json gates now work in CLI (previously only in list/health).

 - **STT model_type Gate:** Extended to accept `vibevoice` and `audio` model types (was only `whisper`/`voxtral`). VibeVoice-ASR and future STT models no longer blocked incorrectly.
diff --git a/README.md b/README.md
index e1638b4..93db695 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@
 MLX Knife Demo

-**Current Version: 2.0.4-beta.10** (Stable: 2.0.3)
+**Current Version: 2.0.4** (Stable)

-[![GitHub Release](https://img.shields.io/badge/version-2.0.4--beta.10-blue.svg)](https://github.com/mzau/mlx-knife/releases)
+[![GitHub Release](https://img.shields.io/badge/version-2.0.4-blue.svg)](https://github.com/mzau/mlx-knife/releases)
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Python 3.10-3.12](https://img.shields.io/badge/python-3.10--3.12-blue.svg)](https://www.python.org/downloads/)
 [![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-green.svg)](https://support.apple.com/en-us/HT211814)
@@ -17,10 +17,8 @@

 ## Features

-> **⚠️ Beta.9 Audio Bug:** If you installed `mlx-knife[audio]==2.0.4b9` from PyPI, audio transcription fails with "Processor not found". Upgrade to beta.10: `pip install mlx-knife[all]==2.0.4b10`
-
-### What's New in 2.0.4 (Coming Soon - Currently Beta)
-- **Audio Transcription (STT)** - Whisper speech-to-text (`--audio` flag, `pip install mlx-knife[audio]`)
+### What's New in 2.0.4
+- **Audio Transcription (STT)** - Whisper speech-to-text (`--audio` flag)
 - **Vision Models with EXIF Metadata** - Image analysis + automatic GPS/date/camera extraction visible to the model
 - **Unix Pipe Integration** - Chain models without temp files (`vision → text` workflows)
 - **Local Development Workflow** - Clone → Repair → Test models without HuggingFace round-trips
@@ -79,42 +77,33 @@ This license applies **only** to the `mlx-knife` code and **does not extend** to
 ### Python Compatibility

 ✅ **Python 3.10 - 3.12** - Full support (Text + Vision + Audio)
-❌ **Python 3.9** - Not supported (MLX 0.30+ requires 3.10+)
+❌ **Python 3.9** - Use version 2.0.3 (text + cache management only)
 ❌ **Python 3.13+** - Not supported (miniaudio lacks pre-built wheels)

-**Note:** Vision/Audio features require Python 3.10+. Recommended: **Python 3.10 or 3.11** for best compatibility.
+**Recommended:** Python 3.10 or 3.11 for best compatibility.

 ## Installation

-### 1. PyPI Stable (2.0.3 - Text models only)
+### 1. PyPI (Recommended)

 ```bash
 pip install mlx-knife
-mlxk --version # → mlxk 2.0.3
-```
-
-**Requirements:** macOS Apple Silicon, Python 3.9-3.12
-
-### 2. PyPI Beta (2.0.4-beta.10 - Text + Vision + Audio)
-
-```bash
-pip install mlx-knife[all]==2.0.4b10
-mlxk --version # → mlxk 2.0.4b10
+mlxk --version # → mlxk 2.0.4
 ```

 **Requirements:** macOS Apple Silicon, Python 3.10-3.12

-**Features:** Audio STT (Whisper), Vision with EXIF metadata, full tiktoken workaround
+**Includes:** Text, Vision, Audio (Whisper STT), EXIF metadata, Unix pipes

-### 3. Developer Installation
+### 2. Developer Installation

 ```bash
 git clone https://github.com/mzau/mlx-knife.git
 cd mlx-knife
-pip install -e ".[all,dev,test]"
+pip install -e ".[dev,test]"

-mlxk --version # → mlxk 2.0.4b10
+mlxk --version # → mlxk 2.0.4
 pytest -v
 ```

@@ -285,8 +274,7 @@ Image analysis via the `--image` flag (CLI and server). Requires Python 3.10+.
 #### Requirements

 - **Python 3.10+** (mlx-vlm dependency)
-- **Installation:** `pip install mlx-knife[vision]`
-- **Backend:** mlx-vlm 0.3.10 (auto-installed from PyPI)
+- **Backend:** mlx-vlm 0.3.10 (included in base install)

 #### Usage

@@ -424,7 +412,7 @@ curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/jso

 **⚠️ Important:** Vision support relies on mlx-vlm (upstream), which has known stability issues. While `mlxk health` verifies file integrity, **runtime failures may occur** with certain model architectures due to upstream bugs.

-**✅ Tested & Working Models** (mlx-knife v2.0.4-beta.6):
+**✅ Tested & Working Models** (mlx-knife v2.0.4):

 | Model | Size | Notes |
 |-------|------|-------|
@@ -452,14 +440,13 @@ mlxk convert --repair-index

 ### Audio Transcription (Speech-to-Text)

-> **🎙️ New in beta.9/10:** Professional STT via dedicated Whisper models (mlx-audio backend). Beta.10 fixes PyPI install (no Git workaround needed). Backward compatible with Gemma-3n multimodal audio (mlx-vlm).
+> **🎙️ Audio Transcription:** Speech-to-text via Whisper models (mlx-audio backend). Works out-of-the-box with PyPI install. Backward compatible with Gemma-3n multimodal audio (mlx-vlm).

 **Requirements:**
-- **Python 3.10+** (mlx-audio dependency)
-- **Installation:** `pip install mlx-knife[audio]` (tiktoken workaround bundled)
+- **Python 3.10+** (mlx-audio dependency, included in base install)
 - **No system dependencies:** MP3/WAV decoding via embedded libsndfile (no ffmpeg or Homebrew required)

-**✅ Recommended Models** (mlx-knife v2.0.4-beta.10):
+**✅ Recommended Models** (mlx-knife v2.0.4):

 | Model | Backend | Size | Duration | Notes |
 |-------|---------|------|----------|-------|
@@ -1240,7 +1227,7 @@ Apache License 2.0 — see `LICENSE` (root) and `mlxk2/NOTICE`.

Made with ❤️ by The BROKE team BROKE Logo
- Version 2.0.4-beta.10 | February 2026
+ Version 2.0.4 | February 2026
💬 Web UI: nChat - lightweight chat interface🔮 Multi-node: BROKE Cluster

diff --git a/TESTING-DETAILS.md b/TESTING-DETAILS.md
index 5c5d84f..072a141 100644
--- a/TESTING-DETAILS.md
+++ b/TESTING-DETAILS.md
@@ -4,24 +4,24 @@ This document contains version-specific details, complete file listings, and imp

 ## Current Status

-✅ **2.0.4-beta.10** — **Audio PyPI Fix** (tiktoken workaround complete); Runtime compatibility accuracy; Audio transcription (Whisper via mlx-audio); Server `/v1/audio/transcriptions` endpoint; Probe/Policy architecture complete; Vision support Phase 1-3 (CLI + Server); Pipes/Memory-Aware; EXIF metadata; **Test Portfolio Separation complete**; Workspace Infrastructure (ADR-018 Phase 0a+0b+0c); Convert Operation (ADR-018 Phase 1); Resumable Clone; **Benchmark Schema v0.2.2** (Precise test timing).
+✅ **2.0.4** — **First stable release with Vision + Audio.** Vision support (CLI + Server, EXIF metadata); Audio transcription (Whisper via mlx-audio); Runtime compatibility accuracy; Server `/v1/audio/transcriptions` endpoint; Probe/Policy architecture; Pipes/Memory-Aware; **Test Portfolio Separation complete**; Workspace Infrastructure (ADR-018 Phase 0a+0b+0c); Convert Operation (ADR-018 Phase 1); Resumable Clone; **Benchmark Schema v0.2.2**.

 ### Test Results (Official Reference)

 **Standard Unit Tests (Multi-Python):**
 ```
 Platform: macOS 26.2 (Tahoe), M2 Max, 64GB RAM
-Python 3.10: 647 passed, 11 skipped in 19.78s
-Python 3.11: 647 passed, 11 skipped in 19.91s
-Python 3.12: 647 passed, 11 skipped in 20.94s
-Note: Default suite works on 16GB. Wet-umbrella: 64GB recommended (M1 Max 32GB untested)
+Python 3.10: 697 passed, 13 skipped
+Python 3.11: 697 passed, 13 skipped
+Python 3.12: 697 passed, 13 skipped
+Note: Default suite works on 16GB. Full integration tests: 64GB recommended
 ```

-**Wet Umbrella (4-Phase Integration):**
+**Full Integration Tests (`./scripts/test-wet-umbrella.sh`):**
 ```
-Phase 1 (wet marker): 168 passed, 73 skipped, 680 deselected (Schema v0.2.2)
-Phase 2-4 (live_pull/clone/pipe): 3 passed, 742 deselected
-Total: 171 passed across all phases
+Phase 1 (portfolio tests): 179 passed, 73 skipped, 732 deselected
+Phase 2-4 (live operations): 3 passed
+Total: 182 passed across all phases
 ```

 ✅ **Production verified & reported:** M1, M1 Max, M2 Max in real-world use
diff --git a/TESTING.md b/TESTING.md
index 696dba1..487511f 100644
--- a/TESTING.md
+++ b/TESTING.md
@@ -300,7 +300,7 @@ pytest -m live_stop_tokens -v

 ## Python Version Compatibility

-**All tests validated on Python 3.9-3.14**
+**Tests validated on Python 3.10-3.12** (Python 3.9 not supported since 2.0.4)

 Multi-version testing:
 ```bash
@@ -308,8 +308,8 @@ Multi-version testing:
 ./test-multi-python.sh

 # Manual verification
-python3.9 -m venv test_39
-source test_39/bin/activate
+python3.10 -m venv test_310
+source test_310/bin/activate
 pip install -e .[test] && pytest
 ```

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 930c421..8f3a56f 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -15,7 +15,7 @@ This directory contains benchmark infrastructure for mlx-knife:
 benchmarks/
 ├── reports/ # JSONL test reports + Markdown analyses
 │ ├── 2025-12-20-v2.0.4b3.jsonl # Raw data (one file per test run)
-│ └── BENCHMARK-v1.0-*.md # Generated analysis reports
+│ └── BENCHMARK-*.md # Generated analysis reports
 ├── schemas/ # JSON Schema definitions
 │ ├── report-v0.1.schema.json # legacy schema
 │ ├── report-v0.2.2.schema.json # Current schema
@@ -23,7 +23,7 @@ benchmarks/
 ├── tools/ # Standalone tools
 │ ├── memmon.py # Memory monitor (background sampling)
 │ └── memplot.py # Memory timeline visualizer
-├── generate_benchmark_report.py # Report generator (Template v1.0)
+├── generate_benchmark_report.py # Report generator (Template v1.1)
 ├── validate_reports.py # Schema validation
 ├── README.md # ← You are here
 └── TESTING.md # Benchmark handbook (How-To)
@@ -33,7 +33,7 @@ benchmarks/

 | Tool | Purpose |
 |------|---------|
-| `generate_benchmark_report.py` | JSONL → Markdown report (Template v1.0) |
+| `generate_benchmark_report.py` | JSONL → Markdown report (Template v1.1) |
 | `validate_reports.py` | Schema validation of JSONL files |
 | `tools/memmon.py` | Memory + CPU + GPU monitoring (200ms sampling) |
 | `tools/memplot.py` | Interactive 3-row timeline (Memory/CPU/GPU, HTML) |
@@ -59,7 +59,7 @@ See `schemas/LEARNINGS-FOR-v1.0.md` for details.
 ## Recent Reports

 Latest baseline reports are in `reports/` directory:
-- Pattern: `BENCHMARK-v1.0---*.md`
+- Pattern: `BENCHMARK-