From e021fb32cd66e1db8f95477619898bb2b97246b9 Mon Sep 17 00:00:00 2001
From: The BROKE Cluster Team
Date: Thu, 5 Feb 2026 10:42:50 +0100
Subject: [PATCH] Release 2.0.4-beta.10: Audio PyPI fix (tiktoken workaround complete)

Audio/Whisper works with pip install - no Git workaround needed.
See CHANGELOG.md for details.
Tested: 647 passed, 11 skipped (Python 3.10-3.12)
---
.gitignore | 3 +-
CHANGELOG.md | 40 ++
README.md | 28 +-
TESTING-DETAILS.md | 4 +-
.../ADR/ADR-020-Audio-Backend-Architecture.md | 32 +-
docs/ARCHITECTURE.md | 124 +++++
mlxk2/__init__.py | 2 +-
mlxk2/audio/__init__.py | 9 +
mlxk2/audio/whisper_tokenizer.py | 442 ++++++++++++++++++
mlxk2/core/audio_runner.py | 61 +++
mlxk2/operations/common.py | 92 +++-
mlxk2/operations/run.py | 3 +-
pyproject.toml | 17 +-
13 files changed, 819 insertions(+), 38 deletions(-)
create mode 100644 mlxk2/audio/__init__.py
create mode 100644 mlxk2/audio/whisper_tokenizer.py
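
Reviewer note: the CHANGELOG hunk in this patch describes patching `Model.get_tokenizer()` at runtime so that it falls back to a bundled tiktoken tokenizer when the HuggingFace processor is unavailable. The sketch below illustrates that general wrap-and-fall-back technique only. It is not the code in `audio_runner.py`: the `Model` stand-in, `ProcessorNotFound`, `bundled_tokenizer()`, and `apply_tokenizer_patch()` are all hypothetical names chosen for illustration, and the real upstream class may fail in a different way.

```python
class ProcessorNotFound(Exception):
    """Hypothetical error for a missing HuggingFace processor."""


class Model:
    """Hypothetical stand-in for the upstream Whisper model class."""

    def get_tokenizer(self):
        # Simulates the failure reported for beta.9.
        raise ProcessorNotFound("Processor not found")


def bundled_tokenizer():
    """Hypothetical stand-in for the bundled tiktoken-based tokenizer."""
    return "tiktoken-fallback"


def apply_tokenizer_patch(model_cls):
    """Wrap model_cls.get_tokenizer so a failure falls back to the bundled tokenizer."""
    if getattr(model_cls, "_tokenizer_patched", False):
        return  # idempotent: never wrap the method twice
    original = model_cls.get_tokenizer

    def get_tokenizer(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except ProcessorNotFound:
            return bundled_tokenizer()

    model_cls.get_tokenizer = get_tokenizer
    model_cls._tokenizer_patched = True


apply_tokenizer_patch(Model)
apply_tokenizer_patch(Model)  # second call is a no-op
print(Model().get_tokenizer())
```

Patching the class rather than an instance means every model created afterwards picks up the fallback, and the guard flag keeps repeated calls from stacking wrappers.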
diff --git a/.gitignore b/.gitignore
index 979d451..1531258 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,7 +4,7 @@ venv31?/
venv_*/
venv-*/
test_env*/
-test_results*.log
+test-install-venv/
mypy_*.log
ruff_*.log
*/__pycache__/*
@@ -29,6 +29,7 @@ ML-workspaces/
# Test artifacts (generated reports)
*_report.json
+test_results_3_*.log
test-img-collection/
small-img-collection
benchmarks/reports/*.html
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b4bcf09..0c96e75 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog
+## [2.0.4-beta.10] - 2026-02-05
+
+> **⚠️ Upgrade Notice:** If you installed beta.9 from PyPI, audio transcription does not work due to an incomplete tiktoken patch. Please upgrade to beta.10: `pip install mlx-knife[all]==2.0.4b10`
+
+### Highlights
+
+**Audio Works Out-of-the-Box:** Complete tiktoken workaround for mlx-audio Issue #479. PyPI installation (`pip install mlx-knife[audio]`) now works without any manual Git installs. We bundle the full Whisper tokenizer (~340 LOC) from mlx-audio commit 9349644 and patch `Model.get_tokenizer()` at runtime to fall back to tiktoken when the HuggingFace processor is unavailable.
+
+**Beta.9 Audio Bug:** The beta.9 release on PyPI had an incomplete tiktoken patch: it bundled the assets but didn't patch `Model.get_tokenizer()` (the patch targeted a class named `Whisper`, but the actual class is `Model`). This caused "Processor not found" errors with Whisper models.
+
+**Runtime Compatibility Accuracy:** Fixed the `runtime_compatible` field in `mlxk list --health`, which showed incorrect values. It now correctly gates embedding models, mis-routed audio models (Qwen3-Omni), transformers 5.x video_processor bugs, and unsupported tokenizers (Voxtral tekken.json).
+
+### Added
+
+- **Whisper Tokenizer Patch (mlx-audio Issue #479):**
+ - New `mlxk2/audio/` module with `whisper_tokenizer.py` (~340 LOC)
+ - Complete `Tokenizer` class and `get_tokenizer()` from mlx-audio commit 9349644
+ - `audio_runner.py`: `_apply_whisper_tokenizer_patch()` patches `Model.get_tokenizer()`
+ - tiktoken>=0.7.0 dependency (OpenAI core library, stable API)
+ - Bundled tiktoken assets: `gpt2.tiktoken`, `multilingual.tiktoken`
+
+### Fixed
+
+- **Audio/Whisper with PyPI mlx-audio:** `pip install mlx-knife[audio]` now works without the Git install workaround. The tiktoken regression in mlx-audio 0.3.1 (Issue #479) is fully patched.
+
+- **`runtime_compatible` Accuracy:**
+ - Embedding models (Qwen3-Embedding) → Gate 5: "not supported by mlxk run"
+ - Mis-routed audio models (Qwen3-Omni) → Gate 3a: "model_type not supported by mlx-audio"
+ - transformers 5.x bugs (Qwen2-VL, MiMo-VL) → Gate 4a: "Video processor bug"
+ - Voxtral tekken.json → Gate 3a: "tekken.json tokenizer not supported"
+
+### Changed
+
+- **Documentation:**
+ - README.md: Audio installation simplified (no more Git install instructions)
+ - ARCHITECTURE.md: Added "Runtime Compatibility Decision Tree" and Probe concept
+ - ADR-020: Qwen3-Omni clarification (routes to mlx-vlm, not mlx-audio)
+
+---
+
## [2.0.4-beta.9] - 2026-02-04
### Highlights
diff --git a/README.md b/README.md
index a60f773..ad5f4af 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@
-**Current Version: 2.0.4-beta.9** (Stable: 2.0.3)
+**Current Version: 2.0.4-beta.10** (Stable: 2.0.3)
-[](https://github.com/mzau/mlx-knife/releases)
+[](https://github.com/mzau/mlx-knife/releases)
[](https://www.apache.org/licenses/LICENSE-2.0)
[](https://www.python.org/downloads/)
[](https://support.apple.com/en-us/HT211814)
@@ -17,6 +17,8 @@
## Features
+> **⚠️ Beta.9 Audio Bug:** If you installed `mlx-knife[audio]==2.0.4b9` from PyPI, audio transcription fails with "Processor not found". Upgrade to beta.10: `pip install mlx-knife[all]==2.0.4b10`
+
### What's New in 2.0.4 (Coming Soon - Currently Beta)
- **Audio Transcription (STT)** - Whisper speech-to-text (`--audio` flag, `pip install mlx-knife[audio]`)
- **Vision Models with EXIF Metadata** - Image analysis + automatic GPS/date/camera extraction visible to the model
@@ -94,15 +96,15 @@ mlxk --version # → mlxk 2.0.3
**Requirements:** macOS Apple Silicon, Python 3.9-3.12
-### 2. PyPI Beta (2.0.4-beta.9 - Text + Vision + Audio)
+### 2. PyPI Beta (2.0.4-beta.10 - Text + Vision + Audio)
```bash
-pip install mlx-knife[all]==2.0.4b9
-mlxk --version # → mlxk 2.0.4b9
+pip install mlx-knife[all]==2.0.4b10
+mlxk --version # → mlxk 2.0.4b10
```
**Requirements:** macOS Apple Silicon, Python 3.10-3.12
-**Features:** Audio STT (Whisper), Vision with EXIF metadata, tiktoken workaround bundled
+**Features:** Audio STT (Whisper), Vision with EXIF metadata, full tiktoken workaround
### 3. Developer Installation
@@ -111,7 +113,7 @@ git clone https://github.com/mzau/mlx-knife.git
cd mlx-knife
pip install -e ".[all,dev,test]"
-mlxk --version # → mlxk 2.0.4b9
+mlxk --version # → mlxk 2.0.4b10
pytest -v
```
@@ -449,18 +451,14 @@ mlxk convert