The BROKE Cluster Team
d4cd89fab0
Release 2.0.4 stable - see CHANGELOG.md for details
2026-02-11 15:05:09 +01:00
The BROKE Cluster Team
dab7ffb6fc
fix: P0 bugfixes + test infrastructure + benchmark metadata sync
...
P0 Bugfixes:
- cache.py: Handle empty HF_HOME strings in get_current_cache_root()
- clone.py: Remove obsolete _validate_same_volume() check
- common.py: Use importlib.metadata instead of importing transformers
Test Infrastructure:
- runner/__init__.py: Replace "mock" fallback with clear RuntimeError
- Fix mock paths in test_runner_core, test_token_limits, etc.
- Add VISION_TEST_MODELS + AUDIO_TEST_MODELS fallbacks
- Portfolio fixtures work with and without HF_HOME
Benchmark Fixes:
- Sort models/tests alphabetically instead of by regression %
- Fix vision metadata drift: pixtral-12b-8bit → pixtral-12b-4bit
Documentation:
- ADR-022: Workspace-First Paradigm (draft)
- ADR-018: Phase 2 details expanded
- TESTING.md/TESTING-DETAILS.md: Fallback docs updated
2026-02-10 15:52:36 +01:00
The BROKE Cluster Team
e021fb32cd
Release 2.0.4-beta.10: Audio PyPI fix (tiktoken workaround complete)
...
Audio/Whisper works with pip install - no Git workaround needed.
See CHANGELOG.md for details.
Tested: 647 passed, 11 skipped (Python 3.10-3.12)
2026-02-05 10:42:50 +01:00
The BROKE Cluster Team
bf7480d042
Release 2.0.4-beta.9: Audio transcription via mlx-audio
...
Major Features:
- Audio transcription via mlx-audio backend (Whisper, >10min duration)
- OpenAI /v1/audio/transcriptions endpoint
- Memory Gate System (Vision: 8GB, Audio: 4GB)
- Config-based backend routing (ADR-020)
- Benchmark toolchain (memmon/memplot, Schema v0.2.2)
Key Fixes:
- EuroLLM tokenizer decoding
- Vision-model text-only routing regression
- Multimodal model context length detection
- Memory cleanup bug (mx.metal.clear_cache)
- Orphan process bug
Test Results:
- Unit tests: 647 passed, 11 skipped (Python 3.10-3.12)
- wet-umbrella: 171 passed total
See CHANGELOG.md for complete details and known issues.
2026-02-04 03:10:30 +01:00
The BROKE Cluster Team
e8b10ea10b
Release 2.0.4-beta.8: Audio transcription support (experimental)
...
Audio input via --audio flag (CLI) and input_audio content type (Server API).
Uses mlx-vlm native audio processing. ~30s duration limit (model constraint).
Currently only Gemma-3n tested (requires --repair-index fix).
Also includes:
- SERVER-HANDBOOK compliance (image limits, validation error envelopes)
- Dependency updates: mlx>=0.30.0, mlx-lm>=0.30.0, huggingface-hub>=1.0.0
- Audio E2E test suite + ADR-019
2026-01-23 20:20:59 +01:00
The BROKE Cluster Team
5751545b8b
Release 2.0.4-beta.7: Server robustness + Vision per-chunk streaming
...
- Server: exit codes, /v1/models crash fix, vision routing, MLXK2_MAX_TOKENS
- Vision: true SSE streaming, hallucination fix (local numbering)
- Workspace: list prefix-match, push ambiguous pattern handling
- Docs: SERVER-HANDBOOK accuracy updates
See CHANGELOG.md for details.
2026-01-18 16:57:32 +01:00
The BROKE Cluster Team
53d9cca82d
Release 2.0.4-beta.6: Local workspace workflow + Vision batch processing
...
- Complete local development cycle: clone → repair → run/show/server on
workspace paths without HuggingFace round-trips
- Vision processing now defaults to safe chunking (one image at a time,
prevents OOM + hallucination)
- Resumable clone with --force-resume and deterministic temp cache naming
- Improved test infrastructure (umbrella marker convention)
- 161 Wet Umbrella tests passing including new Vision→Geo pipe integration tests
See CHANGELOG.md for complete details.
2026-01-07 17:11:07 +01:00
The BROKE Cluster Team
25609e4dcb
Release 2.0.4-beta.5: Community repair tool + OS-agnostic benchmarking
...
Closes #49 (Mistral Tokenizer Bug)
Major features:
- Workspace Infrastructure (ADR-018 Phase 0a): Managed workspace detection,
provenance metadata, backward compatible with unmanaged workspaces
- Convert Operation (ADR-018 Phase 1): `mlxk convert --repair-index` fixes
mlx-vlm #624 affected models (7+ models including Qwen2.5-VL, gemma-3)
- Resumable Pull: Auto-detect partial downloads with `--force-resume`
- Wet Umbrella Test Integration: Single entry point for all real model tests
Fixes:
- #49 : BPE space markers now correctly converted (Mistral-family models)
- Vision Portfolio Discovery: Filter by capabilities instead of model_type
- Memory Cleanup Hook: Triggers for both live_e2e and wet markers
Test suite: 528 passed, 60 skipped (Python 3.9-3.14)
2025-12-31 16:05:18 +01:00
The BROKE Cluster Team
d3f7d091bc
Release 2.0.4-beta.3: Dependency compatibility + Documentation
...
Bugfixes and compatibility improvements. No new features.
Core fixes:
- Framework detection for web API models (Issue #48 )
- Video-only model filtering from vision capability
- Page size detection for memory metrics (macOS)
- Model switch log timing (after load completion)
Compatibility:
- hub 1.x + transformers 5.0 support
- Python 3.9-3.14 verified (494 tests passing)
Testing infrastructure:
- Benchmark schema v0.2.0 (hardware profiling, system health)
- Benchmark template v1.0 (automated JSONL→Markdown reports)
- Memory timeline visualization (memplot.py)
- Unified model filter (build_model_object single source)
Documentation:
- Multi-Modal Support section in README (Vision subsection)
- JSON API 0.1.5-0.1.6 marked Stable
- Vision promoted from alpha to beta status
- Removed conceptual drift and outdated references
See CHANGELOG.md for complete details.
2025-12-23 12:19:04 +01:00
The BROKE Cluster Team
86f669dc82
Release 2.0.4-beta.1: Vision + Pipes + Memory
...
- Vision Support (Issue #45 ): CLI + Server with OpenAI-compatible image API, EXIF metadata
- Unix Pipes (ADR-014): stdin support, isatty detection, SIGPIPE handling
- Memory-Aware Loading (ADR-016): Pre-load checks with >70% RAM warnings
- Python 3.9-3.14: Full compatibility verified (476-485 tests passing)
- Fixed: --log-json regression (Issue #44 ), Vision multimodal history filtering
See CHANGELOG.md for complete details.
2025-12-16 19:35:30 +01:00
The BROKE Cluster Team
05f1c30486
Release 2.0.3: Foundation for pipes
...
Foundation release for Unix pipe integration with stderr separation,
benchmark infrastructure, and reasoning control improvements.
Breaking Changes:
- stdout/stderr separation (Issue #43 ) - errors to stderr in human mode
- JSON mode unchanged (all output to stdout)
Features:
- Benchmark reporting infrastructure (ADR-013 Phase 0)
- --no-reasoning flag (Issue #40 partial - GPT-OSS/QwQ only)
- Interactive mode reasoning control (review_report.md fixes)
Bug Fixes:
- huggingface-hub 1.x incompatibility (critical dependency fix)
- Streaming parity tests refactored (Portfolio Discovery)
Testing:
- 308 tests passing (Python 3.9-3.13)
- 35 skipped (opt-in live tests)
- 79/91 E2E tests passing with HF_HOME
See CHANGELOG.md for complete details and migration guide.
2025-11-17 22:54:06 +01:00
The BROKE Cluster Team
d32d3185dd
Release 2.0.2: Test infrastructure hardening & empirical validation
...
Stable release completing Issue #32 recovery plan - all tests passing.
Bug Fixes:
- Test collection regression (E2E suite parametrization)
- Stop token ordering (batch + streaming modes)
- E2E test temperature flakiness (deterministic sampling)
- Web API framework detection (PR #42 by @limey, fixes #41 )
- E2E test marker fix (show_model_portfolio diagnostics)
Architecture:
- mlx-lm API evaluation: Keep manual text-based implementation
- Stop token workarounds: All 3 validated (Phi-3, DeepSeek-R1, GPT-oss)
Testing:
- Portfolio Discovery: 73/81 tests, 17 models, 0 failures
- E2E infrastructure hardened (TOKENIZERS, polling, gc.collect())
- Multi-Python validation: 3.9-3.13 passing
Documentation:
- ADR-009 Outstanding Work completed + Implementation Plan removed
- TESTING-DETAILS.md: Portfolio Discovery + E2E Architecture updated
- CHANGELOG.md: Complete 2.0.2 stable release notes
2025-11-15 22:10:08 +01:00