12 Commits

Author SHA1 Message Date
The BROKE Cluster Team d4cd89fab0 Release 2.0.4 stable - see CHANGELOG.md for details 2026-02-11 15:05:09 +01:00
The BROKE Cluster Team dab7ffb6fc fix: P0 bugfixes + test infrastructure + benchmark metadata sync
P0 Bugfixes:
- cache.py: Handle empty HF_HOME strings in get_current_cache_root()
- clone.py: Remove obsolete _validate_same_volume() check
- common.py: Use importlib.metadata instead of importing transformers

Test Infrastructure:
- runner/__init__.py: Replace "mock" fallback with clear RuntimeError
- Fix mock paths in test_runner_core, test_token_limits, etc.
- Add VISION_TEST_MODELS + AUDIO_TEST_MODELS fallbacks
- Portfolio fixtures work with and without HF_HOME

Benchmark Fixes:
- Sort models/tests alphabetically instead of by regression %
- Fix vision metadata drift: pixtral-12b-8bit → pixtral-12b-4bit

Documentation:
- ADR-022: Workspace-First Paradigm (draft)
- ADR-018: Phase 2 details expanded
- TESTING.md/TESTING-DETAILS.md: Fallback docs updated
2026-02-10 15:52:36 +01:00
The BROKE Cluster Team e021fb32cd Release 2.0.4-beta.10: Audio PyPI fix (tiktoken workaround complete)
Audio/Whisper now works with a plain pip install; no Git workaround needed.
See CHANGELOG.md for details.

Tested: 647 passed, 11 skipped (Python 3.10-3.12)
2026-02-05 10:42:50 +01:00
The BROKE Cluster Team bf7480d042 Release 2.0.4-beta.9: Audio transcription via mlx-audio
Major Features:
- Audio transcription via mlx-audio backend (Whisper, >10min duration)
- OpenAI /v1/audio/transcriptions endpoint
- Memory Gate System (Vision: 8GB, Audio: 4GB)
- Config-based backend routing (ADR-020)
- Benchmark toolchain (memmon/memplot, Schema v0.2.2)

Key Fixes:
- EuroLLM tokenizer decoding
- Vision-model text-only routing regression
- Multimodal model context length detection
- Memory cleanup bug (mx.metal.clear_cache)
- Orphan process bug

Test Results:
- Unit tests: 647 passed, 11 skipped (Python 3.10-3.12)
- wet-umbrella: 171 passed total

See CHANGELOG.md for complete details and known issues.
2026-02-04 03:10:30 +01:00
The BROKE Cluster Team e8b10ea10b Release 2.0.4-beta.8: Audio transcription support (experimental)
Audio input via --audio flag (CLI) and input_audio content type (Server API).
Uses mlx-vlm native audio processing. ~30s duration limit (model constraint).
Only Gemma-3n has been tested so far (requires the --repair-index fix).

Also includes:
- SERVER-HANDBOOK compliance (image limits, validation error envelopes)
- Dependency updates: mlx>=0.30.0, mlx-lm>=0.30.0, huggingface-hub>=1.0.0
- Audio E2E test suite + ADR-019
2026-01-23 20:20:59 +01:00
The BROKE Cluster Team 5751545b8b Release 2.0.4-beta.7: Server robustness + Vision per-chunk streaming
- Server: exit codes, /v1/models crash fix, vision routing, MLXK2_MAX_TOKENS
- Vision: true SSE streaming, hallucination fix (local numbering)
- Workspace: prefix matching for list, ambiguous-pattern handling for push
- Docs: SERVER-HANDBOOK accuracy updates

See CHANGELOG.md for details.
2026-01-18 16:57:32 +01:00
The BROKE Cluster Team 53d9cca82d Release 2.0.4-beta.6: Local workspace workflow + Vision batch processing
- Complete local development cycle: clone → repair → run/show/server on
  workspace paths without HuggingFace round-trips
- Vision processing now defaults to safe chunking (one image at a time,
  prevents OOM + hallucination)
- Resumable clone with --force-resume and deterministic temp cache naming
- Improved test infrastructure (umbrella marker convention)
- 161 Wet Umbrella tests passing, including new Vision→Geo pipe integration tests

See CHANGELOG.md for complete details.
2026-01-07 17:11:07 +01:00
The BROKE Cluster Team 25609e4dcb Release 2.0.4-beta.5: Community repair tool + OS-agnostic benchmarking
Closes #49 (Mistral Tokenizer Bug)

Major features:
- Workspace Infrastructure (ADR-018 Phase 0a): Managed workspace detection,
  provenance metadata, backward compatible with unmanaged workspaces
- Convert Operation (ADR-018 Phase 1): `mlxk convert --repair-index` fixes
  models affected by mlx-vlm #624 (7+ models, including Qwen2.5-VL and gemma-3)
- Resumable Pull: Auto-detect partial downloads with `--force-resume`
- Wet Umbrella Test Integration: Single entry point for all real model tests

Fixes:
- #49: BPE space markers now correctly converted (Mistral-family models)
- Vision Portfolio Discovery: Filter by capabilities instead of model_type
- Memory Cleanup Hook: Triggers for both live_e2e and wet markers

Test suite: 528 passed, 60 skipped (Python 3.9-3.14)
2025-12-31 16:05:18 +01:00
The BROKE Cluster Team d3f7d091bc Release 2.0.4-beta.3: Dependency compatibility + Documentation
Bugfixes and compatibility improvements. No new features.

Core fixes:
- Framework detection for web API models (Issue #48)
- Video-only models filtered out of vision capability
- Page size detection for memory metrics (macOS)
- Model switch log timing (after load completion)

Compatibility:
- huggingface-hub 1.x + transformers 5.0 support
- Python 3.9-3.14 verified (494 tests passing)

Testing infrastructure:
- Benchmark schema v0.2.0 (hardware profiling, system health)
- Benchmark template v1.0 (automated JSONL→Markdown reports)
- Memory timeline visualization (memplot.py)
- Unified model filter (build_model_object as the single source)

Documentation:
- Multi-Modal Support section in README (Vision subsection)
- JSON API 0.1.5-0.1.6 marked Stable
- Vision promoted from alpha to beta status
- Removed conceptual drift and outdated references

See CHANGELOG.md for complete details.
2025-12-23 12:19:04 +01:00
The BROKE Cluster Team 86f669dc82 Release 2.0.4-beta.1: Vision + Pipes + Memory
- Vision Support (Issue #45): CLI + Server with OpenAI-compatible image API, EXIF metadata
- Unix Pipes (ADR-014): stdin support, isatty detection, SIGPIPE handling
- Memory-Aware Loading (ADR-016): Pre-load checks with >70% RAM warnings
- Python 3.9-3.14: Full compatibility verified (476-485 tests passing)
- Fixed: --log-json regression (Issue #44), Vision multimodal history filtering

See CHANGELOG.md for complete details.
2025-12-16 19:35:30 +01:00
The BROKE Cluster Team 05f1c30486 Release 2.0.3: Foundation for pipes
Foundation release for Unix pipe integration with stderr separation,
benchmark infrastructure, and reasoning control improvements.

Breaking Changes:
- stdout/stderr separation (Issue #43) - errors now go to stderr in human mode
- JSON mode unchanged (all output to stdout)

Features:
- Benchmark reporting infrastructure (ADR-013 Phase 0)
- --no-reasoning flag (Issue #40 partial - GPT-OSS/QwQ only)
- Interactive mode reasoning control (review_report.md fixes)

Bug Fixes:
- huggingface-hub 1.x incompatibility (critical dependency fix)
- Streaming parity tests refactored (Portfolio Discovery)

Testing:
- 308 tests passing (Python 3.9-3.13)
- 35 skipped (opt-in live tests)
- 79/91 E2E tests passing with HF_HOME

See CHANGELOG.md for complete details and migration guide.
2025-11-17 22:54:06 +01:00
The BROKE Cluster Team d32d3185dd Release 2.0.2: Test infrastructure hardening & empirical validation
Stable release completing the Issue #32 recovery plan - all tests passing.

Bug Fixes:
- Test collection regression (E2E suite parametrization)
- Stop token ordering (batch + streaming modes)
- E2E test temperature flakiness (deterministic sampling)
- Web API framework detection (PR #42 by @limey, fixes #41)
- E2E test marker fix (show_model_portfolio diagnostics)

Architecture:
- mlx-lm API evaluation: keep the manual text-based implementation
- Stop token workarounds: All 3 validated (Phi-3, DeepSeek-R1, GPT-OSS)

Testing:
- Portfolio Discovery: 73/81 tests, 17 models, 0 failures
- E2E infrastructure hardened (TOKENIZERS, polling, gc.collect())
- Multi-Python validation: 3.9-3.13 passing

Documentation:
- ADR-009 Outstanding Work completed + Implementation Plan removed
- TESTING-DETAILS.md: Portfolio Discovery + E2E Architecture updated
- CHANGELOG.md: Complete 2.0.2 stable release notes
2025-11-15 22:10:08 +01:00