mlx-knife

Mirror of https://github.com/cloudstack-llc/mlx-knife.git (synced 2026-07-01 20:44:14 -04:00)

Author	SHA1	Message	Date
The BROKE Cluster Team	d4cd89fab0	Release 2.0.4 stable - see CHANGELOG.md for details	2026-02-11 15:05:09 +01:00
The BROKE Cluster Team	dab7ffb6fc	fix: P0 bugfixes + test infrastructure + benchmark metadata sync	2026-02-10 15:52:36 +01:00
    P0 Bugfixes:
    - cache.py: Handle empty HF_HOME strings in get_current_cache_root()
    - clone.py: Remove obsolete _validate_same_volume() check
    - common.py: Use importlib.metadata instead of importing transformers
    Test Infrastructure:
    - runner/__init__.py: Replace "mock" fallback with clear RuntimeError
    - Fix mock paths in test_runner_core, test_token_limits, etc.
    - Add VISION_TEST_MODELS + AUDIO_TEST_MODELS fallbacks
    - Portfolio fixtures work with and without HF_HOME
    Benchmark Fixes:
    - Sort models/tests alphabetically instead of by regression %
    - Fix vision metadata drift: pixtral-12b-8bit → pixtral-12b-4bit
    Documentation:
    - ADR-022: Workspace-First Paradigm (draft)
    - ADR-018: Phase 2 details expanded
    - TESTING.md/TESTING-DETAILS.md: Fallback docs updated
The BROKE Cluster Team	e021fb32cd	Release 2.0.4-beta.10: Audio PyPI fix (tiktoken workaround complete)	2026-02-05 10:42:50 +01:00
    Audio/Whisper works with pip install - no Git workaround needed.
    See CHANGELOG.md for details.
    Tested: 647 passed, 11 skipped (Python 3.10-3.12)
The BROKE Cluster Team	bf7480d042	Release 2.0.4-beta.9: Audio transcription via mlx-audio	2026-02-04 03:10:30 +01:00
    Major Features:
    - Audio transcription via mlx-audio backend (Whisper, >10min duration)
    - OpenAI /v1/audio/transcriptions endpoint
    - Memory Gate System (Vision: 8GB, Audio: 4GB)
    - Config-based backend routing (ADR-020)
    - Benchmark toolchain (memmon/memplot, Schema v0.2.2)
    Key Fixes:
    - EuroLLM tokenizer decoding
    - Vision-model text-only routing regression
    - Multimodal model context length detection
    - Memory cleanup bug (mx.metal.clear_cache)
    - Orphan process bug
    Test Results:
    - Unit tests: 647 passed, 11 skipped (Python 3.10-3.12)
    - wet-umbrella: 171 passed total
    See CHANGELOG.md for complete details and known issues.
The BROKE Cluster Team	e8b10ea10b	Release 2.0.4-beta.8: Audio transcription support (experimental)	2026-01-23 20:20:59 +01:00
    Audio input via --audio flag (CLI) and input_audio content type (Server API).
    Uses mlx-vlm native audio processing. ~30s duration limit (model constraint).
    Currently only Gemma-3n tested (requires --repair-index fix).
    Also includes:
    - SERVER-HANDBOOK compliance (image limits, validation error envelopes)
    - Dependency updates: mlx>=0.30.0, mlx-lm>=0.30.0, huggingface-hub>=1.0.0
    - Audio E2E test suite + ADR-019
The BROKE Cluster Team	5751545b8b	Release 2.0.4-beta.7: Server robustness + Vision per-chunk streaming	2026-01-18 16:57:32 +01:00
    - Server: exit codes, /v1/models crash fix, vision routing, MLXK2_MAX_TOKENS
    - Vision: true SSE streaming, hallucination fix (local numbering)
    - Workspace: list prefix-match, push ambiguous pattern handling
    - Docs: SERVER-HANDBOOK accuracy updates
    See CHANGELOG.md for details.
The BROKE Cluster Team	53d9cca82d	Release 2.0.4-beta.6: Local workspace workflow + Vision batch processing	2026-01-07 17:11:07 +01:00
    - Complete local development cycle: clone → repair → run/show/server on workspace paths without HuggingFace round-trips
    - Vision processing now defaults to safe chunking (one image at a time, prevents OOM + hallucination)
    - Resumable clone with --force-resume and deterministic temp cache naming
    - Improved test infrastructure (umbrella marker convention)
    - 161 Wet Umbrella tests passing, including new Vision→Geo pipe integration tests
    See CHANGELOG.md for complete details.
The BROKE Cluster Team	25609e4dcb	Release 2.0.4-beta.5: Community repair tool + OS-agnostic benchmarking	2025-12-31 16:05:18 +01:00
    Closes #49 (Mistral Tokenizer Bug)
    Major features:
    - Workspace Infrastructure (ADR-018 Phase 0a): Managed workspace detection, provenance metadata, backward compatible with unmanaged workspaces
    - Convert Operation (ADR-018 Phase 1): `mlxk convert --repair-index` fixes models affected by mlx-vlm #624 (7+ models including Qwen2.5-VL, gemma-3)
    - Resumable Pull: Auto-detect partial downloads with `--force-resume`
    - Wet Umbrella Test Integration: Single entry point for all real model tests
    Fixes:
    - #49: BPE space markers now correctly converted (Mistral-family models)
    - Vision Portfolio Discovery: Filter by capabilities instead of model_type
    - Memory Cleanup Hook: Triggers for both live_e2e and wet markers
    Test suite: 528 passed, 60 skipped (Python 3.9-3.14)
The BROKE Cluster Team	d3f7d091bc	Release 2.0.4-beta.3: Dependency compatibility + Documentation	2025-12-23 12:19:04 +01:00
    Bugfixes and compatibility improvements. No new features.
    Core fixes:
    - Framework detection for web API models (Issue #48)
    - Video-only model filtering from vision capability
    - Page size detection for memory metrics (macOS)
    - Model switch log timing (after load completion)
    Compatibility:
    - hub 1.x + transformers 5.0 support
    - Python 3.9-3.14 verified (494 tests passing)
    Testing infrastructure:
    - Benchmark schema v0.2.0 (hardware profiling, system health)
    - Benchmark template v1.0 (automated JSONL→Markdown reports)
    - Memory timeline visualization (memplot.py)
    - Unified model filter (build_model_object single source)
    Documentation:
    - Multi-Modal Support section in README (Vision subsection)
    - JSON API 0.1.5-0.1.6 marked Stable
    - Vision promoted from alpha to beta status
    - Removed conceptual drift and outdated references
    See CHANGELOG.md for complete details.
The BROKE Cluster Team	86f669dc82	Release 2.0.4-beta.1: Vision + Pipes + Memory	2025-12-16 19:35:30 +01:00
    - Vision Support (Issue #45): CLI + Server with OpenAI-compatible image API, EXIF metadata
    - Unix Pipes (ADR-014): stdin support, isatty detection, SIGPIPE handling
    - Memory-Aware Loading (ADR-016): Pre-load checks with >70% RAM warnings
    - Python 3.9-3.14: Full compatibility verified (476-485 tests passing)
    - Fixed: --log-json regression (Issue #44), Vision multimodal history filtering
    See CHANGELOG.md for complete details.
The BROKE Cluster Team	05f1c30486	Release 2.0.3: Foundation for pipes	2025-11-17 22:54:06 +01:00
    Foundation release for Unix pipe integration with stderr separation, benchmark infrastructure, and reasoning control improvements.
    Breaking Changes:
    - stdout/stderr separation (Issue #43) - errors to stderr in human mode
    - JSON mode unchanged (all output to stdout)
    Features:
    - Benchmark reporting infrastructure (ADR-013 Phase 0)
    - --no-reasoning flag (Issue #40 partial - GPT-OSS/QwQ only)
    - Interactive mode reasoning control (review_report.md fixes)
    Bug Fixes:
    - huggingface-hub 1.x incompatibility (critical dependency fix)
    - Streaming parity tests refactored (Portfolio Discovery)
    Testing:
    - 308 tests passing (Python 3.9-3.13)
    - 35 skipped (opt-in live tests)
    - 79/91 E2E tests passing with HF_HOME
    See CHANGELOG.md for complete details and migration guide.
The BROKE Cluster Team	d32d3185dd	Release 2.0.2: Test infrastructure hardening & empirical validation	2025-11-15 22:10:08 +01:00
    Stable release completing Issue #32 recovery plan - all tests passing.
    Bug Fixes:
    - Test collection regression (E2E suite parametrization)
    - Stop token ordering (batch + streaming modes)
    - E2E test temperature flakiness (deterministic sampling)
    - Web API framework detection (PR #42 by @limey, fixes #41)
    - E2E test marker fix (show_model_portfolio diagnostics)
    Architecture:
    - mlx-lm API evaluation: Keep manual text-based implementation
    - Stop token workarounds: All 3 validated (Phi-3, DeepSeek-R1, GPT-oss)
    Testing:
    - Portfolio Discovery: 73/81 tests, 17 models, 0 failures
    - E2E infrastructure hardened (TOKENIZERS, polling, gc.collect())
    - Multi-Python validation: 3.9-3.13 passing
    Documentation:
    - ADR-009 Outstanding Work completed + Implementation Plan removed
    - TESTING-DETAILS.md: Portfolio Discovery + E2E Architecture updated
    - CHANGELOG.md: Complete 2.0.2 stable release notes

12 Commits
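Commit dab7ffb6fc lists two small P0 fixes that are easy to illustrate on their own:
- treating an empty `HF_HOME` string as unset;
- reading a package version through `importlib.metadata` instead of importing `transformers`.

The sketch below is a hedged illustration and not mlx-knife's actual code. `hf_home_or_default()` and `installed_version()` are hypothetical names. The fallback assumes the common `~/.cache/huggingface` default and ignores `XDG_CACHE_HOME` for brevity.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Mapping, Optional


def hf_home_or_default(env: Mapping[str, str]) -> Path:
    """Hypothetical stand-in for the HF_HOME handling in get_current_cache_root().

    An empty or whitespace-only HF_HOME is treated like an unset one. Note that
    pathlib turns Path("") into Path("."), so a naive Path(env["HF_HOME"]) would
    silently point at the current working directory.
    """
    raw = env.get("HF_HOME", "").strip()
    if raw:
        return Path(raw).expanduser()
    # Assumed default; real huggingface_hub also honors XDG_CACHE_HOME.
    return Path.home() / ".cache" / "huggingface"


def installed_version(dist_name: str) -> Optional[str]:
    """Return an installed distribution's version without importing the package.

    importlib.metadata reads the distribution's metadata files only, so querying
    "transformers" does not pay the cost of importing it.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

For example, `hf_home_or_default({"HF_HOME": ""})` yields the same path as an unset variable. `installed_version("transformers")` returns `None` when the package is absent and does not raise.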