Release 2.0.4-beta.1: Vision + Pipes + Memory

- Vision Support (Issue #45): CLI + Server with OpenAI-compatible image API, EXIF metadata - Unix Pipes (ADR-014): stdin support, isatty detection, SIGPIPE handling - Memory-Aware Loading (ADR-016): Pre-load checks with >70% RAM warnings - Python 3.9-3.14: Full compatibility verified (476-485 tests passing) - Fixed: --log-json regression (Issue #44), Vision multimodal history filtering See CHANGELOG.md for complete details.
2026-06-30 20:48:03 -04:00 · 2025-12-16 19:35:30 +01:00
parent 05f1c30486
commit 86f669dc82
79 changed files with 11667 additions and 1141 deletions
@@ -4,9 +4,9 @@
  <img src="https://github.com/mzau/mlx-knife/raw/main/mlxk-demo.gif" alt="MLX Knife Demo" width="900">
 </p>

-**Current Stable Version: 2.0.3**
+**Current Version: 2.0.4-beta.1** (Stable: 2.0.3)

-[![GitHub Release](https://img.shields.io/badge/version-2.0.3-green.svg)](https://github.com/mzau/mlx-knife/releases)
+[![GitHub Release](https://img.shields.io/badge/version-2.0.4--beta.1-blue.svg)](https://github.com/mzau/mlx-knife/releases)
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-green.svg)](https://support.apple.com/en-us/HT211814)
@@ -20,20 +20,56 @@
 - **Model Information**: Detailed model metadata including quantization info
 - **Download Models**: Pull models from HuggingFace with progress tracking
 - **Run Models**: Native MLX execution with streaming and chat modes
+- **Vision Models**: Image analysis (Python 3.10+, alpha)
+- **Unix Pipes**: Chain models via stdin/stdout - no temp files (beta)
 - **Health Checks**: Verify model integrity and MLX runtime compatibility
 - **Cache Management**: Clean up and organize your model storage
 - **Privacy & Network**: No background network or telemetry; only explicit Hugging Face interactions when you run pull or the experimental push.

+### Unix Pipe Integration (Beta, 2.0.4)
+Chain models with standard Unix pipes - no temp files needed:
+```bash
+export MLXK2_ENABLE_PIPES=1
+
+# Model chaining
+cat article.txt | mlx-run translator_model - | mlx-run summarizer_model - "3 bullets"
+
+# Works with Unix tools
+mlx-run chat_model "explain quicksort" | tee explanation.txt | head -20
+```
+Robust handling of SIGPIPE and early pipe termination (`| head`, `| grep -m1`).
+
 ### Requirements
 - macOS with Apple Silicon
 - Python 3.9+ (native macOS version or newer)
 - 8GB+ RAM recommended + RAM to run LLM

+## ⚖️ Model Usage and Licenses
+
+`mlx-knife` is a **tooling layer** for running ML models (e.g. from Hugging Face) locally.
+The project does **not** distribute any model weights and does **not** decide which models you use or how you use them.
+
+Please note:
+
+- Each model (weights, tokenizer, configuration, etc.) is governed by its **own license**.
+- When `mlx-knife` downloads a model from a third-party service (e.g. Hugging Face), it does so **on your behalf**.
+- **You** are responsible for:
+  - reading and understanding the license of each model you use,
+  - complying with any restrictions (e.g. *Non-Commercial*, *Research Only*, RAIL, etc.),
+  - ensuring that your use of a given model (private, research, commercial, on-prem services, etc.) is legally permitted.
+
+The `mlx-knife` source code itself is provided under the open-source license specified in this repository.
+This license applies **only** to the `mlx-knife` code and **does not extend** to any external models.
+
+> This is not legal advice. Always refer to the original model license text and, if necessary, seek professional legal counsel.
+
 ### Python Compatibility
 MLX Knife has been comprehensively tested and verified on:

-✅ **Python 3.9.6** (native macOS) - Primary target
-✅ **Python 3.10-3.13** - Fully compatible
+✅ **Python 3.9.6 - 3.14** - Text LLMs fully supported (mlx-lm 0.28.4+)
+✅ **Python 3.10 - 3.14** - Vision models supported (mlx-vlm 0.3.9+)
+
+**Note:** Vision features require Python 3.10+. Native macOS Python 3.9.6 users need to upgrade (e.g., via Homebrew).



@@ -46,7 +82,7 @@ MLX Knife has been comprehensively tested and verified on:
 pip install mlx-knife

 # Verify installation
-mlxk --version  # → mlxk 2.0.3
+mlxk --version  # → mlxk 2.0.3 (stable) or 2.0.4-beta.1 (dev)
 ```

 ### Development Installation
@@ -60,7 +96,7 @@ cd mlx-knife
 pip install -e ".[dev,test]"

 # Verify installation
-mlxk --version  # → mlxk 2.0.3
+mlxk --version  # → mlxk 2.0.4-beta.1

 # Run tests and quality checks (before committing)
 pytest -v
@@ -99,6 +135,26 @@ mlxk run "Phi-3-mini" -c
 mlxk serve --port 8080
 ```

+## Web Interface
+
+For a web-based chat UI, use **[nChat](https://github.com/mzau/broke-nchat)** - a lightweight web interface for the BROKE ecosystem:
+
+```bash
+# Clone once (local setup):
+git clone https://github.com/mzau/broke-nchat.git
+cd broke-nchat
+
+# Start mlx-knife server:
+mlxk serve
+
+# Open web UI:
+open index.html
+```
+
+**On-Prem:** Pure HTML/CSS/JS - runs entirely locally, zero dependencies.
+
+**Note:** nChat is a separate project designed for the entire BROKE ecosystem (MLX Knife + BROKE Cluster). See [nChat README](https://github.com/mzau/broke-nchat/blob/main/README.md) for CORS configuration.
+

 ## Commands

@@ -113,6 +169,7 @@ mlxk serve --port 8080
 | `rm` | Model deletion with lock cleanup and fuzzy matching |
 | 🔒 `push` | **Alpha feature** - Upload to HuggingFace Hub; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
 | 🔒 `clone` | **Alpha feature** - Model workspace cloning; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
+| 🔒 `pipe mode` | **Beta feature** - Unix pipes with `mlxk run <model> - ...`; requires `MLXK2_ENABLE_PIPES=1` |



@@ -144,6 +201,28 @@ mlxk show "Phi-3-mini" --json | jq '.data.model'

 ### Examples

+#### Pipe mode (Alpha: set `MLXK2_ENABLE_PIPES=1`)
+
+```bash
+# Read prompt from stdin and append trailing text (auto batch in pipes)
+echo "from stdin" | MLXK2_ENABLE_PIPES=1 mlxk run "<model>" - "append extra context"
+
+# JSON interactive guard (no prompt) emits JSON error on stdout, exit!=0
+MLXK2_ENABLE_PIPES=1 mlxk run "<model>" --json
+
+# Pipe list JSON into run for summarization
+MLXK2_ENABLE_PIPES=1 mlxk list --json \
+  | MLXK2_ENABLE_PIPES=1 mlxk run "<model>" - "Summarize the model list as a concise table."
+
+# Shortcut wrapper (same semantics)
+MLXK2_ENABLE_PIPES=1 mlx-run "<model>" - "translate into german" < README.md
+```
+
+Notes:
+- Stdin requires `MLXK2_ENABLE_PIPES=1` (alpha gate). Without it, `-` is rejected.
+- When stdout is a pipe (non-TTY), streaming is disabled automatically to keep clean output.
+- Use full model IDs in place of `<model>`; HF_HOME should point to your cache for live runs.
+
 #### List Models
 ```bash
 mlxk list --json
@@ -425,10 +504,122 @@ Using token hf_AbCdEfGhIjKlMnOpQrStUvWxYz123456 from /Users/john/models
 Using token [REDACTED_TOKEN] from ~/models
 ```

-### Environment Variables

- `MLXK2_LOG_JSON=1`: Enable JSON log format (alternative to `--log-json` flag)
- `MLXK2_LOG_LEVEL`: Override log level (used internally for subprocess mode)
+## Configuration Reference
+
+MLX Knife supports comprehensive runtime configuration via environment variables. All settings can be controlled without code changes.
+
+### Feature Gates
+
+Enable experimental and alpha features:
+
+| Variable | Description | Default | Since |
+|----------|-------------|---------|-------|
+| `MLXK2_ENABLE_ALPHA_FEATURES` | Enable alpha commands (`clone`, `push`) | `0` (disabled) | 2.0.0 |
+| `MLXK2_ENABLE_PIPES` | Enable Unix pipe integration (`mlxk run <model> -`) | `0` (disabled) | 2.0.4 |
+| `MLXK2_EXIF_METADATA` | Extract EXIF metadata from images (Vision models) | `1` (enabled) | 2.0.4 |
+
+**Examples:**
+```bash
+# Enable pipe mode for stdin processing
+export MLXK2_ENABLE_PIPES=1
+echo "Hello" | mlxk run model - "translate to Spanish"
+
+# Disable EXIF extraction for privacy (enabled by default)
+export MLXK2_EXIF_METADATA=0
+mlxk run vision-model --image photo.jpg "describe this"
+
+# Enable alpha features for development
+export MLXK2_ENABLE_ALPHA_FEATURES=1
+mlxk clone model-name ./workspace
+mlxk push ./workspace org/model --private --create
+```
+
+### Server Configuration
+
+Control server behavior without command-line flags:
+
+| Variable | Description | Default | Since |
+|----------|-------------|---------|-------|
+| `MLXK2_HOST` | Server bind address | `127.0.0.1` | 2.0.0 |
+| `MLXK2_PORT` | Server port | `8000` | 2.0.0 |
+| `MLXK2_PRELOAD_MODEL` | Model to load at startup (set by `--model` flag) | (none) | 2.0.0-beta |
+| `MLXK2_MAX_TOKENS` | Override default max_tokens for all requests | (auto) | 2.0.4 |
+| `MLXK2_RELOAD` | Enable Uvicorn auto-reload (development only) | `0` (disabled) | 2.0.0 |
+
+**Examples:**
+```bash
+# Custom host/port binding
+MLXK2_HOST=0.0.0.0 MLXK2_PORT=9000 mlxk serve
+
+# Preload model for faster first request
+MLXK2_PRELOAD_MODEL="mlx-community/Qwen2.5-3B-Instruct-4bit" mlxk serve
+
+# Override max_tokens for all requests
+MLXK2_MAX_TOKENS=4096 mlxk serve
+
+# Development mode with auto-reload
+MLXK2_RELOAD=1 mlxk serve
+```
+
+### Logging Configuration
+
+Control log output format and verbosity:
+
+| Variable | Description | Default | Since |
+|----------|-------------|---------|-------|
+| `MLXK2_LOG_JSON` | Enable JSON log format | `0` (text) | 2.0.0 |
+| `MLXK2_LOG_LEVEL` | Log level (`debug`, `info`, `warning`, `error`) | `info` | 2.0.0 |
+
+**Examples:**
+```bash
+# JSON logs for log aggregation tools
+MLXK2_LOG_JSON=1 mlxk serve
+
+# Quiet mode (warnings and errors only)
+MLXK2_LOG_LEVEL=warning mlxk serve
+
+# Verbose debug output
+MLXK2_LOG_LEVEL=debug mlxk serve
+```
+
+**Note:** CLI flags (`--log-json`, `--log-level`) take precedence over environment variables.
+
+### HuggingFace Integration
+
+Control HuggingFace Hub authentication and cache:
+
+| Variable | Description | Default | Since |
+|----------|-------------|---------|-------|
+| `HF_HOME` | HuggingFace cache directory | `~/.cache/huggingface` | N/A |
+| `HF_TOKEN` | HuggingFace API token (for private models, `push`) | (none) | N/A |
+| `HUGGINGFACE_HUB_TOKEN` | Alternative token variable (fallback) | (none) | N/A |
+
+**Examples:**
+```bash
+# Custom cache location
+HF_HOME=/data/models mlxk list
+
+# Authentication for private models
+HF_TOKEN=hf_... mlxk pull org/private-model
+
+# Upload to HuggingFace Hub (requires MLXK2_ENABLE_ALPHA_FEATURES=1)
+HF_TOKEN=hf_... mlxk push ./workspace org/model --private
+```
+
+### Configuration Priority
+
+When multiple sources define the same setting, precedence order is:
+
+1. **CLI flags** (highest priority) - e.g., `--log-json`, `--port`
+2. **Environment variables** - e.g., `MLXK2_LOG_JSON=1`
+3. **Defaults** (lowest priority) - documented above
+
+**Example:**
+```bash
+# CLI flag wins over environment variable
+MLXK2_PORT=9000 mlxk serve --port 8080  # Uses port 8080, not 9000
+```


 ## HuggingFace Cache Safety
@@ -455,7 +646,7 @@ mlxk health --json | jq '.data.summary'
 ```


-## Hidden Alpha Features: `clone` and `push`
+## Hidden Alpha Features: `clone`, `push`, and pipe mode

 ### `clone` - Model Workspace Creation

@@ -505,6 +696,42 @@ mlxk push --private ./workspace org/model --create --commit "init"

 These features are not final and may change or be removed in future releases.

+### `pipe mode` - stdin for `run` (beta, `mlx-run` shorthand)
+
+Pipe mode is beta (feature complete) and requires `MLXK2_ENABLE_PIPES=1`. It lets `mlxk run` (and `mlx-run`) read stdin when you pass `-` as the prompt.
+
+- Gate: `MLXK2_ENABLE_PIPES=1` (will become default in a future stable release).
+- Auto-batch: When stdout is a pipe (non-TTY), streaming is disabled automatically for clean output.
+- Robust: Handles SIGPIPE and BrokenPipeError gracefully (`| head`, `| grep -m1` work correctly).
+- Scope: Applies to `mlxk run` and `mlx-run`; other commands unchanged.
+- Usage examples (replace `<model>` with a cached MLX chat model):
+
+```bash
+# stdin + trailing text (batch when piped)
+MLXK2_ENABLE_PIPES=1 echo "from stdin" | mlxk run "<model>" - "append extra context"
+
+# JSON interactive guard (no prompt) → JSON error on stdout, exit 1
+MLXK2_ENABLE_PIPES=1 mlxk run "<model>" --json
+
+# list → run summarization
+MLXK2_ENABLE_PIPES=1 mlxk list --json \
+  | MLXK2_ENABLE_PIPES=1 mlxk run "<model>" - "Summarize the model list as a concise table."
+
+# Wrapper shorthand
+MLXK2_ENABLE_PIPES=1 mlx-run "<model>" - "translate into german" < README.md
+```
+
+Pipe mode API is stable.
+
+### `vision` - mlx-vlm (Python 3.10+, non-streaming)
+
+- Install extras: `pip install -e .[vision]` (pulls `mlx-vlm` from GitHub, Python 3.10+).
+- Backend: Uses `mlx-vlm` (vision); streaming is disabled for vision runs.
+- Usage:
+  - Text-only on a vision model: `mlxk run "mlx-community/Llama-3.2-11B-Vision-Instruct-4bit" "what is 2+2"`
+  - Image + text: `mlxk run "<vision-model>" --image cat.jpg "describe the cat"`
+  - Image-only (auto prompt): `mlxk run "<vision-model>" --image cat.jpg`
+

 ## Testing

@@ -544,7 +771,7 @@ This branch follows the established MLX-Knife development patterns:

 ```bash
 # Run quality checks
-python test-multi-python.sh  # Tests across Python 3.9-3.13
+python test-multi-python.sh  # Tests across Python 3.9-3.14
 ./run_linting.sh             # Code quality validation

 # Key files:
@@ -580,6 +807,7 @@ Apache License 2.0 — see `LICENSE` (root) and `mlxk2/NOTICE`.

 <p align="center">
  <b>Made with ❤️ by The BROKE team <img src="broke-logo.png" alt="BROKE Logo" width="30" align="middle"></b><br>
-  <i>Version 2.0.3 | November 2025</i><br>
-  <a href="https://github.com/mzau/broke-cluster">🔮 Next: BROKE Cluster for multi-node deployments</a>
+  <i>Version 2.0.4-beta.1 | December 2025</i><br>
+  <a href="https://github.com/mzau/broke-nchat">💬 Web UI: nChat - lightweight chat interface</a> •
+  <a href="https://github.com/mzau/broke-cluster">🔮 Multi-node: BROKE Cluster</a>
 </p>