Release 2.0.0: Full rewrite with Apache 2.0 license

MLX Knife 2.0 replaces 1.x as the primary version. Highlights: - Full 1.x feature parity (list, show, pull, rm, run, server, health) - JSON API for automation (--json flag) - Enhanced error handling and logging - Runtime compatibility checks - Improved stop token detection - License: MIT→Apache 2.0 Breaking changes: - mlxk rm: requires --force flag for models with active locks Migration guide: MIGRATION.md Changelog: CHANGELOG.md Testing: 297/317 tests passed, Python 3.9-3.13 verified Merge branch 'feature/2.0.0-alpha.1'
2026-06-30 20:48:03 -04:00 · 2025-11-06 15:34:04 +01:00
parent d07bf66990 ae80bbe554
commit a2ca3da2d9
147 changed files with 24523 additions and 10847 deletions
@@ -1,28 +1,17 @@
-# <img src="https://github.com/mzau/mlx-knife/raw/main/broke-logo.png" alt="BROKE Logo" width="60" style="vertical-align: middle;"> MLX Knife
+# <img src="https://github.com/mzau/mlx-knife/raw/main/broke-logo.png" alt="BROKE Logo" width="60" style="vertical-align: middle;"> MLX-Knife 2.0

 <p align="center">
-  <img src="https://github.com/mzau/mlx-knife/raw/main/mlxk-demo.gif" alt="MLX Knife Demo" width="1000">
+  <img src="https://github.com/mzau/mlx-knife/raw/main/mlxk-demo.gif" alt="MLX Knife Demo" width="900">
 </p>

-A lightweight, ollama-like CLI for managing and running MLX models on Apple Silicon. **CLI-only tool designed for personal, local use** - perfect for individual developers and researchers working with MLX models.
+**Current Stable Version: 2.0.0**

-> **Note**: MLX Knife is designed as a command-line interface tool only. While some internal functions are accessible via Python imports, only CLI usage is officially supported.
-
-**Current Version**: 1.1.1 (September 2025) - **STABLE RELEASE** 🚀
- Features in 1.1.1 — MXFP4 support and GPT-OSS reasoning models:
-  - Full MXFP4 quantization support (MLX ≥0.29.0, MLX-LM ≥0.27.0),
-  - GPT-OSS reasoning model formatting with `--hide-reasoning` flag,
-  - Enhanced quantization display in `show` command,
-  - Tested with `gpt-oss-20b-MXFP4-Q8` from mlx-community.
-  - Details: see CHANGELOG.md. Install with `pip install mlx-knife`.
- **Reliable Test System**: 166/166 tests passing across Python 3.9–3.13  
- **Python 3.9-3.13**: Full compatibility verified across all Python versions
- **Key Issues Resolved**: Issues #21, #22, #23 fixed and thoroughly tested
-
-[![GitHub Release](https://img.shields.io/github/v/release/mzau/mlx-knife)](https://github.com/mzau/mlx-knife/releases)
-[![MLX](https://img.shields.io/badge/MLX-macOS-green.svg)](https://github.com/ml-explore/mlx-lm)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![GitHub Release](https://img.shields.io/badge/version-2.0.0-green.svg)](https://github.com/mzau/mlx-knife/releases)
+[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+[![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-green.svg)](https://support.apple.com/en-us/HT211814)
+[![MLX](https://img.shields.io/badge/MLX-Latest-orange.svg)](https://github.com/ml-explore/mlx)
+

 ## Features

@@ -31,309 +20,549 @@ A lightweight, ollama-like CLI for managing and running MLX models on Apple Sili
 - **Model Information**: Detailed model metadata including quantization info
 - **Download Models**: Pull models from HuggingFace with progress tracking
 - **Run Models**: Native MLX execution with streaming and chat modes
- **Health Checks**: Verify model integrity and completeness
+- **Health Checks**: Verify model integrity and MLX runtime compatibility
 - **Cache Management**: Clean up and organize your model storage
-
-### Local Server & Web Interface
- **OpenAI-Compatible API**: Local REST API with `/v1/chat/completions`, `/v1/completions`, `/v1/models`
- **Web Chat Interface**: Built-in HTML chat interface with markdown rendering  
- **Single-User Design**: Optimized for personal use, not multi-user production environments
- **Conversation Context**: Full chat history maintained for follow-up questions
- **Streaming Support**: Real-time token streaming via Server-Sent Events
- **Configurable Limits**: Set default max tokens via `--max-tokens` parameter
- **Model Hot-Swapping**: Switch between models per conversation
- **Tool Integration**: Compatible with OpenAI-compatible clients (Cursor IDE, etc.)
-
-### Run Experience
- **Direct MLX Integration**: Models load and run natively without subprocess overhead
- **Real-time Streaming**: Watch tokens generate with proper spacing and formatting
- **Interactive Chat**: Full conversational mode with history tracking
- **Memory Insights**: See GPU memory usage after model loading and generation
- **Dynamic Stop Tokens**: Automatic detection and filtering of model-specific stop tokens
- **Customizable Generation**: Control temperature, max_tokens, top_p, and repetition penalty
- **Context-Managed Memory**: Context manager pattern ensures automatic cleanup and prevents memory leaks
- **Exception-Safe**: Robust error handling with guaranteed resource cleanup
-
-## Installation
-
-### Via PyPI (Recommended)
-```bash
-pip install mlx-knife
-```
+- **Privacy & Network**: No background network or telemetry; only explicit Hugging Face interactions when you run pull or the experimental push.

 ### Requirements
- macOS with Apple Silicon (M1/M2/M3)
+- macOS with Apple Silicon
 - Python 3.9+ (native macOS version or newer)
 - 8GB+ RAM recommended + RAM to run LLM

 ### Python Compatibility
 MLX Knife has been comprehensively tested and verified on:

-✅ **Python 3.9.6** (native macOS) - Primary target  
-✅ **Python 3.10-3.13** - Fully compatible  
+✅ **Python 3.9.6** (native macOS) - Primary target
+✅ **Python 3.10-3.13** - Fully compatible

-All versions include full MLX model execution testing with real models.

-### Install from Source
+
+## Installation
+
+### Via PyPI (Recommended)

 ```bash
-# Clone the repository
+# Install stable release from PyPI
+pip install mlx-knife
+
+# Verify installation
+mlxk --version  # → mlxk 2.0.0
+```
+
+### Development Installation
+
+```bash
+# Clone and install from source
 git clone https://github.com/mzau/mlx-knife.git
 cd mlx-knife

-# Install in development mode
-pip install -e .
-
-# Or install normally
-pip install .
-
-# Install with development tools (ruff, mypy, tests)
+# Install with all development dependencies (required for testing and code quality)
 pip install -e ".[dev,test]"
+
+# Verify installation
+mlxk --version  # → mlxk 2.0.0
+
+# Run tests and quality checks (before committing)
+pytest -v
+ruff check mlxk2/ --fix
+mypy mlxk2/
 ```

-### Install Dependencies Only
+**Note:** For minimal user installation without dev tools: `pip install -e .`
+
+### Migrating from 1.x
+
+If you're upgrading from MLX Knife 1.x, see [MIGRATION.md](MIGRATION.md) for important information about the license change (MIT → Apache 2.0) and behavior changes.

-```bash
-pip install -r requirements.txt
-```

 ## Quick Start

-### CLI Usage
 ```bash
-# List all MLX models in your cache
+# List models (human-readable)
 mlxk list
+mlxk list --health
+mlxk list --verbose --health

-# Show detailed info about a model
-mlxk show Phi-3-mini-4k-instruct-4bit
-
-# Download a new model
-mlxk pull mlx-community/Mistral-7B-Instruct-v0.3-4bit
-
-# Run a model with a prompt
-mlxk run Phi-3-mini "What is the capital of France?"
-
-# GPT-OSS reasoning model with formatted output
-mlxk run gpt-oss-20b-MXFP4-Q8 "Explain quantum computing"
-
-# Hide reasoning steps, show only final answer (GPT-OSS models)
-mlxk run gpt-oss-20b-MXFP4-Q8 "What is 2+2?" --hide-reasoning
-
-# Start interactive chat
-mlxk run Phi-3-mini
-
-# Check model health
+# Check cache health
 mlxk health
+
+# Show model details
+mlxk show "mlx-community/Phi-3-mini-4k-instruct-4bit"
+
+# Pull a model
+mlxk pull "mlx-community/Llama-3.2-3B-Instruct-4bit"
+
+# Run interactive chat
+mlxk run "Phi-3-mini" -c
+
+# Start OpenAI-compatible server
+mlxk serve --port 8080
 ```

-### Web Chat Interface

-MLX Knife includes a built-in web interface for easy model interaction:
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `server`/`serve` | OpenAI-compatible API server; SIGINT-robust (Supervisor); SSE streaming |
+| `run` | Interactive and single-shot model execution with streaming/batch modes |
+| `list` | Model discovery with JSON output |
+| `health` | Corruption detection and cache analysis |
+| `show` | Detailed model information with --files, --config |
+| `pull` | HuggingFace model downloads with corruption detection |
+| `rm` | Model deletion with lock cleanup and fuzzy matching |
+| 🔒 `push` | **Alpha feature** - Upload to HuggingFace Hub; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
+| 🔒 `clone` | **Alpha feature** - Model workspace cloning; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
+
+
+
+## JSON API
+
+> **📋 Complete API Specification**: See [JSON API Specification](docs/json-api-specification.md) for comprehensive schema, error codes, and examples.
+
+All commands support both human-readable and JSON output (`--json` flag) for automation and scripting, enabling seamless integration with CI/CD pipelines and cluster management systems.
+
+### Command Structure
+
+All commands support JSON output via `--json` flag:

 ```bash
-# Start the OpenAI-compatible API server
-mlxk server --port 8000 --max-tokens 4000
-
-# Get web chat interface from GitHub
-curl -O https://raw.githubusercontent.com/mzau/mlx-knife/main/simple_chat.html
-
-# Open web chat interface in your browser
-open simple_chat.html
+mlxk list --json | jq '.data.models[].name'
+mlxk health --json | jq '.data.summary'
+mlxk show "Phi-3-mini" --json | jq '.data.model'
 ```

-**Features:**
- **No installation required** - Pure HTML/CSS/JS
- **Real-time streaming** - Watch tokens appear as they're generated
- **Model selection** - Choose any MLX model from your cache
- **Conversation history** - Full context for follow-up questions
- **Markdown rendering** - Proper formatting for code, lists, tables
- **Mobile-friendly** - Responsive design works on all devices
+**Response Format:**
+```json
+{
+    "status": "success|error",
+    "command": "list|health|show|pull|rm|clone|version|push|run|server",
+    "data": { /* command-specific data */ },
+    "error": null | { "type": "...", "message": "..." }
+}
+```

-### Local API Server Integration
+### Examples

-The MLX Knife server provides OpenAI-compatible endpoints for **local development and personal use**:
+#### List Models
+```bash
+mlxk list --json
+# Output:
+{
+  "status": "success",
+  "command": "list",
+  "data": {
+    "models": [
+      {
+        "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
+        "hash": "a5339a41b2e3abcdef1234567890ab12345678ef",
+        "size_bytes": 4613734656,
+        "last_modified": "2024-10-15T08:23:41Z",
+        "framework": "MLX",
+        "model_type": "chat",
+        "capabilities": ["text-generation", "chat"],
+        "health": "healthy",
+        "runtime_compatible": true,
+        "reason": null,
+        "cached": true
+      }
+    ],
+    "count": 1
+  },
+  "error": null
+}
+```
+
+#### Health Check
+```bash
+mlxk health --json
+# Output:
+{
+  "status": "success",
+  "command": "health",
+  "data": {
+    "healthy": [
+      {
+        "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
+        "status": "healthy",
+        "reason": "Model is healthy"
+      }
+    ],
+    "unhealthy": [],
+    "summary": { "total": 1, "healthy_count": 1, "unhealthy_count": 0 }
+  },
+  "error": null
+}
+```
+
+#### Show Model Details
+```bash
+mlxk show "Phi-3-mini" --json --files
+# Output (simplified):
+{
+  "status": "success",
+  "command": "show",
+  "data": {
+    "model": {
+      "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
+      "hash": "a5339a41b2e3abcdefgh1234567890ab12345678",
+      "size_bytes": 4613734656,
+      "framework": "MLX",
+      "model_type": "chat",
+      "capabilities": ["text-generation", "chat"],
+      "last_modified": "2024-10-15T08:23:41Z",
+      "health": "healthy",
+      "runtime_compatible": true,
+      "reason": null,
+      "cached": true
+    },
+    "files": [
+      {"name": "config.json", "size": "1.2KB", "type": "config"},
+      {"name": "model.safetensors", "size": "2.3GB", "type": "weights"}
+    ],
+    "metadata": null
+  },
+  "error": null
+}
+```
+
+### Hash Syntax Support
+
+All commands support `@hash` syntax for specific model versions:

 ```bash
-# Start local server (single-user, no authentication)
-mlxk server --host 127.0.0.1 --port 8000
-
-# Test with curl
-curl -X POST "http://localhost:8000/v1/chat/completions" \
-  -H "Content-Type: application/json" \
-  -d '{"model": "Phi-3-mini-4k-instruct-4bit", "messages": [{"role": "user", "content": "Hello!"}]}'
-
-# Integration with development tools (community-tested):
-# - Cursor IDE: Set API URL to http://localhost:8000/v1
-# - LibreChat: Configure as custom OpenAI endpoint  
-# - Open WebUI: Add as local OpenAI-compatible API
-# - SillyTavern: Add as OpenAI API with custom URL
+mlxk health "Qwen3@e96" --json     # Check specific hash
+mlxk show "model@3df9bfd" --json   # Short hash matching
+mlxk rm "Phi-3@e967" --json --force  # Delete specific version
 ```

-**Note**: Tool integrations are community-tested. Some tools may require specific configuration or have compatibility limitations. Please report issues via GitHub.
+### Integration Examples

-## Command Reference
-
-### Available Commands
-
-#### `list` - Browse Models
+#### Broke-Cluster Integration
 ```bash
-mlxk list                    # Show chat-capable MLX models (strict view)
-mlxk list --verbose          # Show MLX models with full paths
-mlxk list --all              # Show all models with framework and TYPE
-mlxk list --all --verbose    # All models with full paths
-mlxk list --health           # Include health status
-mlxk list Phi-3              # Filter by model name
-mlxk list --verbose Phi-3    # Show detailed info (same as show)
+# Get available model names for scheduling
+MODELS=$(mlxk list --json | jq -r '.data.models[].name')
+
+# Check cache health before deployment
+HEALTH=$(mlxk health --json | jq '.data.summary.healthy_count')
+if [ "$HEALTH" -eq 0 ]; then
+    echo "No healthy models available"
+    exit 1
+fi
+
+# Download required models
+mlxk pull "mlx-community/Phi-3-mini-4k-instruct-4bit" --json
 ```

-#### `show` - Model Details
+#### CI/CD Pipeline Usage
 ```bash
-mlxk show <model>            # Display model information
-mlxk show <model> --files    # Include file listing
-mlxk show <model> --config   # Show config.json content
+# Verify model integrity in CI
+mlxk health --json | jq -e '.data.summary.unhealthy_count == 0'
+
+# Clean up CI artifacts
+mlxk rm "test-model-*" --json --force
+
+# Pre-warm cache for deployment
+mlxk pull "production-model" --json
 ```

-#### `pull` - Download Models
+#### Model Management Automation
 ```bash
-mlxk pull <model>            # Download from HuggingFace
-mlxk pull <org>/<model>      # Full model path
+# Find models by pattern
+LARGE_MODELS=$(mlxk list --json | jq -r '.data.models[] | select(.name | contains("30B")) | .name')
+
+# Show detailed info for analysis
+for model in $LARGE_MODELS; do
+    mlxk show "$model" --json --config | jq '.data.model_config'
+done
 ```

-#### `run` - Execute Models
-```bash
-mlxk run <model> "prompt"              # Single prompt (minimal output)
-mlxk run <model> "prompt" --verbose    # Show loading, memory, and stats
-mlxk run <model>                       # Interactive chat
-mlxk run <model> "prompt" --no-stream  # Batch output
-mlxk run <model> --max-tokens 1000     # Custom length
-mlxk run <model> --temperature 0.9     # Higher creativity
-mlxk run <model> --no-chat-template    # Raw completion mode
-mlxk run <model> --hide-reasoning      # Hide reasoning (GPT-OSS models only)
-```

-#### `rm` - Remove Models
-```bash
-mlxk rm <model>              # Delete model with cache cleanup confirmation  
-mlxk rm <model>@<hash>       # Delete specific version (removes entire model)
-mlxk rm <model> --force      # Skip confirmations, auto-cleanup cache files
-```
+## Human Output

-**Features:**
- Removes entire model directory (not just snapshots)
- Cleans up orphaned HuggingFace lock files  
- Handles corrupted models gracefully
- Smart prompting (only asks about cache cleanup if needed)
+MLX Knife provides rich human-readable output by default (without `--json` flag).

-#### `health` - Check Integrity
-```bash
-mlxk health                  # Check all models
-mlxk health <model>          # Check specific model
-```
-
-#### `server` - Start API Server
-```bash
-mlxk server                           # Start on localhost:8000
-mlxk server --port 8001               # Custom port
-mlxk server --host 0.0.0.0 --port 8000  # Allow external access
-mlxk server --max-tokens 4000         # Set default max tokens (default: 2000)
-mlxk server --reload                  # Development mode with auto-reload
-```
-
-### Command Aliases
-After installation, these commands are equivalent:
- `mlxk` (recommended)
- `mlx-knife`
- `mlx_knife`
-
-## Configuration
-
-### Cache Location
-By default, models are stored in `~/.cache/huggingface/hub`. Configure with:
+### Basic Usage

 ```bash
-# Set custom cache location
-export HF_HOME="/path/to/your/cache"
-
-# Example: External SSD
-export HF_HOME="/Volumes/ExternalSSD/models"
+mlxk list
+mlxk list --health
+mlxk health
+mlxk show "mlx-community/Phi-3-mini-4k-instruct-4bit"
 ```

-### Model Name Expansion
-Short names are automatically expanded for MLX models:
- `Phi-3-mini-4k-instruct-4bit` → `mlx-community/Phi-3-mini-4k-instruct-4bit`
- Models already containing `/` are used as-is
+### List Filters

-## Advanced Usage
+- `list`: Shows MLX chat models only (compact names, safe default)
+- `list --verbose`: Shows all MLX models (chat + base) with full org/names and Framework column
+- `list --all`: Shows all frameworks (MLX, GGUF, PyTorch)
+- Flags are combinable: `--all --verbose`, `--all --health`, `--verbose --health`

-### Generation Parameters
+### Health Status Display (--health flag)
+
+The `--health` flag adds health status information to the output:
+
+**Compact mode** (default, `--all`):
+- Shows single "Health" column with values:
+  - `healthy` - File integrity OK and MLX runtime compatible
+  - `healthy*` - File integrity OK but not MLX runtime compatible (use `--verbose` for details)
+  - `unhealthy` - File integrity failed or unknown format
+
+**Verbose mode** (`--verbose --health`):
+- Splits into "Integrity" and "Runtime" columns:
+  - **Integrity:** `healthy` / `unhealthy`
+  - **Runtime:** `yes` / `no` / `-` (dash = gate blocked by failed integrity)
+  - **Reason:** Explanation when problems detected (wrapped at 26 chars for readability)
+
+**Examples:**

 ```bash
-# Creative writing (high temperature, diverse output)
-mlxk run Mistral-7B "Write a story" --temperature 0.9 --top-p 0.95
+# Compact health view
+mlxk list --health
+# Output:
+# Name                    | Hash    | Size   | Modified | Type | Health
+# Llama-3.2-3B-Instruct   | a1b2c3d | 2.1GB  | 2d ago   | chat | healthy
+# Qwen2-7B-Instruct       | 1a2b3c4 | 4.8GB  | 3d ago   | chat | healthy*

-# Precise tasks (low temperature, focused output)
-mlxk run Phi-3-mini "Extract key points" --temperature 0.3 --top-p 0.9
+# Verbose health view with details
+mlxk list --verbose --health
+# Output:
+# Name                    | Hash    | Size   | Modified | Framework | Type | Integrity | Runtime | Reason
+# Llama-3.2-3B-Instruct   | a1b2c3d | 2.1GB  | 2d ago   | MLX       | chat | healthy   | yes     | -
+# Qwen2-7B-Instruct       | 1a2b3c4 | 4.8GB  | 3d ago   | PyTorch   | chat | healthy   | no      | Incompatible: PyTorch

-# Long-form generation
-mlxk run Mixtral-8x7B "Explain quantum computing" --max-tokens 2000
-
-# Reduce repetition
-mlxk run model "prompt" --repetition-penalty 1.2
+# All frameworks with health status
+mlxk list --all --health
+# Output:
+# Name                    | Hash    | Size   | Modified | Framework | Type    | Health
+# Llama-3.2-3B-Instruct   | a1b2c3d | 2.1GB  | 2d ago   | MLX       | chat    | healthy
+# llama-3.2-gguf-q4       | b2c3d4e | 1.8GB  | 3d ago   | GGUF      | unknown | healthy*
+# broken-download         | -       | 500MB  | 1h ago   | Unknown   | unknown | unhealthy
 ```

-### Working with Specific Commits
+**Design Philosophy:**
+- `unhealthy` is a catch-all for anything not understood/supported (broken downloads, unknown formats, creative HuggingFace structures)
+- `healthy` guarantees the model will work with `mlxk2 run`
+- `healthy*` means files are intact but MLX runtime can't execute them (e.g., GGUF/PyTorch models, incompatible model_type, or mlx-lm version too old)
+
+Note: JSON output is unaffected by these human-only filters and always includes full health/runtime data.
+
+
+## Logging & Debugging
+
+MLX Knife 2.0 provides structured logging with configurable output formats and levels.
+
+### Log Levels
+
+Control verbosity with `--log-level` (server mode):

 ```bash
-# Use specific model version
-mlxk show model@commit_hash
-mlxk run model@commit_hash "prompt"
+# Default: Show startup, model loading, and errors
+mlxk serve --log-level info
+
+# Quiet: Only warnings and errors
+mlxk serve --log-level warning
+
+# Silent: Only errors
+mlxk serve --log-level error
+
+# Verbose: All logs including HTTP requests
+mlxk serve --log-level debug
 ```

-### Non-MLX Model Handling
+**Log Level Behavior:**
+- `debug`: All logs + Uvicorn HTTP access logs (`GET /v1/models`, etc.)
+- `info`: Application logs (startup, model switching, errors) + HTTP access logs
+- `warning`: Only warnings and errors (no startup messages, no HTTP access logs)
+- `error`: Only error messages
+
+### JSON Logs (Machine-Readable)
+
+Enable structured JSON output for log aggregation tools:

-The tool automatically detects framework compatibility:
 ```bash
-# Attempting to run PyTorch model
-mlxk run bert-base-uncased
-# Error: Model bert-base-uncased is not MLX-compatible (Framework: PyTorch)!
-# Use MLX-Community models: https://huggingface.co/mlx-community
+# JSON logs (recommended - CLI flag)
+mlxk serve --log-json
+
+# JSON logs (alternative - environment variable)
+MLXK2_LOG_JSON=1 mlxk serve
 ```

-## Troubleshooting
+**Note:** `--log-json` also formats Uvicorn access logs as JSON for consistent output.

-### Model Not Found
+**JSON Format:**
+```json
+{"ts": 1760830072.96, "level": "INFO", "msg": "MLX Knife Server 2.0 starting up..."}
+{"ts": 1760830073.14, "level": "INFO", "msg": "Switching to model: mlx-community/...", "model": "..."}
+{"ts": 1760830074.52, "level": "ERROR", "msg": "Model type bert not supported.", "logger": "root"}
+```
+
+**Fields:**
+- `ts`: Unix timestamp
+- `level`: Log level (INFO, WARN, ERROR, DEBUG)
+- `msg`: Log message (HF tokens and user paths automatically redacted)
+- `logger`: Source logger (`mlxk2` = application, `root` = external libraries like mlx-lm)
+- Additional fields: `model`, `request_id`, `detail`, `duration_ms` (context-dependent)
+
+### Security: Automatic Redaction
+
+**Sensitive data is automatically removed from logs:**
+- HuggingFace tokens (`hf_...`) → `[REDACTED_TOKEN]`
+- User home paths (`/Users/john/...`) → `~/...`
+
+**Example:**
 ```bash
-# If model isn't found, try full path
-mlxk pull mlx-community/Model-Name-4bit
+# Original (unsafe):
+Using token hf_AbCdEfGhIjKlMnOpQrStUvWxYz123456 from /Users/john/models

-# List available models
-mlxk list --all
+# Logged (safe):
+Using token [REDACTED_TOKEN] from ~/models
 ```

-### Performance Issues
- Ensure sufficient RAM for model size
- Close other applications to free memory
- Use smaller quantized models (4-bit recommended)
+### Environment Variables
+
+- `MLXK2_LOG_JSON=1`: Enable JSON log format (alternative to `--log-json` flag)
+- `MLXK2_LOG_LEVEL`: Override log level (used internally for subprocess mode)
+
+
+## HuggingFace Cache Safety
+
+MLX-Knife 2.0 respects standard HuggingFace cache structure and practices:
+
+### Best Practices for Shared Environments
+- **Read operations** (`list`, `health`, `show`) always safe with concurrent processes
+- **Write operations** (`pull`, `rm`) coordinate during maintenance windows
+- **Lock cleanup** automatic but avoid during active downloads
+- **Your responsibility:** Coordinate with team, use good timing
+
+### Example Safe Workflow
+```bash
+# Check what's in cache (always safe)
+mlxk list --json | jq '.data.count'
+
+# Maintenance window - coordinate with team
+mlxk rm "corrupted-model" --json --force
+mlxk pull "replacement-model" --json
+
+# Back to normal operations
+mlxk health --json | jq '.data.summary'
+```
+
+
+## Hidden Alpha Features: `clone` and `push`
+
+### `clone` - Model Workspace Creation
+
+`mlxk clone` is a hidden alpha feature. Enable with `MLXK2_ENABLE_ALPHA_FEATURES=1`. It creates a local workspace from a cached model for modification and development.
+
+- Creates isolated workspace from cached models
+- Supports APFS copy-on-write optimization on same-volume scenarios
+- Includes health check integration for workspace validation
+- Use case: Fork-modify-push workflows
+
+Example:
+```bash
+# Enable alpha features
+export MLXK2_ENABLE_ALPHA_FEATURES=1
+
+# Clone model to workspace
+mlxk clone org/model ./workspace
+```
+
+### `push` - Upload to Hub
+
+`mlxk push` is a hidden alpha feature. Enable with `MLXK2_ENABLE_ALPHA_FEATURES=1`. It uploads a local folder to a Hugging Face model repository using `huggingface_hub/upload_folder`.
+
+- Requires `HF_TOKEN` (write-enabled).
+- Default branch: `main` (explicitly override with `--branch`).
+- Safety: `--private` is required to avoid accidental public uploads.
+- No validation or manifests. Basic hard excludes are applied by default: `.git/**`, `.DS_Store`, `__pycache__/`, common virtualenv folders (`.venv/`, `venv/`), and `*.pyc`.
+- `.hfignore` (gitignore-like) in the workspace is supported and merged with the defaults.
+- Repo creation: use `--create` if the target repo does not exist; harmless on existing repos. Missing branches are created during upload.
+- JSON output: includes `commit_sha`, `commit_url`, `no_changes`, `uploaded_files_count` (when available), `local_files_count` (approx), `change_summary` and a short `message`.
+- Quiet JSON by default: with `--json` (without `--verbose`) progress bars/console logs are suppressed; hub logs are still captured in `data.hf_logs`.
+- Human output: derived from JSON; add `--verbose` to include extras such as the commit URL or a short message variant. JSON schema is unchanged.
+- Local workspace check: use `--check-only` to validate a workspace without uploading. Produces `workspace_health` in JSON (no token/network required).
+- Dry-run planning: use `--dry-run` to compute a plan vs remote without uploading. Returns `dry_run: true`, `dry_run_summary {added, modified:null, deleted}`, and sample `added_files`/`deleted_files`.
+- Testing: see TESTING.md ("Push Testing (2.0)") for offline tests and opt-in live checks with markers/env.
+- Intended for early testers only. Carefully review the result on the Hub after pushing.
+- Responsibility: **You are responsible for complying with Hugging Face Hub policies and applicable laws (e.g., copyright/licensing) for any uploaded content.**
+
+Example:
+```bash
+# Enable alpha features
+export MLXK2_ENABLE_ALPHA_FEATURES=1
+
+# Use push command
+mlxk push --private ./workspace org/model --create --commit "init"
+```
+
+These features are not final and may change or be removed in future releases.
+
+
+## Testing
+
+The 2.0 test suite runs by default (pytest discovery points to `tests_2.0/`):
+
+```bash
+# Run 2.0 tests (default)
+pytest -v
+
+# Explicitly run legacy 1.x tests (not maintained on this branch)
+pytest tests/ -v
+
+# Test categories (2.0 example):
+# - ADR-002 edge cases
+# - Integration scenarios
+# - Model naming logic
+# - Robustness testing
+
+# Current status: all current 2.0 tests pass (some optional schema tests may be skipped without extras)
+```
+
+**Test Architecture:**
+- **Isolated Cache System** - Zero risk to user data
+- **Atomic Context Switching** - Production/test cache separation
+- **Mock Models** - Realistic test scenarios
+- **Edge Case Coverage** - All documented failure modes tested
+
+
+## Compatibility Notes
+
+- Streaming note: Some UIs buffer SSE; verify real-time with `curl -N`. Server sends clear interrupt markers on abort.

-### Streaming Issues
- Some models may have spacing issues - this is handled automatically
- Use `--no-stream` for batch output if needed

 ## Contributing

-Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
+This branch follows the established MLX-Knife development patterns:

-## Security
+```bash
+# Run quality checks
+python test-multi-python.sh  # Tests across Python 3.9-3.13
+./run_linting.sh             # Code quality validation

-For security concerns, please see [SECURITY.md](SECURITY.md) or contact us at broke@gmx.eu.
+# Key files:
+mlxk2/                       # 2.0.0 implementation
+tests_2.0/                   # 2.0 test suite
+docs/ADR/                    # Architecture decision records
+```
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
+
+
+## Support & Feedback
+
+- **Issues**: [GitHub Issues](https://github.com/mzau/mlx-knife/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/mzau/mlx-knife/discussions)
+- **API Specification**: [JSON API Specification](docs/json-api-specification.md)
+- **Documentation**: See `docs/` directory for technical details
+- **Security Policy**: See [SECURITY.md](SECURITY.md)

-MLX Knife runs entirely locally - no data is sent to external servers except when downloading models from HuggingFace.

 ## License

-MIT License - see [LICENSE](LICENSE) file for details
+Apache License 2.0 — see `LICENSE` (root) and `mlxk2/NOTICE`.

-Copyright (c) 2025 The BROKE team 🦫

 ## Acknowledgments

@@ -345,6 +574,6 @@ Copyright (c) 2025 The BROKE team 🦫

 <p align="center">
  <b>Made with ❤️ by The BROKE team <img src="broke-logo.png" alt="BROKE Logo" width="30" style="vertical-align: middle;"></b><br>
-  <i>Version 1.1.1 | September 2025</i><br>
+  <i>Version 2.0.0 | November 2025</i><br>
  <a href="https://github.com/mzau/broke-cluster">🔮 Next: BROKE Cluster for multi-node deployments</a>
 </p>