mirror of
https://github.com/cloudstack-llc/mlx-knife.git
synced 2026-06-30 20:48:03 -04:00
Release 2.0.0: Full rewrite with Apache 2.0 license
MLX Knife 2.0 replaces 1.x as the primary version.

Highlights:
- Full 1.x feature parity (list, show, pull, rm, run, server, health)
- JSON API for automation (--json flag)
- Enhanced error handling and logging
- Runtime compatibility checks
- Improved stop token detection
- License: MIT→Apache 2.0

Breaking changes:
- mlxk rm: requires --force flag for models with active locks

Migration guide: MIGRATION.md
Changelog: CHANGELOG.md
Testing: 297/317 tests passed, Python 3.9-3.13 verified

Merge branch 'feature/2.0.0-alpha.1'
@@ -1,28 +1,17 @@
|
||||
# <img src="https://github.com/mzau/mlx-knife/raw/main/broke-logo.png" alt="BROKE Logo" width="60" style="vertical-align: middle;"> MLX Knife
|
||||
# <img src="https://github.com/mzau/mlx-knife/raw/main/broke-logo.png" alt="BROKE Logo" width="60" style="vertical-align: middle;"> MLX-Knife 2.0
|
||||
|
||||
<p align="center">
|
||||
<img src="https://github.com/mzau/mlx-knife/raw/main/mlxk-demo.gif" alt="MLX Knife Demo" width="1000">
|
||||
<img src="https://github.com/mzau/mlx-knife/raw/main/mlxk-demo.gif" alt="MLX Knife Demo" width="900">
|
||||
</p>
|
||||
|
||||
A lightweight, ollama-like CLI for managing and running MLX models on Apple Silicon. **CLI-only tool designed for personal, local use** - perfect for individual developers and researchers working with MLX models.
|
||||
**Current Stable Version: 2.0.0**
|
||||
|
||||
> **Note**: MLX Knife is designed as a command-line interface tool only. While some internal functions are accessible via Python imports, only CLI usage is officially supported.
|
||||
|
||||
**Current Version**: 1.1.1 (September 2025) - **STABLE RELEASE** 🚀
|
||||
- Features in 1.1.1 — MXFP4 support and GPT-OSS reasoning models:
|
||||
- Full MXFP4 quantization support (MLX ≥0.29.0, MLX-LM ≥0.27.0),
|
||||
- GPT-OSS reasoning model formatting with `--hide-reasoning` flag,
|
||||
- Enhanced quantization display in `show` command,
|
||||
- Tested with `gpt-oss-20b-MXFP4-Q8` from mlx-community.
|
||||
- Details: see CHANGELOG.md. Install with `pip install mlx-knife`.
|
||||
- **Reliable Test System**: 166/166 tests passing across Python 3.9–3.13
|
||||
- **Python 3.9-3.13**: Full compatibility verified across all Python versions
|
||||
- **Key Issues Resolved**: Issues #21, #22, #23 fixed and thoroughly tested
|
||||
|
||||
[](https://github.com/mzau/mlx-knife/releases)
|
||||
[](https://github.com/ml-explore/mlx-lm)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://github.com/mzau/mlx-knife/releases)
|
||||
[](https://www.apache.org/licenses/LICENSE-2.0)
|
||||
[](https://www.python.org/downloads/)
|
||||
[](https://support.apple.com/en-us/HT211814)
|
||||
[](https://github.com/ml-explore/mlx)
|
||||
|
||||
|
||||
## Features
|
||||
|
||||
@@ -31,309 +20,549 @@ A lightweight, ollama-like CLI for managing and running MLX models on Apple Sili
|
||||
- **Model Information**: Detailed model metadata including quantization info
|
||||
- **Download Models**: Pull models from HuggingFace with progress tracking
|
||||
- **Run Models**: Native MLX execution with streaming and chat modes
|
||||
- **Health Checks**: Verify model integrity and completeness
|
||||
- **Health Checks**: Verify model integrity and MLX runtime compatibility
|
||||
- **Cache Management**: Clean up and organize your model storage
|
||||
|
||||
### Local Server & Web Interface
|
||||
- **OpenAI-Compatible API**: Local REST API with `/v1/chat/completions`, `/v1/completions`, `/v1/models`
|
||||
- **Web Chat Interface**: Built-in HTML chat interface with markdown rendering
|
||||
- **Single-User Design**: Optimized for personal use, not multi-user production environments
|
||||
- **Conversation Context**: Full chat history maintained for follow-up questions
|
||||
- **Streaming Support**: Real-time token streaming via Server-Sent Events
|
||||
- **Configurable Limits**: Set default max tokens via `--max-tokens` parameter
|
||||
- **Model Hot-Swapping**: Switch between models per conversation
|
||||
- **Tool Integration**: Compatible with OpenAI-compatible clients (Cursor IDE, etc.)
|
||||
|
||||
### Run Experience
|
||||
- **Direct MLX Integration**: Models load and run natively without subprocess overhead
|
||||
- **Real-time Streaming**: Watch tokens generate with proper spacing and formatting
|
||||
- **Interactive Chat**: Full conversational mode with history tracking
|
||||
- **Memory Insights**: See GPU memory usage after model loading and generation
|
||||
- **Dynamic Stop Tokens**: Automatic detection and filtering of model-specific stop tokens
|
||||
- **Customizable Generation**: Control temperature, max_tokens, top_p, and repetition penalty
|
||||
- **Context-Managed Memory**: Context manager pattern ensures automatic cleanup and prevents memory leaks
|
||||
- **Exception-Safe**: Robust error handling with guaranteed resource cleanup
|
||||
|
||||
## Installation
|
||||
|
||||
### Via PyPI (Recommended)
|
||||
```bash
|
||||
pip install mlx-knife
|
||||
```
|
||||
- **Privacy & Network**: No background network or telemetry; only explicit Hugging Face interactions when you run pull or the experimental push.
|
||||
|
||||
### Requirements
|
||||
- macOS with Apple Silicon (M1/M2/M3)
|
||||
- macOS with Apple Silicon
|
||||
- Python 3.9+ (native macOS version or newer)
|
||||
- 8GB+ RAM recommended, plus enough free RAM to hold the model you want to run
|
||||
|
||||
### Python Compatibility
|
||||
MLX Knife has been comprehensively tested and verified on:
|
||||
|
||||
✅ **Python 3.9.6** (native macOS) - Primary target
|
||||
✅ **Python 3.10-3.13** - Fully compatible
|
||||
✅ **Python 3.9.6** (native macOS) - Primary target
|
||||
✅ **Python 3.10-3.13** - Fully compatible
|
||||
|
||||
All versions include full MLX model execution testing with real models.
|
||||
|
||||
### Install from Source
|
||||
|
||||
## Installation
|
||||
|
||||
### Via PyPI (Recommended)
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
# Install stable release from PyPI
|
||||
pip install mlx-knife
|
||||
|
||||
# Verify installation
|
||||
mlxk --version # → mlxk 2.0.0
|
||||
```
|
||||
|
||||
### Development Installation
|
||||
|
||||
```bash
|
||||
# Clone and install from source
|
||||
git clone https://github.com/mzau/mlx-knife.git
|
||||
cd mlx-knife
|
||||
|
||||
# Install in development mode
|
||||
pip install -e .
|
||||
|
||||
# Or install normally
|
||||
pip install .
|
||||
|
||||
# Install with development tools (ruff, mypy, tests)
|
||||
# Install with all development dependencies (required for testing and code quality)
|
||||
pip install -e ".[dev,test]"
|
||||
|
||||
# Verify installation
|
||||
mlxk --version # → mlxk 2.0.0
|
||||
|
||||
# Run tests and quality checks (before committing)
|
||||
pytest -v
|
||||
ruff check mlxk2/ --fix
|
||||
mypy mlxk2/
|
||||
```
|
||||
|
||||
### Install Dependencies Only
|
||||
**Note:** For minimal user installation without dev tools: `pip install -e .`
|
||||
|
||||
### Migrating from 1.x
|
||||
|
||||
If you're upgrading from MLX Knife 1.x, see [MIGRATION.md](MIGRATION.md) for important information about the license change (MIT → Apache 2.0) and behavior changes.
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### CLI Usage
|
||||
```bash
|
||||
# List all MLX models in your cache
|
||||
# List models (human-readable)
|
||||
mlxk list
|
||||
mlxk list --health
|
||||
mlxk list --verbose --health
|
||||
|
||||
# Show detailed info about a model
|
||||
mlxk show Phi-3-mini-4k-instruct-4bit
|
||||
|
||||
# Download a new model
|
||||
mlxk pull mlx-community/Mistral-7B-Instruct-v0.3-4bit
|
||||
|
||||
# Run a model with a prompt
|
||||
mlxk run Phi-3-mini "What is the capital of France?"
|
||||
|
||||
# GPT-OSS reasoning model with formatted output
|
||||
mlxk run gpt-oss-20b-MXFP4-Q8 "Explain quantum computing"
|
||||
|
||||
# Hide reasoning steps, show only final answer (GPT-OSS models)
|
||||
mlxk run gpt-oss-20b-MXFP4-Q8 "What is 2+2?" --hide-reasoning
|
||||
|
||||
# Start interactive chat
|
||||
mlxk run Phi-3-mini
|
||||
|
||||
# Check model health
|
||||
# Check cache health
|
||||
mlxk health
|
||||
|
||||
# Show model details
|
||||
mlxk show "mlx-community/Phi-3-mini-4k-instruct-4bit"
|
||||
|
||||
# Pull a model
|
||||
mlxk pull "mlx-community/Llama-3.2-3B-Instruct-4bit"
|
||||
|
||||
# Run interactive chat
|
||||
mlxk run "Phi-3-mini" -c
|
||||
|
||||
# Start OpenAI-compatible server
|
||||
mlxk serve --port 8080
|
||||
```
|
||||
|
||||
### Web Chat Interface
|
||||
|
||||
MLX Knife includes a built-in web interface for easy model interaction:
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `server`/`serve` | OpenAI-compatible API server; SIGINT-robust (Supervisor); SSE streaming |
|
||||
| `run` | Interactive and single-shot model execution with streaming/batch modes |
|
||||
| `list` | Model discovery with JSON output |
|
||||
| `health` | Corruption detection and cache analysis |
|
||||
| `show` | Detailed model information with --files, --config |
|
||||
| `pull` | HuggingFace model downloads with corruption detection |
|
||||
| `rm` | Model deletion with lock cleanup and fuzzy matching |
|
||||
| 🔒 `push` | **Alpha feature** - Upload to HuggingFace Hub; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
|
||||
| 🔒 `clone` | **Alpha feature** - Model workspace cloning; requires `MLXK2_ENABLE_ALPHA_FEATURES=1` |
|
||||
|
||||
|
||||
|
||||
## JSON API
|
||||
|
||||
> **📋 Complete API Specification**: See [JSON API Specification](docs/json-api-specification.md) for comprehensive schema, error codes, and examples.
|
||||
|
||||
All commands support both human-readable and JSON output (`--json` flag) for automation and scripting, enabling seamless integration with CI/CD pipelines and cluster management systems.
|
||||
|
||||
### Command Structure
|
||||
|
||||
All commands support JSON output via `--json` flag:
|
||||
|
||||
```bash
|
||||
# Start the OpenAI-compatible API server
|
||||
mlxk server --port 8000 --max-tokens 4000
|
||||
|
||||
# Get web chat interface from GitHub
|
||||
curl -O https://raw.githubusercontent.com/mzau/mlx-knife/main/simple_chat.html
|
||||
|
||||
# Open web chat interface in your browser
|
||||
open simple_chat.html
|
||||
mlxk list --json | jq '.data.models[].name'
|
||||
mlxk health --json | jq '.data.summary'
|
||||
mlxk show "Phi-3-mini" --json | jq '.data.model'
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- **No installation required** - Pure HTML/CSS/JS
|
||||
- **Real-time streaming** - Watch tokens appear as they're generated
|
||||
- **Model selection** - Choose any MLX model from your cache
|
||||
- **Conversation history** - Full context for follow-up questions
|
||||
- **Markdown rendering** - Proper formatting for code, lists, tables
|
||||
- **Mobile-friendly** - Responsive design works on all devices
|
||||
**Response Format:**
|
||||
```json
|
||||
{
|
||||
"status": "success|error",
|
||||
"command": "list|health|show|pull|rm|clone|version|push|run|server",
|
||||
"data": { /* command-specific data */ },
|
||||
"error": null | { "type": "...", "message": "..." }
|
||||
}
|
||||
```
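A consumer can handle every command through this one envelope. The following is a minimal Python sketch, assuming only the four top-level fields shown above; `unwrap` and `MlxkError` are hypothetical names chosen for illustration and are not part of MLX Knife.

```python
import json


class MlxkError(RuntimeError):
    """Hypothetical exception raised when the envelope status is not "success"."""


def unwrap(raw: str) -> dict:
    """Parse the text printed by `mlxk <command> --json` and return its `data` payload."""
    envelope = json.loads(raw)
    if envelope.get("status") != "success":
        # `error` is null on success and an object with `type`/`message` on failure.
        err = envelope.get("error") or {}
        raise MlxkError(f"{err.get('type', 'unknown')}: {err.get('message', '')}")
    return envelope.get("data") or {}
```

Raising on `"status": "error"` keeps calling scripts from silently reading an empty `data` object.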
|
||||
|
||||
### Local API Server Integration
|
||||
### Examples
|
||||
|
||||
The MLX Knife server provides OpenAI-compatible endpoints for **local development and personal use**:
|
||||
#### List Models
|
||||
```bash
|
||||
mlxk list --json
|
||||
# Output:
|
||||
{
|
||||
"status": "success",
|
||||
"command": "list",
|
||||
"data": {
|
||||
"models": [
|
||||
{
|
||||
"name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
|
||||
"hash": "a5339a41b2e3abcdef1234567890ab12345678ef",
|
||||
"size_bytes": 4613734656,
|
||||
"last_modified": "2024-10-15T08:23:41Z",
|
||||
"framework": "MLX",
|
||||
"model_type": "chat",
|
||||
"capabilities": ["text-generation", "chat"],
|
||||
"health": "healthy",
|
||||
"runtime_compatible": true,
|
||||
"reason": null,
|
||||
"cached": true
|
||||
}
|
||||
],
|
||||
"count": 1
|
||||
},
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
#### Health Check
|
||||
```bash
|
||||
mlxk health --json
|
||||
# Output:
|
||||
{
|
||||
"status": "success",
|
||||
"command": "health",
|
||||
"data": {
|
||||
"healthy": [
|
||||
{
|
||||
"name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
|
||||
"status": "healthy",
|
||||
"reason": "Model is healthy"
|
||||
}
|
||||
],
|
||||
"unhealthy": [],
|
||||
"summary": { "total": 1, "healthy_count": 1, "unhealthy_count": 0 }
|
||||
},
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
#### Show Model Details
|
||||
```bash
|
||||
mlxk show "Phi-3-mini" --json --files
|
||||
# Output (simplified):
|
||||
{
|
||||
"status": "success",
|
||||
"command": "show",
|
||||
"data": {
|
||||
"model": {
|
||||
"name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
|
||||
"hash": "a5339a41b2e3abcdef1234567890ab12345678ef",
|
||||
"size_bytes": 4613734656,
|
||||
"framework": "MLX",
|
||||
"model_type": "chat",
|
||||
"capabilities": ["text-generation", "chat"],
|
||||
"last_modified": "2024-10-15T08:23:41Z",
|
||||
"health": "healthy",
|
||||
"runtime_compatible": true,
|
||||
"reason": null,
|
||||
"cached": true
|
||||
},
|
||||
"files": [
|
||||
{"name": "config.json", "size": "1.2KB", "type": "config"},
|
||||
{"name": "model.safetensors", "size": "2.3GB", "type": "weights"}
|
||||
],
|
||||
"metadata": null
|
||||
},
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
### Hash Syntax Support
|
||||
|
||||
All commands support `@hash` syntax for specific model versions:
|
||||
|
||||
```bash
|
||||
# Start local server (single-user, no authentication)
|
||||
mlxk server --host 127.0.0.1 --port 8000
|
||||
|
||||
# Test with curl
|
||||
curl -X POST "http://localhost:8000/v1/chat/completions" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model": "Phi-3-mini-4k-instruct-4bit", "messages": [{"role": "user", "content": "Hello!"}]}'
|
||||
|
||||
# Integration with development tools (community-tested):
|
||||
# - Cursor IDE: Set API URL to http://localhost:8000/v1
|
||||
# - LibreChat: Configure as custom OpenAI endpoint
|
||||
# - Open WebUI: Add as local OpenAI-compatible API
|
||||
# - SillyTavern: Add as OpenAI API with custom URL
|
||||
mlxk health "Qwen3@e96" --json # Check specific hash
|
||||
mlxk show "model@3df9bfd" --json # Short hash matching
|
||||
mlxk rm "Phi-3@e967" --json --force # Delete specific version
|
||||
```
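To make the `@hash` form concrete, here is a small Python sketch of how a `name@hash` argument could be split and compared against a full commit hash. It assumes that a short hash matches as a prefix of the full hash; `split_model_spec` and `hash_matches` are hypothetical helpers, not functions exposed by MLX Knife.

```python
from typing import Optional, Tuple


def split_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "name@hash" into (name, hash); the hash is None when no "@" is given."""
    name, sep, short_hash = spec.partition("@")
    return name, (short_hash if sep and short_hash else None)


def hash_matches(full_hash: str, short_hash: Optional[str]) -> bool:
    """Assumed rule: a short hash matches when it is a prefix of the full hash."""
    if short_hash is None:
        return True  # no hash requested, so any version is acceptable
    return full_hash.lower().startswith(short_hash.lower())
```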
|
||||
|
||||
**Note**: Tool integrations are community-tested. Some tools may require specific configuration or have compatibility limitations. Please report issues via GitHub.
|
||||
### Integration Examples
|
||||
|
||||
## Command Reference
|
||||
|
||||
### Available Commands
|
||||
|
||||
#### `list` - Browse Models
|
||||
#### Broke-Cluster Integration
|
||||
```bash
|
||||
mlxk list # Show chat-capable MLX models (strict view)
|
||||
mlxk list --verbose # Show MLX models with full paths
|
||||
mlxk list --all # Show all models with framework and TYPE
|
||||
mlxk list --all --verbose # All models with full paths
|
||||
mlxk list --health # Include health status
|
||||
mlxk list Phi-3 # Filter by model name
|
||||
mlxk list --verbose Phi-3 # Show detailed info (same as show)
|
||||
# Get available model names for scheduling
|
||||
MODELS=$(mlxk list --json | jq -r '.data.models[].name')
|
||||
|
||||
# Check cache health before deployment
|
||||
HEALTH=$(mlxk health --json | jq '.data.summary.healthy_count')
|
||||
if [ "$HEALTH" -eq 0 ]; then
|
||||
echo "No healthy models available"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Download required models
|
||||
mlxk pull "mlx-community/Phi-3-mini-4k-instruct-4bit" --json
|
||||
```
|
||||
|
||||
#### `show` - Model Details
|
||||
#### CI/CD Pipeline Usage
|
||||
```bash
|
||||
mlxk show <model> # Display model information
|
||||
mlxk show <model> --files # Include file listing
|
||||
mlxk show <model> --config # Show config.json content
|
||||
# Verify model integrity in CI
|
||||
mlxk health --json | jq -e '.data.summary.unhealthy_count == 0'
|
||||
|
||||
# Clean up CI artifacts
|
||||
mlxk rm "test-model-*" --json --force
|
||||
|
||||
# Pre-warm cache for deployment
|
||||
mlxk pull "production-model" --json
|
||||
```
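The same gate can be written without `jq`. The sketch below is a Python illustration that relies only on the `data.summary.unhealthy_count` field shown in the health example; `health_exit_code` is a hypothetical helper name.

```python
def health_exit_code(envelope: dict) -> int:
    """Return 0 when `mlxk health --json` reports no unhealthy models, 1 otherwise."""
    if envelope.get("status") != "success":
        return 1
    summary = (envelope.get("data") or {}).get("summary") or {}
    # A missing count is treated as a failure rather than as zero.
    return 0 if summary.get("unhealthy_count") == 0 else 1
```

In a pipeline step, the envelope would typically come from `json.loads(subprocess.run(["mlxk", "health", "--json"], capture_output=True, text=True).stdout)`, and the result would be passed to `sys.exit`.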
|
||||
|
||||
#### `pull` - Download Models
|
||||
#### Model Management Automation
|
||||
```bash
|
||||
mlxk pull <model> # Download from HuggingFace
|
||||
mlxk pull <org>/<model> # Full model path
|
||||
# Find models by pattern
|
||||
LARGE_MODELS=$(mlxk list --json | jq -r '.data.models[] | select(.name | contains("30B")) | .name')
|
||||
|
||||
# Show detailed info for analysis
|
||||
for model in $LARGE_MODELS; do
|
||||
mlxk show "$model" --json --config | jq '.data.model_config'
|
||||
done
|
||||
```
|
||||
|
||||
#### `run` - Execute Models
|
||||
```bash
|
||||
mlxk run <model> "prompt" # Single prompt (minimal output)
|
||||
mlxk run <model> "prompt" --verbose # Show loading, memory, and stats
|
||||
mlxk run <model> # Interactive chat
|
||||
mlxk run <model> "prompt" --no-stream # Batch output
|
||||
mlxk run <model> --max-tokens 1000 # Custom length
|
||||
mlxk run <model> --temperature 0.9 # Higher creativity
|
||||
mlxk run <model> --no-chat-template # Raw completion mode
|
||||
mlxk run <model> --hide-reasoning # Hide reasoning (GPT-OSS models only)
|
||||
```
|
||||
|
||||
#### `rm` - Remove Models
|
||||
```bash
|
||||
mlxk rm <model> # Delete model with cache cleanup confirmation
|
||||
mlxk rm <model>@<hash> # Delete specific version (removes entire model)
|
||||
mlxk rm <model> --force # Skip confirmations, auto-cleanup cache files
|
||||
```
|
||||
## Human Output
|
||||
|
||||
**Features:**
|
||||
- Removes entire model directory (not just snapshots)
|
||||
- Cleans up orphaned HuggingFace lock files
|
||||
- Handles corrupted models gracefully
|
||||
- Smart prompting (only asks about cache cleanup if needed)
|
||||
MLX Knife provides rich human-readable output by default (without `--json` flag).
|
||||
|
||||
#### `health` - Check Integrity
|
||||
```bash
|
||||
mlxk health # Check all models
|
||||
mlxk health <model> # Check specific model
|
||||
```
|
||||
|
||||
#### `server` - Start API Server
|
||||
```bash
|
||||
mlxk server # Start on localhost:8000
|
||||
mlxk server --port 8001 # Custom port
|
||||
mlxk server --host 0.0.0.0 --port 8000 # Allow external access
|
||||
mlxk server --max-tokens 4000 # Set default max tokens (default: 2000)
|
||||
mlxk server --reload # Development mode with auto-reload
|
||||
```
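Once the server is running, any HTTP client can talk to it. As a rough Python sketch using only the standard library, the request below targets the `/v1/chat/completions` endpoint with the same payload shape as the curl example; `build_chat_request` is a hypothetical helper, and the default base URL assumes the server's default `localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(request)` and decoding the JSON body; in the OpenAI response format the reply text sits under `choices[0].message.content`.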
|
||||
|
||||
### Command Aliases
|
||||
After installation, these commands are equivalent:
|
||||
- `mlxk` (recommended)
|
||||
- `mlx-knife`
|
||||
- `mlx_knife`
|
||||
|
||||
## Configuration
|
||||
|
||||
### Cache Location
|
||||
By default, models are stored in `~/.cache/huggingface/hub`. Configure with:
|
||||
### Basic Usage
|
||||
|
||||
```bash
|
||||
# Set custom cache location
|
||||
export HF_HOME="/path/to/your/cache"
|
||||
|
||||
# Example: External SSD
|
||||
export HF_HOME="/Volumes/ExternalSSD/models"
|
||||
mlxk list
|
||||
mlxk list --health
|
||||
mlxk health
|
||||
mlxk show "mlx-community/Phi-3-mini-4k-instruct-4bit"
|
||||
```
|
||||
|
||||
### Model Name Expansion
|
||||
Short names are automatically expanded for MLX models:
|
||||
- `Phi-3-mini-4k-instruct-4bit` → `mlx-community/Phi-3-mini-4k-instruct-4bit`
|
||||
- Models already containing `/` are used as-is
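
The two rules above amount to a one-line check. This is a Python sketch of the rule as stated here, not the tool's actual resolver (which may also do fuzzy matching against the cache); `expand_model_name` is a hypothetical name.

```python
def expand_model_name(name: str, default_org: str = "mlx-community") -> str:
    """Prefix short names with the default organisation; keep "org/model" names as-is."""
    if "/" in name:
        return name
    return f"{default_org}/{name}"
```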
|
||||
### List Filters
|
||||
|
||||
## Advanced Usage
|
||||
- `list`: Shows MLX chat models only (compact names, safe default)
|
||||
- `list --verbose`: Shows all MLX models (chat + base) with full org/names and Framework column
|
||||
- `list --all`: Shows all frameworks (MLX, GGUF, PyTorch)
|
||||
- Flags are combinable: `--all --verbose`, `--all --health`, `--verbose --health`
|
||||
|
||||
### Generation Parameters
|
||||
### Health Status Display (--health flag)
|
||||
|
||||
The `--health` flag adds health status information to the output:
|
||||
|
||||
**Compact mode** (default, `--all`):
|
||||
- Shows single "Health" column with values:
|
||||
- `healthy` - File integrity OK and MLX runtime compatible
|
||||
- `healthy*` - File integrity OK but not MLX runtime compatible (use `--verbose` for details)
|
||||
- `unhealthy` - File integrity failed or unknown format
|
||||
|
||||
**Verbose mode** (`--verbose --health`):
|
||||
- Splits into "Integrity" and "Runtime" columns:
|
||||
- **Integrity:** `healthy` / `unhealthy`
|
||||
- **Runtime:** `yes` / `no` / `-` (dash = gate blocked by failed integrity)
|
||||
- **Reason:** Explanation when problems detected (wrapped at 26 chars for readability)
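
The mapping between the two views can be sketched as follows. This Python snippet only restates the rules listed above; the function names and the boolean inputs are assumptions made for illustration, not MLX Knife internals.

```python
from typing import Optional, Tuple


def compact_health(integrity_ok: bool, runtime_compatible: Optional[bool]) -> str:
    """Single "Health" column: healthy, healthy* (intact but not runnable), or unhealthy."""
    if not integrity_ok:
        return "unhealthy"
    return "healthy" if runtime_compatible else "healthy*"


def verbose_columns(integrity_ok: bool, runtime_compatible: Optional[bool]) -> Tuple[str, str]:
    """Split view: (Integrity, Runtime); Runtime is "-" when integrity already failed."""
    if not integrity_ok:
        return "unhealthy", "-"
    return "healthy", ("yes" if runtime_compatible else "no")
```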
|
||||
|
||||
**Examples:**
|
||||
|
||||
```bash
|
||||
# Creative writing (high temperature, diverse output)
|
||||
mlxk run Mistral-7B "Write a story" --temperature 0.9 --top-p 0.95
|
||||
# Compact health view
|
||||
mlxk list --health
|
||||
# Output:
|
||||
# Name | Hash | Size | Modified | Type | Health
|
||||
# Llama-3.2-3B-Instruct | a1b2c3d | 2.1GB | 2d ago | chat | healthy
|
||||
# Qwen2-7B-Instruct | 1a2b3c4 | 4.8GB | 3d ago | chat | healthy*
|
||||
|
||||
# Precise tasks (low temperature, focused output)
|
||||
mlxk run Phi-3-mini "Extract key points" --temperature 0.3 --top-p 0.9
|
||||
# Verbose health view with details
|
||||
mlxk list --verbose --health
|
||||
# Output:
|
||||
# Name | Hash | Size | Modified | Framework | Type | Integrity | Runtime | Reason
|
||||
# Llama-3.2-3B-Instruct | a1b2c3d | 2.1GB | 2d ago | MLX | chat | healthy | yes | -
|
||||
# Qwen2-7B-Instruct | 1a2b3c4 | 4.8GB | 3d ago | PyTorch | chat | healthy | no | Incompatible: PyTorch
|
||||
|
||||
# Long-form generation
|
||||
mlxk run Mixtral-8x7B "Explain quantum computing" --max-tokens 2000
|
||||
|
||||
# Reduce repetition
|
||||
mlxk run model "prompt" --repetition-penalty 1.2
|
||||
# All frameworks with health status
|
||||
mlxk list --all --health
|
||||
# Output:
|
||||
# Name | Hash | Size | Modified | Framework | Type | Health
|
||||
# Llama-3.2-3B-Instruct | a1b2c3d | 2.1GB | 2d ago | MLX | chat | healthy
|
||||
# llama-3.2-gguf-q4 | b2c3d4e | 1.8GB | 3d ago | GGUF | unknown | healthy*
|
||||
# broken-download | - | 500MB | 1h ago | Unknown | unknown | unhealthy
|
||||
```
|
||||
|
||||
### Working with Specific Commits
|
||||
**Design Philosophy:**
|
||||
- `unhealthy` is a catch-all for anything not understood/supported (broken downloads, unknown formats, creative HuggingFace structures)
|
||||
- `healthy` guarantees the model will work with `mlxk run`
|
||||
- `healthy*` means files are intact but MLX runtime can't execute them (e.g., GGUF/PyTorch models, incompatible model_type, or mlx-lm version too old)
|
||||
|
||||
Note: JSON output is unaffected by these human-only filters and always includes full health/runtime data.
|
||||
|
||||
|
||||
## Logging & Debugging
|
||||
|
||||
MLX Knife 2.0 provides structured logging with configurable output formats and levels.
|
||||
|
||||
### Log Levels
|
||||
|
||||
Control verbosity with `--log-level` (server mode):
|
||||
|
||||
```bash
|
||||
# Use specific model version
|
||||
mlxk show model@commit_hash
|
||||
mlxk run model@commit_hash "prompt"
|
||||
# Default: Show startup, model loading, and errors
|
||||
mlxk serve --log-level info
|
||||
|
||||
# Quiet: Only warnings and errors
|
||||
mlxk serve --log-level warning
|
||||
|
||||
# Silent: Only errors
|
||||
mlxk serve --log-level error
|
||||
|
||||
# Verbose: All logs including HTTP requests
|
||||
mlxk serve --log-level debug
|
||||
```
|
||||
|
||||
### Non-MLX Model Handling
|
||||
**Log Level Behavior:**
|
||||
- `debug`: All logs + Uvicorn HTTP access logs (`GET /v1/models`, etc.)
|
||||
- `info`: Application logs (startup, model switching, errors) + HTTP access logs
|
||||
- `warning`: Only warnings and errors (no startup messages, no HTTP access logs)
|
||||
- `error`: Only error messages
|
||||
|
||||
### JSON Logs (Machine-Readable)
|
||||
|
||||
Enable structured JSON output for log aggregation tools:
|
||||
|
||||
The tool automatically detects framework compatibility:
|
||||
```bash
|
||||
# Attempting to run PyTorch model
|
||||
mlxk run bert-base-uncased
|
||||
# Error: Model bert-base-uncased is not MLX-compatible (Framework: PyTorch)!
|
||||
# Use MLX-Community models: https://huggingface.co/mlx-community
|
||||
# JSON logs (recommended - CLI flag)
|
||||
mlxk serve --log-json
|
||||
|
||||
# JSON logs (alternative - environment variable)
|
||||
MLXK2_LOG_JSON=1 mlxk serve
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
**Note:** `--log-json` also formats Uvicorn access logs as JSON for consistent output.
|
||||
|
||||
### Model Not Found
|
||||
**JSON Format:**
|
||||
```json
|
||||
{"ts": 1760830072.96, "level": "INFO", "msg": "MLX Knife Server 2.0 starting up..."}
|
||||
{"ts": 1760830073.14, "level": "INFO", "msg": "Switching to model: mlx-community/...", "model": "..."}
|
||||
{"ts": 1760830074.52, "level": "ERROR", "msg": "Model type bert not supported.", "logger": "root"}
|
||||
```
|
||||
|
||||
**Fields:**
|
||||
- `ts`: Unix timestamp
|
||||
- `level`: Log level (INFO, WARN, ERROR, DEBUG)
|
||||
- `msg`: Log message (HF tokens and user paths automatically redacted)
|
||||
- `logger`: Source logger (`mlxk2` = application, `root` = external libraries like mlx-lm)
|
||||
- Additional fields: `model`, `request_id`, `detail`, `duration_ms` (context-dependent)
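
Because each log entry is one JSON object per line, filtering is straightforward. The following Python sketch assumes the `level` field described above and skips anything that is not valid JSON; `iter_records` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str], level: str = "ERROR") -> Iterator[dict]:
    """Yield parsed log records whose `level` equals the requested level."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any line that is not a JSON object
        if isinstance(record, dict) and record.get("level") == level:
            yield record
```

It can be fed from a file or a pipe, for example `iter_records(sys.stdin)` behind `mlxk serve --log-json`.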
|
||||
|
||||
### Security: Automatic Redaction
|
||||
|
||||
**Sensitive data is automatically removed from logs:**
|
||||
- HuggingFace tokens (`hf_...`) → `[REDACTED_TOKEN]`
|
||||
- User home paths (`/Users/john/...`) → `~/...`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# If model isn't found, try full path
|
||||
mlxk pull mlx-community/Model-Name-4bit
|
||||
# Original (unsafe):
|
||||
Using token hf_AbCdEfGhIjKlMnOpQrStUvWxYz123456 from /Users/john/models
|
||||
|
||||
# List available models
|
||||
mlxk list --all
|
||||
# Logged (safe):
|
||||
Using token [REDACTED_TOKEN] from ~/models
|
||||
```
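For intuition, the two substitutions can be approximated with regular expressions. This is a simplified Python sketch and not the project's actual redaction code: the token pattern (`hf_` followed by letters and digits) and the macOS-style `/Users/<name>` home prefix are assumptions, and `redact` is a hypothetical name.

```python
import re

# Assumed patterns: Hugging Face tokens start with "hf_"; home paths live under /Users/<name>.
_TOKEN = re.compile(r"hf_[A-Za-z0-9]+")
_HOME = re.compile(r"/Users/[^/\s]+")


def redact(message: str) -> str:
    """Replace tokens with a placeholder and shorten the user's home directory to "~"."""
    message = _TOKEN.sub("[REDACTED_TOKEN]", message)
    return _HOME.sub("~", message)
```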
|
||||
|
||||
### Performance Issues
|
||||
- Ensure sufficient RAM for model size
|
||||
- Close other applications to free memory
|
||||
- Use smaller quantized models (4-bit recommended)
|
||||
### Environment Variables
|
||||
|
||||
- `MLXK2_LOG_JSON=1`: Enable JSON log format (alternative to `--log-json` flag)
|
||||
- `MLXK2_LOG_LEVEL`: Override log level (used internally for subprocess mode)
|
||||
|
||||
|
||||
## HuggingFace Cache Safety
|
||||
|
||||
MLX-Knife 2.0 respects standard HuggingFace cache structure and practices:
|
||||
|
||||
### Best Practices for Shared Environments
|
||||
- **Read operations** (`list`, `health`, `show`) are always safe alongside concurrent processes
|
||||
- **Write operations** (`pull`, `rm`) should be coordinated and run during maintenance windows
|
||||
- **Lock cleanup** is automatic, but avoid triggering it during active downloads
|
||||
- **Your responsibility:** Coordinate with your team and time write operations so they do not overlap with other users' work
|
||||
|
||||
### Example Safe Workflow
|
||||
```bash
|
||||
# Check what's in cache (always safe)
|
||||
mlxk list --json | jq '.data.count'
|
||||
|
||||
# Maintenance window - coordinate with team
|
||||
mlxk rm "corrupted-model" --json --force
|
||||
mlxk pull "replacement-model" --json
|
||||
|
||||
# Back to normal operations
|
||||
mlxk health --json | jq '.data.summary'
|
||||
```
|
||||
|
||||
|
||||
## Hidden Alpha Features: `clone` and `push`
|
||||
|
||||
### `clone` - Model Workspace Creation
|
||||
|
||||
`mlxk clone` is a hidden alpha feature. Enable with `MLXK2_ENABLE_ALPHA_FEATURES=1`. It creates a local workspace from a cached model for modification and development.
|
||||
|
||||
- Creates isolated workspace from cached models
|
||||
- Supports APFS copy-on-write optimization on same-volume scenarios
|
||||
- Includes health check integration for workspace validation
|
||||
- Use case: Fork-modify-push workflows
|
||||
|
||||
Example:
|
||||
```bash
|
||||
# Enable alpha features
|
||||
export MLXK2_ENABLE_ALPHA_FEATURES=1
|
||||
|
||||
# Clone model to workspace
|
||||
mlxk clone org/model ./workspace
|
||||
```
|
||||
|
||||
### `push` - Upload to Hub
|
||||
|
||||
`mlxk push` is a hidden alpha feature. Enable with `MLXK2_ENABLE_ALPHA_FEATURES=1`. It uploads a local folder to a Hugging Face model repository using `huggingface_hub.upload_folder`.
|
||||
|
||||
- Requires `HF_TOKEN` (write-enabled).
|
||||
- Default branch: `main` (explicitly override with `--branch`).
|
||||
- Safety: `--private` is required to avoid accidental public uploads.
|
||||
- No validation or manifests. Basic hard excludes are applied by default: `.git/**`, `.DS_Store`, `__pycache__/`, common virtualenv folders (`.venv/`, `venv/`), and `*.pyc`.
|
||||
- `.hfignore` (gitignore-like) in the workspace is supported and merged with the defaults.
|
||||
- Repo creation: use `--create` if the target repo does not exist; harmless on existing repos. Missing branches are created during upload.
|
||||
- JSON output: includes `commit_sha`, `commit_url`, `no_changes`, `uploaded_files_count` (when available), `local_files_count` (approx), `change_summary` and a short `message`.
|
||||
- Quiet JSON by default: with `--json` (without `--verbose`) progress bars/console logs are suppressed; hub logs are still captured in `data.hf_logs`.
|
||||
- Human output: derived from JSON; add `--verbose` to include extras such as the commit URL or a short message variant. JSON schema is unchanged.
|
||||
- Local workspace check: use `--check-only` to validate a workspace without uploading. Produces `workspace_health` in JSON (no token/network required).
|
||||
- Dry-run planning: use `--dry-run` to compute a plan vs remote without uploading. Returns `dry_run: true`, `dry_run_summary {added, modified:null, deleted}`, and sample `added_files`/`deleted_files`.
|
||||
- Testing: see TESTING.md ("Push Testing (2.0)") for offline tests and opt-in live checks with markers/env.
|
||||
- Intended for early testers only. Carefully review the result on the Hub after pushing.
|
||||
- Responsibility: **You are responsible for complying with Hugging Face Hub policies and applicable laws (e.g., copyright/licensing) for any uploaded content.**
|
||||
|
||||
Example:
|
||||
```bash
|
||||
# Enable alpha features
|
||||
export MLXK2_ENABLE_ALPHA_FEATURES=1
|
||||
|
||||
# Use push command
|
||||
mlxk push --private ./workspace org/model --create --commit "init"
|
||||
```
|
||||
|
||||
These features are not final and may change or be removed in future releases.
|
||||
|
||||
|
||||
## Testing
|
||||
|
||||
The 2.0 test suite runs by default (pytest discovery points to `tests_2.0/`):
|
||||
|
||||
```bash
|
||||
# Run 2.0 tests (default)
|
||||
pytest -v
|
||||
|
||||
# Explicitly run legacy 1.x tests (not maintained on this branch)
|
||||
pytest tests/ -v
|
||||
|
||||
# Test categories (2.0 example):
|
||||
# - ADR-002 edge cases
|
||||
# - Integration scenarios
|
||||
# - Model naming logic
|
||||
# - Robustness testing
|
||||
|
||||
# Current status: all current 2.0 tests pass (some optional schema tests may be skipped without extras)
|
||||
```
|
||||
|
||||
**Test Architecture:**
|
||||
- **Isolated Cache System** - Zero risk to user data
|
||||
- **Atomic Context Switching** - Production/test cache separation
|
||||
- **Mock Models** - Realistic test scenarios
|
||||
- **Edge Case Coverage** - All documented failure modes tested
|
||||
|
||||
|
||||
## Compatibility Notes
|
||||
|
||||
- Streaming note: Some UIs buffer SSE, so verify real-time delivery with `curl -N`. The server sends clear interrupt markers on abort.
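
When checking a stream by hand, it helps to know what the client has to parse. The Python sketch below assumes the usual OpenAI chat-completions streaming shape (`data: {json}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`); it does not model the interrupt markers mentioned above, whose format is not described here, and `iter_sse_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Server-Sent Events lines of a chat completion stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel, as in the OpenAI API
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```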
|
||||
|
||||
### Streaming Issues
|
||||
- Some models may have spacing issues - this is handled automatically
|
||||
- Use `--no-stream` for batch output if needed
|
||||
|
||||
## Contributing
|
||||
|
||||
Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
|
||||
This branch follows the established MLX-Knife development patterns:
|
||||
|
||||
## Security
|
||||
```bash
|
||||
# Run quality checks
|
||||
./test-multi-python.sh # Tests across Python 3.9-3.13
|
||||
./run_linting.sh # Code quality validation
|
||||
|
||||
For security concerns, please see [SECURITY.md](SECURITY.md) or contact us at broke@gmx.eu.
|
||||
# Key files:
|
||||
mlxk2/ # 2.0.0 implementation
|
||||
tests_2.0/ # 2.0 test suite
|
||||
docs/ADR/ # Architecture decision records
|
||||
```
|
||||
|
||||
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
|
||||
|
||||
|
||||
## Support & Feedback
|
||||
|
||||
- **Issues**: [GitHub Issues](https://github.com/mzau/mlx-knife/issues)
|
||||
- **Discussions**: [GitHub Discussions](https://github.com/mzau/mlx-knife/discussions)
|
||||
- **API Specification**: [JSON API Specification](docs/json-api-specification.md)
|
||||
- **Documentation**: See `docs/` directory for technical details
|
||||
- **Security Policy**: See [SECURITY.md](SECURITY.md)
|
||||
|
||||
MLX Knife runs entirely locally - no data is sent to external servers except when downloading models from HuggingFace.
|
||||
|
||||
## License
|
||||
|
||||
MIT License - see [LICENSE](LICENSE) file for details
|
||||
Apache License 2.0 — see `LICENSE` (root) and `mlxk2/NOTICE`.
|
||||
|
||||
Copyright (c) 2025 The BROKE team 🦫
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
@@ -345,6 +574,6 @@ Copyright (c) 2025 The BROKE team 🦫
|
||||
|
||||
<p align="center">
|
||||
<b>Made with ❤️ by The BROKE team <img src="broke-logo.png" alt="BROKE Logo" width="30" style="vertical-align: middle;"></b><br>
|
||||
<i>Version 1.1.1 | September 2025</i><br>
|
||||
<i>Version 2.0.0 | November 2025</i><br>
|
||||
<a href="https://github.com/mzau/broke-cluster">🔮 Next: BROKE Cluster for multi-node deployments</a>
|
||||
</p>
|
||||
|
||||