mirror of https://github.com/cloudstack-llc/mlx-knife.git synced 2026-07-01 20:44:14 -04:00

T

The BROKE Cluster Team 57bf6d86be 2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

Major Features Added:
  • Complete run command implementation with interactive/single-shot modes
  • MLXRunner core engine ported from 1.x with modular architecture
  • OpenAI-compatible server with SIGINT-robust supervisor mode
  • Experimental push feature properly isolated behind environment variable

  Key Improvements:
  - Full feature parity with 1.1.1 stable releases
  - Enhanced human output formatting across all commands
  - Clean separation of stable (184 tests) vs experimental features
  - Updated demo GIF showcasing improved 2.0 interface

  Fixes:
  - Pull operation cache pollution (Issue #30) with preflight access checks
  - Test stability improvements across all environments

  Architecture:
  - Modular runner design with focused helper modules
  - Thread-safe model loading and memory management
  - stable testing across Python 3.9-3.13

  Ready for use as comprehensive 1.x alternative.

2025-09-14 18:04:18 +02:00

.claude/agents

2.0.0-alpha: default 2.0 tests, cache safety, and docs

2025-08-29 16:57:45 +02:00

docs

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

mlxk2

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

scripts

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

tests_2.0

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

.gitignore

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

broke-logo.png

Initial commit: MLX Knife 1.0-rc1

2025-08-12 23:00:55 +02:00

CHANGELOG.md

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

CODE_OF_CONDUCT.md

Initial commit: MLX Knife 1.0-rc1

2025-08-12 23:00:55 +02:00

CONTRIBUTING.md

2.0.0-alpha.3: lenient MLX detection + push branch handling

2025-09-08 01:14:01 +02:00

LICENSE

2.0.0-alpha.3: lenient MLX detection + push branch handling

2025-09-08 01:14:01 +02:00

mlxk-demo.gif

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

mlxk-demo.tape

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

pyproject-mlxk-json.toml

2.0.0-alpha.3: lenient MLX detection + push branch handling

2025-09-08 01:14:01 +02:00

pyproject.toml

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

pytest.ini

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

README.md

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

requirements.txt

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

SECURITY.md

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

settings.json

Initial commit: MLX Knife 1.0-rc1

2025-08-12 23:00:55 +02:00

simple_chat.html

Release MLX Knife 1.1.0-beta2 - Critical Bug Fixes & Test Stability

2025-08-22 23:16:50 +02:00

test-multi-python.sh

2.0.0-alpha.1: human output default; strict health (#27 , PyTorch index)

2025-08-31 22:25:43 +02:00

TESTING.md

2.0.0-beta.3: Feature Complete - Full 1.1.1 Parity Achieved

2025-09-14 18:04:18 +02:00

README.md

MLX-Knife 2.0.0-beta.3

New: JSON-First Model Management for Automation & Scripting

🚧 Beta: Server is included and SIGINT-robust (Supervisor). run is now complete in 2.0.

Stable Version: 1.1.1

Features

Core Functionality

List & Manage Models: Browse your HuggingFace cache with MLX-specific filtering
Model Information: Detailed model metadata including quantization info
Download Models: Pull models from HuggingFace with progress tracking
Run Models: Native MLX execution with streaming and chat modes
Health Checks: Verify model integrity and completeness
Cache Management: Clean up and organize your model storage
Privacy & Network: No background network or telemetry; only explicit Hugging Face interactions when you run pull or the experimental push.

Requirements

macOS with Apple Silicon (M1/M2/M3)
Python 3.9+ (native macOS version or newer)
8GB+ RAM recommended + RAM to run LLM

Python Compatibility

MLX Knife has been comprehensively tested and verified on:

✅ Python 3.9.6 (native macOS) - Primary target
✅ Python 3.10-3.13 - Fully compatible

Quick Start

# Installation (local development)
git clone https://github.com/mzau/mlx-knife.git
cd mlx-knife
pip install -e .

Install with development tools (ruff, mypy, tests)

pip install -e ".[dev,test]"


## Human output (default)
mlxk2 list
mlxk2 list --health
mlxk2 list --all --verbose
mlxk2 health
mlxk2 show "mlx-community/Phi-3-mini-4k-instruct-4bit"

### List filters (human)
- `list`: shows MLX chat models only (safe default for run/server selection)
- `list --verbose`: shows all MLX models (chat + base)
- `list --all`: shows all frameworks (MLX, GGUF, PyTorch)
- `list --all --verbose`: same selection as `--all`, with fuller names/details

Note: JSON output is unaffected by these human-only filters.

## JSON API
mlxk2 list --json | jq '.data.models[].name'
mlxk2 health --json | jq '.data.summary'
mlxk2 show "Phi-3-mini" --json | jq '.data.model'

Compatibility Notes

2.0 CLI is JSON-first with human output by default; use --json for API responses.
Full feature parity with 1.x achieved including run command.
Streaming note: Some UIs buffer SSE; verify real-time with curl -N. Server sends clear interrupt markers on abort.

Beta Status Summary

✅ Server included and SIGINT-robust (Supervisor). SSE streaming behaves predictably (happy/interrupt). 404/503 mappings preserved.
✅ JSON-first CLI stable: list, health, show, pull, rm, run.
🔒 push hidden experimental feature (requires MLXK2_ENABLE_EXPERIMENTAL_PUSH=1).

What 2.0.0-beta Includes

Command	Status	Description
✅ `server`	Included	OpenAI-compatible API server; SIGINT-robust (Supervisor); SSE streaming
✅ `run`	Complete	Interactive and single-shot model execution with streaming/batch modes
✅ `list`	Complete	Model discovery with JSON output
✅ `health`	Complete	Corruption detection and cache analysis
✅ `show`	Complete	Detailed model information with --files, --config
✅ `pull`	Complete	HuggingFace model downloads with corruption detection
✅ `rm`	Complete	Model deletion with lock cleanup and fuzzy matching
🔒 `push`	Hidden Experimental	Upload-only; requires `MLXK2_ENABLE_EXPERIMENTAL_PUSH=1` to enable

Hidden Experimental: `push` (upload only)

mlxk2 push is a hidden experimental feature (M0). Enable with MLXK2_ENABLE_EXPERIMENTAL_PUSH=1. It uploads a local folder to a Hugging Face model repository using huggingface_hub/upload_folder.

Requires HF_TOKEN (write-enabled).
Default branch: main (explicitly override with --branch).
Safety: --private is required to avoid accidental public uploads.
No validation or manifests. Basic hard excludes are applied by default: .git/**, .DS_Store, __pycache__/, common virtualenv folders (.venv/, venv/), and *.pyc.
.hfignore (gitignore-like) in the workspace is supported and merged with the defaults.
Repo creation: use --create if the target repo does not exist; harmless on existing repos. Missing branches are created during upload.
JSON-first: output includes commit_sha, commit_url, no_changes, uploaded_files_count (when available), local_files_count (approx), change_summary and a short message.
Quiet JSON by default: with --json (without --verbose) progress bars/console logs are suppressed; hub logs are still captured in data.hf_logs.
Human output: derived from JSON; add --verbose to include extras such as the commit URL or a short message variant. JSON schema is unchanged.
Local workspace check: use --check-only to validate a workspace without uploading. Produces workspace_health in JSON (no token/network required).
Dry-run planning: use --dry-run to compute a plan vs remote without uploading. Returns dry_run: true, dry_run_summary {added, modified:null, deleted}, and sample added_files/deleted_files.
Testing: see TESTING.md ("Push Testing (2.0)") for offline tests and opt-in live checks with markers/env.
Intended for early testers only. Carefully review the result on the Hub after pushing.
Responsibility: You are responsible for complying with Hugging Face Hub policies and applicable laws (e.g., copyright/licensing) for any uploaded content.

Example:

# Enable experimental push feature
export MLXK2_ENABLE_EXPERIMENTAL_PUSH=1

# Use push command
mlxk2 push --private ./workspace org/model --create --commit "init"

This feature is not final and may change or be removed.

Installation & Parallel Usage

Development Installation

# Install 2.0.0-beta (this branch)
pip install -e /path/to/mlx-knife

# Verify installation
mlxk-json --version  # → mlxk2 2.0.0-beta.3
mlxk2 --version      # → mlxk2 2.0.0-beta.3

Parallel with MLX-Knife 1.x

Both versions can coexist safely:

# Install stable 1.x for server/run features
pip install mlx-knife

# Commands available:
mlxk list                    # 1.x - Human-readable output
mlxk server --port 8080      # 1.x - Server mode
mlxk run "model" -p "Hello"  # 1.x - Interactive execution

mlxk-json list --json        # 2.0 - JSON API
python -m mlxk2.cli list     # 2.0 - Module invocation

Package Names:

MLX-Knife 1.x: mlx-knife → mlxk command
MLX-Knife 2.0: mlxk-json → mlxk-json, mlxk2 commands

JSON API Documentation

📋 Complete API Specification: See the JSON API spec for comprehensive schema, error codes, and examples: JSON API Specification

Command Structure

All commands follow this JSON response format:

{
    "status": "success|error", 
    "command": "list|health|show|pull|rm|push",
    "data": { /* command-specific data */ },
    "error": null | { "message": "...", "details": "..." }
}

Examples

For full, up-to-date examples for every command, refer to the spec: JSON API Specification

List Models

mlxk-json list --json
# Output:
{
  "status": "success",
  "command": "list",
  "data": {
    "models": [
      {
        "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
        "hash": "a5339a41b2e3abcdefgh1234567890ab12345678",
        "size_bytes": 4613734656,
        "last_modified": "2024-10-15T08:23:41Z",
        "framework": "MLX",
        "model_type": "chat",
        "capabilities": ["text-generation", "chat"],
        "health": "healthy",
        "cached": true
      }
    ],
    "count": 1
  },
  "error": null
}

Health Check

mlxk-json health --json
# Output:
{
  "status": "success",
  "command": "health",
  "data": {
    "healthy": [
      { "name": "mlx-community/Phi-3-mini-4k-instruct-4bit", "status": "healthy", "reason": "Model is healthy" }
    ],
    "unhealthy": [],
    "summary": { "total": 1, "healthy_count": 1, "unhealthy_count": 0 }
  },
  "error": null
}

Show Model Details

mlxk-json show "Phi-3-mini" --json --files
# Output (simplified):
{
  "status": "success",
  "command": "show",
  "data": {
    "model": {
      "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
      "hash": "a5339a41b2e3abcdefgh1234567890ab12345678",
      "size_bytes": 4613734656,
      "framework": "MLX",
      "model_type": "chat",
      "capabilities": ["text-generation", "chat"],
      "last_modified": "2024-10-15T08:23:41Z",
      "health": "healthy",
      "cached": true
    },
    "files": [
      {"name": "config.json", "size": "1.2KB", "type": "config"},
      {"name": "model.safetensors", "size": "2.3GB", "type": "weights"}
    ],
    "metadata": null
  },
  "error": null
}

Hash Syntax Support

All commands support @hash syntax for specific model versions:

mlxk-json health "Qwen3@e96" --json     # Check specific hash
mlxk-json show "model@3df9bfd" --json   # Short hash matching
mlxk-json rm "Phi-3@e967" --json --force  # Delete specific version

HuggingFace Cache Safety

MLX-Knife 2.0 respects standard HuggingFace cache structure and practices:

Best Practices for Shared Environments

Read operations (list, health, show) always safe with concurrent processes
Write operations (pull, rm) coordinate during maintenance windows
Lock cleanup automatic but avoid during active downloads
Your responsibility: Coordinate with team, use good timing

Example Safe Workflow

# Check what's in cache (always safe)
mlxk-json list --json | jq '.data.count'

# Maintenance window - coordinate with team
mlxk-json rm "corrupted-model" --json --force
mlxk-json pull "replacement-model" --json

# Back to normal operations
mlxk-json health --json | jq '.data.summary'

Real-World Examples

🔗 Integration Reference: External projects should implement against the JSON API spec — this beta validates that implementation matches documentation: JSON API Specification

Broke-Cluster Integration

# Get available model names for scheduling
MODELS=$(mlxk-json list --json | jq -r '.data.models[].name')

# Check cache health before deployment
HEALTH=$(mlxk-json health --json | jq '.data.summary.healthy_count')
if [ "$HEALTH" -eq 0 ]; then
    echo "No healthy models available"
    exit 1
fi

# Download required models
mlxk-json pull "mlx-community/Phi-3-mini-4k-instruct-4bit" --json

CI/CD Pipeline Usage

# Verify model integrity in CI
mlxk-json health --json | jq -e '.data.summary.unhealthy_count == 0'

# Clean up CI artifacts
mlxk-json rm "test-model-*" --json --force

# Pre-warm cache for deployment
mlxk-json pull "production-model" --json

Model Management Automation

# Find models by pattern
LARGE_MODELS=$(mlxk-json list --json | jq -r '.data.models[] | select(.name | contains("30B")) | .name')

# Show detailed info for analysis
for model in $LARGE_MODELS; do
    mlxk-json show "$model" --json --config | jq '.data.model_config'
done

Testing

The 2.0 test suite runs by default (pytest discovery points to tests_2.0/):

# Run 2.0 tests (default)
pytest -v

# Explicitly run legacy 1.x tests (not maintained on this branch)
pytest tests/ -v

# Test categories (2.0 example):
# - ADR-002 edge cases
# - Integration scenarios  
# - Model naming logic
# - Robustness testing

# Current status: all current 2.0 tests pass (some optional schema tests may be skipped without extras)

Test Architecture:

Isolated Cache System - Zero risk to user data
Atomic Context Switching - Production/test cache separation
Mock Models - Realistic test scenarios
Edge Case Coverage - All documented failure modes tested

Known Notes

Streaming UX: Some UIs buffer SSE; verify real-time with curl -N. The server emits a clear interrupt marker on abort.
Error handling/logging: Unified error envelope and structured logs are planned post‑beta.3 (see ADR‑004).

Development Status

Version Roadmap

2.0.0-beta.3 ← You are here (feature complete; full 1.x parity achieved; all core commands implemented)
2.0.0-rc: CLI compatibility improvements: mlxk alias alongside mlxk2; final production hardening
2.0.0-stable: Stable release after RC feedback

Architecture Decisions

JSON-First: All output structured for scripting and automation
Cache Safety: Respects HuggingFace standards, no custom formats
Atomic Operations: Clean separation between test and production contexts
Backward Compatibility: Parallel deployment with 1.x maintained

Contributing

This branch follows the established MLX-Knife development patterns:

# Run quality checks
python test-multi-python.sh  # Tests across Python 3.9-3.13
./run_linting.sh             # Code quality validation

# Key files:
mlxk2/                       # 2.0.0 implementation
tests_2.0/                   # 2.0 test suite  
docs/ADR/                    # Architecture decision records

See CONTRIBUTING.md for detailed guidelines.

Support & Feedback

Issues: GitHub Issues
Discussions: GitHub Discussions
API Specification: JSON API Specification
Documentation: See docs/ directory for technical details
Security Policy: See SECURITY.md

License

2.x (mlxk2, this branch): Apache License 2.0 — see LICENSE (root) and mlxk2/NOTICE.
1.x (main branch): MIT License — see LICENSE on main.

Note: This branch is hard‑split for 2.0. The 1.x implementation and tests were removed here to avoid confusion and license duality; refer to the main branch for 1.x.

For production use: Consider MLX-Knife 1.1.0 until 2.0.0-beta is available.

Beta Testing Goals

✅ Validate JSON API specification matches implementation
✅ Real-world integration feedback from external projects
✅ Edge case coverage (naming, health, token limits)
✅ Server SIGINT robustness, SSE happy/interrupt behavior

MLX-Knife 2.0.0-beta — JSON-first CLI for local model management.

Acknowledgments

Built for Apple Silicon using the MLX framework
Models hosted by the MLX Community on HuggingFace
Inspired by ollama's user experience

Made with ❤️ by The BROKE team
Version 2.0.0-beta.3 | September 2025
🔮 Next: BROKE Cluster for multi-node deployments

README.md Unescape Escape

MLX-Knife 2.0.0-beta.3

New: JSON-First Model Management for Automation & Scripting

Features

Core Functionality

Requirements

Python Compatibility

Quick Start

Install with development tools (ruff, mypy, tests)

Compatibility Notes

Beta Status Summary

What 2.0.0-beta Includes

Hidden Experimental: push (upload only)

Installation & Parallel Usage

Development Installation

Parallel with MLX-Knife 1.x

JSON API Documentation

Command Structure

Examples

List Models

Health Check

Show Model Details

Hash Syntax Support

HuggingFace Cache Safety

Best Practices for Shared Environments

Example Safe Workflow

Real-World Examples

Broke-Cluster Integration

CI/CD Pipeline Usage

Model Management Automation

Testing

Known Notes

Development Status

Version Roadmap

Architecture Decisions

Contributing

Support & Feedback

License

Beta Testing Goals

Sponsors

Acknowledgments

README.md

Hidden Experimental: `push` (upload only)