MLX-Knife 2.0.0-alpha.2

New: JSON-First Model Management for Automation & Scripting

🚧 Alpha Development: Server and run are not included yet in 2.0.0-alpha.2. Use MLX-Knife 1.1.0 for those features.

Stable Version: 1.1.0

Features

Core Functionality

List & Manage Models: Browse your HuggingFace cache with MLX-specific filtering
Model Information: Detailed model metadata including quantization info
Download Models: Pull models from HuggingFace with progress tracking
Run Models: Native MLX execution with streaming and chat modes (stable 1.x only; not included in 2.0.0-alpha.2)
Health Checks: Verify model integrity and completeness
Cache Management: Clean up and organize your model storage

Requirements

macOS with Apple Silicon (M1/M2/M3)
Python 3.9+ (native macOS version or newer)
8 GB+ RAM recommended, plus enough memory for the model you want to run

Python Compatibility

MLX-Knife has been tested and verified on:

✅ Python 3.9.6 (native macOS) - Primary target
✅ Python 3.10-3.13 - Fully compatible

Quick Start

# Installation (local development)
git clone https://github.com/mzau/mlx-knife.git
cd mlx-knife
pip install -e .

# Install with development tools (ruff, mypy, tests)

pip install -e ".[dev,test]"


## Human output (default)
mlxk2 list
mlxk2 list --health
mlxk2 list --all --verbose
mlxk2 health
mlxk2 show "mlx-community/Phi-3-mini-4k-instruct-4bit"

## JSON API
mlxk2 list --json | jq '.data.models[].name'
mlxk2 health --json | jq '.data.summary'
mlxk2 show "Phi-3-mini" --json | jq '.data.model'

Differences vs 1.0.0

CLI: new entry points mlxk2 and mlxk-json (1.0.0 used mlxk).
Output: human output by default; add --json for machine-readable responses (new vs 1.0.0).
List formatting: improved compact table with relative times in the Modified column (e.g., 3h ago) and a new Type column; compact MLX-only view by default.
Flags (human-only): --all (all frameworks), --health (add Health column), --verbose (show full org/model).
JSON API: current spec v0.1.3; CLI accepts --json after subcommands.
Missing features (compared to 1.0.0): server and run are not included in 2.0 alpha.2 (use mlxk 1.x).

⚠️ Alpha Status Disclaimer

This is an alpha because:

Not feature-complete vs 1.0.0 (server and run pending).
Major internal refactor to a JSON-first CLI (new package mlxk2).

Status:

✅ Core commands: list, health, show, pull, rm.
✅ JSON outputs stable and schema-aligned; human output available by default.
✅ Suitable for automation/integration; can run alongside 1.x for server/run.

What 2.0.0-alpha Includes

Command	Status	Description
✅ `list`	Complete	Model discovery with JSON output
✅ `health`	Complete	Corruption detection and cache analysis
✅ `show`	Complete	Detailed model information with --files, --config
✅ `pull`	Complete	HuggingFace model downloads with corruption detection
✅ `rm`	Complete	Model deletion with lock cleanup and fuzzy matching
🧪 `push`	Experimental (alpha)	Upload-only; quiet JSON; supports `--check-only` and `--dry-run`

What's Coming Later

Feature	Target Version	Description
🔄 `server`	2.0.0-rc	OpenAI-compatible API server
🔄 `run`	2.0.0-rc	Interactive model execution
✅ Human-readable output	2.0.0-alpha.2	CLI formatting layer
🔄 `embed`	TBD	Embedding generation (if merged from 1.x)

Experimental: `push` (upload only)

mlxk2 push is experimental (M0). It uploads a local folder to a Hugging Face model repository using upload_folder from huggingface_hub.

Requires HF_TOKEN (write-enabled).
Default branch: main (explicitly override with --branch).
Alpha safety: --private is required to avoid accidental public uploads.
No validation or manifests. Basic hard excludes are applied by default: .git/**, .DS_Store, __pycache__/, common virtualenv folders (.venv/, venv/), and *.pyc.
.hfignore (gitignore-like) in the workspace is supported and merged with the defaults.
Repo creation: use --create if the target repo does not exist; harmless on existing repos. Missing branches are created during upload.
JSON-first: output includes commit_sha, commit_url, no_changes, uploaded_files_count (when available), local_files_count (approx), change_summary and a short message.
Quiet JSON by default: with --json (without --verbose) progress bars/console logs are suppressed; hub logs are still captured in data.hf_logs.
Human output: derived from JSON; add --verbose to include extras such as the commit URL or a short message variant. JSON schema is unchanged.
Local workspace check: use --check-only to validate a workspace without uploading. Produces workspace_health in JSON (no token/network required).
Dry-run planning: use --dry-run to compute a plan vs remote without uploading. Returns dry_run: true, dry_run_summary {added, modified:null, deleted}, and sample added_files/deleted_files.
Testing: see TESTING.md ("Push Testing (2.0)") for offline tests and opt-in live checks with markers/env.
Intended for early testers only. Carefully review the result on the Hub after pushing.
Responsibility: You are responsible for complying with Hugging Face Hub policies and applicable laws (e.g., copyright/licensing) for any uploaded content.

Example:

mlxk2 push --private ./workspace org/model --create --commit "init"

This feature is not final and may change or be removed.
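
A script that consumes the push result might branch on the fields listed above. The following is only a minimal sketch: it assumes those fields sit directly inside the response's data object and that the added and deleted entries of dry_run_summary are counts. The helper name describe_push and the sample payloads are hypothetical; check the JSON API spec for the authoritative shape.

```python
def describe_push(data: dict) -> str:
    """Turn the "data" object of a push response into a one-line summary."""
    if data.get("dry_run"):
        # Assumption: "added" and "deleted" are counts; "modified" is null.
        summary = data.get("dry_run_summary") or {}
        return (f"dry run: {summary.get('added')} added, "
                f"{summary.get('deleted')} deleted")
    if data.get("no_changes"):
        return "no changes: nothing uploaded"
    # uploaded_files_count is only present "when available".
    count = data.get("uploaded_files_count")
    uploaded = "files" if count is None else f"{count} file(s)"
    return f"uploaded {uploaded} in commit {data.get('commit_sha')}"


# Hypothetical payloads, shaped after the field names listed above.
print(describe_push({"no_changes": True, "uploaded_files_count": 0}))
print(describe_push({"dry_run": True,
                     "dry_run_summary": {"added": 3, "modified": None, "deleted": 1}}))
print(describe_push({"no_changes": False, "uploaded_files_count": 2,
                     "commit_sha": "abc1234"}))
```

Checking dry_run before no_changes keeps a planning run from being reported as a real no-op upload.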

Installation & Parallel Usage

Development Installation

# Install 2.0.0-alpha (this branch)
pip install -e /path/to/mlx-knife

# Verify installation
mlxk-json --version  # → mlxk2 2.0.0-alpha.2
mlxk2 --version      # → mlxk2 2.0.0-alpha.2

Parallel with MLX-Knife 1.x

Both versions can coexist safely:

# Install stable 1.x for server/run features
pip install mlx-knife

# Commands available:
mlxk list                    # 1.x - Human-readable output
mlxk server --port 8080      # 1.x - Server mode
mlxk run "model" -p "Hello"  # 1.x - Interactive execution

mlxk-json list --json        # 2.0 - JSON API
python -m mlxk2.cli list     # 2.0 - Module invocation

Package Names:

MLX-Knife 1.x: mlx-knife → mlxk command
MLX-Knife 2.0: mlxk-json → mlxk-json, mlxk2 commands

JSON API Documentation

📋 Complete API Specification: See the JSON API spec for comprehensive schema, error codes, and examples: JSON API Specification

Command Structure

All commands follow this JSON response format:

{
    "status": "success|error", 
    "command": "list|health|show|pull|rm|push",
    "data": { /* command-specific data */ },
    "error": null | { "message": "...", "details": "..." }
}
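
Because every command shares this envelope, a caller can unwrap responses in one place. The sketch below is illustrative only: the names unwrap and MlxkError are hypothetical, and it assumes that error responses also carry the command field.

```python
import json


class MlxkError(RuntimeError):
    """Raised when a response envelope reports status "error"."""


def unwrap(raw: str, expected_command: str) -> dict:
    """Parse one JSON response and return its "data" payload."""
    envelope = json.loads(raw)
    if envelope.get("command") != expected_command:
        raise ValueError(f"unexpected command: {envelope.get('command')!r}")
    if envelope.get("status") != "success":
        # "error" is null on success and an object with "message" on failure.
        error = envelope.get("error") or {}
        raise MlxkError(error.get("message", "unknown error"))
    return envelope["data"]


sample = ('{"status": "success", "command": "list", '
          '"data": {"models": [], "count": 0}, "error": null}')
print(unwrap(sample, "list"))
```

Raising on status "error" lets calling code treat the data payload as always valid once unwrap returns.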

Examples

For full, up-to-date examples for every command, refer to the spec: JSON API Specification

List Models

mlxk-json list --json
# Output:
{
  "status": "success",
  "command": "list",
  "data": {
    "models": [
      {
        "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
        "hash": "a5339a41b2e3abcdefgh1234567890ab12345678",
        "size_bytes": 4613734656,
        "last_modified": "2024-10-15T08:23:41Z",
        "framework": "MLX",
        "model_type": "chat",
        "capabilities": ["text-generation", "chat"],
        "health": "healthy",
        "cached": true
      }
    ],
    "count": 1
  },
  "error": null
}
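
As an illustration of consuming this output, the sketch below selects healthy MLX models and totals their size from the data object. The helper name healthy_mlx is hypothetical, and so are the second sample entry and its "unhealthy" value, which are assumptions made for the example.

```python
from typing import List, Tuple


def healthy_mlx(data: dict) -> Tuple[List[str], int]:
    """Return (names, total size in bytes) of healthy MLX models."""
    selected = [m for m in data["models"]
                if m.get("framework") == "MLX" and m.get("health") == "healthy"]
    return [m["name"] for m in selected], sum(m["size_bytes"] for m in selected)


data = {
    "models": [
        {"name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
         "size_bytes": 4613734656, "framework": "MLX", "health": "healthy"},
        # Hypothetical entry, added only to show the filter at work.
        {"name": "example-org/broken-model",
         "size_bytes": 1000, "framework": "MLX", "health": "unhealthy"},
    ],
    "count": 2,
}
names, total_bytes = healthy_mlx(data)
print(names, total_bytes)
```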

Health Check

mlxk-json health --json
# Output:
{
  "status": "success",
  "command": "health",
  "data": {
    "healthy": [
      { "name": "mlx-community/Phi-3-mini-4k-instruct-4bit", "status": "healthy", "reason": "Model is healthy" }
    ],
    "unhealthy": [],
    "summary": { "total": 1, "healthy_count": 1, "unhealthy_count": 0 }
  },
  "error": null
}

Show Model Details

mlxk-json show "Phi-3-mini" --json --files
# Output (simplified):
{
  "status": "success",
  "command": "show",
  "data": {
    "model": {
      "name": "mlx-community/Phi-3-mini-4k-instruct-4bit",
      "hash": "a5339a41b2e3abcdefgh1234567890ab12345678",
      "size_bytes": 4613734656,
      "framework": "MLX",
      "model_type": "chat",
      "capabilities": ["text-generation", "chat"],
      "last_modified": "2024-10-15T08:23:41Z",
      "health": "healthy",
      "cached": true
    },
    "files": [
      {"name": "config.json", "size": "1.2KB", "type": "config"},
      {"name": "model.safetensors", "size": "2.3GB", "type": "weights"}
    ],
    "metadata": null
  },
  "error": null
}

Hash Syntax Support

All commands support @hash syntax for specific model versions:

mlxk-json health "Qwen3@e96" --json     # Check specific hash
mlxk-json show "model@3df9bfd" --json   # Short hash matching
mlxk-json rm "Phi-3@e967" --json --force  # Delete specific version
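
To make the name@hash convention concrete, here is a minimal sketch of how a client could split such a spec and test a short hash against a full one. It is not the resolver mlxk2 uses; both helper names are hypothetical, and plain prefix matching is an assumption.

```python
from typing import Optional, Tuple


def split_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "name@hash" into (name, hash); hash is None when absent."""
    name, _, commit = spec.partition("@")
    return name, commit or None


def matches_hash(full_hash: str, prefix: Optional[str]) -> bool:
    """Assumed rule: a short hash matches if the full hash starts with it."""
    return prefix is None or full_hash.startswith(prefix)


print(split_model_spec("Qwen3@e96"))
print(split_model_spec("Phi-3-mini"))
print(matches_hash("e967abcd", "e967"))
```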

HuggingFace Cache Safety

MLX-Knife 2.0 respects standard HuggingFace cache structure and practices:

Best Practices for Shared Environments

Read operations (list, health, show) are always safe alongside concurrent processes
Write operations (pull, rm) should be coordinated, ideally during maintenance windows
Lock cleanup is automatic, but avoid it during active downloads
Your responsibility: coordinate with your team and time write operations carefully

Example Safe Workflow

# Check what's in cache (always safe)
mlxk-json list --json | jq '.data.count'

# Maintenance window - coordinate with team
mlxk-json rm "corrupted-model" --json --force
mlxk-json pull "replacement-model" --json

# Back to normal operations
mlxk-json health --json | jq '.data.summary'

Real-World Examples

🔗 Integration Reference: External projects should implement against the JSON API spec — this alpha phase validates that implementation matches documentation: JSON API Specification

Broke-Cluster Integration

# Get available model names for scheduling
MODELS=$(mlxk-json list --json | jq -r '.data.models[].name')

# Check cache health before deployment
HEALTH=$(mlxk-json health --json | jq '.data.summary.healthy_count')
if [ "$HEALTH" -eq 0 ]; then
    echo "No healthy models available"
    exit 1
fi

# Download required models
mlxk-json pull "mlx-community/Phi-3-mini-4k-instruct-4bit" --json

CI/CD Pipeline Usage

# Verify model integrity in CI
mlxk-json health --json | jq -e '.data.summary.unhealthy_count == 0'

# Clean up CI artifacts
mlxk-json rm "test-model-*" --json --force

# Pre-warm cache for deployment
mlxk-json pull "production-model" --json
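
Where jq is not available, the integrity gate can be approximated in Python. The sketch below maps a health response to an exit code in the spirit of jq -e; the function name and the choice of exit code 2 for error responses are assumptions, not part of the CLI.

```python
import json


def health_exit_code(raw: str) -> int:
    """0 if no unhealthy models, 1 if any, 2 if the command itself failed."""
    response = json.loads(raw)
    if response.get("status") != "success":
        return 2
    summary = response["data"]["summary"]
    return 0 if summary["unhealthy_count"] == 0 else 1


# Sample shaped after the health output documented earlier in this README.
sample = json.dumps({
    "status": "success",
    "command": "health",
    "data": {"healthy": [], "unhealthy": [],
             "summary": {"total": 1, "healthy_count": 1, "unhealthy_count": 0}},
    "error": None,
})
print(health_exit_code(sample))
```

A CI step could pass the captured output of mlxk-json health --json to this function and hand the result to sys.exit.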

Model Management Automation

# Find models by pattern
LARGE_MODELS=$(mlxk-json list --json | jq -r '.data.models[] | select(.name | contains("30B")) | .name')

# Show detailed info for analysis
for model in $LARGE_MODELS; do
    mlxk-json show "$model" --json --config | jq '.data.model_config'
done
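
The same selection can be driven from Python with subprocess. This is a sketch under stated assumptions: run_json and models_matching are hypothetical helpers, the response is assumed to be the only content on stdout, and the exit code is not inspected. A stub runner stands in for the CLI so the example runs anywhere.

```python
import json
import subprocess
from typing import Callable, List


def run_json(args: List[str], runner: Callable = subprocess.run) -> dict:
    """Run `mlxk-json <args> --json` and return the parsed response."""
    completed = runner(["mlxk-json", *args, "--json"],
                       capture_output=True, text=True)
    return json.loads(completed.stdout)


def models_matching(fragment: str, runner: Callable = subprocess.run) -> List[str]:
    """Names of cached models whose name contains `fragment`."""
    models = run_json(["list"], runner)["data"]["models"]
    return [m["name"] for m in models if fragment in m["name"]]


def fake_runner(cmd, **kwargs):
    """Stand-in for subprocess.run so the sketch runs without the CLI."""
    payload = {"status": "success", "command": "list", "error": None,
               "data": {"models": [{"name": "example-org/Example-30B-4bit"},
                                   {"name": "example-org/Example-7B-4bit"}],
                        "count": 2}}
    return subprocess.CompletedProcess(cmd, 0, stdout=json.dumps(payload), stderr="")


print(models_matching("30B", runner=fake_runner))
```

Injecting the runner keeps the parsing logic testable offline; dropping the runner argument makes the helpers call the installed mlxk-json.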

Testing

The 2.0 test suite runs by default (pytest discovery points to tests_2.0/):

# Run 2.0 tests (default)
pytest -v

# Explicitly run legacy 1.x tests (not maintained on this branch)
pytest tests/ -v

# Test categories (2.0 example):
# - ADR-002 edge cases
# - Integration scenarios  
# - Model naming logic
# - Robustness testing

# Current status: all current 2.0 tests pass (some optional schema tests may be skipped without extras)

Revolutionary Test Architecture:

Isolated Cache System - Zero risk to user data
Atomic Context Switching - Production/test cache separation
Comprehensive Mock Models - Realistic test scenarios
Edge Case Coverage - All documented failure modes tested

Known Issues & Limitations

Critical Issues

Health Check False Positive: Health check may report incomplete downloads as healthy during model pull operations (affects both 1.1.0 and 2.0.0-alpha)

Alpha Limitations

Server and run not included (use 1.x)
Limited error message UX in some paths (to be refined)

GitHub Issues

Issue #18: Server signal handling limitation (known, will fix in 2.0.0-rc)
Issue #24: Lock cleanup command (planned for future release)

Development Status

Version Roadmap

2.0.0-alpha ← You are here (JSON API core complete)
2.0.0-beta: 6-8 weeks of robust testing and production validation
2.0.0-rc: Server/run features, full 1.x parity
2.0.0-stable: Community validated, enterprise ready

Architecture Decisions

JSON-First: All output structured for scripting and automation
Cache Safety: Respects HuggingFace standards, no custom formats
Atomic Operations: Clean separation between test and production contexts
Backward Compatibility: Parallel deployment with 1.x maintained

Contributing

This branch follows the established MLX-Knife development patterns:

# Run quality checks
./test-multi-python.sh       # Tests across Python 3.9-3.13
./run_linting.sh             # Code quality validation

# Key files:
mlxk2/                       # 2.0.0 implementation
tests_2.0/                   # Alpha test suite  
docs/ADR/                    # Architecture decision records

See CONTRIBUTING.md for detailed guidelines.

Support & Feedback

Issues: GitHub Issues
Discussions: GitHub Discussions
API Specification: JSON API Specification
Documentation: See docs/ directory for technical details

For production use: Consider MLX-Knife 1.1.0 until 2.0.0-beta is available.

Alpha Testing Goals

✅ Validate JSON API specification matches implementation
✅ Real-world integration feedback from external projects
✅ Edge case discovery through broke-cluster usage
✅ API stability testing before beta release

MLX-Knife 2.0.0-alpha - Built for automation, tested for reliability, designed for the future.

Acknowledgments

Built for Apple Silicon using the MLX framework
Models hosted by the MLX Community on HuggingFace
Inspired by ollama's user experience

Made with ❤️ by The BROKE team
Version 2.0.0-alpha.2 | September 2025
🔮 Next: BROKE Cluster for multi-node deployments
