2.0.0-alpha.3: lenient MLX detection + push branch handling

- Detect MLX/chat via README front‑matter + tokenizer; unify list/show; human list filters aligned (Refs #31)
- Push: create missing branch with --create and retry once on “Invalid rev id”; tolerate missing branches
  offline; no‑op still creates branch with --create
- Tests: add offline retry test; detection/human coverage; live list (opt‑in); 98/98 passing
- Docs/Meta: CHANGELOG/TESTING/README/SECURITY/CLAUDE updated; hard split 1.x from this branch; Apache‑2.0 + NOTICE
The BROKE Cluster Team
2025-09-08 01:08:57 +02:00
parent eedb91b75c
commit 3f57248121
44 changed files with 1137 additions and 8538 deletions
+62 -47
@@ -2,10 +2,10 @@
## Current Status
**150/150 tests passing** (August 2025) - **STABLE RELEASE** 🚀
**98/98 tests passing** (September 2025) — 2.0.0-alpha.3; 9 skipped (opt-in)
**Apple Silicon verified** (M1/M2/M3)
**Python 3.9-3.13 compatible**
**Production ready** - comprehensive testing with real model execution
**Alpha (CLI/JSON)** — default suite green locally (no inference)
**Isolated test system** - user cache stays pristine with temp cache isolation
**3-category test strategy** - optimized for performance and safety
@@ -15,32 +15,34 @@
# Install package + tests
pip install -e .[test]
# Download test model (optional - most tests use isolated cache)
mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit
# Download test model (optional; most 2.0 tests use isolated cache)
# Only needed for opt-in live tests or local experiments
# mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit
# Run 2.0 tests (default: tests_2.0/)
# Run 2.0 tests (default discovery: tests_2.0/)
pytest -v
# Run legacy 1.x suite explicitly (not maintained here)
pytest tests/ -v
# Fast unit tests only
pytest tests/unit/
# Live tests (opt-in; not part of default):
# - Live push (requires env):
# export MLXK2_LIVE_PUSH=1
# export HF_TOKEN=...; export MLXK2_LIVE_REPO=org/model; export MLXK2_LIVE_WORKSPACE=/abs/path
# pytest -q -m live_push
# - Live list (uses your HF_HOME; requires at least one MLX chat + one MLX base in cache):
# export HF_HOME=/path/to/huggingface/cache
# pytest -q -m live_list
# Before committing
ruff check mlx_knife/ --fix && mypy mlx_knife/ && pytest
ruff check mlxk2/ --fix && mypy mlxk2/ && pytest -v
```
## Why Local Testing?
MLX Knife requires **Apple Silicon hardware** and **real MLX models** for comprehensive testing:
MLX Knife tests fall into two categories for 2.0:
- **Hardware Requirement**: MLX framework only runs on Apple Silicon (M1/M2/M3)
- **Model Requirement**: Tests use actual models (4GB+) for realistic validation
- **Industry Standard**: Local testing is normal for MLX projects
- **Quality Assurance**: Real hardware testing ensures actual functionality
- CLI/JSON tests (default): Run on any supported Python on macOS; no model inference required; use an isolated HF cache (no network).
- Live/Inference tests (opt-in; future RC for server/run): Require Apple Silicon (M1/M2/M3) and real models.
This approach ensures our tests reflect real-world usage, not mocked behavior.
For push/list live tests in 2.0 alpha, see the opt-in commands above.
## Test Structure
@@ -49,22 +51,34 @@ This approach ensures our tests reflect real-world usage, not mocked behavior.
```
tests_2.0/
├── __init__.py
├── conftest.py # Isolated test cache, fixtures
├── test_edge_cases_adr002.py # Edge-case naming, ADR-002
├── test_health_multifile.py # Multi-file health completeness
├── test_integration.py # Model resolution, health integration
├── test_issue_27.py # Health policy consistency
├── test_model_naming.py # Pattern/@hash parsing and resolution
├── test_robustness.py # General robustness tests
├── test_json_api_list.py # JSON API v0.1.2 (list contract)
├── test_json_api_show.py # JSON API v0.1.2 (show contract)
└── spec/
    ├── test_cli_version_output.py # version command JSON shape
    ├── test_spec_doc_examples_validate.py # docs examples vs schema (jsonschema)
    └── test_spec_version_sync.py # docs version == code constant
├── conftest.py # Isolated test cache, fixtures
├── test_human_output.py # Human rendering (list/health)
├── test_detection_readme_tokenizer.py # Issue #31 (README/tokenizer detection)
├── test_json_api_list.py # JSON API (list contract)
├── test_json_api_show.py # JSON API (show contract)
├── test_edge_cases_adr002.py # Edge-case naming, ADR-002
├── test_health_multifile.py # Multi-file health completeness
├── test_integration.py # Model resolution, health integration
├── test_issue_27.py # Health policy consistency
├── test_model_naming.py # Pattern/@hash parsing and resolution
├── test_robustness.py # General robustness tests
├── test_cli_push_args.py # Push CLI args (offline)
├── test_push_minimal.py # Push minimal (offline)
├── test_push_extended.py # Push extended (offline)
├── test_push_dry_run.py # Push dry-run planning (offline)
├── test_push_workspace_check.py # Push check-only (offline)
├── spec/
│ ├── test_cli_version_output.py # version command JSON shape
│ ├── test_spec_doc_examples_validate.py # docs examples vs schema
│ ├── test_spec_version_sync.py # docs version == code constant
│ ├── test_push_error_matches_schema.py # push error schema
│ └── test_push_output_matches_schema.py # push success schema
└── live/ # Opt-in live tests (markers)
├── test_push_live.py # requires MLXK2_LIVE_PUSH, HF_TOKEN
└── test_list_human_live.py # requires HF_HOME
```
Note: This tree is illustrative (not exhaustive). Push-related tests are documented in the dedicated "Push Testing (2.0)" section below to avoid drift.
Note: Live tests are opt-in via markers (`-m live_push`, `-m live_list`) and environment. Default `pytest` discovery runs only the offline suite above.
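
As a rough illustration of the environment gate described above, the following sketch checks the variables listed in the quick-start block for the live push test. The helper name `missing_live_push_env` and the rule that `MLXK2_LIVE_PUSH` must equal `1` exactly are assumptions made for this example; they are not taken from the project's `conftest.py`.

```python
import os
from typing import List, Mapping, Optional

# Variables named in the quick-start block for the live push test.
LIVE_PUSH_REQUIRED = ("HF_TOKEN", "MLXK2_LIVE_REPO", "MLXK2_LIVE_WORKSPACE")


def missing_live_push_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of variables still missing for a live push run.

    An empty list means the live push test may run. Hypothetical helper,
    not part of the actual test suite.
    """
    env = os.environ if env is None else env
    missing = []
    # Assumption: the opt-in switch must be exactly "1",
    # mirroring `export MLXK2_LIVE_PUSH=1`.
    if env.get("MLXK2_LIVE_PUSH") != "1":
        missing.append("MLXK2_LIVE_PUSH")
    # Every remaining variable must be set to a non-empty value.
    missing.extend(name for name in LIVE_PUSH_REQUIRED if not env.get(name))
    return missing
```

A live test module could feed the result into `pytest.mark.skipif(bool(missing_live_push_env()), reason="live push not configured")`, so a plain `pytest` run skips the test rather than failing it.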
## Push Testing (2.0)
@@ -76,7 +90,7 @@ This section summarizes what our test suite covers for the experimental `push` f
- Args:
- `--private` (required in alpha): Safety gate to avoid public uploads.
- `--create`: Create the repository if it does not exist (model repo).
- `--branch`: Target branch, default `main`.
- `--branch`: Target branch, default `main`. Missing branches are tolerated; with `--create`, the branch is proactively created (and upload retried once if the hub initially rejects the revision).
- `--commit`: Commit message, default `"mlx-knife push"`.
- `--check-only`: Analyze workspace locally; no network call; returns `data.workspace_health`.
- `--dry-run`: Compare local workspace to the remote branch and summarize changes without uploading (requires repo read access).
@@ -118,6 +132,7 @@ Notes on output verbosity and behavior
- Human mode is chatty by default: progress plus a one-line summary. `--verbose` appends the commit URL when present.
- No-changes detection: If the hub reports “No files have been modified… Skipping to prevent empty commit.”, JSON sets `no_changes: true`, `uploaded_files_count: 0`, and nulls `commit_sha`/`commit_url`. Human shows “— no changes”. This hub signal is preferred over inferring from file lists.
- `--dry-run` human output: prints a concise plan line `dry-run: +A ~M -D` (modifications are an approximation and may be `~?` in rare cases).
- Branch creation with `--create`: Even if the push is a no-op, the target branch is created upfront.
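
The `dry-run: +A ~M -D` plan line can be pictured with a small sketch. It assumes both sides are available as path-to-hash mappings and that `~?` is printed when a remote hash is unknown. The function name `dry_run_plan` and that rule for `~?` are assumptions for illustration only, not the tool's actual comparison logic.

```python
from typing import Dict, Optional


def dry_run_plan(local: Dict[str, str], remote: Dict[str, Optional[str]]) -> str:
    """Format a plan line from two path -> content-hash mappings.

    A remote hash of None means "unknown"; in that case the modified
    count is reported as "?" (assumed behaviour for this sketch).
    """
    added = set(local) - set(remote)      # only in the workspace
    deleted = set(remote) - set(local)    # only on the remote branch
    common = set(local) & set(remote)     # present on both sides
    if any(remote[path] is None for path in common):
        modified = "?"
    else:
        modified = str(sum(1 for path in common if local[path] != remote[path]))
    return f"dry-run: +{len(added)} ~{modified} -{len(deleted)}"
```

Reporting `?` instead of a guessed number keeps the summary honest when the two sides cannot be compared reliably.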
Examples (expected)
- No-op re-push (JSON): `commit_sha: null`, `commit_url: null`, `uploaded_files_count: 0`, `no_changes: true`, `message` mirrors hub text, `hf_logs` contains hub lines.
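
To make the no-op shape concrete, here is a minimal sketch that derives the JSON fields listed above from captured hub log lines. The function name `build_push_data`, the marker constant, and the `message: None` value on the commit path are assumptions for illustration; only the field names come from this document.

```python
from typing import Any, Dict, List, Optional

# Leading part of the hub message quoted in the notes; the full wording may differ.
NO_CHANGES_MARKER = "No files have been modified"


def build_push_data(
    hf_logs: List[str],
    commit_sha: Optional[str],
    commit_url: Optional[str],
    uploaded_files_count: int,
) -> Dict[str, Any]:
    """Build the push result, preferring the hub's own no-op signal."""
    no_change_line = next(
        (line for line in hf_logs if NO_CHANGES_MARKER in line), None
    )
    if no_change_line is not None:
        # The hub skipped the commit: null the commit fields, count is zero.
        return {
            "no_changes": True,
            "uploaded_files_count": 0,
            "commit_sha": None,
            "commit_url": None,
            "message": no_change_line,
            "hf_logs": hf_logs,
        }
    return {
        "no_changes": False,
        "uploaded_files_count": uploaded_files_count,
        "commit_sha": commit_sha,
        "commit_url": commit_url,
        "message": None,  # assumption: no hub message on the commit path
        "hf_logs": hf_logs,
    }
```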
@@ -198,18 +213,19 @@ Spec/Schema
- **Schema shape:** Push success/error outputs validate against `docs/json-api-schema.json`.
- **No-op push:** Detects `no_changes: true`, sets `uploaded_files_count: 0`, carries hub message into JSON (`message`/`hf_logs`), and human output shows "no changes" without duplicate logs.
- **Commit path:** Extracts `commit_sha`, `commit_url`, `change_summary` (+/~/-), correct `uploaded_files_count`; human `--verbose` includes URL.
- **Repo/Branch handling:** Missing repo requires `--create`; with `--create` sets `created_repo: true`. Missing branch is tolerated; upload creates it.
- **Repo/Branch handling:** Missing repo requires `--create`; with `--create` sets `created_repo: true`. Missing branch is tolerated; upload attempts proceed. With `--create`, the branch is proactively created and the upload is retried once if the hub rejects the revision (e.g., “Invalid rev id”).
- **Ignore rules:** `.hfignore` is merged with default ignores and forwarded to the hub.
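
The create-and-retry behaviour in the Repo/Branch bullet can be sketched as follows. `RevisionError`, `push_with_branch_retry`, and the two callables are hypothetical stand-ins rather than the project's or the hub client's real API. The sketch also assumes that, without `--create`, the revision error simply propagates.

```python
from typing import Callable


class RevisionError(Exception):
    """Hypothetical stand-in for a hub error such as "Invalid rev id"."""


def push_with_branch_retry(
    upload: Callable[[], str],
    create_branch: Callable[[], None],
    create: bool,
) -> str:
    """Upload once; with create=True, retry a single time after a revision error."""
    if create:
        # Proactive creation, so the branch exists even for a no-op push.
        create_branch()
    try:
        return upload()
    except RevisionError:
        if not create:
            # Assumption: without --create the error is surfaced to the caller.
            raise
        # Exactly one retry: ensure the branch again, then upload once more.
        create_branch()
        return upload()
```

Limiting the retry to a single attempt keeps a persistent hub-side problem from turning into a loop; a second failure surfaces as an ordinary error.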
Files:
- `tests_2.0/test_cli_push_args.py` (CLI errors and JSON outputs)
- `tests_2.0/test_push_extended.py` (no-op vs commit, branch/repo, .hfignore, human)
- `tests_2.0/test_push_extended.py` (no-op vs commit, branch/repo, .hfignore, human; includes retry on invalid revision with `--create`)
- `tests_2.0/spec/test_push_output_matches_schema.py` (schema success path)
Run (venv39):
- `source venv39/bin/activate && pip install -e .`
- `pytest -q tests_2.0/test_cli_push_args.py tests_2.0/test_push_extended.py`
- `pytest -q tests_2.0/spec/test_push_output_matches_schema.py`
- Targeted retry test: `pytest -q tests_2.0/test_push_extended.py::test_push_retry_creates_branch_on_upload_revision_error`
**Live (opt-in / wet)**
- Purpose: sanity-check real HF behavior (auth, no-op vs commit, URLs).
@@ -282,7 +298,7 @@ Notes
- Not part of the 2.0 default run; execute explicitly with `pytest tests/ -v`.
- Contains extensive integration/server tests unrelated to the 2.0 JSON CLI.
## 3-Category Test Strategy (MLX Knife 1.1.0+)
## Legacy 1.x: 3-Category Test Strategy (main)
MLX Knife uses a **3-category test strategy** to balance test isolation, performance, and user cache protection:
@@ -722,21 +738,20 @@ When submitting PRs, please include:
- Python version
- Which model(s) you tested with
2. **Test results summary**:
```
Platform: macOS 14.5, M2 Pro
Python: 3.11.6
Model: Phi-3-mini-4k-instruct-4bit
Results: 150/150 tests passed
```
2. **Test results summary (2.0)**:
```
Platform: macOS 14.5, M2 Pro
Python: 3.11.6
Results: 98/98 tests passed; 9 skipped (opt-in)
```
3. **Any issues encountered** and how you resolved them
## Summary
**MLX Knife 1.1.0 STABLE Testing Status:**
**Legacy 1.x Testing Status (main):**
✅ **Production Ready** - 150/150 tests passing
✅ **Stable** - 150/150 tests passing
✅ **Isolated Test System** - User cache stays pristine with temp cache isolation
✅ **3-Category Strategy** - Optimized for performance and safety
✅ **Multi-Python Support** - Python 3.9-3.13 verified
@@ -748,11 +763,11 @@ When submitting PRs, please include:
✅ **LibreSSL Warning Fix** - Issue #22: macOS Python 3.9 warning suppression
✅ **Lock Cleanup Fix** - Issue #23: Enhanced rm command with lock cleanup
This comprehensive testing framework validates MLX Knife's **production readiness** through isolated testing with automatic model downloads and separate real MLX validation.
This testing framework validates MLX Knife's stability through isolated testing with automatic model downloads and separate real MLX validation.
## Server-Based Testing (Advanced)
## Server-Based Testing (Legacy 1.x; 2.0 RC planned)
Some tests require a running MLX Knife server with loaded models. These tests are marked with `@pytest.mark.server` and are **not run by default** with `pytest`.
In 1.x (main), some tests require a running MLX Knife server with loaded models and are marked with `@pytest.mark.server`. For 2.0, server/run will return in the RC; until then, server tests are legacy-only.
### Why Separate Server Tests?