mirror of
https://github.com/cloudstack-llc/mlx-knife.git
synced 2026-07-01 20:44:14 -04:00
1f70b4984a
Community-driven feature development implementing key GitHub Issues:
- Fix Issue #7: Health check consistency for fuzzy model names - unified health check logic ensures
identical status regardless of name format (Phi-3 vs mlx-community/Phi-3-mini-4k-instruct-4bit)
- Add Issue #6: Repository name validation - pre-validation for HuggingFace Hub 96-character limit with
clear error messages
- Add Issue #13: Hash-based disambiguation - use commit hashes to resolve ambiguous model names (mlxk show
Llama@de2dfaf5 → mlx-community/Llama-3.3-70B-Instruct-4bit)
Enhanced user experience:
- Pure local hash resolution without external API calls, offline-capable
- Improved model name disambiguation logic for better workflow
- Real user workflow support - see hashes in mlxk list, use directly in other commands
332 lines
8.1 KiB
Markdown
332 lines
8.1 KiB
Markdown
# MLX Knife Testing Guide
|
|
|
|
## Current Status
|
|
|
|
✅ **114/114 tests passing** (August 2025)
|
|
✅ **Apple Silicon verified** (M1/M2/M3)
|
|
✅ **Python 3.9-3.13 compatible**
|
|
✅ **Production ready** - real model execution validated
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Install with test dependencies
|
|
pip install -e ".[test]"
|
|
|
|
# Download test model (required for most tests)
|
|
mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit
|
|
|
|
# Run all tests
|
|
pytest
|
|
|
|
# Fast unit tests only
|
|
pytest tests/unit/
|
|
|
|
# Before committing
|
|
ruff check mlx_knife/ --fix && mypy mlx_knife/ && pytest
|
|
```
|
|
|
|
## Why Local Testing?
|
|
|
|
MLX Knife requires **Apple Silicon hardware** and **real MLX models** for comprehensive testing:
|
|
|
|
- **Hardware Requirement**: MLX framework only runs on Apple Silicon (M1/M2/M3)
|
|
- **Model Requirement**: Tests use actual models (4GB+) for realistic validation
|
|
- **Industry Standard**: Local testing is normal for MLX projects
|
|
- **Quality Assurance**: Real hardware testing ensures actual functionality
|
|
|
|
This approach ensures our tests reflect real-world usage, not mocked behavior.
|
|
|
|
## Test Structure
|
|
|
|
```
|
|
tests/
|
|
├── conftest.py # Shared fixtures and utilities
|
|
├── integration/ # System-level integration tests (62 tests)
|
|
│ ├── test_core_functionality.py # Basic CLI operations
|
|
│ ├── test_health_checks.py # Model corruption detection
|
|
│ ├── test_process_lifecycle.py # Process management & cleanup
|
|
│ ├── test_run_command_advanced.py # Run command edge cases
|
|
│ └── test_server_functionality.py # OpenAI API server tests
|
|
└── unit/ # Module-level unit tests (52 tests)
|
|
├── test_cache_utils.py # Cache management functions
|
|
├── test_cli.py # CLI argument parsing
|
|
└── test_mlx_runner_memory.py # Memory management tests
|
|
```
|
|
|
|
## Test Prerequisites
|
|
|
|
### Required Setup
|
|
|
|
1. **Apple Silicon Mac** (M1/M2/M3)
|
|
2. **Python 3.9 or newer**
|
|
3. **Test dependencies installed**:
|
|
```bash
|
|
pip install -e ".[test]"
|
|
```
|
|
4. **At least one MLX model**:
|
|
```bash
|
|
mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit
|
|
```
|
|
|
|
### Optional Setup
|
|
|
|
For full test coverage, you may want additional models:
|
|
```bash
|
|
# Smaller model for quick tests
|
|
mlxk pull mlx-community/Phi-3-mini-128k-instruct-4bit
|
|
|
|
# Different architecture for variety
|
|
mlxk pull mlx-community/Mistral-7B-Instruct-v0.3-4bit
|
|
```
|
|
|
|
## Test Commands
|
|
|
|
### Basic Test Execution
|
|
|
|
```bash
|
|
# All tests (recommended before commits)
|
|
pytest
|
|
|
|
# Only integration tests (system-level)
|
|
pytest tests/integration/
|
|
|
|
# Only unit tests (fast)
|
|
pytest tests/unit/
|
|
|
|
# Verbose output
|
|
pytest -v
|
|
|
|
# Show test coverage
|
|
pytest --cov=mlx_knife --cov-report=html
|
|
```
|
|
|
|
### Specific Test Categories
|
|
|
|
```bash
|
|
# Process lifecycle tests (critical for production)
|
|
pytest tests/integration/test_process_lifecycle.py -v
|
|
|
|
# Health check robustness (model corruption detection)
|
|
pytest tests/integration/test_health_checks.py -v
|
|
|
|
# Core functionality (basic CLI commands)
|
|
pytest tests/integration/test_core_functionality.py -v
|
|
|
|
# Advanced run command tests
|
|
pytest tests/integration/test_run_command_advanced.py -v
|
|
|
|
# Server functionality tests
|
|
pytest tests/integration/test_server_functionality.py -v
|
|
```
|
|
|
|
### Test Filtering
|
|
|
|
```bash
|
|
# Run only basic operations tests
|
|
pytest -k "TestBasicOperations" -v
|
|
|
|
# Skip server tests (faster)
|
|
pytest -k "not server" -v
|
|
|
|
# Skip tests requiring actual models
|
|
pytest -k "not requires_model" -v
|
|
|
|
# Run only process lifecycle tests
|
|
pytest -k "process_lifecycle or zombie" -v
|
|
|
|
# Run health check tests only
|
|
pytest -k "health" -v
|
|
```
|
|
|
|
### Timeout and Performance
|
|
|
|
```bash
|
|
# Set custom timeout (default: 300s)
|
|
pytest --timeout=60
|
|
|
|
# Show slowest tests
|
|
pytest --durations=10
|
|
|
|
# Parallel execution (if pytest-xdist installed)
|
|
pytest -n auto
|
|
```
|
|
|
|
## Python Version Compatibility
|
|
|
|
### Verification Results (August 2025)
|
|
|
|
| Python Version | Status | Tests Passing |
|
|
|----------------|--------|---------------|
|
|
| 3.9.6 (macOS) | ✅ Verified | 114/114 |
|
|
| 3.10.x | ✅ Verified | 114/114 |
|
|
| 3.11.x | ✅ Verified | 114/114 |
|
|
| 3.12.x | ✅ Verified | 114/114 |
|
|
| 3.13.x | ✅ Verified | 114/114 |
|
|
|
|
All versions tested with real MLX model execution (Phi-3-mini-4k-instruct-4bit).
|
|
|
|
### Manual Multi-Python Testing
|
|
|
|
If you have multiple Python versions installed, you can verify compatibility:
|
|
|
|
```bash
|
|
# Run the multi-Python verification script
|
|
./test-multi-python.sh
|
|
|
|
# Or manually test specific versions
|
|
python3.9 -m venv test_39
|
|
source test_39/bin/activate
|
|
pip install -e ".[test]"
|
|
pytest
|
|
deactivate && rm -rf test_39
|
|
```
|
|
|
|
## Code Quality & Development
|
|
|
|
### Code Quality Tools
|
|
|
|
MLX Knife includes comprehensive code quality tools:
|
|
|
|
```bash
|
|
# Install development dependencies
|
|
pip install -e ".[dev]"
|
|
|
|
# Automatic code formatting and linting
|
|
ruff check mlx_knife/ --fix
|
|
|
|
# Type checking with mypy
|
|
mypy mlx_knife/
|
|
|
|
# Complete development workflow
|
|
ruff check mlx_knife/ --fix && mypy mlx_knife/ && pytest
|
|
```
|
|
|
|
### Development Workflow
|
|
|
|
Before committing changes:
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
# pre-commit-check.sh - Run before committing
|
|
set -e
|
|
|
|
echo "🧪 Running MLX Knife pre-commit checks..."
|
|
|
|
# 1. Code style
|
|
echo "Checking code style..."
|
|
ruff check mlx_knife/ --fix
|
|
|
|
# 2. Type checking
|
|
echo "Checking types..."
|
|
mypy mlx_knife/
|
|
|
|
# 3. Quick smoke test
|
|
echo "Running quick tests..."
|
|
pytest tests/unit/ -v
|
|
|
|
echo "✅ All checks passed. Safe to commit!"
|
|
```
|
|
|
|
## Local Development Testing
|
|
|
|
### Adding New Tests
|
|
1. **Integration tests** go in `tests/integration/`
|
|
2. **Unit tests** go in `tests/unit/`
|
|
3. Use existing fixtures from `conftest.py`
|
|
4. Follow naming: `test_*.py`, `Test*` classes, `test_*` methods
|
|
|
|
### Test Categories (Markers)
|
|
```python
|
|
@pytest.mark.integration # Slower system tests
|
|
@pytest.mark.unit # Fast isolated tests
|
|
@pytest.mark.slow # Tests >30 seconds
|
|
@pytest.mark.requires_model # Needs actual MLX model
|
|
@pytest.mark.network # Requires internet
|
|
```
|
|
|
|
### Mock Utilities
|
|
- `mock_model_cache()`: Creates fake model directories
|
|
- `mlx_knife_process()`: Manages subprocess lifecycle
|
|
- `process_monitor()`: Tracks zombie processes
|
|
- `temp_cache_dir()`: Isolated test environment
|
|
|
|
## Test Philosophy
|
|
|
|
Following the **"Process Hygiene over Edge-Case Perfection"** principle:
|
|
|
|
1. **Process Cleanliness**: No zombies, no leaks ✅
|
|
2. **Health Checks**: Reliable corruption detection ✅
|
|
3. **Core Operations**: Basic functionality works ✅
|
|
4. **Error Handling**: Graceful failures ✅
|
|
|
|
The test suite validates production readiness with real Apple Silicon hardware and actual MLX models.
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
**Tests hang forever:**
|
|
```bash
|
|
pytest --timeout=60
|
|
```
|
|
|
|
**Import errors:**
|
|
```bash
|
|
pip install -e ".[test]"
|
|
```
|
|
|
|
**Process cleanup issues:**
|
|
```bash
|
|
ps aux | grep mlx_knife # Check for zombies
|
|
```
|
|
|
|
**Cache conflicts:**
|
|
```bash
|
|
export HF_HOME="/tmp/test_cache"
|
|
pytest --cache-clear
|
|
```
|
|
|
|
### Test Environment
|
|
|
|
```bash
|
|
# Clean test run
|
|
rm -rf .pytest_cache __pycache__
|
|
pytest tests/ -v --cache-clear
|
|
|
|
# Debug specific test
|
|
pytest tests/integration/test_health_checks.py::TestHealthCheckRobustness::test_healthy_model_detection -v -s
|
|
```
|
|
|
|
## Contributing Test Results
|
|
|
|
When submitting PRs, please include:
|
|
|
|
1. **Your test environment**:
|
|
- macOS version
|
|
- Apple Silicon chip (M1/M2/M3)
|
|
- Python version
|
|
- Which model(s) you tested with
|
|
|
|
2. **Test results summary**:
|
|
```
|
|
Platform: macOS 14.5, M2 Pro
|
|
Python: 3.11.6
|
|
Model: Phi-3-mini-4k-instruct-4bit
|
|
Results: 114/114 tests passed
|
|
```
|
|
|
|
3. **Any issues encountered** and how you resolved them
|
|
|
|
## Summary
|
|
|
|
**MLX Knife 1.0.3 Testing Status:**
|
|
|
|
✅ **Production Ready** - 114/114 tests passing
|
|
✅ **Multi-Python Support** - Python 3.9-3.13 verified
|
|
✅ **Code Quality** - ruff/mypy integration working
|
|
✅ **Real Model Testing** - Phi-3-mini execution confirmed
|
|
✅ **Memory Management** - Context managers prevent leaks
|
|
✅ **Exception Safety** - Context managers ensure cleanup
|
|
|
|
This comprehensive testing framework validates MLX Knife's **production readiness** through local testing on real Apple Silicon hardware with actual MLX models. |