mlx-knife/TESTING.md

# MLX Knife Testing Guide

## Current Status

✅ **114/114 tests passing** (August 2025)
✅ **Apple Silicon verified** (M1/M2/M3)
✅ **Python 3.9-3.13 compatible**
✅ **Production ready** - real model execution validated

## Quick Start

```bash
# Install with test dependencies
pip install -e ".[test]"

# Download test model (required for most tests)
mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit

# Run all tests
pytest

# Fast unit tests only
pytest tests/unit/

# Before committing
ruff check mlx_knife/ --fix && mypy mlx_knife/ && pytest
```

## Why Local Testing?

MLX Knife requires **Apple Silicon hardware** and **real MLX models** for comprehensive testing:

- **Hardware Requirement**: MLX framework only runs on Apple Silicon (M1/M2/M3)
- **Model Requirement**: Tests use actual models (4GB+) for realistic validation
- **Industry Standard**: Local testing is normal for MLX projects
- **Quality Assurance**: Real hardware testing ensures actual functionality

This approach ensures our tests reflect real-world usage, not mocked behavior.

## Test Structure

```
tests/
├── conftest.py                     # Shared fixtures and utilities
├── integration/                    # System-level integration tests (62 tests)
│   ├── test_core_functionality.py      # Basic CLI operations
│   ├── test_health_checks.py           # Model corruption detection
│   ├── test_process_lifecycle.py       # Process management & cleanup
│   ├── test_run_command_advanced.py    # Run command edge cases
│   └── test_server_functionality.py    # OpenAI API server tests
└── unit/                          # Module-level unit tests (52 tests)
    ├── test_cache_utils.py            # Cache management functions
    ├── test_cli.py                    # CLI argument parsing
    └── test_mlx_runner_memory.py     # Memory management tests
```

## Test Prerequisites

### Required Setup

1. **Apple Silicon Mac** (M1/M2/M3)
2. **Python 3.9 or newer**
3. **Test dependencies installed**:
   ```bash
   pip install -e ".[test]"
   ```
4. **At least one MLX model**:
   ```bash
   mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit
   ```

### Optional Setup

For full test coverage, you may want additional models:
```bash
# Smaller model for quick tests
mlxk pull mlx-community/Phi-3-mini-128k-instruct-4bit

# Different architecture for variety
mlxk pull mlx-community/Mistral-7B-Instruct-v0.3-4bit
```

## Test Commands

### Basic Test Execution

```bash
# All tests (recommended before commits)
pytest

# Only integration tests (system-level)
pytest tests/integration/

# Only unit tests (fast)
pytest tests/unit/

# Verbose output
pytest -v

# Show test coverage
pytest --cov=mlx_knife --cov-report=html
```

### Specific Test Categories

```bash
# Process lifecycle tests (critical for production)
pytest tests/integration/test_process_lifecycle.py -v

# Health check robustness (model corruption detection)
pytest tests/integration/test_health_checks.py -v

# Core functionality (basic CLI commands)
pytest tests/integration/test_core_functionality.py -v

# Advanced run command tests
pytest tests/integration/test_run_command_advanced.py -v

# Server functionality tests
pytest tests/integration/test_server_functionality.py -v
```

### Test Filtering

```bash
# Run only basic operations tests
pytest -k "TestBasicOperations" -v

# Skip server tests (faster)
pytest -k "not server" -v

# Skip tests requiring actual models
pytest -k "not requires_model" -v

# Run only process lifecycle tests
pytest -k "process_lifecycle or zombie" -v

# Run health check tests only
pytest -k "health" -v
```

### Timeout and Performance

```bash
# Set custom timeout (default: 300s)
pytest --timeout=60

# Show slowest tests
pytest --durations=10

# Parallel execution (if pytest-xdist installed)
pytest -n auto
```

## Python Version Compatibility

### Verification Results (August 2025)

| Python Version | Status | Tests Passing |
|----------------|--------|---------------|
| 3.9.6 (macOS)  | ✅ Verified | 114/114 |
| 3.10.x         | ✅ Verified | 114/114 |
| 3.11.x         | ✅ Verified | 114/114 |
| 3.12.x         | ✅ Verified | 114/114 |
| 3.13.x         | ✅ Verified | 114/114 |

All versions tested with real MLX model execution (Phi-3-mini-4k-instruct-4bit).

### Manual Multi-Python Testing

If you have multiple Python versions installed, you can verify compatibility:

```bash
# Run the multi-Python verification script
./test-multi-python.sh

# Or manually test specific versions
python3.9 -m venv test_39
source test_39/bin/activate
pip install -e ".[test]"
pytest
deactivate && rm -rf test_39
```

## Code Quality & Development

### Code Quality Tools

MLX Knife includes comprehensive code quality tools:

```bash
# Install development dependencies
pip install -e ".[dev]"

# Automatic code formatting and linting
ruff check mlx_knife/ --fix

# Type checking with mypy
mypy mlx_knife/

# Complete development workflow
ruff check mlx_knife/ --fix && mypy mlx_knife/ && pytest
```

### Development Workflow

Before committing changes:

```bash
#!/bin/bash
# pre-commit-check.sh - Run before committing
set -e

echo "🧪 Running MLX Knife pre-commit checks..."

# 1. Code style
echo "Checking code style..."
ruff check mlx_knife/ --fix

# 2. Type checking
echo "Checking types..."
mypy mlx_knife/

# 3. Quick smoke test
echo "Running quick tests..."
pytest tests/unit/ -v

echo "✅ All checks passed. Safe to commit!"
```

## Local Development Testing

### Adding New Tests
1. **Integration tests** go in `tests/integration/`
2. **Unit tests** go in `tests/unit/`
3. Use existing fixtures from `conftest.py`
4. Follow naming: `test_*.py`, `Test*` classes, `test_*` methods

### Test Categories (Markers)
```python
@pytest.mark.integration  # Slower system tests
@pytest.mark.unit         # Fast isolated tests
@pytest.mark.slow         # Tests >30 seconds
@pytest.mark.requires_model  # Needs actual MLX model
@pytest.mark.network      # Requires internet
```

### Mock Utilities
- `mock_model_cache()`: Creates fake model directories
- `mlx_knife_process()`: Manages subprocess lifecycle
- `process_monitor()`: Tracks zombie processes
- `temp_cache_dir()`: Isolated test environment

## Test Philosophy

Following the **"Process Hygiene over Edge-Case Perfection"** principle:

1. **Process Cleanliness**: No zombies, no leaks ✅
2. **Health Checks**: Reliable corruption detection ✅
3. **Core Operations**: Basic functionality works ✅
4. **Error Handling**: Graceful failures ✅

The test suite validates production readiness with real Apple Silicon hardware and actual MLX models.

## Troubleshooting

### Common Issues

**Tests hang forever:**
```bash
pytest --timeout=60
```

**Import errors:**
```bash
pip install -e ".[test]"
```

**Process cleanup issues:**
```bash
ps aux | grep mlx_knife  # Check for zombies
```

**Cache conflicts:**
```bash
export HF_HOME="/tmp/test_cache"
pytest --cache-clear
```

### Test Environment

```bash
# Clean test run
rm -rf .pytest_cache __pycache__
pytest tests/ -v --cache-clear

# Debug specific test
pytest tests/integration/test_health_checks.py::TestHealthCheckRobustness::test_healthy_model_detection -v -s
```

## Contributing Test Results

When submitting PRs, please include:

1. **Your test environment**:
   - macOS version
   - Apple Silicon chip (M1/M2/M3)
   - Python version
   - Which model(s) you tested with

2. **Test results summary**:
   ```
   Platform: macOS 14.5, M2 Pro
   Python: 3.11.6
   Model: Phi-3-mini-4k-instruct-4bit
   Results: 114/114 tests passed
   ```

3. **Any issues encountered** and how you resolved them

## Summary

**MLX Knife 1.0.3 Testing Status:**

✅ **Production Ready** - 114/114 tests passing
✅ **Multi-Python Support** - Python 3.9-3.13 verified
✅ **Code Quality** - ruff/mypy integration working
✅ **Real Model Testing** - Phi-3-mini execution confirmed
✅ **Memory Management** - Context managers prevent leaks
✅ **Exception Safety** - Context managers ensure cleanup

This comprehensive testing framework validates MLX Knife's **production readiness** through local testing on real Apple Silicon hardware with actual MLX models.