Release MLX Knife 1.0.4 - Issue #14 Chat Self-Conversation Fix & Web UI Overhaul

Fix Issue #14: Interactive chat self-conversation bug resolved
  - Added context-sensitive chat stop tokens (\nHuman:, \nAssistant:, \nYou:, \nUser:)
  - Smart priority system: native model stop tokens first, chat tokens as fallback
  - Affects both `mlxk run` and `mlxk server` modes with backward compatibility

  Web UI complete transformation (simple_chat.html):
  - 🦫 Beaver branding replaces 🔪 knife emoji
  - Model and chat history persistence across browser sessions
  - Smart model switching with option to keep or clear chat history

  Testing infrastructure enhancements:
  - Automated server testing with RAM-aware model filtering
  - 15 new regression tests across 7+ MLX models validating Issue #14 fix
  - Comprehensive TESTING.md guide for server-based testing

  All 114 tests passing
This commit is contained in:
The BROKE Team
2025-08-19 20:43:44 +02:00
parent 1f70b4984a
commit 6117e571ca
10 changed files with 806 additions and 34 deletions
+120 -4
View File
@@ -126,8 +126,8 @@ pytest tests/integration/test_server_functionality.py -v
# Run only basic operations tests
pytest -k "TestBasicOperations" -v
# Skip server tests (faster)
pytest -k "not server" -v
# Server tests are automatically excluded by default
# (no command needed - this is the default behavior)
# Skip tests requiring actual models
pytest -k "not requires_model" -v
@@ -243,6 +243,7 @@ echo "✅ All checks passed. Safe to commit!"
@pytest.mark.slow # Tests >30 seconds
@pytest.mark.requires_model # Needs actual MLX model
@pytest.mark.network # Requires internet
@pytest.mark.server # Requires MLX Knife server (excluded from default pytest)
```
### Mock Utilities
@@ -320,7 +321,7 @@ When submitting PRs, please include:
## Summary
**MLX Knife 1.0.3 Testing Status:**
**MLX Knife 1.0.4 Testing Status:**
✅ **Production Ready** - 114/114 tests passing
✅ **Multi-Python Support** - Python 3.9-3.13 verified
@@ -328,5 +329,120 @@ When submitting PRs, please include:
✅ **Real Model Testing** - Phi-3-mini execution confirmed
✅ **Memory Management** - Context managers prevent leaks
✅ **Exception Safety** - Context managers ensure cleanup
✅ **Chat Bug Fixed** - Issue #14 self-conversation regression tests added
✅ **Server Tests** - Automated MLX Knife server testing infrastructure
This comprehensive testing framework validates MLX Knife's **production readiness** through local testing on real Apple Silicon hardware with actual MLX models.
This comprehensive testing framework validates MLX Knife's **production readiness** through local testing on real Apple Silicon hardware with actual MLX models.
## Server-Based Testing (Advanced)
Some tests require a running MLX Knife server with loaded models. These tests are marked with `@pytest.mark.server` and are **not run by default** with `pytest`.
### Why Separate Server Tests?
- **Test count varies** by loaded models (makes CI reporting inconsistent)
- **Large memory requirements** - need different models for different RAM sizes
- **Longer execution time** - each model needs to load individually
- **Manual setup required** - need to download appropriate models first
### Prerequisites for Server Tests
| System RAM | Recommended Models | Commands |
|------------|-------------------|----------|
| **16GB** | Small models only | `mlxk pull mlx-community/Qwen2.5-0.5B-Instruct-4bit`<br>`mlxk pull mlx-community/Llama-3.2-1B-Instruct-4bit`<br>`mlxk pull mlx-community/Llama-3.2-3B-Instruct-4bit` |
| **32GB** | + Medium models | `mlxk pull mlx-community/Phi-3-mini-4k-instruct-4bit`<br>`mlxk pull mlx-community/Mistral-7B-Instruct-v0.2-4bit`<br>`mlxk pull mlx-community/Mixtral-8x7B-Instruct-v0.1-4bit` |
| **64GB** | + Large models | `mlxk pull mlx-community/Mistral-Small-3.2-24B-Instruct-2506-4bit`<br>`mlxk pull mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit`<br>`mlxk pull mlx-community/Llama-3.3-70B-Instruct-4bit` |
| **96GB+** | + Huge models | `mlxk pull mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit` |
### Running Server Tests
**Issue #14 Regression Tests** (Chat Self-Conversation Bug):
```bash
# Set environment
export HF_HOME=/path/to/your/cache
# Smoke test first (see which models are available)
python tests/integration/test_issue_14.py
# Run server tests only (excluded from default pytest)
pytest -m server -v
# Run specific Issue #14 tests
pytest tests/integration/test_issue_14.py -m server -v
```
**Expected Output:**
```
🦫 MLX Knife Issue #14 Test - Smoke Test
==================================================
📊 Safe models for this system: 6
💾 System RAM: 64GB total, 40GB available
🎯 mlx-community/Mistral-7B-Instruct-v0.2-4bit
└─ Size: 7B, RAM needed: 8GB
🎯 mlx-community/Llama-3.2-3B-Instruct-4bit
└─ Size: 3B, RAM needed: 4GB
[...]
========== test session starts ==========
tests/integration/test_issue_14.py::test_server_health[mlx_server] PASSED
tests/integration/test_issue_14.py::test_issue_14_self_conversation_regression_original[mlx-community/Mistral-7B-Instruct-v0.2-4bit-7B-8] PASSED
[...6 more model tests...]
========== 7 passed in 45.23s ==========
```
### Future Server Tests (Planned)
**Issue #15** - Token Limit vs Stop Token Race Condition:
```bash
pytest tests/integration/test_issue_15.py -m server -v
```
**Issue #16** - Interactive vs Server Token Policies:
```bash
pytest tests/integration/test_issue_16.py -m server -v
```
### Troubleshooting Server Tests
**Permission warnings are normal:**
```
WARNING: ⚠️ Cannot scan network connections (permission denied)
INFO: 🔧 Falling back to process-based cleanup only
```
This is expected on macOS - the tests continue with process-based cleanup.
**Memory issues:**
- Tests automatically skip models exceeding 80% available RAM
- Use smaller models if you see consistent memory failures
- Consider external SSD for model cache to reduce memory pressure
**Server startup failures:**
```bash
# Debug server manually
python -m mlx_knife.cli server --port 8000
# Check model health
mlxk health
# Verify environment
echo $HF_HOME
```
### Adding New Server Tests
When contributing server-based tests:
```python
@pytest.mark.server
def test_new_feature(mlx_server, model_name: str, size_str: str, ram_needed: int):
"""Test new feature with MLX models."""
# Use mlx_server fixture for automatic server management
# Test implementation here
```
1. **Mark with `@pytest.mark.server`** - excludes from default `pytest`
2. **Use `mlx_server` fixture** - automatic server lifecycle management
3. **Test RAM requirements** - use `get_safe_models_for_system()` helper
4. **Document in TESTING.md** - add to this guide