fix: P0 bugfixes + test infrastructure + benchmark metadata sync

P0 Bugfixes:
- cache.py: Handle empty HF_HOME strings in get_current_cache_root()
- clone.py: Remove obsolete _validate_same_volume() check
- common.py: Use importlib.metadata instead of importing transformers

Test Infrastructure:
- runner/__init__.py: Replace "mock" fallback with clear RuntimeError
- Fix mock paths in test_runner_core, test_token_limits, etc.
- Add VISION_TEST_MODELS + AUDIO_TEST_MODELS fallbacks
- Portfolio fixtures work with and without HF_HOME

Benchmark Fixes:
- Sort models/tests alphabetically instead of by regression %
- Fix vision metadata drift: pixtral-12b-8bit → pixtral-12b-4bit

Documentation:
- ADR-022: Workspace-First Paradigm (draft)
- ADR-018: Phase 2 details expanded
- TESTING.md/TESTING-DETAILS.md: Fallback docs updated
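
For readers without the full diff at hand, the cache.py and common.py bugfix entries above could look roughly like the sketch below. It is only an illustration: `resolve_cache_root`, `installed_version`, and `DEFAULT_HF_HOME` are hypothetical names, not the repository's actual code, and the fallback path is assumed to be the usual Hugging Face default.

```python
import os
from importlib import metadata
from pathlib import Path
from typing import Mapping, Optional

# Assumption: fall back to the usual Hugging Face default location.
# The fallback actually used by cache.py is not shown in this commit.
DEFAULT_HF_HOME = Path.home() / ".cache" / "huggingface"


def resolve_cache_root(env: Mapping[str, str] = os.environ) -> Path:
    """Treat an unset, empty, or whitespace-only HF_HOME as not configured."""
    raw = env.get("HF_HOME", "").strip()
    return Path(raw).expanduser() if raw else DEFAULT_HF_HOME


def installed_version(package: str) -> Optional[str]:
    """Read a package version from its metadata without importing the package."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Passing the environment as a mapping keeps the empty-string case easy to test, and `installed_version("transformers")` avoids the import cost of loading the library just to read its version.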
2026-07-01 20:44:14 -04:00 · 2026-02-10 15:52:36 +01:00
parent 7f10187bee
commit dab7ffb6fc
21 changed files with 1443 additions and 278 deletions
@@ -271,12 +271,9 @@ HF_HOME=/path/to/cache pytest -m live_e2e -v

 **Stop token validation** (ADR-009):
 ```bash
-# Option A: Portfolio Discovery (recommended)
-export HF_HOME=/path/to/cache
 pytest -m live_stop_tokens -v
-
-# Option B: Hardcoded models (requires 3 specific models in cache)
-# See TESTING-DETAILS.md for model list
+# Uses Portfolio Discovery if models found, else fallback models
+# See TESTING-DETAILS.md "Required Models for Live Tests"
 ```

 **Push/Clone tests** (alpha features):