Mirror of https://github.com/BillyOutlast/proxmox-rocm-toolkit.git (synced 2026-07-01 19:54:40 -04:00)
Update memory profile script and troubleshooting guide for multimodal models
@@ -34,6 +34,7 @@ Typical log pattern:
### Why this happens
Ollama can detect ROCm correctly but still hit an out-of-memory (OOM) error during model graph or KV cache allocation, often because of a large context size or aggressive defaults.
If logs show a multimodal architecture such as `qwen3vl` and stack frames under `multimodal.go`, a small-parameter model can still fail due to multimodal graph reservation.
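
As a rough illustration of the check described above, the sketch below scans log text for a `multimodal.go` reference. The function name `classify_ollama_log` and the demo log line are hypothetical and not part of the toolkit; real log output will look different.

```shell
# Hypothetical helper (not part of the toolkit): reads log text on stdin and
# reports whether any line references multimodal.go.
classify_ollama_log() {
  if grep -q 'multimodal\.go'; then
    echo "multimodal"
  else
    echo "text-only"
  fi
}

# Demo with a made-up log line; real input would come from the service logs.
printf 'example frame at multimodal.go\n' | classify_ollama_log
```

Real logs could be piped in with `journalctl -u ollama --no-pager | classify_ollama_log`, assuming Ollama runs as the `ollama` systemd unit.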
### Fix
@@ -59,6 +60,8 @@ Preset selection quick guide:
| 20B to 32B | `balanced` | Best first choice for stability on large models. |
| 30B+ with load failures/OOM | `safe` | Uses lower context, `KEEP_ALIVE=0`, and higher GPU overhead reserve. |
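
For reference, the two byte values that appear for `PRESET_GPU_OVERHEAD_BYTES` in the `safe` preset of the script, 2147483648 and 4294967296, are 2 GiB and 4 GiB. A minimal sketch of the conversion, using a hypothetical helper `gib_to_bytes` that is not part of the toolkit:

```shell
# Hypothetical helper: convert whole GiB to bytes using shell integer arithmetic.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648
gib_to_bytes 4   # prints 4294967296
```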
For troubleshooting, test a text-only model first (for example `qwen3:8b`) before VL/multimodal models.
If a model fails to load, step down from `max` → `balanced` → `safe` before manual tuning.
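
The step-down order can be sketched as a small lookup. The function `next_preset` is a hypothetical illustration; how the profile script is actually invoked with a preset is not shown here, so the loop only prints the order in which presets would be tried.

```shell
# Hypothetical helper: print the next, more conservative preset,
# or return non-zero when there is nothing below the current one.
next_preset() {
  case "$1" in
    max)      echo "balanced" ;;
    balanced) echo "safe" ;;
    *)        return 1 ;;
  esac
}

# Walk the chain starting from the most aggressive preset.
preset="max"
while :; do
  echo "try: $preset"
  preset="$(next_preset "$preset")" || break
done
```

In practice each step would apply the preset, restart Ollama, and retry the model load, stopping at the first preset that works.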
Manual method:
@@ -69,12 +69,12 @@ done
case "$PRESET" in
safe)
-PRESET_CONTEXT_LENGTH="4096"
+PRESET_CONTEXT_LENGTH="2048"
PRESET_NUM_PARALLEL="1"
PRESET_MAX_LOADED_MODELS="1"
PRESET_FLASH_ATTENTION="false"
PRESET_KEEP_ALIVE="0"
-PRESET_GPU_OVERHEAD_BYTES="2147483648"
+PRESET_GPU_OVERHEAD_BYTES="4294967296"
;;
balanced)
PRESET_CONTEXT_LENGTH="8192"
@@ -202,6 +202,7 @@ set -euo pipefail
install -d /etc/systemd/system/ollama.service.d
cat >/etc/systemd/system/ollama.service.d/20-memory-tuning.conf <<EOF
[Service]
LimitMEMLOCK=infinity
Environment=OLLAMA_CONTEXT_LENGTH=${CONTEXT_LENGTH}
Environment=OLLAMA_MAX_LOADED_MODELS=${MAX_LOADED_MODELS}
Environment=OLLAMA_NUM_PARALLEL=${NUM_PARALLEL}