ROCm 7.1 Automated Docker Environment

A comprehensive Docker-based environment for running AI workloads on AMD GPUs with ROCm 7.1 support. This project provides optimized containers for Ollama LLM inference and Stable Diffusion image generation.

Sponsored by https://shad-base.com

🚀 Features

  • ROCm 7.1 Support: Latest AMD GPU compute platform
  • Ollama Integration: Optimized LLM inference with ROCm backend
  • Stable Diffusion: AI image generation with AMD GPU acceleration
  • Multi-GPU Support: Automatic detection and utilization of multiple AMD GPUs
  • Performance Optimized: Tuned for maximum throughput and minimal latency
  • Easy Deployment: One-command setup with Docker Compose

📋 Prerequisites

Hardware Requirements

  • AMD GPU: RDNA 2/3 architecture (RX 6000/7000 series or newer)
  • Memory: 16GB+ system RAM recommended
  • VRAM: 8GB+ GPU memory for large models

Software Requirements

  • Linux Distribution: Ubuntu 22.04+, Fedora 38+, or compatible
  • Docker: 24.0+ with BuildKit support
  • Docker Compose: 2.20+
  • Podman (alternative): 4.0+

Supported GPUs

  • Radeon RX 7900 XTX/XT
  • Radeon RX 7800/7700 XT
  • Radeon RX 6950/6900/6800/6700 XT
  • AMD APUs with RDNA graphics (limited performance)

🛠️ Installation

1. Clone Repository

git clone https://github.com/BillyOutlast/rocm-automated.git
cd rocm-automated

2. Set GPU Override (if needed)

For newer or unsupported GPU architectures:

# Check your GPU architecture
rocminfo | grep "Name:"

# Set override for newer GPUs (example for RX 7000 series)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
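
Note that an exported variable only reaches the containers if docker-compose.yaml substitutes it; otherwise, edit the HSA_OVERRIDE_GFX_VERSION value in the compose file directly (see Configuration below). A minimal sketch for persisting the override, assuming the compose file references ${HSA_OVERRIDE_GFX_VERSION}:

# Persist the override in a .env file next to docker-compose.yaml
# (only takes effect if the compose file substitutes this variable)
echo 'HSA_OVERRIDE_GFX_VERSION=11.0.0' >> .env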

3. Download and Start Services

# Pull the latest prebuilt images and start all services
docker-compose up -d

# View logs
docker-compose logs -f
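
# Verify the Ollama API is responding (lists locally available models)
curl http://localhost:11434/api/tags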

Alternative: Build Images Locally

If you prefer to build the images locally instead of using prebuilt ones:

# Make build script executable
chmod +x build.sh

# Build all Docker images
./build.sh

# Then start services
docker-compose up -d

🐳 Docker Images

Available Prebuilt Images

  • getterup/ollama-rocm7.1:latest: Ollama with ROCm 7.1 backend for LLM inference
  • getterup/stable-diffusion.cpp-rocm7.1:gfx1151: Stable Diffusion with ROCm 7.1 acceleration
  • getterup/comfyui:rocm7.1: ComfyUI with ROCm 7.1 support
  • ghcr.io/open-webui/open-webui:main: Web interface for Ollama

What's Included

These prebuilt images come with:

  • ROCm 7.1 runtime libraries
  • GPU-specific optimizations
  • Performance tuning for inference workloads
  • Ready-to-run configurations
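
docker-compose pulls these automatically on first run; to fetch them ahead of time:

docker pull getterup/ollama-rocm7.1:latest
docker pull getterup/stable-diffusion.cpp-rocm7.1:gfx1151
docker pull getterup/comfyui:rocm7.1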

Build Process (Optional)

The automated build script (build.sh) produces custom images with the same contents as the prebuilt ones, built locally for your specific GPU.

📊 Services

Ollama LLM Service

Port: 11434
Container: ollama

Features:

  • Multi-model support (Llama, Mistral, CodeLlama, etc.)
  • ROCm-optimized inference engine
  • Flash Attention support
  • Quantized model support (Q4, Q8)

Usage Examples

# Pull a model
docker exec ollama ollama pull llama3.2

# Run inference
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Hello, world!"}'

# Chat interface
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hi there!"}]}'

Stable Diffusion Service

Port: 7860
Container: stable-diffusion.cpp

Features:

  • Text-to-image generation
  • ROCm acceleration
  • Multiple model formats
  • Customizable parameters
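
Usage Example

A minimal text-to-image sketch using the stable-diffusion.cpp CLI inside the container; the binary name (sd), model filename, and paths here are assumptions, so substitute whichever model file you have placed in ./stable-diffusion.cpp:

# Generate an image from a text prompt (paths are illustrative)
docker exec stable-diffusion.cpp sd \
  -m /app/models/model.safetensors \
  -p "a photo of an astronaut riding a horse on mars" \
  -o /app/output.png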

⚙️ Configuration

Environment Variables

Ollama Service

environment:
  - OLLAMA_DEBUG=1                    # Debug level (0-2)
  - OLLAMA_FLASH_ATTENTION=true       # Enable flash attention
  - OLLAMA_KV_CACHE_TYPE="q8_0"       # KV cache quantization
  - ROCR_VISIBLE_DEVICES=0            # GPU selection
  - OLLAMA_KEEP_ALIVE=-1              # Keep models loaded
  - OLLAMA_MAX_LOADED_MODELS=1        # Max concurrent models
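
To confirm the variables reached the running container:

# Inspect the Ollama container's environment
docker exec ollama env | grep OLLAMA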

GPU Configuration

environment:
  - HSA_OVERRIDE_GFX_VERSION="11.5.1" # GPU architecture override
  - HSA_ENABLE_SDMA=0                 # Disable SDMA for stability
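
The override value mirrors the gfx identifier that rocminfo reports, read as major.minor.step: gfx1100 maps to 11.0.0, gfx1151 to 11.5.1, and so on:

# Find the gfx identifier to derive the override from
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u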

Volume Mounts

volumes:
  - ./ollama:/root/.ollama:Z          # Model storage
  - ./stable-diffusion.cpp:/app:Z     # SD model storage

Device Access

devices:
  - /dev/kfd:/dev/kfd                 # ROCm compute device
  - /dev/dri:/dev/dri                 # GPU render nodes
group_add:
  - video                             # Video group access
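
The same device access can be granted to a standalone container without Compose; a sketch equivalent to the configuration above:

# Run the prebuilt Ollama image directly with GPU device access
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -v "$PWD/ollama:/root/.ollama:Z" \
  -p 11434:11434 \
  getterup/ollama-rocm7.1:latest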

🔧 Performance Tuning

GPU Selection

For multi-GPU systems, specify the preferred device:

# List available GPUs
rocminfo

# Set specific GPU
export ROCR_VISIBLE_DEVICES=0
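
# Expose multiple GPUs with a comma-separated list of device indices
export ROCR_VISIBLE_DEVICES=0,1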

Memory Optimization

# For large models, increase system memory limits
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Model Optimization

  • Use quantized models (Q4_K_M, Q8_0) for better performance
  • Enable flash attention for transformer models
  • Adjust context length based on available VRAM (see the sketch below)
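
A sketch of two of these knobs (the model tag is illustrative; check the Ollama library for available quantizations):

# Pull a Q4_K_M-quantized variant instead of the default tag
docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M

# Cap the context window to fit available VRAM (num_ctx is a standard Ollama option)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Hello", "options": {"num_ctx": 2048}}'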

🚨 Troubleshooting

Common Issues

GPU Not Detected

# Check ROCm installation
rocminfo

# Verify device permissions
ls -la /dev/kfd /dev/dri/

# Check container access
docker exec ollama rocminfo

Memory Issues

# Check VRAM usage
rocm-smi

# Monitor system memory
free -h

# Reduce model size or use quantization
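
# Example: pull a smaller or more aggressively quantized variant
# (tag is illustrative; check the Ollama library for what's available)
docker exec ollama ollama pull llama3.2:1b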

Performance Issues

# Enable the performance CPU governor on all cores
# (a shell glob in a redirection target fails; tee handles multiple files)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Check GPU clocks
rocm-smi -d 0 --showclocks

Debug Commands

# View Ollama logs
docker-compose logs -f ollama

# Check GPU utilization
watch -n 1 rocm-smi

# Test GPU compute
docker exec ollama rocminfo | grep "Compute Unit"
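
# List models currently loaded and where they run (GPU vs CPU)
docker exec ollama ollama ps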

📁 Project Structure

rocm-automated/
├── build.sh                              # Automated build script
├── docker-compose.yaml                   # Service orchestration
├── Dockerfile.rocm-7.1                   # Base ROCm image
├── Dockerfile.ollama-rocm-7.1            # Ollama with ROCm
├── Dockerfile.stable-diffusion.cpp-rocm7.1-gfx1151  # Stable Diffusion
├── ollama/                               # Ollama data directory
└── stable-diffusion.cpp/                # SD model storage

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

🏷️ Version History

  • v1.0.0: Initial release with ROCm 7.1 support
  • v1.1.0: Added Ollama integration and multi-GPU support
  • v1.2.0: Performance optimizations and Stable Diffusion support

⚠️ Known Hardware Limitations

External GPU Enclosures

  • AOOSTAR AG02 eGPU: the ASM246X chipset has known compatibility issues on Linux and may downgrade the link to PCIe x1 at 8 GT/s (tested on Fedora 42). This can significantly impact performance with large models that require frequent VRAM transfers.

Mini PCs

  • Minisforum MS-A1: Level1Techs testing showed Resizable BAR issues with eGPUs over USB4 connections, which may result in reduced performance or compatibility problems with ROCm workloads.

Star this repository if it helped you!

Made with ❤️ for the AMD GPU community
