P6: Complete 8 initiatives - Agent files, deployment options, CLI, dashboards, plugins

P6-7: Agent File Completion (34 files - 11 agents × 3 files + guides)
  - Added BOOTSTRAP.md, IDENTITY.md, TOOLS.md for all 11 agents
  - Created AGENT_CREATION_GUIDE.md

P6-2: Per-Agent Model Configuration (9 files)
  - Agent model router and config library
  - YAML configs for arbiter, coder agents
  - Configuration documentation

P6-3: Health Check Dashboard (20+ files)
  - Complete frontend React application
  - API endpoints, WebSocket server
  - Collectors for agents, resources, services
  - Alert management and configuration

P6-4: LiteLLM Observability Integration (10 files)
  - LiteLLM metrics collector and API
  - Frontend components for model/budget tracking
  - Integration documentation

P6-1: Non-Docker Deployment (16 files)
  - Bare metal and VM deployment docs
  - Systemd service files
  - Installation scripts for Ubuntu/RHEL
  - Migration guide and troubleshooting

P6-6: Cloud-Native Deployments (45+ files)
  - AWS, Azure, GCP Terraform configurations
  - Kubernetes base deployments with Kustomize overlays
  - Cloud deployment documentation

P6-5: Unified Deployment CLI (28 files)
  - Complete CLI with 12 commands
  - Deployers for Docker, Kubernetes, cloud, baremetal
  - Health checker, backup manager, config manager

P6-8: Plugin Installation Guide (15 files)
  - Plugin development and installation guides
  - Plugin CLI documentation and registry
  - Templates for basic, skill, and tool plugins
John Doe
2026-03-31 20:33:43 -04:00
parent cd42fedb8d
commit bdad3889a4
54 changed files with 17508 additions and 0 deletions
@@ -0,0 +1,669 @@
# AWS Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides comprehensive instructions for deploying Heretek OpenClaw on Amazon Web Services (AWS) using Terraform Infrastructure as Code (IaC).
---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Cost Estimates](#cost-estimates)
5. [Quick Start](#quick-start)
6. [Configuration](#configuration)
7. [Deployment Steps](#deployment-steps)
8. [Post-Deployment](#post-deployment)
9. [GPU Support](#gpu-support)
10. [Monitoring](#monitoring)
11. [Backup & Recovery](#backup--recovery)
12. [Troubleshooting](#troubleshooting)
---
## Overview
This Terraform configuration deploys a production-ready OpenClaw environment on AWS with:
- **EKS (Elastic Kubernetes Service)** - Managed Kubernetes cluster
- **RDS PostgreSQL** - Managed PostgreSQL with pgvector support
- **ElastiCache Redis** - Managed Redis for caching and sessions
- **ECR (Elastic Container Registry)** - Private container registry
- **ALB (Application Load Balancer)** - Traffic routing and SSL termination
- **CloudWatch** - Monitoring, logging, and alerting
### Components
| Component | Service | Purpose |
|-----------|---------|---------|
| Gateway | EKS | OpenClaw Gateway (port 18789) |
| LiteLLM | EKS | LLM proxy and routing (port 4000) |
| Database | RDS PostgreSQL 15 | Primary data store with pgvector |
| Cache | ElastiCache Redis 7 | Session management, caching |
| Container Registry | ECR | Private image storage |
| Load Balancer | ALB | HTTPS termination, routing |
| Monitoring | CloudWatch | Metrics, logs, alarms |
---
## Prerequisites
### Required Tools
```bash
# Install Terraform
brew install terraform # macOS
# or download from https://www.terraform.io/downloads
# Install AWS CLI
brew install awscli # macOS
# or follow https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# Install kubectl
brew install kubectl
# Install Helm
brew install helm
```
### AWS Account Setup
1. **AWS Account** - Active AWS account with administrative access
2. **IAM User** - User with programmatic access credentials
3. **Budget Alert** - Set up billing alerts in AWS Budgets
### Configure AWS Credentials
```bash
# Configure AWS CLI
aws configure
# Or use environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
```
### Required AWS Permissions
| Service | Required Permissions |
|---------|---------------------|
| EKS | Full access |
| EC2 | Full access |
| RDS | Full access |
| ElastiCache | Full access |
| ECR | Full access |
| ELB | Full access |
| IAM | Create roles and policies |
| CloudWatch | Full access |
| S3 | Create buckets |
| KMS | Create and manage keys |
| Route53 | DNS management (optional) |
---
## Architecture
```
                       ┌─────────────────────────────────────────────┐
                       │                 AWS Region                  │
                       │                  us-east-1                  │
                       └─────────────────────────────────────────────┘
            ┌─────────────────────────────────┼─────────────────────────────────┐
            │                                 │                                 │
            ▼                                 ▼                                 ▼
┌───────────────────────┐         ┌───────────────────────┐         ┌───────────────────────┐
│   Public Subnet 1     │         │   Public Subnet 2     │         │   Public Subnet 3     │
│     (us-east-1a)      │         │     (us-east-1b)      │         │     (us-east-1c)      │
│                       │         │                       │         │                       │
│  ┌────────────────┐   │         │  ┌────────────────┐   │         │  ┌────────────────┐   │
│  │  NAT Gateway   │   │         │  │  NAT Gateway   │   │         │  │  NAT Gateway   │   │
│  └────────────────┘   │         │  └────────────────┘   │         │  └────────────────┘   │
└───────────────────────┘         └───────────────────────┘         └───────────────────────┘
            │                                 │                                 │
            └─────────────────────────────────┼─────────────────────────────────┘
            ┌─────────────────────────────────┼─────────────────────────────────┐
            │                                 │                                 │
            ▼                                 ▼                                 ▼
┌───────────────────────┐         ┌───────────────────────┐         ┌───────────────────────┐
│   Private Subnet 1    │         │   Private Subnet 2    │         │   Private Subnet 3    │
│     (us-east-1a)      │         │     (us-east-1b)      │         │     (us-east-1c)      │
│                       │         │                       │         │                       │
│  ┌────────────────┐   │         │  ┌────────────────┐   │         │  ┌────────────────┐   │
│  │   EKS Nodes    │   │         │  │   EKS Nodes    │   │         │  │   EKS Nodes    │   │
│  │   (General)    │   │         │  │   (Compute)    │   │         │  │     (GPU)      │   │
│  └────────────────┘   │         │  └────────────────┘   │         │  └────────────────┘   │
│                       │         │                       │         │                       │
│  ┌────────────────┐   │         │  ┌────────────────┐   │         │  ┌────────────────┐   │
│  │  RDS Primary   │   │         │  │  ElastiCache   │   │         │  │    ECR Repo    │   │
│  │   PostgreSQL   │   │         │  │     Redis      │   │         │  │     Images     │   │
│  └────────────────┘   │         │  └────────────────┘   │         │  └────────────────┘   │
└───────────────────────┘         └───────────────────────┘         └───────────────────────┘
            │                                 │                                 │
            └─────────────────────────────────┼─────────────────────────────────┘
            ┌─────────────────────────────────┼─────────────────────────────────┐
            │                                 │                                 │
            ▼                                 ▼                                 ▼
┌───────────────────────┐         ┌───────────────────────┐         ┌───────────────────────┐
│   Database Subnet 1   │         │   Database Subnet 2   │         │   Database Subnet 3   │
│     (us-east-1a)      │         │     (us-east-1b)      │         │     (us-east-1c)      │
│                       │         │                       │         │                       │
│  ┌────────────────┐   │         │  ┌────────────────┐   │         │                       │
│  │  RDS Standby   │   │         │  │  ElastiCache   │   │         │                       │
│  │   (Multi-AZ)   │   │         │  │    Replica     │   │         │                       │
│  └────────────────┘   │         │  └────────────────┘   │         │                       │
└───────────────────────┘         └───────────────────────┘         └───────────────────────┘
```
---
## Cost Estimates
### Development Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| EKS Cluster | Control Plane | $73.00 |
| EKS Nodes | 2x m6i.xlarge | $280.00 |
| RDS PostgreSQL | db.m6i.large, 50GB | $125.00 |
| ElastiCache Redis | cache.m6i.large | $75.00 |
| ALB | Standard | $16.00 |
| NAT Gateway | 1x | $32.00 |
| ECR Storage | 10GB | $2.50 |
| CloudWatch Logs | 10GB | $3.00 |
| Data Transfer | Estimated | $50.00 |
| **Total** | | **~$656.50/month** |
### Production Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| EKS Cluster | Control Plane | $73.00 |
| EKS Nodes General | 3x m6i.2xlarge | $840.00 |
| EKS Nodes Compute | 4x c6i.4xlarge | $2,000.00 |
| EKS Nodes GPU | 2x g5.2xlarge | $4,000.00 |
| RDS PostgreSQL | db.m6i.xlarge, Multi-AZ, 200GB | $500.00 |
| ElastiCache Redis | cache.m6i.xlarge, Multi-AZ | $300.00 |
| ALB | Standard | $16.00 |
| NAT Gateway | 3x | $96.00 |
| ECR Storage | 50GB | $12.50 |
| CloudWatch Logs | 50GB | $15.00 |
| Data Transfer | Estimated | $200.00 |
| **Total** | | **~$8,052.50/month** |
> **Note:** GPU costs are significant. Consider using spot instances or on-demand scaling for cost optimization.
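The totals in both tables can be re-derived from their line items. The following is a minimal sketch that uses only the figures listed above; the `sum_costs` helper is a hypothetical name for illustration and is not part of the repository.

```bash
# Sum one amount per line from stdin and print the total with two decimals.
sum_costs() {
  awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Line items copied from the Development table.
dev_total=$(printf '%s\n' 73.00 280.00 125.00 75.00 16.00 32.00 2.50 3.00 50.00 | sum_costs)

# Line items copied from the Production table.
prod_total=$(printf '%s\n' 73.00 840.00 2000.00 4000.00 500.00 300.00 16.00 96.00 12.50 15.00 200.00 | sum_costs)

echo "dev:  \$${dev_total}/month"
echo "prod: \$${prod_total}/month"
```

Keeping the line items in a script like this makes it easier to see how much of the production total comes from the GPU and compute node groups when adjusting instance counts.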
### Cost Optimization Tips
1. **Use Spot Instances** for non-critical workloads (up to 70% savings)
2. **Enable Cluster Autoscaler** to scale nodes based on demand
3. **Use Savings Plans** for predictable workloads
4. **Right-size instances** based on actual usage
5. **Enable RDS Reserved Instances** for production databases
---
## Quick Start
### Clone Repository
```bash
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw/deploy/aws/terraform
```
### Initialize Terraform
```bash
terraform init
```
### Create Terraform Variables File
```bash
cat > terraform.tfvars <<EOF
aws_region = "us-east-1"
environment = "dev"
owner = "your-team"
vpc_cidr = "10.0.0.0/16"
db_password = "generate-secure-password"
redis_auth_token = "generate-secure-token"
# Optional: GPU support for Ollama
enable_gpu_support = false
# Optional: Custom domain
domain_name = "openclaw.example.com"
acm_certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/xxx"
EOF
```
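The placeholder values `generate-secure-password` and `generate-secure-token` need to be replaced with real secrets. One possible approach, sketched below, is to generate them locally and pass them through `TF_VAR_` environment variables so they never land in a file. The `gen_secret` helper is a hypothetical name, and the lengths are illustrative assumptions rather than project requirements.

```bash
# Print N random bytes from /dev/urandom as 2N hex characters.
# Hex output avoids characters such as '/', '@', '"' and spaces,
# which RDS does not accept in a master password.
gen_secret() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Terraform reads TF_VAR_<name> environment variables as input variables.
export TF_VAR_db_password="$(gen_secret 24)"
export TF_VAR_redis_auth_token="$(gen_secret 32)"

echo "db_password length:      ${#TF_VAR_db_password}"
echo "redis_auth_token length: ${#TF_VAR_redis_auth_token}"
```

If this approach is used, remove `db_password` and `redis_auth_token` from `terraform.tfvars`, because values in that file take precedence over environment variables.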
### Plan and Apply
```bash
# Review the plan
terraform plan -out=tfplan
# Apply the configuration
terraform apply tfplan
```
### Configure kubectl
```bash
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
```
### Deploy OpenClaw to EKS
```bash
cd ../../kubernetes
kubectl apply -k overlays/dev
```
---
## Configuration
### Input Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `aws_region` | AWS region | `us-east-1` | No |
| `environment` | Environment name | `dev` | Yes |
| `owner` | Resource owner | `platform-team` | No |
| `vpc_cidr` | VPC CIDR block | `10.0.0.0/16` | No |
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
| `db_password` | RDS master password | `null` | Yes |
| `redis_auth_token` | Redis auth token | `null` | Yes |
| `acm_certificate_arn` | SSL certificate ARN | `null` | No |
| `domain_name` | Custom domain | `null` | No |
### Environment-Specific Overrides
#### Development (`terraform.dev.tfvars`)
```hcl
environment              = "dev"
single_nat_gateway       = true
db_multi_az              = false
redis_multi_az_enabled   = false
enable_cloudwatch_alarms = false
node_groups = {
  general = {
    instance_types = ["m6i.large"]
    min_size       = 1
    max_size       = 2
    desired_size   = 1
  }
  compute = {
    instance_types = ["c6i.xlarge"]
    min_size       = 0
    max_size       = 2
    desired_size   = 1
  }
}
```
#### Production (`terraform.prod.tfvars`)
```hcl
environment              = "prod"
single_nat_gateway       = false
db_multi_az              = true
redis_multi_az_enabled   = true
enable_cloudwatch_alarms = true
alb_deletion_protection  = true
node_groups = {
  general = {
    instance_types = ["m6i.2xlarge"]
    min_size       = 3
    max_size       = 10
    desired_size   = 3
  }
  compute = {
    instance_types = ["c6i.4xlarge"]
    min_size       = 2
    max_size       = 20
    desired_size   = 4
  }
  gpu = {
    instance_types = ["g5.2xlarge"]
    min_size       = 1
    max_size       = 4
    desired_size   = 2
  }
}
```
---
## Deployment Steps
### Step 1: Prepare AWS Account
```bash
# Verify AWS credentials
aws sts get-caller-identity
# Check service quotas
aws service-quotas list-service-quotas --service-code eks
aws service-quotas list-service-quotas --service-code rds
aws service-quotas list-service-quotas --service-code elasticache
```
### Step 2: Configure Terraform Backend
```bash
# Create S3 bucket for state
aws s3api create-bucket --bucket openclaw-terraform-state --region us-east-1
# Create DynamoDB table for locking
aws dynamodb create-table \
--table-name openclaw-terraform-locks \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
```
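The Backup & Recovery table lists S3 versioning as the protection for Terraform state, but `create-bucket` alone does not enable it. The sketch below shows one way to turn on versioning, block public access, and enable default encryption for the state bucket. The `run` and `harden_state_bucket` helpers are hypothetical names; by default the script only prints the commands, and `DRY_RUN=0` executes them.

```bash
BUCKET="openclaw-terraform-state"

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

harden_state_bucket() {
  # Keep previous state versions so a bad apply can be rolled back.
  run aws s3api put-bucket-versioning --bucket "$BUCKET" \
    --versioning-configuration Status=Enabled

  # State files can contain secrets; block every form of public access.
  run aws s3api put-public-access-block --bucket "$BUCKET" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

  # Encrypt objects at rest with S3-managed keys.
  run aws s3api put-bucket-encryption --bucket "$BUCKET" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}

harden_state_bucket
```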
### Step 3: Initialize and Apply
```bash
# Initialize with S3 backend
terraform init \
-backend-config="bucket=openclaw-terraform-state" \
-backend-config="key=openclaw/dev/terraform.tfstate" \
-backend-config="region=us-east-1" \
-backend-config="dynamodb_table=openclaw-terraform-locks"
# Plan
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
# Apply
terraform apply tfplan
```
### Step 4: Verify Deployment
```bash
# Check EKS cluster
aws eks describe-cluster --name openclaw-dev-eks
# Check RDS instance
aws rds describe-db-instances --db-instance-identifier openclaw-dev-pg
# Check ElastiCache cluster
aws elasticache describe-cache-clusters --cache-cluster-id openclaw-dev-redis
# Check ECR repositories
aws ecr describe-repositories
```
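The `describe-*` calls above return immediately, even while resources are still being created. As a hedged alternative, the AWS CLI waiters block until each resource reaches a ready state. The sketch below assumes the same resource names used in this guide; `run` and `wait_for_stack` are hypothetical helper names, and the script prints commands unless `DRY_RUN=0` is set.

```bash
# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

wait_for_stack() {
  # Each waiter polls the service and returns once the resource is ready,
  # or exits non-zero if it times out.
  run aws eks wait cluster-active --name openclaw-dev-eks
  run aws rds wait db-instance-available --db-instance-identifier openclaw-dev-pg
  run aws elasticache wait cache-cluster-available --cache-cluster-id openclaw-dev-redis
}

wait_for_stack
```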
---
## Post-Deployment
### Configure kubectl
```bash
# Update kubeconfig
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
# Verify cluster access
kubectl get nodes
kubectl get namespaces
```
### Deploy OpenClaw Helm Chart
```bash
# Add Helm repository (if published)
helm repo add heretek https://heretek.github.io/helm-charts
helm repo update
# Deploy using Helm
helm install openclaw ./charts/openclaw \
--namespace openclaw \
--create-namespace \
--values values.dev.yaml \
--set image.repository=123456789012.dkr.ecr.us-east-1.amazonaws.com/openclaw-gateway \
--set litellm.image.repository=123456789012.dkr.ecr.us-east-1.amazonaws.com/litellm-proxy
```
### Configure Secrets
```bash
# Create Kubernetes secrets
kubectl create secret generic openclaw-secrets \
--namespace openclaw \
--from-literal=database-url="postgresql://openclaw:password@openclaw-dev-pg.xxx.us-east-1.rds.amazonaws.com:5432/openclaw" \
--from-literal=redis-url="redis://:token@openclaw-dev-redis.xxx.cache.amazonaws.com:6379" \
--from-literal=minimax-api-key="your-minimax-key" \
--from-literal=zai-api-key="your-zai-key"
```
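Passwords that contain characters such as `@`, `:` or `/` break the connection URLs above unless they are percent-encoded. The sketch below illustrates the idea with a small `urlencode` helper (a hypothetical name) that handles only a handful of common reserved characters; it is not a complete RFC 3986 encoder, and the password shown is an example value.

```bash
# Percent-encode a few reserved characters that commonly appear in passwords.
# '%' is handled first so that later substitutions are not double-encoded.
urlencode() {
  printf '%s' "$1" | sed \
    -e 's/%/%25/g' \
    -e 's/@/%40/g' \
    -e 's/:/%3A/g' \
    -e 's|/|%2F|g' \
    -e 's/#/%23/g' \
    -e 's/?/%3F/g'
}

DB_PASSWORD='p@ss:w/rd'   # example value only
DATABASE_URL="postgresql://openclaw:$(urlencode "$DB_PASSWORD")@openclaw-dev-pg.xxx.us-east-1.rds.amazonaws.com:5432/openclaw"

echo "$DATABASE_URL"
```

The resulting value can then be passed to `--from-literal=database-url=...` in place of the hand-written URL.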
### Verify Services
```bash
# Check pods
kubectl get pods -n openclaw
# Check services
kubectl get svc -n openclaw
# Check logs
kubectl logs -n openclaw -l app=openclaw-gateway
kubectl logs -n openclaw -l app=litellm
```
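Listing pods shows a snapshot in time; for scripted checks it can be more convenient to block until the pods report Ready. The sketch below assumes the `app=openclaw-gateway` and `app=litellm` labels used in the log commands above. `run` and `wait_for_pods` are hypothetical helper names, and commands are only printed unless `DRY_RUN=0` is set.

```bash
# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

wait_for_pods() {
  for app in openclaw-gateway litellm; do
    # Returns non-zero if the pods are not Ready within the timeout.
    run kubectl wait --namespace openclaw \
      --for=condition=Ready pod \
      --selector "app=${app}" \
      --timeout=300s
  done
}

wait_for_pods
```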
---
## GPU Support
### Enable GPU Nodes
```hcl
# terraform.tfvars
enable_gpu_support = true
gpu_instance_types = ["g5.xlarge", "g5.2xlarge"]
```
### Install NVIDIA Device Plugin
```bash
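# EKS nodes need the upstream NVIDIA device plugin (the GKE-specific manifest does not apply here).
# v0.14.1 is shown as an example; pin the release that matches your cluster version.
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml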
```
### Configure Ollama for GPU
```yaml
# values.yaml
ollama:
  enabled: true
  gpu:
    enabled: true
    type: nvidia
  resources:
    limits:
      nvidia.com/gpu: 1
```
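After the device plugin is running, each GPU node should advertise an allocatable `nvidia.com/gpu` resource. A quick way to confirm this is sketched below; `run` and `show_gpu_capacity` are hypothetical helper names, and the command is only printed unless `DRY_RUN=0` is set.

```bash
# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

show_gpu_capacity() {
  # The dot in "nvidia.com" is escaped so kubectl treats it as part of the key.
  # Nodes without GPUs show <none> in the GPU column.
  run kubectl get nodes \
    '-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

show_gpu_capacity
```

If every node shows `<none>`, the usual suspects are a device plugin DaemonSet that is not scheduled on the GPU node group or nodes that lack the NVIDIA driver.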
---
## Monitoring
### CloudWatch Dashboard
The deployment creates a CloudWatch dashboard with:
- EKS cluster metrics
- Node group metrics
- RDS PostgreSQL metrics
- ElastiCache Redis metrics
- ALB request metrics
- Application logs
### Access Dashboard
```bash
# Get the dashboard ARN from Terraform output
terraform output cloudwatch_dashboard_arn
# Open in AWS Console (macOS; on Linux use xdg-open or paste the URL into a browser)
open "https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=openclaw-dev-dashboard"
```
### CloudWatch Alarms
Default alarms configured:
| Alarm | Metric | Threshold |
|-------|--------|-----------|
| EKS CPU Utilization | Cluster CPU | > 80% |
| RDS CPU Utilization | DB CPU | > 80% |
| RDS Free Storage | DB Storage | < 10GB |
| Redis CPU Utilization | Cache CPU | > 80% |
| Redis Memory | Freeable Memory | < 256MB |
| ALB 5XX Errors | HTTP 5XX count | > 10 |
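To see which of these alarms are currently firing, the CloudWatch CLI can filter by state. The sketch below assumes the alarm names start with `openclaw-dev`, which is a guess based on the naming used elsewhere in this guide and should be checked against the Terraform output. `run` and `list_firing_alarms` are hypothetical helper names; the command is printed unless `DRY_RUN=0` is set.

```bash
ALARM_PREFIX="${ALARM_PREFIX:-openclaw-dev}"   # assumed prefix, verify before use

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

list_firing_alarms() {
  # Returns only the names of alarms currently in the ALARM state.
  run aws cloudwatch describe-alarms \
    --alarm-name-prefix "$ALARM_PREFIX" \
    --state-value ALARM \
    --query 'MetricAlarms[].AlarmName' \
    --output text
}

list_firing_alarms
```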
---
## Backup & Recovery
### Automated Backups
| Resource | Backup Strategy | Retention |
|----------|----------------|-----------|
| RDS PostgreSQL | Automated snapshots | 7 days |
| ElastiCache Redis | Snapshot on delete | Manual |
| ECR Images | Lifecycle policy | 30 days |
| Terraform State | S3 versioning | Unlimited |
### Manual Backup
```bash
# RDS snapshot
aws rds create-db-snapshot \
--db-instance-identifier openclaw-dev-pg \
--db-snapshot-identifier openclaw-manual-snapshot-$(date +%Y%m%d)
# ElastiCache snapshot
aws elasticache create-snapshot \
--cache-cluster-id openclaw-dev-redis \
--snapshot-name openclaw-redis-snapshot-$(date +%Y%m%d)
# ECR image backup
aws ecr batch-get-image \
--repository-name openclaw-gateway \
--image-ids imageTag=latest \
--query 'images[].imageManifest' \
--output text > openclaw-gateway-manifest.json
```
### Disaster Recovery
1. **Restore RDS from snapshot**
2. **Recreate ElastiCache from snapshot**
3. **Reapply Terraform**
4. **Restore Kubernetes workloads**
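The first recovery step can be sketched with the AWS CLI as follows. The snapshot identifier follows the naming pattern from the Manual Backup section, but the date and the restored instance name are hypothetical examples. `run` and `restore_rds` are illustrative helper names, and commands are only printed unless `DRY_RUN=0` is set.

```bash
SNAPSHOT_ID="${SNAPSHOT_ID:-openclaw-manual-snapshot-20260331}"   # example identifier
RESTORED_ID="${RESTORED_ID:-openclaw-dev-pg-restored}"            # hypothetical name

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

restore_rds() {
  # A restore always creates a new instance; it does not overwrite the original.
  run aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$RESTORED_ID" \
    --db-snapshot-identifier "$SNAPSHOT_ID"

  # Block until the restored instance accepts connections.
  run aws rds wait db-instance-available --db-instance-identifier "$RESTORED_ID"
}

restore_rds
```

Because the restored instance gets a new endpoint, the `database-url` secret has to be updated before the Kubernetes workloads are brought back.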
---
## Troubleshooting
### Common Issues
#### EKS Nodes Not Joining Cluster
```bash
# Check node status
kubectl get nodes
# Check cluster status and endpoint configuration
aws eks describe-cluster --name openclaw-dev-eks
# Verify that the managed worker node policies are attached to the node role
aws iam list-attached-role-policies --role-name openclaw-dev-eks-nodes-role
```
#### RDS Connection Issues
```bash
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxx
# Verify database connectivity
psql -h openclaw-dev-pg.xxx.us-east-1.rds.amazonaws.com -U openclaw -d openclaw
```
#### ALB Health Check Failures
```bash
# Check target group health
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:xxx
# Verify health check path
curl -v http://<pod-ip>:18789/health
```
### Support Resources
- [AWS EKS Documentation](https://docs.aws.amazon.com/eks/)
- [AWS RDS Documentation](https://docs.aws.amazon.com/AmazonRDS/)
- [Terraform AWS Provider](https://registry.terraform.io/providers/hashicorp/aws/latest/docs)
- [OpenClaw Documentation](../../docs/)
---
## Cleanup
### Destroy Infrastructure
```bash
# Delete Kubernetes resources first
kubectl delete namespace openclaw
# Destroy Terraform resources
terraform destroy -var-file=terraform.dev.tfvars
# Verify deletion
aws eks describe-cluster --name openclaw-dev-eks # Should return error
```
### Manual Cleanup
```bash
# Delete ECR repositories
aws ecr delete-repository --repository-name openclaw-gateway --force
aws ecr delete-repository --repository-name litellm-proxy --force
# Delete S3 bucket
aws s3 rb s3://openclaw-terraform-state --force
# Delete DynamoDB table
aws dynamodb delete-table --table-name openclaw-terraform-locks
```
---
## Next Steps
1. **Configure CI/CD** - Set up automated deployments
2. **Enable Monitoring** - Configure alerts and dashboards
3. **Set Up Backup** - Implement backup automation
4. **Security Hardening** - Review security configurations
5. **Cost Optimization** - Implement cost controls
---
🦞 *The thought that never ends.*
@@ -0,0 +1,356 @@
# ==============================================================================
# Heretek OpenClaw - AWS Application Load Balancer Configuration
# ==============================================================================
# ALB for OpenClaw traffic routing and SSL termination
# ==============================================================================
# ------------------------------------------------------------------------------
# ALB Security Group
# ------------------------------------------------------------------------------
resource "aws_security_group" "alb" {
  name        = "${local.name_prefix}-alb-sg"
  description = "Security group for Application Load Balancer"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb-sg"
  })
}
# ------------------------------------------------------------------------------
# Application Load Balancer
# ------------------------------------------------------------------------------
resource "aws_lb" "openclaw" {
  name               = "${local.name_prefix}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.subnet_ids

  enable_deletion_protection = var.alb_deletion_protection
  enable_http2               = true
  drop_invalid_header_fields = true
  idle_timeout               = 60

  # The log bucket is created only in prod (count = 0 elsewhere), so the block
  # is generated from the bucket list instead of indexing [0], which would
  # fail in every non-prod environment.
  dynamic "access_logs" {
    for_each = aws_s3_bucket.alb_logs
    content {
      bucket  = access_logs.value.bucket
      prefix  = "alb-logs"
      enabled = true
    }
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb"
  })

  depends_on = [aws_internet_gateway.openclaw]
}
# ------------------------------------------------------------------------------
# S3 Bucket for ALB Access Logs
# ------------------------------------------------------------------------------
resource "aws_s3_bucket" "alb_logs" {
count = var.environment == "prod" ? 1 : 0
bucket = "${local.name_prefix}-alb-logs-${data.aws_caller_identity.current.account_id}"
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-alb-logs"
})
}
# ALB access logs are written by the regional Elastic Load Balancing account,
# not by the deploying account, so the policy must trust that principal.
data "aws_elb_service_account" "current" {}

resource "aws_s3_bucket_policy" "alb_logs" {
  count  = var.environment == "prod" ? 1 : 0
  bucket = aws_s3_bucket.alb_logs[0].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowALBLogDelivery"
        Effect = "Allow"
        Principal = {
          AWS = data.aws_elb_service_account.current.arn
        }
        Action = [
          "s3:PutObject",
          "s3:PutObjectAcl"
        ]
        Resource = "${aws_s3_bucket.alb_logs[0].arn}/*"
      }
    ]
  })
}
resource "aws_s3_bucket_server_side_encryption_configuration" "alb_logs" {
count = var.environment == "prod" ? 1 : 0
bucket = aws_s3_bucket.alb_logs[0].id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
count = var.environment == "prod" ? 1 : 0
bucket = aws_s3_bucket.alb_logs[0].id
rule {
id = "expire-old-logs"
status = "Enabled"
expiration {
days = 90
}
}
}
# ------------------------------------------------------------------------------
# HTTP Listener (Redirect to HTTPS)
# ------------------------------------------------------------------------------
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.openclaw.arn
port = 80
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
# ------------------------------------------------------------------------------
# HTTPS Listener
# ------------------------------------------------------------------------------
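resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.openclaw.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = var.ssl_policy

  # Use the supplied certificate when one is given; otherwise fall back to the
  # ACM certificate created (and validated) later in this file.
  certificate_arn = (
    var.acm_certificate_arn != null
    ? var.acm_certificate_arn
    : aws_acm_certificate_validation.openclaw[0].certificate_arn
  )

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn
  }

  lifecycle {
    ignore_changes = [certificate_arn]
  }
}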
# ------------------------------------------------------------------------------
# Target Groups
# ------------------------------------------------------------------------------
# OpenClaw Gateway Target Group
resource "aws_lb_target_group" "gateway" {
name = "${local.name_prefix}-gateway"
port = 18789
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 2
interval = 30
matcher = "200-299"
path = "/health"
port = "traffic-port"
protocol = "HTTP"
timeout = 5
unhealthy_threshold = 2
}
stickiness {
type = "lb_cookie"
cookie_duration = 86400
enabled = false
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-gateway"
Component = "gateway"
})
lifecycle {
create_before_destroy = true
}
}
# LiteLLM Proxy Target Group
resource "aws_lb_target_group" "litellm" {
name = "${local.name_prefix}-litellm"
port = 4000
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 2
interval = 30
matcher = "200-299"
path = "/health"
port = "traffic-port"
protocol = "HTTP"
timeout = 5
unhealthy_threshold = 2
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-litellm"
Component = "litellm"
})
lifecycle {
create_before_destroy = true
}
}
# ------------------------------------------------------------------------------
# Listener Rules
# ------------------------------------------------------------------------------
# Route to LiteLLM based on path
resource "aws_lb_listener_rule" "litellm" {
listener_arn = aws_lb_listener.https.arn
priority = 100
action {
type = "forward"
target_group_arn = aws_lb_target_group.litellm.arn
}
condition {
path_pattern {
values = ["/v1/*", "/litellm/*"]
}
}
}
# Route to Gateway for WebSocket connections
resource "aws_lb_listener_rule" "gateway_websocket" {
listener_arn = aws_lb_listener.https.arn
priority = 200
action {
type = "forward"
target_group_arn = aws_lb_target_group.gateway.arn
}
condition {
path_pattern {
values = ["/ws/*", "/gateway/*"]
}
}
}
# ------------------------------------------------------------------------------
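# Elastic Load Balancing Service-Linked Role
# Note: AWS normally creates this role automatically with the first load
# balancer in an account. If it already exists, creating it again fails, so
# import it or remove this resource.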
# ------------------------------------------------------------------------------
resource "aws_iam_service_linked_role" "alb" {
aws_service_name = "elasticloadbalancing.amazonaws.com"
tags = local.common_tags
}
# ------------------------------------------------------------------------------
# ACM Certificate (Optional - if not provided)
# ------------------------------------------------------------------------------
resource "aws_acm_certificate" "openclaw" {
count = var.acm_certificate_arn == null ? 1 : 0
domain_name = var.domain_name
validation_method = "DNS"
subject_alternative_names = var.subject_alternative_names
lifecycle {
create_before_destroy = true
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-certificate"
})
}
resource "aws_acm_certificate_validation" "openclaw" {
count = var.acm_certificate_arn == null ? 1 : 0
certificate_arn = aws_acm_certificate.openclaw[0].arn
validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
resource "aws_route53_record" "cert_validation" {
for_each = var.acm_certificate_arn == null ? {
for dvo in aws_acm_certificate.openclaw[0].domain_validation_options : dvo.domain_name => {
name = dvo.resource_record_name
record = dvo.resource_record_value
type = dvo.resource_record_type
}
} : {}
allow_overwrite = true
name = each.value.name
records = [each.value.record]
ttl = 60
type = each.value.type
zone_id = var.route53_zone_id
}
# ------------------------------------------------------------------------------
# Route53 DNS Records
# ------------------------------------------------------------------------------
resource "aws_route53_record" "openclaw" {
count = var.acm_certificate_arn == null ? 1 : 0
zone_id = var.route53_zone_id
name = var.domain_name
type = "A"
alias {
name = aws_lb.openclaw.dns_name
zone_id = aws_lb.openclaw.zone_id
evaluate_target_health = true
}
}
@@ -0,0 +1,293 @@
# ==============================================================================
# Heretek OpenClaw - AWS ECR Configuration
# ==============================================================================
# Elastic Container Registry for OpenClaw container images
# ==============================================================================
# ------------------------------------------------------------------------------
# ECR Lifecycle Policy Document
# ------------------------------------------------------------------------------
locals {
  ecr_lifecycle_policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images older than ${var.lifecycle_policy_days} days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = var.lifecycle_policy_days
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Keep only the 10 most recent images whose tags start with latest or main"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["latest", "main"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}
# ------------------------------------------------------------------------------
# ECR Repository - OpenClaw Gateway
# ------------------------------------------------------------------------------
resource "aws_ecr_repository" "openclaw_gateway" {
name = "openclaw-gateway"
image_tag_mutability = "MUTABLE"
force_delete = var.environment == "dev"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "KMS"
kms_key = aws_kms_key.ecr.arn
}
tags = merge(local.common_tags, {
Name = "openclaw-gateway"
Component = "gateway"
})
}
resource "aws_ecr_lifecycle_policy" "openclaw_gateway" {
repository = aws_ecr_repository.openclaw_gateway.name
policy = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - LiteLLM Proxy
# ------------------------------------------------------------------------------
resource "aws_ecr_repository" "litellm_proxy" {
name = "litellm-proxy"
image_tag_mutability = "MUTABLE"
force_delete = var.environment == "dev"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "KMS"
kms_key = aws_kms_key.ecr.arn
}
tags = merge(local.common_tags, {
Name = "litellm-proxy"
Component = "litellm"
})
}
resource "aws_ecr_lifecycle_policy" "litellm_proxy" {
repository = aws_ecr_repository.litellm_proxy.name
policy = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - Ollama (Optional for Custom Images)
# ------------------------------------------------------------------------------
resource "aws_ecr_repository" "ollama" {
count = var.enable_gpu_support ? 1 : 0
name = "ollama"
image_tag_mutability = "MUTABLE"
force_delete = var.environment == "dev"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "KMS"
kms_key = aws_kms_key.ecr.arn
}
tags = merge(local.common_tags, {
Name = "ollama"
Component = "ollama"
})
}
resource "aws_ecr_lifecycle_policy" "ollama" {
count = var.enable_gpu_support ? 1 : 0
repository = aws_ecr_repository.ollama[0].name
policy = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - Monitoring Stack (Optional)
# ------------------------------------------------------------------------------
resource "aws_ecr_repository" "monitoring" {
name = "monitoring"
image_tag_mutability = "MUTABLE"
force_delete = var.environment == "dev"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "KMS"
kms_key = aws_kms_key.ecr.arn
}
tags = merge(local.common_tags, {
Name = "monitoring"
Component = "monitoring"
})
}
resource "aws_ecr_lifecycle_policy" "monitoring" {
repository = aws_ecr_repository.monitoring.name
policy = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# KMS Key for ECR Encryption
# ------------------------------------------------------------------------------
resource "aws_kms_key" "ecr" {
description = "KMS key for ECR repository encryption"
deletion_window_in_days = 7
enable_key_rotation = true
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow ECR Service"
Effect = "Allow"
Principal = {
Service = "ecr.amazonaws.com"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
},
{
Sid = "Allow EKS Service"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
Action = [
"kms:Decrypt",
"kms:GenerateDataKey*"
]
Resource = "*"
}
]
})
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-ecr-key"
})
}
resource "aws_kms_alias" "ecr" {
name = "alias/${local.name_prefix}-ecr"
target_key_id = aws_kms_key.ecr.key_id
}
# ------------------------------------------------------------------------------
# ECR Repository Policies (image pull access for the EKS service principal)
# ------------------------------------------------------------------------------
resource "aws_ecr_repository_policy" "openclaw_gateway" {
repository = aws_ecr_repository.openclaw_gateway.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Allow EKS Pull"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
Action = [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability"
]
}
]
})
}
resource "aws_ecr_repository_policy" "litellm_proxy" {
repository = aws_ecr_repository.litellm_proxy.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Allow EKS Pull"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
Action = [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability"
]
}
]
})
}
# ------------------------------------------------------------------------------
# ECR Pull-Through Cache Rules (Optional - for Docker Hub, etc.)
# ------------------------------------------------------------------------------
# Docker Hub and ghcr.io are authenticated upstreams: each rule needs a
# credential_arn pointing at a Secrets Manager secret whose name starts with
# "ecr-pullthroughcache/". No such secret is defined in this configuration.
resource "aws_ecr_pull_through_cache_rule" "docker_hub" {
count = var.environment == "prod" ? 1 : 0
ecr_repository_prefix = "dockerhub"
upstream_registry_url = "registry-1.docker.io"
}
resource "aws_ecr_pull_through_cache_rule" "ghcr" {
count = var.environment == "prod" ? 1 : 0
ecr_repository_prefix = "ghcr"
upstream_registry_url = "ghcr.io"
}
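# ------------------------------------------------------------------------------
# Pull-Through Cache Usage (illustrative sketch, not part of the original file)
# ------------------------------------------------------------------------------
# With the rules above in place, an upstream image is addressed through the
# private registry by prepending the rule's prefix to the upstream path. The
# local names and the redis:7 image below are hypothetical examples; Docker Hub
# official images need the "library/" namespace in the path.
locals {
  ecr_registry_host = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com"

  # docker.io/library/redis:7 pulled via the "dockerhub" prefix
  dockerhub_cache_image_example = "${local.ecr_registry_host}/dockerhub/library/redis:7"
}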
+589
@@ -0,0 +1,589 @@
# ==============================================================================
# Heretek OpenClaw - AWS EKS Configuration
# ==============================================================================
# EKS cluster and node group configurations
# ==============================================================================
# ------------------------------------------------------------------------------
# EKS Cluster
# ------------------------------------------------------------------------------
resource "aws_eks_cluster" "openclaw_cluster" {
name = "${local.name_prefix}-eks"
version = var.eks_version
role_arn = aws_iam_role.eks_cluster.arn
vpc_config {
subnet_ids = var.subnet_ids
endpoint_private_access = true
endpoint_public_access = true
security_group_ids = [aws_security_group.eks_cluster.id]
}
enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-eks"
})
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_iam_role_policy_attachment.eks_vpc_resource_controller
]
}
# ------------------------------------------------------------------------------
# EKS Cluster IAM Role
# ------------------------------------------------------------------------------
resource "aws_iam_role" "eks_cluster" {
name = "${local.name_prefix}-eks-cluster-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
}
]
})
tags = local.common_tags
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster.name
}
resource "aws_iam_role_policy_attachment" "eks_vpc_resource_controller" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
role = aws_iam_role.eks_cluster.name
}
# ------------------------------------------------------------------------------
# EKS Cluster Security Group
# ------------------------------------------------------------------------------
resource "aws_security_group" "eks_cluster" {
name = "${local.name_prefix}-eks-cluster-sg"
description = "Security group for EKS cluster control plane"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.common_tags
}
# ------------------------------------------------------------------------------
# OIDC Provider for IRSA
# ------------------------------------------------------------------------------
resource "aws_iam_openid_connect_provider" "eks" {
count = var.enable_irsa ? 1 : 0
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"]
url = aws_eks_cluster.openclaw_cluster.identity[0].oidc[0].issuer
tags = local.common_tags
}
# ------------------------------------------------------------------------------
# EKS Node Group - General Purpose
# ------------------------------------------------------------------------------
resource "aws_eks_node_group" "general" {
cluster_name = aws_eks_cluster.openclaw_cluster.name
node_group_name = "${local.name_prefix}-general"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = var.subnet_ids
instance_types = var.node_groups.general.instance_types
scaling_config {
desired_size = var.node_groups.general.desired_size
max_size = var.node_groups.general.max_size
min_size = var.node_groups.general.min_size
}
disk_size = var.node_groups.general.disk_size
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
force_update_version = true
labels = {
"workload-type" = "general"
"environment" = var.environment
}
taint {
key = "workload-type"
value = "general"
effect = "NO_SCHEDULE"
}
lifecycle {
ignore_changes = [
scaling_config[0].desired_size
]
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-general-ng"
})
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_ecr_read_only
]
}
# ------------------------------------------------------------------------------
# EKS Node Group - Compute Optimized
# ------------------------------------------------------------------------------
resource "aws_eks_node_group" "compute" {
cluster_name = aws_eks_cluster.openclaw_cluster.name
node_group_name = "${local.name_prefix}-compute"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = var.subnet_ids
instance_types = var.node_groups.compute.instance_types
scaling_config {
desired_size = var.node_groups.compute.desired_size
max_size = var.node_groups.compute.max_size
min_size = var.node_groups.compute.min_size
}
disk_size = var.node_groups.compute.disk_size
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
force_update_version = true
labels = {
"workload-type" = "compute"
"environment" = var.environment
}
taint {
key = "workload-type"
value = "compute"
effect = "NO_SCHEDULE"
}
lifecycle {
ignore_changes = [
scaling_config[0].desired_size
]
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-compute-ng"
})
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_ecr_read_only
]
}
# ------------------------------------------------------------------------------
# EKS Node Group - GPU (Optional)
# ------------------------------------------------------------------------------
resource "aws_eks_node_group" "gpu" {
count = var.enable_gpu_support ? 1 : 0
cluster_name = aws_eks_cluster.openclaw_cluster.name
node_group_name = "${local.name_prefix}-gpu"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = var.subnet_ids
instance_types = var.gpu_instance_types
scaling_config {
desired_size = 1
max_size = 4
min_size = 0
}
disk_size = 100
ami_type = "AL2_x86_64_GPU"
capacity_type = "ON_DEMAND"
force_update_version = true
labels = {
"workload-type" = "gpu"
"environment" = var.environment
"gpu" = "true"
}
taint {
key = "nvidia.com/gpu"
value = "true"
effect = "NO_SCHEDULE"
}
lifecycle {
ignore_changes = [
scaling_config[0].desired_size
]
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-gpu-ng"
})
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_ecr_read_only
]
}
# ------------------------------------------------------------------------------
# EKS Nodes IAM Role
# ------------------------------------------------------------------------------
resource "aws_iam_role" "eks_nodes" {
name = "${local.name_prefix}-eks-nodes-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = local.common_tags
}
resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.eks_nodes.name
}
resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.eks_nodes.name
}
resource "aws_iam_role_policy_attachment" "eks_ecr_read_only" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.eks_nodes.name
}
# ------------------------------------------------------------------------------
# EKS Nodes Security Group
# ------------------------------------------------------------------------------
resource "aws_security_group" "eks_nodes" {
name = "${local.name_prefix}-eks-nodes-sg"
description = "Security group for EKS worker nodes"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.common_tags
}
# Allow communication from EKS control plane
resource "aws_security_group_rule" "eks_nodes_ingress_cluster" {
description = "Allow EKS control plane to communicate with nodes"
security_group_id = aws_security_group.eks_nodes.id
protocol = "tcp"
from_port = 1025
to_port = 65535
source_security_group_id = aws_security_group.eks_cluster.id
type = "ingress"
}
# Allow communication between nodes
resource "aws_security_group_rule" "eks_nodes_self" {
description = "Allow nodes to communicate with each other"
security_group_id = aws_security_group.eks_nodes.id
# All protocols: node-to-node traffic includes UDP (for example, cluster DNS)
protocol = "-1"
from_port = 0
to_port = 0
self = true
type = "ingress"
}
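# A sketch of the matching rule in the other direction, assuming worker nodes
# are attached to aws_security_group.eks_nodes: the kubelet and pods reach the
# API server on TCP 443 behind the control plane security group. The resource
# name is a hypothetical addition, not part of the original configuration.
resource "aws_security_group_rule" "eks_cluster_ingress_nodes_https" {
  description              = "Allow worker nodes to reach the cluster API server"
  security_group_id        = aws_security_group.eks_cluster.id
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  source_security_group_id = aws_security_group.eks_nodes.id
  type                     = "ingress"
}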
# ------------------------------------------------------------------------------
# Cluster Autoscaler IAM Policy
# ------------------------------------------------------------------------------
resource "aws_iam_policy" "cluster_autoscaler" {
count = var.enable_cluster_autoscaler ? 1 : 0
name = "${local.name_prefix}-cluster-autoscaler-policy"
description = "IAM policy for Cluster Autoscaler"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeLaunchTemplateVersions"
]
Effect = "Allow"
Resource = "*"
}
]
})
tags = local.common_tags
}
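# A minimal sketch of binding the policy above to a Kubernetes service account
# through IRSA. Assumptions, none of which come from the original file: the
# autoscaler runs as service account "cluster-autoscaler" in "kube-system", and
# the resource and local names below are hypothetical.
locals {
  # OIDC issuer host and path without the scheme, as used in IAM condition keys
  eks_oidc_issuer = replace(aws_eks_cluster.openclaw_cluster.identity[0].oidc[0].issuer, "https://", "")
}

resource "aws_iam_role" "cluster_autoscaler" {
  count = var.enable_cluster_autoscaler && var.enable_irsa ? 1 : 0
  name  = "${local.name_prefix}-cluster-autoscaler-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.eks[0].arn
        }
        Condition = {
          StringEquals = {
            "${local.eks_oidc_issuer}:aud" = "sts.amazonaws.com"
            "${local.eks_oidc_issuer}:sub" = "system:serviceaccount:kube-system:cluster-autoscaler"
          }
        }
      }
    ]
  })

  tags = local.common_tags
}

resource "aws_iam_role_policy_attachment" "cluster_autoscaler" {
  count      = var.enable_cluster_autoscaler && var.enable_irsa ? 1 : 0
  policy_arn = aws_iam_policy.cluster_autoscaler[0].arn
  role       = aws_iam_role.cluster_autoscaler[0].name
}

# The service account would then carry the annotation
# eks.amazonaws.com/role-arn set to this role's ARN so that pods receive
# web identity credentials for it.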
# ------------------------------------------------------------------------------
# AWS Load Balancer Controller IAM Policy
# ------------------------------------------------------------------------------
resource "aws_iam_policy" "aws_load_balancer_controller" {
name = "${local.name_prefix}-aws-load-balancer-controller-policy"
description = "IAM policy for AWS Load Balancer Controller"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"iam:CreateServiceLinkedRole"
]
Effect = "Allow"
Resource = "*"
Condition = {
StringEquals = {
"iam:AWSServiceName" = "elasticloadbalancing.amazonaws.com"
}
}
},
{
Action = [
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInternetGateways",
"ec2:DescribeVpcs",
"ec2:DescribeVpcPeeringConnections",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeInstances",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeTags",
"ec2:GetCoipPoolUsage",
"ec2:DescribeCoipPools",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeListenerCertificates",
"elasticloadbalancing:DescribeSSLPolicies",
"elasticloadbalancing:DescribeRules",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetGroupAttributes",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:DescribeTags"
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"cognito-idp:DescribeUserPoolClient",
"acm:ListCertificates",
"acm:DescribeCertificate",
"iam:ListServerCertificates",
"iam:GetServerCertificate",
"waf-regional:GetWebACL",
"waf-regional:GetWebACLForResource",
"waf-regional:AssociateWebACL",
"waf-regional:DisassociateWebACL",
"wafv2:GetWebACL",
"wafv2:GetWebACLForResource",
"wafv2:AssociateWebACL",
"wafv2:DisassociateWebACL",
"shield:GetSubscriptionState",
"shield:DescribeProtection",
"shield:CreateProtection",
"shield:DeleteProtection"
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress"
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"ec2:CreateSecurityGroup"
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"ec2:CreateTags"
]
Effect = "Allow"
Resource = "arn:aws:ec2:*:*:security-group/*"
Condition = {
StringEquals = {
"ec2:CreateAction" = "CreateSecurityGroup"
}
Null = {
"aws:RequestTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"ec2:CreateTags",
"ec2:DeleteTags"
]
Effect = "Allow"
Resource = "arn:aws:ec2:*:*:security-group/*"
Condition = {
Null = {
"aws:RequestTag/elbv2.k8s.aws/cluster" = "true"
"aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress",
"ec2:DeleteSecurityGroup"
]
Effect = "Allow"
Resource = "*"
Condition = {
Null = {
"aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:CreateTargetGroup"
]
Effect = "Allow"
Resource = "*"
Condition = {
Null = {
"aws:RequestTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:DeleteListener",
"elasticloadbalancing:CreateRule",
"elasticloadbalancing:DeleteRule"
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"elasticloadbalancing:AddTags",
"elasticloadbalancing:RemoveTags"
]
Effect = "Allow"
Resource = [
"arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
"arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
]
Condition = {
Null = {
"aws:RequestTag/elbv2.k8s.aws/cluster" = "true"
"aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"elasticloadbalancing:AddTags",
"elasticloadbalancing:RemoveTags"
]
Effect = "Allow"
Resource = [
"arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*",
"arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*"
]
},
{
Action = [
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:SetIpAddressType",
"elasticloadbalancing:SetSecurityGroups",
"elasticloadbalancing:SetSubnets",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:ModifyTargetGroupAttributes",
"elasticloadbalancing:DeleteTargetGroup"
]
Effect = "Allow"
Resource = "*"
Condition = {
Null = {
"aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
}
}
},
{
Action = [
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:DeregisterTargets"
]
Effect = "Allow"
Resource = "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*"
}
]
})
tags = local.common_tags
}
+245
@@ -0,0 +1,245 @@
# ==============================================================================
# Heretek OpenClaw - AWS ElastiCache Redis Configuration
# ==============================================================================
# ElastiCache Redis for OpenClaw caching and session management
# ==============================================================================
# ------------------------------------------------------------------------------
# ElastiCache Subnet Group
# ------------------------------------------------------------------------------
resource "aws_elasticache_subnet_group" "openclaw" {
name = "${local.name_prefix}-redis-subnet-group"
subnet_ids = var.subnet_ids
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-redis-subnet-group"
})
}
# ------------------------------------------------------------------------------
# ElastiCache Security Group
# ------------------------------------------------------------------------------
resource "aws_security_group" "elasticache" {
name = "${local.name_prefix}-elasticache-sg"
description = "Security group for ElastiCache Redis"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-elasticache-sg"
})
}
# Allow Redis access from EKS nodes
resource "aws_security_group_rule" "elasticache_ingress_from_nodes" {
description = "Allow Redis access from EKS nodes"
security_group_id = aws_security_group.elasticache.id
protocol = "tcp"
from_port = 6379
to_port = 6379
source_security_group_id = var.security_group_ids[0]
type = "ingress"
}
# ------------------------------------------------------------------------------
# ElastiCache Parameter Group
# ------------------------------------------------------------------------------
resource "aws_elasticache_parameter_group" "openclaw" {
family = "redis7"
name = "${local.name_prefix}-redis-params"
parameter {
name = "maxmemory-policy"
value = "allkeys-lru"
}
parameter {
name = "timeout"
value = "300"
}
parameter {
name = "tcp-keepalive"
value = "60"
}
parameter {
name = "slowlog-log-slower-than"
value = "10000"
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-redis-params"
})
lifecycle {
create_before_destroy = true
}
}
# ------------------------------------------------------------------------------
# ElastiCache Redis Cluster (Replication Group)
# ------------------------------------------------------------------------------
resource "aws_elasticache_replication_group" "openclaw" {
replication_group_id = "${local.name_prefix}-redis"
description = "ElastiCache Redis cluster for OpenClaw"
node_type = var.redis_node_type
num_cache_clusters = var.redis_automatic_failover_enabled ? 2 : var.redis_num_cache_nodes
engine = "redis"
engine_version = var.redis_engine_version
parameter_group_name = aws_elasticache_parameter_group.openclaw.name
subnet_group_name = aws_elasticache_subnet_group.openclaw.name
security_group_ids = [aws_security_group.elasticache.id]
# Authentication
auth_token = var.redis_auth_token
at_rest_encryption_enabled = true
transit_encryption_enabled = true
# High availability
automatic_failover_enabled = var.redis_automatic_failover_enabled
multi_az_enabled = var.redis_multi_az_enabled
# Persistence
snapshot_retention_limit = var.environment == "prod" ? 7 : 0
snapshot_window = "03:00-04:00"
maintenance_window = "Mon:04:00-Mon:05:00"
# Notifications
notification_topic_arn = var.alarm_notification_arn
# Monitoring
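dynamic "log_delivery_configuration" {
# The slow-log group exists only in prod (count = 0 otherwise), so iterate over
# it instead of indexing [0], which fails when the group is absent.
for_each = aws_cloudwatch_log_group.slowlog
content {
destination = log_delivery_configuration.value.name
destination_type = "cloudwatch-logs"
log_format = "json"
log_type = "slow-log"
}
}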
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-redis"
})
lifecycle {
prevent_destroy = true
}
}
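# Because transit_encryption_enabled is true above, clients must connect over
# TLS and present the auth token. A hypothetical output (the name is not in the
# original) that exposes a TLS-scheme URL without embedding the token; port
# 6379 is assumed from the security group rule earlier in this file.
output "redis_tls_url" {
  description = "rediss:// URL for the primary endpoint (auth token supplied separately)"
  value       = "rediss://${aws_elasticache_replication_group.openclaw.primary_endpoint_address}:6379"
}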
# ------------------------------------------------------------------------------
# CloudWatch Log Group for Slow Query Log
# ------------------------------------------------------------------------------
resource "aws_cloudwatch_log_group" "slowlog" {
count = var.environment == "prod" ? 1 : 0
name = "/aws/elasticache/${local.name_prefix}-slowlog"
retention_in_days = 30
tags = local.common_tags
}
# ------------------------------------------------------------------------------
# ElastiCache Global Datastore (Cross-Region Replication for DR)
# ------------------------------------------------------------------------------
resource "aws_elasticache_global_replication_group" "openclaw" {
count = var.environment == "prod" && var.redis_multi_az_enabled ? 1 : 0
global_replication_group_id_suffix = "${local.name_prefix}-global"
primary_replication_group_id = aws_elasticache_replication_group.openclaw.id
}
# ------------------------------------------------------------------------------
# ElastiCache Serverless (Alternative for Variable Workloads)
# ------------------------------------------------------------------------------
resource "aws_elasticache_serverless_cache" "openclaw" {
count = var.environment == "dev" ? 1 : 0
name = "${local.name_prefix}-redis-serverless"
engine = "redis"
subnet_ids = var.subnet_ids
security_group_ids = [aws_security_group.elasticache.id]
major_engine_version = "7"
description = "Serverless Redis cache for development environment"
tags = local.common_tags
}
# ------------------------------------------------------------------------------
# CloudWatch Alarms for ElastiCache
# ------------------------------------------------------------------------------
resource "aws_cloudwatch_metric_alarm" "elasticache_cpu" {
count = var.enable_cloudwatch_alarms ? 1 : 0
alarm_name = "${local.name_prefix}-redis-cpu-utilization"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 80
alarm_description = "Redis CPU utilization is too high"
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
dimensions = {
CacheClusterId = aws_elasticache_replication_group.openclaw.primary_cluster_id
}
tags = local.common_tags
}
resource "aws_cloudwatch_metric_alarm" "elasticache_memory" {
count = var.enable_cloudwatch_alarms ? 1 : 0
alarm_name = "${local.name_prefix}-redis-memory-utilization"
comparison_operator = "LessThanThreshold"
evaluation_periods = 2
metric_name = "FreeableMemory"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 268435456 # 256 MiB
alarm_description = "Redis freeable memory is too low"
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
dimensions = {
CacheClusterId = aws_elasticache_replication_group.openclaw.primary_cluster_id
}
tags = local.common_tags
}
resource "aws_cloudwatch_metric_alarm" "elasticache_connections" {
count = var.enable_cloudwatch_alarms ? 1 : 0
alarm_name = "${local.name_prefix}-redis-connections"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CurrConnections"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 1000
alarm_description = "Redis current connections is too high"
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
dimensions = {
CacheClusterId = aws_elasticache_replication_group.openclaw.primary_cluster_id
}
tags = local.common_tags
}
+368
@@ -0,0 +1,368 @@
# ==============================================================================
# Heretek OpenClaw - AWS Terraform Configuration
# ==============================================================================
# Main configuration file for AWS infrastructure
# ==============================================================================
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.24"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.12"
}
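null = {
source = "hashicorp/null"
version = "~> 3.2"
}
# Required by random_string.suffix below
random = {
source = "hashicorp/random"
version = "~> 3.6"
}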
}
backend "s3" {
# Configure backend with variables or environment
# bucket = "terraform-state-bucket"
# key = "openclaw/terraform.tfstate"
# region = "us-east-1"
# encrypt = true
# dynamodb_table = "terraform-locks"
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "openclaw"
Environment = var.environment
ManagedBy = "terraform"
Owner = var.owner
}
}
}
provider "kubernetes" {
host = aws_eks_cluster.openclaw_cluster.endpoint
cluster_ca_certificate = base64decode(aws_eks_cluster.openclaw_cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.openclaw_cluster.token
}
provider "helm" {
kubernetes {
host = aws_eks_cluster.openclaw_cluster.endpoint
cluster_ca_certificate = base64decode(aws_eks_cluster.openclaw_cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.openclaw_cluster.token
}
}
# ==============================================================================
# Data Sources
# ==============================================================================
data "aws_eks_cluster_auth" "openclaw_cluster" {
name = aws_eks_cluster.openclaw_cluster.name
}
data "aws_availability_zones" "available" {
state = "available"
}
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
# ==============================================================================
# Local Values
# ==============================================================================
locals {
name_prefix = "openclaw-${var.environment}"
common_tags = {
Project = "openclaw"
Environment = var.environment
Version = var.app_version
ManagedBy = "terraform"
}
gpu_instance_types = var.enable_gpu_support ? var.gpu_instance_types : []
# ECR repository URLs
ecr_repository_urls = {
gateway = aws_ecr_repository.openclaw_gateway.repository_url
litellm = aws_ecr_repository.litellm_proxy.repository_url
}
}
# ==============================================================================
# Random Resources
# ==============================================================================
resource "random_string" "suffix" {
length = 8
special = false
upper = false
}
# ==============================================================================
# VPC Module
# ==============================================================================
module "vpc" {
source = "./vpc"
vpc_cidr = var.vpc_cidr
aws_region = var.aws_region
availability_zones = slice(data.aws_availability_zones.available.names, 0, 3)
name_prefix = local.name_prefix
enable_nat_gateway = var.enable_nat_gateway
single_nat_gateway = var.single_nat_gateway
enable_flow_logs = var.enable_vpc_flow_logs
flow_logs_retention = var.flow_logs_retention_days
public_subnet_cidrs = var.public_subnet_cidrs
private_subnet_cidrs = var.private_subnet_cidrs
database_subnet_cidrs = var.database_subnet_cidrs
tags = local.common_tags
}
# ==============================================================================
# EKS Cluster
# ==============================================================================
module "eks" {
source = "./eks"
cluster_name = "${local.name_prefix}-eks"
cluster_version = var.eks_version
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
# Control plane configuration
enable_irsa = var.enable_irsa
enable_cluster_autoscaler = var.enable_cluster_autoscaler
# Node group configuration
node_groups = var.node_groups
gpu_node_groups = local.gpu_instance_types
gpu_enabled = var.enable_gpu_support
# Addons
enable_aws_load_balancer_controller = true
enable_metrics_server = true
enable_cluster_autoscaler_addon = var.enable_cluster_autoscaler
tags = local.common_tags
}
# ==============================================================================
# RDS PostgreSQL
# ==============================================================================
module "rds" {
source = "./rds"
identifier_prefix = "${local.name_prefix}-pg"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.database_subnet_ids
security_group_ids = [module.eks.node_security_group_id]
# Database configuration
engine_version = var.postgresql_version
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
max_allocated_storage = var.db_max_allocated_storage
# Authentication
db_name = var.db_name
db_username = var.db_username
db_password = var.db_password
db_password_kms_key_id = var.db_password_kms_key_id
# High availability
multi_az = var.db_multi_az
publicly_accessible = false
# Backup and maintenance
backup_retention_period = var.db_backup_retention_period
backup_window = var.db_backup_window
maintenance_window = var.db_maintenance_window
# Monitoring
enabled_cloudwatch_logs_exports = ["postgresql"]
performance_insights_enabled = var.db_performance_insights_enabled
performance_insights_retention_period = var.db_performance_insights_retention
tags = local.common_tags
}
# ==============================================================================
# ElastiCache Redis
# ==============================================================================
module "elasticache" {
source = "./elasticache"
cache_cluster_id = "${local.name_prefix}-redis"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
security_group_ids = [module.eks.node_security_group_id]
# Redis configuration
node_type = var.redis_node_type
engine_version = var.redis_engine_version
num_cache_nodes = var.redis_num_cache_nodes
parameter_group_name = var.redis_parameter_group_name
# High availability
automatic_failover_enabled = var.redis_automatic_failover_enabled
multi_az_enabled = var.redis_multi_az_enabled
# Security
auth_token = var.redis_auth_token
auth_token_kms_key_id = var.redis_auth_token_kms_key_id
at_rest_encryption_enabled = true
transit_encryption_enabled = true
tags = local.common_tags
}
# ==============================================================================
# ECR Repositories
# ==============================================================================
module "ecr" {
source = "./ecr"
repositories = {
openclaw_gateway = {
name = "openclaw-gateway"
image_tag_mutability = "MUTABLE"
scan_on_push = true
}
litellm_proxy = {
name = "litellm-proxy"
image_tag_mutability = "MUTABLE"
scan_on_push = true
}
}
lifecycle_policy_enabled = true
lifecycle_policy_days = 30
tags = local.common_tags
}
# ==============================================================================
# Application Load Balancer
# ==============================================================================
module "alb" {
source = "./alb"
alb_name = "${local.name_prefix}-alb"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.public_subnet_ids
security_group_ids = [module.eks.node_security_group_id]
# Listener configuration
http_port = 80
https_port = 443
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
certificate_arn = var.acm_certificate_arn
# Target groups
target_groups = [
{
name = "openclaw-gateway"
port = 18789
protocol = "HTTP"
health_check_path = "/health"
},
{
name = "litellm-proxy"
port = 4000
protocol = "HTTP"
health_check_path = "/health"
}
]
enable_deletion_protection = var.alb_deletion_protection
enable_http2 = true
drop_invalid_header_fields = true
tags = local.common_tags
}
# ==============================================================================
# CloudWatch Monitoring
# ==============================================================================
module "cloudwatch" {
source = "./monitoring"
name_prefix = local.name_prefix
eks_cluster_name = aws_eks_cluster.openclaw_cluster.name
rds_identifier = module.rds.db_instance_identifier
redis_cluster_id = module.elasticache.redis_cluster_id
# Dashboard configuration
enable_dashboard = true
dashboard_name = "${local.name_prefix}-dashboard"
# Alarm configuration
enable_alarms = var.enable_cloudwatch_alarms
alarm_notification_arn = var.alarm_notification_arn
# Log groups
log_retention_days = var.log_retention_days
tags = local.common_tags
}
# ==============================================================================
# Outputs
# ==============================================================================
# vpc_id, eks_cluster_endpoint, and eks_cluster_name are declared in the
# outputs file. Terraform rejects duplicate output names within one module, so
# they are not repeated here.
output "rds_endpoint" {
description = "RDS PostgreSQL endpoint"
value = module.rds.db_instance_endpoint
}
output "redis_endpoint" {
description = "ElastiCache Redis endpoint"
value = module.elasticache.redis_endpoint
}
output "alb_dns_name" {
description = "ALB DNS name"
value = module.alb.alb_dns_name
}
output "ecr_repository_urls" {
description = "ECR repository URLs"
value = local.ecr_repository_urls
}
+263
@@ -0,0 +1,263 @@
# ==============================================================================
# Heretek OpenClaw - AWS Terraform Outputs
# ==============================================================================
# Output values for AWS infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# VPC Outputs
# ------------------------------------------------------------------------------
output "vpc_id" {
description = "The ID of the VPC"
value = module.vpc.vpc_id
}
output "vpc_cidr_block" {
description = "The CIDR block of the VPC"
value = module.vpc.vpc_cidr_block
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = module.vpc.public_subnet_ids
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = module.vpc.private_subnet_ids
}
output "database_subnet_ids" {
description = "List of database subnet IDs"
value = module.vpc.database_subnet_ids
}
output "nat_gateway_ids" {
description = "List of NAT Gateway IDs"
value = module.vpc.nat_gateway_ids
}
# ------------------------------------------------------------------------------
# EKS Outputs
# ------------------------------------------------------------------------------
output "eks_cluster_id" {
description = "The ID of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.id
}
output "eks_cluster_name" {
description = "The name of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.name
}
output "eks_cluster_endpoint" {
description = "The endpoint of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.endpoint
sensitive = true
}
output "eks_cluster_certificate_authority" {
description = "The certificate authority of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.certificate_authority[0].data
sensitive = true
}
output "eks_cluster_version" {
description = "The Kubernetes version of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.version
}
output "eks_cluster_security_group_id" {
description = "The security group ID of the EKS cluster"
value = aws_eks_cluster.openclaw_cluster.vpc_config[0].cluster_security_group_id
}
output "eks_node_security_group_id" {
description = "The node security group ID"
value = module.eks.node_security_group_id
}
output "eks_oidc_provider_arn" {
description = "The ARN of the OIDC provider"
value = aws_iam_openid_connect_provider.eks.arn
}
output "eks_kubeconfig_command" {
description = "Command to update kubeconfig"
value = "aws eks update-kubeconfig --name ${aws_eks_cluster.openclaw_cluster.name} --region ${var.aws_region}"
}
# ------------------------------------------------------------------------------
# RDS PostgreSQL Outputs
# ------------------------------------------------------------------------------
output "rds_instance_id" {
description = "The ID of the RDS instance"
value = module.rds.db_instance_id
}
output "rds_instance_arn" {
description = "The ARN of the RDS instance"
value = module.rds.db_instance_arn
}
output "rds_endpoint" {
description = "The endpoint of the RDS instance"
value = module.rds.db_instance_endpoint
}
output "rds_port" {
description = "The port of the RDS instance"
value = module.rds.db_instance_port
}
output "rds_database_name" {
description = "The name of the database"
value = module.rds.db_name
}
output "rds_username" {
description = "The master username of the RDS instance"
value = module.rds.db_username
sensitive = true
}
output "rds_connection_string" {
description = "The PostgreSQL connection string"
value = "postgresql://${module.rds.db_username}:${var.db_password}@${module.rds.db_instance_endpoint}/${module.rds.db_name}"
sensitive = true
}
output "rds_security_group_id" {
description = "The security group ID of the RDS instance"
value = module.rds.db_security_group_id
}
# ------------------------------------------------------------------------------
# ElastiCache Redis Outputs
# ------------------------------------------------------------------------------
output "redis_cluster_id" {
description = "The ID of the Redis cluster"
value = module.elasticache.redis_cluster_id
}
output "redis_endpoint" {
description = "The endpoint of the Redis cluster"
value = module.elasticache.redis_endpoint
}
output "redis_port" {
description = "The port of the Redis cluster"
value = module.elasticache.redis_port
}
output "redis_connection_string" {
description = "The Redis connection string"
# Password-only Redis URLs take the form redis://:PASSWORD@host:port (note the leading colon)
value = "redis://${var.redis_auth_token != null ? ":${var.redis_auth_token}@" : ""}${module.elasticache.redis_endpoint}:${module.elasticache.redis_port}"
sensitive = true
}
output "redis_security_group_id" {
description = "The security group ID of the Redis cluster"
value = module.elasticache.redis_security_group_id
}
# ------------------------------------------------------------------------------
# ECR Outputs
# ------------------------------------------------------------------------------
output "ecr_repository_arns" {
description = "ARNs of ECR repositories"
value = module.ecr.repository_arns
}
output "ecr_repository_urls" {
description = "URLs of ECR repositories"
value = module.ecr.repository_urls
}
output "ecr_registry_id" {
description = "ECR registry ID"
value = module.ecr.registry_id
}
# ------------------------------------------------------------------------------
# ALB Outputs
# ------------------------------------------------------------------------------
output "alb_id" {
description = "The ID of the ALB"
value = module.alb.alb_id
}
output "alb_arn" {
description = "The ARN of the ALB"
value = module.alb.alb_arn
}
output "alb_dns_name" {
description = "The DNS name of the ALB"
value = module.alb.alb_dns_name
}
output "alb_zone_id" {
description = "The Zone ID of the ALB"
value = module.alb.alb_zone_id
}
output "alb_security_group_id" {
description = "The security group ID of the ALB"
value = module.alb.alb_security_group_id
}
output "alb_http_listener_arn" {
description = "The ARN of the HTTP listener"
value = module.alb.http_listener_arn
}
output "alb_https_listener_arn" {
description = "The ARN of the HTTPS listener"
value = module.alb.https_listener_arn
}
# ------------------------------------------------------------------------------
# CloudWatch Outputs
# ------------------------------------------------------------------------------
output "cloudwatch_dashboard_arn" {
description = "The ARN of the CloudWatch dashboard"
value = module.cloudwatch.dashboard_arn
}
output "cloudwatch_log_groups" {
description = "Map of CloudWatch log group names"
value = module.cloudwatch.log_group_names
}
output "cloudwatch_alarm_arns" {
description = "List of CloudWatch alarm ARNs"
value = module.cloudwatch.alarm_arns
}
# ------------------------------------------------------------------------------
# Cost Estimation
# ------------------------------------------------------------------------------
output "estimated_monthly_cost" {
description = "Estimated monthly cost breakdown"
value = {
eks_cluster = "~$73 (control plane)"
# format() is used here because "$${" is Terraform's escape for a literal "${" and would disable interpolation
eks_nodes_general = format("~$%d (general nodes)", var.node_groups.general.desired_size * 140)
eks_nodes_compute = format("~$%d (compute nodes)", var.node_groups.compute.desired_size * 250)
eks_nodes_gpu = var.enable_gpu_support ? format("~$%d (GPU nodes)", 2 * 2000) : "$0"
rds_postgresql = format("~$%d (%s)", var.db_multi_az ? 250 : 125, var.db_instance_class)
elasticache_redis = format("~$%d (%s)", var.redis_multi_az_enabled ? 150 : 75, var.redis_node_type)
nat_gateway = var.single_nat_gateway ? "~$32" : "~$64"
alb = "~$16"
data_transfer = "Variable"
total_estimate = "See AWS Cost Explorer for accurate pricing"
}
}
+318
@@ -0,0 +1,318 @@
# ==============================================================================
# Heretek OpenClaw - AWS RDS PostgreSQL Configuration
# ==============================================================================
# RDS PostgreSQL database for OpenClaw
# ==============================================================================
# ------------------------------------------------------------------------------
# RDS Subnet Group
# ------------------------------------------------------------------------------
resource "aws_db_subnet_group" "openclaw" {
name = "${local.name_prefix}-db-subnet-group"
subnet_ids = var.database_subnet_ids
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-db-subnet-group"
})
}
# ------------------------------------------------------------------------------
# RDS Security Group
# ------------------------------------------------------------------------------
resource "aws_security_group" "rds" {
name = "${local.name_prefix}-rds-sg"
description = "Security group for RDS PostgreSQL"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-rds-sg"
})
}
# Allow PostgreSQL access from EKS nodes
resource "aws_security_group_rule" "rds_ingress_from_nodes" {
description = "Allow PostgreSQL access from EKS nodes"
security_group_id = aws_security_group.rds.id
protocol = "tcp"
from_port = 5432
to_port = 5432
source_security_group_id = var.security_group_ids[0]
type = "ingress"
}
# Allow outbound HTTPS traffic only
resource "aws_security_group_rule" "rds_egress" {
description = "Allow outbound HTTPS traffic"
security_group_id = aws_security_group.rds.id
protocol = "tcp"
from_port = 443
to_port = 443
cidr_blocks = ["0.0.0.0/0"]
type = "egress"
}
# ------------------------------------------------------------------------------
# RDS PostgreSQL Instance
# ------------------------------------------------------------------------------
resource "aws_db_instance" "openclaw" {
identifier = "${local.name_prefix}-pg"
# Engine configuration
engine = "postgres"
engine_version = var.postgresql_version
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
max_allocated_storage = var.db_max_allocated_storage
storage_type = "gp3"
storage_encrypted = true
kms_key_id = var.db_password_kms_key_id
# Database configuration
db_name = var.db_name
username = var.db_username
password = var.db_password
# Network configuration
db_subnet_group_name = aws_db_subnet_group.openclaw.name
vpc_security_group_ids = [aws_security_group.rds.id]
publicly_accessible = false
# High availability
multi_az = var.db_multi_az
availability_zone = var.db_multi_az ? null : data.aws_availability_zones.available.names[0]
# Backup configuration
backup_retention_period = var.db_backup_retention_period
backup_window = var.db_backup_window
maintenance_window = var.db_maintenance_window
copy_tags_to_snapshot = true
delete_automated_backups = var.environment == "dev"
skip_final_snapshot = var.environment == "dev"
final_snapshot_identifier = var.environment == "dev" ? null : "${local.name_prefix}-final-snapshot"
# Monitoring
enabled_cloudwatch_logs_exports = ["postgresql"]
performance_insights_enabled = var.db_performance_insights_enabled
performance_insights_retention_period = var.db_performance_insights_enabled ? var.db_performance_insights_retention : null
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
# Parameters
parameter_group_name = aws_db_parameter_group.openclaw.name
option_group_name = aws_db_option_group.openclaw.name
# Tags
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-pg"
})
lifecycle {
prevent_destroy = true
}
}
# ------------------------------------------------------------------------------
# RDS Parameter Group
# ------------------------------------------------------------------------------
resource "aws_db_parameter_group" "openclaw" {
name = "${local.name_prefix}-pg-params"
family = "postgres${var.postgresql_version}"
parameter {
name = "log_statement"
value = "all"
}
parameter {
name = "log_min_duration_statement"
value = "1000"
}
parameter {
name = "shared_preload_libraries"
value = "pg_stat_statements"
# Static parameter: it can only take effect after a reboot
apply_method = "pending-reboot"
}
parameter {
name = "pg_stat_statements.track"
value = "all"
}
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-pg-params"
})
}
# ------------------------------------------------------------------------------
# RDS Option Group
# ------------------------------------------------------------------------------
resource "aws_db_option_group" "openclaw" {
name = "${local.name_prefix}-pg-options"
option_group_description = "Option group for OpenClaw PostgreSQL"
engine_name = "postgres"
major_engine_version = var.postgresql_version
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-pg-options"
})
}
# ------------------------------------------------------------------------------
# RDS Monitoring IAM Role
# ------------------------------------------------------------------------------
resource "aws_iam_role" "rds_monitoring" {
name = "${local.name_prefix}-rds-monitoring-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "monitoring.rds.amazonaws.com"
}
}
]
})
tags = local.common_tags
}
resource "aws_iam_role_policy_attachment" "rds_monitoring" {
role = aws_iam_role.rds_monitoring.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
# ------------------------------------------------------------------------------
# RDS Read Replica (Optional for Production)
# ------------------------------------------------------------------------------
resource "aws_db_instance" "openclaw_replica" {
count = var.environment == "prod" && var.db_multi_az ? 1 : 0
identifier = "${local.name_prefix}-pg-replica"
replicate_source_db = aws_db_instance.openclaw.identifier
instance_class = var.db_instance_class
# Network configuration
# A same-region replica inherits the source instance's DB subnet group, so db_subnet_group_name is not set here
vpc_security_group_ids = [aws_security_group.rds.id]
publicly_accessible = false
# Backup configuration
backup_retention_period = 0
skip_final_snapshot = true
# Monitoring
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-pg-replica"
Role = "read-replica"
})
}
# ------------------------------------------------------------------------------
# RDS Proxy (Optional for Connection Pooling)
# ------------------------------------------------------------------------------
resource "aws_secretsmanager_secret" "rds_credentials" {
name = "${local.name_prefix}/rds/credentials"
tags = local.common_tags
}
resource "aws_secretsmanager_secret_version" "rds_credentials" {
secret_id = aws_secretsmanager_secret.rds_credentials.id
secret_string = jsonencode({
username = var.db_username
password = var.db_password
dbname = var.db_name
host = aws_db_instance.openclaw.address
port = aws_db_instance.openclaw.port
})
}
resource "aws_db_proxy" "openclaw" {
count = var.environment == "prod" ? 1 : 0
name = "${local.name_prefix}-db-proxy"
debug_logging = false
engine_family = "POSTGRESQL"
idle_client_timeout = 1800
require_tls = true
role_arn = aws_iam_role.rds_proxy[0].arn
vpc_security_group_ids = [aws_security_group.rds.id]
vpc_subnet_ids = var.database_subnet_ids
auth {
auth_scheme = "SECRETS"
iam_auth = "DISABLED"
secret_arn = aws_secretsmanager_secret.rds_credentials.arn
}
tags = local.common_tags
}
resource "aws_db_proxy_default_target_group" "openclaw" {
count = var.environment == "prod" ? 1 : 0
db_proxy_name = aws_db_proxy.openclaw[0].name
connection_pool_config {
connection_borrow_timeout = 120
max_connections_percent = 100
max_idle_connections_percent = 50
session_pinning_filters = ["EXCLUDE_VARIABLE_SETS"]
}
}
resource "aws_db_proxy_target" "openclaw" {
count = var.environment == "prod" ? 1 : 0
db_instance_identifier = aws_db_instance.openclaw.identifier
db_proxy_name = aws_db_proxy.openclaw[0].name
target_group_name = aws_db_proxy_default_target_group.openclaw[0].name
}
# ------------------------------------------------------------------------------
# RDS Proxy IAM Role
# ------------------------------------------------------------------------------
resource "aws_iam_role" "rds_proxy" {
count = var.environment == "prod" ? 1 : 0
name = "${local.name_prefix}-rds-proxy-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "rds.amazonaws.com"
}
}
]
})
tags = local.common_tags
}
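resource "aws_iam_role_policy" "rds_proxy" {
count = var.environment == "prod" ? 1 : 0
name = "${local.name_prefix}-rds-proxy-policy"
role = aws_iam_role.rds_proxy[0].id
# Grant the proxy read access to the database credentials secret
policy = jsonencode({
Version = "2012-10-17"
Statement = [{ Effect = "Allow", Action = "secretsmanager:GetSecretValue", Resource = aws_secretsmanager_secret.rds_credentials.arn }]
})
}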
+351
@@ -0,0 +1,351 @@
# ==============================================================================
# Heretek OpenClaw - AWS Terraform Variables
# ==============================================================================
# Input variables for AWS infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------
variable "aws_region" {
description = "AWS region for resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Deployment environment (dev, staging, prod)"
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "owner" {
description = "Owner of the resources"
type = string
default = "platform-team"
}
variable "app_version" {
description = "Application version to deploy"
type = string
default = "2026.3.28"
}
# ------------------------------------------------------------------------------
# VPC Configuration
# ------------------------------------------------------------------------------
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "public_subnet_cidrs" {
description = "CIDR blocks for public subnets"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
variable "private_subnet_cidrs" {
description = "CIDR blocks for private subnets"
type = list(string)
default = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}
variable "database_subnet_cidrs" {
description = "CIDR blocks for database subnets"
type = list(string)
default = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use single NAT Gateway (cost optimization for dev)"
type = bool
default = false
}
variable "enable_vpc_flow_logs" {
description = "Enable VPC Flow Logs"
type = bool
default = true
}
variable "flow_logs_retention_days" {
description = "Retention period for VPC Flow Logs"
type = number
default = 30
}
# ------------------------------------------------------------------------------
# EKS Configuration
# ------------------------------------------------------------------------------
variable "eks_version" {
description = "Kubernetes version for EKS"
type = string
default = "1.28"
}
variable "enable_irsa" {
description = "Enable IAM Roles for Service Accounts"
type = bool
default = true
}
variable "enable_cluster_autoscaler" {
description = "Enable Cluster Autoscaler"
type = bool
default = true
}
variable "node_groups" {
description = "EKS node group configurations"
type = object({
general = object({
instance_types = list(string)
min_size = number
max_size = number
desired_size = number
disk_size = number
})
compute = object({
instance_types = list(string)
min_size = number
max_size = number
desired_size = number
disk_size = number
})
})
default = {
general = {
instance_types = ["m6i.xlarge", "m6i.2xlarge"]
min_size = 1
max_size = 4
desired_size = 2
disk_size = 50
}
compute = {
instance_types = ["c6i.2xlarge", "c6i.4xlarge"]
min_size = 1
max_size = 8
desired_size = 2
disk_size = 100
}
}
}
variable "enable_gpu_support" {
description = "Enable GPU node group for Ollama"
type = bool
default = false
}
variable "gpu_instance_types" {
description = "GPU instance types for Ollama (G5 for NVIDIA)"
type = list(string)
default = ["g5.xlarge", "g5.2xlarge"]
}
# ------------------------------------------------------------------------------
# RDS PostgreSQL Configuration
# ------------------------------------------------------------------------------
variable "postgresql_version" {
description = "PostgreSQL engine version"
type = string
default = "15"
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.m6i.large"
}
variable "db_allocated_storage" {
description = "Initial allocated storage in GB"
type = number
default = 50
}
variable "db_max_allocated_storage" {
description = "Maximum allocated storage in GB"
type = number
default = 500
}
variable "db_name" {
description = "Database name"
type = string
default = "openclaw"
}
variable "db_username" {
description = "Database master username"
type = string
default = "openclaw"
sensitive = true
}
variable "db_password" {
description = "Database master password"
type = string
default = null
sensitive = true
}
variable "db_password_kms_key_id" {
description = "KMS key ARN used for RDS storage encryption (passed to kms_key_id on the DB instance)"
type = string
default = null
}
variable "db_multi_az" {
description = "Enable Multi-AZ deployment"
type = bool
default = false
}
variable "db_backup_retention_period" {
description = "Backup retention period in days"
type = number
default = 7
}
variable "db_backup_window" {
description = "Preferred backup window"
type = string
default = "03:00-04:00"
}
variable "db_maintenance_window" {
description = "Preferred maintenance window"
type = string
default = "Mon:04:00-Mon:05:00"
}
variable "db_performance_insights_enabled" {
description = "Enable Performance Insights"
type = bool
default = true
}
variable "db_performance_insights_retention" {
description = "Performance Insights retention period in days"
type = number
default = 7
}
# ------------------------------------------------------------------------------
# ElastiCache Redis Configuration
# ------------------------------------------------------------------------------
variable "redis_node_type" {
description = "Redis node type"
type = string
default = "cache.m6g.large"
}
variable "redis_engine_version" {
description = "Redis engine version"
type = string
default = "7.0"
}
variable "redis_num_cache_nodes" {
description = "Number of cache nodes"
type = number
default = 1
}
variable "redis_parameter_group_name" {
description = "Redis parameter group name"
type = string
default = "default.redis7"
}
variable "redis_automatic_failover_enabled" {
description = "Enable automatic failover (requires at least one read replica)"
type = bool
default = false
}
variable "redis_multi_az_enabled" {
description = "Enable Multi-AZ for Redis"
type = bool
default = false
}
variable "redis_auth_token" {
description = "Redis authentication token"
type = string
default = null
sensitive = true
}
variable "redis_auth_token_kms_key_id" {
description = "KMS key ID for encrypting redis_auth_token"
type = string
default = null
}
# ------------------------------------------------------------------------------
# ECR Configuration
# ------------------------------------------------------------------------------
variable "lifecycle_policy_days" {
description = "Days to retain images in ECR"
type = number
default = 30
}
# ------------------------------------------------------------------------------
# ALB Configuration
# ------------------------------------------------------------------------------
variable "acm_certificate_arn" {
description = "ACM certificate ARN for HTTPS listener"
type = string
default = null
}
variable "alb_deletion_protection" {
description = "Enable ALB deletion protection"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# CloudWatch Configuration
# ------------------------------------------------------------------------------
variable "enable_cloudwatch_alarms" {
description = "Enable CloudWatch alarms"
type = bool
default = true
}
variable "alarm_notification_arn" {
description = "SNS topic ARN for alarm notifications"
type = string
default = null
}
variable "log_retention_days" {
description = "CloudWatch Logs retention period"
type = number
default = 30
}
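# ------------------------------------------------------------------------------
# Example: production overrides (illustrative sketch)
# ------------------------------------------------------------------------------
# The variables above could be overridden from a tfvars file such as the one
# sketched below. The file name (prod.tfvars) and every value shown are
# assumptions for illustration, not settings shipped with this configuration.
# In the RDS configuration, the read replica is created only when environment
# is "prod" and db_multi_az is true, and the RDS Proxy only when environment
# is "prod", so these two values change what gets provisioned.
#
#   environment             = "prod"
#   aws_region              = "us-east-1"
#   single_nat_gateway      = false
#   db_multi_az             = true
#   db_instance_class       = "db.m6i.large"
#   db_backup_retention_period = 7
#   alb_deletion_protection = true
#   log_retention_days      = 30
#
# Usage (assuming the hypothetical file name above):
#   terraform plan -var-file=prod.tfvars
#
# Sensitive values such as db_password and redis_auth_token are probably better
# supplied through the TF_VAR_db_password and TF_VAR_redis_auth_token
# environment variables than written to a file.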
+294
@@ -0,0 +1,294 @@
# ==============================================================================
# Heretek OpenClaw - AWS VPC Configuration
# ==============================================================================
# VPC module for OpenClaw infrastructure
# ==============================================================================
# The VPC resources are currently defined inline in this file. The planned
# layout moves them into the ./vpc subdirectory module referenced in main.tf.
#
# The VPC module creates:
# - VPC with configurable CIDR
# - Public subnets across multiple AZs
# - Private subnets for application workloads
# - Database subnets for RDS
# - Internet Gateway
# - NAT Gateways (configurable)
# - Route tables
# - VPC Flow Logs
#
# Usage in main.tf:
# module "vpc" {
# source = "./vpc"
# ...
# }
# ------------------------------------------------------------------------------
# VPC Module Structure
# ------------------------------------------------------------------------------
#
# File: deploy/aws/terraform/vpc/main.tf
# File: deploy/aws/terraform/vpc/variables.tf
# File: deploy/aws/terraform/vpc/outputs.tf
#
# For now, we inline the VPC resources here for simplicity.
# In production, extract to a separate module.
# ------------------------------------------------------------------------------
# VPC
# ------------------------------------------------------------------------------
resource "aws_vpc" "openclaw" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${local.name_prefix}-vpc"
}
}
# ------------------------------------------------------------------------------
# Internet Gateway
# ------------------------------------------------------------------------------
resource "aws_internet_gateway" "openclaw" {
vpc_id = aws_vpc.openclaw.id
tags = {
Name = "${local.name_prefix}-igw"
}
}
# ------------------------------------------------------------------------------
# Public Subnets
# ------------------------------------------------------------------------------
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.openclaw.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = element(data.aws_availability_zones.available.names, count.index)
map_public_ip_on_launch = true
tags = {
Name = "${local.name_prefix}-public-${count.index + 1}"
Type = "public"
}
}
# ------------------------------------------------------------------------------
# Private Subnets
# ------------------------------------------------------------------------------
resource "aws_subnet" "private" {
count = length(var.private_subnet_cidrs)
vpc_id = aws_vpc.openclaw.id
cidr_block = var.private_subnet_cidrs[count.index]
availability_zone = element(data.aws_availability_zones.available.names, count.index)
tags = {
Name = "${local.name_prefix}-private-${count.index + 1}"
Type = "private"
}
}
# ------------------------------------------------------------------------------
# Database Subnets
# ------------------------------------------------------------------------------
resource "aws_subnet" "database" {
count = length(var.database_subnet_cidrs)
vpc_id = aws_vpc.openclaw.id
cidr_block = var.database_subnet_cidrs[count.index]
availability_zone = element(data.aws_availability_zones.available.names, count.index)
tags = {
Name = "${local.name_prefix}-database-${count.index + 1}"
Type = "database"
}
}
# ------------------------------------------------------------------------------
# Database Subnet Group
# ------------------------------------------------------------------------------
resource "aws_db_subnet_group" "openclaw" {
name = "${local.name_prefix}-db-subnet-group"
subnet_ids = aws_subnet.database[*].id
tags = {
Name = "${local.name_prefix}-db-subnet-group"
}
}
# ------------------------------------------------------------------------------
# Elastic IP for NAT Gateway
# ------------------------------------------------------------------------------
resource "aws_eip" "nat" {
count = var.single_nat_gateway ? 1 : length(var.public_subnet_cidrs)
domain = "vpc"
tags = {
Name = "${local.name_prefix}-nat-eip-${count.index + 1}"
}
depends_on = [aws_internet_gateway.openclaw]
}
# ------------------------------------------------------------------------------
# NAT Gateway
# ------------------------------------------------------------------------------
resource "aws_nat_gateway" "openclaw" {
count = var.single_nat_gateway ? 1 : length(var.public_subnet_cidrs)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${local.name_prefix}-nat-${count.index + 1}"
}
depends_on = [aws_internet_gateway.openclaw]
}
# ------------------------------------------------------------------------------
# Route Tables
# ------------------------------------------------------------------------------
# Public route table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.openclaw.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.openclaw.id
}
tags = {
Name = "${local.name_prefix}-public-rt"
Type = "public"
}
}
# Private route table
resource "aws_route_table" "private" {
count = var.single_nat_gateway ? 1 : length(var.private_subnet_cidrs)
vpc_id = aws_vpc.openclaw.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.openclaw[var.single_nat_gateway ? 0 : count.index].id
}
tags = {
Name = "${local.name_prefix}-private-rt-${count.index + 1}"
Type = "private"
}
}
# ------------------------------------------------------------------------------
# Route Table Associations
# ------------------------------------------------------------------------------
resource "aws_route_table_association" "public" {
count = length(var.public_subnet_cidrs)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(var.private_subnet_cidrs)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[var.single_nat_gateway ? 0 : count.index].id
}
resource "aws_route_table_association" "database" {
count = length(var.database_subnet_cidrs)
subnet_id = aws_subnet.database[count.index].id
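# Database subnets use the private route tables so they are not routed through the Internet Gateway
route_table_id = aws_route_table.private[var.single_nat_gateway ? 0 : count.index].id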
}
# ------------------------------------------------------------------------------
# VPC Flow Logs
# ------------------------------------------------------------------------------
resource "aws_cloudwatch_log_group" "flow_logs" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "/aws/vpc/${local.name_prefix}-flow-logs"
retention_in_days = var.flow_logs_retention_days
tags = {
Name = "${local.name_prefix}-flow-logs"
}
}
resource "aws_flow_log" "openclaw" {
count = var.enable_vpc_flow_logs ? 1 : 0
iam_role_arn = aws_iam_role.flow_logs[0].arn
log_destination = aws_cloudwatch_log_group.flow_logs[0].arn
traffic_type = "ALL"
vpc_id = aws_vpc.openclaw.id
tags = {
Name = "${local.name_prefix}-flow-log"
}
}
resource "aws_iam_role" "flow_logs" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "${local.name_prefix}-flow-logs-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "vpc-flow-logs.amazonaws.com"
}
}
]
})
tags = {
Name = "${local.name_prefix}-flow-logs-role"
}
}
resource "aws_iam_role_policy" "flow_logs" {
count = var.enable_vpc_flow_logs ? 1 : 0
name = "${local.name_prefix}-flow-logs-policy"
role = aws_iam_role.flow_logs[0].id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams"
]
Effect = "Allow"
Resource = "*"
}
]
})
}
@@ -0,0 +1,522 @@
# Azure Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides comprehensive instructions for deploying Heretek OpenClaw on Microsoft Azure using Terraform Infrastructure as Code (IaC).
---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Cost Estimates](#cost-estimates)
5. [Quick Start](#quick-start)
6. [Configuration](#configuration)
7. [Deployment Steps](#deployment-steps)
8. [Post-Deployment](#post-deployment)
9. [GPU Support](#gpu-support)
10. [Monitoring](#monitoring)
11. [Backup & Recovery](#backup--recovery)
12. [Cleanup](#cleanup)
---
## Overview
This Terraform configuration deploys a production-ready OpenClaw environment on Azure with:
- **AKS (Azure Kubernetes Service)** - Managed Kubernetes cluster
- **Azure Database for PostgreSQL** - Flexible Server with pgvector support
- **Azure Cache for Redis** - Managed Redis for caching and sessions
- **Azure Container Registry (ACR)** - Private container registry
- **Application Gateway** - Traffic routing and SSL termination
- **Azure Monitor** - Metrics, logging, and alerting
### Components
| Component | Service | Purpose |
|-----------|---------|---------|
| Gateway | AKS | OpenClaw Gateway (port 18789) |
| LiteLLM | AKS | LLM proxy and routing (port 4000) |
| Database | Azure Database for PostgreSQL 15 | Primary data store with pgvector |
| Cache | Azure Cache for Redis | Session management, caching |
| Container Registry | ACR | Private image storage |
| Load Balancer | Application Gateway | HTTPS termination, routing |
| Monitoring | Azure Monitor | Metrics, logs, alerts |
---
## Prerequisites
### Required Tools
```bash
# Install Terraform
brew install terraform # macOS
# or download from https://www.terraform.io/downloads
# Install Azure CLI
brew install azure-cli # macOS
# or follow https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
# Install kubectl
brew install kubectl
# Install Helm
brew install helm
```
### Azure Account Setup
1. **Azure Subscription** - Active subscription with sufficient credits
2. **Service Principal** - Service principal with the Contributor role on the target subscription
3. **Budget Alert** - Set up cost alerts in Azure Cost Management
### Configure Azure Credentials
```bash
# Login to Azure
az login
# Set subscription
az account set --subscription "YOUR_SUBSCRIPTION_ID"
# Create service principal for Terraform
# The JSON output contains appId, password, and tenant
az ad sp create-for-rbac --name "openclaw-terraform" --role Contributor \
--scopes /subscriptions/YOUR_SUBSCRIPTION_ID
# Set environment variables from that output
export ARM_CLIENT_ID="your-app-id" # appId
export ARM_CLIENT_SECRET="your-password" # password
export ARM_SUBSCRIPTION_ID="your-subscription-id"
export ARM_TENANT_ID="your-tenant-id" # tenant
```
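Terraform's `azurerm` provider reads the four `ARM_*` variables above from the environment, and a missing one tends to surface only as an authentication error at plan time. A minimal pre-flight sketch; the function name `check_arm_env` is illustrative and not part of the repository:

```bash
# Report every ARM_* variable that is unset or empty.
# Returns 0 when all four are present, 1 otherwise.
check_arm_env() {
  local name value missing=0
  for name in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_SUBSCRIPTION_ID ARM_TENANT_ID; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_arm_env && terraform plan` starts planning only when the credentials are complete.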
### Required Azure Permissions
| Service | Required Permissions |
|---------|---------------------|
| AKS | Contributor |
| Virtual Network | Network Contributor |
| PostgreSQL | PostgreSQL Server Contributor |
| Redis | Redis Cache Contributor |
| ACR | AcrPush |
| Application Gateway | Network Contributor |
| Key Vault | Key Vault Administrator |
| Monitor | Monitoring Contributor |
### Enable Required Resource Providers
```bash
az provider register --namespace Microsoft.ContainerService
az provider register --namespace Microsoft.DBforPostgreSQL
az provider register --namespace Microsoft.Cache
az provider register --namespace Microsoft.ContainerRegistry
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.KeyVault
```
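`az provider register` returns before registration has finished, so a deployment started immediately afterwards can still hit an unregistered namespace. A polling sketch, assuming the Azure CLI is already logged in; `wait_for_provider` and `POLL_SECONDS` are illustrative names:

```bash
# Poll a resource provider until it reports "Registered".
# $1 = namespace, $2 = maximum number of polls (default 30).
wait_for_provider() {
  local ns="$1" tries="${2:-30}" state
  while [ "$tries" -gt 0 ]; do
    state=$(az provider show --namespace "$ns" --query registrationState --output tsv)
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timed out waiting for $ns" >&2
  return 1
}
```

Call it once per namespace registered above, for example `wait_for_provider Microsoft.ContainerService`, before running `terraform apply`.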
---
## Architecture
```
                ┌─────────────────────────────────────────────┐
                │               Microsoft Azure               │
                │                   East US                   │
                └─────────────────────────────────────────────┘
            ┌──────────────────────────┼──────────────────────────┐
            │                          │                          │
            ▼                          ▼                          ▼
┌───────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
│     Availability      │  │     Availability      │  │     Availability      │
│        Zone 1         │  │        Zone 2         │  │        Zone 3         │
│                       │  │                       │  │                       │
│  ┌────────────────┐   │  │  ┌────────────────┐   │  │  ┌────────────────┐   │
│  │  AKS Nodes     │   │  │  │  AKS Nodes     │   │  │  │  AKS Nodes     │   │
│  │  (System)      │   │  │  │  (User)        │   │  │  │  (GPU)         │   │
│  └────────────────┘   │  │  └────────────────┘   │  │  └────────────────┘   │
│                       │  │                       │  │                       │
│  ┌────────────────┐   │  │  ┌────────────────┐   │  │  ┌────────────────┐   │
│  │  PostgreSQL    │   │  │  │  Azure Cache   │   │  │  │  ACR           │   │
│  │  Flexible      │   │  │  │  for Redis     │   │  │  │                │   │
│  │  Server        │   │  │  │                │   │  │  │                │   │
│  └────────────────┘   │  │  └────────────────┘   │  │  └────────────────┘   │
└───────────────────────┘  └───────────────────────┘  └───────────────────────┘
            │                          │                          │
            └──────────────────────────┼──────────────────────────┘
            ┌──────────────────────────┼──────────────────────────┐
            │                          │                          │
            ▼                          ▼                          ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Application Gateway                             │
│                        (WAF_v2 with SSL Termination)                        │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                                Azure Monitor                                │
│                     (Log Analytics, Alerts, Dashboard)                      │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Cost Estimates
### Development Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| AKS Cluster | Standard | $73.00 |
| AKS Nodes | 2x Standard_D4s_v3 | $280.00 |
| PostgreSQL | General Purpose, 2 vCores, 100 GB | $150.00 |
| Redis Cache | C2 Standard | $100.00 |
| Application Gateway | Standard_v2 | $30.00 |
| ACR | Standard | $10.00 |
| Azure Monitor | Standard | $50.00 |
| Network Egress | Estimated | $30.00 |
| **Total** | | **~$723.00/month** |
### Production Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| AKS Cluster | Standard | $73.00 |
| AKS Nodes System | 3x Standard_D2s_v3 | $210.00 |
| AKS Nodes User | 4x Standard_D8s_v3 | $1,120.00 |
| AKS Nodes GPU | 2x Standard_NC4as_T4_v3 | $5,000.00 |
| PostgreSQL | General Purpose, 4 vCores, zone-redundant HA, 200 GB | $400.00 |
| Redis Cache | C6 Premium | $400.00 |
| Application Gateway | WAF_v2 | $100.00 |
| ACR | Premium | $50.00 |
| Azure Monitor | Premium | $100.00 |
| Key Vault | Standard | $5.00 |
| Network Egress | Estimated | $150.00 |
| **Total** | | **~$7,608.00/month** |
> **Note:** GPU costs are significant. Consider using spot instances or scheduling for cost optimization.
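
The totals in both tables can be re-derived from their line items, which is useful when a row is adjusted for a different SKU. A small arithmetic sketch using the whole-dollar figures from the tables; `sum_costs` is an illustrative helper, and the prices themselves are the guide's estimates rather than quoted Azure rates:

```bash
# Sum whole-dollar line items and print the total.
sum_costs() {
  local total=0 item
  for item in "$@"; do
    total=$((total + item))
  done
  echo "$total"
}

# Line items copied from the two tables, in row order.
dev_total=$(sum_costs 73 280 150 100 30 10 50 30)
prod_total=$(sum_costs 73 210 1120 5000 400 400 100 50 100 5 150)
# Integer percentage of the production total spent on the GPU node pool.
gpu_share=$((5000 * 100 / prod_total))

echo "dev=${dev_total} prod=${prod_total} gpu_share=${gpu_share}%"
# prints: dev=723 prod=7608 gpu_share=65%
```

The GPU node pool accounts for roughly two thirds of the production estimate, which is why the note singles it out.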
### Cost Optimization Tips
1. **Use Azure Reserved VM Instances** for predictable workloads (up to 72% savings)
2. **Use Azure Spot VMs** for non-critical workloads
3. **Enable AKS Cluster Autoscaler** to scale nodes based on demand
4. **Use PostgreSQL Burstable SKU** for development environments
5. **Enable Azure Cost Management budgets**
---
## Quick Start
### Clone Repository
```bash
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw/deploy/azure/terraform
```
### Initialize Terraform
```bash
terraform init
```
### Create Terraform Variables File
```bash
cat > terraform.tfvars <<EOF
resource_group_name = "openclaw-rg"
location = "eastus"
environment = "dev"
vnet_address_space = ["10.0.0.0/16"]
db_administrator_login = "openclaw"
db_administrator_password = "generate-secure-password"
redis_password = "generate-secure-token"
# Optional: GPU support for Ollama
enable_gpu_support = false
# Optional: Custom domain
domain_name_label = "openclaw-dev"
EOF
```
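Writing passwords into `terraform.tfvars` leaves them on disk and easy to commit by accident. Terraform also reads any input variable from an environment variable named `TF_VAR_<variable>`, so one alternative is to generate the values in the shell. A sketch; the helper names are illustrative, and the three-character-class check is an assumption about the password policy the PostgreSQL server enforces:

```bash
# Print 32 base64 characters derived from 24 random bytes.
gen_secret() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
}

# Succeed when the string contains upper-case, lower-case, and digit characters.
is_complex() {
  case "$1" in *[[:upper:]]*) ;; *) return 1 ;; esac
  case "$1" in *[[:lower:]]*) ;; *) return 1 ;; esac
  case "$1" in *[[:digit:]]*) ;; *) return 1 ;; esac
}

# Regenerate until the complexity check passes.
gen_complex_secret() {
  local s
  s=$(gen_secret)
  until is_complex "$s"; do
    s=$(gen_secret)
  done
  printf '%s\n' "$s"
}

# Terraform maps TF_VAR_<name> onto the input variable <name>.
TF_VAR_db_administrator_password=$(gen_complex_secret)
TF_VAR_redis_password=$(gen_complex_secret)
export TF_VAR_db_administrator_password TF_VAR_redis_password
```

With these exported, the two password lines can be left out of `terraform.tfvars`. The values exist only in the current shell, so record them in a secret store before closing it.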
### Plan and Apply
```bash
# Review the plan
terraform plan -out=tfplan
# Apply the configuration
terraform apply tfplan
```
### Configure kubectl
```bash
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
```
### Deploy OpenClaw to AKS
```bash
cd ../../kubernetes
kubectl apply -k overlays/dev
```
---
## Configuration
### Input Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `resource_group_name` | Resource group name | `openclaw-rg` | No |
| `location` | Azure region | `eastus` | No |
| `environment` | Environment name | `dev` | Yes |
| `vnet_address_space` | VNet CIDR | `["10.0.0.0/16"]` | No |
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
| `db_administrator_password` | PostgreSQL password | `null` | Yes |
| `redis_password` | Redis password | `null` | Yes |
| `domain_name_label` | DNS label for gateway | `null` | No |
### Environment-Specific Overrides
#### Development (`terraform.dev.tfvars`)
```hcl
environment = "dev"
db_geo_redundant_backup = false
redis_sku_name = "Basic"
enable_monitoring_alerts = false
default_node_pool = {
name = "default"
vm_size = "Standard_D2s_v3"
node_count = 1
min_count = 1
max_count = 2
enable_auto_scaling = true
}
```
#### Production (`terraform.prod.tfvars`)
```hcl
environment = "prod"
db_geo_redundant_backup = true
redis_sku_name = "Premium"
enable_monitoring_alerts = true
enable_private_cluster = true
default_node_pool = {
name = "default"
vm_size = "Standard_D8s_v3"
node_count = 3
min_count = 3
max_count = 10
enable_auto_scaling = true
}
gpu_node_pool = {
name = "gpu"
vm_size = "Standard_NC4as_T4_v3"
node_count = 2
min_count = 1
max_count = 4
enable_auto_scaling = true
}
```
---
## Deployment Steps
### Step 1: Prepare Azure Subscription
```bash
# Verify Azure CLI configuration
az account show
# Check subscription quota
az vm list-usage --location eastus
# Enable required providers
az provider register --namespace Microsoft.ContainerService
az provider register --namespace Microsoft.DBforPostgreSQL
```
### Step 2: Configure Terraform Backend
```bash
# Create resource group
az group create --name openclaw-tfstate-rg --location eastus
# Create storage account
az storage account create --name tfstateopenclaw --resource-group openclaw-tfstate-rg \
--location eastus --sku Standard_LRS
# Create container
az storage container create --name tfstate --account-name tfstateopenclaw
```
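Storage account names are global across Azure and limited to 3-24 lower-case letters and digits, so `tfstateopenclaw` may already be taken. A sketch that validates a candidate name and derives one with a random suffix; both function names are illustrative:

```bash
# Succeed when $1 is 3-24 characters of lower-case ASCII letters and digits.
valid_storage_account_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Append 8 random hex characters to a prefix.
unique_storage_account_name() {
  local suffix
  suffix=$(head -c 4 /dev/urandom | od -An -tx1 | tr -d ' \n')
  printf '%s%s\n' "$1" "$suffix"
}

name=$(unique_storage_account_name tfstateopenclaw)
valid_storage_account_name "$name" && echo "using $name"
```

The same `$name` then has to be used for `az storage account create --name`, for `--account-name` when creating the container, and for the `storage_account_name` backend setting in Step 3.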
### Step 3: Initialize and Apply
```bash
# Initialize with Azure backend
terraform init \
-backend-config="resource_group_name=openclaw-tfstate-rg" \
-backend-config="storage_account_name=tfstateopenclaw" \
-backend-config="container_name=tfstate" \
-backend-config="key=openclaw/dev/terraform.tfstate"
# Plan
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
# Apply
terraform apply tfplan
```
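Terraform loads `terraform.tfvars` and `*.auto.tfvars` automatically, but not `terraform.dev.tfvars` or `terraform.prod.tfvars`, which is why the plan command passes `-var-file` explicitly. A sketch that derives both the variable file and the state key from one environment name so the two cannot drift apart; the function names are illustrative:

```bash
# Map an environment name to its variable file; reject unknown names.
tfvars_for_env() {
  case "$1" in
    dev|prod) echo "terraform.$1.tfvars" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# State key layout follows the one used in the init command above.
state_key_for_env() {
  echo "openclaw/$1/terraform.tfstate"
}

env_name=dev
echo "var file:  $(tfvars_for_env "$env_name")"
echo "state key: $(state_key_for_env "$env_name")"
```

For production the same helpers give `terraform plan -var-file="$(tfvars_for_env prod)" -out=tfplan` and a `key=$(state_key_for_env prod)` backend setting.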
### Step 4: Verify Deployment
```bash
# Check AKS cluster
az aks show --resource-group openclaw-rg --name openclaw-dev-aks
# Check PostgreSQL server
az postgres flexible-server show --resource-group openclaw-rg --name openclaw-dev-pg
# Check Redis cache
az redis show --resource-group openclaw-rg --name openclaw-dev-redis
# Check ACR
az acr show --name openclawdevacr
```
---
## Post-Deployment
### Configure kubectl
```bash
# Get AKS credentials
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
# Verify cluster access
kubectl get nodes
kubectl get namespaces
```
### Deploy OpenClaw Helm Chart
```bash
# Deploy using Helm
helm install openclaw ./charts/openclaw \
--namespace openclaw \
--create-namespace \
--values values.dev.yaml \
--set image.repository=openclawdevacr.azurecr.io/openclaw-gateway \
--set litellm.image.repository=openclawdevacr.azurecr.io/litellm-proxy
```
### Configure Secrets
```bash
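# Create Kubernetes secrets
# Azure Database for PostgreSQL requires TLS by default (sslmode=require).
# Azure Cache for Redis accepts only TLS on port 6380 by default (rediss://);
# the plain-text port 6379 stays closed unless it is enabled explicitly.
kubectl create secret generic openclaw-secrets \
--namespace openclaw \
--from-literal=database-url="postgresql://openclaw:password@openclaw-dev-pg.postgres.database.azure.com:5432/postgres?sslmode=require" \
--from-literal=redis-url="rediss://:password@openclaw-dev-redis.redis.cache.windows.net:6380" \
--from-literal=minimax-api-key="your-minimax-key" \
--from-literal=zai-api-key="your-zai-key"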
```
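A password containing `@`, `/`, `:` or similar characters breaks the connection URLs above unless it is percent-encoded. A sketch of an encoder for ASCII input only (multi-byte characters are not handled); `urlencode` and `DB_PASSWORD` are illustrative names:

```bash
# Percent-encode every character outside the RFC 3986 unreserved set.
# ASCII input only.
urlencode() {
  local LC_ALL=C
  local rest="$1" out="" ch
  while [ -n "$rest" ]; do
    ch="${rest%"${rest#?}"}"   # first character of $rest
    rest="${rest#?}"
    case "$ch" in
      [A-Za-z0-9._~-]) out="$out$ch" ;;
      *) out="$out$(printf '%%%02X' "'$ch")" ;;
    esac
  done
  printf '%s\n' "$out"
}

DB_PASSWORD='p@ss/w:rd'
echo "postgresql://openclaw:$(urlencode "$DB_PASSWORD")@openclaw-dev-pg.postgres.database.azure.com:5432/postgres"
# prints: postgresql://openclaw:p%40ss%2Fw%3Ard@openclaw-dev-pg.postgres.database.azure.com:5432/postgres
```

The same function applies to the Redis access key; append whatever query parameters the deployment needs after the encoded URL.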
---
## GPU Support
### Enable GPU Nodes
```hcl
# terraform.tfvars
enable_gpu_support = true
gpu_node_pool = {
name = "gpu"
vm_size = "Standard_NC4as_T4_v3"
node_count = 1
min_count = 0
max_count = 4
enable_auto_scaling = true
}
```
### Install NVIDIA Device Plugin
```bash
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```
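Once the device plugin is running, each GPU node should advertise an allocatable `nvidia.com/gpu` resource. A sketch that totals them, assuming the nodes carry the `gpu=true` label that this commit's AKS Terraform module applies to the GPU node pool; `total_gpus` is an illustrative name:

```bash
# Sum the allocatable nvidia.com/gpu count across nodes labelled gpu=true.
# Nodes that do not yet advertise a GPU show "<none>" and are skipped.
total_gpus() {
  kubectl get nodes -l gpu=true --no-headers \
    -o custom-columns='GPU:.status.allocatable.nvidia\.com/gpu' |
    awk '$1 ~ /^[0-9]+$/ { sum += $1 } END { print sum + 0 }'
}
```

If `total_gpus` still prints `0` a few minutes after the plugin is applied, a likely cause is that the plugin pods are not scheduled on the GPU nodes; the pool is tainted with `nvidia.com/gpu=true:NoSchedule`, which the plugin has to tolerate.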
---
## Monitoring
### Azure Monitor Dashboard
The deployment creates an Azure Monitor dashboard with:
- AKS cluster metrics
- Node pool metrics
- PostgreSQL metrics
- Redis metrics
- Application Gateway metrics
- Application logs
### Access Dashboard
```bash
# Open in Azure Portal (`open` is the macOS launcher; use `xdg-open` on Linux)
open "https://portal.azure.com/#blade/Microsoft_Azure_Monitoring/AzureMonitoringBrowseBlade"
```
---
## Backup & Recovery
### Automated Backups
| Resource | Backup Strategy | Retention |
|----------|----------------|-----------|
| PostgreSQL | Automated + Geo-redundant | 35 days |
| Redis | Persistence enabled | Manual |
| ACR | Geo-redundant (Premium) | 30 days |
| Terraform State | Blob versioning | Unlimited |
---
## Cleanup
### Destroy Infrastructure
```bash
# Delete Kubernetes resources first
kubectl delete namespace openclaw
# Destroy Terraform resources
terraform destroy -var-file=terraform.dev.tfvars
```
---
🦞 *The thought that never ends.*
@@ -0,0 +1,230 @@
# ==============================================================================
# Heretek OpenClaw - Azure Container Registry Configuration
# ==============================================================================
# Azure Container Registry for OpenClaw container images
# ==============================================================================
# ------------------------------------------------------------------------------
# Container Registry
# ------------------------------------------------------------------------------
resource "azurerm_container_registry" "openclaw" {
name = var.registry_name
resource_group_name = var.resource_group_name
location = var.location
sku = var.sku
admin_enabled = var.environment == "dev"
zone_redundancy_enabled = var.environment == "prod" && var.sku == "Premium"
# Data endpoint (for Premium SKU)
data_endpoint_enabled = var.sku == "Premium"
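# Network rules (Premium SKU only)
# NOTE: ip_rule matches public source IP ranges, not VNet membership. 0.0.0.0/0 admits
# every IPv4 address, which cancels default_action = "Deny"; replace it with the
# cluster's egress CIDR, or use a private endpoint for VNet-only access.
network_rule_set {
default_action = "Deny"
ip_rule {
action = "Allow"
ip_range = "0.0.0.0/0" # placeholder: allows all public IPv4 addresses
}
}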
# Retention policy (Premium SKU only)
retention_policy_in_days = var.sku == "Premium" ? var.retention_policy_days : null
# Quarantine policy (Premium SKU only)
quarantine_policy_enabled = var.quarantine_policy_enabled && var.sku == "Premium"
tags = var.tags
}
# ------------------------------------------------------------------------------
# Registry Scope Map (for fine-grained access control)
# ------------------------------------------------------------------------------
# Scope map actions use the form repositories/<repo>/<action>, where <action> is
# content/read, content/write, content/delete, metadata/read, or metadata/write.
# Referencing the registry resource (rather than var.registry_name) makes Terraform
# create the registry before the scope maps and tokens.
resource "azurerm_container_registry_scope_map" "openclaw_pull" {
  name                    = "openclaw-pull-scope"
  container_registry_name = azurerm_container_registry.openclaw.name
  resource_group_name     = var.resource_group_name
  actions = [
    "repositories/*/content/read",
  ]
}
resource "azurerm_container_registry_scope_map" "openclaw_push" {
  name                    = "openclaw-push-scope"
  container_registry_name = azurerm_container_registry.openclaw.name
  resource_group_name     = var.resource_group_name
  actions = [
    "repositories/*/content/read",
    "repositories/*/content/write",
  ]
}
# ------------------------------------------------------------------------------
# Registry Token (for authentication)
# ------------------------------------------------------------------------------
resource "azurerm_container_registry_token" "openclaw_pull" {
  name                    = "openclaw-pull-token"
  container_registry_name = azurerm_container_registry.openclaw.name
  resource_group_name     = var.resource_group_name
  scope_map_id            = azurerm_container_registry_scope_map.openclaw_pull.id
}
resource "azurerm_container_registry_token" "openclaw_push" {
  name                    = "openclaw-push-token"
  container_registry_name = azurerm_container_registry.openclaw.name
  resource_group_name     = var.resource_group_name
  scope_map_id            = azurerm_container_registry_scope_map.openclaw_push.id
}
# ------------------------------------------------------------------------------
# Registry Task (for automated builds)
# ------------------------------------------------------------------------------
# The task resource is attached to the registry by ID and builds from a Git context
# through a docker_step block. context_access_token is required by the provider;
# var.source_access_token is an assumed variable that this module does not yet define.
resource "azurerm_container_registry_task" "openclaw_gateway" {
  name                  = "build-openclaw-gateway"
  container_registry_id = azurerm_container_registry.openclaw.id
  platform {
    os           = "Linux"
    architecture = "amd64"
  }
  agent_setting {
    cpu = 4
  }
  docker_step {
    context_path         = "https://github.com/Heretek-AI/heretek-openclaw.git"
    context_access_token = var.source_access_token
    dockerfile_path      = "Dockerfile"
    image_names = [
      "${var.registry_name}.azurecr.io/openclaw-gateway:{{.Run.ID}}",
      "${var.registry_name}.azurecr.io/openclaw-gateway:latest",
    ]
    push_enabled = true
  }
  enabled = false # Disabled by default, enable via CI/CD
  tags    = var.tags
}
resource "azurerm_container_registry_task" "litellm_proxy" {
  name                  = "build-litellm-proxy"
  container_registry_id = azurerm_container_registry.openclaw.id
  platform {
    os           = "Linux"
    architecture = "amd64"
  }
  agent_setting {
    cpu = 2
  }
  docker_step {
    context_path         = "https://github.com/Heretek-AI/heretek-openclaw.git"
    context_access_token = var.source_access_token
    dockerfile_path      = "Dockerfile.litellm"
    image_names = [
      "${var.registry_name}.azurecr.io/litellm-proxy:{{.Run.ID}}",
      "${var.registry_name}.azurecr.io/litellm-proxy:latest",
    ]
    push_enabled = true
  }
  enabled = false # Disabled by default, enable via CI/CD
  tags    = var.tags
}
# ------------------------------------------------------------------------------
# Registry Webhook (for CI/CD integration)
# ------------------------------------------------------------------------------
resource "azurerm_container_registry_webhook" "openclaw_gateway" {
  name                = "openclawgatewaywebhook" # webhook names must be alphanumeric
  location            = var.location
  resource_group_name = var.resource_group_name
  registry_name       = azurerm_container_registry.openclaw.name
  service_uri         = var.webhook_service_uri # CI/CD endpoint
  scope               = "openclaw-gateway:*" # all tags in the openclaw-gateway repository
  actions             = ["push"]
  status              = "enabled"
}
# ------------------------------------------------------------------------------
# Registry Agent Pool (for dedicated build resources)
# ------------------------------------------------------------------------------
resource "azurerm_container_registry_agent_pool" "openclaw" {
  count                   = var.sku == "Premium" ? 1 : 0
  name                    = "openclaw-pool"
  resource_group_name     = var.resource_group_name
  location                = var.location
  container_registry_name = azurerm_container_registry.openclaw.name
  tier                    = "S1" # pool VM size; S1 is the provider default
  instance_count          = 1
  tags                    = var.tags
}
# ------------------------------------------------------------------------------
# Registry Diagnostic Settings
# ------------------------------------------------------------------------------
resource "azurerm_monitor_diagnostic_setting" "acr" {
name = "${var.registry_name}-diagnostics"
target_resource_id = azurerm_container_registry.openclaw.id
log_analytics_workspace_id = var.log_analytics_workspace_id
enabled_log {
category = "ContainerRegistryRepositoryEvents"
}
enabled_log {
category = "ContainerRegistryLoginEvents"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# ------------------------------------------------------------------------------
# Registry Alerts
# ------------------------------------------------------------------------------
resource "azurerm_monitor_metric_alert" "acr_storage" {
count = var.environment == "prod" ? 1 : 0
name = "${var.registry_name}-storage-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_container_registry.openclaw.id]
description = "Registry storage usage exceeds the configured threshold"
criteria {
metric_namespace = "Microsoft.ContainerRegistry/registries"
metric_name = "StorageUsed"
aggregation = "Average"
operator = "GreaterThan"
threshold = var.storage_threshold_gb * 1024 * 1024 * 1024 # Convert to bytes
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
@@ -0,0 +1,298 @@
# ==============================================================================
# Heretek OpenClaw - Azure AKS Configuration
# ==============================================================================
# Azure Kubernetes Service cluster for OpenClaw
# ==============================================================================
# ------------------------------------------------------------------------------
# AKS Cluster
# ------------------------------------------------------------------------------
resource "azurerm_kubernetes_cluster" "openclaw_cluster" {
name = var.cluster_name
location = var.location
resource_group_name = var.resource_group_name
dns_prefix = var.dns_prefix
kubernetes_version = var.kubernetes_version
default_node_pool {
name = var.default_node_pool.name
vm_size = var.default_node_pool.vm_size
node_count = var.default_node_pool.node_count
min_count = var.default_node_pool.min_count
max_count = var.default_node_pool.max_count
enable_auto_scaling = var.default_node_pool.enable_auto_scaling
os_disk_size_gb = var.default_node_pool.os_disk_size_gb
type = var.default_node_pool.type
zones = var.default_node_pool.availability_zones
vnet_subnet_id = var.subnet_id
}
identity {
type = "SystemAssigned"
}
# Network configuration
network_profile {
network_plugin = "azure"
load_balancer_sku = "standard"
network_policy = "calico"
dns_service_ip = "10.1.0.10" # must fall inside service_cidr
docker_bridge_cidr = "172.17.0.1/16"
service_cidr = "10.1.0.0/16"
outbound_type = "loadBalancer"
}
# Private cluster configuration (a boolean argument, not a nested block)
private_cluster_enabled = var.enable_private_cluster
# Azure Policy
azure_policy_enabled = var.enable_azure_policy
# Monitoring
oms_agent {
log_analytics_workspace_id = var.log_analytics_workspace_id
}
# Workload Identity
workload_identity_enabled = var.enable_workload_identity
# Auto upgrade (argument name from the azurerm 3.x provider, matching enable_auto_scaling above)
automatic_channel_upgrade = "stable"
# Maintenance window
maintenance_window_auto_upgrade {
frequency = "Weekly"
interval = 1
day_of_week = "Sunday"
start_time = "02:00"
duration = 4
}
maintenance_window_node_os {
frequency = "Weekly"
interval = 1
day_of_week = "Saturday"
start_time = "02:00"
duration = 4
}
tags = var.tags
lifecycle {
ignore_changes = [
default_node_pool[0].node_count
]
}
}
# ------------------------------------------------------------------------------
# System Node Pool
# ------------------------------------------------------------------------------
resource "azurerm_kubernetes_cluster_node_pool" "system" {
name = var.system_node_pool.name
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
vm_size = var.system_node_pool.vm_size
node_count = var.system_node_pool.node_count
min_count = var.system_node_pool.min_count
max_count = var.system_node_pool.max_count
enable_auto_scaling = var.system_node_pool.enable_auto_scaling
os_disk_size_gb = var.system_node_pool.os_disk_size_gb
zones = var.system_node_pool.availability_zones
vnet_subnet_id = var.subnet_id
node_labels = {
"workload-type" = "system"
"environment" = var.environment
}
node_taints = [
"workload-type=system:NoSchedule"
]
tags = var.tags
}
# ------------------------------------------------------------------------------
# User Node Pools
# ------------------------------------------------------------------------------
resource "azurerm_kubernetes_cluster_node_pool" "user" {
for_each = { for pool in var.user_node_pools : pool.name => pool }
name = each.value.name
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
vm_size = each.value.vm_size
node_count = each.value.node_count
min_count = each.value.min_count
max_count = each.value.max_count
enable_auto_scaling = each.value.enable_auto_scaling
os_disk_size_gb = each.value.os_disk_size_gb
zones = each.value.availability_zones
vnet_subnet_id = var.subnet_id
node_labels = {
"workload-type" = each.value.name
"environment" = var.environment
}
tags = var.tags
}
# ------------------------------------------------------------------------------
# GPU Node Pool (Optional)
# ------------------------------------------------------------------------------
resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
count = var.gpu_enabled ? 1 : 0
name = var.gpu_node_pool.name
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
vm_size = var.gpu_node_pool.vm_size
node_count = var.gpu_node_pool.node_count
min_count = var.gpu_node_pool.min_count
max_count = var.gpu_node_pool.max_count
enable_auto_scaling = var.gpu_node_pool.enable_auto_scaling
os_disk_size_gb = var.gpu_node_pool.os_disk_size_gb
zones = var.gpu_node_pool.availability_zones
vnet_subnet_id = var.subnet_id
node_labels = {
"workload-type" = "gpu"
"environment" = var.environment
"gpu" = "true"
}
node_taints = [
"nvidia.com/gpu=true:NoSchedule"
]
tags = var.tags
}
# ------------------------------------------------------------------------------
# AKS Role Assignments
# ------------------------------------------------------------------------------
resource "azurerm_role_assignment" "aks_vnet_contributor" {
scope = var.vnet_id
role_definition_name = "Network Contributor"
principal_id = azurerm_kubernetes_cluster.openclaw_cluster.identity[0].principal_id
}
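resource "azurerm_role_assignment" "aks_acr_pull" {
  scope                = var.acr_id
  role_definition_name = "AcrPull"
  # Image pulls are performed by the kubelet identity, not the cluster (control plane) identity.
  principal_id = azurerm_kubernetes_cluster.openclaw_cluster.kubelet_identity[0].object_id
}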
# ------------------------------------------------------------------------------
# Azure Monitor for Containers
# ------------------------------------------------------------------------------
resource "azurerm_monitor_diagnostic_setting" "aks" {
name = "${var.cluster_name}-diagnostics"
target_resource_id = azurerm_kubernetes_cluster.openclaw_cluster.id
log_analytics_workspace_id = var.log_analytics_workspace_id
enabled_log {
category = "kube-apiserver"
}
enabled_log {
category = "kube-audit"
}
enabled_log {
category = "kube-audit-admin"
}
enabled_log {
category = "kube-controller-manager"
}
enabled_log {
category = "kube-scheduler"
}
enabled_log {
category = "cluster-autoscaler"
}
enabled_log {
category = "guard"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# ------------------------------------------------------------------------------
# Kubernetes Manifest Deployments (via Helm)
# ------------------------------------------------------------------------------
resource "helm_release" "nvidia_device_plugin" {
count = var.gpu_enabled ? 1 : 0
name = "nvidia-device-plugin"
repository = "https://nvidia.github.io/k8s-device-plugin"
chart = "nvidia-device-plugin"
version = "0.14.1"
namespace = "kube-system"
set {
name = "config.map.name"
value = "nvidia-device-plugin-config"
}
}
resource "helm_release" "metrics_server" {
name = "metrics-server"
repository = "https://kubernetes-sigs.github.io/metrics-server/"
chart = "metrics-server"
version = "3.11.0"
namespace = "kube-system"
}
resource "helm_release" "cluster_autoscaler" {
count = var.enable_cluster_autoscaler ? 1 : 0
name = "cluster-autoscaler"
repository = "https://kubernetes.github.io/autoscaler"
chart = "cluster-autoscaler"
version = "9.29.0"
namespace = "kube-system"
set {
name = "cloudProvider"
value = "azure"
}
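set {
name = "azureClientID"
# A SystemAssigned identity block exports principal_id and tenant_id only;
# the client ID is available on the kubelet identity.
value = azurerm_kubernetes_cluster.openclaw_cluster.kubelet_identity[0].client_id
}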
set {
name = "azureSubscriptionID"
value = data.azurerm_client_config.current.subscription_id
}
set {
name = "azureResourceGroup"
value = var.resource_group_name
}
set {
name = "azureClusterName"
value = var.cluster_name
}
}
@@ -0,0 +1,310 @@
# ==============================================================================
# Heretek OpenClaw - Azure Application Gateway Configuration
# ==============================================================================
# Application Gateway for OpenClaw traffic routing and SSL termination
# ==============================================================================
# ------------------------------------------------------------------------------
# Public IP for Application Gateway
# ------------------------------------------------------------------------------
resource "azurerm_public_ip" "gateway" {
name = "${var.gateway_name}-pip"
location = var.location
resource_group_name = var.resource_group_name
allocation_method = "Static"
sku = "Standard"
domain_name_label = var.domain_name_label
tags = var.tags
}
# ------------------------------------------------------------------------------
# Application Gateway
# ------------------------------------------------------------------------------
resource "azurerm_application_gateway" "openclaw" {
name = var.gateway_name
location = var.location
resource_group_name = var.resource_group_name
sku {
name = var.sku_name
tier = var.sku_name
capacity = var.capacity
}
gateway_ip_configuration {
name = "gateway-ip-config"
subnet_id = var.subnet_id
}
frontend_port {
name = "http-port"
port = 80
}
frontend_port {
name = "https-port"
port = 443
}
frontend_ip_configuration {
name = "frontend-ip-config"
public_ip_address_id = azurerm_public_ip.gateway.id
}
backend_address_pool {
name = "openclaw-gateway-pool"
}
backend_address_pool {
name = "litellm-proxy-pool"
}
backend_http_settings {
name = "gateway-http-settings"
cookie_based_affinity = "Disabled"
port = 18789
protocol = "Http"
request_timeout = 30
probe_name = "gateway-probe"
}
backend_http_settings {
name = "litellm-http-settings"
cookie_based_affinity = "Disabled"
port = 4000
protocol = "Http"
request_timeout = 60
probe_name = "litellm-probe"
}
# Health Probes
# host is required when pick_host_name_from_backend_http_settings is false
probe {
name = "gateway-probe"
protocol = "Http"
host = "127.0.0.1"
path = "/health"
interval = 30
timeout = 5
unhealthy_threshold = 3
pick_host_name_from_backend_http_settings = false
}
probe {
name = "litellm-probe"
protocol = "Http"
host = "127.0.0.1"
path = "/health"
interval = 30
timeout = 5
unhealthy_threshold = 3
pick_host_name_from_backend_http_settings = false
}
# HTTP Listener
http_listener {
name = "http-listener"
frontend_ip_configuration_name = "frontend-ip-config"
frontend_port_name = "http-port"
protocol = "Http"
}
# HTTPS Listener (if SSL certificate provided)
dynamic "http_listener" {
for_each = var.ssl_certificate_data != null ? [1] : []
content {
name = "https-listener"
frontend_ip_configuration_name = "frontend-ip-config"
frontend_port_name = "https-port"
protocol = "Https"
ssl_certificate_name = "ssl-cert"
}
}
# SSL Certificate (if provided)
dynamic "ssl_certificate" {
for_each = var.ssl_certificate_data != null ? [1] : []
content {
name = "ssl-cert"
data = var.ssl_certificate_data
password = var.ssl_certificate_password
}
}
# Request Routing Rules
request_routing_rule {
name = "http-routing-rule"
rule_type = "Basic"
http_listener_name = "http-listener"
backend_address_pool_name = "openclaw-gateway-pool"
backend_http_settings_name = "gateway-http-settings"
priority = 200
}
# HTTPS requests are routed by the path-based rule on "https-listener" further down.
# A listener can be attached to only one request routing rule, so no separate
# Basic rule is defined for it here; the URL path map's default backend covers
# every path that matches no path_rule.
# URL Path Map for path-based routing
url_path_map {
name = "url-path-map"
default_backend_address_pool_name = "openclaw-gateway-pool"
default_backend_http_settings_name = "gateway-http-settings"
path_rule {
name = "litellm-path-rule"
paths = ["/v1/*", "/litellm/*"]
backend_address_pool_name = "litellm-proxy-pool"
backend_http_settings_name = "litellm-http-settings"
}
path_rule {
name = "websocket-path-rule"
paths = ["/ws/*", "/gateway/*"]
backend_address_pool_name = "openclaw-gateway-pool"
backend_http_settings_name = "gateway-http-settings"
}
}
# HTTPS with URL Path Map
dynamic "request_routing_rule" {
for_each = var.ssl_certificate_data != null ? [1] : []
content {
name = "https-path-routing-rule"
rule_type = "PathBasedRouting"
http_listener_name = "https-listener"
url_path_map_name = "url-path-map"
priority = 150
}
}
# Autoscale configuration
autoscale_configuration {
min_capacity = var.autoscale_min_capacity
max_capacity = var.autoscale_max_capacity
}
# WAF Configuration (for WAF SKU)
dynamic "waf_configuration" {
for_each = var.sku_name == "WAF_v2" ? [1] : []
content {
enabled = true
firewall_mode = "Prevention"
rule_set_type = "OWASP"
rule_set_version = "3.2"
request_body_check = true
max_request_body_size_kb = 128
}
}
tags = var.tags
}
# ------------------------------------------------------------------------------
# Application Gateway Diagnostic Settings
# ------------------------------------------------------------------------------
resource "azurerm_monitor_diagnostic_setting" "gateway" {
name = "${var.gateway_name}-diagnostics"
target_resource_id = azurerm_application_gateway.openclaw.id
log_analytics_workspace_id = var.log_analytics_workspace_id
enabled_log {
category = "ApplicationGatewayAccessLog"
}
enabled_log {
category = "ApplicationGatewayPerformanceLog"
}
enabled_log {
category = "ApplicationGatewayFirewallLog"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# ------------------------------------------------------------------------------
# Application Gateway Alerts
# ------------------------------------------------------------------------------
resource "azurerm_monitor_metric_alert" "gateway_capacity" {
count = var.environment == "prod" ? 1 : 0
name = "${var.gateway_name}-capacity-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_application_gateway.openclaw.id]
description = "Application Gateway capacity is high"
criteria {
metric_namespace = "Microsoft.Network/applicationGateways"
metric_name = "CapacityUnits"
aggregation = "Average"
operator = "GreaterThan"
threshold = var.capacity * 0.8
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "gateway_response_time" {
count = var.environment == "prod" ? 1 : 0
name = "${var.gateway_name}-response-time-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_application_gateway.openclaw.id]
description = "Application Gateway response time is too high"
criteria {
metric_namespace = "Microsoft.Network/applicationGateways"
metric_name = "ApplicationGatewayTotalTime"
aggregation = "Average"
operator = "GreaterThan"
threshold = 5000 # 5 seconds
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "gateway_failures" {
count = var.environment == "prod" ? 1 : 0
name = "${var.gateway_name}-failures-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_application_gateway.openclaw.id]
description = "Application Gateway failed request count is high"
criteria {
metric_namespace = "Microsoft.Network/applicationGateways"
metric_name = "FailedRequests"
aggregation = "Total"
operator = "GreaterThan"
threshold = 10
}
severity = 2
action {
action_group_id = var.action_group_id
}
}
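# ------------------------------------------------------------------------------
# Example: Action Group for the alerts above (illustrative sketch)
# ------------------------------------------------------------------------------
# The alerts above notify var.action_group_id, which the caller has to supply. The
# sketch below shows one way such an action group might be defined. The resource
# label "example", the short name, the receiver name, and the variable
# "example_alert_email" (with its placeholder default) are assumptions made for
# illustration; none of them appear in the original configuration.
variable "example_alert_email" {
  description = "Hypothetical e-mail address that receives gateway alerts"
  type        = string
  default     = "ops@example.com"
}

resource "azurerm_monitor_action_group" "example" {
  name                = "${var.gateway_name}-alerts"
  resource_group_name = var.resource_group_name
  short_name          = "ocgwalerts" # Azure limits short names to 12 characters

  email_receiver {
    name          = "ops-email"
    email_address = var.example_alert_email
  }
}
# The resulting ID (azurerm_monitor_action_group.example.id) could then be passed
# in as action_group_id.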
@@ -0,0 +1,393 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Configuration
# ==============================================================================
# Main configuration file for Azure infrastructure
# ==============================================================================
terraform {
required_version = ">= 1.6.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.24"
}
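helm = {
source = "hashicorp/helm"
version = "~> 2.12"
}
# random_string.suffix below uses the random provider, so it is pinned explicitly
random = {
source = "hashicorp/random"
version = "~> 3.5"
}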
}
backend "azurerm" {
# Configure backend with variables or environment
# resource_group_name = "terraform-state-rg"
# storage_account_name = "tfstatestorage"
# container_name = "tfstate"
# key = "openclaw/terraform.tfstate"
}
}
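# Sketch: backend blocks cannot reference input variables, so the commented
# settings above are usually supplied as partial configuration at init time.
# The file name "backend.hcl" is an assumption; the values are the placeholders
# already shown in the comments above.
#
#   # backend.hcl
#   resource_group_name  = "terraform-state-rg"
#   storage_account_name = "tfstatestorage"
#   container_name       = "tfstate"
#   key                  = "openclaw/terraform.tfstate"
#
#   terraform init -backend-config=backend.hcl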
provider "azurerm" {
features {
resource_group {
prevent_deletion_if_contains_resources = false
}
key_vault {
purge_soft_delete_on_destroy = true
}
}
}
# NOTE: the AKS cluster is created inside module "aks", so the provider credentials are
# read from module outputs. The output names used here (kube_config_host,
# kube_config_client_certificate, kube_config_client_key,
# kube_config_cluster_ca_certificate) are assumptions and must match the outputs of ./aks.
provider "kubernetes" {
host = module.aks.kube_config_host
client_certificate = base64decode(module.aks.kube_config_client_certificate)
client_key = base64decode(module.aks.kube_config_client_key)
cluster_ca_certificate = base64decode(module.aks.kube_config_cluster_ca_certificate)
}
provider "helm" {
kubernetes {
host = module.aks.kube_config_host
client_certificate = base64decode(module.aks.kube_config_client_certificate)
client_key = base64decode(module.aks.kube_config_client_key)
cluster_ca_certificate = base64decode(module.aks.kube_config_cluster_ca_certificate)
}
}
# ==============================================================================
# Data Sources
# ==============================================================================
data "azurerm_client_config" "current" {}
# The resource group is created by azurerm_resource_group.openclaw below. A data source
# lookup of the same name would fail on the first plan, before the group exists.
# ==============================================================================
# Local Values
# ==============================================================================
locals {
name_prefix = "openclaw-${var.environment}"
common_tags = {
project = "openclaw"
environment = var.environment
version = var.app_version
managed_by = "terraform"
}
gpu_enabled = var.enable_gpu_support
# ACR URLs (the registry lives in module "acr", so its login server comes from the module output)
acr_urls = {
login_server = module.acr.login_server
gateway = "${module.acr.login_server}/openclaw-gateway"
litellm = "${module.acr.login_server}/litellm-proxy"
}
}
# ==============================================================================
# Random Resources
# ==============================================================================
resource "random_string" "suffix" {
length = 8
special = false
upper = false
}
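# Sketch: random_string.suffix is declared above but not referenced in the visible
# configuration. One plausible use is to make globally unique names (ACR, Key Vault,
# storage accounts) collision-free. The local name below is hypothetical; the
# replace() call strips hyphens because ACR names must be alphanumeric.
locals {
  example_unique_acr_name = "${replace(local.name_prefix, "-", "")}acr${random_string.suffix.result}"
}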
# ==============================================================================
# Resource Group
# ==============================================================================
resource "azurerm_resource_group" "openclaw" {
name = var.resource_group_name
location = var.location
tags = local.common_tags
}
# ==============================================================================
# VNet
# ==============================================================================
module "vnet" {
source = "./vnet"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
vnet_name = "${local.name_prefix}-vnet"
vnet_address_space = var.vnet_address_space
subnet_configs = var.subnet_configs
enable_ddos_protection = var.enable_ddos_protection
enable_flow_logs = var.enable_flow_logs
tags = local.common_tags
}
# ==============================================================================
# AKS Cluster
# ==============================================================================
module "aks" {
source = "./aks"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
cluster_name = "${local.name_prefix}-aks"
vnet_id = module.vnet.vnet_id
subnet_id = module.vnet.aks_subnet_id
# AKS configuration
kubernetes_version = var.kubernetes_version
dns_prefix = local.name_prefix
# Node pool configuration
default_node_pool = var.default_node_pool
system_node_pool = var.system_node_pool
user_node_pools = var.user_node_pools
gpu_node_pool = var.gpu_node_pool
gpu_enabled = local.gpu_enabled
# Security
enable_private_cluster = var.enable_private_cluster
enable_azure_policy = var.enable_azure_policy
enable_workload_identity = var.enable_workload_identity
# Monitoring
enable_monitoring = true
log_analytics_workspace_id = module.monitoring.log_analytics_workspace_id
tags = local.common_tags
}
# ==============================================================================
# Azure Database for PostgreSQL
# ==============================================================================
module "postgresql" {
source = "./postgresql"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
server_name = "${local.name_prefix}-pg"
vnet_id = module.vnet.vnet_id
subnet_id = module.vnet.database_subnet_id
# Database configuration
sku_name = var.postgresql_sku_name
storage_mb = var.postgresql_storage_mb
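# "version" is a reserved meta-argument inside module blocks, so the PostgreSQL version is
# passed under a distinct input name (assumed to be declared by the ./postgresql module).
postgresql_version = var.postgresql_version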
# Authentication
administrator_login = var.db_administrator_login
administrator_password = var.db_administrator_password
# High availability
geo_redundant_backup = var.db_geo_redundant_backup
auto_grow_enabled = var.db_auto_grow_enabled
# Security
ssl_enforcement_enabled = true
ssl_minimal_tls_version_enforced = "TLS1_2"
public_network_access_enabled = false
tags = local.common_tags
}
# ==============================================================================
# Azure Cache for Redis
# ==============================================================================
module "redis" {
source = "./redis"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
cache_name = "${local.name_prefix}-redis"
vnet_id = module.vnet.vnet_id
subnet_id = module.vnet.cache_subnet_id
# Redis configuration
capacity = var.redis_capacity
family = var.redis_family
sku_name = var.redis_sku_name
redis_version = var.redis_version
# Security
enable_non_ssl_port = false
minimum_tls_version = "1.2"
# High availability
zones = var.redis_zones
tags = local.common_tags
}
# ==============================================================================
# Azure Container Registry
# ==============================================================================
module "acr" {
source = "./acr"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
# ACR names must be alphanumeric, so hyphens from the name prefix are removed
registry_name = replace("${local.name_prefix}acr", "-", "")
sku = var.acr_sku
# Cleanup
retention_policy_days = var.acr_retention_policy_days
quarantine_policy_enabled = var.environment == "prod"
tags = local.common_tags
}
# ==============================================================================
# Application Gateway
# ==============================================================================
module "application_gateway" {
source = "./application-gateway"
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
gateway_name = "${local.name_prefix}-agw"
vnet_id = module.vnet.vnet_id
subnet_id = module.vnet.gateway_subnet_id
# Gateway configuration
sku_name = var.gateway_sku_name
capacity = var.gateway_capacity
# SSL
ssl_certificate_key_vault_secret_id = var.ssl_certificate_key_vault_secret_id
ssl_certificate_data = var.ssl_certificate_data
# Backend pools
backend_pools = [
{
name = "openclaw-gateway"
port = 18789
probe_path = "/health"
},
{
name = "litellm-proxy"
port = 4000
probe_path = "/health"
}
]
tags = local.common_tags
}
# ==============================================================================
# Monitoring
# ==============================================================================
module "monitoring" {
source = "../terraform/modules/monitoring"
name_prefix = local.name_prefix
resource_group_name = azurerm_resource_group.openclaw.name
location = var.location
aks_cluster_id = module.aks.cluster_id # assumed output name of the ./aks module
postgresql_server_id = module.postgresql.server_id
redis_cache_id = module.redis.redis_cache_id
# Dashboard
enable_dashboard = true
# Alerts
enable_alerts = var.enable_monitoring_alerts
alert_email = var.alert_email
tags = local.common_tags
}
# ==============================================================================
# Key Vault (for secrets)
# ==============================================================================
resource "azurerm_key_vault" "openclaw" {
name = "${local.name_prefix}-kv"
location = var.location
resource_group_name = azurerm_resource_group.openclaw.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
purge_protection_enabled = var.environment == "prod"
network_acls {
default_action = "Deny"
bypass = "AzureServices"
}
tags = local.common_tags
}
resource "azurerm_key_vault_secret" "db_password" {
name = "db-password"
value = var.db_administrator_password
key_vault_id = azurerm_key_vault.openclaw.id
}
resource "azurerm_key_vault_secret" "redis_password" {
name = "redis-password"
value = var.redis_password
key_vault_id = azurerm_key_vault.openclaw.id
}
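# Sketch (assumption): the vault above uses access policies, the provider default,
# so the identity running Terraform needs secret permissions before the two secrets
# above can be written. The resource label "example_deployer" is hypothetical.
resource "azurerm_key_vault_access_policy" "example_deployer" {
  key_vault_id = azurerm_key_vault.openclaw.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azurerm_client_config.current.object_id

  secret_permissions = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
}
# With network_acls.default_action = "Deny", the machine running Terraform also
# needs network-level access to the vault (for example an allowed IP range or a
# private endpoint) for secret writes to succeed.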
# ==============================================================================
# Outputs
# ==============================================================================
# Output values are declared once, in the separate outputs file of this configuration.
# Terraform rejects duplicate output names within a module, so they are not repeated here.
@@ -0,0 +1,305 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Outputs
# ==============================================================================
# Output values for Azure infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# Resource Group Outputs
# ------------------------------------------------------------------------------
output "resource_group_name" {
description = "Resource group name"
value = azurerm_resource_group.openclaw.name
}
output "resource_group_location" {
description = "Resource group location"
value = azurerm_resource_group.openclaw.location
}
# ------------------------------------------------------------------------------
# VNet Outputs
# ------------------------------------------------------------------------------
output "vnet_id" {
description = "VNet ID"
value = module.vnet.vnet_id
}
output "vnet_name" {
description = "VNet name"
value = module.vnet.vnet_name
}
output "vnet_address_space" {
description = "VNet address space"
value = module.vnet.vnet_address_space
}
output "aks_subnet_id" {
description = "AKS subnet ID"
value = module.vnet.aks_subnet_id
}
output "database_subnet_id" {
description = "Database subnet ID"
value = module.vnet.database_subnet_id
}
output "cache_subnet_id" {
description = "Cache subnet ID"
value = module.vnet.cache_subnet_id
}
output "gateway_subnet_id" {
description = "Gateway subnet ID"
value = module.vnet.gateway_subnet_id
}
# ------------------------------------------------------------------------------
# AKS Outputs
# ------------------------------------------------------------------------------
# NOTE: the cluster is created inside module "aks", so these values are read from module
# outputs. The output names on module.aks are assumptions and must match ./aks.
output "aks_cluster_id" {
description = "AKS cluster ID"
value = module.aks.cluster_id
}
output "aks_cluster_name" {
description = "AKS cluster name"
value = module.aks.cluster_name
}
output "aks_cluster_fqdn" {
description = "AKS cluster FQDN"
value = module.aks.cluster_fqdn
}
output "aks_cluster_kubernetes_version" {
description = "AKS cluster Kubernetes version"
value = module.aks.kubernetes_version
}
output "aks_cluster_node_resource_group" {
description = "AKS cluster node resource group"
value = module.aks.node_resource_group
}
output "aks_cluster_identity_principal_id" {
description = "AKS cluster identity principal ID"
value = module.aks.identity_principal_id
}
output "aks_kube_config_raw" {
description = "Raw kube config"
value = module.aks.kube_config_raw
sensitive = true
}
output "aks_kube_config_host" {
description = "Kube config host"
value = module.aks.kube_config_host
sensitive = true
}
output "aks_kube_config_command" {
description = "Command to get AKS credentials"
value = "az aks get-credentials --resource-group ${azurerm_resource_group.openclaw.name} --name ${module.aks.cluster_name}"
}
# ------------------------------------------------------------------------------
# PostgreSQL Outputs
# ------------------------------------------------------------------------------
output "postgresql_server_id" {
description = "PostgreSQL server ID"
value = module.postgresql.server_id
}
output "postgresql_server_name" {
description = "PostgreSQL server name"
value = module.postgresql.server_name
}
output "postgresql_fqdn" {
description = "PostgreSQL server FQDN"
value = module.postgresql.fqdn
}
output "postgresql_port" {
description = "PostgreSQL server port"
value = module.postgresql.port
}
output "postgresql_administrator_login" {
description = "PostgreSQL administrator login"
value = module.postgresql.administrator_login
sensitive = true
}
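output "postgresql_connection_string" {
description = "PostgreSQL connection string for the openclaw database (TLS required)"
# The password is URL-encoded so that special characters do not break the URI
value = "postgresql://${var.db_administrator_login}:${urlencode(var.db_administrator_password)}@${module.postgresql.fqdn}:5432/openclaw?sslmode=require"
sensitive = true
}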
# ------------------------------------------------------------------------------
# Redis Outputs
# ------------------------------------------------------------------------------
output "redis_cache_id" {
description = "Redis cache ID"
value = module.redis.redis_cache_id
}
output "redis_cache_name" {
description = "Redis cache name"
value = module.redis.redis_cache_name
}
output "redis_hostname" {
description = "Redis cache hostname"
value = module.redis.hostname
}
output "redis_port" {
description = "Redis cache port"
value = module.redis.port
}
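output "redis_connection_string" {
description = "Redis connection string (TLS)"
# The non-SSL port is disabled, so clients must connect with the rediss:// scheme;
# module.redis.port is expected to expose the SSL port.
value = "rediss://${var.redis_password != null ? ":${urlencode(var.redis_password)}@" : ""}${module.redis.hostname}:${module.redis.port}"
sensitive = true
}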
# ------------------------------------------------------------------------------
# ACR Outputs
# ------------------------------------------------------------------------------
output "acr_id" {
description = "ACR ID"
value = module.acr.acr_id
}
output "acr_name" {
description = "ACR name"
value = module.acr.acr_name
}
output "acr_login_server" {
description = "ACR login server"
value = module.acr.login_server
}
output "acr_login_server_url" {
description = "ACR login server URL"
value = "https://${module.acr.login_server}"
}
output "acr_admin_username" {
description = "ACR admin username"
value = module.acr.admin_username
}
output "acr_admin_password" {
description = "ACR admin password"
value = module.acr.admin_password
sensitive = true
}
output "acr_login_command" {
description = "ACR login command"
value = "az acr login --name ${module.acr.acr_name}"
}
# ------------------------------------------------------------------------------
# Application Gateway Outputs
# ------------------------------------------------------------------------------
output "application_gateway_id" {
description = "Application Gateway ID"
value = module.application_gateway.gateway_id
}
output "application_gateway_name" {
description = "Application Gateway name"
value = module.application_gateway.gateway_name
}
output "application_gateway_public_ip" {
description = "Application Gateway public IP"
value = module.application_gateway.public_ip
}
output "application_gateway_public_ip_id" {
description = "Application Gateway public IP ID"
value = module.application_gateway.public_ip_id
}
# ------------------------------------------------------------------------------
# Key Vault Outputs
# ------------------------------------------------------------------------------
output "key_vault_id" {
description = "Key Vault ID"
value = azurerm_key_vault.openclaw.id
}
output "key_vault_name" {
description = "Key Vault name"
value = azurerm_key_vault.openclaw.name
}
output "key_vault_uri" {
description = "Key Vault URI"
value = azurerm_key_vault.openclaw.vault_uri
}
# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------
output "log_analytics_workspace_id" {
description = "Log Analytics workspace ID"
value = module.monitoring.log_analytics_workspace_id
}
output "log_analytics_workspace_name" {
description = "Log Analytics workspace name"
value = module.monitoring.log_analytics_workspace_name
}
output "application_insights_id" {
description = "Application Insights ID"
value = module.monitoring.application_insights_id
}
output "monitoring_dashboard_id" {
description = "Monitoring dashboard ID"
value = module.monitoring.dashboard_id
}
# ------------------------------------------------------------------------------
# Cost Estimation
# ------------------------------------------------------------------------------
output "estimated_monthly_cost" {
  description = "Estimated monthly cost breakdown (rough figures in USD)"
  # A "$" placed directly before an interpolation is parsed as the escape for a literal
  # "${", so computed amounts are built with format() instead.
  value = {
    aks_cluster         = "~$73 (cluster management)"
    aks_nodes_default   = format("~$%d (%s)", var.default_node_pool.node_count * 140, var.default_node_pool.vm_size)
    aks_nodes_system    = format("~$%d (%s)", var.system_node_pool.node_count * 70, var.system_node_pool.vm_size)
    aks_nodes_compute   = format("~$%d (%s)", var.user_node_pools[0].node_count * 350, var.user_node_pools[0].vm_size)
    aks_nodes_gpu       = local.gpu_enabled ? format("~$%d (%s)", var.gpu_node_pool.node_count * 2500, var.gpu_node_pool.vm_size) : "$0"
    postgresql          = format("~$%d (%s)", var.postgresql_sku_name == "GP_Gen5_2" ? 150 : 300, var.postgresql_sku_name)
    redis               = format("~$%d (%vGB)", var.redis_sku_name == "Standard" ? 100 : 200, var.redis_capacity)
    acr                 = "~$10 (Standard)"
    application_gateway = "~$30 (Standard_v2)"
    key_vault           = "~$5"
    monitoring          = "~$50"
    network_egress      = "Variable"
    total_estimate      = "See Azure Pricing Calculator for accurate pricing"
  }
}
@@ -0,0 +1,205 @@
# ==============================================================================
# Heretek OpenClaw - Azure Database for PostgreSQL Configuration
# ==============================================================================
# Azure Database for PostgreSQL Flexible Server for OpenClaw
# ==============================================================================
# ------------------------------------------------------------------------------
# PostgreSQL Flexible Server
# ------------------------------------------------------------------------------
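resource "azurerm_postgresql_flexible_server" "openclaw" {
  name                = var.server_name
  location            = var.location
  resource_group_name = var.resource_group_name
  # "version" is a reserved variable name, so the input is assumed to be declared as
  # "postgresql_version" in this module's variables.
  version             = var.postgresql_version
  delegated_subnet_id = var.subnet_id
  # Required whenever delegated_subnet_id is set (VNet-integrated server)
  private_dns_zone_id = azurerm_private_dns_zone.postgresql.id
  zone                = "1"
  sku_name            = var.sku_name
  storage_mb          = var.storage_mb

  administrator_login    = var.administrator_login
  administrator_password = var.administrator_password

  # Backup settings are top-level arguments on the flexible server, not a nested block
  backup_retention_days        = var.environment == "prod" ? 35 : 7
  geo_redundant_backup_enabled = var.geo_redundant_backup_enabled

  # high_availability accepts only "ZoneRedundant" or "SameZone"; omit the block to disable HA
  dynamic "high_availability" {
    for_each = var.environment == "prod" ? [1] : []
    content {
      mode                      = "ZoneRedundant"
      standby_availability_zone = "2"
    }
  }

  maintenance_window {
    day_of_week  = 0
    start_hour   = 2
    start_minute = 0
  }

  public_network_access_enabled = var.public_network_access_enabled
  tags                          = var.tags

  # The DNS zone must be linked to the VNet before the server is created
  depends_on = [azurerm_private_dns_zone_virtual_network_link.postgresql]
}

# Server parameters are managed through separate configuration resources
resource "azurerm_postgresql_flexible_server_configuration" "extensions" {
  name      = "azure.extensions"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  value     = "VECTOR" # allow-lists the pgvector extension
}

resource "azurerm_postgresql_flexible_server_configuration" "pg_stat_statements_track" {
  name      = "pg_stat_statements.track"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  value     = "all"
}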
# ------------------------------------------------------------------------------
# PostgreSQL Database
# ------------------------------------------------------------------------------
resource "azurerm_postgresql_flexible_server_database" "openclaw" {
name = "openclaw"
server_id = azurerm_postgresql_flexible_server.openclaw.id
charset = "UTF8"
collation = "en_US.UTF8"
}
# ------------------------------------------------------------------------------
# PostgreSQL Firewall Rules
# ------------------------------------------------------------------------------
resource "azurerm_postgresql_flexible_server_firewall_rule" "allow_aks" {
name = "AllowAKS"
server_id = azurerm_postgresql_flexible_server.openclaw.id
# Cover the whole AKS subnet: first and last address of the CIDR block
start_ip_address = cidrhost(var.aks_subnet_cidr, 0)
end_ip_address = cidrhost(var.aks_subnet_cidr, -1)
}
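# Illustration only: the rule above is meant to admit the AKS subnet, which needs
# the first and last address of its CIDR block. The sketch below shows how the
# built-in cidrhost() function yields both bounds. The local names and the sample
# CIDR are hypothetical and are not used elsewhere in this module.
locals {
  example_cidr     = "10.0.1.0/24"
  example_first_ip = cidrhost(local.example_cidr, 0)  # "10.0.1.0"
  example_last_ip  = cidrhost(local.example_cidr, -1) # "10.0.1.255" (negative numbers count from the end)
}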
# ------------------------------------------------------------------------------
# PostgreSQL Private DNS Zone
# ------------------------------------------------------------------------------
resource "azurerm_private_dns_zone" "postgresql" {
name = "privatelink.postgres.database.azure.com"
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_private_dns_zone_virtual_network_link" "postgresql" {
name = "postgresql-vnet-link"
resource_group_name = var.resource_group_name
private_dns_zone_name = azurerm_private_dns_zone.postgresql.name
virtual_network_id = var.vnet_id
registration_enabled = false
tags = var.tags
}
# No explicit A record is declared: the flexible server resource exports no
# private_ip_address attribute, and a VNet-integrated server registers its own record in
# the private DNS zone referenced by its private_dns_zone_id argument.
# ------------------------------------------------------------------------------
# PostgreSQL Diagnostic Settings
# ------------------------------------------------------------------------------
resource "azurerm_monitor_diagnostic_setting" "postgresql" {
name = "${var.server_name}-diagnostics"
target_resource_id = azurerm_postgresql_flexible_server.openclaw.id
log_analytics_workspace_id = var.log_analytics_workspace_id
enabled_log {
category = "PostgreSQLLogs"
}
enabled_log {
category = "PostgreSQLFlexQueryStoreRuntime"
}
enabled_log {
category = "PostgreSQLFlexQueryStoreWaitStats"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# ------------------------------------------------------------------------------
# PostgreSQL Alerts
# ------------------------------------------------------------------------------
resource "azurerm_monitor_metric_alert" "postgresql_cpu" {
count = var.environment == "prod" ? 1 : 0
name = "${var.server_name}-cpu-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_postgresql_flexible_server.openclaw.id]
description = "CPU utilization is too high"
criteria {
metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
metric_name = "cpu_percent"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "postgresql_storage" {
count = var.environment == "prod" ? 1 : 0
name = "${var.server_name}-storage-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_postgresql_flexible_server.openclaw.id]
description = "Storage utilization is too high"
criteria {
metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
metric_name = "storage_percent"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "postgresql_connections" {
count = var.environment == "prod" ? 1 : 0
name = "${var.server_name}-connections-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_postgresql_flexible_server.openclaw.id]
description = "Active connections is too high"
criteria {
metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
metric_name = "active_connections"
aggregation = "Average"
operator = "GreaterThan"
threshold = 100
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
@@ -0,0 +1,205 @@
# ==============================================================================
# Heretek OpenClaw - Azure Cache for Redis Configuration
# ==============================================================================
# Azure Cache for Redis for OpenClaw caching and session management
# ==============================================================================
# ------------------------------------------------------------------------------
# Redis Cache
# ------------------------------------------------------------------------------
resource "azurerm_redis_cache" "openclaw" {
name = var.cache_name
location = var.location
resource_group_name = var.resource_group_name
capacity = var.capacity
family = var.family
sku_name = var.sku_name
redis_version = var.redis_version
enable_non_ssl_port = var.enable_non_ssl_port
minimum_tls_version = var.minimum_tls_version
redis_configuration {
maxmemory_reserved = var.capacity * 1024
maxmemory_delta = var.capacity * 1024
maxmemory_policy = "allkeys-lru"
notify_keyspace_events = "KEA"
}
# Private connectivity is provided by azurerm_private_endpoint.redis below;
# azurerm_redis_cache itself has no private_endpoint block.
tags = var.tags
zones = var.zones
}
# ------------------------------------------------------------------------------
# Redis Private Endpoint
# ------------------------------------------------------------------------------
resource "azurerm_private_endpoint" "redis" {
name = "${var.cache_name}-pe"
location = var.location
resource_group_name = var.resource_group_name
subnet_id = var.subnet_id
private_service_connection {
name = "${var.cache_name}-psc"
private_connection_resource_id = azurerm_redis_cache.openclaw.id
is_manual_connection = false
subresource_names = ["redisCache"]
}
private_dns_zone_group {
name = "default"
private_dns_zone_ids = [azurerm_private_dns_zone.redis.id]
}
tags = var.tags
}
# ------------------------------------------------------------------------------
# Redis Private DNS Zone
# ------------------------------------------------------------------------------
resource "azurerm_private_dns_zone" "redis" {
name = "privatelink.redis.cache.windows.net"
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_private_dns_zone_virtual_network_link" "redis" {
name = "redis-vnet-link"
resource_group_name = var.resource_group_name
private_dns_zone_name = azurerm_private_dns_zone.redis.name
virtual_network_id = var.vnet_id
registration_enabled = false
tags = var.tags
}
# ------------------------------------------------------------------------------
# Redis Firewall Rules (for Premium tier)
# ------------------------------------------------------------------------------
resource "azurerm_redis_firewall_rule" "allow_aks" {
count = var.sku_name == "Premium" ? 1 : 0
name = "AllowAKS"
redis_cache_name = azurerm_redis_cache.openclaw.name
resource_group_name = var.resource_group_name
# Cover the whole AKS subnet: first and last address of the CIDR block
start_ip = cidrhost(var.aks_subnet_cidr, 0)
end_ip = cidrhost(var.aks_subnet_cidr, -1)
}
# ------------------------------------------------------------------------------
# Redis Patch Schedule
# ------------------------------------------------------------------------------
# The azurerm provider exposes the patch schedule as a patch_schedule block on
# azurerm_redis_cache (Premium tier only), not as a standalone resource. The intended
# Sunday 03:00 UTC schedule has to be placed inside azurerm_redis_cache.openclaw:
#
#   patch_schedule {
#     day_of_week        = "Sunday"
#     start_hour_utc     = 3
#     maintenance_window = "PT5H" # ISO 8601 duration; PT5H is the provider default
#   }
# ------------------------------------------------------------------------------
# Redis Diagnostic Settings
# ------------------------------------------------------------------------------
resource "azurerm_monitor_diagnostic_setting" "redis" {
name = "${var.cache_name}-diagnostics"
target_resource_id = azurerm_redis_cache.openclaw.id
log_analytics_workspace_id = var.log_analytics_workspace_id
enabled_log {
category = "CacheMetrics"
}
enabled_log {
category = "CacheRequests"
}
metric {
category = "AllMetrics"
enabled = true
}
}
# ------------------------------------------------------------------------------
# Redis Alerts
# ------------------------------------------------------------------------------
resource "azurerm_monitor_metric_alert" "redis_cpu" {
count = var.environment == "prod" ? 1 : 0
name = "${var.cache_name}-cpu-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_redis_cache.openclaw.id]
description = "CPU utilization is too high"
criteria {
metric_namespace = "Microsoft.Cache/Redis"
metric_name = "percentProcessorTime"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "redis_connections" {
count = var.environment == "prod" ? 1 : 0
name = "${var.cache_name}-connections-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_redis_cache.openclaw.id]
description = "Connected clients is too high"
criteria {
metric_namespace = "Microsoft.Cache/Redis"
metric_name = "ConnectedClients"
aggregation = "Average"
operator = "GreaterThan"
threshold = 100
}
severity = 3
action {
action_group_id = var.action_group_id
}
}
resource "azurerm_monitor_metric_alert" "redis_timeout" {
count = var.environment == "prod" ? 1 : 0
name = "${var.cache_name}-timeout-alert"
resource_group_name = var.resource_group_name
scopes = [azurerm_redis_cache.openclaw.id]
description = "Server busy/timeout count is too high"
criteria {
metric_namespace = "Microsoft.Cache/Redis"
metric_name = "ServerBusy"
aggregation = "Average"
operator = "GreaterThan"
threshold = 10
}
severity = 2
action {
action_group_id = var.action_group_id
}
}
@@ -0,0 +1,370 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Variables
# ==============================================================================
# Input variables for Azure infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------
variable "resource_group_name" {
description = "Azure resource group name"
type = string
default = "openclaw-rg"
}
variable "location" {
description = "Azure region for resources"
type = string
default = "eastus"
}
variable "environment" {
description = "Deployment environment (dev, staging, prod)"
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "app_version" {
description = "Application version to deploy"
type = string
default = "2026.3.28"
}
# ------------------------------------------------------------------------------
# VNet Configuration
# ------------------------------------------------------------------------------
variable "vnet_address_space" {
description = "VNet address space"
type = list(string)
default = ["10.0.0.0/16"]
}
variable "subnet_configs" {
description = "Subnet configurations"
type = map(object({
name = string
address_prefixes = list(string)
}))
default = {
aks = {
name = "aks-subnet"
address_prefixes = ["10.0.1.0/24"]
}
database = {
name = "database-subnet"
address_prefixes = ["10.0.2.0/24"]
}
cache = {
name = "cache-subnet"
address_prefixes = ["10.0.3.0/24"]
}
gateway = {
name = "gateway-subnet"
address_prefixes = ["10.0.4.0/24"]
}
}
}
variable "enable_ddos_protection" {
description = "Enable DDoS protection"
type = bool
default = false
}
variable "enable_flow_logs" {
description = "Enable NSG flow logs"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# AKS Configuration
# ------------------------------------------------------------------------------
variable "kubernetes_version" {
description = "Kubernetes version for AKS"
type = string
default = "1.28"
}
variable "default_node_pool" {
description = "Default node pool configuration"
type = object({
name = string
vm_size = string
node_count = number
min_count = number
max_count = number
enable_auto_scaling = bool
os_disk_size_gb = number
type = string
availability_zones = list(string)
})
default = {
name = "default"
vm_size = "Standard_D4s_v3"
node_count = 2
min_count = 1
max_count = 4
enable_auto_scaling = true
os_disk_size_gb = 100
type = "VirtualMachineScaleSets"
availability_zones = ["1", "2", "3"]
}
}
variable "system_node_pool" {
description = "System node pool configuration"
type = object({
name = string
vm_size = string
node_count = number
min_count = number
max_count = number
enable_auto_scaling = bool
os_disk_size_gb = number
availability_zones = list(string)
})
default = {
name = "system"
vm_size = "Standard_D2s_v3"
node_count = 2
min_count = 1
max_count = 3
enable_auto_scaling = true
os_disk_size_gb = 50
availability_zones = ["1", "2", "3"]
}
}
variable "user_node_pools" {
description = "User node pool configurations"
type = list(object({
name = string
vm_size = string
node_count = number
min_count = number
max_count = number
enable_auto_scaling = bool
os_disk_size_gb = number
availability_zones = list(string)
}))
default = [
{
name = "compute"
vm_size = "Standard_D8s_v3"
node_count = 2
min_count = 1
max_count = 8
enable_auto_scaling = true
os_disk_size_gb = 200
availability_zones = ["1", "2", "3"]
}
]
}
variable "enable_gpu_support" {
description = "Enable GPU node pool for Ollama"
type = bool
default = false
}
variable "gpu_node_pool" {
description = "GPU node pool configuration"
type = object({
name = string
vm_size = string
node_count = number
min_count = number
max_count = number
enable_auto_scaling = bool
os_disk_size_gb = number
availability_zones = list(string)
})
default = {
name = "gpu"
vm_size = "Standard_NC4as_T4_v3"
node_count = 1
min_count = 0
max_count = 4
enable_auto_scaling = true
os_disk_size_gb = 200
availability_zones = ["1", "2", "3"]
}
}
variable "enable_private_cluster" {
description = "Enable private AKS cluster"
type = bool
default = false
}
variable "enable_azure_policy" {
description = "Enable Azure Policy addon"
type = bool
default = true
}
variable "enable_workload_identity" {
description = "Enable Workload Identity"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# Azure Database for PostgreSQL Configuration
# ------------------------------------------------------------------------------
variable "postgresql_sku_name" {
description = "PostgreSQL SKU name"
type = string
default = "GP_Gen5_2"
}
variable "postgresql_storage_mb" {
description = "PostgreSQL storage in MB"
type = number
default = 102400
}
variable "postgresql_version" {
description = "PostgreSQL version"
type = string
default = "15"
}
variable "db_administrator_login" {
description = "PostgreSQL administrator login"
type = string
default = "openclaw"
sensitive = true
}
variable "db_administrator_password" {
description = "PostgreSQL administrator password"
type = string
default = null
sensitive = true
}
variable "db_geo_redundant_backup" {
description = "Enable geo-redundant backup"
type = bool
default = false
}
variable "db_auto_grow_enabled" {
description = "Enable storage auto-grow"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# Azure Cache for Redis Configuration
# ------------------------------------------------------------------------------
variable "redis_capacity" {
description = "Redis cache capacity"
type = number
default = 2
}
variable "redis_family" {
description = "Redis SKU family (C, P, E)"
type = string
default = "C"
}
variable "redis_sku_name" {
description = "Redis SKU name (Basic, Standard, Premium)"
type = string
default = "Standard"
}
variable "redis_version" {
description = "Redis version"
type = string
default = "6"
}
variable "redis_password" {
description = "Redis authentication password"
type = string
default = null
sensitive = true
}
variable "redis_zones" {
description = "Availability zones for Redis"
type = list(string)
default = ["1", "2", "3"]
}
# ------------------------------------------------------------------------------
# Azure Container Registry Configuration
# ------------------------------------------------------------------------------
variable "acr_sku" {
description = "ACR SKU (Basic, Standard, Premium)"
type = string
default = "Standard"
}
variable "acr_retention_policy_days" {
description = "ACR retention policy days"
type = number
default = 30
}
# ------------------------------------------------------------------------------
# Application Gateway Configuration
# ------------------------------------------------------------------------------
variable "gateway_sku_name" {
description = "Application Gateway SKU"
type = string
default = "Standard_v2"
}
variable "gateway_capacity" {
description = "Application Gateway capacity"
type = number
default = 2
}
variable "ssl_certificate_key_vault_secret_id" {
description = "Key Vault secret ID for SSL certificate"
type = string
default = null
}
variable "ssl_certificate_data" {
description = "Base64 encoded SSL certificate data"
type = string
default = null
sensitive = true
}
# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------
variable "enable_monitoring_alerts" {
description = "Enable monitoring alerts"
type = bool
default = true
}
variable "alert_email" {
description = "Email for alert notifications"
type = string
default = null
}
@@ -0,0 +1,298 @@
# ==============================================================================
# Heretek OpenClaw - Azure VNet Configuration
# ==============================================================================
# Virtual Network for OpenClaw infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# Virtual Network
# ------------------------------------------------------------------------------
resource "azurerm_virtual_network" "openclaw" {
name = var.vnet_name
location = var.location
resource_group_name = var.resource_group_name
address_space = var.vnet_address_space
dynamic "ddos_protection_plan" {
for_each = var.enable_ddos_protection ? [1] : []
content {
id = azurerm_ddos_protection_plan.openclaw[0].id
enable = true
}
}
tags = var.tags
}
# ------------------------------------------------------------------------------
# DDoS Protection Plan (Optional)
# ------------------------------------------------------------------------------
resource "azurerm_ddos_protection_plan" "openclaw" {
count = var.enable_ddos_protection ? 1 : 0
name = "${var.vnet_name}-ddos"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
# ------------------------------------------------------------------------------
# Subnets
# ------------------------------------------------------------------------------
resource "azurerm_subnet" "aks" {
name = var.subnet_configs.aks.name
resource_group_name = var.resource_group_name
virtual_network_name = azurerm_virtual_network.openclaw.name
address_prefixes = var.subnet_configs.aks.address_prefixes
delegation {
name = "aks-delegation"
service_delegation {
name = "Microsoft.ContainerService/managedClusters"
actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
}
}
}
resource "azurerm_subnet" "database" {
name = var.subnet_configs.database.name
resource_group_name = var.resource_group_name
virtual_network_name = azurerm_virtual_network.openclaw.name
address_prefixes = var.subnet_configs.database.address_prefixes
delegation {
name = "database-delegation"
service_delegation {
name = "Microsoft.DBforPostgreSQL/servers"
actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
}
}
}
resource "azurerm_subnet" "cache" {
name = var.subnet_configs.cache.name
resource_group_name = var.resource_group_name
virtual_network_name = azurerm_virtual_network.openclaw.name
address_prefixes = var.subnet_configs.cache.address_prefixes
delegation {
name = "cache-delegation"
service_delegation {
name = "Microsoft.Cache/redis"
actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
}
}
}
resource "azurerm_subnet" "gateway" {
name = var.subnet_configs.gateway.name
resource_group_name = var.resource_group_name
virtual_network_name = azurerm_virtual_network.openclaw.name
address_prefixes = var.subnet_configs.gateway.address_prefixes
delegation {
name = "gateway-delegation"
service_delegation {
name = "Microsoft.Network/applicationGateways"
actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
}
}
}
# ------------------------------------------------------------------------------
# Network Security Groups
# ------------------------------------------------------------------------------
resource "azurerm_network_security_group" "aks" {
name = "${var.vnet_name}-aks-nsg"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_network_security_group" "database" {
name = "${var.vnet_name}-database-nsg"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_network_security_group" "cache" {
name = "${var.vnet_name}-cache-nsg"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_network_security_group" "gateway" {
name = "${var.vnet_name}-gateway-nsg"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
# ------------------------------------------------------------------------------
# NSG Security Rules
# ------------------------------------------------------------------------------
# AKS NSG Rules
resource "azurerm_network_security_rule" "aks_allow_inbound" {
name = "AllowInboundAKS"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "6443"
source_address_prefix = "*"
destination_address_prefix = "*"
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.aks.name
}
resource "azurerm_network_security_rule" "aks_allow_node" {
name = "AllowNodeCommunication"
priority = 101
direction = "Inbound"
access = "Allow"
protocol = "*"
source_port_range = "*"
destination_port_range = "0-65535"
source_address_prefix = azurerm_virtual_network.openclaw.address_space[0]
destination_address_prefix = azurerm_virtual_network.openclaw.address_space[0]
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.aks.name
}
# Database NSG Rules
resource "azurerm_network_security_rule" "database_allow_postgresql" {
name = "AllowPostgreSQL"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "5432"
source_address_prefix = azurerm_subnet.aks.address_prefixes[0]
destination_address_prefix = "*"
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.database.name
}
# Cache NSG Rules
resource "azurerm_network_security_rule" "cache_allow_redis" {
name = "AllowRedis"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "6379"
source_address_prefix = azurerm_subnet.aks.address_prefixes[0]
destination_address_prefix = "*"
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.cache.name
}
# Gateway NSG Rules
resource "azurerm_network_security_rule" "gateway_allow_http" {
name = "AllowHTTP"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "80"
source_address_prefix = "*"
destination_address_prefix = "*"
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.gateway.name
}
resource "azurerm_network_security_rule" "gateway_allow_https" {
name = "AllowHTTPS"
priority = 101
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "*"
destination_address_prefix = "*"
resource_group_name = var.resource_group_name
network_security_group_name = azurerm_network_security_group.gateway.name
}
# ------------------------------------------------------------------------------
# Subnet NSG Associations
# ------------------------------------------------------------------------------
resource "azurerm_subnet_network_security_group_association" "aks" {
subnet_id = azurerm_subnet.aks.id
network_security_group_id = azurerm_network_security_group.aks.id
}
resource "azurerm_subnet_network_security_group_association" "database" {
subnet_id = azurerm_subnet.database.id
network_security_group_id = azurerm_network_security_group.database.id
}
resource "azurerm_subnet_network_security_group_association" "cache" {
subnet_id = azurerm_subnet.cache.id
network_security_group_id = azurerm_network_security_group.cache.id
}
resource "azurerm_subnet_network_security_group_association" "gateway" {
subnet_id = azurerm_subnet.gateway.id
network_security_group_id = azurerm_network_security_group.gateway.id
}
# ------------------------------------------------------------------------------
# Flow Logs (Optional)
# ------------------------------------------------------------------------------
resource "azurerm_network_watcher" "openclaw" {
count = var.enable_flow_logs ? 1 : 0
name = "${var.vnet_name}-watcher"
location = var.location
resource_group_name = var.resource_group_name
tags = var.tags
}
resource "azurerm_log_analytics_workspace" "flow_logs" {
count = var.enable_flow_logs ? 1 : 0
name = "${var.vnet_name}-flow-logs-log"
location = var.location
resource_group_name = var.resource_group_name
sku = "PerGB2018"
retention_in_days = 30
tags = var.tags
}
resource "azurerm_storage_account" "flow_logs" {
count = var.enable_flow_logs ? 1 : 0
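# Storage account names allow only 3-24 lowercase letters and digits
name = "${substr(replace(lower(var.vnet_name), "/[^a-z0-9]/", ""), 0, 16)}flowlogs"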
location = var.location
resource_group_name = var.resource_group_name
account_tier = "Standard"
account_replication_type = "LRS"
tags = var.tags
}
@@ -0,0 +1,539 @@
# GCP Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides comprehensive instructions for deploying Heretek OpenClaw on Google Cloud Platform (GCP) using Terraform Infrastructure as Code (IaC).
---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Cost Estimates](#cost-estimates)
5. [Quick Start](#quick-start)
6. [Configuration](#configuration)
7. [Deployment Steps](#deployment-steps)
8. [Post-Deployment](#post-deployment)
9. [GPU Support](#gpu-support)
10. [Monitoring](#monitoring)
11. [Backup & Recovery](#backup--recovery)
12. [Cleanup](#cleanup)
---
## Overview
This Terraform configuration deploys a production-ready OpenClaw environment on GCP with:
- **GKE (Google Kubernetes Engine)** - Managed Kubernetes cluster
- **Cloud SQL PostgreSQL** - Managed PostgreSQL with pgvector support
- **Memorystore Redis** - Managed Redis for caching and sessions
- **Artifact Registry** - Private container registry
- **Cloud Load Balancing** - Traffic routing and SSL termination
- **Cloud Monitoring** - Metrics, logging, and alerting
### Components
| Component | Service | Purpose |
|-----------|---------|---------|
| Gateway | GKE | OpenClaw Gateway (port 18789) |
| LiteLLM | GKE | LLM proxy and routing (port 4000) |
| Database | Cloud SQL PostgreSQL 15 | Primary data store with pgvector |
| Cache | Memorystore Redis 7 | Session management, caching |
| Container Registry | Artifact Registry | Private image storage |
| Load Balancer | Cloud LB | HTTPS termination, routing |
| Monitoring | Cloud Monitoring | Metrics, logs, alerts |
---
## Prerequisites
### Required Tools
```bash
# Install Terraform
brew install terraform # macOS
# or download from https://www.terraform.io/downloads
# Install Google Cloud SDK
brew install --cask google-cloud-sdk # macOS
# or follow https://cloud.google.com/sdk/docs/install
# Install kubectl
brew install kubectl
# Install Helm
brew install helm
```
### GCP Account Setup
1. **GCP Project** - Active GCP project with billing enabled
2. **Service Account** - Service account with required permissions
3. **Budget Alert** - Set up billing alerts in GCP Console
### Configure GCP Credentials
```bash
# Authenticate with Google Cloud
gcloud auth login
# Set project
gcloud config set project YOUR_PROJECT_ID
# Create service account for Terraform
gcloud iam service-accounts create terraform \
--display-name "Terraform Service Account"
# Grant required permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:terraform@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/editor"
# Create and download key
gcloud iam service-accounts keys create terraform-key.json \
--iam-account=terraform@YOUR_PROJECT_ID.iam.gserviceaccount.com
# Set environment variable
export GOOGLE_APPLICATION_CREDENTIALS=$(pwd)/terraform-key.json
```
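With `GOOGLE_APPLICATION_CREDENTIALS` exported, the Google provider finds the key through Application Default Credentials, so the key path does not need to appear in the Terraform configuration. A minimal sketch, assuming variables named `project_id` and `region` as in the Quick Start; the version constraint is illustrative and not taken from the repository:

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0" # assumption: pin to the version the repository is tested with
    }
  }
}

variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "us-central1"
}

# No "credentials" argument: the provider reads GOOGLE_APPLICATION_CREDENTIALS
provider "google" {
  project = var.project_id
  region  = var.region
}
```

Keeping the key out of the configuration also keeps it out of version control; `terraform-key.json` itself should be listed in `.gitignore`.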
### Required GCP Permissions
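| Service | Required Role |
|---------|---------------|
| GKE | Kubernetes Engine Admin (`roles/container.admin`) |
| Compute Engine | Compute Admin (`roles/compute.admin`) |
| Cloud SQL | Cloud SQL Admin (`roles/cloudsql.admin`) |
| Memorystore | Cloud Memorystore Redis Admin (`roles/redis.admin`) |
| Artifact Registry | Artifact Registry Administrator (`roles/artifactregistry.admin`) |
| Cloud Load Balancing | Compute Load Balancer Admin (`roles/compute.loadBalancerAdmin`) |
| IAM | Service Account Admin (`roles/iam.serviceAccountAdmin`) |
| Cloud Monitoring | Monitoring Admin (`roles/monitoring.admin`) |
| Secret Manager | Secret Manager Admin (`roles/secretmanager.admin`) |
| Cloud KMS | Cloud KMS Admin (`roles/cloudkms.admin`) |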
### Enable Required APIs
```bash
gcloud services enable \
container.googleapis.com \
sqladmin.googleapis.com \
redis.googleapis.com \
artifactregistry.googleapis.com \
servicenetworking.googleapis.com \
monitoring.googleapis.com \
secretmanager.googleapis.com \
cloudkms.googleapis.com
```
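The same APIs can also be enabled from Terraform instead of by hand, which keeps a fresh project reproducible. A sketch, assuming a `project_id` variable; the resource name `required` is hypothetical, and whether the repository's configuration already does this is not shown here:

```hcl
variable "project_id" {
  type = string
}

resource "google_project_service" "required" {
  for_each = toset([
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "redis.googleapis.com",
    "artifactregistry.googleapis.com",
    "servicenetworking.googleapis.com",
    "monitoring.googleapis.com",
    "secretmanager.googleapis.com",
    "cloudkms.googleapis.com",
  ])

  project = var.project_id
  service = each.value

  # Leave the API enabled when this resource is destroyed, so other
  # workloads in the project are not disrupted by `terraform destroy`
  disable_on_destroy = false
}
```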
---
## Architecture
```
┌─────────────────────────────────────────────┐
│ Google Cloud Platform │
│ us-central1 │
└─────────────────────────────────────────────┘
┌─────────────────────────────────┼─────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────────┐ ┌───────────────────────┐ ┌───────────────────────┐
│ Zone A │ │ Zone B │ │ Zone C │
│ (us-central1-a) │ │ (us-central1-b) │ │ (us-central1-c) │
│ │ │ │ │ │
│ ┌────────────────┐ │ │ ┌────────────────┐ │ │ ┌────────────────┐ │
│ │ GKE Nodes │ │ │ │ GKE Nodes │ │ │ │ GKE Nodes │ │
│ │ (General) │ │ │ │ (Compute) │ │ │ │ (GPU) │ │
│ └────────────────┘ │ │ └────────────────┘ │ │ └────────────────┘ │
│ │ │ │ │ │
│ ┌────────────────┐ │ │ ┌────────────────┐ │ │ ┌────────────────┐ │
│ │ Cloud SQL │ │ │ │ Memorystore │ │ │ │ Artifact │ │
│ │ Primary │ │ │ │ Redis │ │ │ │ Registry │ │
│ └────────────────┘ │ │ └────────────────┘ │ │ └────────────────┘ │
└───────────────────────┘ └───────────────────────┘ └───────────────────────┘
│ │ │
└─────────────────────────────────┼─────────────────────────────────┘
┌─────────────────────────────────┼─────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ Cloud Load Balancing │
│ (Global HTTP(S) LB) │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ Cloud Monitoring │
│ (Metrics, Logging, Alerting) │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```
---
## Cost Estimates
### Development Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| GKE Cluster | Autopilot/Standard | $73.00 |
| GKE Nodes | 2x n2-standard-4 | $280.00 |
| Cloud SQL | db-custom-4-15360, 100GB | $150.00 |
| Memorystore | 4GB STANDARD_HA | $150.00 |
| Cloud Load Balancer | External | $18.00 |
| Artifact Registry | 10GB | $2.50 |
| Cloud Monitoring | Standard | $5.00 |
| Network Egress | Estimated | $30.00 |
| **Total** | | **~$708.50/month** |
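The total is a plain sum of the line items above. As a quick check, the figures can be evaluated in Terraform itself; the local and output names are hypothetical and the values are copied from the table:

```hcl
locals {
  dev_monthly_usd = [
    73.00,  # GKE cluster
    280.00, # GKE nodes
    150.00, # Cloud SQL
    150.00, # Memorystore
    18.00,  # Cloud Load Balancer
    2.50,   # Artifact Registry
    5.00,   # Cloud Monitoring
    30.00,  # Network egress (estimated)
  ]
}

output "dev_total_usd" {
  value = sum(local.dev_monthly_usd) # 708.5
}
```

Adjusting one entry (for example, dropping Memorystore to a smaller tier) shows the effect on the monthly estimate without redoing the arithmetic by hand.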
### Production Environment
| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| GKE Cluster | Standard | $73.00 |
| GKE Nodes General | 3x n2-standard-8 | $840.00 |
| GKE Nodes Compute | 4x c2-standard-16 | $2,400.00 |
| GKE Nodes GPU | 2x g2-standard-4 (L4) | $3,000.00 |
| Cloud SQL | db-custom-8-30720, Regional HA, 200GB | $600.00 |
| Memorystore | 16GB STANDARD_HA | $600.00 |
| Cloud Load Balancer | External | $18.00 |
| Artifact Registry | 50GB | $12.50 |
| Cloud Monitoring | Premium | $50.00 |
| Network Egress | Estimated | $150.00 |
| **Total** | | **~$7,743.50/month** |
> **Note:** GPU nodes are the largest single line item ($3,000 of ~$7,743.50, about 39%). Consider Spot (preemptible) VMs or autoscaling the GPU pool down when idle to reduce cost.
### Cost Optimization Tips
1. **Use Committed Use Discounts** for predictable workloads (up to 57% savings)
2. **Enable GKE Autopilot** for automatic resource optimization
3. **Use Cloud SQL on-demand backups** instead of high availability for dev
4. **Right-size instances** based on actual usage
5. **Enable Cloud Billing budget alerts**
---
## Quick Start
### Clone Repository
```bash
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw/deploy/gcp/terraform
```
### Initialize Terraform
```bash
terraform init
```
### Create Terraform Variables File
```bash
cat > terraform.tfvars <<EOF
project_id = "your-gcp-project-id"
region = "us-central1"
environment = "dev"
vpc_cidr = "10.0.0.0/16"
db_password = "generate-secure-password"
redis_auth_string = "generate-secure-token"
# Optional: GPU support for Ollama
enable_gpu_support = false
# Optional: Custom domain
managed_domain = "openclaw.example.com"
EOF
```
### Plan and Apply
```bash
# Review the plan
terraform plan -out=tfplan
# Apply the configuration
terraform apply tfplan
```
### Configure kubectl
```bash
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
```
### Deploy OpenClaw to GKE
```bash
cd ../../kubernetes
kubectl apply -k overlays/dev
```
---
## Configuration
### Input Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `project_id` | GCP project ID | - | Yes |
| `region` | GCP region | `us-central1` | No |
| `environment` | Environment name | `dev` | No |
| `vpc_cidr` | VPC CIDR block | `10.0.0.0/16` | No |
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
| `db_password` | Cloud SQL password | `null` | Yes |
| `redis_auth_string` | Redis AUTH string | `null` | Yes |
| `managed_domain` | Custom domain | `null` | No |
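Secrets such as `db_password` are best declared as sensitive so that Terraform redacts them in plan and apply output. A sketch of how such a variable could be declared; this is an assumption, not the repository's actual declaration, and unlike the table it omits the `null` default so that Terraform refuses to run without a value. The 16-character minimum is illustrative:

```hcl
variable "db_password" {
  description = "Cloud SQL password"
  type        = string
  sensitive   = true  # redacted in CLI output
  nullable    = false # an explicit null is rejected

  validation {
    condition     = length(var.db_password) >= 16
    error_message = "The db_password value must be at least 16 characters long."
  }
}
```

The value can then be supplied through the `TF_VAR_db_password` environment variable instead of being written to `terraform.tfvars`. Note that sensitive values are still stored in the state file, so the state bucket needs restricted access.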
### Environment-Specific Overrides
#### Development (`terraform.dev.tfvars`)
```hcl
environment = "dev"
db_high_availability = false
redis_tier = "BASIC"
enable_monitoring_alerts = false
node_pools = {
general = {
machine_type = "n2-standard-2"
min_count = 1
max_count = 2
initial_count = 1
}
compute = {
machine_type = "c2-standard-4"
min_count = 0
max_count = 2
initial_count = 1
}
}
```
#### Production (`terraform.prod.tfvars`)
```hcl
environment = "prod"
db_high_availability = true
redis_tier = "STANDARD_HA"
enable_monitoring_alerts = true
node_pools = {
general = {
machine_type = "n2-standard-8"
min_count = 3
max_count = 10
initial_count = 3
}
compute = {
machine_type = "c2-standard-16"
min_count = 2
max_count = 20
initial_count = 4
}
gpu = {
machine_type = "g2-standard-4"
accelerator_type = "nvidia-l4"
accelerator_count = 1
min_count = 1
max_count = 4
initial_count = 2
}
}
```
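Both overrides set `node_pools` to a map in which only the `gpu` entry carries accelerator fields. One way such a variable could be typed uses `optional()` attributes, which require Terraform 1.3 or later. This is a sketch under that assumption; the actual declaration lives in the repository's variables file and is not shown here:

```hcl
variable "node_pools" {
  description = "Node pool settings keyed by pool name"
  type = map(object({
    machine_type      = string
    min_count         = number
    max_count         = number
    initial_count     = number
    accelerator_type  = optional(string)    # null for non-GPU pools
    accelerator_count = optional(number, 0) # 0 for non-GPU pools
  }))

  validation {
    condition = alltrue([
      for name, pool in var.node_pools : pool.min_count <= pool.max_count
    ])
    error_message = "Each node pool must have min_count less than or equal to max_count."
  }
}
```

With this shape, the `general` and `compute` entries can omit the accelerator fields entirely, as they do in the overrides above.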
---
## Deployment Steps
### Step 1: Prepare GCP Project
```bash
# Verify gcloud configuration
gcloud config list
# Check project billing
gcloud billing accounts list
# Enable required APIs
gcloud services enable \
container.googleapis.com \
sqladmin.googleapis.com \
redis.googleapis.com \
artifactregistry.googleapis.com
```
### Step 2: Configure Terraform Backend
```bash
# Create GCS bucket for state
gsutil mb -p YOUR_PROJECT_ID -l us-central1 gs://openclaw-terraform-state
# Enable versioning
gsutil versioning set on gs://openclaw-terraform-state
# No separate lock table is needed: the gcs backend locks state natively
```
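The bucket is only used if the configuration declares a `gcs` backend. A minimal sketch of a partial backend block, assuming the bucket and prefix are supplied at `terraform init` time through `-backend-config`:

```hcl
terraform {
  backend "gcs" {
    # Intentionally empty: bucket and prefix are passed with
    # `terraform init -backend-config=...`, so one configuration
    # can serve several environments.
  }
}
```

Leaving the block empty keeps environment-specific values out of the code; the trade-off is that every `terraform init` must pass the same flags.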
### Step 3: Initialize and Apply
```bash
# Initialize with GCS backend
terraform init \
-backend-config="bucket=openclaw-terraform-state" \
-backend-config="prefix=openclaw/dev"
# Plan
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
# Apply
terraform apply tfplan
```
### Step 4: Verify Deployment
```bash
# Check GKE cluster
gcloud container clusters describe openclaw-dev-gke --region us-central1
# Check Cloud SQL instance
gcloud sql instances describe openclaw-dev-pg
# Check Memorystore instance
gcloud redis instances describe openclaw-dev-redis --region us-central1
# Check Artifact Registry
gcloud artifacts repositories describe openclaw-dev-registry --location us-central1
```
---
## Post-Deployment
### Configure kubectl
```bash
# Get cluster credentials
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
# Verify cluster access
kubectl get nodes
kubectl get namespaces
```
### Deploy OpenClaw Helm Chart
```bash
# Deploy using Helm
helm install openclaw ./charts/openclaw \
--namespace openclaw \
--create-namespace \
--values values.dev.yaml \
--set image.repository=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/openclaw-dev-registry/openclaw-gateway \
--set litellm.image.repository=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/openclaw-dev-registry/litellm-proxy
```
### Configure Secrets
```bash
# Create Kubernetes secrets
kubectl create secret generic openclaw-secrets \
--namespace openclaw \
--from-literal=database-url="postgresql://openclaw:password@PRIVATE_IP:5432/openclaw" \
--from-literal=redis-url="redis://:token@MEMORYSTORE_HOST:6379" \
--from-literal=minimax-api-key="your-minimax-key" \
--from-literal=zai-api-key="your-zai-key"
```
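The database and Redis addresses in the secret above are placeholders for values that exist only after `terraform apply`. One way to look them up is with data sources and outputs. This is a sketch, assuming the instance names from Step 4 and a configured `google` provider; the output names are hypothetical:

```hcl
data "google_sql_database_instance" "pg" {
  name = "openclaw-dev-pg"
}

data "google_redis_instance" "cache" {
  name   = "openclaw-dev-redis"
  region = "us-central1"
}

# Private IP to use in database-url
output "cloud_sql_private_ip" {
  value = data.google_sql_database_instance.pg.private_ip_address
}

# Host to use in redis-url
output "memorystore_host" {
  value = data.google_redis_instance.cache.host
}
```

After applying, `terraform output -raw cloud_sql_private_ip` prints the address in a form that can be substituted into the `kubectl create secret` command.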
---
## GPU Support
### Enable GPU Nodes
```hcl
# terraform.tfvars
enable_gpu_support = true
gpu_node_pool = {
machine_type = "g2-standard-4"
accelerator_type = "nvidia-l4"
accelerator_count = 1
min_count = 0
max_count = 4
initial_count = 1
}
```
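For orientation, settings like these typically end up in a `google_container_node_pool` resource. The sketch below is an assumption about how the values could be wired, not the repository's module; it references the cluster by the name used elsewhere in this guide:

```hcl
resource "google_container_node_pool" "gpu" {
  name     = "gpu"
  cluster  = "openclaw-dev-gke"
  location = "us-central1"

  initial_node_count = 1

  autoscaling {
    min_node_count = 0 # lets the pool scale to zero when idle
    max_node_count = 4
  }

  node_config {
    machine_type = "g2-standard-4"

    guest_accelerator {
      type  = "nvidia-l4"
      count = 1
    }
  }
}
```

In a regional cluster the node counts apply per zone, so the real ceiling is `max_node_count` multiplied by the number of zones the pool spans.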
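### Install NVIDIA GPU Drivers
```bash
# GKE deploys the device plugin itself; this DaemonSet installs the NVIDIA drivers on COS nodes
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml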
```
---
## Monitoring
### Cloud Monitoring Dashboard
The deployment creates a Cloud Monitoring dashboard with:
- GKE cluster metrics
- Node pool metrics
- Cloud SQL metrics
- Memorystore metrics
- Load Balancer metrics
- Application logs
### Access Dashboard
```bash
# Open in GCP Console
open "https://console.cloud.google.com/monitoring/dashboards"
```
---
## Backup & Recovery
### Automated Backups
| Resource | Backup Strategy | Retention |
|----------|----------------|-----------|
| Cloud SQL | Automated + On-demand | 7 days |
| Memorystore | Persistence enabled | Manual |
| Artifact Registry | Lifecycle policy | 30 days |
| Terraform State | GCS versioning | Unlimited |
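The Cloud SQL row corresponds to the `backup_configuration` block of the instance resource. A sketch of what that could look like, assuming daily backups so that seven retained backups cover roughly seven days; the instance name and tier are taken from this guide, while the start time is hypothetical:

```hcl
resource "google_sql_database_instance" "example" {
  name             = "openclaw-dev-pg"
  region           = "us-central1"
  database_version = "POSTGRES_15"

  settings {
    tier = "db-custom-4-15360"

    backup_configuration {
      enabled                        = true
      start_time                     = "03:00" # UTC; hypothetical window
      point_in_time_recovery_enabled = true

      backup_retention_settings {
        retained_backups = 7
        retention_unit   = "COUNT"
      }
    }
  }
}
```

Point-in-time recovery adds write-ahead log storage on top of the daily backups, which is usually worth it for production but optional for dev.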
---
## Cleanup
### Destroy Infrastructure
```bash
# Delete Kubernetes resources first
kubectl delete namespace openclaw
# Destroy Terraform resources
terraform destroy -var-file=terraform.dev.tfvars
```
---
🦞 *The thought that never ends.*
@@ -0,0 +1,181 @@
# ==============================================================================
# Heretek OpenClaw - GCP Artifact Registry Configuration
# ==============================================================================
# Artifact Registry for OpenClaw container images
# ==============================================================================
# ------------------------------------------------------------------------------
# Artifact Registry Repository
# ------------------------------------------------------------------------------
resource "google_artifact_registry_repository" "openclaw" {
location = var.location
repository_id = var.repository_name
project = var.project_id
format = var.format
description = "Artifact Registry for Heretek OpenClaw container images"
# Cleanup policy: delete untagged images older than the configured age
dynamic "cleanup_policies" {
for_each = var.cleanup_policy_days > 0 ? [1] : []
content {
id = "expire-old-images"
action = "DELETE"
condition {
tag_state = "UNTAGGED"
# older_than is expressed as a duration in seconds
older_than = "${var.cleanup_policy_days * 86400}s"
}
}
}
# Keep policy: always retain the 10 most recent versions
# (most_recent_versions cannot filter by tag prefix)
dynamic "cleanup_policies" {
for_each = var.cleanup_policy_days > 0 ? [1] : []
content {
id = "keep-recent-versions"
action = "KEEP"
most_recent_versions {
keep_count = 10
}
}
}
# Maven configuration (only valid when format is MAVEN)
dynamic "maven_config" {
for_each = var.format == "MAVEN" ? [1] : []
content {
version_policy = "RELEASE"
}
}
labels = var.tags
}
# ------------------------------------------------------------------------------
# IAM Permissions
# ------------------------------------------------------------------------------
resource "google_artifact_registry_repository_iam_member" "openclaw_reader" {
project = var.project_id
location = var.location
repository = google_artifact_registry_repository.openclaw.name
role = "roles/artifactregistry.reader"
member = "serviceAccount:${var.project_id}.svc.id.goog[openclaw/openclaw-sa]"
}
resource "google_artifact_registry_repository_iam_member" "openclaw_writer" {
project = var.project_id
location = var.location
repository = google_artifact_registry_repository.openclaw.name
role = "roles/artifactregistry.writer"
member = "serviceAccount:service-${data.google_project.project.number}@gcp-sa-artifactregistry.iam.gserviceaccount.com"
}
# ------------------------------------------------------------------------------
# Remote Repository (for caching external images)
# ------------------------------------------------------------------------------
resource "google_artifact_registry_repository" "docker_hub_cache" {
count = var.environment == "prod" ? 1 : 0
location = var.location
repository_id = "${var.repository_name}-docker-hub-cache"
project = var.project_id
format = "DOCKER"
description = "Docker Hub cache for OpenClaw"
mode = "REMOTE_REPOSITORY"
remote_repository_config {
description = "Docker Hub remote repository"
docker_repository {
public_repository = "DOCKER_HUB"
}
}
cleanup_policy_dry_run = false
labels = var.tags
}
resource "google_artifact_registry_repository" "ghcr_cache" {
count = var.environment == "prod" ? 1 : 0
location = var.location
repository_id = "${var.repository_name}-ghcr-cache"
project = var.project_id
format = "DOCKER"
description = "GitHub Container Registry cache for OpenClaw"
mode = "REMOTE_REPOSITORY"
remote_repository_config {
description = "GitHub Container Registry remote repository"
docker_repository {
custom_repository {
uri = "https://ghcr.io"
}
}
}
cleanup_policy_dry_run = false
labels = var.tags
}
# ------------------------------------------------------------------------------
# Virtual Repository (for unified access)
# ------------------------------------------------------------------------------
resource "google_artifact_registry_repository" "openclaw_virtual" {
count = var.environment == "prod" ? 1 : 0
location = var.location
repository_id = "${var.repository_name}-virtual"
project = var.project_id
format = "DOCKER"
description = "Virtual repository for OpenClaw"
mode = "VIRTUAL_REPOSITORY"
virtual_repository_config {
upstream_policies {
id = "upstream-docker-hub"
repository = google_artifact_registry_repository.docker_hub_cache[0].id
priority = 1
}
upstream_policies {
id = "upstream-ghcr"
repository = google_artifact_registry_repository.ghcr_cache[0].id
priority = 2
}
upstream_policies {
id = "upstream-local"
repository = google_artifact_registry_repository.openclaw.id
priority = 3
}
}
labels = var.tags
}
# ------------------------------------------------------------------------------
# KMS Key for Encryption (Optional)
# ------------------------------------------------------------------------------
resource "google_kms_key_ring" "artifact_registry" {
count = var.environment == "prod" ? 1 : 0
name = "${var.repository_name}-keyring"
project = var.project_id
location = var.location
}
resource "google_kms_crypto_key" "artifact_registry" {
count = var.environment == "prod" ? 1 : 0
name = "${var.repository_name}-key"
key_ring = google_kms_key_ring.artifact_registry[0].id
purpose = "ENCRYPT_DECRYPT"
lifecycle {
prevent_destroy = false
}
labels = var.tags
}
+234
@@ -0,0 +1,234 @@
# ==============================================================================
# Heretek OpenClaw - GCP Cloud SQL Configuration
# ==============================================================================
# Cloud SQL PostgreSQL database for OpenClaw
# ==============================================================================
# ------------------------------------------------------------------------------
# Cloud SQL Instance
# ------------------------------------------------------------------------------
resource "google_sql_database_instance" "openclaw" {
name = var.instance_name
project = var.project_id
region = var.region
database_version = var.database_version
deletion_protection = var.environment == "prod"
settings {
tier = var.tier
disk_size = var.disk_size
disk_type = var.disk_type
availability_type = var.high_availability ? "REGIONAL" : "ZONAL"
# Backup configuration
backup_configuration {
enabled = var.backup_enabled
start_time = var.backup_start_time
point_in_time_recovery_enabled = var.point_in_time_recovery
transaction_log_retention_days = var.backup_enabled ? 7 : null
}
# IP configuration
ip_configuration {
ipv4_enabled = false
private_network = var.network
require_ssl = true
}
# Query insights
insights_config {
query_insights_enabled = var.query_insights_enabled
query_string_length = 1024
record_application_tags = true
record_client_address = true
}
# Maintenance
maintenance_window {
day = 1
hour = 3
update_track = "stable"
}
# Labels
user_labels = var.tags
}
depends_on = [google_service_networking_connection.private_vpc_connection]
}
# ------------------------------------------------------------------------------
# Cloud SQL Database
# ------------------------------------------------------------------------------
resource "google_sql_database" "openclaw" {
name = var.database_name
project = var.project_id
instance = google_sql_database_instance.openclaw.name
charset = "UTF8"
collation = "en_US.UTF8"
}
# ------------------------------------------------------------------------------
# Cloud SQL User
# ------------------------------------------------------------------------------
resource "google_sql_user" "openclaw" {
name = var.database_user
project = var.project_id
instance = google_sql_database_instance.openclaw.name
password = var.database_password
deletion_policy = "ABANDON"
}
# ------------------------------------------------------------------------------
# Cloud SQL Read Replica (Optional for Production)
# ------------------------------------------------------------------------------
resource "google_sql_database_instance" "openclaw_replica" {
count = var.environment == "prod" && var.high_availability ? 1 : 0
name = "${var.instance_name}-replica"
project = var.project_id
region = var.region
database_version = var.database_version
master_instance_name = google_sql_database_instance.openclaw.name
replica_configuration {
failover_target = false
}
settings {
tier = var.tier
disk_size = var.disk_size
disk_type = var.disk_type
availability_type = "ZONAL"
backup_configuration {
enabled = false
}
ip_configuration {
ipv4_enabled = false
private_network = var.network
require_ssl = true
}
user_labels = var.tags
}
depends_on = [google_service_networking_connection.private_vpc_connection]
}
# ------------------------------------------------------------------------------
# Cloud SQL Connection Pooler (Optional)
# ------------------------------------------------------------------------------
resource "google_sql_database_instance" "openclaw_pooler" {
count = var.environment == "prod" ? 1 : 0
name = "${var.instance_name}-pooler"
project = var.project_id
region = var.region
database_version = var.database_version
settings {
tier = "db-custom-2-7680"
disk_size = 20
disk_type = "PD_SSD"
availability_type = "ZONAL"
ip_configuration {
ipv4_enabled = true
require_ssl = true
authorized_networks {
name = "gke-pods"
value = google_compute_subnetwork.secondary_ranges.ip_cidr_range
}
}
user_labels = var.tags
}
}
# ------------------------------------------------------------------------------
# Secret Manager for Database Credentials
# ------------------------------------------------------------------------------
resource "google_secret_manager_secret" "db_credentials" {
secret_id = "${var.instance_name}-credentials"
project = var.project_id
labels = var.tags
replication {
auto {}
}
}
resource "google_secret_manager_secret_version" "db_credentials" {
secret = google_secret_manager_secret.db_credentials.id
secret_data = jsonencode({
username = var.database_user
password = var.database_password
database = var.database_name
host = google_sql_database_instance.openclaw.private_ip_address
port = "5432"
connection_name = google_sql_database_instance.openclaw.connection_name
})
}
# ------------------------------------------------------------------------------
# Monitoring Alerts
# ------------------------------------------------------------------------------
resource "google_monitoring_alert_policy" "cloud_sql_cpu" {
count = var.environment == "prod" ? 1 : 0
display_name = "${var.instance_name} CPU Utilization"
project = var.project_id
combiner = "OR"
conditions {
display_name = "CPU utilization > 80%"
condition_threshold {
# database_id has the form "<project>:<instance>"; utilization is reported as a fraction (0.0-1.0)
filter = "resource.type = \"cloudsql_database\" AND metric.type = \"cloudsql.googleapis.com/database/cpu/utilization\" AND resource.label.\"database_id\" = \"${var.project_id}:${google_sql_database_instance.openclaw.name}\""
duration = "300s"
comparison = "COMPARISON_GT"
threshold_value = 0.8
aggregations {
alignment_period = "300s"
per_series_aligner = "ALIGN_MEAN"
}
}
}
notification_channels = var.alert_notification_channels
severity = "WARNING"
}
resource "google_monitoring_alert_policy" "cloud_sql_disk" {
count = var.environment == "prod" ? 1 : 0
display_name = "${var.instance_name} Disk Space"
project = var.project_id
combiner = "OR"
conditions {
display_name = "Disk utilization > 90%"
condition_threshold {
# database_id has the form "<project>:<instance>"; disk utilization is reported as a fraction (0.0-1.0)
filter = "resource.type = \"cloudsql_database\" AND metric.type = \"cloudsql.googleapis.com/database/disk/utilization\" AND resource.label.\"database_id\" = \"${var.project_id}:${google_sql_database_instance.openclaw.name}\""
duration = "300s"
comparison = "COMPARISON_GT"
threshold_value = 0.9
aggregations {
alignment_period = "300s"
per_series_aligner = "ALIGN_MEAN"
}
}
}
notification_channels = var.alert_notification_channels
severity = "CRITICAL"
}
+334
@@ -0,0 +1,334 @@
# ==============================================================================
# Heretek OpenClaw - GCP GKE Configuration
# ==============================================================================
# Google Kubernetes Engine cluster for OpenClaw
# ==============================================================================
# ------------------------------------------------------------------------------
# GKE Cluster
# ------------------------------------------------------------------------------
resource "google_container_cluster" "openclaw_cluster" {
name = var.cluster_name
location = var.location
project = var.project_id
# Node locations (for regional clusters)
node_locations = var.zones
# Remove default node pool
remove_default_node_pool = true
initial_node_count = 1
# Network configuration
network = var.network
subnetwork = var.subnetwork
# IP allocation policy (VPC-native cluster)
ip_allocation_policy {
cluster_secondary_range_name = var.ip_range_pods
services_secondary_range_name = var.ip_range_services
}
# Private cluster configuration
dynamic "private_cluster_config" {
for_each = var.enable_private_cluster ? [1] : []
content {
enable_private_nodes = true
enable_private_endpoint = false
master_ipv4_cidr_block = "172.16.0.0/28"
}
}
# Workload Identity
workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}
# Release channel
release_channel {
channel = var.release_channel
}
# Kubernetes version
min_master_version = var.kubernetes_version
# Cluster addons
addons_config {
http_load_balancing {
disabled = false
}
horizontal_pod_autoscaling {
disabled = false
}
network_policy_config {
disabled = false
}
gcp_filestore_csi_driver_config {
enabled = true
}
}
# Network policy
network_policy {
enabled = true
provider = "CALICO"
}
# Master authorized networks
master_authorized_networks_config {
cidr_blocks {
cidr_block = "0.0.0.0/0"
display_name = "all-networks"
}
}
# Logging and monitoring
logging_config {
enable_components = [
"SYSTEM_COMPONENTS",
"WORKLOADS"
]
}
monitoring_config {
enable_components = [
"SYSTEM_COMPONENTS"
]
managed_prometheus {
enabled = true
}
}
# Labels
resource_labels = var.tags
lifecycle {
ignore_changes = [
node_config,
node_version
]
}
}
# ------------------------------------------------------------------------------
# General Purpose Node Pool
# ------------------------------------------------------------------------------
resource "google_container_node_pool" "general" {
name = "${var.cluster_name}-general"
location = var.location
project = var.project_id
cluster = google_container_cluster.openclaw_cluster.name
node_count = var.node_pools.general.initial_count
autoscaling {
min_node_count = var.node_pools.general.min_count
max_node_count = var.node_pools.general.max_count
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
machine_type = var.node_pools.general.machine_type
disk_size_gb = var.node_pools.general.disk_size_gb
disk_type = var.node_pools.general.disk_type
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
labels = merge(var.tags, {
workload-type = "general"
})
tags = ["openclaw-node"]
workload_metadata_config {
mode = "GKE_WORKLOAD_IDENTITY"
}
}
upgrade_settings {
max_surge = 1
max_unavailable = 0
}
lifecycle {
ignore_changes = [
node_count
]
}
}
# ------------------------------------------------------------------------------
# Compute Optimized Node Pool
# ------------------------------------------------------------------------------
resource "google_container_node_pool" "compute" {
name = "${var.cluster_name}-compute"
location = var.location
project = var.project_id
cluster = google_container_cluster.openclaw_cluster.name
node_count = var.node_pools.compute.initial_count
autoscaling {
min_node_count = var.node_pools.compute.min_count
max_node_count = var.node_pools.compute.max_count
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
machine_type = var.node_pools.compute.machine_type
disk_size_gb = var.node_pools.compute.disk_size_gb
disk_type = var.node_pools.compute.disk_type
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
labels = merge(var.tags, {
workload-type = "compute"
})
tags = ["openclaw-node"]
workload_metadata_config {
mode = "GKE_WORKLOAD_IDENTITY"
}
}
upgrade_settings {
max_surge = 1
max_unavailable = 0
}
lifecycle {
ignore_changes = [
node_count
]
}
}
# ------------------------------------------------------------------------------
# GPU Node Pool (Optional)
# ------------------------------------------------------------------------------
resource "google_container_node_pool" "gpu" {
count = var.gpu_enabled ? 1 : 0
name = "${var.cluster_name}-gpu"
location = var.location
project = var.project_id
cluster = google_container_cluster.openclaw_cluster.name
node_count = var.gpu_node_pool.initial_count
autoscaling {
min_node_count = var.gpu_node_pool.min_count
max_node_count = var.gpu_node_pool.max_count
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
machine_type = var.gpu_node_pool.machine_type
disk_size_gb = var.gpu_node_pool.disk_size_gb
disk_type = "pd-ssd"
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
labels = merge(var.tags, {
workload-type = "gpu"
gpu = "true"
})
tags = ["openclaw-gpu-node"]
guest_accelerator {
type = var.gpu_node_pool.accelerator_type
count = var.gpu_node_pool.accelerator_count
}
workload_metadata_config {
mode = "GKE_WORKLOAD_IDENTITY"
}
}
upgrade_settings {
max_surge = 1
max_unavailable = 0
}
lifecycle {
ignore_changes = [
node_count
]
}
}
# ------------------------------------------------------------------------------
# IAM for Service Account (Workload Identity)
# ------------------------------------------------------------------------------
resource "google_service_account" "openclaw" {
account_id = "${var.cluster_name}-sa"
display_name = "OpenClaw GKE Service Account"
project = var.project_id
}
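# Allow the Kubernetes service account openclaw/openclaw-sa to impersonate the
# Google service account through Workload Identity
resource "google_service_account_iam_member" "openclaw_workload_identity" {
service_account_id = google_service_account.openclaw.name
role = "roles/iam.workloadIdentityUser"
member = "serviceAccount:${var.project_id}.svc.id.goog[openclaw/openclaw-sa]"
}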
# ------------------------------------------------------------------------------
# GKE Secondary IP Ranges
# ------------------------------------------------------------------------------
resource "google_compute_subnetwork" "secondary_ranges" {
name = "${var.network}-secondary"
project = var.project_id
region = var.location
network = var.network
ip_cidr_range = "10.1.0.0/16"
secondary_ip_range {
range_name = var.ip_range_pods
ip_cidr_range = "10.2.0.0/16"
}
secondary_ip_range {
range_name = var.ip_range_services
ip_cidr_range = "10.3.0.0/16"
}
}
# ------------------------------------------------------------------------------
# GPU Plugin Installation (via Helm)
# ------------------------------------------------------------------------------
resource "helm_release" "nvidia_device_plugin" {
count = var.gpu_enabled ? 1 : 0
name = "nvidia-device-plugin"
repository = "https://nvidia.github.io/k8s-device-plugin"
chart = "nvidia-device-plugin"
version = "0.14.1"
namespace = "kube-system"
# config.name points the chart at an existing ConfigMap
set {
name = "config.name"
value = "nvidia-device-plugin-config"
}
}
+255
@@ -0,0 +1,255 @@
# ==============================================================================
# Heretek OpenClaw - GCP Cloud Load Balancing Configuration
# ==============================================================================
# Cloud Load Balancing for OpenClaw traffic routing
# ==============================================================================
# ------------------------------------------------------------------------------
# Zonal Network Endpoint Groups (for GKE)
# ------------------------------------------------------------------------------
# Endpoints are populated by the Kubernetes service, so none are declared here.
# Zonal NEGs take a zone rather than a region and do not support labels;
# var.zone is assumed to be declared in this module.
resource "google_compute_network_endpoint_group" "gateway_neg" {
name = "${var.name}-gateway-neg"
project = var.project_id
network_endpoint_type = "GCE_VM_IP_PORT"
network = var.network
subnetwork = var.subnet
zone = var.zone
}
resource "google_compute_network_endpoint_group" "litellm_neg" {
name = "${var.name}-litellm-neg"
project = var.project_id
network_endpoint_type = "GCE_VM_IP_PORT"
network = var.network
subnetwork = var.subnet
zone = var.zone
}
# ------------------------------------------------------------------------------
# Health Checks
# ------------------------------------------------------------------------------
resource "google_compute_health_check" "gateway" {
name = "${var.name}-gateway-health"
project = var.project_id
timeout_sec = 5
check_interval_sec = 10
healthy_threshold = 2
unhealthy_threshold = 3
http_health_check {
port = 18789
request_path = "/health"
}
}
resource "google_compute_health_check" "litellm" {
name = "${var.name}-litellm-health"
project = var.project_id
timeout_sec = 5
check_interval_sec = 10
healthy_threshold = 2
unhealthy_threshold = 3
http_health_check {
port = 4000
request_path = "/health"
}
}
# ------------------------------------------------------------------------------
# Backend Services
# ------------------------------------------------------------------------------
resource "google_compute_backend_service" "gateway" {
name = "${var.name}-gateway"
project = var.project_id
protocol = "HTTP"
port_name = "http"
timeout_sec = 30
health_checks = [google_compute_health_check.gateway.id]
load_balancing_scheme = "EXTERNAL_MANAGED"
log_config {
enable = true
sample_rate = 1.0
}
}
resource "google_compute_backend_service" "litellm" {
name = "${var.name}-litellm"
project = var.project_id
protocol = "HTTP"
port_name = "http"
timeout_sec = 60
health_checks = [google_compute_health_check.litellm.id]
load_balancing_scheme = "EXTERNAL_MANAGED"
log_config {
enable = true
sample_rate = 1.0
}
}
# ------------------------------------------------------------------------------
# URL Map
# ------------------------------------------------------------------------------
resource "google_compute_url_map" "openclaw" {
name = "${var.name}-url-map"
project = var.project_id
default_service = google_compute_backend_service.gateway.id
host_rule {
hosts = ["*"]
path_matcher = "all-paths"
}
path_matcher {
name = "all-paths"
default_service = google_compute_backend_service.gateway.id
path_rule {
paths = ["/v1/*", "/litellm/*"]
service = google_compute_backend_service.litellm.id
}
path_rule {
paths = ["/ws/*", "/gateway/*"]
service = google_compute_backend_service.gateway.id
}
}
}
# ------------------------------------------------------------------------------
# Target HTTP Proxy (Redirect to HTTPS)
# ------------------------------------------------------------------------------
resource "google_compute_target_http_proxy" "openclaw" {
name = "${var.name}-http-proxy"
project = var.project_id
url_map = google_compute_url_map.http_redirect.id
}
# ------------------------------------------------------------------------------
# Target HTTPS Proxy
# ------------------------------------------------------------------------------
resource "google_compute_target_https_proxy" "openclaw" {
name = "${var.name}-https-proxy"
project = var.project_id
url_map = google_compute_url_map.openclaw.id
ssl_certificates = [google_compute_managed_ssl_certificate.openclaw[0].id]
ssl_policy = google_compute_ssl_policy.openclaw[0].id
}
# ------------------------------------------------------------------------------
# Managed SSL Certificate
# ------------------------------------------------------------------------------
resource "google_compute_managed_ssl_certificate" "openclaw" {
count = var.ssl_certificate_arn == null && var.managed_domain != null ? 1 : 0
name = "${var.name}-ssl-cert"
project = var.project_id
managed {
domains = [var.managed_domain]
}
lifecycle {
create_before_destroy = true
}
}
# ------------------------------------------------------------------------------
# SSL Policy
# ------------------------------------------------------------------------------
resource "google_compute_ssl_policy" "openclaw" {
count = var.ssl_certificate_arn == null ? 1 : 0
name = "${var.name}-ssl-policy"
project = var.project_id
min_tls_version = "TLS_1_2"
profile = "MODERN"
custom_features = []
}
# ------------------------------------------------------------------------------
# Global Forwarding Rules
# ------------------------------------------------------------------------------
resource "google_compute_global_forwarding_rule" "http" {
name = "${var.name}-http-fr"
project = var.project_id
ip_protocol = "TCP"
load_balancing_scheme = "EXTERNAL_MANAGED"
port_range = "80"
target = google_compute_target_http_proxy.openclaw.id
ip_address = google_compute_global_address.openclaw[0].address
}
resource "google_compute_global_forwarding_rule" "https" {
name = "${var.name}-https-fr"
project = var.project_id
ip_protocol = "TCP"
load_balancing_scheme = "EXTERNAL_MANAGED"
port_range = "443"
target = google_compute_target_https_proxy.openclaw.id
ip_address = google_compute_global_address.openclaw[0].address
}
# ------------------------------------------------------------------------------
# Global Static IP Address
# ------------------------------------------------------------------------------
resource "google_compute_global_address" "openclaw" {
count = var.load_balancer_ip == null ? 1 : 0
name = "${var.name}-ip"
project = var.project_id
ip_version = "IPV4"
}
# ------------------------------------------------------------------------------
# HTTP to HTTPS Redirect
# ------------------------------------------------------------------------------
resource "google_compute_url_map" "http_redirect" {
name = "${var.name}-http-redirect"
project = var.project_id
default_url_redirect {
redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
strip_query = false
https_redirect = true
}
}
+351
@@ -0,0 +1,351 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Configuration
# ==============================================================================
# Main configuration file for GCP infrastructure
# ==============================================================================
terraform {
required_version = ">= 1.6.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
google-beta = {
source = "hashicorp/google-beta"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.24"
}
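helm = {
source = "hashicorp/helm"
version = "~> 2.12"
}
# Required by random_string.suffix below
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}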
backend "gcs" {
# Configure backend with variables or environment
# bucket = "terraform-state-bucket"
# prefix = "openclaw/terraform.tfstate"
}
}
provider "google" {
project = var.project_id
region = var.region
}
provider "google-beta" {
project = var.project_id
region = var.region
}
# The cluster is created inside module.gke, so its attributes must be read from
# module outputs. The output names cluster_endpoint and cluster_ca_certificate
# are assumed; adjust them to match ./gke/outputs.tf.
provider "kubernetes" {
host = "https://${module.gke.cluster_endpoint}"
token = data.google_client_config.current.access_token
cluster_ca_certificate = base64decode(module.gke.cluster_ca_certificate)
}
provider "helm" {
kubernetes {
host = "https://${module.gke.cluster_endpoint}"
token = data.google_client_config.current.access_token
cluster_ca_certificate = base64decode(module.gke.cluster_ca_certificate)
}
}
# ==============================================================================
# Data Sources
# ==============================================================================
data "google_client_config" "current" {}
data "google_project" "project" {
project_id = var.project_id
}
# ==============================================================================
# Local Values
# ==============================================================================
locals {
name_prefix = "openclaw-${var.environment}"
common_tags = {
project = "openclaw"
environment = var.environment
version = var.app_version
managed_by = "terraform"
}
gpu_enabled = var.enable_gpu_support
# Artifact Registry URLs
artifact_registry_urls = {
gateway = "${var.region}-docker.pkg.dev/${var.project_id}/${local.name_prefix}-registry/openclaw-gateway"
litellm = "${var.region}-docker.pkg.dev/${var.project_id}/${local.name_prefix}-registry/litellm-proxy"
}
}
# ==============================================================================
# Random Resources
# ==============================================================================
resource "random_string" "suffix" {
length = 8
special = false
upper = false
}
# ==============================================================================
# VPC Network
# ==============================================================================
module "vpc" {
source = "./vpc"
project_id = var.project_id
network_name = "${local.name_prefix}-vpc"
region = var.region
zones = var.zones
vpc_cidr = var.vpc_cidr
subnets = var.subnets
enable_flow_logs = var.enable_vpc_flow_logs
enable_private_google_access = var.enable_private_google_access
tags = local.common_tags
}
# ==============================================================================
# GKE Cluster
# ==============================================================================
module "gke" {
source = "./gke"
project_id = var.project_id
cluster_name = "${local.name_prefix}-gke"
location = var.region
zones = var.zones
network = module.vpc.network_name
subnetwork = module.vpc.subnet_name
ip_range_pods = "${local.name_prefix}-pods"
ip_range_services = "${local.name_prefix}-services"
# GKE configuration
kubernetes_version = var.gke_version
release_channel = var.gke_release_channel
# Node pool configuration
node_pools = var.node_pools
gpu_enabled = local.gpu_enabled
gpu_node_pool = var.gpu_node_pool
# Security
enable_workload_identity = var.enable_workload_identity
enable_private_cluster = var.enable_private_cluster
# Monitoring
enable_monitoring = true
enable_logging = true
tags = local.common_tags
}
# ==============================================================================
# Cloud SQL PostgreSQL
# ==============================================================================
module "cloud_sql" {
source = "./cloud-sql"
project_id = var.project_id
instance_name = "${local.name_prefix}-pg"
region = var.region
network = module.vpc.network_name
# Database configuration
database_version = var.postgresql_version
tier = var.db_tier
disk_size = var.db_disk_size
disk_type = var.db_disk_type
# Authentication
database_name = var.db_name
database_user = var.db_user
database_password = var.db_password
# High availability
high_availability = var.db_high_availability
backup_enabled = var.db_backup_enabled
backup_start_time = var.db_backup_start_time
point_in_time_recovery = var.db_point_in_time_recovery
# Insights
query_insights_enabled = var.db_query_insights_enabled
tags = local.common_tags
}
# ==============================================================================
# Memorystore Redis
# ==============================================================================
module "memorystore" {
source = "./memorystore"
project_id = var.project_id
instance_id = "${local.name_prefix}-redis"
region = var.region
network = module.vpc.network_name
# Redis configuration
tier = var.redis_tier
memory_size_gb = var.redis_memory_size_gb
redis_version = var.redis_version
# High availability
replica_count = var.redis_replica_count
read_replicas_enabled = var.redis_read_replicas_enabled
# Security
auth_enabled = var.redis_auth_enabled
auth_string = var.redis_auth_string
transit_encryption_enabled = var.redis_transit_encryption_enabled
tags = local.common_tags
}
# ==============================================================================
# Artifact Registry
# ==============================================================================
module "artifact_registry" {
source = "./artifact-registry"
project_id = var.project_id
location = var.region
repository_name = "${local.name_prefix}-registry"
format = "DOCKER"
# Cleanup policy
cleanup_policy_days = var.artifact_cleanup_policy_days
tags = local.common_tags
}
# ==============================================================================
# Cloud Load Balancing
# ==============================================================================
module "load_balancer" {
source = "./load-balancer"
project_id = var.project_id
region = var.region
network = module.vpc.network_name
subnet = module.vpc.subnet_name
# Load balancer configuration
name = "${local.name_prefix}-lb"
# Backend services
backend_services = [
{
name = "openclaw-gateway"
port = 18789
health_check_path = "/health"
},
{
name = "litellm-proxy"
port = 4000
health_check_path = "/health"
}
]
# SSL certificate
ssl_certificate_arn = var.ssl_certificate_arn
managed_domain = var.managed_domain
tags = local.common_tags
}
# ==============================================================================
# Monitoring
# ==============================================================================
module "monitoring" {
source = "../terraform/modules/monitoring"
name_prefix = local.name_prefix
project_id = var.project_id
gke_cluster_name = google_container_cluster.openclaw_cluster.name
cloud_sql_instance = module.cloud_sql.instance_name
memorystore_instance = module.memorystore.instance_id
# Dashboard
enable_dashboard = true
# Alerts
enable_alerts = var.enable_monitoring_alerts
alert_email = var.alert_email
tags = local.common_tags
}
# ==============================================================================
# Outputs
# ==============================================================================
# Output values are declared once, in the GCP Terraform Outputs file of this
# module. Terraform rejects duplicate output names within a module, so they are
# not repeated here.
@@ -0,0 +1,177 @@
# ==============================================================================
# Heretek OpenClaw - GCP Memorystore Configuration
# ==============================================================================
# Memorystore Redis for OpenClaw caching and session management
# ==============================================================================
# ------------------------------------------------------------------------------
# Memorystore Redis Instance
# ------------------------------------------------------------------------------
resource "google_redis_instance" "openclaw" {
  name           = var.instance_id
  project        = var.project_id
  region         = var.region
  tier           = var.tier
  memory_size_gb = var.memory_size_gb
  redis_version  = var.redis_version

  # Network configuration
  authorized_network = var.network
  connect_mode       = "PRIVATE_SERVICE_ACCESS"

  # High availability
  # replica_count is only passed when read replicas are enabled; otherwise the
  # provider default for the selected tier applies.
  read_replicas_mode = var.read_replicas_enabled ? "READ_REPLICAS_ENABLED" : "READ_REPLICAS_DISABLED"
  replica_count      = var.read_replicas_enabled ? var.replica_count : null

  # Security
  # auth_string is generated by Memorystore and exported as a read-only
  # attribute, so it is not set here.
  auth_enabled            = var.auth_enabled
  transit_encryption_mode = var.transit_encryption_enabled ? "SERVER_AUTHENTICATION" : "DISABLED"

  # Maintenance
  maintenance_policy {
    weekly_maintenance_window {
      day = "TUESDAY"
      start_time {
        hours   = 3
        minutes = 0
        seconds = 0
        nanos   = 0
      }
    }
  }

  # Persistence (RDB snapshots)
  persistence_config {
    persistence_mode    = "RDB"
    rdb_snapshot_period = "TWENTY_FOUR_HOURS"
  }

  # Labels
  labels = var.tags

  # Reserved IP range (optional)
  # reserved_ip_range = "10.0.0.0/29"
}
# ------------------------------------------------------------------------------
# Memorystore Instance Configuration (Redis parameters)
# ------------------------------------------------------------------------------
# Redis configuration parameters are set through the redis_configs argument of
# the main instance resource above. No separate resource is declared here: an
# empty google_redis_instance block fails validation because its required
# arguments are missing.
# ------------------------------------------------------------------------------
# Secret Manager for Redis Auth
# ------------------------------------------------------------------------------
resource "google_secret_manager_secret" "redis_auth" {
count = var.auth_enabled ? 1 : 0
secret_id = "${var.instance_id}-auth"
project = var.project_id
labels = var.tags
replication {
auto {}
}
}
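resource "google_secret_manager_secret_version" "redis_auth" {
  count  = var.auth_enabled ? 1 : 0
  secret = google_secret_manager_secret.redis_auth[0].id
  # Memorystore generates the AUTH string; read it from the instance rather
  # than from an input variable.
  secret_data = google_redis_instance.openclaw.auth_string
}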
# ------------------------------------------------------------------------------
# Monitoring Alerts
# ------------------------------------------------------------------------------
resource "google_monitoring_alert_policy" "memorystore_cpu" {
  count        = var.environment == "prod" ? 1 : 0
  display_name = "${var.instance_id} CPU Utilization"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "CPU utilization > 80%"
    condition_threshold {
      # stats/cpu_utilization reports CPU-seconds; ALIGN_RATE converts it to a
      # utilization fraction, summed per node.
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/stats/cpu_utilization\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      aggregations {
        alignment_period     = "300s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.label.instance_id", "resource.label.node_id"]
      }
    }
  }

  notification_channels = var.alert_notification_channels
  severity              = "WARNING"
}
resource "google_monitoring_alert_policy" "memorystore_memory" {
  count        = var.environment == "prod" ? 1 : 0
  display_name = "${var.instance_id} Memory Usage"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "Memory usage > 85%"
    condition_threshold {
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/memory/usage_ratio\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.85
      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels
  severity              = "WARNING"
}
resource "google_monitoring_alert_policy" "memorystore_connections" {
  count        = var.environment == "prod" ? 1 : 0
  display_name = "${var.instance_id} Connections"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "Connections > 1000"
    condition_threshold {
      # clients/connected is the gauge of currently connected clients
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/clients/connected\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1000
      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels
  severity              = "WARNING"
}
# ------------------------------------------------------------------------------
# Memorystore Backup (Optional)
# ------------------------------------------------------------------------------
# Backups are managed through the persistence_config block of the main instance.
# No additional google_redis_instance resource is declared here; an empty
# resource block would fail validation and would otherwise create a second
# instance.
@@ -0,0 +1,224 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Outputs
# ==============================================================================
# Output values for GCP infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# VPC Outputs
# ------------------------------------------------------------------------------
output "network_name" {
description = "VPC network name"
value = module.vpc.network_name
}
output "network_self_link" {
description = "VPC network self link"
value = module.vpc.network_self_link
}
output "subnet_name" {
description = "Primary subnet name"
value = module.vpc.subnet_name
}
output "subnet_self_link" {
description = "Primary subnet self link"
value = module.vpc.subnet_self_link
}
# ------------------------------------------------------------------------------
# GKE Outputs
# ------------------------------------------------------------------------------
output "gke_cluster_id" {
description = "GKE cluster ID"
value = google_container_cluster.openclaw_cluster.id
}
output "gke_cluster_name" {
description = "GKE cluster name"
value = google_container_cluster.openclaw_cluster.name
}
output "gke_cluster_endpoint" {
description = "GKE cluster endpoint"
value = google_container_cluster.openclaw_cluster.endpoint
sensitive = true
}
output "gke_cluster_ca_certificate" {
description = "GKE cluster CA certificate (base64-encoded)"
# cluster_ca_certificate is a string attribute, so it takes no index
value = google_container_cluster.openclaw_cluster.master_auth[0].cluster_ca_certificate
sensitive = true
}
output "gke_cluster_location" {
description = "GKE cluster location"
value = google_container_cluster.openclaw_cluster.location
}
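output "gke_cluster_node_count" {
description = "Initial node count of the GKE cluster's default node pool"
# google_container_cluster exports initial_node_count; it has no node_count attribute
value = google_container_cluster.openclaw_cluster.initial_node_count
}
output "gke_cluster_node_pools" {
description = "GKE cluster node pool names"
# The exported attribute is the node_pool block list; collect each pool's name
value = google_container_cluster.openclaw_cluster.node_pool[*].name
}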
output "gke_workload_identity_pool" {
description = "Workload Identity pool"
value = "${var.project_id}.svc.id.goog"
}
output "gke_kubeconfig_command" {
description = "Command to get cluster credentials"
value = "gcloud container clusters get-credentials ${google_container_cluster.openclaw_cluster.name} --region ${var.region} --project ${var.project_id}"
}
# ------------------------------------------------------------------------------
# Cloud SQL Outputs
# ------------------------------------------------------------------------------
output "cloud_sql_instance_id" {
description = "Cloud SQL instance ID"
value = module.cloud_sql.instance_id
}
output "cloud_sql_instance_name" {
description = "Cloud SQL instance name"
value = module.cloud_sql.instance_name
}
output "cloud_sql_connection_name" {
description = "Cloud SQL connection name"
value = module.cloud_sql.connection_name
}
output "cloud_sql_private_ip" {
description = "Cloud SQL private IP address"
value = module.cloud_sql.private_ip
}
output "cloud_sql_public_ip" {
description = "Cloud SQL public IP address"
value = module.cloud_sql.public_ip
}
output "cloud_sql_database_name" {
description = "Cloud SQL database name"
value = module.cloud_sql.database_name
}
output "cloud_sql_database_user" {
description = "Cloud SQL database user"
value = module.cloud_sql.database_user
sensitive = true
}
output "cloud_sql_connection_string" {
description = "PostgreSQL connection string"
value = "postgresql://${module.cloud_sql.database_user}:${var.db_password}@${module.cloud_sql.private_ip}:5432/${module.cloud_sql.database_name}"
sensitive = true
}
# ------------------------------------------------------------------------------
# Memorystore Outputs
# ------------------------------------------------------------------------------
output "memorystore_instance_id" {
description = "Memorystore instance ID"
value = module.memorystore.instance_id
}
output "memorystore_host" {
description = "Memorystore Redis host"
value = module.memorystore.host
}
output "memorystore_port" {
description = "Memorystore Redis port"
value = module.memorystore.port
}
output "memorystore_connection_string" {
description = "Redis connection string (rediss:// when transit encryption is enabled)"
value = "${var.redis_transit_encryption_enabled ? "rediss" : "redis"}://${var.redis_auth_enabled && var.redis_auth_string != null ? ":${var.redis_auth_string}@" : ""}${module.memorystore.host}:${module.memorystore.port}"
sensitive = true
}
# ------------------------------------------------------------------------------
# Artifact Registry Outputs
# ------------------------------------------------------------------------------
output "artifact_registry_name" {
description = "Artifact Registry name"
value = module.artifact_registry.repository_name
}
output "artifact_registry_url" {
description = "Artifact Registry URL"
value = local.artifact_registry_urls
}
output "artifact_registry_docker_config" {
description = "Docker configuration for Artifact Registry"
value = "gcloud auth configure-docker ${var.region}-docker.pkg.dev"
}
# ------------------------------------------------------------------------------
# Load Balancer Outputs
# ------------------------------------------------------------------------------
output "load_balancer_name" {
description = "Load balancer name"
value = module.load_balancer.name
}
output "load_balancer_ip" {
description = "Load balancer IP address"
value = module.load_balancer.ip_address
}
output "load_balancer_self_link" {
description = "Load balancer self link"
value = module.load_balancer.self_link
}
# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------
output "monitoring_dashboard_id" {
description = "Cloud Monitoring dashboard ID"
value = module.monitoring.dashboard_id
}
output "monitoring_alert_policies" {
description = "List of alert policy IDs"
value = module.monitoring.alert_policy_ids
}
# ------------------------------------------------------------------------------
# Cost Estimation
# ------------------------------------------------------------------------------
output "estimated_monthly_cost" {
  description = "Estimated monthly cost breakdown"
  # format() is used for the computed figures: in an HCL string a doubled
  # dollar sign before an opening brace is an escape and suppresses
  # interpolation.
  value = {
    gke_cluster       = "~$73 (cluster management fee)"
    gke_nodes_general = format("~$%d (%s)", var.node_pools.general.initial_count * 140, var.node_pools.general.machine_type)
    gke_nodes_compute = format("~$%d (%s)", var.node_pools.compute.initial_count * 300, var.node_pools.compute.machine_type)
    gke_nodes_gpu     = local.gpu_enabled ? format("~$%d (%s)", var.gpu_node_pool.initial_count * 1500, var.gpu_node_pool.machine_type) : "$0"
    cloud_sql         = format("~$%d (%s)", var.db_high_availability ? 300 : 150, var.db_tier)
    memorystore       = format("~$%d (%dGB)", var.redis_tier == "STANDARD_HA" ? 150 : 75, var.redis_memory_size_gb)
    load_balancer     = "~$18"
    artifact_registry = "~$5 (storage)"
    cloud_monitoring  = "~$50"
    network_egress    = "Variable"
    total_estimate    = "See GCP Pricing Calculator for accurate pricing"
  }
}
@@ -0,0 +1,365 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Variables
# ==============================================================================
# Input variables for GCP infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------
variable "project_id" {
description = "GCP project ID"
type = string
}
variable "region" {
description = "GCP region for resources"
type = string
default = "us-central1"
}
variable "zones" {
description = "GCP zones for regional distribution"
type = list(string)
default = ["us-central1-a", "us-central1-b", "us-central1-c"]
}
variable "environment" {
description = "Deployment environment (dev, staging, prod)"
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "app_version" {
description = "Application version to deploy"
type = string
default = "2026.3.28"
}
# ------------------------------------------------------------------------------
# VPC Configuration
# ------------------------------------------------------------------------------
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
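# Sketch (an assumption, not part of the original commit): reject values that
# are not valid CIDR notation. cidrhost() fails on malformed input and can()
# turns that failure into false.
validation {
condition = can(cidrhost(var.vpc_cidr, 0))
error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
}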
}
variable "subnets" {
description = "Subnet configurations"
type = list(object({
name = string
ip_cidr_range = string
region = string
}))
default = [
{
name = "openclaw-subnet-1"
ip_cidr_range = "10.0.1.0/24"
region = "us-central1"
},
{
name = "openclaw-subnet-2"
ip_cidr_range = "10.0.2.0/24"
region = "us-central1"
},
{
name = "openclaw-subnet-3"
ip_cidr_range = "10.0.3.0/24"
region = "us-central1"
}
]
}
variable "enable_vpc_flow_logs" {
description = "Enable VPC Flow Logs"
type = bool
default = true
}
variable "enable_private_google_access" {
description = "Enable Private Google Access"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# GKE Configuration
# ------------------------------------------------------------------------------
variable "gke_version" {
description = "GKE Kubernetes version"
type = string
default = "1.28"
}
variable "gke_release_channel" {
description = "GKE release channel (regular, rapid, stable)"
type = string
default = "regular"
validation {
condition = contains(["regular", "rapid", "stable"], var.gke_release_channel)
error_message = "Release channel must be one of: regular, rapid, stable."
}
}
variable "node_pools" {
description = "GKE node pool configurations"
type = object({
general = object({
machine_type = string
min_count = number
max_count = number
initial_count = number
disk_size_gb = number
disk_type = string
})
compute = object({
machine_type = string
min_count = number
max_count = number
initial_count = number
disk_size_gb = number
disk_type = string
})
})
default = {
general = {
machine_type = "n2-standard-4"
min_count = 1
max_count = 4
initial_count = 2
disk_size_gb = 100
disk_type = "pd-ssd"
}
compute = {
machine_type = "c2-standard-8"
min_count = 1
max_count = 8
initial_count = 2
disk_size_gb = 200
disk_type = "pd-ssd"
}
}
}
variable "enable_gpu_support" {
description = "Enable GPU node pool for Ollama"
type = bool
default = false
}
variable "gpu_node_pool" {
description = "GPU node pool configuration"
type = object({
machine_type = string
accelerator_type = string
accelerator_count = number
min_count = number
max_count = number
initial_count = number
disk_size_gb = number
})
default = {
machine_type = "g2-standard-4"
accelerator_type = "nvidia-l4"
accelerator_count = 1
min_count = 0
max_count = 4
initial_count = 1
disk_size_gb = 200
}
}
variable "enable_workload_identity" {
description = "Enable Workload Identity"
type = bool
default = true
}
variable "enable_private_cluster" {
description = "Enable private GKE cluster"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# Cloud SQL PostgreSQL Configuration
# ------------------------------------------------------------------------------
variable "postgresql_version" {
description = "PostgreSQL version"
type = string
default = "POSTGRES_15"
}
variable "db_tier" {
description = "Cloud SQL tier"
type = string
default = "db-custom-4-15360"
}
variable "db_disk_size" {
description = "Database disk size in GB"
type = number
default = 100
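# Sketch (an assumption, not part of the original commit): guard against disk
# sizes that are too small. The 10 GB floor is assumed from Cloud SQL's
# documented minimum; adjust it if your instance class differs.
validation {
condition = var.db_disk_size >= 10
error_message = "The db_disk_size value must be at least 10 GB."
}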
}
variable "db_disk_type" {
description = "Database disk type"
type = string
default = "PD_SSD"
}
variable "db_name" {
description = "Database name"
type = string
default = "openclaw"
}
variable "db_user" {
description = "Database username"
type = string
default = "openclaw"
sensitive = true
}
variable "db_password" {
description = "Database password"
type = string
default = null
sensitive = true
}
variable "db_high_availability" {
description = "Enable high availability"
type = bool
default = false
}
variable "db_backup_enabled" {
description = "Enable automated backups"
type = bool
default = true
}
variable "db_backup_start_time" {
description = "Backup start time (HH:MM)"
type = string
default = "03:00"
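# Sketch (an assumption, not part of the original commit): enforce the HH:MM
# format named in the description, using a 24-hour clock.
validation {
condition = can(regex("^([01][0-9]|2[0-3]):[0-5][0-9]$", var.db_backup_start_time))
error_message = "The db_backup_start_time value must use 24-hour HH:MM format, for example 03:00."
}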
}
variable "db_point_in_time_recovery" {
description = "Enable point-in-time recovery"
type = bool
default = false
}
variable "db_query_insights_enabled" {
description = "Enable Query Insights"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# Memorystore Redis Configuration
# ------------------------------------------------------------------------------
variable "redis_tier" {
description = "Memorystore tier (BASIC, STANDARD_HA)"
type = string
default = "STANDARD_HA"
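# Sketch (an assumption, not part of the original commit): restrict the tier to
# the two values listed in the description, mirroring the validation style used
# for var.environment.
validation {
condition = contains(["BASIC", "STANDARD_HA"], var.redis_tier)
error_message = "The redis_tier value must be one of: BASIC, STANDARD_HA."
}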
}
variable "redis_memory_size_gb" {
description = "Redis memory size in GB"
type = number
default = 4
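# Sketch (an assumption, not part of the original commit): bound the instance
# size. The 1-300 GB range is assumed from Memorystore for Redis limits and
# should be checked against current quotas.
validation {
condition = var.redis_memory_size_gb >= 1 && var.redis_memory_size_gb <= 300
error_message = "The redis_memory_size_gb value must be between 1 and 300."
}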
}
variable "redis_version" {
description = "Redis version"
type = string
default = "REDIS_7_0"
}
variable "redis_replica_count" {
description = "Number of read replicas"
type = number
default = 0
}
variable "redis_read_replicas_enabled" {
description = "Enable read replicas"
type = bool
default = false
}
variable "redis_auth_enabled" {
description = "Enable Redis AUTH"
type = bool
default = true
}
variable "redis_auth_string" {
description = "Redis AUTH string"
type = string
default = null
sensitive = true
}
variable "redis_transit_encryption_enabled" {
description = "Enable transit encryption"
type = bool
default = true
}
# ------------------------------------------------------------------------------
# Artifact Registry Configuration
# ------------------------------------------------------------------------------
variable "artifact_cleanup_policy_days" {
description = "Days to retain images in Artifact Registry"
type = number
default = 30
}
# ------------------------------------------------------------------------------
# Load Balancer Configuration
# ------------------------------------------------------------------------------
variable "ssl_certificate_arn" {
description = "SSL certificate manager certificate"
type = string
default = null
}
variable "managed_domain" {
description = "Domain for managed SSL certificate"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------
variable "enable_monitoring_alerts" {
description = "Enable monitoring alerts"
type = bool
default = true
}
variable "alert_email" {
description = "Email for alert notifications"
type = string
default = null
}
@@ -0,0 +1,187 @@
# ==============================================================================
# Heretek OpenClaw - GCP VPC Configuration
# ==============================================================================
# VPC network module for OpenClaw infrastructure
# ==============================================================================
# ------------------------------------------------------------------------------
# VPC Network
# ------------------------------------------------------------------------------
resource "google_compute_network" "openclaw" {
name = var.network_name
project = var.project_id
auto_create_subnetworks = false
routing_mode = "REGIONAL"
delete_default_routes_on_create = false
# google_compute_network accepts neither tags nor labels
}
# ------------------------------------------------------------------------------
# Subnets
# ------------------------------------------------------------------------------
resource "google_compute_subnetwork" "openclaw" {
count = length(var.subnets)
name = var.subnets[count.index].name
project = var.project_id
region = var.subnets[count.index].region
network = google_compute_network.openclaw.id
ip_cidr_range = var.subnets[count.index].ip_cidr_range
private_ip_google_access = var.enable_private_google_access
dynamic "log_config" {
for_each = var.enable_vpc_flow_logs ? [1] : []
content {
aggregation_interval = "INTERVAL_5_SEC"
flow_sampling = 0.5
metadata = "INCLUDE_ALL_METADATA"
}
}
# google_compute_subnetwork has no tags argument
}
# ------------------------------------------------------------------------------
# Firewall Rules
# ------------------------------------------------------------------------------
# Allow internal communication
resource "google_compute_firewall" "allow_internal" {
name = "${var.network_name}-allow-internal"
project = var.project_id
network = google_compute_network.openclaw.name
allow {
protocol = "tcp"
ports = ["0-65535"]
}
allow {
protocol = "udp"
ports = ["0-65535"]
}
allow {
protocol = "icmp"
}
source_ranges = [
var.vpc_cidr,
]
# google_compute_firewall has no tags argument (only source_tags and target_tags)
}
# Allow health checks from Google Cloud health check systems
resource "google_compute_firewall" "allow_health_checks" {
name = "${var.network_name}-allow-health-checks"
project = var.project_id
network = google_compute_network.openclaw.name
allow {
protocol = "tcp"
ports = ["0-65535"]
}
source_ranges = [
"35.191.0.0/16",
"130.211.0.0/22",
]
target_tags = ["openclaw"]
}
# Allow IAP (Identity-Aware Proxy) connections
resource "google_compute_firewall" "allow_iap" {
name = "${var.network_name}-allow-iap"
project = var.project_id
network = google_compute_network.openclaw.name
allow {
protocol = "tcp"
ports = ["22", "3389", "443"]
}
source_ranges = [
"35.235.240.0/20",
]
}
# ------------------------------------------------------------------------------
# Cloud NAT
# ------------------------------------------------------------------------------
# One router and one NAT gateway per distinct region. Several subnets may share
# a region, and router names must be unique within a region, so iterating per
# subnet would produce name collisions.
resource "google_compute_router" "openclaw" {
  for_each = toset([for s in var.subnets : s.region])
  name     = "${var.network_name}-router-${each.value}"
  project  = var.project_id
  region   = each.value
  network  = google_compute_network.openclaw.id
}

resource "google_compute_router_nat" "openclaw" {
  for_each                           = google_compute_router.openclaw
  name                               = "${var.network_name}-nat-${each.key}"
  project                            = var.project_id
  router                             = each.value.name
  region                             = each.value.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }
}
# ------------------------------------------------------------------------------
# Private Service Connection (for Cloud SQL, Memorystore)
# ------------------------------------------------------------------------------
resource "google_compute_global_address" "private_ip_alloc" {
name = "${var.network_name}-private-ip-alloc"
project = var.project_id
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = google_compute_network.openclaw.id
labels = var.tags
}
resource "google_service_networking_connection" "private_vpc_connection" {
network = google_compute_network.openclaw.id
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name]
deletion_policy = "ABANDON"
}
# ------------------------------------------------------------------------------
# Routes (if needed)
# ------------------------------------------------------------------------------
# Default route to the internet gateway (Cloud NAT relies on this route for egress)
resource "google_compute_route" "default_internet" {
  name             = "${var.network_name}-default-internet"
  project          = var.project_id
  network          = google_compute_network.openclaw.name
  dest_range       = "0.0.0.0/0"
  next_hop_gateway = "default-internet-gateway"
  # No tags argument: on a route, tags is a list of instance network tags that
  # scopes the route, not a label map.
}
@@ -0,0 +1,156 @@
# ==============================================================================
# Heretek OpenClaw - LiteLLM Proxy Deployment
# ==============================================================================
# Base deployment configuration for LiteLLM proxy
# ==============================================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: openclaw
  labels:
    app.kubernetes.io/name: litellm
    app.kubernetes.io/component: proxy
    app.kubernetes.io/part-of: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: litellm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: litellm
        app.kubernetes.io/component: proxy
        app.kubernetes.io/part-of: openclaw
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          imagePullPolicy: IfNotPresent
          command:
            - "litellm"
            - "--config"
            - "/app/config.yaml"
            - "--port"
            - "4000"
          ports:
            - name: http
              containerPort: 4000
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: redis-url
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: litellm-master-key
                  optional: true
            - name: LITELLM_SALT_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: litellm-salt-key
                  optional: true
            - name: LANGFUSE_PUBLIC_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: langfuse-public-key
                  optional: true
            - name: LANGFUSE_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: langfuse-secret-key
                  optional: true
            # Provider keys referenced as os.environ/... in the litellm-config ConfigMap
            - name: MINIMAX_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: minimax-api-key
                  optional: true
            - name: ZAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: zai-api-key
                  optional: true
            - name: PROXY_COST_TRACKING
              value: "True"
            - name: PROXY_METRICS_ENABLED
              value: "True"
            - name: LITELLM_LOG_LEVEL
              value: "INFO"
          resources:
            requests:
              cpu: "1000m"
              memory: "2Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            capabilities:
              drop:
                - ALL
          # LiteLLM's /health endpoint requires authentication and calls the
          # configured models; the probes use the unauthenticated endpoints.
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: litellm-config
              mountPath: /app/config.yaml
              subPath: config.yaml
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: litellm-config
          configMap:
            name: litellm-config
        - name: tmp
          emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
  namespace: openclaw
  labels:
    app.kubernetes.io/name: litellm
data:
  config.yaml: |
    model_list:
      - model_name: minimax
        litellm_params:
          model: minimax/minimax-abab6
          api_key: os.environ/MINIMAX_API_KEY
      - model_name: zai
        litellm_params:
          model: zai/glm-4
          api_key: os.environ/ZAI_API_KEY
      - model_name: ollama
        litellm_params:
          model: ollama/llama2
          api_base: http://ollama:11434
    litellm_settings:
      set_verbose: true
      drop_params: true
      max_tokens: 4096
      request_timeout: 600
      num_retries: 2
@@ -0,0 +1,48 @@
# ==============================================================================
# Heretek OpenClaw - LiteLLM Proxy Service
# ==============================================================================
# Base service configuration for LiteLLM proxy
# ==============================================================================
apiVersion: v1
kind: Service
metadata:
  name: litellm
  namespace: openclaw
  labels:
    app.kubernetes.io/name: litellm
    app.kubernetes.io/component: proxy
    app.kubernetes.io/part-of: openclaw
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 4000
      targetPort: http
      protocol: TCP
  selector:
    app.kubernetes.io/name: litellm
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm
  namespace: openclaw
  labels:
    app.kubernetes.io/name: litellm
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  rules:
    - host: litellm.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm
                port:
                  number: 4000
@@ -0,0 +1,15 @@
# ==============================================================================
# Heretek OpenClaw - Kubernetes Namespace
# ==============================================================================
# Base namespace configuration for OpenClaw deployment
# ==============================================================================
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
  labels:
    app.kubernetes.io/name: openclaw
    app.kubernetes.io/part-of: openclaw
    app.kubernetes.io/managed-by: kustomize
    name: openclaw
@@ -0,0 +1,145 @@
# ==============================================================================
# Heretek OpenClaw - OpenClaw Gateway Deployment
# ==============================================================================
# Base deployment configuration for OpenClaw Gateway
# ==============================================================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
  labels:
    app.kubernetes.io/name: openclaw-gateway
    app.kubernetes.io/component: gateway
    app.kubernetes.io/part-of: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: openclaw-gateway
  template:
    metadata:
      labels:
        app.kubernetes.io/name: openclaw-gateway
        app.kubernetes.io/component: gateway
        app.kubernetes.io/part-of: openclaw
    spec:
      serviceAccountName: openclaw
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: gateway
          image: heretek/openclaw-gateway:2026.3.28
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 18789
              protocol: TCP
            - name: ws
              containerPort: 18790
              protocol: TCP
          env:
            - name: NODE_ENV
              value: "production"
            - name: PORT
              value: "18789"
            - name: OPENCLAW_DIR
              value: "/app/.openclaw"
            - name: LITELLM_URL
              value: "http://litellm:4000"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: redis-url
            - name: MINIMAX_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: minimax-api-key
                  optional: true
            - name: ZAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: zai-api-key
                  optional: true
          resources:
            requests:
              cpu: "2000m"
              memory: "4Gi"
            limits:
              cpu: "4000m"
              memory: "8Gi"
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: openclaw-data
              mountPath: /app/.openclaw
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: openclaw-data
          persistentVolumeClaim:
            claimName: openclaw-data-pvc
        - name: tmp
          emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: openclaw-gateway
                topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: openclaw-data-pvc
namespace: openclaw
labels:
app.kubernetes.io/name: openclaw-gateway
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
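# NOTE (sketch, not part of the original manifest): a ReadWriteOnce volume can be
# mounted by pods on a single node only, while the pod anti-affinity above prefers
# spreading replicas across nodes. If the Gateway is scaled beyond one replica
# (the production overlay sets 3), shared storage is likely needed. Assuming the
# cluster offers a ReadWriteMany-capable storage class (the class name below is
# hypothetical), the claim could look like:
#
#   spec:
#     accessModes:
#       - ReadWriteMany
#     storageClassName: shared-rwx   # hypothetical, e.g. an NFS- or EFS-backed class
#     resources:
#       requests:
#         storage: 10Gi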
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: openclaw
namespace: openclaw
labels:
app.kubernetes.io/name: openclaw-gateway
@@ -0,0 +1,60 @@
# ==============================================================================
# Heretek OpenClaw - OpenClaw Gateway Service
# ==============================================================================
# Base service configuration for OpenClaw Gateway
# ==============================================================================
apiVersion: v1
kind: Service
metadata:
name: openclaw-gateway
namespace: openclaw
labels:
app.kubernetes.io/name: openclaw-gateway
app.kubernetes.io/component: gateway
app.kubernetes.io/part-of: openclaw
spec:
type: ClusterIP
ports:
- name: http
port: 18789
targetPort: http
protocol: TCP
- name: ws
port: 18790
targetPort: ws
protocol: TCP
selector:
app.kubernetes.io/name: openclaw-gateway
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: openclaw-gateway
namespace: openclaw
labels:
app.kubernetes.io/name: openclaw-gateway
annotations:
kubernetes.io/ingress.class: nginx
# ingress-nginx proxies WebSocket upgrades without a dedicated annotation;
# the long timeouts below keep idle connections open.
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
rules:
- host: openclaw.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: openclaw-gateway
port:
number: 18789
- path: /ws
pathType: Prefix
backend:
service:
name: openclaw-gateway
port:
number: 18790
@@ -0,0 +1,145 @@
# ==============================================================================
# Heretek OpenClaw - PostgreSQL StatefulSet
# ==============================================================================
# Base StatefulSet configuration for PostgreSQL with pgvector
# ==============================================================================
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
namespace: openclaw
labels:
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
app.kubernetes.io/part-of: openclaw
spec:
serviceName: postgresql
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: postgresql
template:
metadata:
labels:
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
app.kubernetes.io/part-of: openclaw
spec:
securityContext:
runAsNonRoot: true
runAsUser: 999
fsGroup: 999
containers:
- name: postgresql
image: pgvector/pgvector:pg17
imagePullPolicy: IfNotPresent
ports:
- name: postgres
containerPort: 5432
protocol: TCP
env:
- name: POSTGRES_USER
value: "openclaw"
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: openclaw-secrets
key: postgres-password
- name: POSTGRES_DB
value: "openclaw"
- name: PGDATA
value: "/var/lib/postgresql/data/pgdata"
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "2000m"
memory: "4Gi"
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
livenessProbe:
exec:
command:
- pg_isready
- -U
- openclaw
- -d
- openclaw
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- pg_isready
- -U
- openclaw
- -d
- openclaw
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
volumeMounts:
- name: postgresql-data
mountPath: /var/lib/postgresql/data
- name: postgresql-init
mountPath: /docker-entrypoint-initdb.d
volumes:
- name: postgresql-init
configMap:
name: postgresql-init
volumeClaimTemplates:
- metadata:
name: postgresql-data
labels:
app.kubernetes.io/name: postgresql
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: postgresql-init
namespace: openclaw
labels:
app.kubernetes.io/name: postgresql
data:
init-pgvector.sql: |
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create OpenClaw database schema
CREATE SCHEMA IF NOT EXISTS openclaw;
-- Grant permissions
GRANT ALL PRIVILEGES ON SCHEMA openclaw TO openclaw;
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;
---
apiVersion: v1
kind: Service
metadata:
name: postgresql
namespace: openclaw
labels:
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
spec:
type: ClusterIP
ports:
- name: postgres
port: 5432
targetPort: postgres
protocol: TCP
selector:
app.kubernetes.io/name: postgresql
@@ -0,0 +1,146 @@
# ==============================================================================
# Heretek OpenClaw - Redis StatefulSet
# ==============================================================================
# Base StatefulSet configuration for Redis cache
# ==============================================================================
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
namespace: openclaw
labels:
app.kubernetes.io/name: redis
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: openclaw
spec:
serviceName: redis
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: redis
template:
metadata:
labels:
app.kubernetes.io/name: redis
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: openclaw
spec:
securityContext:
runAsNonRoot: true
runAsUser: 999
fsGroup: 999
containers:
- name: redis
image: redis:7-alpine
imagePullPolicy: IfNotPresent
command:
- redis-server
- /etc/redis/redis.conf
- --requirepass
- $(REDIS_PASSWORD)
ports:
- name: redis
containerPort: 6379
protocol: TCP
env:
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: openclaw-secrets
key: redis-password
resources:
requests:
cpu: "250m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
livenessProbe:
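exec:
# $(VAR) is not expanded in probe commands, so read the password in a shell
command:
- sh
- -c
- redis-cli --no-auth-warning -a "$REDIS_PASSWORD" ping | grep -q PONG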
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
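exec:
# $(VAR) is not expanded in probe commands, so read the password in a shell
command:
- sh
- -c
- redis-cli --no-auth-warning -a "$REDIS_PASSWORD" ping | grep -q PONG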
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
volumeMounts:
- name: redis-data
mountPath: /data
- name: redis-config
mountPath: /etc/redis
volumes:
- name: redis-config
configMap:
name: redis-config
volumeClaimTemplates:
- metadata:
name: redis-data
labels:
app.kubernetes.io/name: redis
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-config
namespace: openclaw
labels:
app.kubernetes.io/name: redis
data:
redis.conf: |
# Redis Configuration for OpenClaw
bind 0.0.0.0
port 6379
protected-mode yes
appendonly yes
appendfsync everysec
maxmemory 256mb
maxmemory-policy allkeys-lru
tcp-keepalive 60
timeout 300
slowlog-log-slower-than 10000
slowlog-max-len 128
loglevel notice
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: openclaw
labels:
app.kubernetes.io/name: redis
app.kubernetes.io/component: cache
spec:
type: ClusterIP
ports:
- name: redis
port: 6379
targetPort: redis
protocol: TCP
selector:
app.kubernetes.io/name: redis
@@ -0,0 +1,127 @@
# ==============================================================================
# Heretek OpenClaw - Development Overlay
# ==============================================================================
# Kustomization overlay for development environment
# ==============================================================================
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openclaw-dev
namePrefix: dev-
resources:
- ../../base
commonLabels:
environment: dev
# Development-specific patches
patches:
# Single Gateway replica with reduced resources for dev
- target:
kind: Deployment
name: openclaw-gateway
patch: |-
- op: replace
path: /spec/replicas
value: 1
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "1Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "2Gi"
# Single LiteLLM replica with reduced resources for dev
- target:
kind: Deployment
name: litellm
patch: |-
- op: replace
path: /spec/replicas
value: 1
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "250m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "512Mi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "1Gi"
# Reduce PostgreSQL CPU and memory for dev (storage size is unchanged)
- target:
kind: StatefulSet
name: postgresql
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "1Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "2Gi"
# Reduce Redis CPU and memory for dev (storage size is unchanged)
- target:
kind: StatefulSet
name: redis
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "100m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "128Mi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "250m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "256Mi"
# ConfigMapGenerator for environment-specific configuration
configMapGenerator:
- name: openclaw-config
literals:
- ENVIRONMENT=dev
- LOG_LEVEL=debug
- ENABLE_PROFILING=true
# SecretGenerator for development secrets
secretGenerator:
- name: openclaw-secrets
literals:
- database-url=postgresql://openclaw:devpassword@dev-postgresql:5432/openclaw
- redis-url=redis://:devredis@dev-redis:6379/0
- postgres-password=devpassword
- redis-password=devredis
- litellm-master-key=dev-master-key-change-in-production
- litellm-salt-key=dev-salt-key-change-in-production
- minimax-api-key=your-minimax-api-key
- zai-api-key=your-zai-api-key
behavior: replace
# Image overrides for development
images:
- name: heretek/openclaw-gateway
newTag: 2026.3.28
- name: ghcr.io/berriai/litellm
newTag: main-latest
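# Sketch (assumption, not part of the original overlay): namePrefix renames the
# LiteLLM Service to dev-litellm, but kustomize does not rewrite plain env values,
# so the base LITELLM_URL (http://litellm:4000) would no longer resolve. Assuming
# the Service is named `litellm` in the base, an extra entry under `patches`
# along these lines could repoint the Gateway:
#
#   - target:
#       kind: Deployment
#       name: openclaw-gateway
#     patch: |-
#       apiVersion: apps/v1
#       kind: Deployment
#       metadata:
#         name: openclaw-gateway
#       spec:
#         template:
#           spec:
#             containers:
#               - name: gateway
#                 env:
#                   - name: LITELLM_URL
#                     value: "http://dev-litellm:4000"
#
# The strategic merge matches the container and the env entry by name, so it does
# not depend on list positions.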
@@ -0,0 +1,170 @@
# ==============================================================================
# Heretek OpenClaw - Production Overlay
# ==============================================================================
# Kustomization overlay for production environment
# ==============================================================================
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openclaw-prod
namePrefix: prod-
resources:
- ../../base
commonLabels:
environment: prod
# Production-specific patches
patches:
# Gateway configuration for production with HA
- target:
kind: Deployment
name: openclaw-gateway
patch: |-
- op: replace
path: /spec/replicas
value: 3
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "2000m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "4Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "4000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "8Gi"
# Zero-downtime rolling update strategy for Gateway
- target:
kind: Deployment
name: openclaw-gateway
patch: |-
apiVersion: apps/v1
kind: Deployment
metadata:
name: openclaw-gateway
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
# LiteLLM configuration for production with HA
- target:
kind: Deployment
name: litellm
patch: |-
- op: replace
path: /spec/replicas
value: 3
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "2Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "2000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "4Gi"
# PostgreSQL configuration for production
- target:
kind: StatefulSet
name: postgresql
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "2000m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "4Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "4000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "8Gi"
# Redis configuration for production
- target:
kind: StatefulSet
name: redis
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "512Mi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "1Gi"
# ConfigMapGenerator for production configuration
configMapGenerator:
- name: openclaw-config
literals:
- ENVIRONMENT=prod
- LOG_LEVEL=warn
- ENABLE_PROFILING=false
- ENABLE_DEBUG=false
# SecretGenerator for production secrets
# IMPORTANT: Replace these with actual secrets from your secrets manager
secretGenerator:
- name: openclaw-secrets
literals:
- database-url=postgresql://openclaw:PRODUCTION_PASSWORD_REPLACE_ME@prod-postgresql:5432/openclaw
- redis-url=redis://:PRODUCTION_REDIS_REPLACE_ME@prod-redis:6379/0
- postgres-password=PRODUCTION_PASSWORD_REPLACE_ME
- redis-password=PRODUCTION_REDIS_REPLACE_ME
- litellm-master-key=PRODUCTION_MASTER_KEY_REPLACE_ME
- litellm-salt-key=PRODUCTION_SALT_KEY_REPLACE_ME
- minimax-api-key=your-minimax-api-key
- zai-api-key=your-zai-api-key
behavior: replace
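# Sketch (assumption, not part of the original overlay): rather than committing
# literals, the generator can read KEY=value pairs from a local env file that is
# kept out of version control. The file name `secrets.env` is hypothetical.
#
#   secretGenerator:
#     - name: openclaw-secrets
#       behavior: replace
#       envs:
#         - secrets.env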
# Image overrides for production
images:
- name: heretek/openclaw-gateway
newTag: 2026.3.28
digest: sha256:replace-with-actual-digest
- name: ghcr.io/berriai/litellm
newTag: main-v1.0.0
digest: sha256:replace-with-actual-digest
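# Production-specific additional resources
# NOTE: patchesStrategicMerge (deprecated in favor of `patches`) only modifies
# objects that already exist in the base. Unless the base defines these
# PodDisruptionBudgets, declare them in a manifest listed under `resources`.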
patchesStrategicMerge:
- |-
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: openclaw-gateway-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app.kubernetes.io/name: openclaw-gateway
- |-
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: litellm-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app.kubernetes.io/name: litellm
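# Sketch (assumption): a PodDisruptionBudget that is not defined in the base can
# live in its own manifest and be listed under `resources`. The file name
# `pdb.yaml` is hypothetical.
#
#   resources:
#     - ../../base
#     - pdb.yaml
#
#   # pdb.yaml
#   apiVersion: policy/v1
#   kind: PodDisruptionBudget
#   metadata:
#     name: openclaw-gateway-pdb
#   spec:
#     minAvailable: 2
#     selector:
#       matchLabels:
#         app.kubernetes.io/name: openclaw-gateway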
@@ -0,0 +1,127 @@
# ==============================================================================
# Heretek OpenClaw - Staging Overlay
# ==============================================================================
# Kustomization overlay for staging environment
# ==============================================================================
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openclaw-staging
namePrefix: staging-
resources:
- ../../base
commonLabels:
environment: staging
# Staging-specific patches
patches:
# Gateway configuration for staging
- target:
kind: Deployment
name: openclaw-gateway
patch: |-
- op: replace
path: /spec/replicas
value: 2
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "2Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "2000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "4Gi"
# LiteLLM configuration for staging
- target:
kind: Deployment
name: litellm
patch: |-
- op: replace
path: /spec/replicas
value: 2
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "1Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "2Gi"
# PostgreSQL configuration for staging
- target:
kind: StatefulSet
name: postgresql
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "1000m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "2Gi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "2000m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "4Gi"
# Redis configuration for staging
- target:
kind: StatefulSet
name: redis
patch: |-
- op: replace
path: /spec/template/spec/containers/0/resources/requests/cpu
value: "250m"
- op: replace
path: /spec/template/spec/containers/0/resources/requests/memory
value: "256Mi"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/cpu
value: "500m"
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: "512Mi"
# ConfigMapGenerator for staging configuration
configMapGenerator:
- name: openclaw-config
literals:
- ENVIRONMENT=staging
- LOG_LEVEL=info
- ENABLE_PROFILING=false
# SecretGenerator for staging secrets
secretGenerator:
- name: openclaw-secrets
literals:
- database-url=postgresql://openclaw:staging-password-change-me@staging-postgresql:5432/openclaw
- redis-url=redis://:staging-redis-change-me@staging-redis:6379/0
- postgres-password=staging-password-change-me
- redis-password=staging-redis-change-me
- litellm-master-key=staging-master-key-change-in-production
- litellm-salt-key=staging-salt-key-change-in-production
- minimax-api-key=your-minimax-api-key
- zai-api-key=your-zai-api-key
behavior: replace
# Image overrides for staging
images:
- name: heretek/openclaw-gateway
newTag: 2026.3.28
- name: ghcr.io/berriai/litellm
newTag: main-latest
@@ -0,0 +1,419 @@
# ==============================================================================
# Heretek OpenClaw - LiteLLM Terraform Module
# ==============================================================================
# Reusable module for LiteLLM proxy deployment
# ==============================================================================
# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------
variable "name" {
description = "Name prefix for resources"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
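variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
# Referenced as var.namespace by the Kubernetes resources below
variable "namespace" {
description = "Kubernetes namespace for the LiteLLM resources"
type = string
default = "openclaw"
}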
variable "image" {
description = "LiteLLM container image"
type = object({
repository = string
tag = string
pull_policy = optional(string, "IfNotPresent")
})
default = {
repository = "ghcr.io/berriai/litellm"
tag = "main-latest"
}
}
variable "replicas" {
description = "Number of replicas"
type = number
default = 1
}
variable "port" {
description = "Service port"
type = number
default = 4000
}
variable "resources" {
description = "Container resources"
type = object({
requests = object({
cpu = string
memory = string
})
limits = object({
cpu = string
memory = string
})
})
default = {
requests = {
cpu = "1000m"
memory = "2Gi"
}
limits = {
cpu = "2000m"
memory = "4Gi"
}
}
}
variable "database" {
description = "Database configuration for LiteLLM"
type = object({
host = string
port = number
name = string
username = string
password = string
ssl_mode = optional(string, "require")
})
}
variable "redis" {
description = "Redis configuration for LiteLLM"
type = object({
host = string
port = number
password = optional(string)
db = optional(number, 0)
})
default = {
host = "localhost"
port = 6379
}
}
variable "config" {
description = "LiteLLM configuration"
type = object({
master_key = optional(string)
master_key_secret = optional(string)
cost_tracking = optional(bool, true)
metrics_enabled = optional(bool, true)
log_level = optional(string, "INFO")
ui_enabled = optional(bool, true)
spend_tracking = optional(bool, true)
})
default = {
cost_tracking = true
metrics_enabled = true
log_level = "INFO"
ui_enabled = true
}
}
variable "providers" {
description = "LLM provider configurations"
type = list(object({
name = string
provider = string
api_key = optional(string)
api_base = optional(string)
models = list(object({
model_name = string
litellm_model = string
}))
}))
default = []
}
variable "autoscaling" {
description = "Autoscaling configuration"
type = object({
enabled = optional(bool, false)
min_replicas = optional(number, 1)
max_replicas = optional(number, 10)
target_cpu_percent = optional(number, 80)
target_memory_percent = optional(number, 80)
})
default = {
enabled = false
}
}
variable "ingress" {
description = "Ingress configuration"
type = object({
enabled = optional(bool, false)
class_name = optional(string, "nginx")
hosts = optional(list(string), [])
tls = optional(list(object({
secret_name = string
hosts = list(string)
})), [])
annotations = optional(map(string), {})
})
default = {
enabled = false
}
}
variable "monitoring" {
description = "Monitoring configuration"
type = object({
enabled = optional(bool, true)
service_monitor = optional(bool, false)
prometheus_rules = optional(bool, false)
})
default = {
enabled = true
}
}
variable "security" {
description = "Security configuration"
type = object({
pod_security_context = optional(object({
run_as_non_root = optional(bool, true)
run_as_user = optional(number, 1000)
fs_group = optional(number, 1000)
}))
container_security_context = optional(object({
allow_privilege_escalation = optional(bool, false)
read_only_root_filesystem = optional(bool, true)
capabilities = optional(object({
drop = optional(list(string), ["ALL"])
}))
}))
})
default = {
pod_security_context = {
run_as_non_root = true
run_as_user = 1000
}
container_security_context = {
allow_privilege_escalation = false
read_only_root_filesystem = true
}
}
}
# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------
locals {
common_labels = merge(var.tags, {
"app.kubernetes.io/name" = "litellm"
"app.kubernetes.io/component" = "proxy"
"app.kubernetes.io/part-of" = "openclaw"
"app.kubernetes.io/managed-by" = "terraform"
})
database_url = "postgresql://${var.database.username}:${var.database.password}@${var.database.host}:${var.database.port}/${var.database.name}"
redis_url = var.redis.password != null ? "redis://:${var.redis.password}@${var.redis.host}:${var.redis.port}/${var.redis.db}" : "redis://${var.redis.host}:${var.redis.port}/${var.redis.db}"
}
# ------------------------------------------------------------------------------
# Kubernetes Resources (when used with Kubernetes provider)
# ------------------------------------------------------------------------------
# Deployment
resource "kubernetes_deployment" "litellm" {
count = var.environment == "module" ? 1 : 0 # Created only when var.environment is "module"; otherwise the module yields outputs only
metadata {
name = "${var.name}-litellm"
namespace = var.namespace
labels = local.common_labels
}
spec {
replicas = var.replicas
selector {
match_labels = {
"app.kubernetes.io/name" = "litellm"
}
}
template {
metadata {
labels = merge(local.common_labels, {
"app.kubernetes.io/name" = "litellm"
})
}
spec {
container {
name = "litellm"
image = "${var.image.repository}:${var.image.tag}"
port {
container_port = var.port
}
env {
name = "DATABASE_URL"
value = local.database_url
}
env {
name = "REDIS_URL"
value = local.redis_url
}
env {
name = "LITELLM_MASTER_KEY"
value = var.config.master_key
}
env {
name = "LITELLM_LOG_LEVEL"
value = var.config.log_level
}
env {
name = "PROXY_COST_TRACKING"
value = var.config.cost_tracking ? "True" : "False"
}
resources {
requests = var.resources.requests
limits = var.resources.limits
}
}
dynamic "security_context" {
for_each = var.security.pod_security_context != null ? [1] : []
content {
run_as_non_root = var.security.pod_security_context.run_as_non_root
run_as_user = var.security.pod_security_context.run_as_user
fs_group = var.security.pod_security_context.fs_group
}
}
}
}
}
}
# Service
resource "kubernetes_service" "litellm" {
count = var.environment == "module" ? 1 : 0
metadata {
name = "${var.name}-litellm"
namespace = var.namespace
labels = local.common_labels
}
spec {
selector = {
"app.kubernetes.io/name" = "litellm"
}
port {
port = var.port
target_port = var.port
}
type = "ClusterIP"
}
}
# ConfigMap for LiteLLM configuration
resource "kubernetes_config_map" "litellm" {
count = var.environment == "module" ? 1 : 0
metadata {
name = "${var.name}-litellm-config"
namespace = var.namespace
labels = local.common_labels
}
data = {
"config.yaml" = yamlencode({
# flatten() merges the per-provider lists into the single list LiteLLM expects
model_list = flatten([
for provider in var.providers : [
for model in provider.models : {
model_name = model.model_name
litellm_params = {
model = "${provider.provider}/${model.litellm_model}"
api_key = provider.api_key
api_base = provider.api_base
}
}
]
])
litellm_settings = {
set_verbose = var.environment == "dev"
drop_params = true
max_tokens = 4096
request_timeout = 600
num_retries = 2
}
})
}
}
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------
output "name" {
description = "LiteLLM deployment name"
value = "${var.name}-litellm"
}
output "image" {
description = "LiteLLM container image"
value = "${var.image.repository}:${var.image.tag}"
}
output "port" {
description = "LiteLLM service port"
value = var.port
}
output "replicas" {
description = "Number of replicas"
value = var.replicas
}
output "database_url" {
description = "Database connection URL"
value = local.database_url
sensitive = true
}
output "redis_url" {
description = "Redis connection URL"
value = local.redis_url
sensitive = true
}
output "autoscaling_enabled" {
description = "Whether autoscaling is enabled"
value = var.autoscaling.enabled
}
output "ingress_enabled" {
description = "Whether ingress is enabled"
value = var.ingress.enabled
}
output "monitoring_enabled" {
description = "Whether monitoring is enabled"
value = var.monitoring.enabled
}
output "common_labels" {
description = "Common labels applied to resources"
value = local.common_labels
}
@@ -0,0 +1,669 @@
# ==============================================================================
# Heretek OpenClaw - Monitoring Terraform Module
# ==============================================================================
# Reusable module for monitoring stack (Prometheus, Grafana, Alerting)
# ==============================================================================
# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------
variable "name_prefix" {
description = "Name prefix for resources"
type = string
}
variable "environment" {
description = "Environment name"
type = string
default = "dev"
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
# ------------------------------------------------------------------------------
# Cloud Provider Specific Variables
# ------------------------------------------------------------------------------
variable "cloud_provider" {
description = "Cloud provider (aws, gcp, azure)"
type = string
validation {
condition = contains(["aws", "gcp", "azure"], var.cloud_provider)
error_message = "Cloud provider must be one of: aws, gcp, azure."
}
}
variable "project_id" {
description = "GCP project ID or Azure subscription ID"
type = string
default = null
}
variable "region" {
description = "Cloud provider region"
type = string
}
variable "resource_group_name" {
description = "Azure resource group name"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Cluster Configuration
# ------------------------------------------------------------------------------
variable "cluster_name" {
description = "Kubernetes cluster name"
type = string
default = null
}
variable "cluster_id" {
description = "Kubernetes cluster ID"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------
variable "database_instance_id" {
description = "Database instance identifier"
type = string
default = null
}
variable "database_instance_name" {
description = "Database instance name"
type = string
default = null
}
variable "database_server_id" {
description = "Database server ID (Azure)"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Cache Configuration
# ------------------------------------------------------------------------------
variable "cache_cluster_id" {
description = "Cache cluster identifier"
type = string
default = null
}
variable "cache_instance_id" {
description = "Cache instance identifier"
type = string
default = null
}
variable "redis_cache_id" {
description = "Redis cache ID (Azure)"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Dashboard Configuration
# ------------------------------------------------------------------------------
variable "enable_dashboard" {
description = "Enable monitoring dashboard"
type = bool
default = true
}
variable "dashboard_name" {
description = "Dashboard name"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Alerting Configuration
# ------------------------------------------------------------------------------
variable "enable_alerts" {
description = "Enable alerting rules"
type = bool
default = true
}
variable "alert_notification_arn" {
description = "SNS topic ARN (AWS)"
type = string
default = null
}
variable "alert_email" {
description = "Email for alert notifications"
type = string
default = null
}
variable "alert_notification_channels" {
description = "Alert notification channel IDs (GCP)"
type = list(string)
default = []
}
variable "action_group_id" {
description = "Action group ID (Azure)"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Log Configuration
# ------------------------------------------------------------------------------
variable "log_retention_days" {
description = "Log retention period in days"
type = number
default = 30
}
variable "enable_log_export" {
description = "Enable log export to storage"
type = bool
default = false
}
variable "log_storage_bucket" {
description = "Storage bucket for log export"
type = string
default = null
}
# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------
locals {
common_tags = merge(var.tags, {
"app.kubernetes.io/name" = "monitoring"
"app.kubernetes.io/component" = "observability"
"app.kubernetes.io/part-of" = "openclaw"
"app.kubernetes.io/managed-by" = "terraform"
})
dashboard_name = var.dashboard_name != null ? var.dashboard_name : "${var.name_prefix}-dashboard"
alert_prefix = "${var.name_prefix}-alert"
}
# ------------------------------------------------------------------------------
# AWS Resources
# ------------------------------------------------------------------------------
# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "openclaw" {
count = var.cloud_provider == "aws" && var.enable_dashboard ? 1 : 0
dashboard_name = local.dashboard_name
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
title = "EKS Cluster CPU Utilization"
region = var.region
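metrics = [
# EKS does not publish a cluster-level CPUUtilization metric; node CPU comes from Container Insights, which must be enabled on the cluster.
["ContainerInsights", "node_cpu_utilization", "ClusterName", var.cluster_name, { stat = "Average" }]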
]
view = "timeSeries"
period = 300
}
},
{
type = "metric"
x = 12
y = 0
width = 12
height = 6
properties = {
title = "RDS CPU Utilization"
region = var.region
metrics = [
["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", var.database_instance_id, { stat = "Average" }]
]
view = "timeSeries"
period = 300
}
},
{
type = "metric"
x = 0
y = 6
width = 12
height = 6
properties = {
title = "ElastiCache CPU Utilization"
region = var.region
metrics = [
["AWS/ElastiCache", "CPUUtilization", "CacheClusterId", var.cache_cluster_id, { stat = "Average" }]
]
view = "timeSeries"
period = 300
}
},
{
type = "metric"
x = 12
y = 6
width = 12
height = 6
properties = {
title = "ALB Request Count"
region = var.region
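metrics = [
# Note: the LoadBalancer dimension expects the ARN suffix ("app/<name>/<id>", e.g. aws_lb.arn_suffix), so var.name_prefix only matches if it carries that value.
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.name_prefix, { stat = "Sum" }]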
]
view = "timeSeries"
period = 300
}
}
]
})
}
# CloudWatch Alarms - EKS
resource "aws_cloudwatch_metric_alarm" "eks_cpu" {
count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0
alarm_name = "${local.alert_prefix}-eks-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
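# EKS does not publish a cluster-level CPUUtilization metric; this alarm reads node CPU from Container Insights, which must be enabled on the cluster.
metric_name = "node_cpu_utilization"
namespace = "ContainerInsights"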
period = 300
statistic = "Average"
threshold = 80
alarm_description = "EKS cluster CPU utilization is too high"
alarm_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
ok_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
dimensions = {
ClusterName = var.cluster_name
}
}
# CloudWatch Alarms - RDS
resource "aws_cloudwatch_metric_alarm" "rds_cpu" {
count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0
alarm_name = "${local.alert_prefix}-rds-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 80
alarm_description = "RDS CPU utilization is too high"
alarm_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
dimensions = {
DBInstanceIdentifier = var.database_instance_id
}
}
resource "aws_cloudwatch_metric_alarm" "rds_storage" {
count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0
alarm_name = "${local.alert_prefix}-rds-storage"
comparison_operator = "LessThanThreshold"
evaluation_periods = 2
metric_name = "FreeStorageSpace"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 10737418240 # 10 GiB, in bytes
alarm_description = "RDS free storage space is too low"
alarm_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
dimensions = {
DBInstanceIdentifier = var.database_instance_id
}
}
# CloudWatch Alarms - ElastiCache
resource "aws_cloudwatch_metric_alarm" "elasticache_cpu" {
count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0
alarm_name = "${local.alert_prefix}-elasticache-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 80
alarm_description = "ElastiCache CPU utilization is too high"
alarm_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
dimensions = {
CacheClusterId = var.cache_cluster_id
}
}
resource "aws_cloudwatch_metric_alarm" "elasticache_memory" {
count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0
alarm_name = "${local.alert_prefix}-elasticache-memory"
comparison_operator = "LessThanThreshold"
evaluation_periods = 2
metric_name = "FreeableMemory"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 268435456 # 256 MiB, in bytes
alarm_description = "ElastiCache freeable memory is too low"
alarm_actions = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
dimensions = {
CacheClusterId = var.cache_cluster_id
}
}
# ------------------------------------------------------------------------------
# GCP Resources
# ------------------------------------------------------------------------------
# Cloud Monitoring Dashboard
resource "google_monitoring_dashboard" "openclaw" {
count = var.cloud_provider == "gcp" && var.enable_dashboard ? 1 : 0
dashboard_json = jsonencode({
displayName = local.dashboard_name
gridLayout = {
columns = 2
widgets = [
{
title = "GKE Cluster CPU"
xyChart = {
dataSets = [{
timeSeriesQuery = {
apiSource = "CLOUD_MONITORING_API"
timeSeriesFilter = {
filter = "resource.type=\"k8s_container\" AND metric.type=\"kubernetes.io/container/cpu/limit_utilization\""
aggregation = {
alignmentPeriod = "300s"
perSeriesAligner = "ALIGN_MEAN"
crossSeriesReducer = "REDUCE_MEAN"
groupByFields = ["resource.label.\"cluster_name\""]
}
}
}
}]
}
},
{
title = "Cloud SQL CPU"
xyChart = {
dataSets = [{
timeSeriesQuery = {
apiSource = "CLOUD_MONITORING_API"
timeSeriesFilter = {
filter = "resource.type=\"cloudsql_database\" AND metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\""
aggregation = {
alignmentPeriod = "300s"
perSeriesAligner = "ALIGN_MEAN"
}
}
}
}]
}
},
{
title = "Memorystore Memory"
xyChart = {
dataSets = [{
timeSeriesQuery = {
apiSource = "CLOUD_MONITORING_API"
timeSeriesFilter = {
filter = "resource.type=\"cloud_memorystore_instance\" AND metric.type=\"redis.googleapis.com/memory/usage_ratio\""
}
}
}]
}
}
]
}
})
}
# GCP Alert Policies
resource "google_monitoring_alert_policy" "gke_cpu" {
count = var.cloud_provider == "gcp" && var.enable_alerts ? 1 : 0
display_name = "${local.alert_prefix}-gke-cpu"
project = var.project_id
combiner = "OR" # required argument of google_monitoring_alert_policy
conditions {
display_name = "GKE CPU utilization > 80%"
condition_threshold {
filter = "resource.type=\"k8s_container\" AND metric.type=\"kubernetes.io/container/cpu/limit_utilization\""
duration = "300s"
comparison = "COMPARISON_GT"
threshold_value = 0.8
aggregations {
alignment_period = "300s"
per_series_aligner = "ALIGN_MEAN"
}
}
}
notification_channels = var.alert_notification_channels
severity = "WARNING"
}
resource "google_monitoring_alert_policy" "cloud_sql_cpu" {
count = var.cloud_provider == "gcp" && var.enable_alerts && var.database_instance_name != null ? 1 : 0
display_name = "${local.alert_prefix}-cloud-sql-cpu"
project = var.project_id
combiner = "OR" # required argument of google_monitoring_alert_policy
conditions {
display_name = "Cloud SQL CPU utilization > 80%"
condition_threshold {
filter = "resource.type=\"cloudsql_database\" AND metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\" AND resource.label.\"database_id\" = \"${var.database_instance_name}\""
duration = "300s"
comparison = "COMPARISON_GT"
threshold_value = 0.8 # utilization is reported as a fraction (0.0-1.0), so 0.8 means 80%
}
}
notification_channels = var.alert_notification_channels
severity = "WARNING"
}
# ------------------------------------------------------------------------------
# Azure Resources
# ------------------------------------------------------------------------------
# Azure Monitor Dashboard
resource "azurerm_dashboard" "openclaw" {
count = var.cloud_provider == "azure" && var.enable_dashboard ? 1 : 0
name = local.dashboard_name
resource_group_name = var.resource_group_name
location = var.region
tags = local.common_tags
dashboard_properties = jsonencode({
lenses = {
"0" = {
order = 0
parts = {
"0" = {
position = { x = 0, y = 0, colSpan = 2, rowSpan = 1 }
metadata = {
inputs = []
type = "Extension/HubsExtension/PartType/MonitorChartPart"
settings = {
content = {
options = {
chart = {
metrics = [{
resourceMetadata = { id = var.cluster_id }
name = "cpuUsagePercentage"
namespace = "Insights.Container/containers"
}]
}
}
}
}
}
}
}
}
}
})
}
# Azure Monitor Alerts
resource "azurerm_monitor_metric_alert" "aks_cpu" {
count = var.cloud_provider == "azure" && var.enable_alerts && var.cluster_id != null ? 1 : 0
name = "${local.alert_prefix}-aks-cpu"
resource_group_name = var.resource_group_name
scopes = [var.cluster_id]
description = "AKS cluster CPU utilization is too high"
criteria {
metric_namespace = "Insights.Container/containers"
metric_name = "cpuUsagePercentage"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
dynamic "action" {
for_each = var.action_group_id != null ? [1] : []
content {
action_group_id = var.action_group_id
}
}
}
resource "azurerm_monitor_metric_alert" "postgresql_cpu" {
count = var.cloud_provider == "azure" && var.enable_alerts && var.database_server_id != null ? 1 : 0
name = "${local.alert_prefix}-postgresql-cpu"
resource_group_name = var.resource_group_name
scopes = [var.database_server_id]
description = "PostgreSQL CPU utilization is too high"
criteria {
metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
metric_name = "cpu_percent"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
dynamic "action" {
for_each = var.action_group_id != null ? [1] : []
content {
action_group_id = var.action_group_id
}
}
}
resource "azurerm_monitor_metric_alert" "redis_cpu" {
count = var.cloud_provider == "azure" && var.enable_alerts && var.redis_cache_id != null ? 1 : 0
name = "${local.alert_prefix}-redis-cpu"
resource_group_name = var.resource_group_name
scopes = [var.redis_cache_id]
description = "Redis CPU utilization is too high"
criteria {
metric_namespace = "Microsoft.Cache/Redis"
metric_name = "percentProcessorTime" # CPU metric, matching the alert name and description
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
severity = 3
dynamic "action" {
for_each = var.action_group_id != null ? [1] : []
content {
action_group_id = var.action_group_id
}
}
}
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------
output "dashboard_id" {
description = "Dashboard ID"
value = var.cloud_provider == "aws" ? (
length(aws_cloudwatch_dashboard.openclaw) > 0 ? aws_cloudwatch_dashboard.openclaw[0].dashboard_name : null
) : var.cloud_provider == "gcp" ? (
length(google_monitoring_dashboard.openclaw) > 0 ? google_monitoring_dashboard.openclaw[0].id : null
) : var.cloud_provider == "azure" ? (
length(azurerm_dashboard.openclaw) > 0 ? azurerm_dashboard.openclaw[0].id : null
) : null
}
output "dashboard_name" {
description = "Dashboard name"
value = local.dashboard_name
}
output "alarm_ids" {
description = "List of alarm IDs"
value = var.cloud_provider == "aws" ? concat(
aws_cloudwatch_metric_alarm.eks_cpu[*].id,
aws_cloudwatch_metric_alarm.rds_cpu[*].id,
aws_cloudwatch_metric_alarm.rds_storage[*].id,
aws_cloudwatch_metric_alarm.elasticache_cpu[*].id,
aws_cloudwatch_metric_alarm.elasticache_memory[*].id
) : var.cloud_provider == "gcp" ? concat(
google_monitoring_alert_policy.gke_cpu[*].id,
google_monitoring_alert_policy.cloud_sql_cpu[*].id
) : []
}
output "alert_policy_ids" {
description = "List of alert policy IDs"
value = var.cloud_provider == "gcp" ? concat(
google_monitoring_alert_policy.gke_cpu[*].id,
google_monitoring_alert_policy.cloud_sql_cpu[*].id
) : []
}
output "log_group_names" {
description = "Map of CloudWatch log group names"
value = var.cloud_provider == "aws" ? {
eks = "/aws/containerinsights/${var.cluster_name}/application"
cluster = "/aws/containerinsights/${var.cluster_name}/dataplane"
} : {}
}
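# ------------------------------------------------------------------------------
# Example Usage (illustrative sketch)
# ------------------------------------------------------------------------------
# A minimal sketch of calling this module for AWS from a root module. The module
# source path, the SNS topic resource, and every identifier value are assumptions
# made for illustration; only the input names come from this module. Other inputs
# declared earlier in this file may also be required.
#
# module "monitoring" {
#   source = "../modules/monitoring" # hypothetical path
#
#   cloud_provider = "aws"
#   name_prefix    = "openclaw-dev"
#   region         = "us-east-1"
#   cluster_name   = "openclaw-dev-eks"
#
#   database_instance_id = "openclaw-dev-postgres"
#   cache_cluster_id     = "openclaw-dev-redis"
#
#   enable_dashboard       = true
#   enable_alerts          = true
#   alert_notification_arn = aws_sns_topic.alerts.arn # hypothetical topic
#
#   tags = { team = "platform" }
# }
#
# With cloud_provider = "aws", only the aws_* resources get count = 1; the GCP and
# Azure resources evaluate to count = 0 and are skipped.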
@@ -0,0 +1,324 @@
# ==============================================================================
# Heretek OpenClaw - Common Terraform Module
# ==============================================================================
# Reusable module for OpenClaw deployment across cloud providers
# ==============================================================================
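# Example usage (illustrative sketch, not part of the module). The source path and
# every value below are assumptions; the input names match the variables declared
# in this file. Inputs that have defaults (gateway, litellm, ollama, neo4j,
# langfuse, secrets, monitoring) are omitted.
#
# module "openclaw" {
#   source = "../modules/common" # hypothetical path
#
#   name        = "openclaw"
#   environment = "dev"
#
#   database = {
#     type     = "rds"
#     host     = "db.example.internal"
#     port     = 5432
#     name     = "openclaw"
#     username = "openclaw"
#     password = var.db_password # hypothetical root-module variable
#   }
#
#   redis = {
#     type = "elasticache"
#     host = "redis.example.internal"
#     port = 6379
#   }
#
#   network = {
#     vpc_id     = "vpc-0123456789abcdef0"
#     subnet_ids = ["subnet-0123456789abcdef0"]
#   }
# }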
# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------
variable "name" {
description = "Name prefix for resources"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
# ------------------------------------------------------------------------------
# OpenClaw Gateway Configuration
# ------------------------------------------------------------------------------
variable "gateway" {
description = "Gateway configuration"
type = object({
image = string
replicas = number
port = number
resources = optional(object({
requests = object({
cpu = string
memory = string
})
limits = object({
cpu = string
memory = string
})
}))
autoscaling = optional(object({
enabled = bool
min_replicas = number
max_replicas = number
target_cpu = number
}))
})
default = {
image = "heretek/openclaw-gateway:latest"
replicas = 1
port = 18789
}
}
# ------------------------------------------------------------------------------
# LiteLLM Configuration
# ------------------------------------------------------------------------------
variable "litellm" {
description = "LiteLLM configuration"
type = object({
image = string
replicas = number
port = number
resources = optional(object({
requests = object({
cpu = string
memory = string
})
limits = object({
cpu = string
memory = string
})
}))
})
default = {
image = "ghcr.io/berriai/litellm:main-latest"
replicas = 1
port = 4000
}
}
# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------
variable "database" {
description = "Database configuration"
type = object({
type = string # rds, cloud_sql, azure_postgresql
host = string
port = number
name = string
username = string
password = string
ssl_mode = optional(string, "require")
})
}
# ------------------------------------------------------------------------------
# Redis Configuration
# ------------------------------------------------------------------------------
variable "redis" {
description = "Redis configuration"
type = object({
type = string # elasticache, memorystore, azure_redis
host = string
port = number
password = optional(string)
ssl = optional(bool, true)
})
}
# ------------------------------------------------------------------------------
# Ollama Configuration (Optional)
# ------------------------------------------------------------------------------
variable "ollama" {
description = "Ollama configuration for local LLM"
type = object({
enabled = bool
image = optional(string, "ollama/ollama:latest")
gpu = optional(bool, false)
models = optional(list(string), ["nomic-embed-text-v2-moe"])
resources = optional(object({
requests = object({
cpu = string
memory = string
})
limits = object({
cpu = string
memory = string
gpu = optional(string)
})
}))
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Neo4j Configuration (Optional for GraphRAG)
# ------------------------------------------------------------------------------
variable "neo4j" {
description = "Neo4j configuration for GraphRAG"
type = object({
enabled = bool
image = optional(string, "neo4j:5.15")
username = optional(string, "neo4j")
password = optional(string)
resources = optional(object({
requests = object({
cpu = string
memory = string
})
limits = object({
cpu = string
memory = string
})
}))
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Langfuse Configuration (Optional for Observability)
# ------------------------------------------------------------------------------
variable "langfuse" {
description = "Langfuse observability configuration"
type = object({
enabled = bool
image = optional(string, "langfuse/langfuse:latest")
host = optional(string)
public_key = optional(string)
secret_key = optional(string)
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Secrets Configuration
# ------------------------------------------------------------------------------
variable "secrets" {
description = "Secrets configuration"
type = object({
minimax_api_key = optional(string)
zai_api_key = optional(string)
anthropic_api_key = optional(string)
openai_api_key = optional(string)
google_api_key = optional(string)
azure_openai_api_key = optional(string)
})
default = {}
}
# ------------------------------------------------------------------------------
# Networking Configuration
# ------------------------------------------------------------------------------
variable "network" {
description = "Network configuration"
type = object({
vpc_id = string
subnet_ids = list(string)
security_groups = optional(list(string))
})
}
# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------
variable "monitoring" {
description = "Monitoring configuration"
type = object({
enabled = bool
metrics_enabled = optional(bool, true)
logging_enabled = optional(bool, true)
tracing_enabled = optional(bool, false)
})
default = {
enabled = true
}
}
# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------
locals {
common_labels = merge(var.tags, {
"app.kubernetes.io/name" = "openclaw"
"app.kubernetes.io/component" = "gateway"
"app.kubernetes.io/part-of" = "openclaw"
"app.kubernetes.io/managed-by" = "terraform"
})
default_resources = {
gateway = {
requests = {
cpu = "2000m"
memory = "4Gi"
}
limits = {
cpu = "4000m"
memory = "8Gi"
}
}
litellm = {
requests = {
cpu = "1000m"
memory = "2Gi"
}
limits = {
cpu = "2000m"
memory = "4Gi"
}
}
}
}
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------
output "gateway_config" {
description = "Gateway configuration"
value = {
image = var.gateway.image
port = var.gateway.port
replicas = var.gateway.replicas
}
}
output "litellm_config" {
description = "LiteLLM configuration"
value = {
image = var.litellm.image
port = var.litellm.port
replicas = var.litellm.replicas
}
}
output "database_connection_string" {
description = "Database connection string"
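# The password is URL-encoded so special characters cannot break the URI, and ssl_mode (default "require") is passed through as sslmode.
value = "postgresql://${var.database.username}:${urlencode(var.database.password)}@${var.database.host}:${var.database.port}/${var.database.name}?sslmode=${var.database.ssl_mode}"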
sensitive = true
}
output "redis_connection_string" {
description = "Redis connection string"
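# The rediss:// scheme is used when var.redis.ssl is true (the default); the password is URL-encoded.
value = "${var.redis.ssl ? "rediss" : "redis"}://${var.redis.password != null ? ":${urlencode(var.redis.password)}@" : ""}${var.redis.host}:${var.redis.port}"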
sensitive = true
}
output "ollama_enabled" {
description = "Whether Ollama is enabled"
value = var.ollama.enabled
}
output "neo4j_enabled" {
description = "Whether Neo4j is enabled"
value = var.neo4j.enabled
}
output "langfuse_enabled" {
description = "Whether Langfuse is enabled"
value = var.langfuse.enabled
}
@@ -0,0 +1,378 @@
# ==============================================================================
# Heretek OpenClaw - Common Module Outputs
# ==============================================================================
# Output definitions for the OpenClaw module
# ==============================================================================
# ------------------------------------------------------------------------------
# Application Outputs
# ------------------------------------------------------------------------------
output "name" {
description = "Name prefix used for resources"
value = var.name
}
output "environment" {
description = "Environment name"
value = var.environment
}
output "app_version" {
description = "Application version"
value = var.app_version
}
# ------------------------------------------------------------------------------
# Gateway Outputs
# ------------------------------------------------------------------------------
output "gateway_image" {
description = "Gateway container image"
value = "${var.gateway.image.repository}:${var.gateway.image.tag}"
}
output "gateway_port" {
description = "Gateway service port"
value = var.gateway.port
}
output "gateway_replicas" {
description = "Gateway replica count"
value = var.gateway.replicas
}
output "gateway_autoscaling_enabled" {
description = "Whether gateway autoscaling is enabled"
value = try(var.gateway.autoscaling.enabled, false) # autoscaling is optional and null when omitted
}
output "gateway_ingress_enabled" {
description = "Whether gateway ingress is enabled"
value = try(var.gateway.ingress.enabled, false) # ingress is optional and null when omitted
}
# ------------------------------------------------------------------------------
# LiteLLM Outputs
# ------------------------------------------------------------------------------
output "litellm_enabled" {
description = "Whether LiteLLM is enabled"
value = var.litellm.enabled
}
output "litellm_image" {
description = "LiteLLM container image"
value = "${var.litellm.image.repository}:${var.litellm.image.tag}"
}
output "litellm_port" {
description = "LiteLLM service port"
value = var.litellm.port
}
output "litellm_replicas" {
description = "LiteLLM replica count"
value = var.litellm.replicas
}
# ------------------------------------------------------------------------------
# Database Outputs
# ------------------------------------------------------------------------------
output "database_type" {
description = "Database type (managed or self-hosted)"
value = var.database.type
}
output "database_connection_string" {
description = "Database connection string"
value = var.database.host != null ? "postgresql://${var.database.username}:${var.database.password}@${var.database.host}:${var.database.port}/${var.database.name}" : null
sensitive = true
}
output "database_pgvector_enabled" {
description = "Whether pgvector is enabled"
value = var.database.pgvector_enabled
}
# ------------------------------------------------------------------------------
# Redis Outputs
# ------------------------------------------------------------------------------
output "redis_type" {
description = "Redis type (managed or self-hosted)"
value = var.redis.type
}
output "redis_connection_string" {
description = "Redis connection string"
value = var.redis.host != null ? "redis://${var.redis.password != null ? ":${var.redis.password}@" : ""}${var.redis.host}:${var.redis.port}" : null
sensitive = true
}
# ------------------------------------------------------------------------------
# Ollama Outputs
# ------------------------------------------------------------------------------
output "ollama_enabled" {
description = "Whether Ollama is enabled"
value = var.ollama.enabled
}
output "ollama_gpu_enabled" {
description = "Whether Ollama GPU support is enabled"
value = var.ollama.gpu.enabled
}
output "ollama_gpu_type" {
description = "Ollama GPU type (amd or nvidia)"
value = var.ollama.gpu.type
}
output "ollama_models" {
description = "List of Ollama models to pull"
value = var.ollama.models
}
output "ollama_image" {
description = "Ollama container image"
value = "${var.ollama.image.repository}:${var.ollama.image.tag}"
}
# ------------------------------------------------------------------------------
# Neo4j Outputs
# ------------------------------------------------------------------------------
output "neo4j_enabled" {
description = "Whether Neo4j is enabled"
value = var.neo4j.enabled
}
output "neo4j_image" {
description = "Neo4j container image"
value = "${var.neo4j.image.repository}:${var.neo4j.image.tag}"
}
# ------------------------------------------------------------------------------
# Langfuse Outputs
# ------------------------------------------------------------------------------
output "langfuse_enabled" {
description = "Whether Langfuse is enabled"
value = var.langfuse.enabled
}
output "langfuse_image" {
description = "Langfuse container image"
value = "${var.langfuse.image.repository}:${var.langfuse.image.tag}"
}
output "langfuse_ingress_enabled" {
description = "Whether Langfuse ingress is enabled"
value = var.langfuse.ingress.enabled
}
# ------------------------------------------------------------------------------
# Secrets Outputs
# ------------------------------------------------------------------------------
output "secrets_configured" {
description = "List of configured secret keys"
value = [for key in keys(var.secrets) : key if var.secrets[key] != null]
sensitive = true
}
output "external_secrets_enabled" {
description = "Whether external secrets manager is enabled"
value = var.external_secrets.enabled
}
output "external_secrets_store" {
description = "External secrets store type"
value = var.external_secrets.store
}
# ------------------------------------------------------------------------------
# Network Outputs
# ------------------------------------------------------------------------------
output "vpc_id" {
description = "VPC ID"
value = var.network.vpc_id
}
output "subnet_ids" {
description = "Subnet IDs"
value = var.network.subnet_ids
}
output "pod_cidr" {
description = "Pod CIDR range"
value = var.network.pod_cidr
}
output "service_cidr" {
description = "Service CIDR range"
value = var.network.service_cidr
}
output "network_policy" {
description = "Network policy provider"
value = var.network.network_policy
}
# ------------------------------------------------------------------------------
# Domain Outputs
# ------------------------------------------------------------------------------
output "domain_enabled" {
description = "Whether custom domain is enabled"
value = var.domain.enabled
}
output "domain_base" {
description = "Base domain name"
value = var.domain.base_domain
}
output "domain_hosts" {
description = "Configured domain hosts"
value = var.domain.enabled ? {
gateway = "${var.domain.gateway_host}.${var.domain.base_domain}"
litellm = "${var.domain.litellm_host}.${var.domain.base_domain}"
langfuse = "${var.domain.langfuse_host}.${var.domain.base_domain}"
} : {}
}
# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------
output "monitoring_enabled" {
description = "Whether monitoring is enabled"
value = var.monitoring.enabled
}
output "metrics_enabled" {
description = "Whether metrics collection is enabled"
value = var.monitoring.metrics_enabled
}
output "logging_enabled" {
description = "Whether logging is enabled"
value = var.monitoring.logging_enabled
}
output "tracing_enabled" {
description = "Whether distributed tracing is enabled"
value = var.monitoring.tracing_enabled
}
output "service_monitor_enabled" {
description = "Whether Prometheus ServiceMonitor is enabled"
value = var.monitoring.service_monitor.enabled
}
# ------------------------------------------------------------------------------
# Security Outputs
# ------------------------------------------------------------------------------
output "pod_security_enabled" {
description = "Whether pod security policy is enabled"
value = var.security.pod_security_policy.enabled
}
output "network_policy_enabled" {
description = "Whether network policy is enabled"
value = var.security.network_policy.enabled
}
output "secrets_encryption_enabled" {
description = "Whether secrets encryption is enabled"
value = var.security.secrets_encryption.enabled
}
# ------------------------------------------------------------------------------
# Backup Outputs
# ------------------------------------------------------------------------------
output "backup_enabled" {
description = "Whether automated backups are enabled"
value = var.backup.enabled
}
output "backup_schedule" {
description = "Backup schedule (cron expression)"
value = var.backup.schedule
}
output "backup_retention_days" {
description = "Backup retention period in days"
value = var.backup.retention_days
}
# ------------------------------------------------------------------------------
# Resource Labels
# ------------------------------------------------------------------------------
output "common_labels" {
description = "Common labels applied to all resources"
value = {
"app.kubernetes.io/name" = "openclaw"
"app.kubernetes.io/component" = "gateway"
"app.kubernetes.io/part-of" = "openclaw"
"app.kubernetes.io/managed-by" = "terraform"
"app.kubernetes.io/version" = var.app_version
"environment" = var.environment
}
}
# ------------------------------------------------------------------------------
# Configuration Summary
# ------------------------------------------------------------------------------
output "configuration_summary" {
description = "Summary of the OpenClaw configuration"
value = {
name = var.name
environment = var.environment
version = var.app_version
components = {
gateway = true
litellm = var.litellm.enabled
ollama = var.ollama.enabled
neo4j = var.neo4j.enabled
langfuse = var.langfuse.enabled
}
database = {
type = var.database.type
pgvector = var.database.pgvector_enabled
}
redis = {
type = var.redis.type
}
monitoring = {
enabled = var.monitoring.enabled
metrics = var.monitoring.metrics_enabled
logging = var.monitoring.logging_enabled
tracing = var.monitoring.tracing_enabled
}
security = {
pod_security = var.security.pod_security_policy.enabled
network_policy = var.security.network_policy.enabled
secrets_encryption = var.security.secrets_encryption.enabled
}
backup = {
enabled = var.backup.enabled
schedule = var.backup.schedule
retention = var.backup.retention_days
}
}
}
@@ -0,0 +1,426 @@
# ==============================================================================
# Heretek OpenClaw - Common Module Variables
# ==============================================================================
# Variable definitions for the OpenClaw module
# ==============================================================================
# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------
variable "name" {
description = "Name prefix for all resources"
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{2,19}$", var.name))
error_message = "Name must be 3-20 characters, start with a letter, and contain only lowercase alphanumeric characters and hyphens."
}
}
variable "environment" {
description = "Deployment environment"
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be one of: dev, staging, prod."
}
}
variable "region" {
description = "Cloud provider region"
type = string
}
variable "tags" {
description = "Tags to apply to all resources"
type = map(string)
default = {}
}
# ------------------------------------------------------------------------------
# Application Configuration
# ------------------------------------------------------------------------------
variable "app_version" {
description = "Application version to deploy"
type = string
default = "2026.3.28"
}
variable "gateway" {
description = "OpenClaw Gateway configuration"
type = object({
image = object({
repository = string
tag = string
pull_policy = optional(string, "IfNotPresent")
})
replicas = optional(number, 1)
port = optional(number, 18789)
service_type = optional(string, "ClusterIP")
resources = optional(object({
requests = object({
cpu = optional(string, "2000m")
memory = optional(string, "4Gi")
})
limits = object({
cpu = optional(string, "4000m")
memory = optional(string, "8Gi")
})
}))
autoscaling = optional(object({
enabled = optional(bool, false)
min_replicas = optional(number, 1)
max_replicas = optional(number, 5)
target_cpu_percent = optional(number, 80)
target_memory_percent = optional(number, 80)
}))
ingress = optional(object({
enabled = optional(bool, false)
class_name = optional(string, "nginx")
hosts = optional(list(string), [])
tls = optional(list(object({
secret_name = string
hosts = list(string)
})), [])
}))
})
default = {
image = {
repository = "heretek/openclaw-gateway"
tag = "2026.3.28"
}
replicas = 1
port = 18789
}
}
variable "litellm" {
description = "LiteLLM proxy configuration"
type = object({
enabled = optional(bool, true)
image = object({
repository = optional(string, "ghcr.io/berriai/litellm")
tag = optional(string, "main-latest")
})
replicas = optional(number, 1)
port = optional(number, 4000)
service_type = optional(string, "ClusterIP")
resources = optional(object({
requests = object({
cpu = optional(string, "1000m")
memory = optional(string, "2Gi")
})
limits = object({
cpu = optional(string, "2000m")
memory = optional(string, "4Gi")
})
}))
config = optional(object({
master_key = optional(string)
cost_tracking = optional(bool, true)
metrics_enabled = optional(bool, true)
log_level = optional(string, "INFO")
}))
})
default = {
enabled = true
image = {
repository = "ghcr.io/berriai/litellm"
tag = "main-latest"
}
replicas = 1
port = 4000
}
}
# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------
variable "database" {
description = "Database configuration"
type = object({
type = optional(string, "managed") # managed, self-hosted
host = optional(string)
port = optional(number, 5432)
name = optional(string, "openclaw")
username = optional(string, "openclaw")
password = optional(string)
password_secret = optional(string)
ssl_mode = optional(string, "require")
pool_size = optional(number, 10)
max_connections = optional(number, 100)
pgvector_enabled = optional(bool, true)
})
default = {
type = "managed"
}
}
# ------------------------------------------------------------------------------
# Redis Configuration
# ------------------------------------------------------------------------------
variable "redis" {
description = "Redis configuration"
type = object({
type = optional(string, "managed") # managed, self-hosted
host = optional(string)
port = optional(number, 6379)
password = optional(string)
password_secret = optional(string)
ssl_enabled = optional(bool, true)
db = optional(number, 0)
pool_size = optional(number, 10)
})
default = {
type = "managed"
}
}
# ------------------------------------------------------------------------------
# Ollama Configuration
# ------------------------------------------------------------------------------
variable "ollama" {
description = "Ollama local LLM configuration"
type = object({
enabled = optional(bool, false)
# Nested objects default to {} so the variable default below satisfies the type constraint
image = optional(object({
repository = optional(string, "ollama/ollama")
tag = optional(string, "rocm") # rocm for AMD GPUs; latest for CPU or NVIDIA
}), {})
gpu = optional(object({
enabled = optional(bool, false)
type = optional(string, "amd") # amd or nvidia
device = optional(string)
}), {})
models = optional(list(string), ["nomic-embed-text-v2-moe"])
persistence = optional(object({
enabled = optional(bool, true)
size = optional(string, "100Gi")
storage_class = optional(string)
}), {})
resources = optional(object({
requests = object({
cpu = optional(string, "4000m")
memory = optional(string, "8Gi")
})
limits = object({
cpu = optional(string, "8000m")
memory = optional(string, "16Gi")
gpu = optional(string)
})
}))
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Neo4j Configuration
# ------------------------------------------------------------------------------
variable "neo4j" {
description = "Neo4j GraphRAG configuration"
type = object({
enabled = optional(bool, true)
# Nested objects default to {} so the variable default below satisfies the type constraint
image = optional(object({
repository = optional(string, "neo4j")
tag = optional(string, "5.15")
}), {})
auth = optional(object({
username = optional(string, "neo4j")
password = optional(string)
password_secret = optional(string)
}), {})
persistence = optional(object({
enabled = optional(bool, true)
size = optional(string, "20Gi")
storage_class = optional(string)
}), {})
resources = optional(object({
requests = object({
cpu = optional(string, "2000m")
memory = optional(string, "4Gi")
})
limits = object({
cpu = optional(string, "4000m")
memory = optional(string, "8Gi")
})
}))
})
default = {
enabled = true
}
}
# ------------------------------------------------------------------------------
# Langfuse Configuration
# ------------------------------------------------------------------------------
variable "langfuse" {
description = "Langfuse observability configuration"
type = object({
enabled = optional(bool, true)
# Defaults to {} so the variable default below satisfies the type constraint
image = optional(object({
repository = optional(string, "langfuse/langfuse")
tag = optional(string, "latest")
}), {})
replicas = optional(number, 1)
ingress = optional(object({
enabled = optional(bool, false)
hosts = optional(list(string), [])
}))
auth = optional(object({
salt = optional(string)
nextauth_secret = optional(string)
sign_up_enabled = optional(bool, true)
}))
})
default = {
enabled = true
}
}
# ------------------------------------------------------------------------------
# Secrets Configuration
# ------------------------------------------------------------------------------
variable "secrets" {
description = "API keys and secrets"
type = object({
minimax_api_key = optional(string)
zai_api_key = optional(string)
anthropic_api_key = optional(string)
openai_api_key = optional(string)
google_api_key = optional(string)
azure_openai_api_key = optional(string)
azure_openai_endpoint = optional(string)
langfuse_public_key = optional(string)
langfuse_secret_key = optional(string)
})
default = {}
}
variable "external_secrets" {
description = "External secrets manager configuration"
type = object({
enabled = optional(bool, false)
store = optional(string, "vault") # vault, aws, gcp, azure
refresh_interval = optional(string, "1h")
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Network Configuration
# ------------------------------------------------------------------------------
variable "network" {
description = "Network configuration"
type = object({
vpc_id = string
subnet_ids = list(string)
security_group_ids = optional(list(string))
pod_cidr = optional(string, "10.244.0.0/16")
service_cidr = optional(string, "10.96.0.0/12")
network_policy = optional(string, "calico")
})
}
variable "domain" {
description = "Domain configuration"
type = object({
enabled = optional(bool, false)
base_domain = optional(string)
gateway_host = optional(string, "gateway")
litellm_host = optional(string, "litellm")
langfuse_host = optional(string, "langfuse")
tls_secret = optional(string)
})
default = {
enabled = false
}
}
# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------
variable "monitoring" {
description = "Monitoring and observability configuration"
type = object({
enabled = optional(bool, true)
metrics_enabled = optional(bool, true)
logging_enabled = optional(bool, true)
tracing_enabled = optional(bool, false)
service_monitor = optional(object({
enabled = optional(bool, false)
interval = optional(string, "30s")
scrape_timeout = optional(string, "10s")
}))
prometheus_rule = optional(object({
enabled = optional(bool, false)
rules = optional(list(any), [])
}))
})
default = {
enabled = true
}
}
# ------------------------------------------------------------------------------
# Security Configuration
# ------------------------------------------------------------------------------
variable "security" {
description = "Security configuration"
type = object({
# Each block defaults to {} so nested attributes are never null when a block is omitted
pod_security_policy = optional(object({
enabled = optional(bool, true)
run_as_non_root = optional(bool, true)
run_as_user = optional(number, 1000)
fs_group = optional(number, 1000)
}), {})
network_policy = optional(object({
enabled = optional(bool, true)
default_policy = optional(string, "Deny")
allowed_namespaces = optional(list(string), [])
}), {})
secrets_encryption = optional(object({
enabled = optional(bool, false)
kms_key_id = optional(string)
}), {})
})
default = {
pod_security_policy = {
enabled = true
}
network_policy = {
enabled = true
}
}
}
# ------------------------------------------------------------------------------
# Backup Configuration
# ------------------------------------------------------------------------------
variable "backup" {
description = "Backup configuration"
type = object({
enabled = optional(bool, true)
schedule = optional(string, "0 2 * * *") # Daily at 2 AM
retention_days = optional(number, 7)
storage_location = optional(string)
})
default = {
enabled = true
}
}
+41
@@ -0,0 +1,41 @@
# AWS Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
For complete AWS deployment instructions, see [`deploy/aws/README.md`](../../deploy/aws/README.md).
## Quick Reference
### Terraform Files
| File | Purpose |
|------|---------|
| [`deploy/aws/terraform/main.tf`](../../deploy/aws/terraform/main.tf) | Main configuration |
| [`deploy/aws/terraform/variables.tf`](../../deploy/aws/terraform/variables.tf) | Input variables |
| [`deploy/aws/terraform/outputs.tf`](../../deploy/aws/terraform/outputs.tf) | Output values |
| [`deploy/aws/terraform/vpc.tf`](../../deploy/aws/terraform/vpc.tf) | VPC configuration |
| [`deploy/aws/terraform/eks.tf`](../../deploy/aws/terraform/eks.tf) | EKS cluster |
| [`deploy/aws/terraform/rds.tf`](../../deploy/aws/terraform/rds.tf) | RDS PostgreSQL |
| [`deploy/aws/terraform/elasticache.tf`](../../deploy/aws/terraform/elasticache.tf) | ElastiCache Redis |
| [`deploy/aws/terraform/ecr.tf`](../../deploy/aws/terraform/ecr.tf) | ECR repositories |
| [`deploy/aws/terraform/alb.tf`](../../deploy/aws/terraform/alb.tf) | Application Load Balancer |
### Deploy Commands
```bash
cd deploy/aws/terraform
terraform init
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
terraform apply tfplan
```
### kubectl Configuration
```bash
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
```
---
🦞 *The thought that never ends.*
+41
@@ -0,0 +1,41 @@
# Azure Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
For complete Azure deployment instructions, see [`deploy/azure/README.md`](../../deploy/azure/README.md).
## Quick Reference
### Terraform Files
| File | Purpose |
|------|---------|
| [`deploy/azure/terraform/main.tf`](../../deploy/azure/terraform/main.tf) | Main configuration |
| [`deploy/azure/terraform/variables.tf`](../../deploy/azure/terraform/variables.tf) | Input variables |
| [`deploy/azure/terraform/outputs.tf`](../../deploy/azure/terraform/outputs.tf) | Output values |
| [`deploy/azure/terraform/vnet.tf`](../../deploy/azure/terraform/vnet.tf) | VNet configuration |
| [`deploy/azure/terraform/aks.tf`](../../deploy/azure/terraform/aks.tf) | AKS cluster |
| [`deploy/azure/terraform/postgresql.tf`](../../deploy/azure/terraform/postgresql.tf) | Azure Database for PostgreSQL |
| [`deploy/azure/terraform/redis.tf`](../../deploy/azure/terraform/redis.tf) | Azure Cache for Redis |
| [`deploy/azure/terraform/acr.tf`](../../deploy/azure/terraform/acr.tf) | Azure Container Registry |
| [`deploy/azure/terraform/application-gateway.tf`](../../deploy/azure/terraform/application-gateway.tf) | Application Gateway |
### Deploy Commands
```bash
cd deploy/azure/terraform
terraform init
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
terraform apply tfplan
```
### kubectl Configuration
```bash
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
```
---
🦞 *The thought that never ends.*
+834
@@ -0,0 +1,834 @@
# Bare Metal Deployment Guide
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides comprehensive instructions for deploying the Heretek OpenClaw stack on bare metal Linux servers without Docker containerization.
---
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [System Requirements](#system-requirements)
3. [Installation Overview](#installation-overview)
4. [Step 1: Install System Dependencies](#step-1-install-system-dependencies)
5. [Step 2: Install and Configure PostgreSQL](#step-2-install-and-configure-postgresql)
6. [Step 3: Install and Configure Redis](#step-3-install-and-configure-redis)
7. [Step 4: Install and Configure Ollama](#step-4-install-and-configure-ollama)
8. [Step 5: Install LiteLLM](#step-5-install-litellm)
9. [Step 6: Install OpenClaw Gateway](#step-6-install-openclaw-gateway)
10. [Step 7: Configure Environment Variables](#step-7-configure-environment-variables)
11. [Step 8: Initialize Database](#step-8-initialize-database)
12. [Step 9: Configure Systemd Services](#step-9-configure-systemd-services)
13. [Step 10: Verify Installation](#step-10-verify-installation)
14. [Post-Deployment Configuration](#post-deployment-configuration)
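15. [Security Hardening](#security-hardening)
16. [Troubleshooting](#troubleshooting)
17. [Backup Configuration](#backup-configuration)
18. [Next Steps](#next-steps)
19. [Support](#support)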
---
## Prerequisites
### Required Knowledge
- Basic Linux system administration
- Familiarity with systemd service management
- Understanding of PostgreSQL and Redis
- Node.js and npm package management
- Python virtual environments
### Required API Keys
| Provider | Purpose | Get Key |
|----------|---------|---------|
| **MiniMax** | Primary LLM | https://platform.minimaxi.com |
| **z.ai** | Failover LLM | https://platform.z.ai |
| **(Optional) Langfuse** | Observability | https://cloud.langfuse.com |
---
## System Requirements
### Minimum Requirements
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **OS** | Ubuntu 20.04 / RHEL 8 | Ubuntu 22.04 LTS / RHEL 9 |
| **CPU** | 4 cores | 8+ cores |
| **RAM** | 8 GB | 16+ GB |
| **Disk** | 20 GB SSD | 50+ GB NVMe SSD |
| **Network** | 100 Mbps | 1 Gbps |
### GPU Requirements (Optional)
| GPU Type | Requirements | Notes |
|----------|--------------|-------|
| **AMD ROCm** | RX 6000/7000 series, MI50/MI100 | ROCm 5.6+ required |
| **NVIDIA CUDA** | RTX 3000/4000 series, A100/H100 | CUDA 11.8+, cuDNN 8.6+ |
---
## Installation Overview
The bare metal installation involves the following components:
```
┌─────────────────────────────────────────────────────────────────┐
│                     Heretek OpenClaw Stack                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       Core Services                       │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐  │  │
│  │  │ LiteLLM  │  │PostgreSQL│  │  Redis   │  │  Ollama   │  │  │
│  │  │  :4000   │  │  :5432   │  │  :6379   │  │  :11434   │  │  │
│  │  │  Python  │  │ +pgvector│  │  Cache   │  │ Local LLM │  │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └───────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │               OpenClaw Gateway (Port 18789)               │  │
│  │  All 12 agents run as workspaces within Gateway process   │  │
│  │       Agent workspaces: ~/.openclaw/agents/{agent}/       │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       Web Interface                       │  │
│  │  ┌────────────────────────────────────────────────────┐   │  │
│  │  │               Web Dashboard (:3000)                │   │  │
│  │  │  SvelteKit • TypeScript • TailwindCSS • WebSocket  │   │  │
│  │  └────────────────────────────────────────────────────┘   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
### Default Ports
| Service | Port | Protocol |
|---------|------|----------|
| LiteLLM Gateway | 4000 | HTTP |
| PostgreSQL | 5432 | TCP |
| Redis | 6379 | TCP |
| Ollama | 11434 | HTTP |
| OpenClaw Gateway | 18789 | WebSocket |
| Web Dashboard | 3000 | HTTP |
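Before installing, it can help to confirm that none of the default ports above already has a listener. The following is a minimal sketch, assuming `ss` from iproute2 is installed; `ports_in_use` and `DEFAULT_PORTS` are illustrative names, not part of OpenClaw.

```bash
# Default ports from the table above (illustrative variable name)
DEFAULT_PORTS="4000 5432 6379 11434 18789 3000"

# Reads `ss -ltn` style output on stdin and prints every default port
# that already has a TCP listener, one per line.
ports_in_use() {
  awk -v ports="$DEFAULT_PORTS" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    # Column 4 is "Local Address:Port"; strip everything up to the last colon
    $1 == "LISTEN" { sub(/.*:/, "", $4); if ($4 in want) print $4 }
  ' | sort -un
}

# Usage (prints nothing when all default ports are free):
#   ss -ltn | ports_in_use
```

If the command prints a port, either stop the conflicting service or change the corresponding port in the configuration before continuing.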
---
## Step 1: Install System Dependencies
### Ubuntu/Debian
```bash
# Update system packages
sudo apt-get update && sudo apt-get upgrade -y
# Install core dependencies
sudo apt-get install -y \
curl \
git \
wget \
gnupg \
ca-certificates \
software-properties-common \
build-essential \
libssl-dev \
libffi-dev \
python3-dev \
python3-pip \
python3-venv \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libxml2-dev \
libxmlsec1-dev \
liblzma-dev
# Install Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Verify installations
node --version # Should be v20.x
npm --version # Should be 10.x
python3 --version # Should be 3.10+
```
### RHEL/CentOS/Rocky Linux
```bash
# Update system packages
sudo dnf update -y
# Install EPEL repository
sudo dnf install -y epel-release
# Install core dependencies
sudo dnf install -y \
curl \
git \
wget \
gnupg2 \
ca-certificates \
gcc \
gcc-c++ \
make \
openssl-devel \
libffi-devel \
python3-devel \
python3-pip \
bzip2-devel \
readline-devel \
sqlite-devel \
ncurses-devel \
xz-devel \
tk-devel \
libxml2-devel \
libxmlsec1-devel \
zlib-devel
# Install Node.js 20 LTS
curl -fsSL https://rpm.nodesource.com/setup_20.x | sudo -E bash -
sudo dnf install -y nodejs
# Verify installations
node --version # Should be v20.x
npm --version # Should be 10.x
python3 --version # Should be 3.10+
```
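The version expectations in the comments above can also be checked by a script rather than by eye. This is a sketch under the assumption that GNU `sort -V` is available (it is on the distributions listed); `version_ge` is a hypothetical helper name.

```bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on GNU sort -V (version sort): if B sorts first, A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   node_v=$(node --version | sed 's/^v//')
#   py_v=$(python3 --version | awk '{print $2}')
#   version_ge "$node_v" 20.0.0 || echo "Node.js 20.x required, found $node_v"
#   version_ge "$py_v" 3.10     || echo "Python 3.10+ required, found $py_v"
```

A plain string comparison would rank `3.8` above `3.10`, which is why the sketch uses version sort.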
---
## Step 2: Install and Configure PostgreSQL
### Install PostgreSQL 15+
#### Ubuntu/Debian
```bash
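# Add the PostgreSQL (PGDG) repository with a dedicated signing key (apt-key is deprecated)
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -fsSL -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'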
# Install PostgreSQL 15
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-contrib-15 postgresql-15-pgvector
# Start and enable PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
```
#### RHEL/CentOS
```bash
# Add PostgreSQL repository (EL 9 shown; use the EL-8 repo RPM on RHEL 8)
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable default PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 15 server, contrib modules, and pgvector
sudo dnf install -y postgresql15-server postgresql15-contrib pgvector_15
# Initialize the data directory (required once, before the first start)
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb
# Start and enable PostgreSQL
sudo systemctl start postgresql-15
sudo systemctl enable postgresql-15
```
### Configure PostgreSQL
```bash
# Open a psql session as the postgres superuser
sudo -u postgres psql
```
```sql
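-- Create the OpenClaw role and a database owned by it
CREATE USER openclaw WITH PASSWORD 'generate-secure-password-here';
CREATE DATABASE openclaw OWNER openclaw;
-- On PostgreSQL 15+, database ownership is what allows creating tables in the public schema
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;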
-- Enable pgvector extension
\c openclaw
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify extension
SELECT * FROM pg_extension WHERE extname = 'vector';
-- Exit psql
\q
```
### Configure PostgreSQL for Remote Access (Optional)
```bash
# Edit PostgreSQL configuration (Debian/Ubuntu path; on RHEL use /var/lib/pgsql/15/data/postgresql.conf)
sudo nano /etc/postgresql/15/main/postgresql.conf
```
```ini
# postgresql.conf
listen_addresses = 'localhost' # Change to '*' for remote access
max_connections = 100
shared_buffers = 256MB
work_mem = 8MB
```
```bash
# Edit pg_hba.conf for authentication (Debian/Ubuntu path; on RHEL use /var/lib/pgsql/15/data/pg_hba.conf)
sudo nano /etc/postgresql/15/main/pg_hba.conf
```
```ini
# pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
local all all peer
host openclaw openclaw 127.0.0.1/32 scram-sha-256
host openclaw openclaw ::1/128 scram-sha-256
```
```bash
# Restart PostgreSQL (the unit is named postgresql-15 on RHEL)
sudo systemctl restart postgresql
```
---
## Step 3: Install and Configure Redis
### Install Redis 7+
#### Ubuntu/Debian
```bash
# Add Redis repository
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
# Install Redis
sudo apt-get update
sudo apt-get install -y redis
# Start and enable Redis
sudo systemctl start redis
sudo systemctl enable redis
```
#### RHEL/CentOS
```bash
# Install Redis from Remi repository
sudo dnf install -y dnf-utils
sudo dnf config-manager --set-enabled crb # EL 9 name; the repository is called powertools on EL 8
sudo dnf install -y https://rpms.remirepo.net/enterprise/remi-release-9.rpm
sudo dnf module reset redis -y
sudo dnf module enable redis:7 -y
sudo dnf install -y redis
# Start and enable Redis
sudo systemctl start redis
sudo systemctl enable redis
```
### Configure Redis
```bash
# Edit Redis configuration
sudo nano /etc/redis/redis.conf
```
```ini
# redis.conf
bind 127.0.0.1
port 6379
protected-mode yes
requirepass generate-secure-redis-password-here
maxmemory 256mb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
```
```bash
# Restart Redis
sudo systemctl restart redis
# Verify Redis
redis-cli -a your-redis-password ping # Should return PONG
```
---
## Step 4: Install and Configure Ollama
### Install Ollama
```bash
# Install Ollama (official installer)
curl -fsSL https://ollama.ai/install.sh | sh
# Start and enable Ollama
sudo systemctl start ollama
sudo systemctl enable ollama
```
### Configure Ollama for GPU
#### AMD ROCm
```bash
# Create systemd override for ROCm
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/rocm.conf
```
```ini
# rocm.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="OLLAMA_HOST=0.0.0.0:11434"
DevicePolicy=closed
DeviceAllow=/dev/kfd rw
DeviceAllow=/dev/dri rw
```
#### NVIDIA CUDA
```bash
# Create systemd override for CUDA
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/cuda.conf
```
```ini
# cuda.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="PATH=/usr/bin:/usr/local/cuda/bin"
Environment="LD_LIBRARY_PATH=/usr/local/cuda/lib64"
DevicePolicy=closed
DeviceAllow=/dev/nvidia0 rw
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia-uvm rw
```
After creating either override, run `sudo systemctl daemon-reload` followed by `sudo systemctl restart ollama` so that the new settings take effect.
### Pull Embedding Models
```bash
# Pull embedding model
ollama pull nomic-embed-text-v2-moe
# Verify model
ollama list
# Test Ollama
curl http://localhost:11434/api/tags
```
---
## Step 5: Install LiteLLM
### Create Python Virtual Environment
```bash
# Create LiteLLM user
sudo useradd -r -s /bin/false litellm
sudo mkdir -p /opt/litellm
sudo chown litellm:litellm /opt/litellm
# Create virtual environment
sudo -u litellm python3 -m venv /opt/litellm/venv
sudo -u litellm /opt/litellm/venv/bin/pip install --upgrade pip
```
### Install LiteLLM
```bash
# Install the LiteLLM proxy plus the PostgreSQL, Redis, and Langfuse client libraries
sudo -u litellm /opt/litellm/venv/bin/pip install \
'litellm[proxy]' \
psycopg2-binary \
redis \
langfuse
```
### Configure LiteLLM
```bash
# Create LiteLLM config directory
sudo mkdir -p /etc/litellm
sudo cp litellm_config.yaml /etc/litellm/litellm_config.yaml
sudo chown litellm:litellm /etc/litellm/litellm_config.yaml
```
---
## Step 6: Install OpenClaw Gateway
### Install OpenClaw
```bash
# Install OpenClaw Gateway
curl -fsSL https://openclaw.ai/install.sh | bash
# Verify installation
openclaw --version
# Initialize daemon
openclaw onboard --install-daemon
# Verify Gateway status
openclaw gateway status
```
### Configure OpenClaw
```bash
# Copy Gateway configuration
cp openclaw.json ~/.openclaw/openclaw.json
# Validate configuration
openclaw gateway validate
# Restart Gateway
openclaw gateway restart
```
---
## Step 7: Configure Environment Variables
### Create Environment File
```bash
# Copy environment template
cp .env.bare-metal.example .env
# Edit with your values
nano .env
```
### Required Environment Variables
See [`.env.bare-metal.example`](../../.env.bare-metal.example) for the complete template.
Key variables to configure:
```bash
# LiteLLM Gateway
LITELLM_MASTER_KEY=generate-a-secure-key-here
LITELLM_SALT_KEY=generate-another-secure-key
# Model Providers
MINIMAX_API_KEY=your_minimax_api_key
ZAI_API_KEY=your_zai_api_key
# Database
POSTGRES_USER=openclaw
POSTGRES_PASSWORD=generate-secure-db-password
POSTGRES_DB=openclaw
DATABASE_URL=postgresql://openclaw:your_password@localhost:5432/openclaw
# Redis
REDIS_URL=redis://:your-redis-password@localhost:6379/0
# Ollama
OLLAMA_HOST=http://localhost:11434
# OpenClaw Gateway
OPENCLAW_DIR=/root/.openclaw
OPENCLAW_WORKSPACE=/root/.openclaw/agents
```
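Generated passwords often contain characters such as `@`, `/`, or `:` that break `DATABASE_URL` and `REDIS_URL` unless they are percent-encoded. The sketch below shows one way to encode the password before building the URL. It assumes ASCII-only passwords, and `urlencode` is a hypothetical helper, not an OpenClaw or LiteLLM command.

```bash
# urlencode STRING: percent-encode everything except unreserved URL characters.
# Assumption: the input is ASCII; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # string without its first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Example with a placeholder password (not a real credential)
db_pass='p@ss/w:rd'
echo "DATABASE_URL=postgresql://openclaw:$(urlencode "$db_pass")@localhost:5432/openclaw"
```

The same helper applies to the password segment of `REDIS_URL`. Passwords made only of hex characters, such as the output of `openssl rand -hex 32`, need no encoding.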
---
## Step 8: Initialize Database
### Run Database Migrations
```bash
# Activate LiteLLM virtual environment
source /opt/litellm/venv/bin/activate
# Run OpenClaw database migrations
cd /root/heretek/heretek-openclaw
npm run db:migrate
# Verify database tables (-h localhost forces password auth over TCP, matching pg_hba.conf)
psql -h localhost -U openclaw -d openclaw -c "\dt"
```
### Initialize LiteLLM Database
```bash
# LiteLLM will auto-create tables on first run
# Verify tables after starting LiteLLM (table names are mixed-case, so the pattern is quoted)
psql -h localhost -U openclaw -d openclaw -c '\dt "LiteLLM"*'
```
---
## Step 9: Configure Systemd Services
### Install Systemd Service Files
```bash
# Copy service files
sudo cp systemd/openclaw-gateway.service /etc/systemd/system/
sudo cp systemd/litellm.service /etc/systemd/system/
sudo cp systemd/ollama.service /etc/systemd/system/
sudo cp systemd/redis.service /etc/systemd/system/
sudo cp systemd/postgresql.service /etc/systemd/system/
# Reload systemd
sudo systemctl daemon-reload
```
### Enable and Start Services
```bash
# Start services in order
sudo systemctl start postgresql
sudo systemctl start redis
sudo systemctl start ollama
sudo systemctl start litellm
sudo systemctl start openclaw-gateway
# Enable auto-start on boot
sudo systemctl enable postgresql
sudo systemctl enable redis
sudo systemctl enable ollama
sudo systemctl enable litellm
sudo systemctl enable openclaw-gateway
# Verify services
sudo systemctl status postgresql
sudo systemctl status redis
sudo systemctl status ollama
sudo systemctl status litellm
sudo systemctl status openclaw-gateway
```
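The start and enable steps above can be collapsed into one loop that preserves the dependency order. This is a sketch only: `start_stack`, `SERVICES`, and the `DRY_RUN` switch are illustrative names, and the unit names are the ones used in this guide (on RHEL the PostgreSQL unit is `postgresql-15`).

```bash
# Units in dependency order: data stores first, Gateway last
SERVICES="postgresql redis ollama litellm openclaw-gateway"

# start_stack: enable and start every unit, stopping at the first failure.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
start_stack() {
  for svc in $SERVICES; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "sudo systemctl enable --now $svc"
    else
      sudo systemctl enable --now "$svc" || { echo "failed to start $svc" >&2; return 1; }
    fi
  done
}

# Usage:
#   start_stack              # preview the commands
#   DRY_RUN=0 start_stack    # actually enable and start the units
```

`systemctl enable --now` combines the separate `start` and `enable` calls; stopping at the first failure avoids starting LiteLLM or the Gateway against a database that is not up.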
---
## Step 10: Verify Installation
### Health Checks
```bash
# Check PostgreSQL (it does not speak HTTP, so use pg_isready rather than curl)
pg_isready -h localhost -p 5432 && psql -h localhost -U openclaw -d openclaw -c "SELECT version();"
# Check Redis
redis-cli -a your-redis-password ping
# Check Ollama
curl http://localhost:11434/api/tags
# Check LiteLLM
curl http://localhost:4000/health
# Check OpenClaw Gateway
openclaw gateway status
```
### Expected Output
```
Gateway: Running
Version: v2026.3.28
Workspace: /root/.openclaw
Agents: 12 configured (main + 11 collective)
Plugins: 0 loaded
Skills: 0 loaded
```
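Services can take a few seconds to accept connections after `systemctl start`, so a health check run immediately may fail even though the installation is fine. The sketch below wraps any of the checks above in a retry loop; `wait_for` is a hypothetical helper name.

```bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    n=$((n + 1))
    # Sleep only if another attempt follows
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (10 attempts, 2 seconds apart):
#   wait_for 10 2 curl -fsS http://localhost:11434/api/tags || echo "Ollama not ready"
#   wait_for 10 2 curl -fsS http://localhost:4000/health    || echo "LiteLLM not ready"
```

The command's output is discarded, so run the check once more without the wrapper when the actual response is needed for debugging.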
---
## Post-Deployment Configuration
### Create Agent Workspaces
```bash
# Run agent creation script for each agent
./agents/deploy-agent.sh steward orchestrator
./agents/deploy-agent.sh alpha triad
./agents/deploy-agent.sh beta triad
./agents/deploy-agent.sh charlie triad
./agents/deploy-agent.sh examiner interrogator
./agents/deploy-agent.sh explorer scout
./agents/deploy-agent.sh sentinel guardian
./agents/deploy-agent.sh coder artisan
./agents/deploy-agent.sh dreamer visionary
./agents/deploy-agent.sh empath diplomat
./agents/deploy-agent.sh historian archivist
# Verify workspaces created
ls -la ~/.openclaw/agents/
```
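The eleven `deploy-agent.sh` calls above follow one pattern, so they can be driven from a single list. The sketch below assumes it is run from the repository root, where `./agents/deploy-agent.sh` lives; `AGENTS`, `deploy_agents`, and `DRY_RUN` are illustrative names.

```bash
# Agent name and role pairs, taken from the commands above
AGENTS='steward orchestrator
alpha triad
beta triad
charlie triad
examiner interrogator
explorer scout
sentinel guardian
coder artisan
dreamer visionary
empath diplomat
historian archivist'

# deploy_agents: run the deploy script once per agent, stopping at the first failure.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
deploy_agents() {
  printf '%s\n' "$AGENTS" | while read -r name role; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "./agents/deploy-agent.sh $name $role"
    else
      # </dev/null keeps the script from consuming the rest of the list
      ./agents/deploy-agent.sh "$name" "$role" </dev/null || { echo "failed: $name" >&2; exit 1; }
    fi
  done
}

# Usage:
#   deploy_agents              # preview
#   DRY_RUN=0 deploy_agents    # create the workspaces
```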
### Install Plugins & Skills
```bash
# Install consciousness plugin
cd plugins/openclaw-consciousness-plugin
npm install
npm link
openclaw plugins install @heretek-ai/openclaw-consciousness-plugin
# Install liberation plugin
cd ../openclaw-liberation-plugin
npm install
npm link
openclaw plugins install @heretek-ai/openclaw-liberation-plugin
# Install skills
cd ../../skills/triad-consensus
openclaw skills install ./SKILL.md
```
### Configure LiteLLM
```bash
# Copy LiteLLM configuration
sudo cp /root/heretek/heretek-openclaw/litellm_config.yaml /etc/litellm/litellm_config.yaml
# Restart LiteLLM
sudo systemctl restart litellm
# Verify endpoints
curl http://localhost:4000/v1/models
```
---
## Security Hardening
### Firewall Configuration
#### UFW (Ubuntu)
```bash
# Allow SSH first so that enabling the firewall does not cut off the current session
sudo ufw allow ssh
# Enable UFW
sudo ufw enable
# Allow only localhost for internal services
sudo ufw allow from 127.0.0.1 to any port 5432 # PostgreSQL
sudo ufw allow from 127.0.0.1 to any port 6379 # Redis
sudo ufw allow from 127.0.0.1 to any port 11434 # Ollama
# Allow public access to LiteLLM and OpenClaw
sudo ufw allow 4000/tcp # LiteLLM
sudo ufw allow 18789/tcp # OpenClaw Gateway
# Check status
sudo ufw status verbose
```
#### firewalld (RHEL)
```bash
# Enable firewalld
sudo systemctl start firewalld
sudo systemctl enable firewalld
# Allow SSH
sudo firewall-cmd --permanent --add-service=ssh
# Allow only localhost for internal services
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="5432" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="6379" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="11434" protocol="tcp" accept'
# Allow public access
sudo firewall-cmd --permanent --add-port=4000/tcp
sudo firewall-cmd --permanent --add-port=18789/tcp
# Reload firewall
sudo firewall-cmd --reload
```
### SSL/TLS Configuration
For production deployments, configure SSL/TLS for LiteLLM and OpenClaw Gateway using nginx or Apache as a reverse proxy.
### API Key Management
```bash
# Generate secure keys
openssl rand -hex 32 # For LITELLM_MASTER_KEY
openssl rand -hex 32 # For LITELLM_SALT_KEY
# Store keys securely
sudo mkdir -p /etc/openclaw/secrets
sudo chmod 700 /etc/openclaw/secrets
```
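The commands above generate the keys and create the directory but do not show the keys being written to it. One possible way to do that with restrictive permissions is sketched here; `gen_key`, `write_secret`, and the file names in the usage comment are hypothetical, not paths that OpenClaw reads on its own.

```bash
# gen_key: print 32 random bytes as 64 hex characters.
# Falls back to /dev/urandom when openssl is not installed.
gen_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# write_secret DIR NAME: store a fresh key in DIR/NAME, readable by the owner only.
# The umask is set in a subshell so it does not leak into the caller.
write_secret() {
  (umask 077; gen_key > "$1/$2")
}

# Usage (run as root, since /etc/openclaw/secrets is mode 700):
#   write_secret /etc/openclaw/secrets litellm_master_key
#   write_secret /etc/openclaw/secrets litellm_salt_key
```

The resulting values still have to be copied into `.env` as `LITELLM_MASTER_KEY` and `LITELLM_SALT_KEY`.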
---
## Troubleshooting
See [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md) for a detailed troubleshooting guide.
### Common Issues
| Issue | Solution |
|-------|----------|
| PostgreSQL won't start | Check logs: `journalctl -u postgresql -f` |
| Redis connection refused | Verify password in redis.conf |
| Ollama GPU not detected | Check ROCm/CUDA installation |
| LiteLLM health check fails | Verify DATABASE_URL and REDIS_URL |
| OpenClaw Gateway not running | Check workspace permissions |
---
## Backup Configuration
```bash
# Backup OpenClaw configuration
tar -czf openclaw-backup-$(date +%Y%m%d).tar.gz \
~/.openclaw/openclaw.json \
~/.openclaw/agents/ \
/etc/litellm/litellm_config.yaml \
/etc/openclaw/.env
# Backup PostgreSQL (-h localhost forces password auth over TCP, matching pg_hba.conf)
pg_dump -h localhost -U openclaw openclaw > openclaw-db-$(date +%Y%m%d).sql
# Backup is stored in current directory
ls -la openclaw-backup-*.tar.gz openclaw-db-*.sql
```
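Dated backups accumulate in the working directory, so some form of retention is usually needed. The sketch below removes backup files older than a given number of days. It assumes GNU or BSD `find` (for `-maxdepth` and `-delete`); `prune_backups` is a hypothetical helper, and the 7-day figure in the usage comment is only an example.

```bash
# prune_backups DIR DAYS
# Deletes openclaw-backup-*.tar.gz and openclaw-db-*.sql files in DIR whose
# modification time is more than DAYS days old, printing each removed path.
prune_backups() {
  find "$1" -maxdepth 1 -type f \
    \( -name 'openclaw-backup-*.tar.gz' -o -name 'openclaw-db-*.sql' \) \
    -mtime +"$2" -print -delete
}

# Usage:
#   prune_backups . 7
```

Because `-print` precedes `-delete`, the output doubles as a log of what was removed; dropping `-delete` turns the call into a preview.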
---
## Next Steps
After successful deployment:
1. **Access LiteLLM Dashboard** - http://localhost:4000/ui
2. **Test Agent Communication** - Send messages via Gateway WebSocket RPC
3. **Configure User Profiles** - Set up user rolodex
4. **Enable Autonomous Operations** - Activate dreamer agent
5. **Review Documentation** - See [`docs/`](../../docs/) for advanced configuration
---
## Support
For issues or questions:
- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
- Review [`CHANGELOG.md`](../../CHANGELOG.md)
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues
---
🦞 *The thought that never ends.*
+493
@@ -0,0 +1,493 @@
# Cloud-Native Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides comprehensive instructions for deploying Heretek OpenClaw on major cloud platforms using Infrastructure as Code (IaC) and Kubernetes.
---
## Table of Contents
1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Supported Cloud Providers](#supported-cloud-providers)
4. [Prerequisites](#prerequisites)
5. [Quick Start](#quick-start)
6. [Deployment Options](#deployment-options)
7. [Configuration Reference](#configuration-reference)
8. [Security](#security)
9. [Monitoring](#monitoring)
10. [Backup & Disaster Recovery](#backup--disaster-recovery)
11. [Cost Optimization](#cost-optimization)
12. [Troubleshooting](#troubleshooting)
---
## Overview
Heretek OpenClaw supports cloud-native deployments across all major cloud providers:
- **AWS** - EKS, RDS PostgreSQL, ElastiCache, ECR, ALB
- **GCP** - GKE, Cloud SQL, Memorystore, Artifact Registry, Cloud Load Balancing
- **Azure** - AKS, Azure Database for PostgreSQL, Azure Cache for Redis, ACR, Application Gateway
### Key Features
| Feature | Description |
|---------|-------------|
| **Infrastructure as Code** | Terraform configurations for all cloud providers |
| **Kubernetes Native** | Kustomize overlays for dev, staging, prod |
| **High Availability** | Multi-AZ deployments with auto-scaling |
| **GPU Support** | Optional GPU nodes for Ollama (AWS G5, GCP G2, Azure NCas T4 v3) |
| **Managed Services** | Managed databases, caches, and container registries |
| **Observability** | Integrated monitoring, logging, and alerting |
| **Security** | Private networking, encryption, IAM roles |
---
## Architecture
### High-Level Architecture
```
┌─────────────────────────────────────────────┐
│ Cloud Provider │
│ (AWS / GCP / Azure) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────┼─────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────────┐ ┌───────────────────────┐ ┌───────────────────────┐
│ Kubernetes │ │ Managed │ │ Managed │
│ Cluster │ │ Database │ │ Cache │
│ (EKS/GKE/AKS) │ │ (RDS/Cloud SQL/ │ │ (ElastiCache/ │
│ │ │ Azure PG) │ │ Memorystore/ │
│ ┌────────────────┐ │ │ │ │ Azure Redis) │
│ │ OpenClaw │ │ │ │ │ │
│ │ Gateway │ │ │ │ │ │
│ └────────────────┘ │ │ │ │ │
│ ┌────────────────┐ │ │ │ │ │
│ │ LiteLLM Proxy │ │ │ │ │ │
│ └────────────────┘ │ │ │ │ │
│ ┌────────────────┐ │ │ │ │ │
│ │ Ollama (GPU) │ │ │ │ │ │
│ └────────────────┘ │ │ │ │ │
└───────────────────────┘ └───────────────────────┘ └───────────────────────┘
│ │ │
└─────────────────────────────────┼─────────────────────────────────┘
┌─────────────────────────────────┼─────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────────┐ ┌───────────────────────┐ ┌───────────────────────┐
│ Container │ │ Load Balancer │ │ Monitoring │
│ Registry │ │ (ALB/CLB/App GW) │ │ (CloudWatch/ │
│ (ECR/AR/ACR) │ │ │ │ Monitoring/ │
│ │ │ │ │ Azure Monitor) │
└───────────────────────┘ └───────────────────────┘ └───────────────────────┘
```
### Network Architecture
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ VPC / VNet / VPC │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Public Subnet(s) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ NAT GW │ │ NAT GW │ │ NAT GW │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Load Balancer │ │ │
│ │ │ (Public-facing, SSL Termination) │ │ │
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Private Subnet(s) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ K8s Nodes │ │ K8s Nodes │ │ K8s Nodes │ │ │
│ │ │ (General) │ │ (Compute) │ │ (GPU) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Database Subnet(s) │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ RDS/PG │ │ RDS/PG │ │ │
│ │ │ (Primary) │ │ (Standby) │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Cache Subnet(s) │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Redis │ │ Redis │ │ │
│ │ │ (Primary) │ │ (Replica) │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
---
## Supported Cloud Providers
### AWS
| Service | Resource | Purpose |
|---------|----------|---------|
| EKS | eks_cluster | Kubernetes cluster |
| RDS | rds_postgresql | PostgreSQL database |
| ElastiCache | elasticache_redis | Redis cache |
| ECR | ecr_repository | Container registry |
| ALB | application_lb | Load balancer |
| CloudWatch | cloudwatch_dashboard | Monitoring |
**Documentation:** [`deploy/aws/README.md`](../../deploy/aws/README.md)
### GCP
| Service | Resource | Purpose |
|---------|----------|---------|
| GKE | gke_cluster | Kubernetes cluster |
| Cloud SQL | cloud_sql_postgresql | PostgreSQL database |
| Memorystore | memorystore_redis | Redis cache |
| Artifact Registry | artifact_registry | Container registry |
| Cloud LB | cloud_load_balancer | Load balancer |
| Cloud Monitoring | monitoring_dashboard | Monitoring |
**Documentation:** [`deploy/gcp/README.md`](../../deploy/gcp/README.md)
### Azure
| Service | Resource | Purpose |
|---------|----------|---------|
| AKS | aks_cluster | Kubernetes cluster |
| Azure DB for PostgreSQL | postgresql_flexible_server | PostgreSQL database |
| Azure Cache for Redis | redis_cache | Redis cache |
| ACR | container_registry | Container registry |
| Application Gateway | application_gateway | Load balancer |
| Azure Monitor | azure_monitor | Monitoring |
**Documentation:** [`deploy/azure/README.md`](../../deploy/azure/README.md)
---
## Prerequisites
### Required Tools
```bash
# Terraform
brew install terraform
# kubectl
brew install kubectl
# Helm
brew install helm
# Cloud provider CLIs
brew install awscli # AWS
brew install --cask google-cloud-sdk # GCP
brew install azure-cli # Azure
```
### Cloud Account Requirements
| Provider | Requirements |
|----------|-------------|
| AWS | IAM user with admin access, budget alerts configured |
| GCP | Project with billing enabled, required APIs enabled |
| Azure | Subscription with contributor access, resource providers registered |
### Kubernetes Requirements
- Kubernetes 1.26+ cluster
- Storage class for persistent volumes
- Ingress controller (nginx recommended)
- Metrics server for HPA
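
The 1.26 minimum can be checked before deploying. This is a small sketch that assumes GNU `sort` (for the `-V` version ordering); `version_ge` is a hypothetical helper name.

```bash
# version_ge ACTUAL MINIMUM  (hypothetical helper)
# True when ACTUAL is the same as or newer than MINIMUM (dotted numeric versions).
version_ge() {
  local actual=${1#v} minimum=${2#v}   # tolerate a leading "v", as in v1.28.3
  # sort -V orders version strings; if MINIMUM sorts first, ACTUAL is new enough
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}
```

One way to feed it the cluster version is `version_ge "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" 1.26`, which requires `jq` and a working kubeconfig.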
---
## Quick Start
### AWS Quick Start
```bash
cd deploy/aws/terraform
# Initialize Terraform
terraform init
# Create variables file
cat > terraform.tfvars <<EOF
aws_region = "us-east-1"
environment = "dev"
db_password = "secure-password-here"
redis_auth_token = "secure-token-here"
EOF
# Deploy
terraform plan -out=tfplan
terraform apply tfplan
# Configure kubectl
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```
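The `secure-password-here` placeholders should be replaced with generated values. Terraform reads any environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`, so secrets can be kept out of `terraform.tfvars` entirely. A sketch under that assumption follows; `gen_secret` is a hypothetical helper.

```bash
# gen_secret [LENGTH]  (hypothetical helper)
# Prints LENGTH random alphanumeric characters (default 32).
# Alphanumeric only, so the value is safe inside URLs and connection strings.
gen_secret() {
  local length=${1:-32}
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

# Usage (not run here): export before `terraform plan` and omit both
# variables from terraform.tfvars.
#   export TF_VAR_db_password="$(gen_secret 32)"
#   export TF_VAR_redis_auth_token="$(gen_secret 32)"
```

Generate each value once and store it in a secret manager; regenerating it on every run would make Terraform plan a password change each time.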
### GCP Quick Start
```bash
cd deploy/gcp/terraform
# Initialize Terraform
terraform init
# Create variables file
cat > terraform.tfvars <<EOF
project_id = "your-project-id"
region = "us-central1"
environment = "dev"
db_password = "secure-password-here"
EOF
# Deploy
terraform plan -out=tfplan
terraform apply tfplan
# Configure kubectl
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```
### Azure Quick Start
```bash
cd deploy/azure/terraform
# Initialize Terraform
terraform init
# Create variables file
cat > terraform.tfvars <<EOF
resource_group_name = "openclaw-rg"
location = "eastus"
environment = "dev"
db_administrator_password = "secure-password-here"
EOF
# Deploy
terraform plan -out=tfplan
terraform apply tfplan
# Configure kubectl
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```
---
## Deployment Options
### Environment Overlays
| Environment | Replicas | Resources | Use Case |
|-------------|----------|-----------|----------|
| Dev | 1 | Minimal | Development, testing |
| Staging | 2 | Medium | Pre-production validation |
| Production | 3+ | Full | Production workloads |
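
Scripts that deploy more than one environment can map the environment name to its overlay directory in one place. A minimal sketch; `overlay_path` is a hypothetical helper, and the directory names are the ones used under `deploy/kubernetes/overlays/`.

```bash
# overlay_path ENVIRONMENT  (hypothetical helper)
# Maps an environment name to its Kustomize overlay directory.
overlay_path() {
  case $1 in
    dev)             echo "overlays/dev" ;;
    staging)         echo "overlays/staging" ;;
    prod|production) echo "overlays/prod" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

From `deploy/kubernetes`, `kubectl apply -k "$(overlay_path staging)"` would then apply the staging overlay.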
### GPU Support
Enable GPU nodes for Ollama local LLM inference:
```hcl
# terraform.tfvars
enable_gpu_support = true
# AWS
gpu_instance_types = ["g5.xlarge", "g5.2xlarge"]
# GCP
gpu_node_pool = {
machine_type = "g2-standard-4"
accelerator_type = "nvidia-l4"
accelerator_count = 1
}
# Azure
gpu_node_pool = {
vm_size = "Standard_NC4as_T4_v3"
}
```
### High Availability
Production deployments include:
- Multi-AZ database (RDS/Cloud SQL/Azure PG)
- Multi-AZ cache (ElastiCache/Memorystore/Azure Redis)
- Multiple node pools across availability zones
- Pod disruption budgets
- Horizontal pod autoscaling
---
## Configuration Reference
### Input Variables
See individual provider documentation for complete variable lists:
- [AWS Variables](../../deploy/aws/terraform/variables.tf)
- [GCP Variables](../../deploy/gcp/terraform/variables.tf)
- [Azure Variables](../../deploy/azure/terraform/variables.tf)
### Kubernetes Configuration
Base manifests: [`deploy/kubernetes/base/`](../../deploy/kubernetes/base/)
Overlays:
- [`deploy/kubernetes/overlays/dev/`](../../deploy/kubernetes/overlays/dev/)
- [`deploy/kubernetes/overlays/staging/`](../../deploy/kubernetes/overlays/staging/)
- [`deploy/kubernetes/overlays/prod/`](../../deploy/kubernetes/overlays/prod/)
### Secrets Management
**Never commit secrets to version control.** Use:
1. **Cloud Secret Managers**
- AWS Secrets Manager
- GCP Secret Manager
- Azure Key Vault
2. **Kubernetes Secrets** (encrypted at rest)
3. **External Secrets Operator** for sync from cloud secret managers
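
A simple scan can catch hard-coded secrets before they are committed. The sketch below is a heuristic, not a replacement for a dedicated secret scanner; `find_plaintext_secrets` is a hypothetical helper and the keyword list is an assumption.

```bash
# find_plaintext_secrets FILE...  (hypothetical helper)
# Prints "file:line:text" for assignments that look like hard-coded secrets.
# Returns 0 when something suspicious is found, 1 when the files look clean.
find_plaintext_secrets() {
  grep -nHiE '(password|token|secret|api_key)[a-z_]*[[:space:]]*=[[:space:]]*"[^"]+"' "$@"
}
```

Running `find_plaintext_secrets terraform.tfvars` on the Quick Start files would flag the `db_password` and `redis_auth_token` lines. Expect occasional false positives, for example a variable such as `token_ttl`.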
---
## Security
### Network Security
- Private subnets for application workloads
- Security groups / firewall rules for least privilege
- VPC Flow Logs for network monitoring
- Private endpoints for managed services
### Data Security
- Encryption at rest (database, cache, storage)
- Encryption in transit (TLS 1.2+)
- Secrets encryption with KMS
- Network policies for pod isolation
### Access Control
- IAM roles for service accounts (IRSA/Workload Identity)
- RBAC for Kubernetes access
- Network policies for pod communication
- Pod Security Standards enforced through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
---
## Monitoring
### Cloud-Native Monitoring
Each deployment includes:
- Cloud provider dashboards (CloudWatch/Cloud Monitoring/Azure Monitor)
- Pre-configured alerts for CPU, memory, storage
- Log aggregation and retention
- Cost monitoring and budget alerts
### Kubernetes Monitoring
- Prometheus metrics via ServiceMonitor
- Grafana dashboards
- Alertmanager for notifications
- Distributed tracing (optional)
---
## Backup & Disaster Recovery
### Automated Backups
| Resource | Strategy | Retention |
|----------|----------|-----------|
| Database | Automated snapshots | 7-35 days |
| Cache | Persistence + manual snapshots | Manual |
| Container Registry | Lifecycle policies | 30 days |
| Terraform State | Versioned storage | Unlimited |
### Disaster Recovery
1. **Multi-AZ** - Automatic failover within region
2. **Cross-Region** - Manual failover to secondary region
3. **Backup Restoration** - Documented procedures for each service
---
## Cost Optimization
### Development Environments
- Single NAT Gateway
- Burstable database instances
- Basic cache tier
- Spot/preemptible instances for non-critical workloads
### Production Optimizations
- Reserved instances / committed use discounts
- Savings plans for predictable workloads
- Cluster autoscaler for dynamic scaling
- Right-sizing based on actual usage
### Cost Estimates
See individual provider READMEs for detailed cost breakdowns:
- [AWS Cost Estimates](../../deploy/aws/README.md#cost-estimates)
- [GCP Cost Estimates](../../deploy/gcp/README.md#cost-estimates)
- [Azure Cost Estimates](../../deploy/azure/README.md#cost-estimates)
---
## Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| Pods not scheduling | Check node pool capacity, taints, tolerations |
| Database connection failures | Verify security groups, private endpoints |
| Load balancer not routing | Check target group health, ingress configuration |
| GPU not detected | Verify device plugin, node labels, tolerations |
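
For the "Pods not scheduling" case, the scheduler events usually name the blocking taint or missing resource. The following sketch wraps two standard `kubectl` queries; the function names are hypothetical.

```bash
# pending_pods  (hypothetical helper)
# Lists pods stuck in Pending across all namespaces, one "namespace/name" per line.
pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
}

# explain_pending NAMESPACE/NAME  (hypothetical helper)
# Prints the events recorded for one pod, oldest first.
explain_pending() {
  local ns=${1%%/*} name=${1#*/}
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$name" \
    --sort-by=.lastTimestamp
}
```

A typical session is `pending_pods` followed by `explain_pending <namespace>/<pod>` for each entry it prints.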
### Support Resources
- [AWS Deployment Guide](../../deploy/aws/README.md)
- [GCP Deployment Guide](../../deploy/gcp/README.md)
- [Azure Deployment Guide](../../deploy/azure/README.md)
- [Kubernetes Deployment Guide](../../docs/deployment/KUBERNETES_DEPLOYMENT.md)
---
🦞 *The thought that never ends.*
@@ -0,0 +1,890 @@
# Docker to Bare Metal Migration Guide
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides step-by-step instructions for migrating from a Docker-based OpenClaw deployment to a bare metal or VM installation.
---
## Table of Contents
1. [Overview](#overview)
2. [Pre-Migration Checklist](#pre-migration-checklist)
3. [Migration Planning](#migration-planning)
4. [Step 1: Backup Docker Deployment](#step-1-backup-docker-deployment)
5. [Step 2: Prepare Target System](#step-2-prepare-target-system)
6. [Step 3: Export Docker Data](#step-3-export-docker-data)
7. [Step 4: Install Bare Metal Dependencies](#step-4-install-bare-metal-dependencies)
8. [Step 5: Migrate PostgreSQL Data](#step-5-migrate-postgresql-data)
9. [Step 6: Migrate Redis Data](#step-6-migrate-redis-data)
10. [Step 7: Migrate Ollama Models](#step-7-migrate-ollama-models)
11. [Step 8: Configure LiteLLM](#step-8-configure-litellm)
12. [Step 9: Migrate OpenClaw Configuration](#step-9-migrate-openclaw-configuration)
13. [Step 10: Start and Verify Services](#step-10-start-and-verify-services)
14. [Rollback Procedures](#rollback-procedures)
15. [Post-Migration Tasks](#post-migration-tasks)
16. [Troubleshooting](#troubleshooting)
17. [Support](#support)
---
## Overview
### Why Migrate?
| Reason | Docker | Bare Metal |
|--------|--------|------------|
| **Performance** | Container overhead | Native performance |
| **GPU Access** | Passthrough complexity | Direct access |
| **Debugging** | Limited visibility | Full system access |
| **Compliance** | Container restrictions | Full control |
| **Cost** | Possible licensing costs (Docker Desktop or commercial editions; Docker Engine itself is free) | No container-runtime licensing costs |
### Migration Architecture Comparison
```
┌─────────────────────────────────────────────────────────────────┐
│ Docker Deployment │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Docker Engine │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ LiteLLM │ │PostgreSQL│ │ Redis │ │ Ollama │ │ │
│ │ │ Container│ │ Container│ │ Container│ │ Container│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Bare Metal Deployment │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ LiteLLM │ │PostgreSQL│ │ Redis │ │ Ollama │ │
│ │ System │ │ System │ │ System │ │ System │ │
│ │ Service │ │ Service │ │ Service │ │ Service │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### Port Mapping
| Service | Docker Port | Bare Metal Port | Notes |
|---------|-------------|-----------------|-------|
| LiteLLM | 4000 | 4000 | Same |
| PostgreSQL | 5432 (internal) | 5432 (localhost) | Bind to localhost |
| Redis | 6379 (internal) | 6379 (localhost) | Bind to localhost |
| Ollama | 11434 (internal) | 11434 (localhost) | Bind to localhost |
| OpenClaw Gateway | 18789 | 18789 | Same |
---
## Pre-Migration Checklist
### Current State Assessment
```bash
# Verify Docker deployment is healthy
docker compose ps
# Check all services are running
docker compose ps | grep -E "Up|healthy"
# Document current configuration
docker compose config > docker-compose-config-backup.yaml
# List Docker volumes
docker volume ls
# Check disk usage
docker system df
```
### Required Information
| Item | Location | Example |
|------|----------|---------|
| Docker Compose file | `docker-compose.yml` | Current directory |
| Environment file | `.env` | Current directory |
| PostgreSQL password | `.env` or secrets | `POSTGRES_PASSWORD` |
| Redis password | `.env` or secrets | `REDIS_URL` |
| LiteLLM keys | `.env` | `LITELLM_MASTER_KEY` |
| Provider API keys | `.env` | `MINIMAX_API_KEY` |
| OpenClaw config | `~/.openclaw/openclaw.json` | Home directory |
| Agent workspaces | `~/.openclaw/agents/` | Home directory |
### Tools Required
```bash
# Install migration tools
sudo apt-get install -y \
postgresql-client \
redis-tools \
jq \
yq
# Or for RHEL (redis-cli ships in the redis package; yq may require EPEL)
sudo dnf install -y \
postgresql \
redis \
jq \
yq
```
---
## Migration Planning
### Downtime Estimation
| Phase | Estimated Time | Can Run While Docker Running? |
|-------|----------------|-------------------------------|
| Backup | 5-10 minutes | Yes |
| Target preparation | 30-60 minutes | Yes |
| Data export | 10-30 minutes | No (read-only recommended) |
| Data import | 10-30 minutes | No |
| Configuration | 15-30 minutes | No |
| Verification | 10-15 minutes | No |
| **Total** | **80-175 minutes** | **Partial** |
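
Only the phases marked "No" in the last column require the Docker deployment to be offline, so the actual downtime is shorter than the total. The arithmetic can be sketched as follows, with the minute ranges copied from the table.

```bash
# Phases that cannot overlap with a running Docker deployment.
# Each entry is "phase:min:max" in minutes.
phases="export:10:30 import:10:30 configuration:15:30 verification:10:15"

downtime_window() {
  local min=0 max=0 entry lo hi
  for entry in $phases; do
    lo=${entry#*:}; lo=${lo%%:*}   # middle field
    hi=${entry##*:}                # last field
    min=$((min + lo))
    max=$((max + hi))
  done
  echo "${min}-${max}"
}

downtime_window   # prints 45-105
```

A downtime of 45 to 105 minutes fits inside the 2-hour minimum window recommended for planning.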
### Migration Window Planning
```bash
# Calculate migration window
# Recommended: Schedule during low-usage period
# Minimum: 2 hours downtime
# Recommended: 4 hours for first migration
# Notify stakeholders
# Example notification template:
cat << 'EOF'
Subject: Scheduled Maintenance - OpenClaw Migration
Dear Team,
We will be performing a planned migration of the OpenClaw system
from Docker to bare metal deployment.
Maintenance Window:
- Start: [DATE] at [TIME]
- Expected Duration: 2-4 hours
- Impact: OpenClaw services will be unavailable
Rollback Plan:
If issues occur, we will revert to the Docker deployment
within 30 minutes.
Contact: [YOUR_CONTACT]
EOF
```
---
## Step 1: Backup Docker Deployment
### Full System Backup
```bash
# Create backup directory (/tmp is often cleared on reboot; use a persistent path for a real migration)
BACKUP_DIR="/tmp/openclaw-migration-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Backup Docker Compose configuration
cp docker-compose.yml $BACKUP_DIR/
cp .env $BACKUP_DIR/
cp .env.example $BACKUP_DIR/
cp litellm_config.yaml $BACKUP_DIR/
cp openclaw.json $BACKUP_DIR/
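# Backup OpenClaw data (archived relative to $HOME so it restores cleanly with -C ~/)
tar -czf $BACKUP_DIR/openclaw-data.tar.gz -C "$HOME" .openclaw
# Backup Docker volumes (a file-level copy of a running database may be inconsistent;
# treat the pg_dump below as the authoritative PostgreSQL backup)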
docker run --rm \
-v heretek-openclaw_postgres_data:/source:ro \
-v $BACKUP_DIR:/backup \
alpine tar -czf /backup/postgres-data.tar.gz -C /source .
docker run --rm \
-v heretek-openclaw_redis_data:/source:ro \
-v $BACKUP_DIR:/backup \
alpine tar -czf /backup/redis-data.tar.gz -C /source .
docker run --rm \
-v heretek-openclaw_ollama_data:/source:ro \
-v $BACKUP_DIR:/backup \
alpine tar -czf /backup/ollama-data.tar.gz -C /source .
# Verify backups
ls -lah $BACKUP_DIR/
echo "Backup completed: $BACKUP_DIR"
```
### Database Backup
```bash
# Export PostgreSQL database
docker compose exec -T postgres pg_dump -U openclaw openclaw > $BACKUP_DIR/openclaw-database.sql
# Verify SQL dump
wc -l $BACKUP_DIR/openclaw-database.sql
head -20 $BACKUP_DIR/openclaw-database.sql
```
### Redis Backup
```bash
# Trigger Redis BGSAVE
docker compose exec redis redis-cli BGSAVE
# Wait for save to complete
sleep 5
# Export Redis data
docker cp heretek-redis:/data/dump.rdb $BACKUP_DIR/dump.rdb
# Verify RDB file
ls -lah $BACKUP_DIR/dump.rdb
```
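A fixed `sleep 5` can be too short for a large dataset. Redis reports save progress in `INFO persistence`, so the wait can poll for completion instead. A sketch, assuming the compose service is named `redis` as in this guide; `wait_for_bgsave` and the `REDIS_CLI` variable are hypothetical names.

```bash
# Command used to reach Redis; the default matches the compose service in this guide.
REDIS_CLI=${REDIS_CLI:-"docker compose exec -T redis redis-cli"}

# wait_for_bgsave [TIMEOUT_SECONDS]  (hypothetical helper)
# Polls INFO persistence until no background save is running and the last one succeeded.
wait_for_bgsave() {
  local timeout=${1:-60} waited=0 info
  while [ "$waited" -lt "$timeout" ]; do
    # Redis terminates INFO lines with \r\n, so drop the carriage returns first
    info=$($REDIS_CLI INFO persistence | tr -d '\r')
    if printf '%s\n' "$info" | grep -qx 'rdb_bgsave_in_progress:0' &&
       printf '%s\n' "$info" | grep -qx 'rdb_last_bgsave_status:ok'; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "BGSAVE did not finish cleanly within ${timeout}s" >&2
  return 1
}
```

Calling `wait_for_bgsave 120` right after `BGSAVE` replaces the fixed sleep and stops the script if the save fails.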
---
## Step 2: Prepare Target System
### System Requirements Check
```bash
# Check OS version
cat /etc/os-release
# Check available disk space
df -h /
# Check available memory
free -h
# Check CPU cores
nproc
# Check GPU (if applicable)
lspci | grep -iE 'vga|3d|display'   # data-center GPUs are often listed as "3D controller"
```
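The checks above print values but do not fail when the target is too small. A sketch of a threshold check for disk space follows; the helper names are hypothetical, and the required size is an assumption you should derive from `docker system df` on the source host.

```bash
# free_gb PATH  (hypothetical helper)
# Prints the free space on the filesystem holding PATH, in whole gigabytes.
free_gb() {
  # -P forces one line per filesystem; -k reports 1 KiB blocks; column 4 is "Available"
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# require_free_gb PATH MIN_GB  (hypothetical helper)
require_free_gb() {
  local have
  have=$(free_gb "$1")
  if [ "$have" -lt "$2" ]; then
    echo "only ${have} GB free on $1, need $2 GB" >&2
    return 1
  fi
}
```

For example, `require_free_gb / 50 || exit 1` aborts a preparation script when less than 50 GB is free on the root filesystem.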
### Install Prerequisites
```bash
# For Ubuntu/Debian
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/ubuntu-deps.sh -o ubuntu-deps.sh
chmod +x ubuntu-deps.sh
sudo ./ubuntu-deps.sh
# For RHEL/CentOS
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/rhel-deps.sh -o rhel-deps.sh
chmod +x rhel-deps.sh
sudo ./rhel-deps.sh
```
### Create Required Users and Directories
```bash
# Create litellm user
sudo useradd -r -s /bin/false litellm
sudo mkdir -p /opt/litellm
sudo chown litellm:litellm /opt/litellm
# Create OpenClaw directories
sudo mkdir -p /etc/litellm
sudo mkdir -p /etc/openclaw
sudo mkdir -p /var/log/openclaw
# Set permissions
sudo chmod 755 /etc/litellm
sudo chmod 755 /etc/openclaw
sudo chmod 755 /var/log/openclaw
```
---
## Step 3: Export Docker Data
### Export PostgreSQL Data
```bash
# Export full database with schema
docker compose exec -T postgres pg_dumpall -U openclaw > $BACKUP_DIR/full-export.sql
# Export specific database
docker compose exec -T postgres pg_dump -U openclaw -Fc openclaw > $BACKUP_DIR/openclaw.custom
# Export schema only (for reference)
docker compose exec -T postgres pg_dump -U openclaw --schema-only openclaw > $BACKUP_DIR/schema.sql
# Export data only
docker compose exec -T postgres pg_dump -U openclaw --data-only openclaw > $BACKUP_DIR/data.sql
# Verify exports
ls -lah $BACKUP_DIR/*.sql $BACKUP_DIR/*.custom
```
### Export Redis Data
```bash
# Option A: have redis-cli request an RDB snapshot from the server
docker compose exec redis redis-cli --rdb /data/dump.rdb
docker cp heretek-redis:/data/dump.rdb $BACKUP_DIR/
# Option B: force a synchronous save (blocks Redis while it runs), then copy the file
docker compose exec redis redis-cli SAVE
docker cp heretek-redis:/data/dump.rdb $BACKUP_DIR/redis-dump.rdb
# List key names for reference (optional); --scan iterates without blocking the server like KEYS does
docker compose exec -T redis redis-cli --scan > $BACKUP_DIR/redis-keys.txt
```
### Export Ollama Models
```bash
# List Ollama models
docker compose exec ollama ollama list
# Export model files
docker run --rm \
-v heretek-openclaw_ollama_data:/ollama:ro \
-v $BACKUP_DIR:/backup \
alpine tar -czf /backup/ollama-models.tar.gz -C /ollama .
# Alternative: Pull models on target system
# (Recommended for large models)
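# ollama list prints a plain-text table (it does not accept --format json),
# so convert its first column to a JSON array of {"name": ...} objects
docker compose exec -T ollama ollama list | awk 'NR > 1 { print $1 }' \
  | jq -Rn '[inputs | select(length > 0) | {name: .}]' > $BACKUP_DIR/ollama-models.json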
```
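If the exported files are copied to a different machine, a checksum manifest makes it possible to confirm that nothing was corrupted in transit. A sketch assuming GNU coreutils (`sha256sum`, `sort -z`); the function names and the `SHA256SUMS` file name are hypothetical.

```bash
# write_manifest DIR  (hypothetical helper)
# Records a SHA-256 checksum for every file in DIR except the manifest itself.
write_manifest() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -print0 \
      | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# verify_manifest DIR  (hypothetical helper)
# Recomputes the checksums and fails if any file is missing or changed.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```

Run `write_manifest "$BACKUP_DIR"` on the Docker host after the exports finish, then `verify_manifest "$BACKUP_DIR"` on the target system after copying the directory.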
---
## Step 4: Install Bare Metal Dependencies
### Install PostgreSQL
```bash
# Ubuntu/Debian
sudo apt-get install -y postgresql-15 postgresql-contrib-15 postgresql-15-pgvector
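# RHEL/CentOS (package names as published in the PGDG yum repository;
# run `sudo /usr/pgsql-15/bin/postgresql-15-setup initdb` once after installing,
# and note that the PGDG systemd unit is named postgresql-15)
sudo dnf install -y postgresql15-server postgresql15-contrib pgvector_15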
# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
```
### Install Redis
```bash
# Ubuntu/Debian
sudo apt-get install -y redis
# RHEL/CentOS
sudo dnf install -y redis
# Start Redis
sudo systemctl start redis
sudo systemctl enable redis
```
### Install Ollama
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Configure Ollama (see BARE_METAL_DEPLOYMENT.md for GPU setup)
sudo systemctl start ollama
sudo systemctl enable ollama
```
### Install LiteLLM
```bash
# Create virtual environment
sudo -u litellm python3 -m venv /opt/litellm/venv
# Install LiteLLM
sudo -u litellm /opt/litellm/venv/bin/pip install \
'litellm[proxy]' \
'litellm[langfuse]' \
'litellm[postgres]' \
'litellm[redis]' \
psycopg2-binary \
redis \
langfuse
```
### Install OpenClaw Gateway
```bash
# Install OpenClaw
curl -fsSL https://openclaw.ai/install.sh | bash
# Verify installation
openclaw --version
```
---
## Step 5: Migrate PostgreSQL Data
### Create Database and User
```bash
# Connect to PostgreSQL
sudo -u postgres psql
```
```sql
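-- Create the user first so it can own the database
CREATE USER openclaw WITH PASSWORD 'your-secure-password';
-- On PostgreSQL 15+, only the database owner can create tables in the public schema by default
CREATE DATABASE openclaw OWNER openclaw;
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;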
-- Enable pgvector extension
\c openclaw
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
\dx
\q
```
### Import Data
```bash
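# Choose ONE of the following imports.
# Import SQL dump (if peer authentication rejects -U openclaw, add -h localhost)
psql -U openclaw -d openclaw -f $BACKUP_DIR/openclaw-database.sql
# Or import custom format
pg_restore -U openclaw -d openclaw $BACKUP_DIR/openclaw.custom
# Or import full export (pg_dumpall output recreates roles and databases,
# so load it as the postgres superuser into the default database)
sudo -u postgres psql -f $BACKUP_DIR/full-export.sql postgres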
# Verify import
psql -U openclaw -d openclaw -c "SELECT COUNT(*) FROM pg_tables WHERE schemaname = 'public';"
psql -U openclaw -d openclaw -c "\dt"
```
### Update Connection Strings
```bash
# The DATABASE_URL needs to change from Docker to localhost
# Docker: postgresql://openclaw:password@postgres:5432/openclaw
# Bare Metal: postgresql://openclaw:password@localhost:5432/openclaw
# Update environment file
sed -i 's/@postgres:5432/@localhost:5432/g' $BACKUP_DIR/.env
```
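An alternative to editing the backup copy in place is to stream it through a filter and write a new file, which keeps the original `.env` intact for a rollback. A sketch, assuming the Docker service host names `postgres`, `redis`, and `ollama` used in this guide; `to_bare_metal_env` is a hypothetical helper.

```bash
# to_bare_metal_env  (hypothetical helper)
# Reads a Docker-style .env on stdin and writes the bare metal equivalent to stdout.
# Only host names change; credentials, ports and database names are preserved.
to_bare_metal_env() {
  sed -E \
    -e 's#@postgres:5432#@localhost:5432#g' \
    -e 's#(redis://[^/[:space:]]*@)redis:6379#\1localhost:6379#g' \
    -e 's#redis://redis:6379#redis://localhost:6379#g' \
    -e 's#http://ollama:11434#http://localhost:11434#g'
}
```

For example, `to_bare_metal_env < "$BACKUP_DIR/.env" > /tmp/env.baremetal`, followed by `diff "$BACKUP_DIR/.env" /tmp/env.baremetal` to review exactly what changed. A Redis password still has to be added by hand if the Docker URL did not contain one.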
---
## Step 6: Migrate Redis Data
### Stop Redis Service
```bash
# Stop Redis temporarily
sudo systemctl stop redis
```
### Import RDB File
```bash
# Copy RDB file to Redis data directory
sudo cp $BACKUP_DIR/dump.rdb /var/lib/redis/dump.rdb
# Set correct ownership
sudo chown redis:redis /var/lib/redis/dump.rdb
sudo chmod 640 /var/lib/redis/dump.rdb
```
### Start Redis Service
```bash
# Start Redis
sudo systemctl start redis
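# Verify data loaded (--scan iterates incrementally instead of blocking the server like KEYS)
redis-cli -a your-redis-password DBSIZE
redis-cli -a your-redis-password --scan | head -20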
```
### Update Redis URL
```bash
# Update REDIS_URL in environment file
# Docker: redis://redis:6379/0
# Bare Metal: redis://:password@localhost:6379/0
sed -i 's|redis://redis:6379|redis://:your-redis-password@localhost:6379|g' $BACKUP_DIR/.env
```
---
## Step 7: Migrate Ollama Models
### Option 1: Restore from Backup
```bash
# Stop Ollama
sudo systemctl stop ollama
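# Restore model data (adjust the target if Ollama stores models elsewhere on your system;
# the official Linux installer defaults to /usr/share/ollama/.ollama)
sudo tar -xzf $BACKUP_DIR/ollama-models.tar.gz -C /var/lib/ollama/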
# Set permissions
sudo chown -R ollama:ollama /var/lib/ollama
# Start Ollama
sudo systemctl start ollama
# Verify models
ollama list
```
### Option 2: Re-pull Models (Recommended)
```bash
# Get list of models from backup
jq -r '.[].name' "$BACKUP_DIR/ollama-models.json" > "$BACKUP_DIR/model-list.txt"
# Pull each model
while IFS= read -r model; do
    echo "Pulling $model..."
    # </dev/null keeps the pull from consuming the rest of the list on stdin
    ollama pull "$model" < /dev/null
done < "$BACKUP_DIR/model-list.txt"
# Verify models
ollama list
```
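Large model downloads can fail part-way through on an unreliable connection. A generic retry wrapper is one way to make the pull loop more robust; this is a sketch, and `retry` is a hypothetical helper rather than an Ollama feature.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Inside the pull loop, `retry 3 10 ollama pull "$model"` would attempt each download up to three times with a 10-second pause between attempts.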
---
## Step 8: Configure LiteLLM
### Copy Configuration
```bash
# Copy LiteLLM configuration
sudo cp $BACKUP_DIR/litellm_config.yaml /etc/litellm/litellm_config.yaml
sudo chown litellm:litellm /etc/litellm/litellm_config.yaml
```
### Update Configuration for Bare Metal
```bash
# Update database connection in litellm_config.yaml
# Change postgres host from 'postgres' to 'localhost'
# Update Redis connection
# Change redis host from 'redis' to 'localhost'
# Or use environment variables (recommended)
# The systemd service will set these
```
### Create Environment File
```bash
# Copy environment template (/etc/openclaw is owned by root, so sudo is required)
sudo cp $BACKUP_DIR/.env /etc/openclaw/.env
# Update for bare metal (these are no-ops for values already rewritten in Steps 5 and 6)
sudo sed -i 's/@postgres:5432/@localhost:5432/g' /etc/openclaw/.env
sudo sed -i 's|redis://redis:6379|redis://:your-redis-password@localhost:6379|g' /etc/openclaw/.env
sudo sed -i 's|OLLAMA_HOST=http://ollama:11434|OLLAMA_HOST=http://localhost:11434|g' /etc/openclaw/.env
# Set permissions
sudo chmod 600 /etc/openclaw/.env
sudo chown root:root /etc/openclaw/.env
```
---
## Step 9: Migrate OpenClaw Configuration
### Restore OpenClaw Data
```bash
# Extract OpenClaw data
tar -xzf $BACKUP_DIR/openclaw-data.tar.gz -C ~/
# Verify extraction
ls -la ~/.openclaw/
ls -la ~/.openclaw/agents/
```
### Validate Configuration
```bash
# Validate openclaw.json
openclaw gateway validate
# Check agent workspaces
for agent in steward alpha beta charlie examiner explorer sentinel coder dreamer empath historian; do
echo "=== $agent ==="
ls -la ~/.openclaw/agents/$agent/
done
```
### Update Configuration Paths
```bash
# If paths need to be updated, edit openclaw.json
nano ~/.openclaw/openclaw.json
# Common path changes:
# - Database URLs
# - File paths
# - API endpoints
```
---
## Step 10: Start and Verify Services
### Start Services in Order
```bash
# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl status postgresql
# Start Redis
sudo systemctl start redis
sudo systemctl status redis
# Start Ollama
sudo systemctl start ollama
sudo systemctl status ollama
# Start LiteLLM
sudo systemctl start litellm
sudo systemctl status litellm
# Start OpenClaw Gateway
sudo systemctl start openclaw-gateway
sudo systemctl status openclaw-gateway
```
### Verify Services
```bash
# Check PostgreSQL
psql -U openclaw -d openclaw -c "SELECT version();"
# Check Redis
redis-cli -a your-redis-password ping
# Check Ollama
curl http://localhost:11434/api/tags
# Check LiteLLM
curl http://localhost:4000/health
# Check OpenClaw Gateway
openclaw gateway status
```
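Services can take a few seconds to accept connections after `systemctl start` returns, so a single `curl` may fail spuriously. A polling sketch follows; `wait_for_http` is a hypothetical helper, and the URLs are the ones used in the checks above.

```bash
# wait_for_http URL [TIMEOUT_SECONDS]  (hypothetical helper)
# Polls URL until it returns a successful HTTP status or the timeout expires.
wait_for_http() {
  local url=$1 timeout=${2:-60} waited=0
  # -f makes curl exit non-zero on HTTP 4xx/5xx; -s silences progress output
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```

For example, `wait_for_http http://localhost:11434/api/tags 60 && wait_for_http http://localhost:4000/health 120`. If the LiteLLM health endpoint requires an API key in your configuration, point the check at an unauthenticated endpoint instead.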
### Run Health Checks
```bash
# Run comprehensive health check
cd /root/heretek/heretek-openclaw
./scripts/health-check.sh
# Or individual checks
curl http://localhost:4000/v1/models
openclaw agent status steward
```
---
## Rollback Procedures
### Quick Rollback to Docker
If the bare metal deployment fails, you can quickly rollback to Docker:
```bash
# Stop bare metal services
sudo systemctl stop openclaw-gateway
sudo systemctl stop litellm
sudo systemctl stop ollama
# Return to project directory
cd /root/heretek/heretek-openclaw
# Start Docker deployment
docker compose up -d
# Verify Docker services
docker compose ps
```
### Rollback Decision Tree
```
┌─────────────────────────────────────────────────────────────┐
│ Rollback Decision Tree │
├─────────────────────────────────────────────────────────────┤
│ Issue Type │ Action │
├───────────────────────────────┼─────────────────────────────┤
│ PostgreSQL migration failed │ Restore from SQL dump │
│ Redis data corrupted │ Restore RDB file │
│ Ollama models missing │ Re-pull models │
│ LiteLLM won't start │ Check logs, restore config │
│ OpenClaw agents not loading │ Validate openclaw.json │
│ Critical failure │ Full Docker rollback │
└───────────────────────────────┴─────────────────────────────┘
```
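The decision tree can also be encoded so that an operator or a script picks the same action every time. A sketch only; `rollback_action`, the category keywords, and the action labels are hypothetical names chosen for illustration.

```bash
# rollback_action ISSUE  (hypothetical helper)
# Prints the recovery action for a failure category from the decision tree.
rollback_action() {
  case $1 in
    postgres) echo "restore-sql-dump" ;;
    redis)    echo "restore-rdb" ;;
    ollama)   echo "repull-models" ;;
    litellm)  echo "check-logs-restore-config" ;;
    agents)   echo "validate-openclaw-json" ;;
    *)        echo "full-docker-rollback" ;;   # anything unrecognized is treated as critical
  esac
}
```

Treating unknown categories as critical is a deliberately conservative default: an unclassified failure leads to the full Docker rollback rather than a partial fix.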
### Rollback Script
```bash
#!/bin/bash
# rollback-to-docker.sh
echo "Starting rollback to Docker deployment..."
# Stop bare metal services
sudo systemctl stop openclaw-gateway litellm ollama redis postgresql
# Start Docker
cd /root/heretek/heretek-openclaw || exit 1   # abort if the project directory is missing
docker compose up -d
# Wait for services
sleep 30
# Verify
docker compose ps
echo "Rollback complete. Verify services with: docker compose ps"
```
---
## Post-Migration Tasks
### Update Documentation
```bash
# Document the migration
sudo tee -a /var/log/openclaw/migration-log.txt > /dev/null << EOF
Migration Date: $(date)
From: Docker Deployment
To: Bare Metal Deployment
Duration: [TIME]
Issues: [LIST ANY ISSUES]
Resolution: [LIST RESOLUTIONS]
Verified By: [NAME]
EOF
```
### Configure Monitoring
```bash
# Enable systemd service monitoring
sudo systemctl enable --now openclaw-gateway
sudo systemctl enable --now litellm
# Configure log rotation
sudo nano /etc/logrotate.d/openclaw
```
```
/var/log/openclaw/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 root root
}
```
### Update Backup Procedures
```bash
# Update backup scripts to use system paths
# See BARE_METAL_DEPLOYMENT.md for backup configuration
# Test backup restoration
# Restore from new backup to verify process
```
### Performance Validation
```bash
# Compare performance metrics
# Docker vs Bare Metal
# Response time
time curl -s http://localhost:4000/health
# Database query time
psql -U openclaw -d openclaw -c "\timing" -c "SELECT COUNT(*) FROM pg_tables;"
# Redis latency
redis-cli -a your-redis-password --latency
```
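A single `time curl` measurement is noisy. Averaging several requests gives a number that is easier to compare with the Docker baseline. A sketch; `sample_latency` and `mean` are hypothetical helpers.

```bash
# sample_latency URL COUNT  (hypothetical helper)
# Prints the total time of COUNT sequential requests, one value in seconds per line.
sample_latency() {
  local url=$1 count=$2 i
  for i in $(seq "$count"); do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
  done
}

# mean  (hypothetical helper)
# Reads one number per line on stdin and prints the arithmetic mean to 3 decimals.
mean() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}
```

For example, `sample_latency http://localhost:4000/health 20 | mean` prints the average response time over 20 requests. Run the same command against the Docker deployment before the migration to get a comparable baseline.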
### Security Validation
```bash
# Verify firewall rules
sudo ufw status # Ubuntu
sudo firewall-cmd --list-all # RHEL
# Verify service isolation (ss replaces the deprecated netstat; sudo is needed to show process names)
sudo ss -tlnp | grep -E ':(5432|6379|11434|4000|18789)[[:space:]]'
# Verify SSL/TLS (if configured)
openssl s_client -connect localhost:4000 -servername localhost
```
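The port mapping for this migration binds PostgreSQL, Redis, and Ollama to localhost. That can be checked mechanically by filtering the listener table for those ports and reporting any address that is not loopback. A sketch, assuming the column layout of `ss -Htln` (local address in the fourth column); `exposed_listeners` is a hypothetical helper.

```bash
# exposed_listeners PORT...  (hypothetical helper)
# Reads `ss -Htln` output on stdin and prints listeners on the given ports
# that are bound to something other than a loopback address.
exposed_listeners() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      local_addr = $4                 # e.g. 127.0.0.1:5432, [::1]:6379, 0.0.0.0:4000
      port = local_addr; sub(/.*:/, "", port)
      host = local_addr; sub(/:[^:]*$/, "", host)
      if ((port in want) && host != "127.0.0.1" && host != "[::1]")
        print local_addr
    }'
}
```

Running `ss -Htln | exposed_listeners 5432 6379 11434` should print nothing on a correctly configured host; any line it prints is an internal service reachable from other machines.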
---
## Troubleshooting
### Common Migration Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| PostgreSQL connection refused | Wrong host in connection string | Change `postgres` to `localhost` |
| Redis authentication failed | Password not set in bare metal | Set `requirepass` in redis.conf and restart Redis |
| Ollama models not found | Models not migrated | Re-pull models or restore backup |
| LiteLLM health check fails | Database/Redis connection | Verify environment variables |
| OpenClaw agents missing | Workspace paths incorrect | Check ~/.openclaw/agents/ |
### Migration Logs
```bash
# Follow service logs (run one command at a time; Ctrl+C stops following)
journalctl -u postgresql -f
journalctl -u redis -f
journalctl -u ollama -f
journalctl -u litellm -f
journalctl -u openclaw-gateway -f
# Check migration log
cat /var/log/openclaw/migration-log.txt
```
---
## Support
For issues or questions:
- Check [`BARE_METAL_DEPLOYMENT.md`](./BARE_METAL_DEPLOYMENT.md)
- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues
---
🦞 *The thought that never ends.*
+41
View File
@@ -0,0 +1,41 @@
# GCP Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
For complete GCP deployment instructions, see [`deploy/gcp/README.md`](../../deploy/gcp/README.md).
## Quick Reference
### Terraform Files
| File | Purpose |
|------|---------|
| [`deploy/gcp/terraform/main.tf`](../../deploy/gcp/terraform/main.tf) | Main configuration |
| [`deploy/gcp/terraform/variables.tf`](../../deploy/gcp/terraform/variables.tf) | Input variables |
| [`deploy/gcp/terraform/outputs.tf`](../../deploy/gcp/terraform/outputs.tf) | Output values |
| [`deploy/gcp/terraform/vpc.tf`](../../deploy/gcp/terraform/vpc.tf) | VPC configuration |
| [`deploy/gcp/terraform/gke.tf`](../../deploy/gcp/terraform/gke.tf) | GKE cluster |
| [`deploy/gcp/terraform/cloud-sql.tf`](../../deploy/gcp/terraform/cloud-sql.tf) | Cloud SQL PostgreSQL |
| [`deploy/gcp/terraform/memorystore.tf`](../../deploy/gcp/terraform/memorystore.tf) | Memorystore Redis |
| [`deploy/gcp/terraform/artifact-registry.tf`](../../deploy/gcp/terraform/artifact-registry.tf) | Artifact Registry |
| [`deploy/gcp/terraform/load-balancer.tf`](../../deploy/gcp/terraform/load-balancer.tf) | Cloud Load Balancing |
### Deploy Commands
```bash
cd deploy/gcp/terraform
terraform init
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
terraform apply tfplan
```
### kubectl Configuration
```bash
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
```
---
🦞 *The thought that never ends.*
+271
View File
@@ -0,0 +1,271 @@
# Kubernetes Deployment Guide for Heretek OpenClaw
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides instructions for deploying Heretek OpenClaw to Kubernetes clusters using Kustomize.
---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Directory Structure](#directory-structure)
4. [Base Configuration](#base-configuration)
5. [Environment Overlays](#environment-overlays)
6. [Deployment](#deployment)
7. [Post-Deployment](#post-deployment)
8. [Troubleshooting](#troubleshooting)
---
## Overview
The Kubernetes deployment uses Kustomize for environment-specific configurations:
- **Base manifests** - Common resources for all environments
- **Overlays** - Environment-specific customizations (dev, staging, prod)
### Components
| Component | Resource Type | Purpose |
|-----------|--------------|---------|
| OpenClaw Gateway | Deployment + Service | Main application gateway |
| LiteLLM Proxy | Deployment + Service | LLM routing and proxy |
| PostgreSQL | StatefulSet + Service | Primary database with pgvector |
| Redis | StatefulSet + Service | Cache and session management |
---
## Prerequisites
### Required Tools
```bash
# kubectl
kubectl version --client
# Kustomize (built into kubectl 1.14+; the --short flag was removed in kubectl 1.28)
kubectl kustomize --help
```
### Kubernetes Requirements
- Kubernetes 1.26+ cluster
- Storage class for persistent volumes
- Ingress controller (nginx recommended)
- Metrics server for HPA
---
## Directory Structure
```
deploy/kubernetes/
├── base/
│   ├── namespace.yaml
│   ├── openclaw-deployment.yaml
│   ├── openclaw-service.yaml
│   ├── litellm-deployment.yaml
│   ├── litellm-service.yaml
│   ├── postgresql-statefulset.yaml
│   └── redis-statefulset.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml
```
---
## Base Configuration
### Namespace
All resources are deployed to the `openclaw` namespace by default.
### OpenClaw Gateway
- **Replicas:** 1 (base)
- **Ports:** 18789 (HTTP), 18790 (WebSocket)
- **Resources:** 2-4 CPU, 4-8Gi memory
- **Storage:** 10Gi persistent volume
### LiteLLM Proxy
- **Replicas:** 1 (base)
- **Port:** 4000
- **Resources:** 1-2 CPU, 2-4Gi memory
- **Config:** ConfigMap for model configuration
### PostgreSQL
- **Replicas:** 1
- **Port:** 5432
- **Image:** pgvector/pgvector:pg17
- **Storage:** 50Gi persistent volume
- **Extensions:** pgvector enabled
### Redis
- **Replicas:** 1
- **Port:** 6379
- **Image:** redis:7-alpine
- **Storage:** 10Gi persistent volume
- **Persistence:** AOF enabled
---
## Environment Overlays
### Development
```bash
kubectl apply -k deploy/kubernetes/overlays/dev
```
**Characteristics:**
- Namespace: `openclaw-dev`
- Minimal resources
- Debug logging enabled
- Development secrets
### Staging
```bash
kubectl apply -k deploy/kubernetes/overlays/staging
```
**Characteristics:**
- Namespace: `openclaw-staging`
- 2 replicas for HA
- Production-like configuration
- Staging secrets
### Production
```bash
kubectl apply -k deploy/kubernetes/overlays/prod
```
**Characteristics:**
- Namespace: `openclaw-prod`
- 3+ replicas for HA
- Pod disruption budgets
- Resource limits enforced
- Production secrets (from secret manager)
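
Before applying an overlay, it can help to render it and inspect the result. The sketch below wraps `kubectl kustomize` with a check on the environment name; the helper name `render_overlay` is illustrative, and the path assumes the directory structure shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render an overlay to stdout without applying it.
render_overlay() {
  local env="$1"
  case "$env" in
    dev|staging|prod) ;;
    *)
      echo "unknown environment: $env (expected dev, staging, or prod)" >&2
      return 2
      ;;
  esac
  kubectl kustomize "deploy/kubernetes/overlays/$env"
}
```

For example, `render_overlay staging | less` shows the final manifests, and `kubectl diff -k deploy/kubernetes/overlays/staging` compares them with what is currently running in the cluster.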
---
## Deployment
### Step 1: Create Secrets
```bash
kubectl create namespace openclaw-dev
kubectl create secret generic openclaw-secrets \
--namespace openclaw-dev \
--from-literal=database-url="postgresql://user:pass@host:5432/db" \
--from-literal=redis-url="redis://:password@host:6379/0" \
--from-literal=minimax-api-key="your-key" \
--from-literal=zai-api-key="your-key"
```
### Step 2: Deploy
```bash
# Development
kubectl apply -k deploy/kubernetes/overlays/dev
# Staging
kubectl apply -k deploy/kubernetes/overlays/staging
# Production
kubectl apply -k deploy/kubernetes/overlays/prod
```
### Step 3: Verify
```bash
# Check pods
kubectl get pods -n openclaw-dev
# Check services
kubectl get svc -n openclaw-dev
# Check logs
kubectl logs -n openclaw-dev -l app.kubernetes.io/name=openclaw-gateway
```
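Listing pods shows a snapshot; `kubectl rollout status` blocks until a Deployment is actually ready. A minimal sketch, where the helper name `wait_for_rollouts` and the 180 second timeout are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for each named Deployment to finish rolling out.
# Arguments: namespace, then one or more Deployment names.
wait_for_rollouts() {
  local ns="$1" failed=0 name
  shift
  for name in "$@"; do
    if ! kubectl rollout status "deployment/$name" -n "$ns" --timeout=180s; then
      echo "rollout not ready: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

With the development names used in this guide, the call would be `wait_for_rollouts openclaw-dev dev-openclaw-gateway dev-litellm`. PostgreSQL and Redis are StatefulSets, so they would need `kubectl rollout status statefulset/<name>` instead.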
---
## Post-Deployment
### Access Gateway
```bash
# Port forward for local access
kubectl port-forward -n openclaw-dev svc/dev-openclaw-gateway 18789:18789
# Or access via ingress
curl http://openclaw.local/health
```
### Access LiteLLM
```bash
# Port forward
kubectl port-forward -n openclaw-dev svc/dev-litellm 4000:4000
# Test endpoint
curl http://localhost:4000/health
```
### Scale Components
```bash
# Scale Gateway
kubectl scale deployment dev-openclaw-gateway --replicas=3 -n openclaw-dev
# Scale LiteLLM
kubectl scale deployment dev-litellm --replicas=2 -n openclaw-dev
```
---
## Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| Pods pending | Check storage class, node capacity |
| CrashLoopBackOff | Check logs, secrets configuration |
| Service not accessible | Check ingress, network policies |
| Database connection failed | Verify secrets, network connectivity |
### Debug Commands
```bash
# Describe pod for events
kubectl describe pod <pod-name> -n openclaw-dev
# Check logs
kubectl logs <pod-name> -n openclaw-dev
# Exec into pod
kubectl exec -it <pod-name> -n openclaw-dev -- /bin/sh
# Check resource usage
kubectl top pods -n openclaw-dev
```
---
🦞 *The thought that never ends.*
File diff suppressed because it is too large Load Diff
+750
View File
@@ -0,0 +1,750 @@
# VM Deployment Guide
**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28
This guide provides instructions for deploying the Heretek OpenClaw stack on virtual machines (VMs) across different platforms and operating systems.
---
## Table of Contents
1. [Overview](#overview)
2. [Ubuntu/Debian VM Deployment](#ubuntudebian-vm-deployment)
3. [RHEL/CentOS VM Deployment](#rhelcentos-vm-deployment)
4. [Cloud VM Considerations](#cloud-vm-considerations)
5. [Network Configuration](#network-configuration)
6. [Security Hardening](#security-hardening)
7. [Resource Optimization](#resource-optimization)
8. [Backup and Recovery](#backup-and-recovery)
9. [Troubleshooting](#troubleshooting)
10. [Next Steps](#next-steps)
11. [Support](#support)
---
## Overview
### Supported VM Platforms
| Platform | Supported OS | Notes |
|----------|--------------|-------|
| **AWS EC2** | Ubuntu 22.04, RHEL 9 | Use Graviton (ARM) or x86_64 |
| **GCP Compute** | Ubuntu 22.04, Rocky Linux 9 | N1, N2, or C2 machine types |
| **Azure VM** | Ubuntu 22.04, RHEL 9 | D-series or E-series |
| **DigitalOcean** | Ubuntu 22.04 | Droplets with 4+ GB RAM |
| **Linode** | Ubuntu 22.04, AlmaLinux 9 | Linode 4GB+ plans |
| **Proxmox** | Any supported OS | LXC or full VM |
| **VMware** | Any supported OS | ESXi 7.0+ |
### VM Sizing Recommendations
| Workload | vCPU | RAM | Storage | GPU |
|----------|------|-----|---------|-----|
| **Development** | 2-4 | 8 GB | 50 GB SSD | Optional |
| **Production (Small)** | 4-8 | 16 GB | 100 GB SSD | Optional |
| **Production (Medium)** | 8-16 | 32 GB | 200 GB SSD | Recommended |
| **Production (Large)** | 16-32 | 64 GB | 500 GB NVMe | Required |
---
## Ubuntu/Debian VM Deployment
### Prerequisites
- Ubuntu 22.04 LTS VM instance
- SSH access with sudo privileges
- Outbound internet access
- Minimum 4 vCPU, 8 GB RAM
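
The vCPU and RAM minimums can be checked on the VM before running the installer. The following is a small sketch; the helper name `check_minimums` is illustrative, and memory is rounded up to whole GiB because the kernel reports slightly less than the nominal VM size.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check against the 4 vCPU / 8 GB minimum.
# Arguments: vCPU count, memory in kB (as reported by /proc/meminfo).
check_minimums() {
  local cpus="$1" mem_kb="$2" min_cpus=4 min_mem_gb=8 status=0
  # Round up to whole GiB so an "8 GB" VM is not rejected.
  local mem_gb=$(( (mem_kb + 1048575) / 1048576 ))
  if (( cpus < min_cpus )); then
    echo "vCPU: $cpus (need $min_cpus)"
    status=1
  fi
  if (( mem_gb < min_mem_gb )); then
    echo "RAM: ${mem_gb} GB (need ${min_mem_gb} GB)"
    status=1
  fi
  return "$status"
}
```

On the target VM, call it as `check_minimums "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`. It prints nothing and returns 0 when both minimums are met.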
### Quick Start Script
```bash
# Download and run the VM installer
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/vm-install.sh -o vm-install.sh
chmod +x vm-install.sh
sudo ./vm-install.sh --os ubuntu --gpu none
```
### Manual Installation
#### Step 1: System Update
```bash
# Update system packages
sudo apt-get update && sudo apt-get upgrade -y
# Install essential tools
sudo apt-get install -y \
curl \
git \
wget \
gnupg \
ca-certificates \
software-properties-common
```
#### Step 2: Install Dependencies
```bash
# Run Ubuntu dependencies script
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/ubuntu-deps.sh -o ubuntu-deps.sh
chmod +x ubuntu-deps.sh
sudo ./ubuntu-deps.sh
```
#### Step 3: Clone Repository
```bash
# Clone OpenClaw repository
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw
# Verify repository structure
ls -la
```
#### Step 4: Configure Environment
```bash
# Copy environment template
cp .env.vm.example .env
# Edit with your values
nano .env
```
#### Step 5: Run Post-Installation
```bash
# Run post-installation script
sudo ./scripts/install/post-install.sh
# Verify installation
./scripts/health-check.sh
```
---
## RHEL/CentOS VM Deployment
### Prerequisites
- RHEL 9 or Rocky Linux 9 VM instance
- SSH access with sudo privileges
- Outbound internet access
- Minimum 4 vCPU, 8 GB RAM
### Quick Start Script
```bash
# Download and run the VM installer
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/vm-install.sh -o vm-install.sh
chmod +x vm-install.sh
sudo ./vm-install.sh --os rhel --gpu none
```
### Manual Installation
#### Step 1: System Update
```bash
# Update system packages
sudo dnf update -y
# Install essential tools
sudo dnf install -y \
curl \
git \
wget \
gnupg2 \
ca-certificates \
epel-release
```
#### Step 2: Install Dependencies
```bash
# Run RHEL dependencies script
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/rhel-deps.sh -o rhel-deps.sh
chmod +x rhel-deps.sh
sudo ./rhel-deps.sh
```
#### Step 3: Clone Repository
```bash
# Clone OpenClaw repository
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw
# Verify repository structure
ls -la
```
#### Step 4: Configure Environment
```bash
# Copy environment template
cp .env.vm.example .env
# Edit with your values
nano .env
```
#### Step 5: Run Post-Installation
```bash
# Run post-installation script
sudo ./scripts/install/post-install.sh
# Verify installation
./scripts/health-check.sh
```
---
## Cloud VM Considerations
### AWS EC2
#### Instance Types
| Use Case | Instance Type | vCPU | RAM | Notes |
|----------|---------------|------|-----|-------|
| Development | t3.medium | 2 | 4 GB | Burstable |
| Production Small | m5.large | 2 | 8 GB | General purpose |
| Production Medium | m5.xlarge | 4 | 16 GB | General purpose |
| Production Large | m5.2xlarge | 8 | 32 GB | General purpose |
| GPU Workload | g5.xlarge | 4 | 16 GB | NVIDIA A10G |
#### Security Group Rules
```
# Required inbound rules
Type: SSH, Port: 22, Source: Your IP
Type: Custom TCP, Port: 4000, Source: Your IP (LiteLLM)
Type: Custom TCP, Port: 18789, Source: Your IP (OpenClaw)
Type: Custom TCP, Port: 3000, Source: Your IP (Dashboard - optional)
```
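The same rules can be created with the AWS CLI. This is a sketch under assumptions: the helper name `authorize_openclaw_ports` is illustrative, the security group already exists, and the caller supplies its ID and a source CIDR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: open the ports listed above to a single source range.
# Arguments: security group ID, source CIDR (for example 203.0.113.10/32).
authorize_openclaw_ports() {
  local group_id="$1" cidr="$2" port
  # 22 = SSH, 4000 = LiteLLM, 18789 = OpenClaw, 3000 = dashboard (optional)
  for port in 22 4000 18789 3000; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}
```

The function stops at the first failed call, which includes the case where a rule already exists, so it is best suited to a freshly created security group. Remove 3000 from the list if the dashboard is not deployed.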
#### IAM Role (Optional)
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::your-backup-bucket/*"
    }
  ]
}
```
#### User Data Script
```bash
#!/bin/bash
# EC2 User Data for automatic installation
# Assumes a RHEL-family AMI; user data runs as root, so sudo is not needed
set -euo pipefail
yum update -y
yum install -y git curl wget
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw
./scripts/install/rhel-deps.sh
./scripts/install/post-install.sh
```
### GCP Compute Engine
#### Machine Types
| Use Case | Machine Type | vCPU | RAM | Notes |
|----------|--------------|------|-----|-------|
| Development | e2-medium | 2 | 4 GB | Balanced |
| Production Small | n2-standard-2 | 2 | 8 GB | General purpose |
| Production Medium | n2-standard-4 | 4 | 16 GB | General purpose |
| Production Large | n2-standard-8 | 8 | 32 GB | General purpose |
| GPU Workload | g2-standard-4 | 4 | 16 GB | NVIDIA L4 (24 GB GPU memory) |
#### Firewall Rules
```bash
# Create firewall rule
gcloud compute firewall-rules create openclaw-allow \
--allow tcp:22,tcp:4000,tcp:18789,tcp:3000 \
--source-ranges YOUR_IP/32 \
--target-tags openclaw-instance
```
#### Service Account
```bash
# Create service account
gcloud iam service-accounts create openclaw-sa \
--display-name "OpenClaw Service Account"
# Grant storage access
gcloud projects add-iam-policy-binding PROJECT_ID \
--member "serviceAccount:openclaw-sa@PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/storage.objectAdmin"
```
### Azure VM
#### VM Sizes
| Use Case | VM Size | vCPU | RAM | Notes |
|----------|---------|------|-----|-------|
| Development | Standard_B2s | 2 | 4 GB | Burstable |
| Production Small | Standard_D2s_v3 | 2 | 8 GB | General purpose |
| Production Medium | Standard_D4s_v3 | 4 | 16 GB | General purpose |
| Production Large | Standard_D8s_v3 | 8 | 32 GB | General purpose |
| GPU Workload | Standard_NC4as_T4_v3 | 4 | 28 GB | NVIDIA T4 |
#### Network Security Group
```bash
# Create NSG rule
az network nsg rule create \
--resource-group openclaw-rg \
--nsg-name openclaw-nsg \
--name AllowOpenClaw \
--priority 1000 \
--source-address-prefixes YOUR_IP/32 \
--destination-port-ranges 22 4000 18789 3000 \
--access Allow \
--protocol Tcp
```
#### Managed Identity
```bash
# Create managed identity
az identity create \
--resource-group openclaw-rg \
--name openclaw-identity
# Grant storage access
az role assignment create \
--assignee OBJECT_ID \
--role "Storage Blob Data Contributor" \
--scope /subscriptions/SUBSCRIPTION_ID/resourceGroups/openclaw-rg
```
---
## Network Configuration
### Static IP Configuration
#### Ubuntu/Debian (netplan)
```yaml
# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      addresses:
        - 192.168.1.100/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 1.1.1.1
          - 8.8.8.8
```
#### RHEL/CentOS (NetworkManager)
```bash
# Configure static IP
nmcli connection modify eth0 \
ipv4.addresses 192.168.1.100/24 \
ipv4.gateway 192.168.1.1 \
ipv4.dns "1.1.1.1 8.8.8.8" \
ipv4.method manual
nmcli connection up eth0
```
### DNS Configuration
```bash
# Configure DNS resolver
sudo nano /etc/systemd/resolved.conf
```
```ini
[Resolve]
DNS=1.1.1.1 8.8.8.8
FallbackDNS=9.9.9.9
DNSSEC=allow-downgrade
```
```bash
# Restart systemd-resolved
sudo systemctl restart systemd-resolved
```
### Hostname Configuration
```bash
# Set hostname
sudo hostnamectl set-hostname openclaw-server
# Update /etc/hosts
sudo nano /etc/hosts
```
```
127.0.0.1 localhost localhost.localdomain
192.168.1.100 openclaw-server openclaw
```
---
## Security Hardening
### SSH Hardening
```bash
# Edit SSH configuration
sudo nano /etc/ssh/sshd_config
```
```ini
# SSH Hardening
# Change from the default port (update firewall rules and the Fail2Ban jail to match)
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AuthenticationMethods publickey
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowTcpForwarding no
```
```bash
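# Validate the configuration, then restart SSH
# (keep the current session open until a login on the new port succeeds)
sudo sshd -t && sudo systemctl restart sshd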
```
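After the restart, the effective settings can be compared with the intended ones. `sshd -T` prints the running configuration as lower-case keyword and value pairs, which the sketch below checks; the helper name `check_sshd_settings` and the choice of four settings are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify hardening settings in `sshd -T` output (stdin).
# Prints one line per problem and returns non-zero if any is found.
check_sshd_settings() {
  awk '
    BEGIN {
      want["permitrootlogin"] = "no"
      want["passwordauthentication"] = "no"
      want["pubkeyauthentication"] = "yes"
      want["x11forwarding"] = "no"
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) { printf "MISMATCH %s=%s\n", $1, $2; bad = 1 }
    }
    END {
      for (k in want) if (!(k in seen)) { printf "MISSING %s\n", k; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}
```

Run it as `sudo sshd -T | check_sshd_settings`. No output and a zero exit status mean the four settings match the hardened values.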
### Fail2Ban Configuration
```bash
# Install Fail2Ban
sudo apt-get install -y fail2ban # Ubuntu
sudo dnf install -y fail2ban # RHEL
# Configure Fail2Ban
sudo nano /etc/fail2ban/jail.local
```
```ini
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
[sshd]
enabled = true
port = 2222
filter = sshd
# Use /var/log/secure on RHEL-family systems
logpath = /var/log/auth.log
maxretry = 3
[openclaw]
enabled = true
port = 4000,18789
filter = openclaw
logpath = /var/log/openclaw/*.log
maxretry = 10
```
```bash
# Create OpenClaw filter
sudo nano /etc/fail2ban/filter.d/openclaw.conf
```
```ini
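[Definition]
# Fail2Ban requires <HOST> in every failregex so it knows which address to ban.
# The "from <HOST>" position is an assumption; match it to the actual log format.
failregex = ^.*Failed authentication.*from <HOST>.*$
            ^.*Invalid API key.*from <HOST>.*$
            ^.*Rate limit exceeded.*from <HOST>.*$
ignoreregex =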
```
```bash
# Start Fail2Ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
```
### SELinux Configuration (RHEL)
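```bash
# Check SELinux status
getenforce
# Set to permissive for testing (denials are logged, not enforced)
sudo setenforce 0
# Install the policy build tools and create the module source in a working directory
sudo dnf install -y selinux-policy-devel
mkdir -p ~/selinux-openclaw && cd ~/selinux-openclaw
nano openclaw.te
```
```
module openclaw 1.0;
require {
    type unconfined_service_t;
    type http_port_t;
    type postgresql_port_t;
    class tcp_socket name_connect;
}
# Allow the OpenClaw service domain to connect to HTTP and PostgreSQL ports.
# unconfined_service_t is an assumed source domain; use the scontext shown in
# your AVC denials (sudo ausearch -m avc -ts recent).
allow unconfined_service_t http_port_t:tcp_socket name_connect;
allow unconfined_service_t postgresql_port_t:tcp_socket name_connect;
```
```bash
# Compile and install the policy (the .te file name must match the module name)
make -f /usr/share/selinux/devel/Makefile openclaw.pp
sudo semodule -i openclaw.pp
# Re-enable SELinux
sudo setenforce 1
```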
### Audit Logging
```bash
# Install auditd
sudo apt-get install -y auditd # Ubuntu
sudo dnf install -y audit # RHEL
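# Configure audit rules (runtime only; to persist them, add the same -w lines
# without the auditctl prefix to /etc/audit/rules.d/openclaw.rules)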
sudo auditctl -w /etc/openclaw -p wa -k openclaw-config
sudo auditctl -w /root/.openclaw -p wa -k openclaw-data
sudo auditctl -w /etc/litellm -p wa -k litellm-config
```
---
## Resource Optimization
### Memory Optimization
```bash
# Configure swap (if needed)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make swap permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Verify swap
free -h
```
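Running the `tee -a` line more than once appends duplicate entries to `/etc/fstab`. A minimal idempotent variant is sketched below; the helper name `ensure_fstab_swap` is illustrative, and the file path is a parameter so the function can be tried on a copy first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add the swap entry to fstab only if it is not there yet.
# Argument: path to the fstab file (defaults to /etc/fstab; needs root to write).
ensure_fstab_swap() {
  local fstab="${1:-/etc/fstab}"
  local entry='/swapfile none swap sw 0 0'
  # -x matches the whole line, -F treats the entry as a fixed string
  grep -qxF "$entry" "$fstab" || echo "$entry" >> "$fstab"
}
```

Calling it repeatedly leaves exactly one swap line in the file.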
### CPU Optimization
```bash
# Set CPU governor to performance (many VMs do not expose frequency scaling)
sudo apt-get install -y linux-tools-common linux-tools-generic # Ubuntu
sudo dnf install -y kernel-tools # RHEL
sudo cpupower frequency-set -g performance
# Verify CPU governor
cpupower frequency-info
```
### Disk I/O Optimization
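```bash
# Check current I/O scheduler (replace sda with your disk, for example vda or nvme0n1)
cat /sys/block/sda/queue/scheduler
# Set to mq-deadline, the multi-queue successor to deadline on current kernels
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
# Make permanent with a udev rule
# (the elevator= kernel parameter has no effect since Linux 5.4)
sudo nano /etc/udev/rules.d/60-ioscheduler.rules
```
```
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```
```bash
# Reload udev rules and apply them to existing devices
sudo udevadm control --reload-rules
sudo udevadm trigger --type=devices --action=change
```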
---
## Backup and Recovery
### Automated Backup Script
```bash
#!/bin/bash
# /usr/local/bin/openclaw-backup.sh
# Runs as root from openclaw-backup.service, so ~ resolves to /root.
# REDIS_PASSWORD must be set in the environment, and pg_dump needs a
# ~/.pgpass entry (or PGPASSWORD) for the openclaw database user.
set -euo pipefail
BACKUP_DIR="/backup/openclaw"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Backup OpenClaw configuration
tar -czf "$BACKUP_DIR/openclaw-config-$DATE.tar.gz" \
  ~/.openclaw/ \
  /etc/litellm/ \
  /etc/openclaw/
# Backup PostgreSQL (compressed as it is written)
pg_dump -U openclaw openclaw | gzip > "$BACKUP_DIR/openclaw-db-$DATE.sql.gz"
# Backup Redis: --rdb streams a fresh snapshot to the target file, which avoids
# copying dump.rdb while an asynchronous BGSAVE is still running
REDISCLI_AUTH="$REDIS_PASSWORD" redis-cli --rdb "$BACKUP_DIR/redis-dump-$DATE.rdb"
# Remove old backups
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_DIR" -name "*.rdb" -mtime +"$RETENTION_DAYS" -delete
# Log backup
echo "Backup completed: $DATE" >> /var/log/openclaw-backup.log
```
### Systemd Backup Timer
```ini
# /etc/systemd/system/openclaw-backup.timer
[Unit]
Description=Daily OpenClaw Backup
Documentation=file:///root/heretek/heretek-openclaw/docs/operations/AUTOMATED_BACKUP.md
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
```
```ini
# /etc/systemd/system/openclaw-backup.service
[Unit]
Description=OpenClaw Backup Service
After=postgresql.service redis.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/openclaw-backup.sh
User=root
Group=root
```
```bash
# Enable backup timer
sudo systemctl daemon-reload
sudo systemctl enable openclaw-backup.timer
sudo systemctl start openclaw-backup.timer
```
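A backup is only useful if the archives can be read back. The sketch below checks every gzip-compressed file in the backup directory with `gzip -t`; the helper name `verify_backups` is illustrative, and the check covers archive integrity only, not a full restore.

```bash
#!/usr/bin/env bash
# Hypothetical helper: test the integrity of every .gz file in a directory.
# Argument: backup directory. Returns non-zero if any archive is corrupt.
verify_backups() {
  local dir="$1" bad=0 file
  for file in "$dir"/*.gz; do
    [ -e "$file" ] || continue   # no matches: the glob stays literal
    if gzip -t "$file" 2>/dev/null; then
      echo "ok      $file"
    else
      echo "CORRUPT $file"
      bad=1
    fi
  done
  return "$bad"
}
```

Running `verify_backups /backup/openclaw` after the first timer run confirms that the configuration and database archives are at least readable. A periodic test restore into a scratch database remains the stronger check.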
---
## Troubleshooting
### Common VM Issues
| Issue | Solution |
|-------|----------|
| VM won't boot after installation | Check cloud-init logs: `/var/log/cloud-init.log` |
| Network connectivity issues | Verify security group/firewall rules |
| Performance degradation | Check resource allocation, enable swap |
| SSH connection refused | Verify SSH port and security group |
| Disk space warnings | Extend volume or clean up old backups |
### Cloud-Specific Commands
#### AWS EC2
```bash
# Check instance status
aws ec2 describe-instance-status --instance-ids i-1234567890abcdef0
# Get system log
aws ec2 get-console-output --instance-id i-1234567890abcdef0
# Reboot instance
aws ec2 reboot-instances --instance-ids i-1234567890abcdef0
```
#### GCP Compute
```bash
# Check instance status
gcloud compute instances describe INSTANCE_NAME
# Get serial port output
gcloud compute instances get-serial-port-output INSTANCE_NAME
# Reset instance
gcloud compute instances reset INSTANCE_NAME
```
#### Azure VM
```bash
# Check VM status
az vm show -d -g openclaw-rg -n openclaw-vm
# Get boot diagnostics
az vm boot-diagnostics get-boot-log -g openclaw-rg -n openclaw-vm
# Restart VM
az vm restart -g openclaw-rg -n openclaw-vm
```
---
## Next Steps
After successful VM deployment:
1. **Configure Monitoring** - Set up cloud monitoring and alerts
2. **Enable Auto-Scaling** (if applicable) - Configure scaling policies
3. **Set Up Backup** - Configure automated backups to cloud storage
4. **Configure DNS** - Set up domain name and SSL certificates
5. **Test Failover** - Verify backup and recovery procedures
---
## Support
For issues or questions:
- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
- Review [`BARE_METAL_DEPLOYMENT.md`](./BARE_METAL_DEPLOYMENT.md)
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues
---
🦞 *The thought that never ends.*