mirror of
https://github.com/Heretek-AI/heretek-openclaw-deploy.git
synced 2026-07-01 18:25:50 -04:00
P6: Complete 8 initiatives - Agent files, deployment options, CLI, dashboards, plugins
P6-7: Agent File Completion (34 files - 11 agents × 3 files + guides)
- Added BOOTSTRAP.md, IDENTITY.md, TOOLS.md for all 11 agents
- Created AGENT_CREATION_GUIDE.md

P6-2: Per-Agent Model Configuration (9 files)
- Agent model router and config library
- YAML configs for arbiter, coder agents
- Configuration documentation

P6-3: Health Check Dashboard (20+ files)
- Complete frontend React application
- API endpoints, WebSocket server
- Collectors for agents, resources, services
- Alert management and configuration

P6-4: LiteLLM Observability Integration (10 files)
- LiteLLM metrics collector and API
- Frontend components for model/budget tracking
- Integration documentation

P6-1: Non-Docker Deployment (16 files)
- Bare metal and VM deployment docs
- Systemd service files
- Installation scripts for Ubuntu/RHEL
- Migration guide and troubleshooting

P6-6: Cloud-Native Deployments (45+ files)
- AWS, Azure, GCP Terraform configurations
- Kubernetes base deployments with Kustomize overlays
- Cloud deployment documentation

P6-5: Unified Deployment CLI (28 files)
- Complete CLI with 12 commands
- Deployers for Docker, Kubernetes, cloud, baremetal
- Health checker, backup manager, config manager

P6-8: Plugin Installation Guide (15 files)
- Plugin development and installation guides
- Plugin CLI documentation and registry
- Templates for basic, skill, and tool plugins
@@ -0,0 +1,669 @@
# AWS Deployment Guide for Heretek OpenClaw

**Version:** 1.0.0

**Last Updated:** 2026-03-31

**OpenClaw Version:** v2026.3.28

This guide provides comprehensive instructions for deploying Heretek OpenClaw on Amazon Web Services (AWS) using Terraform Infrastructure as Code (IaC).


---

## Table of Contents

1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Cost Estimates](#cost-estimates)
5. [Quick Start](#quick-start)
6. [Configuration](#configuration)
7. [Deployment Steps](#deployment-steps)
8. [Post-Deployment](#post-deployment)
9. [GPU Support](#gpu-support)
10. [Monitoring](#monitoring)
11. [Backup & Recovery](#backup--recovery)
12. [Troubleshooting](#troubleshooting)
13. [Cleanup](#cleanup)
14. [Next Steps](#next-steps)


---

## Overview

This Terraform configuration deploys a production-ready OpenClaw environment on AWS with:

- **EKS (Elastic Kubernetes Service)** - Managed Kubernetes cluster
- **RDS PostgreSQL** - Managed PostgreSQL with pgvector support
- **ElastiCache Redis** - Managed Redis for caching and sessions
- **ECR (Elastic Container Registry)** - Private container registry
- **ALB (Application Load Balancer)** - Traffic routing and TLS termination
- **CloudWatch** - Monitoring, logging, and alerting

### Components

| Component | Service | Purpose |
|-----------|---------|---------|
| Gateway | EKS | OpenClaw Gateway (port 18789) |
| LiteLLM | EKS | LLM proxy and routing (port 4000) |
| Database | RDS PostgreSQL 15 | Primary data store with pgvector |
| Cache | ElastiCache Redis 7 | Session management, caching |
| Container Registry | ECR | Private image storage |
| Load Balancer | ALB | HTTPS termination, routing |
| Monitoring | CloudWatch | Metrics, logs, alarms |


---

## Prerequisites

### Required Tools

```bash
# Install Terraform (macOS, from the official HashiCorp tap)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# or download from https://www.terraform.io/downloads

# Install AWS CLI
brew install awscli  # macOS
# or follow https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

# Install kubectl
brew install kubectl

# Install Helm
brew install helm
```

### AWS Account Setup

1. **AWS Account** - Active AWS account with administrative access
2. **IAM User** - User with programmatic access credentials
3. **Budget Alert** - Billing alerts configured in AWS Budgets

### Configure AWS Credentials

```bash
# Configure AWS CLI
aws configure

# Or use environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
```

### Required AWS Permissions

| Service | Required Permissions |
|---------|---------------------|
| EKS | Full access |
| EC2 | Full access |
| RDS | Full access |
| ElastiCache | Full access |
| ECR | Full access |
| ELB | Full access |
| IAM | Create roles and policies |
| CloudWatch | Full access |
| S3 | Create buckets |
| KMS | Create and manage keys |
| Route53 | DNS management (optional) |


---

## Architecture

```
                     ┌─────────────────────────────┐
                     │         AWS Region          │
                     │          us-east-1          │
                     └──────────────┬──────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           ▼                        ▼                        ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│   Public Subnet 1   │  │   Public Subnet 2   │  │   Public Subnet 3   │
│    (us-east-1a)     │  │    (us-east-1b)     │  │    (us-east-1c)     │
│                     │  │                     │  │                     │
│   ┌─────────────┐   │  │   ┌─────────────┐   │  │   ┌─────────────┐   │
│   │ NAT Gateway │   │  │   │ NAT Gateway │   │  │   │ NAT Gateway │   │
│   └─────────────┘   │  │   └─────────────┘   │  │   └─────────────┘   │
└──────────┬──────────┘  └──────────┬──────────┘  └──────────┬──────────┘
           └────────────────────────┼────────────────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           ▼                        ▼                        ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│  Private Subnet 1   │  │  Private Subnet 2   │  │  Private Subnet 3   │
│    (us-east-1a)     │  │    (us-east-1b)     │  │    (us-east-1c)     │
│                     │  │                     │  │                     │
│   ┌─────────────┐   │  │   ┌─────────────┐   │  │   ┌─────────────┐   │
│   │  EKS Nodes  │   │  │   │  EKS Nodes  │   │  │   │  EKS Nodes  │   │
│   │  (General)  │   │  │   │  (Compute)  │   │  │   │    (GPU)    │   │
│   └─────────────┘   │  │   └─────────────┘   │  │   └─────────────┘   │
└──────────┬──────────┘  └──────────┬──────────┘  └──────────┬──────────┘
           └────────────────────────┼────────────────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           ▼                        ▼                        ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│  Database Subnet 1  │  │  Database Subnet 2  │  │  Database Subnet 3  │
│    (us-east-1a)     │  │    (us-east-1b)     │  │    (us-east-1c)     │
│                     │  │                     │  │                     │
│   ┌─────────────┐   │  │   ┌─────────────┐   │  │                     │
│   │ RDS Primary │   │  │   │ RDS Standby │   │  │                     │
│   │ PostgreSQL  │   │  │   │ (Multi-AZ)  │   │  │                     │
│   └─────────────┘   │  │   └─────────────┘   │  │                     │
│                     │  │                     │  │                     │
│   ┌─────────────┐   │  │   ┌─────────────┐   │  │                     │
│   │ ElastiCache │   │  │   │ ElastiCache │   │  │                     │
│   │    Redis    │   │  │   │    Redis    │   │  │                     │
│   │  (Primary)  │   │  │   │  (Replica)  │   │  │                     │
│   └─────────────┘   │  │   └─────────────┘   │  │                     │
└─────────────────────┘  └─────────────────────┘  └─────────────────────┘
```

The diagram shows the production layout. Multi-AZ standbys and replicas run in a different Availability Zone from their primaries. Development uses a single NAT gateway and has no RDS standby or Redis replica. ECR is a regional service and is not deployed into a subnet. The EKS nodes pull images from it through the NAT gateways, or through VPC endpoints if those are configured.


---

## Cost Estimates

### Development Environment

| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| EKS Cluster | Control Plane | $73.00 |
| EKS Nodes | 2x m6i.xlarge | $280.00 |
| RDS PostgreSQL | db.m6i.large, 50GB | $125.00 |
| ElastiCache Redis | cache.m6g.large | $75.00 |
| ALB | Standard | $16.00 |
| NAT Gateway | 1x | $32.00 |
| ECR Storage | 10GB | $2.50 |
| CloudWatch Logs | 10GB | $3.00 |
| Data Transfer | Estimated | $50.00 |
| **Total** | | **~$656.50/month** |

### Production Environment

| Resource | Configuration | Monthly Cost (USD) |
|----------|--------------|-------------------|
| EKS Cluster | Control Plane | $73.00 |
| EKS Nodes General | 3x m6i.2xlarge | $840.00 |
| EKS Nodes Compute | 4x c6i.4xlarge | $2,000.00 |
| EKS Nodes GPU | 2x g5.2xlarge | $4,000.00 |
| RDS PostgreSQL | db.m6i.xlarge, Multi-AZ, 200GB | $500.00 |
| ElastiCache Redis | cache.m6g.xlarge, Multi-AZ | $300.00 |
| ALB | Standard | $16.00 |
| NAT Gateway | 3x | $96.00 |
| ECR Storage | 50GB | $12.50 |
| CloudWatch Logs | 50GB | $15.00 |
| Data Transfer | Estimated | $200.00 |
| **Total** | | **~$8,052.50/month** |

> **Note:** GPU nodes make up about half of the production estimate ($4,000 of ~$8,052.50). To reduce this cost, consider Spot Instances for GPU workloads that tolerate interruption. You can also lower the GPU node group's `min_size` so the group can scale in when it is idle.

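The totals above are plain sums of the line items. Each hourly AWS rate converts to a monthly figure with the 730-hours-per-month convention used by the AWS Pricing Calculator. For example, the EKS control plane's $0.10 per hour gives the $73.00 row. The sketch below re-checks the arithmetic after you change a line item. Its `item|cost` input format is an assumption of this sketch, not part of the deployment tooling.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert an hourly rate (USD) to a monthly estimate using 730 hours/month.
monthly_from_hourly() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'
}

# Sum "item|monthly_cost" lines from stdin and print the total.
sum_monthly_costs() {
  awk -F'|' 'NF == 2 { total += $2 } END { printf "%.2f\n", total }'
}

# Development line items copied from the table above.
dev_costs() {
  cat <<'EOF'
EKS control plane|73.00
EKS nodes (2x m6i.xlarge)|280.00
RDS PostgreSQL|125.00
ElastiCache Redis|75.00
ALB|16.00
NAT Gateway (1x)|32.00
ECR storage (10GB)|2.50
CloudWatch Logs (10GB)|3.00
Data transfer|50.00
EOF
}

monthly_from_hourly 0.10        # prints 73.00 (EKS control plane)
dev_costs | sum_monthly_costs   # prints 656.50 (development total)
```

To price a different configuration, replace the heredoc lines with your own items. Current hourly rates are on the AWS pricing pages.
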
### Cost Optimization Tips

1. **Use Spot Instances** for non-critical workloads (up to 70% savings)
2. **Enable Cluster Autoscaler** to scale nodes based on demand
3. **Use Savings Plans** for predictable workloads
4. **Right-size instances** based on actual usage
5. **Purchase RDS Reserved Instances** for production databases


---

## Quick Start

### Clone Repository

```bash
git clone https://github.com/Heretek-AI/heretek-openclaw.git
cd heretek-openclaw/deploy/aws/terraform
```

### Initialize Terraform

```bash
terraform init
```

### Create Terraform Variables File

This file contains the database password and the Redis auth token. Do not commit it to version control.

```bash
cat > terraform.tfvars <<EOF
aws_region  = "us-east-1"
environment = "dev"
owner       = "your-team"
vpc_cidr    = "10.0.0.0/16"

db_password      = "generate-secure-password"
redis_auth_token = "generate-secure-token"

# Optional: GPU support for Ollama
enable_gpu_support = false

# Optional: custom domain and TLS certificate
domain_name         = "openclaw.example.com"
acm_certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/xxx"
EOF
```

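One way to fill in `db_password` and `redis_auth_token` is to generate random hex strings locally. This sketch needs only coreutils (`od`, `tr`) and `/dev/urandom`. Hex output avoids the characters that RDS master passwords reject (`/`, `"`, `@`, space). It also avoids characters that ElastiCache AUTH tokens restrict, and 64 characters falls inside the 16 to 128 character range AUTH tokens allow. The output file name `secrets.auto.tfvars` is an assumption of this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print N random bytes (default 32) as 2*N lowercase hex characters.
# od reads N bytes from /dev/urandom; tr strips the spaces and newlines
# that od inserts between bytes.
random_hex() {
  local bytes="${1:-32}"
  od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
}

# 32 bytes -> 64 hex characters for each secret.
DB_PASSWORD="$(random_hex 32)"
REDIS_AUTH_TOKEN="$(random_hex 32)"

# Write only the secret variables to a separate file that only the
# owner can read. Terraform loads *.auto.tfvars files automatically.
cat > secrets.auto.tfvars <<EOF
db_password      = "${DB_PASSWORD}"
redis_auth_token = "${REDIS_AUTH_TOKEN}"
EOF
chmod 600 secrets.auto.tfvars
```

Terraform loads `*.auto.tfvars` files after `terraform.tfvars`, so the generated values override the placeholders. Deleting the placeholder lines from `terraform.tfvars` avoids confusion. Keep both files out of version control. These values are also stored in the Terraform state.
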
### Plan and Apply

```bash
# Review the plan
terraform plan -out=tfplan

# Apply the configuration
terraform apply tfplan
```

### Configure kubectl

```bash
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
```

### Deploy OpenClaw to EKS

```bash
cd ../../kubernetes
kubectl apply -k overlays/dev
```


---

## Configuration

### Input Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `aws_region` | AWS region | `us-east-1` | No |
| `environment` | Environment name | `dev` | Yes |
| `owner` | Resource owner | `platform-team` | No |
| `vpc_cidr` | VPC CIDR block | `10.0.0.0/16` | No |
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
| `db_password` | RDS master password | `null` | Yes |
| `redis_auth_token` | Redis auth token | `null` | Yes |
| `acm_certificate_arn` | SSL certificate ARN | `null` | No |
| `domain_name` | Custom domain | `null` | No |

### Environment-Specific Overrides

#### Development (`terraform.dev.tfvars`)

```hcl
environment              = "dev"
single_nat_gateway       = true
db_multi_az              = false
redis_multi_az_enabled   = false
enable_cloudwatch_alarms = false

node_groups = {
  general = {
    instance_types = ["m6i.large"]
    min_size       = 1
    max_size       = 2
    desired_size   = 1
  }
  compute = {
    instance_types = ["c6i.xlarge"]
    min_size       = 0
    max_size       = 2
    desired_size   = 1
  }
}
```

#### Production (`terraform.prod.tfvars`)

```hcl
environment              = "prod"
single_nat_gateway       = false
db_multi_az              = true
redis_multi_az_enabled   = true
enable_cloudwatch_alarms = true
alb_deletion_protection  = true

node_groups = {
  general = {
    instance_types = ["m6i.2xlarge"]
    min_size       = 3
    max_size       = 10
    desired_size   = 3
  }
  compute = {
    instance_types = ["c6i.4xlarge"]
    min_size       = 2
    max_size       = 20
    desired_size   = 4
  }
  gpu = {
    instance_types = ["g5.2xlarge"]
    min_size       = 1
    max_size       = 4
    desired_size   = 2
  }
}
```


---

## Deployment Steps

### Step 1: Prepare AWS Account

```bash
# Verify AWS credentials
aws sts get-caller-identity

# Check service quotas
aws service-quotas list-service-quotas --service-code eks
aws service-quotas list-service-quotas --service-code rds
aws service-quotas list-service-quotas --service-code elasticache

# EC2 vCPU quotas limit how many nodes (including GPU nodes) can run
aws service-quotas list-service-quotas --service-code ec2
```

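EC2 On-Demand quotas count vCPUs per instance-family group, not instances. It therefore helps to convert the node group maximums into vCPUs before comparing them with the `ec2` quotas. This sketch uses the production `max_size` values from `terraform.prod.tfvars` and the published vCPU counts for each instance type. The quota names ("Standard" for the m and c families, "G and VT" for g5) follow AWS's grouping. The lookup table is illustrative and only covers the types used in this guide.

```bash
#!/usr/bin/env bash
set -euo pipefail

# vCPUs per instance type (published EC2 specs; extend as needed).
vcpus_for() {
  case "$1" in
    m6i.large) echo 2 ;;
    m6i.xlarge | c6i.xlarge | g5.xlarge) echo 4 ;;
    m6i.2xlarge | g5.2xlarge) echo 8 ;;
    c6i.4xlarge) echo 16 ;;
    *) echo "unknown instance type: $1" >&2; return 1 ;;
  esac
}

# Map an instance type to the On-Demand quota group it counts against.
quota_bucket() {
  case "$1" in
    g*) echo "G and VT" ;;
    *) echo "Standard" ;;
  esac
}

# Sum max_size * vCPUs per quota group.
# Input lines: "<instance_type> <max_size>".
required_vcpus() {
  local type max v bucket
  local std=0 gvt=0
  while read -r type max; do
    [ -z "$type" ] && continue
    v="$(vcpus_for "$type")"
    bucket="$(quota_bucket "$type")"
    if [ "$bucket" = "G and VT" ]; then
      gvt=$((gvt + v * max))
    else
      std=$((std + v * max))
    fi
  done
  echo "Standard: ${std} vCPUs"
  echo "G and VT: ${gvt} vCPUs"
}

# Production node groups at max_size (from terraform.prod.tfvars).
# prints: "Standard: 400 vCPUs" then "G and VT: 32 vCPUs"
required_vcpus <<'EOF'
m6i.2xlarge 10
c6i.4xlarge 20
g5.2xlarge 4
EOF
```

Compare these totals with the "Running On-Demand Standard" and "Running On-Demand G and VT" values returned by the `ec2` quota query above. A new account may start with a G and VT quota below what the GPU node group needs. In that case, request an increase before you run `terraform apply`.
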
### Step 2: Configure Terraform Backend

```bash
# Create S3 bucket for state
aws s3api create-bucket --bucket openclaw-terraform-state --region us-east-1

# Enable versioning so earlier state files can be recovered
aws s3api put-bucket-versioning \
  --bucket openclaw-terraform-state \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for locking
aws dynamodb create-table \
  --table-name openclaw-terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

S3 bucket names are globally unique. If `openclaw-terraform-state` is already taken, choose another name and use it consistently in the commands below.

### Step 3: Initialize and Apply

```bash
# Initialize with S3 backend
terraform init \
  -backend-config="bucket=openclaw-terraform-state" \
  -backend-config="key=openclaw/dev/terraform.tfstate" \
  -backend-config="region=us-east-1" \
  -backend-config="dynamodb_table=openclaw-terraform-locks"

# Plan
terraform plan -var-file=terraform.dev.tfvars -out=tfplan

# Apply
terraform apply tfplan
```

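The `-backend-config` flags above provide a partial backend configuration. Terraform accepts these flags only when the root module declares a `backend "s3"` block, which can otherwise be empty. If the files in `deploy/aws/terraform` do not already contain such a block, the minimal sketch below adds one. The helper name and the file name `backend.tf` are assumptions of this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal backend.tf declaring an empty S3 backend block.
# The bucket, key, region, and lock table come from -backend-config at
# `terraform init` time, so they are left out of the file on purpose.
write_backend_tf() {
  local dir="${1:-.}"
  # Do nothing if any .tf file in the directory already declares the backend.
  if grep -qs 'backend "s3"' "$dir"/*.tf; then
    echo "an S3 backend block already exists in $dir; leaving it unchanged" >&2
    return 0
  fi
  cat > "$dir/backend.tf" <<'EOF'
terraform {
  backend "s3" {}
}
EOF
}
```

For example, run `write_backend_tf .` from `deploy/aws/terraform` before the `terraform init` command in Step 3. If you already ran `terraform init` with the local backend during the Quick Start, Terraform will offer to migrate the existing state.
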
### Step 4: Verify Deployment

```bash
# Check EKS cluster
aws eks describe-cluster --name openclaw-dev-eks

# Check RDS instance
aws rds describe-db-instances --db-instance-identifier openclaw-dev-pg

# Check ElastiCache cluster
aws elasticache describe-cache-clusters --cache-cluster-id openclaw-dev-redis

# Check ECR repositories
aws ecr describe-repositories
```


---

## Post-Deployment

### Configure kubectl

```bash
# Update kubeconfig
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1

# Verify cluster access
kubectl get nodes
kubectl get namespaces
```

### Deploy OpenClaw Helm Chart

```bash
# Optional: add the Helm repository (only needed if the chart is published;
# the install below uses the local chart in ./charts/openclaw)
helm repo add heretek https://heretek.github.io/helm-charts
helm repo update

# Deploy using Helm
helm install openclaw ./charts/openclaw \
  --namespace openclaw \
  --create-namespace \
  --values values.dev.yaml \
  --set image.repository=123456789012.dkr.ecr.us-east-1.amazonaws.com/openclaw-gateway \
  --set litellm.image.repository=123456789012.dkr.ecr.us-east-1.amazonaws.com/litellm-proxy
```

Replace `123456789012` with your AWS account ID.

### Configure Secrets

```bash
# Create Kubernetes secrets.
# ElastiCache only accepts an AUTH token when in-transit encryption (TLS)
# is enabled, so the Redis URL uses the TLS scheme rediss://.
kubectl create secret generic openclaw-secrets \
  --namespace openclaw \
  --from-literal=database-url="postgresql://openclaw:password@openclaw-dev-pg.xxx.us-east-1.rds.amazonaws.com:5432/openclaw" \
  --from-literal=redis-url="rediss://:token@openclaw-dev-redis.xxx.cache.amazonaws.com:6379" \
  --from-literal=minimax-api-key="your-minimax-key" \
  --from-literal=zai-api-key="your-zai-key"
```

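If the database password or Redis token contains characters such as `@`, `:`, `/`, or `%`, percent-encode it before embedding it in `database-url` or `redis-url`. Otherwise the URL is split in the wrong place when it is parsed. A minimal pure-Bash sketch follows. The helper names `urlencode` and `pg_url` are hypothetical and are not part of OpenClaw, and the encoder is intended for ASCII input.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Percent-encode every character outside the RFC 3986 "unreserved" set
# (A-Z a-z 0-9 - . _ ~). LC_ALL=C makes string indexing byte-based.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build the PostgreSQL URL stored in database-url.
pg_url() {
  local user="$1" pass="$2" host="$3" db="$4"
  printf 'postgresql://%s:%s@%s:5432/%s\n' \
    "$(urlencode "$user")" "$(urlencode "$pass")" "$host" "$db"
}
```

For example, `pg_url openclaw "$password" <rds-endpoint> openclaw` prints a value you can pass to `--from-literal=database-url=...`. The same `urlencode` applies to the token in `redis-url`.
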
### Verify Services

```bash
# Check pods
kubectl get pods -n openclaw

# Check services
kubectl get svc -n openclaw

# Check logs
kubectl logs -n openclaw -l app=openclaw-gateway
kubectl logs -n openclaw -l app=litellm
```


---

## GPU Support

### Enable GPU Nodes

```hcl
# terraform.tfvars
enable_gpu_support = true
gpu_instance_types = ["g5.xlarge", "g5.2xlarge"]
```

The GPU node group must run an AMI that includes the NVIDIA drivers, such as the Amazon EKS optimized accelerated AMI.

### Install NVIDIA Device Plugin

The NVIDIA device plugin exposes each node's GPUs to the Kubernetes scheduler as the `nvidia.com/gpu` resource. Install it from NVIDIA's Helm repository:

```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace

# Confirm that GPU nodes report allocatable GPUs
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```

### Configure Ollama for GPU

```yaml
# values.yaml
ollama:
  enabled: true
  gpu:
    enabled: true
    type: nvidia
  resources:
    limits:
      nvidia.com/gpu: 1
```


---

## Monitoring

### CloudWatch Dashboard

The deployment creates a CloudWatch dashboard with:

- EKS cluster metrics
- Node group metrics
- RDS PostgreSQL metrics
- ElastiCache Redis metrics
- ALB request metrics
- Application logs

### Access Dashboard

```bash
# Get the dashboard ARN from Terraform output. The dashboard name is the
# last segment of the ARN (arn:aws:cloudwatch::<account-id>:dashboard/<name>).
terraform output cloudwatch_dashboard_arn

# Open in the AWS Console (`open` is macOS; use `xdg-open` on Linux)
open "https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=openclaw-dev-dashboard"
```

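The console URL above hardcodes the dev dashboard name. To open the dashboard for any environment, derive the name from the ARN that Terraform outputs and build the URL from it. This is a small sketch. The helper names are hypothetical, and the ARN format is the standard CloudWatch dashboard format.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the dashboard name from a CloudWatch dashboard ARN of the form
# arn:aws:cloudwatch::<account-id>:dashboard/<name>.
dashboard_name_from_arn() {
  local arn="$1"
  printf '%s\n' "${arn##*:dashboard/}"
}

# Build the CloudWatch console URL for a dashboard in a given region.
dashboard_console_url() {
  local region="$1" name="$2"
  printf 'https://%s.console.aws.amazon.com/cloudwatch/home?region=%s#dashboards:name=%s\n' \
    "$region" "$region" "$name"
}
```

Usage: `dashboard_console_url us-east-1 "$(dashboard_name_from_arn "$(terraform output -raw cloudwatch_dashboard_arn)")"`. Pass the result to `open` or `xdg-open`.
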
### CloudWatch Alarms

When `enable_cloudwatch_alarms = true` (as in the production overrides), the following alarms are configured:

| Alarm | Metric | Threshold |
|-------|--------|-----------|
| EKS CPU Utilization | Cluster CPU | > 80% |
| RDS CPU Utilization | DB CPU | > 80% |
| RDS Free Storage | DB Storage | < 10GB |
| Redis CPU Utilization | Cache CPU | > 80% |
| Redis Memory | Freeable Memory | < 256MB |
| ALB 5XX Errors | HTTP 5XX count | > 10 |


---

## Backup & Recovery

### Automated Backups

| Resource | Backup Strategy | Retention |
|----------|----------------|-----------|
| RDS PostgreSQL | Automated snapshots | 7 days |
| ElastiCache Redis | Snapshot on delete | Manual |
| ECR Images | Lifecycle policy | 30 days |
| Terraform State | S3 versioning | Unlimited |

### Manual Backup

```bash
# RDS snapshot
aws rds create-db-snapshot \
  --db-instance-identifier openclaw-dev-pg \
  --db-snapshot-identifier openclaw-manual-snapshot-$(date +%Y%m%d)

# ElastiCache snapshot
aws elasticache create-snapshot \
  --cache-cluster-id openclaw-dev-redis \
  --snapshot-name openclaw-redis-snapshot-$(date +%Y%m%d)

# ECR: save the image manifest (metadata only; image layers are not copied)
aws ecr batch-get-image \
  --repository-name openclaw-gateway \
  --image-ids imageTag=latest \
  --query 'images[].imageManifest' \
  --output text > openclaw-gateway-manifest.json

# ECR: full image backup as a local tarball
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/openclaw-gateway:latest
docker save -o openclaw-gateway-latest.tar \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/openclaw-gateway:latest
```

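Automated RDS snapshots expire with the retention period. Manual snapshots, such as the date-stamped ones above, remain until you delete them. The suffix is a fixed-width `YYYYMMDD` date, so a plain string comparison is enough to find snapshots older than a cutoff. The sketch below filters a list of identifiers. The function name is hypothetical, and the cutoff date is yours to choose.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Read snapshot identifiers (one per line) and print those whose trailing
# YYYYMMDD date is strictly older than the cutoff (also YYYYMMDD).
# Identifiers without a trailing 8-digit date are skipped.
snapshots_older_than() {
  local cutoff="$1" id stamp
  while read -r id; do
    [[ "$id" =~ -([0-9]{8})$ ]] || continue
    stamp="${BASH_REMATCH[1]}"
    # Fixed-width dates sort correctly as strings.
    if [[ "$stamp" < "$cutoff" ]]; then
      printf '%s\n' "$id"
    fi
  done
}
```

For example, `aws rds describe-db-snapshots --snapshot-type manual --query 'DBSnapshots[].DBSnapshotIdentifier' --output text | tr '\t' '\n' | snapshots_older_than 20260301` lists candidates. The `tr` step is needed because text output is tab-separated. Review the list before you pass each identifier to `aws rds delete-db-snapshot --db-snapshot-identifier`.
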
### Disaster Recovery

1. **Restore RDS from snapshot**
2. **Recreate ElastiCache from snapshot**
3. **Reapply Terraform**
4. **Restore Kubernetes workloads**


---

## Troubleshooting

### Common Issues

#### EKS Nodes Not Joining Cluster

```bash
# Check node status
kubectl get nodes

# Check node group health (reports issues such as IAM or networking errors)
aws eks list-nodegroups --cluster-name openclaw-dev-eks
aws eks describe-nodegroup --cluster-name openclaw-dev-eks \
  --nodegroup-name <nodegroup-name> --query 'nodegroup.health'

# Verify the node role has the managed worker policies attached
# (AmazonEKSWorkerNodePolicy is an AWS managed policy, not an inline policy)
aws iam list-attached-role-policies --role-name openclaw-dev-eks-nodes-role
```

#### RDS Connection Issues

```bash
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxx

# Verify database connectivity from inside the VPC (the database is not
# publicly reachable), for example from a temporary pod in the cluster
kubectl run psql-test --rm -it --restart=Never --image=postgres:15 -- \
  psql -h openclaw-dev-pg.xxx.us-east-1.rds.amazonaws.com -U openclaw -d openclaw
```

#### ALB Health Check Failures

```bash
# Check target group health
aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:xxx

# Verify the health check path (run from a host or pod inside the VPC)
curl -v http://<pod-ip>:18789/health
```

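The gateway target group in the Terraform configuration marks a target healthy after 2 consecutive responses in the 200-299 range from `/health`. It checks every 30 seconds with a 5-second timeout. To reproduce that check by hand from inside the VPC, a small `curl` loop is enough. The function name and defaults below are illustrative, and the default interval is shorter than the ALB's to suit interactive use.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Return 0 once URL answers with a 2xx status THRESHOLD times in a row,
# mirroring the target group's matcher ("200-299") and healthy_threshold.
# Return 1 if that does not happen within MAX_ATTEMPTS requests.
probe_health() {
  local url="$1" threshold="${2:-2}" max_attempts="${3:-5}" interval="${4:-5}"
  local ok=0 attempt code
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # curl prints 000 as the status when the connection fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [[ "$code" =~ ^2[0-9][0-9]$ ]]; then
      ok=$(( ok + 1 ))
      (( ok >= threshold )) && return 0
    else
      ok=0  # a failure resets the consecutive-success count
    fi
    (( attempt < max_attempts )) && sleep "$interval"
  done
  return 1
}
```

For example, `probe_health http://<pod-ip>:18789/health 2 5 30` checks the way the ALB does. If it succeeds while the target group still reports the target as unhealthy, look at security groups and target registration rather than at the application.
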
### Support Resources

- [AWS EKS Documentation](https://docs.aws.amazon.com/eks/)
- [AWS RDS Documentation](https://docs.aws.amazon.com/AmazonRDS/)
- [Terraform AWS Provider](https://registry.terraform.io/providers/hashicorp/aws/latest/docs)
- [OpenClaw Documentation](../../docs/)


---

## Cleanup

### Destroy Infrastructure

```bash
# Delete Kubernetes resources first, so that any load balancers or volumes
# created by Kubernetes are removed before the VPC is destroyed
kubectl delete namespace openclaw

# Destroy Terraform resources
terraform destroy -var-file=terraform.dev.tfvars

# Verify deletion (should fail with ResourceNotFoundException)
aws eks describe-cluster --name openclaw-dev-eks
```

### Manual Cleanup

```bash
# Delete ECR repositories
aws ecr delete-repository --repository-name openclaw-gateway --force
aws ecr delete-repository --repository-name litellm-proxy --force

# Delete S3 bucket. If versioning is enabled, delete all object versions
# first; `aws s3 rb --force` removes only the current objects.
aws s3 rb s3://openclaw-terraform-state --force

# Delete DynamoDB table
aws dynamodb delete-table --table-name openclaw-terraform-locks
```


---

## Next Steps

1. **Configure CI/CD** - Set up automated deployments
2. **Enable Monitoring** - Configure alerts and dashboards
3. **Set Up Backup** - Implement backup automation
4. **Security Hardening** - Review security configurations
5. **Cost Optimization** - Implement cost controls

---

🦞 *The thought that never ends.*
@@ -0,0 +1,356 @@
# ==============================================================================
# Heretek OpenClaw - AWS Application Load Balancer Configuration
# ==============================================================================
# ALB for OpenClaw traffic routing and SSL termination
# ==============================================================================

# ------------------------------------------------------------------------------
# ALB Security Group
# ------------------------------------------------------------------------------

resource "aws_security_group" "alb" {
  name        = "${local.name_prefix}-alb-sg"
  description = "Security group for Application Load Balancer"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb-sg"
  })
}

# ------------------------------------------------------------------------------
# Application Load Balancer
# ------------------------------------------------------------------------------

resource "aws_lb" "openclaw" {
  name               = "${local.name_prefix}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.subnet_ids

  enable_deletion_protection = var.alb_deletion_protection
  enable_http2               = true
  drop_invalid_header_fields = true
  idle_timeout               = 60

  # The log bucket only exists in prod (count = 1). Indexing alb_logs[0]
  # unconditionally fails in other environments, so the access_logs block
  # is generated once per bucket instead (zero times outside prod).
  dynamic "access_logs" {
    for_each = aws_s3_bucket.alb_logs
    content {
      bucket  = access_logs.value.bucket
      prefix  = "alb-logs"
      enabled = true
    }
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb"
  })

  # The bucket policy must exist before the ALB validates log delivery.
  depends_on = [aws_internet_gateway.openclaw, aws_s3_bucket_policy.alb_logs]
}

# ------------------------------------------------------------------------------
# S3 Bucket for ALB Access Logs
# ------------------------------------------------------------------------------

resource "aws_s3_bucket" "alb_logs" {
  count = var.environment == "prod" ? 1 : 0

  bucket = "${local.name_prefix}-alb-logs-${data.aws_caller_identity.current.account_id}"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb-logs"
  })
}

# ALB access logs are delivered by the regional Elastic Load Balancing
# account, not by this account, so the bucket policy grants that principal.
# (For Regions launched after August 2022, AWS documents the
# logdelivery.elasticloadbalancing.amazonaws.com service principal instead.)
data "aws_elb_service_account" "main" {}

resource "aws_s3_bucket_policy" "alb_logs" {
  count = var.environment == "prod" ? 1 : 0

  bucket = aws_s3_bucket.alb_logs[0].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowALBLogDelivery"
        Effect = "Allow"
        Principal = {
          AWS = data.aws_elb_service_account.main.arn
        }
        Action   = "s3:PutObject"
        Resource = "${aws_s3_bucket.alb_logs[0].arn}/alb-logs/AWSLogs/${data.aws_caller_identity.current.account_id}/*"
      }
    ]
  })
}

resource "aws_s3_bucket_server_side_encryption_configuration" "alb_logs" {
  count = var.environment == "prod" ? 1 : 0

  bucket = aws_s3_bucket.alb_logs[0].id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
  count = var.environment == "prod" ? 1 : 0

  bucket = aws_s3_bucket.alb_logs[0].id

  rule {
    id     = "expire-old-logs"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    expiration {
      days = 90
    }
  }
}

# ------------------------------------------------------------------------------
# HTTP Listener (Redirect to HTTPS)
# ------------------------------------------------------------------------------

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.openclaw.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

# ------------------------------------------------------------------------------
# HTTPS Listener
# ------------------------------------------------------------------------------

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.openclaw.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = var.ssl_policy

  # Use the supplied certificate if one is given; otherwise use the
  # certificate created and DNS-validated in the ACM section of this file.
  certificate_arn = var.acm_certificate_arn != null ? var.acm_certificate_arn : aws_acm_certificate_validation.openclaw[0].certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn
  }

  lifecycle {
    ignore_changes = [certificate_arn]
  }
}

# ------------------------------------------------------------------------------
# Target Groups
# ------------------------------------------------------------------------------

# OpenClaw Gateway Target Group
resource "aws_lb_target_group" "gateway" {
  name     = "${local.name_prefix}-gateway"
  port     = 18789
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200-299"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 2
  }

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400
    enabled         = false
  }

  tags = merge(local.common_tags, {
    Name      = "${local.name_prefix}-gateway"
    Component = "gateway"
  })

  lifecycle {
    create_before_destroy = true
  }
}

# LiteLLM Proxy Target Group
resource "aws_lb_target_group" "litellm" {
  name     = "${local.name_prefix}-litellm"
  port     = 4000
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200-299"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 2
  }

  tags = merge(local.common_tags, {
    Name      = "${local.name_prefix}-litellm"
    Component = "litellm"
  })

  lifecycle {
    create_before_destroy = true
  }
}

# ------------------------------------------------------------------------------
# Listener Rules
# ------------------------------------------------------------------------------

# Route to LiteLLM based on path
resource "aws_lb_listener_rule" "litellm" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.litellm.arn
  }

  condition {
    path_pattern {
      values = ["/v1/*", "/litellm/*"]
    }
  }
}

# Route to Gateway for WebSocket connections
resource "aws_lb_listener_rule" "gateway_websocket" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 200

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn
  }

  condition {
    path_pattern {
      values = ["/ws/*", "/gateway/*"]
    }
  }
}

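# ------------------------------------------------------------------------------
# Elastic Load Balancing Service-Linked Role
# ------------------------------------------------------------------------------
# Not required for S3 access logging, which is authorized by the bucket policy.
# AWS creates this role automatically the first time a load balancer is
# created in the account. If the role already exists, creating this resource
# fails; import the existing role or remove this resource.

resource "aws_iam_service_linked_role" "alb" {
  aws_service_name = "elasticloadbalancing.amazonaws.com"

  tags = local.common_tags
}
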
# ------------------------------------------------------------------------------
# ACM Certificate (Optional - if not provided)
# ------------------------------------------------------------------------------

resource "aws_acm_certificate" "openclaw" {
  count = var.acm_certificate_arn == null ? 1 : 0

  domain_name       = var.domain_name
  validation_method = "DNS"

  subject_alternative_names = var.subject_alternative_names

  lifecycle {
    create_before_destroy = true
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-certificate"
  })
}

resource "aws_acm_certificate_validation" "openclaw" {
  count = var.acm_certificate_arn == null ? 1 : 0

  certificate_arn         = aws_acm_certificate.openclaw[0].arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

resource "aws_route53_record" "cert_validation" {
  for_each = var.acm_certificate_arn == null ? {
    for dvo in aws_acm_certificate.openclaw[0].domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  } : {}

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = var.route53_zone_id
}

# ------------------------------------------------------------------------------
# Route53 DNS Records
# ------------------------------------------------------------------------------

resource "aws_route53_record" "openclaw" {
  # NOTE: Like the certificate, this alias record is only created when
  # acm_certificate_arn is null. Manage the DNS record separately otherwise.
  count = var.acm_certificate_arn == null ? 1 : 0

  zone_id = var.route53_zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.openclaw.dns_name
    zone_id                = aws_lb.openclaw.zone_id
    evaluate_target_health = true
  }
}
@@ -0,0 +1,293 @@
# ==============================================================================
# Heretek OpenClaw - AWS ECR Configuration
# ==============================================================================
# Elastic Container Registry for OpenClaw container images
# ==============================================================================

# ------------------------------------------------------------------------------
# ECR Lifecycle Policy Document
# ------------------------------------------------------------------------------

locals {
  ecr_lifecycle_policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images older than ${var.lifecycle_policy_days} days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = var.lifecycle_policy_days
        }
        action = {
          type = "expire"
        }
      },
      {
        rulePriority = 2
        description  = "Keep only the last 10 images matching the latest/main tag prefixes"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["latest", "main"]
          countType     = "imageCountMoreThan"
          countNumber   = 10
        }
        action = {
          type = "expire"
        }
      }
    ]
  })
}
# ------------------------------------------------------------------------------
# ECR Repository - OpenClaw Gateway
# ------------------------------------------------------------------------------

resource "aws_ecr_repository" "openclaw_gateway" {
  name                 = "openclaw-gateway"
  image_tag_mutability = "MUTABLE"
  force_delete         = var.environment == "dev"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }

  tags = merge(local.common_tags, {
    Name      = "openclaw-gateway"
    Component = "gateway"
  })
}

resource "aws_ecr_lifecycle_policy" "openclaw_gateway" {
  repository = aws_ecr_repository.openclaw_gateway.name
  policy     = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - LiteLLM Proxy
# ------------------------------------------------------------------------------

resource "aws_ecr_repository" "litellm_proxy" {
  name                 = "litellm-proxy"
  image_tag_mutability = "MUTABLE"
  force_delete         = var.environment == "dev"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }

  tags = merge(local.common_tags, {
    Name      = "litellm-proxy"
    Component = "litellm"
  })
}

resource "aws_ecr_lifecycle_policy" "litellm_proxy" {
  repository = aws_ecr_repository.litellm_proxy.name
  policy     = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - Ollama (Optional for Custom Images)
# ------------------------------------------------------------------------------

resource "aws_ecr_repository" "ollama" {
  count = var.enable_gpu_support ? 1 : 0

  name                 = "ollama"
  image_tag_mutability = "MUTABLE"
  force_delete         = var.environment == "dev"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }

  tags = merge(local.common_tags, {
    Name      = "ollama"
    Component = "ollama"
  })
}

resource "aws_ecr_lifecycle_policy" "ollama" {
  count = var.enable_gpu_support ? 1 : 0

  repository = aws_ecr_repository.ollama[0].name
  policy     = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# ECR Repository - Monitoring Stack (Optional)
# ------------------------------------------------------------------------------

resource "aws_ecr_repository" "monitoring" {
  name                 = "monitoring"
  image_tag_mutability = "MUTABLE"
  force_delete         = var.environment == "dev"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }

  tags = merge(local.common_tags, {
    Name      = "monitoring"
    Component = "monitoring"
  })
}

resource "aws_ecr_lifecycle_policy" "monitoring" {
  repository = aws_ecr_repository.monitoring.name
  policy     = local.ecr_lifecycle_policy
}
# ------------------------------------------------------------------------------
# KMS Key for ECR Encryption
# ------------------------------------------------------------------------------

resource "aws_kms_key" "ecr" {
  description             = "KMS key for ECR repository encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow ECR Service"
        Effect = "Allow"
        Principal = {
          Service = "ecr.amazonaws.com"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:DescribeKey"
        ]
        Resource = "*"
      },
      {
        Sid    = "Allow EKS Service"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey*"
        ]
        Resource = "*"
      }
    ]
  })

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-ecr-key"
  })
}

resource "aws_kms_alias" "ecr" {
  name          = "alias/${local.name_prefix}-ecr"
  target_key_id = aws_kms_key.ecr.key_id
}
# ------------------------------------------------------------------------------
# ECR Access Policy for Cross-Account (Optional)
# ------------------------------------------------------------------------------
# NOTE: Same-account EKS worker nodes pull images with the node IAM role
# (AmazonEC2ContainerRegistryReadOnly) and need no repository policy. For
# cross-account pulls, grant the consuming account or role ARN under
# Principal.AWS. The eks.amazonaws.com service principal does not pull images.

resource "aws_ecr_repository_policy" "openclaw_gateway" {
  repository = aws_ecr_repository.openclaw_gateway.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Allow EKS Pull"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:BatchCheckLayerAvailability"
        ]
      }
    ]
  })
}

resource "aws_ecr_repository_policy" "litellm_proxy" {
  repository = aws_ecr_repository.litellm_proxy.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Allow EKS Pull"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:BatchCheckLayerAvailability"
        ]
      }
    ]
  })
}
# ------------------------------------------------------------------------------
# ECR Pull-Through Cache Rules (Optional - for Docker Hub, etc.)
# ------------------------------------------------------------------------------
# NOTE: Docker Hub and GitHub Container Registry upstreams require authentication.
# Set credential_arn to a Secrets Manager secret whose name starts with
# "ecr-pullthroughcache/".

resource "aws_ecr_pull_through_cache_rule" "docker_hub" {
  count = var.environment == "prod" ? 1 : 0

  ecr_repository_prefix = "dockerhub"
  upstream_registry_url = "registry-1.docker.io"
}

resource "aws_ecr_pull_through_cache_rule" "ghcr" {
  count = var.environment == "prod" ? 1 : 0

  ecr_repository_prefix = "ghcr"
  upstream_registry_url = "ghcr.io"
}
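# Sketch (not applied): expose the repository URLs so CI can tag and push
# images. The output name is hypothetical. Check for clashes with any existing
# outputs before uncommenting. one() returns null when the optional Ollama
# repository is not created. Images cached through the pull-through rules
# above are addressed as <registry>/dockerhub/<upstream path> or
# <registry>/ghcr/<upstream path>. Docker Hub official images include
# "library/", for example dockerhub/library/redis.
#
# output "ecr_repository_urls" {
#   value = {
#     gateway    = aws_ecr_repository.openclaw_gateway.repository_url
#     litellm    = aws_ecr_repository.litellm_proxy.repository_url
#     monitoring = aws_ecr_repository.monitoring.repository_url
#     ollama     = one(aws_ecr_repository.ollama[*].repository_url)
#   }
# }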
@@ -0,0 +1,589 @@
# ==============================================================================
# Heretek OpenClaw - AWS EKS Configuration
# ==============================================================================
# EKS cluster and node group configurations
# ==============================================================================

# ------------------------------------------------------------------------------
# EKS Cluster
# ------------------------------------------------------------------------------

resource "aws_eks_cluster" "openclaw_cluster" {
  name     = "${local.name_prefix}-eks"
  version  = var.eks_version
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-eks"
  })

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller
  ]
}
# ------------------------------------------------------------------------------
# EKS Cluster IAM Role
# ------------------------------------------------------------------------------

resource "aws_iam_role" "eks_cluster" {
  name = "${local.name_prefix}-eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })

  tags = local.common_tags
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role_policy_attachment" "eks_vpc_resource_controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks_cluster.name
}
# ------------------------------------------------------------------------------
# EKS Cluster Security Group
# ------------------------------------------------------------------------------

resource "aws_security_group" "eks_cluster" {
  name        = "${local.name_prefix}-eks-cluster-sg"
  description = "Security group for EKS cluster control plane"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.common_tags
}
# ------------------------------------------------------------------------------
# OIDC Provider for IRSA
# ------------------------------------------------------------------------------

resource "aws_iam_openid_connect_provider" "eks" {
  count = var.enable_irsa ? 1 : 0

  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"]
  url             = aws_eks_cluster.openclaw_cluster.identity[0].oidc[0].issuer

  tags = local.common_tags
}
# ------------------------------------------------------------------------------
# EKS Node Group - General Purpose
# ------------------------------------------------------------------------------

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.openclaw_cluster.name
  node_group_name = "${local.name_prefix}-general"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.subnet_ids

  instance_types = var.node_groups.general.instance_types

  scaling_config {
    desired_size = var.node_groups.general.desired_size
    max_size     = var.node_groups.general.max_size
    min_size     = var.node_groups.general.min_size
  }

  disk_size = var.node_groups.general.disk_size

  ami_type             = "AL2_x86_64"
  capacity_type        = "ON_DEMAND"
  force_update_version = true

  labels = {
    "workload-type" = "general"
    "environment"   = var.environment
  }

  # This group has no taint, so cluster add-ons such as CoreDNS can schedule
  # here. They do not tolerate a workload-type taint, and the compute and GPU
  # groups are both tainted.

  lifecycle {
    ignore_changes = [
      scaling_config[0].desired_size
    ]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-general-ng"
  })

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_ecr_read_only
  ]
}
# ------------------------------------------------------------------------------
# EKS Node Group - Compute Optimized
# ------------------------------------------------------------------------------

resource "aws_eks_node_group" "compute" {
  cluster_name    = aws_eks_cluster.openclaw_cluster.name
  node_group_name = "${local.name_prefix}-compute"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.subnet_ids

  instance_types = var.node_groups.compute.instance_types

  scaling_config {
    desired_size = var.node_groups.compute.desired_size
    max_size     = var.node_groups.compute.max_size
    min_size     = var.node_groups.compute.min_size
  }

  disk_size = var.node_groups.compute.disk_size

  ami_type             = "AL2_x86_64"
  capacity_type        = "ON_DEMAND"
  force_update_version = true

  labels = {
    "workload-type" = "compute"
    "environment"   = var.environment
  }

  taint {
    key    = "workload-type"
    value  = "compute"
    effect = "NO_SCHEDULE"
  }

  lifecycle {
    ignore_changes = [
      scaling_config[0].desired_size
    ]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-compute-ng"
  })

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_ecr_read_only
  ]
}
# ------------------------------------------------------------------------------
# EKS Node Group - GPU (Optional)
# ------------------------------------------------------------------------------

resource "aws_eks_node_group" "gpu" {
  count = var.enable_gpu_support ? 1 : 0

  cluster_name    = aws_eks_cluster.openclaw_cluster.name
  node_group_name = "${local.name_prefix}-gpu"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = var.subnet_ids

  instance_types = var.gpu_instance_types

  scaling_config {
    desired_size = 1
    max_size     = 4
    min_size     = 0
  }

  disk_size = 100

  ami_type             = "AL2_x86_64_GPU"
  capacity_type        = "ON_DEMAND"
  force_update_version = true

  labels = {
    "workload-type" = "gpu"
    "environment"   = var.environment
    "gpu"           = "true"
  }

  taint {
    key    = "nvidia.com/gpu"
    value  = "true"
    effect = "NO_SCHEDULE"
  }

  lifecycle {
    ignore_changes = [
      scaling_config[0].desired_size
    ]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-gpu-ng"
  })

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_ecr_read_only
  ]
}
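# Sketch (not applied): a workload that targets the GPU node group must
# tolerate its taint and can select it by label. This assumes a separately
# configured Terraform kubernetes provider and the NVIDIA device plugin on the
# nodes. The resource name, labels, namespace, and image tag are hypothetical.
# The effect is spelled differently in each API: the EKS node group uses
# NO_SCHEDULE, while Kubernetes objects use NoSchedule.
#
# resource "kubernetes_deployment_v1" "ollama" {
#   metadata {
#     name      = "ollama"
#     namespace = "default"
#   }
#
#   spec {
#     replicas = 1
#
#     selector {
#       match_labels = { app = "ollama" }
#     }
#
#     template {
#       metadata {
#         labels = { app = "ollama" }
#       }
#
#       spec {
#         node_selector = { "workload-type" = "gpu" }
#
#         toleration {
#           key      = "nvidia.com/gpu"
#           operator = "Equal"
#           value    = "true"
#           effect   = "NoSchedule"
#         }
#
#         container {
#           name  = "ollama"
#           image = "ollama/ollama:latest"
#
#           resources {
#             limits = { "nvidia.com/gpu" = "1" }
#           }
#         }
#       }
#     }
#   }
# }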
# ------------------------------------------------------------------------------
# EKS Nodes IAM Role
# ------------------------------------------------------------------------------

resource "aws_iam_role" "eks_nodes" {
  name = "${local.name_prefix}-eks-nodes-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })

  tags = local.common_tags
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_ecr_read_only" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_nodes.name
}
# ------------------------------------------------------------------------------
# EKS Nodes Security Group
# ------------------------------------------------------------------------------

resource "aws_security_group" "eks_nodes" {
  name        = "${local.name_prefix}-eks-nodes-sg"
  description = "Security group for EKS worker nodes"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.common_tags
}

# Allow communication from EKS control plane
resource "aws_security_group_rule" "eks_nodes_ingress_cluster" {
  description              = "Allow EKS control plane to communicate with nodes"
  security_group_id        = aws_security_group.eks_nodes.id
  protocol                 = "tcp"
  from_port                = 1025
  to_port                  = 65535
  source_security_group_id = aws_security_group.eks_cluster.id
  type                     = "ingress"
}

# Allow all traffic between nodes. Pod-to-pod traffic and cluster DNS use UDP
# as well as TCP. Protocol "-1" requires from_port and to_port to be 0.
resource "aws_security_group_rule" "eks_nodes_self" {
  description       = "Allow nodes to communicate with each other"
  security_group_id = aws_security_group.eks_nodes.id
  protocol          = "-1"
  from_port         = 0
  to_port           = 0
  self              = true
  type              = "ingress"
}
# ------------------------------------------------------------------------------
# Cluster Autoscaler IAM Policy
# ------------------------------------------------------------------------------

resource "aws_iam_policy" "cluster_autoscaler" {
  count = var.enable_cluster_autoscaler ? 1 : 0

  name        = "${local.name_prefix}-cluster-autoscaler-policy"
  description = "IAM policy for Cluster Autoscaler"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
          "ec2:DescribeLaunchTemplateVersions"
        ]
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })

  tags = local.common_tags
}
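# Sketch (not applied): a tighter variant would split out the two mutating
# actions and limit them to Auto Scaling groups tagged for this cluster. EKS
# managed node groups tag their groups with
# k8s.io/cluster-autoscaler/<cluster-name> = "owned". This statement would
# replace SetDesiredCapacity and TerminateInstanceInAutoScalingGroup in the
# Resource = "*" statement above. It is a refinement under that assumption,
# not part of this configuration.
#
# {
#   Action = [
#     "autoscaling:SetDesiredCapacity",
#     "autoscaling:TerminateInstanceInAutoScalingGroup"
#   ]
#   Effect   = "Allow"
#   Resource = "*"
#   Condition = {
#     StringEquals = {
#       "aws:ResourceTag/k8s.io/cluster-autoscaler/${aws_eks_cluster.openclaw_cluster.name}" = "owned"
#     }
#   }
# }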
# ------------------------------------------------------------------------------
# AWS Load Balancer Controller IAM Policy
# ------------------------------------------------------------------------------
# Compare with the iam_policy.json published in the aws-load-balancer-controller
# repository for the controller version you deploy.

resource "aws_iam_policy" "aws_load_balancer_controller" {
  name        = "${local.name_prefix}-aws-load-balancer-controller-policy"
  description = "IAM policy for AWS Load Balancer Controller"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "iam:CreateServiceLinkedRole"
        ]
        Effect   = "Allow"
        Resource = "*"
        Condition = {
          StringEquals = {
            "iam:AWSServiceName" = "elasticloadbalancing.amazonaws.com"
          }
        }
      },
      {
        Action = [
          "ec2:DescribeAccountAttributes",
          "ec2:DescribeAddresses",
          "ec2:DescribeAvailabilityZones",
          "ec2:DescribeInternetGateways",
          "ec2:DescribeVpcs",
          "ec2:DescribeVpcPeeringConnections",
          "ec2:DescribeSubnets",
          "ec2:DescribeSecurityGroups",
          "ec2:DescribeInstances",
          "ec2:DescribeNetworkInterfaces",
          "ec2:DescribeTags",
          "ec2:GetCoipPoolUsage",
          "ec2:DescribeCoipPools",
          "elasticloadbalancing:DescribeLoadBalancers",
          "elasticloadbalancing:DescribeLoadBalancerAttributes",
          "elasticloadbalancing:DescribeListeners",
          "elasticloadbalancing:DescribeListenerCertificates",
          "elasticloadbalancing:DescribeSSLPolicies",
          "elasticloadbalancing:DescribeRules",
          "elasticloadbalancing:DescribeTargetGroups",
          "elasticloadbalancing:DescribeTargetGroupAttributes",
          "elasticloadbalancing:DescribeTargetHealth",
          "elasticloadbalancing:DescribeTags"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "cognito-idp:DescribeUserPoolClient",
          "acm:ListCertificates",
          "acm:DescribeCertificate",
          "iam:ListServerCertificates",
          "iam:GetServerCertificate",
          "waf-regional:GetWebACL",
          "waf-regional:GetWebACLForResource",
          "waf-regional:AssociateWebACL",
          "waf-regional:DisassociateWebACL",
          "wafv2:GetWebACL",
          "wafv2:GetWebACLForResource",
          "wafv2:AssociateWebACL",
          "wafv2:DisassociateWebACL",
          "shield:GetSubscriptionState",
          "shield:DescribeProtection",
          "shield:CreateProtection",
          "shield:DeleteProtection"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "ec2:AuthorizeSecurityGroupIngress",
          "ec2:RevokeSecurityGroupIngress"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "ec2:CreateSecurityGroup"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "ec2:CreateTags"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:ec2:*:*:security-group/*"
        Condition = {
          StringEquals = {
            "ec2:CreateAction" = "CreateSecurityGroup"
          }
          Null = {
            "aws:RequestTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "ec2:CreateTags",
          "ec2:DeleteTags"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:ec2:*:*:security-group/*"
        Condition = {
          Null = {
            "aws:RequestTag/elbv2.k8s.aws/cluster"  = "true"
            "aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "ec2:AuthorizeSecurityGroupIngress",
          "ec2:RevokeSecurityGroupIngress",
          "ec2:DeleteSecurityGroup"
        ]
        Effect   = "Allow"
        Resource = "*"
        Condition = {
          Null = {
            "aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "elasticloadbalancing:CreateLoadBalancer",
          "elasticloadbalancing:CreateTargetGroup"
        ]
        Effect   = "Allow"
        Resource = "*"
        Condition = {
          Null = {
            "aws:RequestTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "elasticloadbalancing:CreateListener",
          "elasticloadbalancing:DeleteListener",
          "elasticloadbalancing:CreateRule",
          "elasticloadbalancing:DeleteRule"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "elasticloadbalancing:AddTags",
          "elasticloadbalancing:RemoveTags"
        ]
        Effect = "Allow"
        Resource = [
          "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
          "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
          "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
        ]
        Condition = {
          Null = {
            "aws:RequestTag/elbv2.k8s.aws/cluster"  = "true"
            "aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "elasticloadbalancing:AddTags",
          "elasticloadbalancing:RemoveTags"
        ]
        Effect = "Allow"
        Resource = [
          "arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*",
          "arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*",
          "arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*",
          "arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*"
        ]
      },
      {
        Action = [
          "elasticloadbalancing:ModifyLoadBalancerAttributes",
          "elasticloadbalancing:SetIpAddressType",
          "elasticloadbalancing:SetSecurityGroups",
          "elasticloadbalancing:SetSubnets",
          "elasticloadbalancing:DeleteLoadBalancer",
          "elasticloadbalancing:ModifyTargetGroup",
          "elasticloadbalancing:ModifyTargetGroupAttributes",
          "elasticloadbalancing:DeleteTargetGroup"
        ]
        Effect   = "Allow"
        Resource = "*"
        Condition = {
          Null = {
            "aws:ResourceTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        # Tag-on-create for load balancers and target groups created by the controller
        Action = [
          "elasticloadbalancing:AddTags"
        ]
        Effect = "Allow"
        Resource = [
          "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
          "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
          "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
        ]
        Condition = {
          StringEquals = {
            "elasticloadbalancing:CreateAction" = [
              "CreateTargetGroup",
              "CreateLoadBalancer"
            ]
          }
          Null = {
            "aws:RequestTag/elbv2.k8s.aws/cluster" = "false"
          }
        }
      },
      {
        Action = [
          "elasticloadbalancing:RegisterTargets",
          "elasticloadbalancing:DeregisterTargets"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*"
      },
      {
        # Listener, certificate, WAF, and rule updates on existing load balancers
        Action = [
          "elasticloadbalancing:SetWebAcl",
          "elasticloadbalancing:ModifyListener",
          "elasticloadbalancing:AddListenerCertificates",
          "elasticloadbalancing:RemoveListenerCertificates",
          "elasticloadbalancing:ModifyRule"
        ]
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })

  tags = local.common_tags
}
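# Sketch (not applied): bind the policy above to the controller's Kubernetes
# service account through IRSA. This assumes enable_irsa = true, so that
# aws_iam_openid_connect_provider.eks[0] exists. It also assumes the controller
# runs as kube-system/aws-load-balancer-controller, the usual Helm chart
# default. The data source and role names are hypothetical.
#
# data "aws_iam_policy_document" "aws_load_balancer_controller_assume" {
#   statement {
#     actions = ["sts:AssumeRoleWithWebIdentity"]
#
#     principals {
#       type        = "Federated"
#       identifiers = [aws_iam_openid_connect_provider.eks[0].arn]
#     }
#
#     condition {
#       test     = "StringEquals"
#       variable = "${replace(aws_iam_openid_connect_provider.eks[0].url, "https://", "")}:sub"
#       values   = ["system:serviceaccount:kube-system:aws-load-balancer-controller"]
#     }
#
#     condition {
#       test     = "StringEquals"
#       variable = "${replace(aws_iam_openid_connect_provider.eks[0].url, "https://", "")}:aud"
#       values   = ["sts.amazonaws.com"]
#     }
#   }
# }
#
# resource "aws_iam_role" "aws_load_balancer_controller" {
#   name               = "${local.name_prefix}-aws-lbc-role"
#   assume_role_policy = data.aws_iam_policy_document.aws_load_balancer_controller_assume.json
#   tags               = local.common_tags
# }
#
# resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller" {
#   role       = aws_iam_role.aws_load_balancer_controller.name
#   policy_arn = aws_iam_policy.aws_load_balancer_controller.arn
# }
#
# The role ARN is then set on the service account through the
# eks.amazonaws.com/role-arn annotation.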
@@ -0,0 +1,245 @@
# ==============================================================================
# Heretek OpenClaw - AWS ElastiCache Redis Configuration
# ==============================================================================
# ElastiCache Redis for OpenClaw caching and session management
# ==============================================================================

# ------------------------------------------------------------------------------
# ElastiCache Subnet Group
# ------------------------------------------------------------------------------

resource "aws_elasticache_subnet_group" "openclaw" {
  name       = "${local.name_prefix}-redis-subnet-group"
  subnet_ids = var.subnet_ids

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-redis-subnet-group"
  })
}
# ------------------------------------------------------------------------------
# ElastiCache Security Group
# ------------------------------------------------------------------------------

resource "aws_security_group" "elasticache" {
  name        = "${local.name_prefix}-elasticache-sg"
  description = "Security group for ElastiCache Redis"
  vpc_id      = var.vpc_id

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-elasticache-sg"
  })
}

# Allow Redis access from EKS nodes
resource "aws_security_group_rule" "elasticache_ingress_from_nodes" {
  description              = "Allow Redis access from EKS nodes"
  security_group_id        = aws_security_group.elasticache.id
  protocol                 = "tcp"
  from_port                = 6379
  to_port                  = 6379
  source_security_group_id = var.security_group_ids[0]
  type                     = "ingress"
}
# ------------------------------------------------------------------------------
# ElastiCache Parameter Group
# ------------------------------------------------------------------------------

resource "aws_elasticache_parameter_group" "openclaw" {
  family = "redis7"
  name   = "${local.name_prefix}-redis-params"

  parameter {
    name  = "maxmemory-policy"
    value = "allkeys-lru"
  }

  parameter {
    name  = "timeout"
    value = "300"
  }

  parameter {
    name  = "tcp-keepalive"
    value = "60"
  }

  parameter {
    name  = "slowlog-log-slower-than"
    value = "10000"
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-redis-params"
  })

  lifecycle {
    create_before_destroy = true
  }
}
# ------------------------------------------------------------------------------
|
||||
# ElastiCache Redis Cluster (Replication Group)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_elasticache_replication_group" "openclaw" {
|
||||
replication_group_id = "${local.name_prefix}-redis"
|
||||
description = "ElastiCache Redis cluster for OpenClaw"
|
||||
node_type = var.redis_node_type
|
||||
num_cache_clusters = var.redis_automatic_failover_enabled ? 2 : var.redis_num_cache_nodes
|
||||
engine = "redis"
|
||||
engine_version = var.redis_engine_version
|
||||
parameter_group_name = aws_elasticache_parameter_group.openclaw.name
|
||||
subnet_group_name = aws_elasticache_subnet_group.openclaw.name
|
||||
security_group_ids = [aws_security_group.elasticache.id]
|
||||
|
||||
# Authentication
|
||||
auth_token = var.redis_auth_token
|
||||
at_rest_encryption_enabled = true
|
||||
transit_encryption_enabled = true
|
||||
|
||||
# High availability
|
||||
automatic_failover_enabled = var.redis_automatic_failover_enabled
|
||||
multi_az_enabled = var.redis_multi_az_enabled
|
||||
|
||||
# Persistence
|
||||
snapshot_retention_limit = var.environment == "prod" ? 7 : 0
|
||||
snapshot_window = "03:00-04:00"
|
||||
maintenance_window = "Mon:04:00-Mon:05:00"
|
||||
|
||||
# Notifications
|
||||
notification_topic_arn = var.alarm_notification_arn
|
||||
|
||||
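# Monitoring (slow log is delivered only when the prod log group exists)
|
||||
dynamic "log_delivery_configuration" {
|
||||
for_each = aws_cloudwatch_log_group.slowlog
|
||||
content {
|
||||
destination = log_delivery_configuration.value.name
|
||||
destination_type = "cloudwatch-logs"
|
||||
log_format = "json"
|
||||
log_type = "slow-log"
|
||||
}
|
||||
}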
|
||||
|
||||
tags = merge(local.common_tags, {
|
||||
Name = "${local.name_prefix}-redis"
|
||||
})
|
||||
|
||||
lifecycle {
|
||||
prevent_destroy = true
|
||||
}
|
||||
}
|
||||
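|
||||
# Example: high-availability Redis settings in terraform.tfvars (a sketch;
|
||||
# the values are illustrative assumptions, not sizing guidance). AWS rejects
|
||||
# multi_az_enabled = true unless automatic failover is also enabled, and
|
||||
# automatic failover requires at least two nodes (a primary and a replica).
|
||||
#
|
||||
#   redis_automatic_failover_enabled = true
|
||||
#   redis_multi_az_enabled           = true
|
||||
#   redis_num_cache_nodes            = 3
|
||||
#   redis_node_type                  = "cache.r7g.large"
|
||||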
|
||||
# ------------------------------------------------------------------------------
|
||||
# CloudWatch Log Group for Slow Query Log
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_cloudwatch_log_group" "slowlog" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
name = "/aws/elasticache/${local.name_prefix}-slowlog"
|
||||
retention_in_days = 30
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# ElastiCache Global Datastore (Cross-Region Replication for DR)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_elasticache_global_replication_group" "openclaw" {
|
||||
count = var.environment == "prod" && var.redis_multi_az_enabled ? 1 : 0
|
||||
|
||||
global_replication_group_id_suffix = "${local.name_prefix}-global"
|
||||
primary_replication_group_id = aws_elasticache_replication_group.openclaw.id
|
||||
}
|
||||
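|
||||
# The global datastore above only provides DR once a secondary replication
|
||||
# group in another region joins it. A sketch, assuming a hypothetical aliased
|
||||
# provider "aws.dr" for the secondary region (not defined in this module); the
|
||||
# secondary also needs its own subnet group and security group, omitted here.
|
||||
#
|
||||
# resource "aws_elasticache_replication_group" "openclaw_dr" {
|
||||
#   provider                    = aws.dr
|
||||
#   count                       = length(aws_elasticache_global_replication_group.openclaw)
|
||||
#   replication_group_id        = "${local.name_prefix}-redis-dr"
|
||||
#   description                 = "OpenClaw Redis DR secondary"
|
||||
#   global_replication_group_id = aws_elasticache_global_replication_group.openclaw[0].global_replication_group_id
|
||||
#   num_cache_clusters          = 2
|
||||
# }
|
||||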
|
||||
# ------------------------------------------------------------------------------
|
||||
# ElastiCache Serverless (dev only; provisioned in addition to the replication group)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_elasticache_serverless_cache" "openclaw" {
|
||||
count = var.environment == "dev" ? 1 : 0
|
||||
|
||||
name = "${local.name_prefix}-redis-serverless"
|
||||
engine = "redis"
|
||||
subnet_ids = var.subnet_ids
|
||||
security_group_ids = [aws_security_group.elasticache.id]
|
||||
|
||||
major_engine_version = "7"
|
||||
|
||||
description = "Serverless Redis cache for development environment"
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# CloudWatch Alarms for ElastiCache
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "elasticache_cpu" {
|
||||
count = var.enable_cloudwatch_alarms ? 1 : 0
|
||||
|
||||
alarm_name = "${local.name_prefix}-redis-cpu-utilization"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "CPUUtilization"
|
||||
namespace = "AWS/ElastiCache"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 80
|
||||
alarm_description = "Redis CPU utilization is too high"
|
||||
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
|
||||
dimensions = {
|
||||
CacheClusterId = tolist(aws_elasticache_replication_group.openclaw.member_clusters)[0] # first node; does not track the primary after failover
|
||||
}
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "elasticache_memory" {
|
||||
count = var.enable_cloudwatch_alarms ? 1 : 0
|
||||
|
||||
alarm_name = "${local.name_prefix}-redis-memory-utilization"
|
||||
evaluation_periods = 2
|
||||
metric_name = "FreeableMemory"
|
||||
namespace = "AWS/ElastiCache"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 268435456 # 256MB
|
||||
comparison_operator = "LessThanThreshold"
|
||||
alarm_description = "Redis freeable memory is too low"
|
||||
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
|
||||
dimensions = {
|
||||
CacheClusterId = tolist(aws_elasticache_replication_group.openclaw.member_clusters)[0] # first node; does not track the primary after failover
|
||||
}
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
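|
||||
# FreeableMemory measures free memory on the host. To alarm on how full the
|
||||
# Redis dataset is relative to maxmemory instead, a sketch of the fields to
|
||||
# swap into the alarm above (not applied here; the 80% threshold is an
|
||||
# illustrative assumption):
|
||||
#
|
||||
#   metric_name         = "DatabaseMemoryUsagePercentage"
|
||||
#   comparison_operator = "GreaterThanThreshold"
|
||||
#   threshold           = 80
|
||||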
|
||||
resource "aws_cloudwatch_metric_alarm" "elasticache_connections" {
|
||||
count = var.enable_cloudwatch_alarms ? 1 : 0
|
||||
|
||||
alarm_name = "${local.name_prefix}-redis-connections"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "CurrConnections"
|
||||
namespace = "AWS/ElastiCache"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 1000
|
||||
alarm_description = "Redis current connections is too high"
|
||||
alarm_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
ok_actions = var.alarm_notification_arn != null ? [var.alarm_notification_arn] : []
|
||||
|
||||
dimensions = {
|
||||
CacheClusterId = tolist(aws_elasticache_replication_group.openclaw.member_clusters)[0] # first node; does not track the primary after failover
|
||||
}
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
@@ -0,0 +1,368 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - AWS Terraform Configuration
|
||||
# ==============================================================================
|
||||
# Main configuration file for AWS infrastructure
|
||||
# ==============================================================================
|
||||
|
||||
terraform {
|
||||
required_version = ">= 1.6.0"
|
||||
|
||||
required_providers {
|
||||
aws = {
|
||||
source = "hashicorp/aws"
|
||||
version = "~> 5.0"
|
||||
}
|
||||
kubernetes = {
|
||||
source = "hashicorp/kubernetes"
|
||||
version = "~> 2.24"
|
||||
}
|
||||
helm = {
|
||||
source = "hashicorp/helm"
|
||||
version = "~> 2.12"
|
||||
}
|
||||
null = {
|
||||
source = "hashicorp/null"
|
||||
version = "~> 3.2"
|
||||
}
|
||||
random = {
|
||||
source = "hashicorp/random"
|
||||
version = "~> 3.6"
|
||||
}
|
||||
}
|
||||
|
||||
backend "s3" {
|
||||
# Backend blocks cannot reference variables; pass these settings at init time with -backend-config
|
||||
# bucket = "terraform-state-bucket"
|
||||
# key = "openclaw/terraform.tfstate"
|
||||
# region = "us-east-1"
|
||||
# encrypt = true
|
||||
# dynamodb_table = "terraform-locks"
|
||||
}
|
||||
}
|
||||
|
||||
provider "aws" {
|
||||
region = var.aws_region
|
||||
|
||||
default_tags {
|
||||
tags = {
|
||||
Project = "openclaw"
|
||||
Environment = var.environment
|
||||
ManagedBy = "terraform"
|
||||
Owner = var.owner
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
provider "kubernetes" {
|
||||
host = aws_eks_cluster.openclaw_cluster.endpoint
|
||||
cluster_ca_certificate = base64decode(aws_eks_cluster.openclaw_cluster.certificate_authority[0].data)
|
||||
token = data.aws_eks_cluster_auth.openclaw_cluster.token
|
||||
}
|
||||
|
||||
provider "helm" {
|
||||
kubernetes {
|
||||
host = aws_eks_cluster.openclaw_cluster.endpoint
|
||||
cluster_ca_certificate = base64decode(aws_eks_cluster.openclaw_cluster.certificate_authority[0].data)
|
||||
token = data.aws_eks_cluster_auth.openclaw_cluster.token
|
||||
}
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# Data Sources
|
||||
# ==============================================================================
|
||||
|
||||
data "aws_eks_cluster_auth" "openclaw_cluster" {
|
||||
name = aws_eks_cluster.openclaw_cluster.name
|
||||
}
|
||||
|
||||
data "aws_availability_zones" "available" {
|
||||
state = "available"
|
||||
}
|
||||
|
||||
data "aws_caller_identity" "current" {}
|
||||
|
||||
data "aws_partition" "current" {}
|
||||
|
||||
# ==============================================================================
|
||||
# Local Values
|
||||
# ==============================================================================
|
||||
|
||||
locals {
|
||||
name_prefix = "openclaw-${var.environment}"
|
||||
|
||||
common_tags = {
|
||||
Project = "openclaw"
|
||||
Environment = var.environment
|
||||
Version = var.app_version
|
||||
ManagedBy = "terraform"
|
||||
}
|
||||
|
||||
gpu_instance_types = var.enable_gpu_support ? var.gpu_instance_types : []
|
||||
|
||||
# ECR repository URLs (exposed by the ecr module)
|
||||
ecr_repository_urls = module.ecr.repository_urls
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# Random Resources
|
||||
# ==============================================================================
|
||||
|
||||
resource "random_string" "suffix" {
|
||||
length = 8
|
||||
special = false
|
||||
upper = false
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# VPC Module
|
||||
# ==============================================================================
|
||||
|
||||
module "vpc" {
|
||||
source = "./vpc"
|
||||
|
||||
vpc_cidr = var.vpc_cidr
|
||||
aws_region = var.aws_region
|
||||
availability_zones = slice(data.aws_availability_zones.available.names, 0, 3)
|
||||
name_prefix = local.name_prefix
|
||||
enable_nat_gateway = var.enable_nat_gateway
|
||||
single_nat_gateway = var.single_nat_gateway
|
||||
enable_flow_logs = var.enable_vpc_flow_logs
|
||||
flow_logs_retention = var.flow_logs_retention_days
|
||||
public_subnet_cidrs = var.public_subnet_cidrs
|
||||
private_subnet_cidrs = var.private_subnet_cidrs
|
||||
database_subnet_cidrs = var.database_subnet_cidrs
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# EKS Cluster
|
||||
# ==============================================================================
|
||||
|
||||
module "eks" {
|
||||
source = "./eks"
|
||||
|
||||
cluster_name = "${local.name_prefix}-eks"
|
||||
cluster_version = var.eks_version
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.private_subnet_ids
|
||||
|
||||
# Control plane configuration
|
||||
enable_irsa = var.enable_irsa
|
||||
enable_cluster_autoscaler = var.enable_cluster_autoscaler
|
||||
|
||||
# Node group configuration
|
||||
node_groups = var.node_groups
|
||||
gpu_node_groups = local.gpu_instance_types
|
||||
gpu_enabled = var.enable_gpu_support
|
||||
|
||||
# Addons
|
||||
enable_aws_load_balancer_controller = true
|
||||
enable_metrics_server = true
|
||||
enable_cluster_autoscaler_addon = var.enable_cluster_autoscaler
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# RDS PostgreSQL
|
||||
# ==============================================================================
|
||||
|
||||
module "rds" {
|
||||
source = "./rds"
|
||||
|
||||
identifier_prefix = "${local.name_prefix}-pg"
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.database_subnet_ids
|
||||
security_group_ids = [module.eks.node_security_group_id]
|
||||
|
||||
# Database configuration
|
||||
engine_version = var.postgresql_version
|
||||
instance_class = var.db_instance_class
|
||||
allocated_storage = var.db_allocated_storage
|
||||
max_allocated_storage = var.db_max_allocated_storage
|
||||
|
||||
# Authentication
|
||||
db_name = var.db_name
|
||||
db_username = var.db_username
|
||||
db_password = var.db_password
|
||||
db_password_kms_key_id = var.db_password_kms_key_id
|
||||
|
||||
# High availability
|
||||
multi_az = var.db_multi_az
|
||||
publicly_accessible = false
|
||||
|
||||
# Backup and maintenance
|
||||
backup_retention_period = var.db_backup_retention_period
|
||||
backup_window = var.db_backup_window
|
||||
maintenance_window = var.db_maintenance_window
|
||||
|
||||
# Monitoring
|
||||
enabled_cloudwatch_logs_exports = ["postgresql"]
|
||||
performance_insights_enabled = var.db_performance_insights_enabled
|
||||
performance_insights_retention_period = var.db_performance_insights_retention
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# ElastiCache Redis
|
||||
# ==============================================================================
|
||||
|
||||
module "elasticache" {
|
||||
source = "./elasticache"
|
||||
|
||||
cache_cluster_id = "${local.name_prefix}-redis"
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.private_subnet_ids
|
||||
security_group_ids = [module.eks.node_security_group_id]
|
||||
|
||||
# Redis configuration
|
||||
node_type = var.redis_node_type
|
||||
engine_version = var.redis_engine_version
|
||||
num_cache_nodes = var.redis_num_cache_nodes
|
||||
parameter_group_name = var.redis_parameter_group_name
|
||||
|
||||
# High availability
|
||||
automatic_failover_enabled = var.redis_automatic_failover_enabled
|
||||
multi_az_enabled = var.redis_multi_az_enabled
|
||||
|
||||
# Security
|
||||
auth_token = var.redis_auth_token
|
||||
auth_token_kms_key_id = var.redis_auth_token_kms_key_id
|
||||
at_rest_encryption_enabled = true
|
||||
transit_encryption_enabled = true
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# ECR Repositories
|
||||
# ==============================================================================
|
||||
|
||||
module "ecr" {
|
||||
source = "./ecr"
|
||||
|
||||
repositories = {
|
||||
openclaw_gateway = {
|
||||
name = "openclaw-gateway"
|
||||
image_tag_mutability = "MUTABLE"
|
||||
scan_on_push = true
|
||||
}
|
||||
litellm_proxy = {
|
||||
name = "litellm-proxy"
|
||||
image_tag_mutability = "MUTABLE"
|
||||
scan_on_push = true
|
||||
}
|
||||
}
|
||||
|
||||
lifecycle_policy_enabled = true
|
||||
lifecycle_policy_days = 30
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# Application Load Balancer
|
||||
# ==============================================================================
|
||||
|
||||
module "alb" {
|
||||
source = "./alb"
|
||||
|
||||
alb_name = "${local.name_prefix}-alb"
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.public_subnet_ids
|
||||
security_group_ids = [module.eks.node_security_group_id]
|
||||
|
||||
# Listener configuration
|
||||
http_port = 80
|
||||
https_port = 443
|
||||
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
|
||||
certificate_arn = var.acm_certificate_arn
|
||||
|
||||
# Target groups
|
||||
target_groups = [
|
||||
{
|
||||
name = "openclaw-gateway"
|
||||
port = 18789
|
||||
protocol = "HTTP"
|
||||
health_check_path = "/health"
|
||||
},
|
||||
{
|
||||
name = "litellm-proxy"
|
||||
port = 4000
|
||||
protocol = "HTTP"
|
||||
health_check_path = "/health"
|
||||
}
|
||||
]
|
||||
|
||||
enable_deletion_protection = var.alb_deletion_protection
|
||||
enable_http2 = true
|
||||
drop_invalid_header_fields = true
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# CloudWatch Monitoring
|
||||
# ==============================================================================
|
||||
|
||||
module "cloudwatch" {
|
||||
source = "./monitoring"
|
||||
|
||||
name_prefix = local.name_prefix
|
||||
eks_cluster_name = aws_eks_cluster.openclaw_cluster.name
|
||||
rds_identifier = module.rds.db_instance_identifier
|
||||
redis_cluster_id = module.elasticache.redis_cluster_id
|
||||
|
||||
# Dashboard configuration
|
||||
enable_dashboard = true
|
||||
dashboard_name = "${local.name_prefix}-dashboard"
|
||||
|
||||
# Alarm configuration
|
||||
enable_alarms = var.enable_cloudwatch_alarms
|
||||
alarm_notification_arn = var.alarm_notification_arn
|
||||
|
||||
# Log groups
|
||||
log_retention_days = var.log_retention_days
|
||||
|
||||
tags = local.common_tags
|
||||
}
|
||||
|
||||
# ==============================================================================
|
||||
# Outputs
|
||||
# ==============================================================================
|
||||
# Outputs are declared in outputs.tf. Declaring the same output names here
|
||||
# (vpc_id, eks_cluster_endpoint, and the rest) fails with a duplicate output definition error.
|
||||
@@ -0,0 +1,263 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - AWS Terraform Outputs
|
||||
# ==============================================================================
|
||||
# Output values for AWS infrastructure
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# VPC Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "vpc_id" {
|
||||
description = "The ID of the VPC"
|
||||
value = module.vpc.vpc_id
|
||||
}
|
||||
|
||||
output "vpc_cidr_block" {
|
||||
description = "The CIDR block of the VPC"
|
||||
value = module.vpc.vpc_cidr_block
|
||||
}
|
||||
|
||||
output "public_subnet_ids" {
|
||||
description = "List of public subnet IDs"
|
||||
value = module.vpc.public_subnet_ids
|
||||
}
|
||||
|
||||
output "private_subnet_ids" {
|
||||
description = "List of private subnet IDs"
|
||||
value = module.vpc.private_subnet_ids
|
||||
}
|
||||
|
||||
output "database_subnet_ids" {
|
||||
description = "List of database subnet IDs"
|
||||
value = module.vpc.database_subnet_ids
|
||||
}
|
||||
|
||||
output "nat_gateway_ids" {
|
||||
description = "List of NAT Gateway IDs"
|
||||
value = module.vpc.nat_gateway_ids
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# EKS Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "eks_cluster_id" {
|
||||
description = "The ID of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.id
|
||||
}
|
||||
|
||||
output "eks_cluster_name" {
|
||||
description = "The name of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.name
|
||||
}
|
||||
|
||||
output "eks_cluster_endpoint" {
|
||||
description = "The endpoint of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.endpoint
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "eks_cluster_certificate_authority" {
|
||||
description = "The certificate authority of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.certificate_authority[0].data
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "eks_cluster_version" {
|
||||
description = "The Kubernetes version of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.version
|
||||
}
|
||||
|
||||
output "eks_cluster_security_group_id" {
|
||||
description = "The security group ID of the EKS cluster"
|
||||
value = aws_eks_cluster.openclaw_cluster.vpc_config[0].cluster_security_group_id
|
||||
}
|
||||
|
||||
output "eks_node_security_group_id" {
|
||||
description = "The node security group ID"
|
||||
value = module.eks.node_security_group_id
|
||||
}
|
||||
|
||||
output "eks_oidc_provider_arn" {
|
||||
description = "The ARN of the OIDC provider"
|
||||
value = aws_iam_openid_connect_provider.eks.arn
|
||||
}
|
||||
|
||||
output "eks_kubeconfig_command" {
|
||||
description = "Command to update kubeconfig"
|
||||
value = "aws eks update-kubeconfig --name ${aws_eks_cluster.openclaw_cluster.name} --region ${var.aws_region}"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# RDS PostgreSQL Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "rds_instance_id" {
|
||||
description = "The ID of the RDS instance"
|
||||
value = module.rds.db_instance_id
|
||||
}
|
||||
|
||||
output "rds_instance_arn" {
|
||||
description = "The ARN of the RDS instance"
|
||||
value = module.rds.db_instance_arn
|
||||
}
|
||||
|
||||
output "rds_endpoint" {
|
||||
description = "The endpoint of the RDS instance"
|
||||
value = module.rds.db_instance_endpoint
|
||||
}
|
||||
|
||||
output "rds_port" {
|
||||
description = "The port of the RDS instance"
|
||||
value = module.rds.db_instance_port
|
||||
}
|
||||
|
||||
output "rds_database_name" {
|
||||
description = "The name of the database"
|
||||
value = module.rds.db_name
|
||||
}
|
||||
|
||||
output "rds_username" {
|
||||
description = "The master username of the RDS instance"
|
||||
value = module.rds.db_username
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "rds_connection_string" {
|
||||
description = "The PostgreSQL connection string"
|
||||
value = "postgresql://${module.rds.db_username}:${urlencode(var.db_password)}@${module.rds.db_instance_endpoint}/${module.rds.db_name}"
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "rds_security_group_id" {
|
||||
description = "The security group ID of the RDS instance"
|
||||
value = module.rds.db_security_group_id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# ElastiCache Redis Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "redis_cluster_id" {
|
||||
description = "The ID of the Redis cluster"
|
||||
value = module.elasticache.redis_cluster_id
|
||||
}
|
||||
|
||||
output "redis_endpoint" {
|
||||
description = "The endpoint of the Redis cluster"
|
||||
value = module.elasticache.redis_endpoint
|
||||
}
|
||||
|
||||
output "redis_port" {
|
||||
description = "The port of the Redis cluster"
|
||||
value = module.elasticache.redis_port
|
||||
}
|
||||
|
||||
output "redis_connection_string" {
|
||||
description = "The Redis connection string (rediss:// because transit encryption is enabled)"
|
||||
value = "rediss://${var.redis_auth_token != null ? ":${urlencode(var.redis_auth_token)}@" : ""}${module.elasticache.redis_endpoint}:${module.elasticache.redis_port}"
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "redis_security_group_id" {
|
||||
description = "The security group ID of the Redis cluster"
|
||||
value = module.elasticache.redis_security_group_id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# ECR Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "ecr_repository_arns" {
|
||||
description = "ARNs of ECR repositories"
|
||||
value = module.ecr.repository_arns
|
||||
}
|
||||
|
||||
output "ecr_repository_urls" {
|
||||
description = "URLs of ECR repositories"
|
||||
value = module.ecr.repository_urls
|
||||
}
|
||||
|
||||
output "ecr_registry_id" {
|
||||
description = "ECR registry ID"
|
||||
value = module.ecr.registry_id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# ALB Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "alb_id" {
|
||||
description = "The ID of the ALB"
|
||||
value = module.alb.alb_id
|
||||
}
|
||||
|
||||
output "alb_arn" {
|
||||
description = "The ARN of the ALB"
|
||||
value = module.alb.alb_arn
|
||||
}
|
||||
|
||||
output "alb_dns_name" {
|
||||
description = "The DNS name of the ALB"
|
||||
value = module.alb.alb_dns_name
|
||||
}
|
||||
|
||||
output "alb_zone_id" {
|
||||
description = "The Zone ID of the ALB"
|
||||
value = module.alb.alb_zone_id
|
||||
}
|
||||
|
||||
output "alb_security_group_id" {
|
||||
description = "The security group ID of the ALB"
|
||||
value = module.alb.alb_security_group_id
|
||||
}
|
||||
|
||||
output "alb_http_listener_arn" {
|
||||
description = "The ARN of the HTTP listener"
|
||||
value = module.alb.http_listener_arn
|
||||
}
|
||||
|
||||
output "alb_https_listener_arn" {
|
||||
description = "The ARN of the HTTPS listener"
|
||||
value = module.alb.https_listener_arn
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# CloudWatch Outputs
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "cloudwatch_dashboard_arn" {
|
||||
description = "The ARN of the CloudWatch dashboard"
|
||||
value = module.cloudwatch.dashboard_arn
|
||||
}
|
||||
|
||||
output "cloudwatch_log_groups" {
|
||||
description = "Map of CloudWatch log group names"
|
||||
value = module.cloudwatch.log_group_names
|
||||
}
|
||||
|
||||
output "cloudwatch_alarm_arns" {
|
||||
description = "List of CloudWatch alarm ARNs"
|
||||
value = module.cloudwatch.alarm_arns
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Cost Estimation
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
output "estimated_monthly_cost" {
|
||||
description = "Estimated monthly cost breakdown"
|
||||
value = {
|
||||
eks_cluster = "~$73 (control plane)"
|
||||
eks_nodes_general = format("~$%d (general nodes)", var.node_groups.general.desired_size * 140)
|
||||
eks_nodes_compute = format("~$%d (compute nodes)", var.node_groups.compute.desired_size * 250)
|
||||
eks_nodes_gpu = var.enable_gpu_support ? format("~$%d (GPU nodes)", 2 * 2000) : "$0"
|
||||
rds_postgresql = format("~$%d (%s)", var.db_multi_az ? 250 : 125, var.db_instance_class)
|
||||
elasticache_redis = format("~$%d (%s)", var.redis_multi_az_enabled ? 150 : 75, var.redis_node_type)
|
||||
nat_gateway = var.single_nat_gateway ? "~$32" : "~$64"
|
||||
alb = "~$16"
|
||||
data_transfer = "Variable"
|
||||
total_estimate = "See AWS Cost Explorer for accurate pricing"
|
||||
}
|
||||
}
|
||||
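|
||||
# Worked example (illustrative node counts, not module defaults): with
|
||||
# node_groups.general.desired_size = 3 and node_groups.compute.desired_size = 2,
|
||||
# the node estimates are 3 x 140 = $420 and 2 x 250 = $500, so the EKS control
|
||||
# plane plus worker nodes come to roughly 73 + 420 + 500 = $993 per month,
|
||||
# before RDS, Redis, NAT, ALB, and data transfer.
|
||||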
@@ -0,0 +1,318 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - AWS RDS PostgreSQL Configuration
|
||||
# ==============================================================================
|
||||
# RDS PostgreSQL database for OpenClaw
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# RDS Subnet Group
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_db_subnet_group" "openclaw" {
|
||||
name = "${local.name_prefix}-db-subnet-group"
|
||||
subnet_ids = var.database_subnet_ids
|
||||
|
||||
tags = merge(local.common_tags, {
|
||||
Name = "${local.name_prefix}-db-subnet-group"
|
||||
})
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# RDS Security Group
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_security_group" "rds" {
|
||||
name = "${local.name_prefix}-rds-sg"
|
||||
description = "Security group for RDS PostgreSQL"
|
||||
vpc_id = var.vpc_id
|
||||
|
||||
tags = merge(local.common_tags, {
|
||||
Name = "${local.name_prefix}-rds-sg"
|
||||
})
|
||||
}
|
||||
|
||||
# Allow PostgreSQL access from EKS nodes
|
||||
resource "aws_security_group_rule" "rds_ingress_from_nodes" {
|
||||
description = "Allow PostgreSQL access from EKS nodes"
|
||||
security_group_id = aws_security_group.rds.id
|
||||
protocol = "tcp"
|
||||
from_port = 5432
|
||||
to_port = 5432
|
||||
source_security_group_id = var.security_group_ids[0]
|
||||
type = "ingress"
|
||||
}
|
||||
|
||||
# Allow outbound HTTPS (TCP 443) only
|
||||
resource "aws_security_group_rule" "rds_egress" {
|
||||
description = "Allow outbound traffic"
|
||||
security_group_id = aws_security_group.rds.id
|
||||
protocol = "tcp"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
type = "egress"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# RDS PostgreSQL Instance
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_db_instance" "openclaw" {
|
||||
identifier = "${local.name_prefix}-pg"
|
||||
|
||||
# Engine configuration
|
||||
engine = "postgres"
|
||||
engine_version = var.postgresql_version
|
||||
instance_class = var.db_instance_class
|
||||
allocated_storage = var.db_allocated_storage
|
||||
max_allocated_storage = var.db_max_allocated_storage
|
||||
storage_type = "gp3"
|
||||
storage_encrypted = true
|
||||
kms_key_id = var.db_password_kms_key_id
|
||||
|
||||
# Database configuration
|
||||
db_name = var.db_name
|
||||
username = var.db_username
|
||||
password = var.db_password
|
||||
|
||||
# Network configuration
|
||||
db_subnet_group_name = aws_db_subnet_group.openclaw.name
|
||||
vpc_security_group_ids = [aws_security_group.rds.id]
|
||||
publicly_accessible = false
|
||||
|
||||
# High availability
|
||||
multi_az = var.db_multi_az
|
||||
availability_zone = var.db_multi_az ? null : data.aws_availability_zones.available.names[0]
|
||||
|
||||
# Backup configuration
|
||||
backup_retention_period = var.db_backup_retention_period
|
||||
backup_window = var.db_backup_window
|
||||
maintenance_window = var.db_maintenance_window
|
||||
copy_tags_to_snapshot = true
|
||||
delete_automated_backups = var.environment == "dev"
|
||||
skip_final_snapshot = var.environment == "dev"
|
||||
final_snapshot_identifier = var.environment == "dev" ? null : "${local.name_prefix}-final-snapshot"
|
||||
|
||||
# Monitoring
|
||||
enabled_cloudwatch_logs_exports = ["postgresql"]
|
||||
performance_insights_enabled = var.db_performance_insights_enabled
|
||||
performance_insights_retention_period = var.db_performance_insights_enabled ? var.db_performance_insights_retention : null
|
||||
monitoring_interval = 60
|
||||
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
|
||||
|
||||
# Parameters
|
||||
parameter_group_name = aws_db_parameter_group.openclaw.name
|
||||
option_group_name = aws_db_option_group.openclaw.name
|
||||
|
||||
# Tags
|
||||
tags = merge(local.common_tags, {
|
||||
Name = "${local.name_prefix}-pg"
|
||||
})
|
||||
|
||||
lifecycle {
|
||||
prevent_destroy = true
|
||||
}
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# RDS Parameter Group
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "aws_db_parameter_group" "openclaw" {
|
||||
name = "${local.name_prefix}-pg-params"
|
||||
family = "postgres${split(".", var.postgresql_version)[0]}" # parameter group families use the major version only
|
||||
|
||||
parameter {
|
||||
name = "log_statement"
|
||||
value = "all"
|
||||
}
|
||||
|
||||
parameter {
|
||||
name = "log_min_duration_statement"
|
||||
value = "1000"
|
||||
}
|
||||
|
||||
parameter {
|
||||
name = "shared_preload_libraries"
|
||||
value = "pg_stat_statements"
|
||||
}
|
||||
|
||||
parameter {
|
||||
name = "pg_stat_statements.track"
|
||||
value = "all"
|
||||
}
|
||||
|
||||
tags = merge(local.common_tags, {
|
||||
Name = "${local.name_prefix}-pg-params"
|
||||
})
|
||||
}
|
||||
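
# pg_stat_statements usage (illustrative sketch)
# Loading the library through shared_preload_libraries is not enough on its
# own. The extension must also be created in each database that should
# expose the view. A minimal example, assuming psql access as the master
# user (adjust the query to your needs):
#
#   CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
#
#   SELECT query, calls, total_exec_time
#   FROM pg_stat_statements
#   ORDER BY total_exec_time DESC
#   LIMIT 5;
#
# total_exec_time is the column name in PostgreSQL 13 and later
# (older releases call it total_time).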

# ------------------------------------------------------------------------------
# RDS Option Group
# ------------------------------------------------------------------------------

resource "aws_db_option_group" "openclaw" {
  name = "${local.name_prefix}-pg-options"
  option_group_description = "Option group for OpenClaw PostgreSQL"
  engine_name = "postgres"
  major_engine_version = var.postgresql_version

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-pg-options"
  })
}

# ------------------------------------------------------------------------------
# RDS Monitoring IAM Role
# ------------------------------------------------------------------------------

resource "aws_iam_role" "rds_monitoring" {
  name = "${local.name_prefix}-rds-monitoring-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "monitoring.rds.amazonaws.com"
        }
      }
    ]
  })

  tags = local.common_tags
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}

# ------------------------------------------------------------------------------
# RDS Read Replica (Optional for Production)
# ------------------------------------------------------------------------------

resource "aws_db_instance" "openclaw_replica" {
  count = var.environment == "prod" && var.db_multi_az ? 1 : 0

  identifier = "${local.name_prefix}-pg-replica"
  replicate_source_db = aws_db_instance.openclaw.identifier
  instance_class = var.db_instance_class

  # Network configuration
  db_subnet_group_name = aws_db_subnet_group.openclaw.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  publicly_accessible = false

  # Backup configuration
  backup_retention_period = 0
  skip_final_snapshot = true

  # Monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-pg-replica"
    Role = "read-replica"
  })
}

# ------------------------------------------------------------------------------
# RDS Proxy (Optional for Connection Pooling)
# ------------------------------------------------------------------------------

resource "aws_secretsmanager_secret" "rds_credentials" {
  name = "${local.name_prefix}/rds/credentials"

  tags = local.common_tags
}

resource "aws_secretsmanager_secret_version" "rds_credentials" {
  secret_id = aws_secretsmanager_secret.rds_credentials.id
  secret_string = jsonencode({
    username = var.db_username
    password = var.db_password
    dbname = var.db_name
    host = aws_db_instance.openclaw.address
    port = aws_db_instance.openclaw.port
  })
}

resource "aws_db_proxy" "openclaw" {
  count = var.environment == "prod" ? 1 : 0

  name = "${local.name_prefix}-db-proxy"
  debug_logging = false
  engine_family = "POSTGRESQL"
  idle_client_timeout = 1800
  require_tls = true
  # The proxy role is created with count, so it must be indexed.
  role_arn = aws_iam_role.rds_proxy[0].arn
  vpc_security_group_ids = [aws_security_group.rds.id]
  # Database subnets defined in vpc.tf.
  vpc_subnet_ids = aws_subnet.database[*].id

  auth {
    auth_scheme = "SECRETS"
    iam_auth = "DISABLED"
    secret_arn = aws_secretsmanager_secret.rds_credentials.arn
    # How clients authenticate to the proxy. PostgreSQL accepts
    # POSTGRES_SCRAM_SHA_256 or POSTGRES_MD5.
    client_password_auth_type = "POSTGRES_SCRAM_SHA_256"
  }

  tags = local.common_tags
}

resource "aws_db_proxy_default_target_group" "openclaw" {
  count = var.environment == "prod" ? 1 : 0

  db_proxy_name = aws_db_proxy.openclaw[0].name

  connection_pool_config {
    connection_borrow_timeout = 120
    max_connections_percent = 100
    max_idle_connections_percent = 50
    # No init_query: the target is the writable primary, so proxied sessions
    # must not be forced into read-only mode. session_pinning_filters is
    # omitted because RDS Proxy supports it only for the MySQL engine family.
  }
}

resource "aws_db_proxy_target" "openclaw" {
  count = var.environment == "prod" ? 1 : 0

  db_instance_identifier = aws_db_instance.openclaw.identifier
  db_proxy_name = aws_db_proxy.openclaw[0].name
  target_group_name = aws_db_proxy_default_target_group.openclaw[0].name
}

# ------------------------------------------------------------------------------
# RDS Proxy IAM Role
# ------------------------------------------------------------------------------

resource "aws_iam_role" "rds_proxy" {
  count = var.environment == "prod" ? 1 : 0

  name = "${local.name_prefix}-rds-proxy-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "rds.amazonaws.com"
        }
      }
    ]
  })

  tags = local.common_tags
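}

# RDS Proxy only needs to read the credentials secret it authenticates with.
# If the secret is encrypted with a customer-managed KMS key, also allow
# kms:Decrypt on that key.
resource "aws_iam_role_policy" "rds_proxy" {
  count = var.environment == "prod" ? 1 : 0

  name = "${local.name_prefix}-rds-proxy-secrets"
  role = aws_iam_role.rds_proxy[0].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = ["secretsmanager:GetSecretValue"]
        Effect = "Allow"
        Resource = aws_secretsmanager_secret.rds_credentials.arn
      }
    ]
  })
}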
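
# ------------------------------------------------------------------------------
# Database Connection Endpoint (illustrative sketch)
# ------------------------------------------------------------------------------
# A minimal sketch, assuming applications should connect through RDS Proxy
# when it exists (prod) and directly to the instance otherwise. The local
# name db_connect_host is hypothetical and is not referenced elsewhere.
# one() returns null when the proxy's count is 0, so coalesce() falls back
# to the instance address. The proxy sets require_tls = true, so clients
# must connect over TLS (for example with sslmode=require).

locals {
  db_connect_host = coalesce(
    one(aws_db_proxy.openclaw[*].endpoint),
    aws_db_instance.openclaw.address,
  )
}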
@@ -0,0 +1,351 @@
# ==============================================================================
# Heretek OpenClaw - AWS Terraform Variables
# ==============================================================================
# Input variables for AWS infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------

variable "aws_region" {
  description = "AWS region for resources"
  type = string
  default = "us-east-1"
}

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type = string
  default = "dev"

  validation {
    condition = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "owner" {
  description = "Owner of the resources"
  type = string
  default = "platform-team"
}

variable "app_version" {
  description = "Application version to deploy"
  type = string
  default = "2026.3.28"
}

# ------------------------------------------------------------------------------
# VPC Configuration
# ------------------------------------------------------------------------------

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type = string
  default = "10.0.0.0/16"
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for public subnets"
  type = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for private subnets"
  type = list(string)
  default = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}

variable "database_subnet_cidrs" {
  description = "CIDR blocks for database subnets"
  type = list(string)
  default = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]
}

variable "enable_nat_gateway" {
  description = "Enable NAT Gateway for private subnets"
  type = bool
  default = true
}

variable "single_nat_gateway" {
  description = "Use a single NAT Gateway (cost optimization for dev)"
  type = bool
  default = false
}

variable "enable_vpc_flow_logs" {
  description = "Enable VPC Flow Logs"
  type = bool
  default = true
}

variable "flow_logs_retention_days" {
  description = "Retention period for VPC Flow Logs, in days"
  type = number
  default = 30
}

# ------------------------------------------------------------------------------
# EKS Configuration
# ------------------------------------------------------------------------------

variable "eks_version" {
  description = "Kubernetes version for EKS"
  type = string
  default = "1.28"
}

variable "enable_irsa" {
  description = "Enable IAM Roles for Service Accounts"
  type = bool
  default = true
}

variable "enable_cluster_autoscaler" {
  description = "Enable Cluster Autoscaler"
  type = bool
  default = true
}

variable "node_groups" {
  description = "EKS node group configurations"
  type = object({
    general = object({
      instance_types = list(string)
      min_size = number
      max_size = number
      desired_size = number
      disk_size = number
    })
    compute = object({
      instance_types = list(string)
      min_size = number
      max_size = number
      desired_size = number
      disk_size = number
    })
  })
  default = {
    general = {
      instance_types = ["m6i.xlarge", "m6i.2xlarge"]
      min_size = 1
      max_size = 4
      desired_size = 2
      disk_size = 50
    }
    compute = {
      instance_types = ["c6i.2xlarge", "c6i.4xlarge"]
      min_size = 1
      max_size = 8
      desired_size = 2
      disk_size = 100
    }
  }
}

variable "enable_gpu_support" {
  description = "Enable GPU node group for Ollama"
  type = bool
  default = false
}

variable "gpu_instance_types" {
  description = "GPU instance types for Ollama (G5 for NVIDIA)"
  type = list(string)
  default = ["g5.xlarge", "g5.2xlarge"]
}

# ------------------------------------------------------------------------------
# RDS PostgreSQL Configuration
# ------------------------------------------------------------------------------

variable "postgresql_version" {
  description = "PostgreSQL engine version"
  type = string
  default = "15"
}

variable "db_instance_class" {
  description = "RDS instance class"
  type = string
  default = "db.m6i.large"
}

variable "db_allocated_storage" {
  description = "Initial allocated storage in GB"
  type = number
  default = 50
}

variable "db_max_allocated_storage" {
  description = "Maximum allocated storage in GB"
  type = number
  default = 500
}

variable "db_name" {
  description = "Database name"
  type = string
  default = "openclaw"
}

variable "db_username" {
  description = "Database master username"
  type = string
  default = "openclaw"
  sensitive = true
}

variable "db_password" {
  description = "Database master password"
  type = string
  default = null
  sensitive = true
}

variable "db_password_kms_key_id" {
  description = "KMS key ID for encrypting db_password"
  type = string
  default = null
}

variable "db_multi_az" {
  description = "Enable Multi-AZ deployment"
  type = bool
  default = false
}

variable "db_backup_retention_period" {
  description = "Backup retention period in days"
  type = number
  default = 7
}

variable "db_backup_window" {
  description = "Preferred backup window"
  type = string
  default = "03:00-04:00"
}

variable "db_maintenance_window" {
  description = "Preferred maintenance window"
  type = string
  default = "Mon:04:00-Mon:05:00"
}

variable "db_performance_insights_enabled" {
  description = "Enable Performance Insights"
  type = bool
  default = true
}

variable "db_performance_insights_retention" {
  description = "Performance Insights retention period in days"
  type = number
  default = 7
}

# ------------------------------------------------------------------------------
# ElastiCache Redis Configuration
# ------------------------------------------------------------------------------

variable "redis_node_type" {
  description = "Redis node type"
  type = string
  default = "cache.m6i.large"
}

variable "redis_engine_version" {
  description = "Redis engine version"
  type = string
  default = "7.0"
}

variable "redis_num_cache_nodes" {
  description = "Number of cache nodes"
  type = number
  default = 1
}

variable "redis_parameter_group_name" {
  description = "Redis parameter group name"
  type = string
  default = "default.redis7"
}

variable "redis_automatic_failover_enabled" {
  description = "Enable automatic failover (requires at least two nodes: a primary and one replica)"
  type = bool
  default = false
}

variable "redis_multi_az_enabled" {
  description = "Enable Multi-AZ for Redis"
  type = bool
  default = false
}

variable "redis_auth_token" {
  description = "Redis authentication token"
  type = string
  default = null
  sensitive = true
}

variable "redis_auth_token_kms_key_id" {
  description = "KMS key ID for encrypting redis_auth_token"
  type = string
  default = null
}

# ------------------------------------------------------------------------------
# ECR Configuration
# ------------------------------------------------------------------------------

variable "lifecycle_policy_days" {
  description = "Days to retain images in ECR"
  type = number
  default = 30
}

# ------------------------------------------------------------------------------
# ALB Configuration
# ------------------------------------------------------------------------------

variable "acm_certificate_arn" {
  description = "ACM certificate ARN for HTTPS listener"
  type = string
  default = null
}

variable "alb_deletion_protection" {
  description = "Enable ALB deletion protection"
  type = bool
  default = true
}

# ------------------------------------------------------------------------------
# CloudWatch Configuration
# ------------------------------------------------------------------------------

variable "enable_cloudwatch_alarms" {
  description = "Enable CloudWatch alarms"
  type = bool
  default = true
}

variable "alarm_notification_arn" {
  description = "SNS topic ARN for alarm notifications"
  type = string
  default = null
}

variable "log_retention_days" {
  description = "CloudWatch Logs retention period, in days"
  type = number
  default = 30
}
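
# ------------------------------------------------------------------------------
# Example Production Overrides (illustrative sketch)
# ------------------------------------------------------------------------------
# A hypothetical prod.tfvars. The values are assumptions for illustration,
# not tuned recommendations. Setting environment = "prod" enables RDS Proxy
# in rds.tf, and combining it with db_multi_az = true also creates the read
# replica. Pass secrets such as db_password through the environment
# (TF_VAR_db_password) rather than committing them to a tfvars file.
#
#   environment = "prod"
#   single_nat_gateway = false
#   db_multi_az = true
#   db_backup_retention_period = 30
#   alb_deletion_protection = true
#
# Apply with: terraform apply -var-file=prod.tfvars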
@@ -0,0 +1,294 @@
# ==============================================================================
# Heretek OpenClaw - AWS VPC Configuration
# ==============================================================================
# VPC resources for OpenClaw infrastructure
# ==============================================================================

# The VPC resources are defined inline in this file for now and are intended
# to move into a ./vpc subdirectory module (see "VPC Module Structure").
# They create:
# - VPC with configurable CIDR
# - Public subnets across multiple AZs
# - Private subnets for application workloads
# - Database subnets for RDS
# - Internet Gateway
# - NAT Gateways (configurable)
# - Route tables
# - VPC Flow Logs
#
# Usage in main.tf once the module has been extracted:
# module "vpc" {
#   source = "./vpc"
#   ...
# }

# ------------------------------------------------------------------------------
# VPC Module Structure
# ------------------------------------------------------------------------------
#
# File: deploy/aws/terraform/vpc/main.tf
# File: deploy/aws/terraform/vpc/variables.tf
# File: deploy/aws/terraform/vpc/outputs.tf
#
# For now, we inline the VPC resources here for simplicity.
# In production, extract them into a separate module.

# ------------------------------------------------------------------------------
# VPC
# ------------------------------------------------------------------------------

resource "aws_vpc" "openclaw" {
  cidr_block = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support = true

  tags = {
    Name = "${local.name_prefix}-vpc"
  }
}

# ------------------------------------------------------------------------------
# Internet Gateway
# ------------------------------------------------------------------------------

resource "aws_internet_gateway" "openclaw" {
  vpc_id = aws_vpc.openclaw.id

  tags = {
    Name = "${local.name_prefix}-igw"
  }
}

# ------------------------------------------------------------------------------
# Public Subnets
# ------------------------------------------------------------------------------

resource "aws_subnet" "public" {
  count = length(var.public_subnet_cidrs)

  vpc_id = aws_vpc.openclaw.id
  cidr_block = var.public_subnet_cidrs[count.index]
  availability_zone = element(data.aws_availability_zones.available.names, count.index)
  map_public_ip_on_launch = true

  tags = {
    Name = "${local.name_prefix}-public-${count.index + 1}"
    Type = "public"
  }
}

# ------------------------------------------------------------------------------
# Private Subnets
# ------------------------------------------------------------------------------

resource "aws_subnet" "private" {
  count = length(var.private_subnet_cidrs)

  vpc_id = aws_vpc.openclaw.id
  cidr_block = var.private_subnet_cidrs[count.index]
  availability_zone = element(data.aws_availability_zones.available.names, count.index)

  tags = {
    Name = "${local.name_prefix}-private-${count.index + 1}"
    Type = "private"
  }
}

# ------------------------------------------------------------------------------
# Database Subnets
# ------------------------------------------------------------------------------

resource "aws_subnet" "database" {
  count = length(var.database_subnet_cidrs)

  vpc_id = aws_vpc.openclaw.id
  cidr_block = var.database_subnet_cidrs[count.index]
  availability_zone = element(data.aws_availability_zones.available.names, count.index)

  tags = {
    Name = "${local.name_prefix}-database-${count.index + 1}"
    Type = "database"
  }
}
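
# Subnet CIDR derivation (illustrative sketch)
# The default subnet CIDRs in variables.tf are /24 blocks carved out of the
# /16 VPC range. This minimal sketch derives them from var.vpc_cidr with
# cidrsubnet() instead of listing them by hand. The local names are
# hypothetical and nothing above references them. With the default vpc_cidr
# of 10.0.0.0/16, adding 8 prefix bits yields /24 subnets. Netnums 1-3,
# 10-12 and 20-22 reproduce the default public, private and database lists
# (for example, cidrsubnet("10.0.0.0/16", 8, 20) is "10.0.20.0/24").
locals {
  derived_public_subnet_cidrs = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 1 + i)]
  derived_private_subnet_cidrs = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 10 + i)]
  derived_database_subnet_cidrs = [for i in range(3) : cidrsubnet(var.vpc_cidr, 8, 20 + i)]
}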

# ------------------------------------------------------------------------------
# Database Subnet Group
# ------------------------------------------------------------------------------

resource "aws_db_subnet_group" "openclaw" {
  name = "${local.name_prefix}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "${local.name_prefix}-db-subnet-group"
  }
}

# ------------------------------------------------------------------------------
# Elastic IP for NAT Gateway
# ------------------------------------------------------------------------------

resource "aws_eip" "nat" {
  count = var.single_nat_gateway ? 1 : length(var.public_subnet_cidrs)
  domain = "vpc"

  tags = {
    Name = "${local.name_prefix}-nat-eip-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.openclaw]
}

# ------------------------------------------------------------------------------
# NAT Gateway
# ------------------------------------------------------------------------------

resource "aws_nat_gateway" "openclaw" {
  count = var.single_nat_gateway ? 1 : length(var.public_subnet_cidrs)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id = aws_subnet.public[count.index].id

  tags = {
    Name = "${local.name_prefix}-nat-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.openclaw]
}

# ------------------------------------------------------------------------------
# Route Tables
# ------------------------------------------------------------------------------

# Public route table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.openclaw.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.openclaw.id
  }

  tags = {
    Name = "${local.name_prefix}-public-rt"
    Type = "public"
  }
}

# Private route table
resource "aws_route_table" "private" {
  count = var.single_nat_gateway ? 1 : length(var.private_subnet_cidrs)

  vpc_id = aws_vpc.openclaw.id

  route {
    cidr_block = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.openclaw[var.single_nat_gateway ? 0 : count.index].id
  }

  tags = {
    Name = "${local.name_prefix}-private-rt-${count.index + 1}"
    Type = "private"
  }
}

# ------------------------------------------------------------------------------
# Route Table Associations
# ------------------------------------------------------------------------------

resource "aws_route_table_association" "public" {
  count = length(var.public_subnet_cidrs)

  subnet_id = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count = length(var.private_subnet_cidrs)

  subnet_id = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[var.single_nat_gateway ? 0 : count.index].id
}

resource "aws_route_table_association" "database" {
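  count = length(var.database_subnet_cidrs)

  subnet_id = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}

# Database route table: no default route, so the database subnets keep only
# the implicit local VPC route and have no path to the internet gateway.
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.openclaw.id

  tags = {
    Name = "${local.name_prefix}-database-rt"
    Type = "database"
  }
}

# ------------------------------------------------------------------------------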
# VPC Flow Logs
# ------------------------------------------------------------------------------

resource "aws_cloudwatch_log_group" "flow_logs" {
  count = var.enable_vpc_flow_logs ? 1 : 0

  name = "/aws/vpc/${local.name_prefix}-flow-logs"
  retention_in_days = var.flow_logs_retention_days

  tags = {
    Name = "${local.name_prefix}-flow-logs"
  }
}

resource "aws_flow_log" "openclaw" {
  count = var.enable_vpc_flow_logs ? 1 : 0

  iam_role_arn = aws_iam_role.flow_logs[0].arn
  log_destination = aws_cloudwatch_log_group.flow_logs[0].arn
  traffic_type = "ALL"
  vpc_id = aws_vpc.openclaw.id

  tags = {
    Name = "${local.name_prefix}-flow-log"
  }
}

resource "aws_iam_role" "flow_logs" {
  count = var.enable_vpc_flow_logs ? 1 : 0

  name = "${local.name_prefix}-flow-logs-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "vpc-flow-logs.amazonaws.com"
        }
      }
    ]
  })

  tags = {
    Name = "${local.name_prefix}-flow-logs-role"
  }
}

resource "aws_iam_role_policy" "flow_logs" {
  count = var.enable_vpc_flow_logs ? 1 : 0

  name = "${local.name_prefix}-flow-logs-policy"
  role = aws_iam_role.flow_logs[0].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogGroups",
          "logs:DescribeLogStreams"
        ]
        Effect = "Allow"
        Resource = "*"
      }
    ]
  })
}
@@ -0,0 +1,522 @@
# Azure Deployment Guide for Heretek OpenClaw

**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28

This guide provides comprehensive instructions for deploying Heretek OpenClaw on Microsoft Azure using Terraform Infrastructure as Code (IaC).

---

## Table of Contents

1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Cost Estimates](#cost-estimates)
5. [Quick Start](#quick-start)
6. [Configuration](#configuration)
7. [Deployment Steps](#deployment-steps)
8. [Post-Deployment](#post-deployment)
9. [GPU Support](#gpu-support)
10. [Monitoring](#monitoring)
11. [Backup & Recovery](#backup--recovery)
12. [Troubleshooting](#troubleshooting)

---

## Overview

This Terraform configuration deploys a production-ready OpenClaw environment on Azure with:

- **AKS (Azure Kubernetes Service)** - Managed Kubernetes cluster
- **Azure Database for PostgreSQL** - Flexible Server with pgvector support
- **Azure Cache for Redis** - Managed Redis for caching and sessions
- **Azure Container Registry (ACR)** - Private container registry
- **Application Gateway** - Traffic routing and SSL termination
- **Azure Monitor** - Metrics, logging, and alerting

### Components

| Component | Service | Purpose |
|-----------|---------|---------|
| Gateway | AKS | OpenClaw Gateway (port 18789) |
| LiteLLM | AKS | LLM proxy and routing (port 4000) |
| Database | Azure Database for PostgreSQL 15 | Primary data store with pgvector |
| Cache | Azure Cache for Redis | Session management, caching |
| Container Registry | ACR | Private image storage |
| Load Balancer | Application Gateway | HTTPS termination, routing |
| Monitoring | Azure Monitor | Metrics, logs, alerts |

---

## Prerequisites

### Required Tools

```bash
# Install Terraform from HashiCorp's official Homebrew tap (macOS)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# or download from https://www.terraform.io/downloads

# Install Azure CLI
brew install azure-cli  # macOS
# or follow https://docs.microsoft.com/en-us/cli/azure/install-azure-cli

# Install kubectl
brew install kubectl

# Install Helm
brew install helm
```

### Azure Account Setup

1. **Azure Subscription** - Active subscription with sufficient credits
2. **Service Principal** - Service principal with the Contributor role
3. **Budget Alert** - Set up cost alerts in Azure Cost Management

### Configure Azure Credentials

```bash
# Log in to Azure
az login

# Set subscription
az account set --subscription "YOUR_SUBSCRIPTION_ID"

# Create a service principal for Terraform; the output includes the
# appId, password and tenant values used below
az ad sp create-for-rbac --name "openclaw-terraform" --role contributor \
  --scopes /subscriptions/YOUR_SUBSCRIPTION_ID

# Set environment variables
export ARM_CLIENT_ID="your-app-id"
export ARM_CLIENT_SECRET="your-password"
export ARM_SUBSCRIPTION_ID="your-subscription-id"
export ARM_TENANT_ID="your-tenant-id"
```
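
The `azurerm` Terraform provider reads the `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET`, `ARM_SUBSCRIPTION_ID` and `ARM_TENANT_ID` variables set above, so the provider block itself does not need credentials. The following is a minimal sketch that assumes the 3.x provider series. The version constraint is an assumption; align it with the repository's own Terraform configuration.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

# No credentials in code: the provider picks up the ARM_* environment variables.
provider "azurerm" {
  features {}
}
```

Because authentication comes from the environment, the same configuration works locally and in CI without committing secrets to the repository.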

### Required Azure Permissions

| Service | Required Permissions |
|---------|---------------------|
| AKS | Contributor |
| Virtual Network | Network Contributor |
| PostgreSQL | PostgreSQL Server Contributor |
| Redis | Redis Cache Contributor |
| ACR | AcrPush |
| Application Gateway | Network Contributor |
| Key Vault | Key Vault Administrator |
| Monitor | Monitoring Contributor |

### Enable Required Resource Providers

```bash
az provider register --namespace Microsoft.ContainerService
az provider register --namespace Microsoft.DBforPostgreSQL
az provider register --namespace Microsoft.Cache
az provider register --namespace Microsoft.ContainerRegistry
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.KeyVault
```

---

## Architecture

```
                 ┌─────────────────────────────────────────────┐
                 │               Microsoft Azure               │
                 │                   East US                   │
                 └─────────────────────────────────────────────┘
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            │                           │                           │
            ▼                           ▼                           ▼
┌───────────────────────┐   ┌───────────────────────┐   ┌───────────────────────┐
│  Availability         │   │  Availability         │   │  Availability         │
│  Zone 1               │   │  Zone 2               │   │  Zone 3               │
│                       │   │                       │   │                       │
│  ┌────────────────┐   │   │  ┌────────────────┐   │   │  ┌────────────────┐   │
│  │ AKS Nodes      │   │   │  │ AKS Nodes      │   │   │  │ AKS Nodes      │   │
│  │ (System)       │   │   │  │ (User)         │   │   │  │ (GPU)          │   │
│  └────────────────┘   │   │  └────────────────┘   │   │  └────────────────┘   │
│                       │   │                       │   │                       │
│  ┌────────────────┐   │   │  ┌────────────────┐   │   │  ┌────────────────┐   │
│  │ PostgreSQL     │   │   │  │ Azure Cache    │   │   │  │ ACR            │   │
│  │ Flexible       │   │   │  │ for Redis      │   │   │  │                │   │
│  │ Server         │   │   │  │                │   │   │  │                │   │
│  └────────────────┘   │   │  └────────────────┘   │   │  └────────────────┘   │
└───────────────────────┘   └───────────────────────┘   └───────────────────────┘
            │                           │                           │
            └───────────────────────────┼───────────────────────────┘
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            │                           │                           │
            ▼                           ▼                           ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                              Application Gateway                              │
│                         (WAF_v2 with SSL Termination)                         │
└───────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                                 Azure Monitor                                 │
│                      (Log Analytics, Alerts, Dashboard)                       │
└───────────────────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
---
|
||||
|
||||
## Cost Estimates
|
||||
|
||||
### Development Environment
|
||||
|
||||
| Resource | Configuration | Monthly Cost (USD) |
|
||||
|----------|--------------|-------------------|
|
||||
| AKS Cluster | Standard | $73.00 |
|
||||
| AKS Nodes | 2x Standard_D4s_v3 | $280.00 |
|
||||
| PostgreSQL | General Purpose, 2 vCores, 100 GB | $150.00 |
|
||||
| Redis Cache | C2 Standard | $100.00 |
|
||||
| Application Gateway | Standard_v2 | $30.00 |
|
||||
| ACR | Standard | $10.00 |
|
||||
| Azure Monitor | Standard | $50.00 |
|
||||
| Network Egress | Estimated | $30.00 |
|
||||
| **Total** | | **~$723.00/month** |
|
||||
|
||||
### Production Environment
|
||||
|
||||
| Resource | Configuration | Monthly Cost (USD) |
|
||||
|----------|--------------|-------------------|
|
||||
| AKS Cluster | Standard | $73.00 |
|
||||
| AKS Nodes System | 3x Standard_D2s_v3 | $210.00 |
|
||||
| AKS Nodes User | 4x Standard_D8s_v3 | $1,120.00 |
|
||||
| AKS Nodes GPU | 2x Standard_NC4as_T4_v3 | $5,000.00 |
|
||||
| PostgreSQL | General Purpose, 4 vCores, zone-redundant HA, 200 GB | $400.00 |
|
||||
| Redis Cache | P1 Premium | $400.00 |
|
||||
| Application Gateway | WAF_v2 | $100.00 |
|
||||
| ACR | Premium | $50.00 |
|
||||
| Azure Monitor | Premium | $100.00 |
|
||||
| Key Vault | Standard | $5.00 |
|
||||
| Network Egress | Estimated | $150.00 |
|
||||
| **Total** | | **~$7,608.00/month** |
|
||||
|
||||
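> **Note:** GPU nodes account for most of the production estimate. Consider Spot node pools, or scale the GPU pool to zero outside working hours. All figures are approximate pay-as-you-go list prices; confirm current rates with the Azure Pricing Calculator before budgeting.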
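Most node-pool line items reduce to one calculation: hourly VM rate × node count × hours running, with Azure's pricing calculator assuming a 730-hour month. The helper below is a minimal sketch of that arithmetic. It contains no prices; the hourly rate is an input you look up yourself.

```bash
#!/usr/bin/env bash
# Estimate a node pool's monthly cost from an hourly VM rate.
# $1 = hourly rate (USD), $2 = node count, $3 = hours per month (default 730).
pool_monthly_cost() {
  local hourly_rate="$1" node_count="$2" hours="${3:-730}"
  awk -v r="$hourly_rate" -v n="$node_count" -v h="$hours" \
    'BEGIN { printf "%.2f\n", r * n * h }'
}

# Placeholder rate, not a real price:
# pool_monthly_cost 0.50 2        # always-on pool
# pool_monthly_cost 0.50 2 220    # ~10 h/day on 22 weekdays
```

Passing a smaller hour count is a quick way to compare an always-on GPU pool against a scheduled one.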
|
||||
|
||||
### Cost Optimization Tips
|
||||
|
||||
1. **Use Azure Reserved VM Instances** for predictable workloads (up to 72% savings)
|
||||
2. **Use Azure Spot VMs** for non-critical workloads
|
||||
3. **Enable AKS Cluster Autoscaler** to scale nodes based on demand
|
||||
4. **Use PostgreSQL Burstable SKU** for development environments
|
||||
5. **Enable Azure Cost Management budgets**
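Tips 2 and 3 can be combined in a dedicated GPU node pool. The sketch below assumes the Quick Start names (`openclaw-rg`, `openclaw-dev-aks`); the pool name `gpuspot` and the 0 to 2 node range are illustrative choices. `--spot-max-price -1` caps the price at the on-demand rate, and AKS adds the taint `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` to Spot pools, so workloads scheduled there need a matching toleration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a Spot-priced GPU pool that autoscales between 0 and 2 nodes.
# $1 = resource group, $2 = cluster name (defaults follow the Quick Start).
add_spot_gpu_pool() {
  local rg="${1:-openclaw-rg}" cluster="${2:-openclaw-dev-aks}"
  az aks nodepool add \
    --resource-group "$rg" \
    --cluster-name "$cluster" \
    --name gpuspot \
    --node-vm-size Standard_NC4as_T4_v3 \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --enable-cluster-autoscaler \
    --node-count 1 \
    --min-count 0 \
    --max-count 2 \
    --labels gpu=true workload-type=gpu \
    --node-taints "nvidia.com/gpu=true:NoSchedule"
}

# To enable the built-in autoscaler on an existing pool instead:
# az aks nodepool update --resource-group openclaw-rg --cluster-name openclaw-dev-aks \
#   --name <pool> --enable-cluster-autoscaler --min-count 1 --max-count 3
```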
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Clone Repository
|
||||
|
||||
```bash
|
||||
git clone https://github.com/Heretek-AI/heretek-openclaw.git
|
||||
cd heretek-openclaw/deploy/azure/terraform
|
||||
```
|
||||
|
||||
### Initialize Terraform
|
||||
|
||||
```bash
|
||||
terraform init
|
||||
```
|
||||
|
||||
### Create Terraform Variables File
|
||||
|
||||
```bash
|
||||
cat > terraform.tfvars <<EOF
|
||||
resource_group_name = "openclaw-rg"
|
||||
location = "eastus"
|
||||
environment = "dev"
|
||||
|
||||
vnet_address_space = ["10.0.0.0/16"]
|
||||
|
||||
# Placeholder secrets: replace them, and keep this file out of version control
db_administrator_login = "openclaw"
|
||||
db_administrator_password = "generate-secure-password"
|
||||
redis_password = "generate-secure-token"
|
||||
|
||||
# Optional: GPU support for Ollama
|
||||
enable_gpu_support = false
|
||||
|
||||
# Optional: DNS label for the Application Gateway public IP (<label>.<region>.cloudapp.azure.com)
|
||||
domain_name_label = "openclaw-dev"
|
||||
EOF
|
||||
```
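Rather than typing secrets into `terraform.tfvars`, you can generate them and pass them to Terraform through `TF_VAR_<name>` environment variables. This is a minimal sketch: the 32-character length is an arbitrary choice, and the character-class check reflects PostgreSQL Flexible Server's rule that passwords mix at least three character categories. Alphanumeric output also avoids URL-encoding trouble in connection strings. Terraform gives `terraform.tfvars` precedence over environment variables, so delete the two placeholder secret lines from that file when using this approach.

```bash
#!/usr/bin/env bash
# Produce an alphanumeric secret containing upper case, lower case and digits.
gen_password() {
  local length="${1:-32}" pw=""
  while :; do
    # 512 random bytes leave ~124 alphanumerics on average; keep the first $length.
    pw="$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-${length}")"
    if (( ${#pw} == length )) &&
       [[ "$pw" =~ [[:upper:]] && "$pw" =~ [[:lower:]] && "$pw" =~ [[:digit:]] ]]; then
      printf '%s\n' "$pw"
      return 0
    fi
  done
}

# Terraform maps TF_VAR_<name> environment variables onto input variables,
# so these values never have to be written into terraform.tfvars.
TF_VAR_db_administrator_password="$(gen_password)"
TF_VAR_redis_password="$(gen_password)"
export TF_VAR_db_administrator_password TF_VAR_redis_password
```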
|
||||
|
||||
### Plan and Apply
|
||||
|
||||
```bash
|
||||
# Review the plan
|
||||
terraform plan -out=tfplan
|
||||
|
||||
# Apply the configuration
|
||||
terraform apply tfplan
|
||||
```
|
||||
|
||||
### Configure kubectl
|
||||
|
||||
```bash
|
||||
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
|
||||
```
|
||||
|
||||
### Deploy OpenClaw to AKS
|
||||
|
||||
```bash
|
||||
cd ../../kubernetes
|
||||
kubectl apply -k overlays/dev
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Input Variables
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|----------|-------------|---------|----------|
|
||||
| `resource_group_name` | Resource group name | `openclaw-rg` | No |
|
||||
| `location` | Azure region | `eastus` | No |
|
||||
| `environment` | Environment name (`dev` or `prod`) | `dev` | No |
|
||||
| `vnet_address_space` | VNet CIDR | `["10.0.0.0/16"]` | No |
|
||||
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
|
||||
| `db_administrator_password` | PostgreSQL password | `null` | Yes |
|
||||
| `redis_password` | Redis password | `null` | Yes |
|
||||
| `domain_name_label` | DNS label for gateway | `null` | No |
|
||||
|
||||
### Environment-Specific Overrides
|
||||
|
||||
#### Development (`terraform.dev.tfvars`)
|
||||
|
||||
```hcl
|
||||
environment = "dev"
|
||||
db_geo_redundant_backup = false
|
||||
redis_sku_name = "Basic"
|
||||
enable_monitoring_alerts = false
|
||||
|
||||
default_node_pool = {
|
||||
name = "default"
|
||||
vm_size = "Standard_D2s_v3"
|
||||
node_count = 1
|
||||
min_count = 1
|
||||
max_count = 2
|
||||
enable_auto_scaling = true
|
||||
}
|
||||
```
|
||||
|
||||
#### Production (`terraform.prod.tfvars`)
|
||||
|
||||
```hcl
|
||||
environment = "prod"
|
||||
db_geo_redundant_backup = true
|
||||
redis_sku_name = "Premium"
|
||||
enable_monitoring_alerts = true
|
||||
enable_private_cluster = true
|
||||
|
||||
default_node_pool = {
|
||||
name = "default"
|
||||
vm_size = "Standard_D8s_v3"
|
||||
node_count = 3
|
||||
min_count = 3
|
||||
max_count = 10
|
||||
enable_auto_scaling = true
|
||||
}
|
||||
|
||||
gpu_node_pool = {
|
||||
name = "gpu"
|
||||
vm_size = "Standard_NC4as_T4_v3"
|
||||
node_count = 2
|
||||
min_count = 1
|
||||
max_count = 4
|
||||
enable_auto_scaling = true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment Steps
|
||||
|
||||
### Step 1: Prepare Azure Subscription
|
||||
|
||||
```bash
|
||||
# Verify Azure CLI configuration
|
||||
az account show
|
||||
|
||||
# Check subscription quota
|
||||
az vm list-usage --location eastus
|
||||
|
||||
# Enable required providers
|
||||
az provider register --namespace Microsoft.ContainerService
|
||||
az provider register --namespace Microsoft.DBforPostgreSQL
|
||||
```
|
||||
|
||||
### Step 2: Configure Terraform Backend
|
||||
|
||||
```bash
|
||||
# Create resource group
|
||||
az group create --name openclaw-tfstate-rg --location eastus
|
||||
|
||||
# Create storage account
|
||||
az storage account create --name tfstateopenclaw --resource-group openclaw-tfstate-rg \
|
||||
--location eastus --sku Standard_LRS
|
||||
|
||||
# Create container
|
||||
az storage container create --name tfstate --account-name tfstateopenclaw
|
||||
```
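Storage account names are global across Azure, must be 3 to 24 characters long, and may contain only lowercase letters and digits, so the fixed name `tfstateopenclaw` may already be taken. The sketch below appends a random suffix (the `tfstateoc` prefix is an assumption). Its usage comments also enable blob versioning, which the Backup & Recovery table lists as the protection for Terraform state.

```bash
#!/usr/bin/env bash
# Build a storage account name that satisfies Azure's naming rules.
# $1 = optional prefix; it is lower-cased, stripped to [a-z0-9], and cut to 18 chars.
make_storage_account_name() {
  local prefix suffix
  prefix="$(printf '%s' "${1:-tfstateoc}" | tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -dc 'a-z0-9')"
  # Six random lowercase alphanumerics make a name collision unlikely.
  suffix="$(head -c 256 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9' | cut -c 1-6)"
  printf '%s%s\n' "${prefix:0:18}" "$suffix"
}

# Example (run once, then pass the name to -backend-config="storage_account_name=..."):
# SA_NAME="$(make_storage_account_name)"
# az storage account create --name "$SA_NAME" --resource-group openclaw-tfstate-rg \
#   --location eastus --sku Standard_LRS
# az storage account blob-service-properties update --account-name "$SA_NAME" \
#   --resource-group openclaw-tfstate-rg --enable-versioning true
```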
|
||||
|
||||
### Step 3: Initialize and Apply
|
||||
|
||||
```bash
|
||||
# Initialize with Azure backend
|
||||
terraform init \
|
||||
-backend-config="resource_group_name=openclaw-tfstate-rg" \
|
||||
-backend-config="storage_account_name=tfstateopenclaw" \
|
||||
-backend-config="container_name=tfstate" \
|
||||
-backend-config="key=openclaw/dev/terraform.tfstate"
|
||||
|
||||
# Plan
|
||||
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
|
||||
|
||||
# Apply
|
||||
terraform apply tfplan
|
||||
```
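To keep dev and prod state apart, the backend `key` can be derived from the environment name. This sketch assumes the Terraform configuration declares an empty `backend "azurerm" {}` block (partial configuration is what makes `-backend-config` work) and accepts only `dev` and `prod`, the two tfvars files in this guide. `STATE_RG` and `STATE_ACCOUNT` are hypothetical override variables. `-reconfigure` tells Terraform to discard the previously initialised backend settings when you switch environments.

```bash
#!/usr/bin/env bash
# Map an environment name to its state blob key; reject anything unexpected.
state_key_for() {
  case "$1" in
    dev|prod) printf 'openclaw/%s/terraform.tfstate\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Initialise the azurerm backend for one environment.
tf_init_env() {
  local env="$1" key
  key="$(state_key_for "$env")" || return 1
  terraform init -reconfigure \
    -backend-config="resource_group_name=${STATE_RG:-openclaw-tfstate-rg}" \
    -backend-config="storage_account_name=${STATE_ACCOUNT:-tfstateopenclaw}" \
    -backend-config="container_name=tfstate" \
    -backend-config="key=${key}"
}

# Usage:
# tf_init_env prod && terraform plan -var-file=terraform.prod.tfvars -out=tfplan
```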
|
||||
|
||||
### Step 4: Verify Deployment
|
||||
|
||||
```bash
|
||||
# Check AKS cluster
|
||||
az aks show --resource-group openclaw-rg --name openclaw-dev-aks
|
||||
|
||||
# Check PostgreSQL server
|
||||
az postgres flexible-server show --resource-group openclaw-rg --name openclaw-dev-pg
|
||||
|
||||
# Check Redis cache
|
||||
az redis show --resource-group openclaw-rg --name openclaw-dev-redis
|
||||
|
||||
# Check ACR
|
||||
az acr show --name openclawdevacr
|
||||
```
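The `show` commands above print full JSON documents; for a scripted check it is usually enough to compare one field per resource. This sketch assumes the dev resource names used above and checks `provisioningState` for AKS, Redis and ACR, and the `state` field (`Ready` when the server is available) for the PostgreSQL Flexible Server.

```bash
#!/usr/bin/env bash
# Run a command and compare its single-line output with an expected value.
# $1 = label, $2 = expected value, remaining args = command to run.
expect_value() {
  local label="$1" expected="$2" actual
  shift 2
  actual="$("$@")" || { echo "FAIL $label: command error" >&2; return 1; }
  if [[ "$actual" == "$expected" ]]; then
    echo "ok   $label: $actual"
  else
    echo "FAIL $label: got '$actual', want '$expected'" >&2
    return 1
  fi
}

# Check every core resource; report all failures before returning non-zero.
verify_dev_stack() {
  local rg=openclaw-rg rc=0
  expect_value aks Succeeded az aks show -g "$rg" -n openclaw-dev-aks \
    --query provisioningState -o tsv || rc=1
  expect_value postgres Ready az postgres flexible-server show -g "$rg" -n openclaw-dev-pg \
    --query state -o tsv || rc=1
  expect_value redis Succeeded az redis show -g "$rg" -n openclaw-dev-redis \
    --query provisioningState -o tsv || rc=1
  expect_value acr Succeeded az acr show -n openclawdevacr \
    --query provisioningState -o tsv || rc=1
  return "$rc"
}
```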
|
||||
|
||||
---
|
||||
|
||||
## Post-Deployment
|
||||
|
||||
### Configure kubectl
|
||||
|
||||
```bash
|
||||
# Get AKS credentials
|
||||
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
|
||||
|
||||
# Verify cluster access
|
||||
kubectl get nodes
|
||||
kubectl get namespaces
|
||||
```
|
||||
|
||||
### Deploy OpenClaw Helm Chart
|
||||
|
||||
```bash
|
||||
# Deploy using Helm
|
||||
helm install openclaw ./charts/openclaw \
|
||||
--namespace openclaw \
|
||||
--create-namespace \
|
||||
--values values.dev.yaml \
|
||||
--set image.repository=openclawdevacr.azurecr.io/openclaw-gateway \
|
||||
--set litellm.image.repository=openclawdevacr.azurecr.io/litellm-proxy
|
||||
```
|
||||
|
||||
### Configure Secrets
|
||||
|
||||
```bash
|
||||
# Create Kubernetes secrets
|
||||
kubectl create secret generic openclaw-secrets \
|
||||
--namespace openclaw \
|
||||
--from-literal=database-url="postgresql://openclaw:<db-password>@openclaw-dev-pg.postgres.database.azure.com:5432/postgres?sslmode=require" \
|
||||
--from-literal=redis-url="rediss://:<redis-access-key>@openclaw-dev-redis.redis.cache.windows.net:6380" \
|
||||
--from-literal=minimax-api-key="your-minimax-key" \
|
||||
--from-literal=zai-api-key="your-zai-key"
|
||||
```
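Connection URLs break if the password or key contains URL-reserved characters, and Azure Cache for Redis access keys are base64, so they often include `+`, `/` or `=`. The sketch below percent-encodes credentials before building the URLs. The helper names are hypothetical; the TLS port 6380 and `sslmode=require` follow Azure's defaults (non-TLS Redis port disabled, secure transport required on Flexible Server). Check that your Redis client library accepts the `rediss://` scheme.

```bash
#!/usr/bin/env bash
# Percent-encode everything except RFC 3986 unreserved characters.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

redis_url() {     # $1 = host, $2 = access key
  printf 'rediss://:%s@%s:6380\n' "$(urlencode "$2")" "$1"
}

postgres_url() {  # $1 = host, $2 = user, $3 = password, $4 = database
  printf 'postgresql://%s:%s@%s:5432/%s?sslmode=require\n' \
    "$(urlencode "$2")" "$(urlencode "$3")" "$1" "$4"
}

# Example:
# REDIS_KEY="$(az redis list-keys --resource-group openclaw-rg \
#   --name openclaw-dev-redis --query primaryKey -o tsv)"
# redis_url openclaw-dev-redis.redis.cache.windows.net "$REDIS_KEY"
```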
|
||||
|
||||
---
|
||||
|
||||
## GPU Support
|
||||
|
||||
### Enable GPU Nodes
|
||||
|
||||
```hcl
|
||||
# terraform.tfvars
|
||||
enable_gpu_support = true
|
||||
gpu_node_pool = {
|
||||
name = "gpu"
|
||||
vm_size = "Standard_NC4as_T4_v3"
|
||||
node_count = 1
|
||||
min_count = 0
|
||||
max_count = 4
|
||||
enable_auto_scaling = true
|
||||
}
|
||||
```
|
||||
|
||||
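### Install NVIDIA Device Plugin

The AKS Terraform module in this repository already installs the plugin through a `helm_release` when its `gpu_enabled` input is true. Run the manual command below only if that release is absent; otherwise you end up with two device-plugin DaemonSets on the GPU nodes.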
|
||||
|
||||
```bash
|
||||
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
|
||||
```
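To confirm that pods can actually claim a GPU, a throwaway Pod that runs `nvidia-smi` is a quick test. This sketch targets the `gpu=true` label and tolerates the `nvidia.com/gpu=true:NoSchedule` taint that the AKS module puts on the GPU pool; the CUDA image tag is only an example.

```bash
#!/usr/bin/env bash
# Print a one-shot Pod manifest that requests a single GPU on the GPU pool.
gpu_smoke_test_manifest() {
  cat <<'YAML'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
  namespace: default
spec:
  restartPolicy: Never
  nodeSelector:
    gpu: "true"            # label set on the GPU pool by the AKS module
  tolerations:
    - key: nvidia.com/gpu  # matches the pool's nvidia.com/gpu=true:NoSchedule taint
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
YAML
}

# Usage:
# gpu_smoke_test_manifest | kubectl apply -f -
# kubectl logs -f pod/gpu-smoke-test
# kubectl delete pod gpu-smoke-test
```

If the driver and device plugin are working, the Pod's log shows the `nvidia-smi` device table; a Pod stuck in `Pending` usually means no node advertises `nvidia.com/gpu`.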
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Azure Monitor Dashboard
|
||||
|
||||
The deployment creates an Azure Monitor dashboard with:
|
||||
|
||||
- AKS cluster metrics
|
||||
- Node pool metrics
|
||||
- PostgreSQL metrics
|
||||
- Redis metrics
|
||||
- Application Gateway metrics
|
||||
- Application logs
|
||||
|
||||
### Access Dashboard
|
||||
|
||||
```bash
|
||||
# Open in the Azure Portal (open is macOS-only; use xdg-open on Linux)
|
||||
open "https://portal.azure.com/#blade/Microsoft_Azure_Monitoring/AzureMonitoringBrowseBlade"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Backup & Recovery
|
||||
|
||||
### Automated Backups
|
||||
|
||||
| Resource | Backup Strategy | Retention |
|
||||
|----------|----------------|-----------|
|
||||
| PostgreSQL | Automated backups (geo-redundant when `db_geo_redundant_backup = true`) | 35 days |
|
||||
| Redis | RDB/AOF persistence (Premium tier only) | Manual |
|
||||
| ACR | Geo-redundant (Premium) | 30 days |
|
||||
| Terraform State | Blob versioning | Unlimited |
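PostgreSQL backups are used through point-in-time restore, which always creates a new server rather than overwriting the source. This sketch assumes GNU `date` for relative times; the `-restored` naming convention and the `RG` override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Convert a GNU-date expression (e.g. "2 hours ago") to an ISO-8601 UTC timestamp.
restore_time_utc() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# Restore a Flexible Server into a new server named "<source>-restored".
# $1 = source server name, $2 = ISO-8601 UTC restore time.
pitr_restore() {
  local rg="${RG:-openclaw-rg}" src="$1" when="$2"
  az postgres flexible-server restore \
    --resource-group "$rg" \
    --name "${src}-restored" \
    --source-server "$src" \
    --restore-time "$when"
}

# Usage:
# pitr_restore openclaw-dev-pg "$(restore_time_utc '2 hours ago')"
```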
|
||||
|
||||
---
|
||||
|
||||
## Cleanup
|
||||
|
||||
### Destroy Infrastructure
|
||||
|
||||
```bash
|
||||
# Delete Kubernetes resources first
|
||||
kubectl delete namespace openclaw
|
||||
|
||||
# Destroy Terraform resources
|
||||
terraform destroy -var-file=terraform.dev.tfvars
|
||||
```
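`terraform destroy` removes only what Terraform tracks in state. A small sketch to confirm nothing is left: `terraform state list` prints one address per tracked resource, so empty output means the destroy is complete. The state storage account from Step 2 was created with the Azure CLI, so Terraform never tracks it.

```bash
#!/usr/bin/env bash
# Fail if any resource addresses remain in Terraform state.
assert_state_empty() {
  local remaining
  remaining="$(terraform state list)" || return 1
  if [[ -n "$remaining" ]]; then
    echo "Resources still in state:" >&2
    printf '%s\n' "$remaining" >&2
    return 1
  fi
  echo "Terraform state is empty."
}

# Once the state history is no longer needed:
# az group delete --name openclaw-tfstate-rg --yes
```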
|
||||
|
||||
---
|
||||
|
||||
🦞 *The thought that never ends.*
|
||||
@@ -0,0 +1,230 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - Azure Container Registry Configuration
|
||||
# ==============================================================================
|
||||
# Azure Container Registry for OpenClaw container images
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Container Registry
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry" "openclaw" {
|
||||
name = var.registry_name
|
||||
resource_group_name = var.resource_group_name
|
||||
location = var.location
|
||||
sku = var.sku
|
||||
admin_enabled = var.environment == "dev"
|
||||
zone_redundancy_enabled = var.environment == "prod" && var.sku == "Premium"
|
||||
|
||||
# Data endpoint (for Premium SKU)
|
||||
data_endpoint_enabled = var.sku == "Premium"
|
||||
|
||||
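# Network rules (supported on the Premium SKU only)
dynamic "network_rule_set" {
for_each = var.sku == "Premium" ? [1] : []
content {
default_action = "Deny"
ip_rule {
action = "Allow"
ip_range = "0.0.0.0/0" # Allows ALL IPv4 sources, which cancels the Deny; narrow to the AKS egress IP range
}
}
}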
|
||||
|
||||
# Retention policy (Premium SKU only)
|
||||
retention_policy_in_days = var.sku == "Premium" ? var.retention_policy_days : null
|
||||
|
||||
# Quarantine policy (Premium SKU only)
|
||||
quarantine_policy_enabled = var.quarantine_policy_enabled && var.sku == "Premium"
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Scope Map (for fine-grained access control)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry_scope_map" "openclaw_pull" {
|
||||
name = "openclaw-pull-scope"
|
||||
container_registry_name = azurerm_container_registry.openclaw.name # implicit dependency on the registry
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
actions = [
|
||||
"repositories/*/pull",
|
||||
]
|
||||
}
|
||||
|
||||
resource "azurerm_container_registry_scope_map" "openclaw_push" {
|
||||
name = "openclaw-push-scope"
|
||||
container_registry_name = azurerm_container_registry.openclaw.name # implicit dependency on the registry
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
actions = [
|
||||
"repositories/*/pull",
|
||||
"repositories/*/push",
|
||||
]
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Token (for authentication)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry_token" "openclaw_pull" {
|
||||
name = "openclaw-pull-token"
|
||||
container_registry_name = var.registry_name
|
||||
resource_group_name = var.resource_group_name
|
||||
scope_map_id = azurerm_container_registry_scope_map.openclaw_pull.id
|
||||
}
|
||||
|
||||
resource "azurerm_container_registry_token" "openclaw_push" {
|
||||
name = "openclaw-push-token"
|
||||
container_registry_name = var.registry_name
|
||||
resource_group_name = var.resource_group_name
|
||||
scope_map_id = azurerm_container_registry_scope_map.openclaw_push.id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Task (for automated builds)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry_task" "openclaw_gateway" {
|
||||
name = "build-openclaw-gateway"
|
||||
container_registry_name = var.registry_name
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
platform {
|
||||
os = "Linux"
|
||||
os_version = "20.04"
|
||||
architecture = "amd64"
|
||||
}
|
||||
|
||||
agent_setting {
|
||||
cpu = "4"
|
||||
memory = "8"
|
||||
}
|
||||
|
||||
step {
|
||||
source_value = "https://github.com/Heretek-AI/heretek-openclaw.git"
|
||||
context_path = ""
|
||||
dockerfile_path = "Dockerfile"
|
||||
image_names = [
|
||||
"${var.registry_name}.azurecr.io/openclaw-gateway:{{.Run.ID}}",
|
||||
"${var.registry_name}.azurecr.io/openclaw-gateway:latest",
|
||||
]
|
||||
push_enabled = true
|
||||
}
|
||||
|
||||
enabled = false # Disabled by default, enable via CI/CD
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
resource "azurerm_container_registry_task" "litellm_proxy" {
|
||||
name = "build-litellm-proxy"
|
||||
container_registry_name = var.registry_name
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
platform {
|
||||
os = "Linux"
|
||||
os_version = "20.04"
|
||||
architecture = "amd64"
|
||||
}
|
||||
|
||||
agent_setting {
|
||||
cpu = "2"
|
||||
memory = "4"
|
||||
}
|
||||
|
||||
step {
|
||||
source_value = "https://github.com/Heretek-AI/heretek-openclaw.git"
|
||||
context_path = ""
|
||||
dockerfile_path = "Dockerfile.litellm"
|
||||
image_names = [
|
||||
"${var.registry_name}.azurecr.io/litellm-proxy:{{.Run.ID}}",
|
||||
"${var.registry_name}.azurecr.io/litellm-proxy:latest",
|
||||
]
|
||||
push_enabled = true
|
||||
}
|
||||
|
||||
enabled = false # Disabled by default, enable via CI/CD
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Webhook (for CI/CD integration)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry_webhook" "openclaw_gateway" {
|
||||
name = "openclawgatewaywebhook" # webhook names must be alphanumeric (5-50 characters)
location = var.location
resource_group_name = var.resource_group_name
registry_name = azurerm_container_registry.openclaw.name
service_uri = var.webhook_service_uri # CI/CD endpoint
scope = "openclaw-gateway:*"
|
||||
actions = ["push"]
|
||||
status = "enabled"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Agent Pool (for dedicated build resources)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_container_registry_agent_pool" "openclaw" {
|
||||
count = var.sku == "Premium" ? 1 : 0
|
||||
|
||||
name = "openclaw-pool"
|
||||
resource_group_name = var.resource_group_name
location = var.location
container_registry_name = azurerm_container_registry.openclaw.name
tier = "S1"
instance_count = 1
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Diagnostic Settings
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_monitor_diagnostic_setting" "acr" {
|
||||
name = "${var.registry_name}-diagnostics"
|
||||
target_resource_id = azurerm_container_registry.openclaw.id
|
||||
log_analytics_workspace_id = var.log_analytics_workspace_id
|
||||
|
||||
enabled_log {
|
||||
category = "ContainerRegistryRepositoryEvents"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "ContainerRegistryLoginEvents"
|
||||
}
|
||||
|
||||
metric {
|
||||
category = "AllMetrics"
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Registry Alerts
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_monitor_metric_alert" "acr_storage" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
name = "${var.registry_name}-storage-alert"
|
||||
resource_group_name = var.resource_group_name
|
||||
scopes = [azurerm_container_registry.openclaw.id]
|
||||
description = "Registry storage usage exceeds the configured threshold"
|
||||
|
||||
criteria {
|
||||
metric_namespace = "Microsoft.ContainerRegistry/registries"
|
||||
metric_name = "StorageUsed"
|
||||
aggregation = "Average"
|
||||
operator = "GreaterThan"
|
||||
threshold = var.storage_threshold_gb * 1024 * 1024 * 1024 # Convert to bytes
|
||||
}
|
||||
|
||||
severity = 3
|
||||
|
||||
action {
|
||||
action_group_id = var.action_group_id
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,298 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - Azure AKS Configuration
|
||||
# ==============================================================================
|
||||
# Azure Kubernetes Service cluster for OpenClaw
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# AKS Cluster
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_kubernetes_cluster" "openclaw_cluster" {
|
||||
name = var.cluster_name
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
dns_prefix = var.dns_prefix
|
||||
kubernetes_version = var.kubernetes_version
|
||||
|
||||
default_node_pool {
|
||||
name = var.default_node_pool.name
|
||||
vm_size = var.default_node_pool.vm_size
|
||||
node_count = var.default_node_pool.node_count
|
||||
min_count = var.default_node_pool.min_count
|
||||
max_count = var.default_node_pool.max_count
|
||||
auto_scaling_enabled = var.default_node_pool.enable_auto_scaling
|
||||
os_disk_size_gb = var.default_node_pool.os_disk_size_gb
|
||||
type = var.default_node_pool.type
|
||||
zones = var.default_node_pool.availability_zones
|
||||
vnet_subnet_id = var.subnet_id
|
||||
}
|
||||
|
||||
identity {
|
||||
type = "SystemAssigned"
|
||||
}
|
||||
|
||||
# Network configuration
|
||||
network_profile {
|
||||
network_plugin = "azure"
|
||||
load_balancer_sku = "standard"
|
||||
network_policy = "calico"
|
||||
dns_service_ip = "10.1.0.10" # must fall inside service_cidr
|
||||
|
||||
service_cidr = "10.1.0.0/16"
|
||||
outbound_type = "loadBalancer"
|
||||
}
|
||||
|
||||
# Private cluster configuration
private_cluster_enabled = var.enable_private_cluster
|
||||
|
||||
# Azure Policy
|
||||
azure_policy_enabled = var.enable_azure_policy
|
||||
|
||||
# Monitoring
|
||||
oms_agent {
|
||||
log_analytics_workspace_id = var.log_analytics_workspace_id
|
||||
}
|
||||
|
||||
# Workload Identity
|
||||
oidc_issuer_enabled = var.enable_workload_identity # workload identity requires the OIDC issuer
workload_identity_enabled = var.enable_workload_identity
|
||||
|
||||
# Auto upgrade
|
||||
automatic_upgrade_channel = "stable"
|
||||
|
||||
# Maintenance window
|
||||
maintenance_window_auto_upgrade {
|
||||
frequency = "Weekly"
|
||||
interval = 1
|
||||
day_of_week = "Sunday"
|
||||
start_time = "02:00"
|
||||
duration = 4
|
||||
}
|
||||
|
||||
maintenance_window_node_os {
|
||||
frequency = "Weekly"
|
||||
interval = 1
|
||||
day_of_week = "Saturday"
|
||||
start_time = "02:00"
|
||||
duration = 4
|
||||
}
|
||||
|
||||
tags = var.tags
|
||||
|
||||
lifecycle {
|
||||
ignore_changes = [
|
||||
default_node_pool[0].node_count
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# System Node Pool
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_kubernetes_cluster_node_pool" "system" {
|
||||
name = var.system_node_pool.name
|
||||
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
mode = "System"
|
||||
vm_size = var.system_node_pool.vm_size
|
||||
node_count = var.system_node_pool.node_count
|
||||
min_count = var.system_node_pool.min_count
|
||||
max_count = var.system_node_pool.max_count
|
||||
auto_scaling_enabled = var.system_node_pool.enable_auto_scaling
|
||||
os_disk_size_gb = var.system_node_pool.os_disk_size_gb
|
||||
zones = var.system_node_pool.availability_zones
|
||||
vnet_subnet_id = var.subnet_id
|
||||
|
||||
node_labels = {
|
||||
"workload-type" = "system"
|
||||
"environment" = var.environment
|
||||
}
|
||||
|
||||
node_taints = [
"CriticalAddonsOnly=true:NoSchedule"
]
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# User Node Pools
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_kubernetes_cluster_node_pool" "user" {
|
||||
for_each = { for pool in var.user_node_pools : pool.name => pool }
|
||||
|
||||
name = each.value.name
|
||||
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
|
||||
vm_size = each.value.vm_size
|
||||
node_count = each.value.node_count
|
||||
min_count = each.value.min_count
|
||||
max_count = each.value.max_count
|
||||
auto_scaling_enabled = each.value.enable_auto_scaling
|
||||
os_disk_size_gb = each.value.os_disk_size_gb
|
||||
zones = each.value.availability_zones
|
||||
vnet_subnet_id = var.subnet_id
|
||||
|
||||
node_labels = {
|
||||
"workload-type" = each.value.name
|
||||
"environment" = var.environment
|
||||
}
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# GPU Node Pool (Optional)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
|
||||
count = var.gpu_enabled ? 1 : 0
|
||||
|
||||
name = var.gpu_node_pool.name
|
||||
kubernetes_cluster_id = azurerm_kubernetes_cluster.openclaw_cluster.id
|
||||
vm_size = var.gpu_node_pool.vm_size
|
||||
node_count = var.gpu_node_pool.node_count
|
||||
min_count = var.gpu_node_pool.min_count
|
||||
max_count = var.gpu_node_pool.max_count
|
||||
auto_scaling_enabled = var.gpu_node_pool.enable_auto_scaling
|
||||
os_disk_size_gb = var.gpu_node_pool.os_disk_size_gb
|
||||
zones = var.gpu_node_pool.availability_zones
|
||||
vnet_subnet_id = var.subnet_id
|
||||
|
||||
node_labels = {
|
||||
"workload-type" = "gpu"
|
||||
"environment" = var.environment
|
||||
"gpu" = "true"
|
||||
}
|
||||
|
||||
node_taints = [
|
||||
"nvidia.com/gpu=true:NoSchedule"
|
||||
]
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# AKS Role Assignments
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_role_assignment" "aks_vnet_contributor" {
|
||||
scope = var.vnet_id
|
||||
role_definition_name = "Network Contributor"
|
||||
principal_id = azurerm_kubernetes_cluster.openclaw_cluster.identity[0].principal_id
|
||||
}
|
||||
|
||||
resource "azurerm_role_assignment" "aks_acr_pull" {
|
||||
scope = var.acr_id
|
||||
role_definition_name = "AcrPull"
|
||||
principal_id = azurerm_kubernetes_cluster.openclaw_cluster.identity[0].principal_id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Azure Monitor for Containers
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_monitor_diagnostic_setting" "aks" {
|
||||
name = "${var.cluster_name}-diagnostics"
|
||||
target_resource_id = azurerm_kubernetes_cluster.openclaw_cluster.id
|
||||
log_analytics_workspace_id = var.log_analytics_workspace_id
|
||||
|
||||
enabled_log {
|
||||
category = "kube-apiserver"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "kube-audit"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "kube-audit-admin"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "kube-controller-manager"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "kube-scheduler"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "cluster-autoscaler"
|
||||
}
|
||||
|
||||
enabled_log {
|
||||
category = "guard"
|
||||
}
|
||||
|
||||
metric {
|
||||
category = "AllMetrics"
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Kubernetes Manifest Deployments (via Helm)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "helm_release" "nvidia_device_plugin" {
|
||||
count = var.gpu_enabled ? 1 : 0
|
||||
|
||||
name = "nvidia-device-plugin"
|
||||
repository = "https://nvidia.github.io/k8s-device-plugin"
|
||||
chart = "nvidia-device-plugin"
|
||||
version = "0.14.1"
|
||||
namespace = "kube-system"
|
||||
|
||||
set {
|
||||
name = "config.map.name"
|
||||
value = "nvidia-device-plugin-config"
|
||||
}
|
||||
}
|
||||
|
||||
resource "helm_release" "metrics_server" {
|
||||
name = "metrics-server"
|
||||
repository = "https://kubernetes-sigs.github.io/metrics-server/"
|
||||
chart = "metrics-server"
|
||||
version = "3.11.0"
|
||||
namespace = "kube-system"
|
||||
}
|
||||
|
||||
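# Note: AKS already runs a managed cluster autoscaler for node pools with autoscaling
# enabled; installing the upstream chart as well can make the two fight over node counts.
# The cluster's SystemAssigned identity block also exports no client_id, so the
# azureClientID value below needs a user-assigned or kubelet identity instead.
resource "helm_release" "cluster_autoscaler" {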
|
||||
count = var.enable_cluster_autoscaler ? 1 : 0
|
||||
|
||||
name = "cluster-autoscaler"
|
||||
repository = "https://kubernetes.github.io/autoscaler"
|
||||
chart = "cluster-autoscaler"
|
||||
version = "9.29.0"
|
||||
namespace = "kube-system"
|
||||
|
||||
set {
|
||||
name = "cloudProvider"
|
||||
value = "azure"
|
||||
}
|
||||
|
||||
set {
|
||||
name = "azureClientID"
|
||||
value = azurerm_kubernetes_cluster.openclaw_cluster.identity[0].client_id
|
||||
}
|
||||
|
||||
set {
|
||||
name = "azureSubscriptionID"
|
||||
value = data.azurerm_client_config.current.subscription_id
|
||||
}
|
||||
|
||||
set {
|
||||
name = "azureResourceGroup"
|
||||
value = var.resource_group_name
|
||||
}
|
||||
|
||||
set {
|
||||
name = "azureClusterName"
|
||||
value = var.cluster_name
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,310 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - Azure Application Gateway Configuration
|
||||
# ==============================================================================
|
||||
# Application Gateway for OpenClaw traffic routing and SSL termination
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Public IP for Application Gateway
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_public_ip" "gateway" {
|
||||
name = "${var.gateway_name}-pip"
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
allocation_method = "Static"
|
||||
sku = "Standard"
|
||||
domain_name_label = var.domain_name_label
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Application Gateway
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_application_gateway" "openclaw" {
|
||||
name = var.gateway_name
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
sku {
|
||||
name = var.sku_name
|
||||
tier = var.sku_name
|
||||
capacity = var.capacity
|
||||
}
|
||||
|
||||
gateway_ip_configuration {
|
||||
name = "gateway-ip-config"
|
||||
subnet_id = var.subnet_id
|
||||
}
|
||||
|
||||
frontend_port {
|
||||
name = "http-port"
|
||||
port = 80
|
||||
}
|
||||
|
||||
frontend_port {
|
||||
name = "https-port"
|
||||
port = 443
|
||||
}
|
||||
|
||||
frontend_ip_configuration {
|
||||
name = "frontend-ip-config"
|
||||
public_ip_address_id = azurerm_public_ip.gateway.id
|
||||
}
|
||||
|
||||
backend_address_pool {
|
||||
name = "openclaw-gateway-pool"
|
||||
}
|
||||
|
||||
backend_address_pool {
|
||||
name = "litellm-proxy-pool"
|
||||
}
|
||||
|
||||
backend_http_settings {
|
||||
name = "gateway-http-settings"
|
||||
cookie_based_affinity = "Disabled"
|
||||
port = 18789
|
||||
protocol = "Http"
|
||||
request_timeout = 30
|
||||
probe_name = "gateway-probe"
|
||||
}
|
||||
|
||||
backend_http_settings {
|
||||
name = "litellm-http-settings"
|
||||
cookie_based_affinity = "Disabled"
|
||||
port = 4000
|
||||
protocol = "Http"
|
||||
request_timeout = 60
|
||||
probe_name = "litellm-probe"
|
||||
}
|
||||
|
||||
# Health Probes
|
||||
probe {
|
||||
name = "gateway-probe"
|
||||
protocol = "Http"
|
||||
path = "/health"
|
||||
interval = 30
|
||||
timeout = 5
|
||||
unhealthy_threshold = 3
|
||||
host = "127.0.0.1" # required when the host name is not taken from the backend settings
pick_host_name_from_backend_http_settings = false
|
||||
}
|
||||
|
||||
probe {
|
||||
name = "litellm-probe"
|
||||
protocol = "Http"
|
||||
path = "/health"
|
||||
interval = 30
|
||||
timeout = 5
|
||||
unhealthy_threshold = 3
|
||||
host = "127.0.0.1" # required when the host name is not taken from the backend settings
pick_host_name_from_backend_http_settings = false
|
||||
}
|
||||
|
||||
# HTTP Listener
|
||||
http_listener {
|
||||
name = "http-listener"
|
||||
frontend_ip_configuration_name = "frontend-ip-config"
|
||||
frontend_port_name = "http-port"
|
||||
protocol = "Http"
|
||||
}
|
||||
|
||||
# HTTPS Listener (if SSL certificate provided)
|
||||
dynamic "http_listener" {
|
||||
for_each = var.ssl_certificate_data != null ? [1] : []
|
||||
content {
|
||||
name = "https-listener"
|
||||
frontend_ip_configuration_name = "frontend-ip-config"
|
||||
frontend_port_name = "https-port"
|
||||
protocol = "Https"
|
||||
ssl_certificate_name = "ssl-cert"
|
||||
}
|
||||
}
|
||||
|
||||
# SSL Certificate (if provided)
|
||||
dynamic "ssl_certificate" {
|
||||
for_each = var.ssl_certificate_data != null ? [1] : []
|
||||
content {
|
||||
name = "ssl-cert"
|
||||
data = var.ssl_certificate_data
|
||||
password = var.ssl_certificate_password
|
||||
}
|
||||
}
|
||||
|
||||
# Request Routing Rules
|
||||
request_routing_rule {
|
||||
name = "http-routing-rule"
|
||||
rule_type = "Basic"
|
||||
http_listener_name = "http-listener"
|
||||
backend_address_pool_name = "openclaw-gateway-pool"
|
||||
backend_http_settings_name = "gateway-http-settings"
|
||||
priority = 200
|
||||
}
|
||||
|
||||
# HTTPS Routing Rule (if SSL enabled)
|
||||
dynamic "request_routing_rule" {
|
||||
for_each = var.ssl_certificate_data != null ? [1] : []
|
||||
content {
|
||||
name = "https-routing-rule"
|
||||
rule_type = "Basic"
|
||||
http_listener_name = "https-listener"
|
||||
backend_address_pool_name = "openclaw-gateway-pool"
|
||||
backend_http_settings_name = "gateway-http-settings"
|
||||
priority = 100
|
||||
}
|
||||
}
|
||||
|
||||
# URL Path Map for path-based routing
|
||||
url_path_map {
|
||||
name = "url-path-map"
|
||||
default_backend_address_pool_name = "openclaw-gateway-pool"
|
||||
default_backend_http_settings_name = "gateway-http-settings"
|
||||
|
||||
path_rule {
|
||||
name = "litellm-path-rule"
|
||||
paths = ["/v1/*", "/litellm/*"]
|
||||
backend_address_pool_name = "litellm-proxy-pool"
|
||||
backend_http_settings_name = "litellm-http-settings"
|
||||
}
|
||||
|
||||
path_rule {
|
||||
name = "websocket-path-rule"
|
||||
paths = ["/ws/*", "/gateway/*"]
|
||||
backend_address_pool_name = "openclaw-gateway-pool"
|
||||
backend_http_settings_name = "gateway-http-settings"
|
||||
}
|
||||
}
|
||||
|
||||
# HTTPS with URL Path Map
|
||||
dynamic "request_routing_rule" {
|
||||
for_each = var.ssl_certificate_data != null ? [1] : []
|
||||
content {
|
||||
name = "https-path-routing-rule"
|
||||
rule_type = "PathBasedRouting"
|
||||
http_listener_name = "https-listener"
|
||||
url_path_map_name = "url-path-map"
|
||||
priority = 150
|
||||
}
|
||||
}
|
||||
|
||||
# Autoscale configuration
|
||||
autoscale_configuration {
|
||||
min_capacity = var.autoscale_min_capacity
|
||||
max_capacity = var.autoscale_max_capacity
|
||||
}
|
||||
|
||||
# WAF Configuration (for WAF SKU)
|
||||
dynamic "waf_configuration" {
|
||||
for_each = var.sku_name == "WAF_v2" ? [1] : []
|
||||
content {
|
||||
enabled = true
|
||||
firewall_mode = "Prevention"
|
||||
rule_set_type = "OWASP"
|
||||
rule_set_version = "3.2"
|
||||
request_body_check = true
|
||||
max_request_body_size_kb = 128
|
||||
}
|
||||
}
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
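# ------------------------------------------------------------------------------
# SSL Input Check (sketch)
# ------------------------------------------------------------------------------
# A minimal sketch, assuming Terraform >= 1.5 (check blocks). The azurerm
# provider requires `password` on an ssl_certificate whenever `data` (a PFX)
# is set, so this warns early when the certificate is supplied without its
# password. A failing check only emits a warning; it does not block the run.
check "ssl_certificate_inputs" {
  assert {
    # == and != are safe with null, so this evaluates with or without SSL
    condition     = var.ssl_certificate_data == null || var.ssl_certificate_password != null
    error_message = "ssl_certificate_password must be set when ssl_certificate_data is provided."
  }
}
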
# ------------------------------------------------------------------------------
# Application Gateway Diagnostic Settings
# ------------------------------------------------------------------------------

resource "azurerm_monitor_diagnostic_setting" "gateway" {
  name                       = "${var.gateway_name}-diagnostics"
  target_resource_id         = azurerm_application_gateway.openclaw.id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  enabled_log {
    category = "ApplicationGatewayAccessLog"
  }

  enabled_log {
    category = "ApplicationGatewayPerformanceLog"
  }

  enabled_log {
    category = "ApplicationGatewayFirewallLog"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

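# Sketch (Terraform >= 1.5): warn when the value passed as
# log_analytics_workspace_id does not look like a Log Analytics workspace
# resource ID. The pattern is an assumption based on the standard ARM
# resource ID layout (.../providers/Microsoft.OperationalInsights/workspaces/NAME);
# can() turns a null or non-matching value into false.
check "log_analytics_workspace_id_format" {
  assert {
    condition     = can(regex("(?i)/providers/Microsoft\\.OperationalInsights/workspaces/[^/]+$", var.log_analytics_workspace_id))
    error_message = "log_analytics_workspace_id should be a Log Analytics workspace resource ID."
  }
}
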
# ------------------------------------------------------------------------------
# Application Gateway Alerts
# ------------------------------------------------------------------------------

resource "azurerm_monitor_metric_alert" "gateway_capacity" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.gateway_name}-capacity-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_application_gateway.openclaw.id]
  description         = "Application Gateway capacity is high"

  criteria {
    # CapacityUnits is the v2 SKU metric for consumed capacity units
    metric_namespace = "Microsoft.Network/applicationGateways"
    metric_name      = "CapacityUnits"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = var.capacity * 0.8
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "gateway_response_time" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.gateway_name}-response-time-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_application_gateway.openclaw.id]
  description         = "Application Gateway response time is too high"

  criteria {
    # ApplicationGatewayTotalTime is reported in milliseconds
    metric_namespace = "Microsoft.Network/applicationGateways"
    metric_name      = "ApplicationGatewayTotalTime"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5000 # 5 seconds
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "gateway_failures" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.gateway_name}-failures-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_application_gateway.openclaw.id]
  description         = "Application Gateway failed requests are high"

  criteria {
    metric_namespace = "Microsoft.Network/applicationGateways"
    metric_name      = "FailedRequests"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 10
  }

  severity = 2

  action {
    action_group_id = var.action_group_id
  }
}

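# Sketch (Terraform >= 1.5): the metric alerts above exist only in prod and
# all notify var.action_group_id. This check warns when prod has no action
# group configured. HCL does not short-circuit `||`, so try() is used to turn
# length(null) into false instead of an evaluation error.
check "prod_alert_action_group" {
  assert {
    condition     = var.environment != "prod" || try(length(var.action_group_id) > 0, false)
    error_message = "action_group_id must be set in prod so the Application Gateway alerts have a notification target."
  }
}
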
@@ -0,0 +1,393 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Configuration
# ==============================================================================
# Main configuration file for Azure infrastructure
# ==============================================================================

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.24"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
    # Required by random_string.suffix
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }

  backend "azurerm" {
    # Backend blocks cannot reference variables; supply these settings at
    # init time, e.g. terraform init -backend-config=backend.hcl
    # resource_group_name  = "terraform-state-rg"
    # storage_account_name = "tfstatestorage"
    # container_name       = "tfstate"
    # key                  = "openclaw/terraform.tfstate"
  }
}

provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
    key_vault {
      purge_soft_delete_on_destroy = true
    }
  }
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].host
    client_certificate     = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].client_certificate)
    client_key             = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].client_key)
    cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].cluster_ca_certificate)
  }
}

# ==============================================================================
# Data Sources
# ==============================================================================

data "azurerm_client_config" "current" {}

# The AKS cluster is created inside module.aks, so root-level references
# (providers, monitoring, outputs) read it through this data source; the
# name matches the cluster_name passed to the module.
data "azurerm_kubernetes_cluster" "openclaw_cluster" {
  name                = "${local.name_prefix}-aks"
  resource_group_name = azurerm_resource_group.openclaw.name

  depends_on = [module.aks]
}

# ==============================================================================
# Local Values
# ==============================================================================

locals {
  name_prefix = "openclaw-${var.environment}"

  common_tags = {
    project     = "openclaw"
    environment = var.environment
    version     = var.app_version
    managed_by  = "terraform"
  }

  gpu_enabled = var.enable_gpu_support

  # ACR URLs
  acr_urls = {
    login_server = module.acr.login_server
    gateway      = "${module.acr.login_server}/openclaw-gateway"
    litellm      = "${module.acr.login_server}/litellm-proxy"
  }
}

# ==============================================================================
# Random Resources
# ==============================================================================

resource "random_string" "suffix" {
  length  = 8
  special = false
  upper   = false
}

# ==============================================================================
# Resource Group
# ==============================================================================

resource "azurerm_resource_group" "openclaw" {
  name     = var.resource_group_name
  location = var.location

  tags = local.common_tags
}

# ==============================================================================
# VNet
# ==============================================================================

module "vnet" {
  source = "./vnet"

  resource_group_name    = azurerm_resource_group.openclaw.name
  location               = var.location
  vnet_name              = "${local.name_prefix}-vnet"
  vnet_address_space     = var.vnet_address_space
  subnet_configs         = var.subnet_configs
  enable_ddos_protection = var.enable_ddos_protection
  enable_flow_logs       = var.enable_flow_logs

  tags = local.common_tags
}

# ==============================================================================
# AKS Cluster
# ==============================================================================

module "aks" {
  source = "./aks"

  resource_group_name = azurerm_resource_group.openclaw.name
  location            = var.location
  cluster_name        = "${local.name_prefix}-aks"
  vnet_id             = module.vnet.vnet_id
  subnet_id           = module.vnet.aks_subnet_id

  # AKS configuration
  kubernetes_version = var.kubernetes_version
  dns_prefix         = local.name_prefix

  # Node pool configuration
  default_node_pool = var.default_node_pool
  system_node_pool  = var.system_node_pool
  user_node_pools   = var.user_node_pools
  gpu_node_pool     = var.gpu_node_pool
  gpu_enabled       = local.gpu_enabled

  # Security
  enable_private_cluster   = var.enable_private_cluster
  enable_azure_policy      = var.enable_azure_policy
  enable_workload_identity = var.enable_workload_identity

  # Monitoring
  enable_monitoring          = true
  log_analytics_workspace_id = module.monitoring.log_analytics_workspace_id

  tags = local.common_tags
}

# ==============================================================================
# Azure Database for PostgreSQL
# ==============================================================================

module "postgresql" {
  source = "./postgresql"

  resource_group_name = azurerm_resource_group.openclaw.name
  location            = var.location
  server_name         = "${local.name_prefix}-pg"
  vnet_id             = module.vnet.vnet_id
  subnet_id           = module.vnet.database_subnet_id
  environment         = var.environment

  # Database configuration
  # ("version" is a reserved meta-argument in module blocks, so the engine
  # version is passed as postgresql_version)
  sku_name           = var.postgresql_sku_name
  storage_mb         = var.postgresql_storage_mb
  postgresql_version = var.postgresql_version

  # Authentication
  administrator_login    = var.db_administrator_login
  administrator_password = var.db_administrator_password

  # High availability
  geo_redundant_backup_enabled = var.db_geo_redundant_backup
  auto_grow_enabled            = var.db_auto_grow_enabled

  # Security
  ssl_enforcement_enabled          = true
  ssl_minimal_tls_version_enforced = "TLS1_2"
  public_network_access_enabled    = false

  tags = local.common_tags
}

# ==============================================================================
# Azure Cache for Redis
# ==============================================================================

module "redis" {
  source = "./redis"

  resource_group_name = azurerm_resource_group.openclaw.name
  location            = var.location
  cache_name          = "${local.name_prefix}-redis"
  vnet_id             = module.vnet.vnet_id
  subnet_id           = module.vnet.cache_subnet_id

  # Redis configuration
  capacity      = var.redis_capacity
  family        = var.redis_family
  sku_name      = var.redis_sku_name
  redis_version = var.redis_version

  # Security
  enable_non_ssl_port = false
  minimum_tls_version = "1.2"

  # High availability
  zones = var.redis_zones

  tags = local.common_tags
}

# ==============================================================================
# Azure Container Registry
# ==============================================================================

module "acr" {
  source = "./acr"

  resource_group_name = azurerm_resource_group.openclaw.name
  location            = var.location
  # ACR names must be globally unique and alphanumeric only (no hyphens)
  registry_name = "${replace(local.name_prefix, "-", "")}acr${random_string.suffix.result}"
  sku           = var.acr_sku

  # Cleanup
  retention_policy_days     = var.acr_retention_policy_days
  quarantine_policy_enabled = var.environment == "prod"

  tags = local.common_tags
}

# ==============================================================================
# Application Gateway
# ==============================================================================

module "application_gateway" {
  source = "./application-gateway"

  resource_group_name = azurerm_resource_group.openclaw.name
  location            = var.location
  gateway_name        = "${local.name_prefix}-agw"
  vnet_id             = module.vnet.vnet_id
  subnet_id           = module.vnet.gateway_subnet_id
  environment         = var.environment

  # Gateway configuration
  sku_name = var.gateway_sku_name
  capacity = var.gateway_capacity

  # SSL
  ssl_certificate_key_vault_secret_id = var.ssl_certificate_key_vault_secret_id
  ssl_certificate_data                = var.ssl_certificate_data

  # Diagnostics
  log_analytics_workspace_id = module.monitoring.log_analytics_workspace_id

  # Backend pools
  backend_pools = [
    {
      name       = "openclaw-gateway"
      port       = 18789
      probe_path = "/health"
    },
    {
      name       = "litellm-proxy"
      port       = 4000
      probe_path = "/health"
    }
  ]

  tags = local.common_tags
}

# ==============================================================================
# Monitoring
# ==============================================================================

module "monitoring" {
  source = "../terraform/modules/monitoring"

  name_prefix          = local.name_prefix
  resource_group_name  = azurerm_resource_group.openclaw.name
  location             = var.location
  aks_cluster_id       = data.azurerm_kubernetes_cluster.openclaw_cluster.id
  postgresql_server_id = module.postgresql.server_id
  redis_cache_id       = module.redis.redis_cache_id

  # Dashboard
  enable_dashboard = true

  # Alerts
  enable_alerts = var.enable_monitoring_alerts
  alert_email   = var.alert_email

  tags = local.common_tags
}

# ==============================================================================
# Key Vault (for secrets)
# ==============================================================================

resource "azurerm_key_vault" "openclaw" {
  name                     = "${local.name_prefix}-kv"
  location                 = var.location
  resource_group_name      = azurerm_resource_group.openclaw.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "standard"
  purge_protection_enabled = var.environment == "prod"

  # Allow the identity running Terraform to manage the secrets below
  access_policy {
    tenant_id = data.azurerm_client_config.current.tenant_id
    object_id = data.azurerm_client_config.current.object_id

    secret_permissions = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
  }

  # With default_action = "Deny", Terraform can write secrets only from an
  # allowed network; add the runner's egress IP to ip_rules if needed.
  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"
  }

  tags = local.common_tags
}

resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  value        = var.db_administrator_password
  key_vault_id = azurerm_key_vault.openclaw.id
}

resource "azurerm_key_vault_secret" "redis_password" {
  name         = "redis-password"
  value        = var.redis_password
  key_vault_id = azurerm_key_vault.openclaw.id
}

@@ -0,0 +1,305 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Outputs
# ==============================================================================
# Output values for Azure infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# Resource Group Outputs
# ------------------------------------------------------------------------------

output "resource_group_name" {
  description = "Resource group name"
  value       = azurerm_resource_group.openclaw.name
}

output "resource_group_location" {
  description = "Resource group location"
  value       = azurerm_resource_group.openclaw.location
}

# ------------------------------------------------------------------------------
# VNet Outputs
# ------------------------------------------------------------------------------

output "vnet_id" {
  description = "VNet ID"
  value       = module.vnet.vnet_id
}

output "vnet_name" {
  description = "VNet name"
  value       = module.vnet.vnet_name
}

output "vnet_address_space" {
  description = "VNet address space"
  value       = module.vnet.vnet_address_space
}

output "aks_subnet_id" {
  description = "AKS subnet ID"
  value       = module.vnet.aks_subnet_id
}

output "database_subnet_id" {
  description = "Database subnet ID"
  value       = module.vnet.database_subnet_id
}

output "cache_subnet_id" {
  description = "Cache subnet ID"
  value       = module.vnet.cache_subnet_id
}

output "gateway_subnet_id" {
  description = "Gateway subnet ID"
  value       = module.vnet.gateway_subnet_id
}

# ------------------------------------------------------------------------------
# AKS Outputs
# ------------------------------------------------------------------------------

output "aks_cluster_id" {
  description = "AKS cluster ID"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.id
}

output "aks_cluster_name" {
  description = "AKS cluster name"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.name
}

output "aks_cluster_fqdn" {
  description = "AKS cluster FQDN"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.fqdn
}

output "aks_cluster_kubernetes_version" {
  description = "AKS cluster Kubernetes version"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.kubernetes_version
}

output "aks_cluster_node_resource_group" {
  description = "AKS cluster node resource group"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.node_resource_group
}

output "aks_cluster_identity_principal_id" {
  description = "AKS cluster identity principal ID"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.identity[0].principal_id
}

output "aks_kube_config_raw" {
  description = "Raw kube config"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config_raw
  sensitive   = true
}

output "aks_kube_config_host" {
  description = "Kube config host"
  value       = data.azurerm_kubernetes_cluster.openclaw_cluster.kube_config[0].host
  sensitive   = true
}

output "aks_kube_config_command" {
  description = "Command to get AKS credentials"
  value       = "az aks get-credentials --resource-group ${azurerm_resource_group.openclaw.name} --name ${data.azurerm_kubernetes_cluster.openclaw_cluster.name}"
}

# ------------------------------------------------------------------------------
# PostgreSQL Outputs
# ------------------------------------------------------------------------------

output "postgresql_server_id" {
  description = "PostgreSQL server ID"
  value       = module.postgresql.server_id
}

output "postgresql_server_name" {
  description = "PostgreSQL server name"
  value       = module.postgresql.server_name
}

output "postgresql_fqdn" {
  description = "PostgreSQL server FQDN"
  value       = module.postgresql.fqdn
}

output "postgresql_port" {
  description = "PostgreSQL server port"
  value       = module.postgresql.port
}

output "postgresql_administrator_login" {
  description = "PostgreSQL administrator login"
  value       = module.postgresql.administrator_login
  sensitive   = true
}

output "postgresql_connection_string" {
  description = "PostgreSQL connection string for the openclaw database"
  # Credentials are percent-encoded so special characters survive in the URL;
  # the flexible server requires TLS by default, hence sslmode=require.
  value     = "postgresql://${urlencode(var.db_administrator_login)}:${urlencode(var.db_administrator_password)}@${module.postgresql.fqdn}:${module.postgresql.port}/openclaw?sslmode=require"
  sensitive = true
}

# ------------------------------------------------------------------------------
# Redis Outputs
# ------------------------------------------------------------------------------

output "redis_cache_id" {
  description = "Redis cache ID"
  value       = module.redis.redis_cache_id
}

output "redis_cache_name" {
  description = "Redis cache name"
  value       = module.redis.redis_cache_name
}

output "redis_hostname" {
  description = "Redis cache hostname"
  value       = module.redis.hostname
}

output "redis_port" {
  description = "Redis cache port"
  value       = module.redis.port
}

output "redis_connection_string" {
  description = "Redis connection string"
  value       = "redis://${var.redis_password != null ? ":${var.redis_password}@" : ""}${module.redis.hostname}:${module.redis.port}"
  sensitive   = true
}

# ------------------------------------------------------------------------------
# ACR Outputs
# ------------------------------------------------------------------------------

output "acr_id" {
  description = "ACR ID"
  value       = module.acr.acr_id
}

output "acr_name" {
  description = "ACR name"
  value       = module.acr.acr_name
}

output "acr_login_server" {
  description = "ACR login server"
  value       = module.acr.login_server
}

output "acr_login_server_url" {
  description = "ACR login server URL"
  value       = "https://${module.acr.login_server}"
}

output "acr_admin_username" {
  description = "ACR admin username"
  value       = module.acr.admin_username
}

output "acr_admin_password" {
  description = "ACR admin password"
  value       = module.acr.admin_password
  sensitive   = true
}

output "acr_login_command" {
  description = "ACR login command"
  value       = "az acr login --name ${module.acr.acr_name}"
}

# ------------------------------------------------------------------------------
# Application Gateway Outputs
# ------------------------------------------------------------------------------

output "application_gateway_id" {
  description = "Application Gateway ID"
  value       = module.application_gateway.gateway_id
}

output "application_gateway_name" {
  description = "Application Gateway name"
  value       = module.application_gateway.gateway_name
}

output "application_gateway_public_ip" {
  description = "Application Gateway public IP"
  value       = module.application_gateway.public_ip
}

output "application_gateway_public_ip_id" {
  description = "Application Gateway public IP ID"
  value       = module.application_gateway.public_ip_id
}

# ------------------------------------------------------------------------------
# Key Vault Outputs
# ------------------------------------------------------------------------------

output "key_vault_id" {
  description = "Key Vault ID"
  value       = azurerm_key_vault.openclaw.id
}

output "key_vault_name" {
  description = "Key Vault name"
  value       = azurerm_key_vault.openclaw.name
}

output "key_vault_uri" {
  description = "Key Vault URI"
  value       = azurerm_key_vault.openclaw.vault_uri
}

# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------

output "log_analytics_workspace_id" {
  description = "Log Analytics workspace ID"
  value       = module.monitoring.log_analytics_workspace_id
}

output "log_analytics_workspace_name" {
  description = "Log Analytics workspace name"
  value       = module.monitoring.log_analytics_workspace_name
}

output "application_insights_id" {
  description = "Application Insights ID"
  value       = module.monitoring.application_insights_id
}

output "monitoring_dashboard_id" {
  description = "Monitoring dashboard ID"
  value       = module.monitoring.dashboard_id
}

# ------------------------------------------------------------------------------
# Cost Estimation
# ------------------------------------------------------------------------------

output "estimated_monthly_cost" {
  description = "Estimated monthly cost breakdown"
  # format() is used because "$${" is HCL's escape for a literal "${", so a
  # template like "~$${expr}" would never interpolate expr.
  value = {
    aks_cluster         = "~$73 (cluster management)"
    aks_nodes_default   = format("~$%d (%s)", var.default_node_pool.node_count * 140, var.default_node_pool.vm_size)
    aks_nodes_system    = format("~$%d (%s)", var.system_node_pool.node_count * 70, var.system_node_pool.vm_size)
    aks_nodes_compute   = format("~$%d (%s)", var.user_node_pools[0].node_count * 350, var.user_node_pools[0].vm_size)
    aks_nodes_gpu       = local.gpu_enabled ? format("~$%d (%s)", var.gpu_node_pool.node_count * 2500, var.gpu_node_pool.vm_size) : "$0"
    postgresql          = format("~$%d (%s)", var.postgresql_sku_name == "GP_Gen5_2" ? 150 : 300, var.postgresql_sku_name)
    redis               = format("~$%d (%s %s%d)", var.redis_sku_name == "Standard" ? 100 : 200, var.redis_sku_name, var.redis_family, var.redis_capacity)
    acr                 = "~$10 (Standard)"
    application_gateway = "~$30 (Standard_v2)"
    key_vault           = "~$5"
    monitoring          = "~$50"
    network_egress      = "Variable"
    total_estimate      = "See Azure Pricing Calculator for accurate pricing"
  }
}

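# Sketch (Terraform >= 1.5): warn early when generated names break Azure
# naming rules. The patterns are simplified assumptions: Key Vault names of
# 3-24 letters, digits, or hyphens, and ACR names of 5-50 alphanumeric
# characters. Failing checks produce warnings, not errors.
check "azure_resource_names" {
  assert {
    condition     = can(regex("^[a-zA-Z0-9-]{3,24}$", azurerm_key_vault.openclaw.name))
    error_message = "Key Vault name must be 3-24 characters of letters, digits, and hyphens."
  }

  assert {
    condition     = can(regex("^[a-zA-Z0-9]{5,50}$", module.acr.acr_name))
    error_message = "ACR name must be 5-50 alphanumeric characters."
  }
}
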
@@ -0,0 +1,205 @@
# ==============================================================================
# Heretek OpenClaw - Azure Database for PostgreSQL Configuration
# ==============================================================================
# Azure Database for PostgreSQL Flexible Server for OpenClaw
# ==============================================================================

# ------------------------------------------------------------------------------
# PostgreSQL Flexible Server
# ------------------------------------------------------------------------------

resource "azurerm_postgresql_flexible_server" "openclaw" {
  name                = var.server_name
  location            = var.location
  resource_group_name = var.resource_group_name
  # "version" is reserved and cannot be used as a module input variable name,
  # so this assumes the module declares a postgresql_version variable instead.
  version             = var.postgresql_version
  delegated_subnet_id = var.subnet_id
  # A VNet-integrated server must be given the private DNS zone it registers in.
  private_dns_zone_id = azurerm_private_dns_zone.postgresql.id
  zone                = "1"

  sku_name   = var.sku_name
  storage_mb = var.storage_mb
  # storage_tier takes disk tiers such as "P10" or "P30" (not "Premium"); it is
  # omitted so Azure applies the default tier for storage_mb.

  administrator_login    = var.administrator_login
  administrator_password = var.administrator_password

  # Backup settings are top-level arguments on the flexible server resource.
  backup_retention_days        = var.environment == "prod" ? 35 : 7
  geo_redundant_backup_enabled = var.geo_redundant_backup_enabled

  # mode only accepts "ZoneRedundant" or "SameZone", so high availability is
  # declared in prod and omitted elsewhere.
  dynamic "high_availability" {
    for_each = var.environment == "prod" ? [1] : []
    content {
      mode                      = "ZoneRedundant"
      standby_availability_zone = "2"
    }
  }

  maintenance_window {
    day_of_week  = 0
    start_hour   = 2
    start_minute = 0
  }

  # Must be false when delegated_subnet_id and private_dns_zone_id are set.
  public_network_access_enabled = var.public_network_access_enabled

  tags = var.tags

  depends_on = [azurerm_private_dns_zone_virtual_network_link.postgresql]
}

# ------------------------------------------------------------------------------
# PostgreSQL Server Parameters
# ------------------------------------------------------------------------------

# Server parameters are separate resources; the flexible server resource has
# no "parameters" block.
resource "azurerm_postgresql_flexible_server_configuration" "extensions" {
  name      = "azure.extensions"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  value     = "VECTOR" # allowlists the pgvector extension
}

resource "azurerm_postgresql_flexible_server_configuration" "pg_stat_statements_track" {
  name      = "pg_stat_statements.track"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  value     = "all"
}

# ------------------------------------------------------------------------------
# PostgreSQL Database
# ------------------------------------------------------------------------------

resource "azurerm_postgresql_flexible_server_database" "openclaw" {
  name      = "openclaw"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  charset   = "UTF8"
  collation = "en_US.utf8"
}
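
# Example (commented out, a sketch under stated assumptions): allowlisting
# pgvector in azure.extensions only permits the extension; it still has to be
# created inside the database. One option is the community cyrilgdn/postgresql
# provider (declared in required_providers), which needs network access to the
# private server, for example from a runner inside the VNet. The provider
# settings are placeholders, not values taken from this module.
#
# provider "postgresql" {
#   host      = azurerm_postgresql_flexible_server.openclaw.fqdn
#   port      = 5432
#   username  = var.administrator_login
#   password  = var.administrator_password
#   sslmode   = "require"
#   superuser = false # the Azure admin role is not a PostgreSQL superuser
# }
#
# resource "postgresql_extension" "vector" {
#   name     = "vector"
#   database = azurerm_postgresql_flexible_server_database.openclaw.name
# }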

# ------------------------------------------------------------------------------
# PostgreSQL Firewall Rules
# ------------------------------------------------------------------------------

# Firewall rules only apply when public network access is enabled; a
# VNet-integrated server is reached over its private address instead.
resource "azurerm_postgresql_flexible_server_firewall_rule" "allow_aks" {
  count = var.public_network_access_enabled ? 1 : 0

  name      = "AllowAKS"
  server_id = azurerm_postgresql_flexible_server.openclaw.id
  # Cover the whole AKS subnet (first to last address), not only its network
  # address.
  start_ip_address = cidrhost(var.aks_subnet_cidr, 0)
  end_ip_address   = cidrhost(var.aks_subnet_cidr, pow(2, 32 - tonumber(split("/", var.aks_subnet_cidr)[1])) - 1)
}

# ------------------------------------------------------------------------------
# PostgreSQL Private DNS Zone
# ------------------------------------------------------------------------------

resource "azurerm_private_dns_zone" "postgresql" {
  name                = "privatelink.postgres.database.azure.com"
  resource_group_name = var.resource_group_name
  tags                = var.tags
}

resource "azurerm_private_dns_zone_virtual_network_link" "postgresql" {
  name                  = "postgresql-vnet-link"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.postgresql.name
  virtual_network_id    = var.vnet_id
  registration_enabled  = false
  tags                  = var.tags
}

# No A record is declared here: a VNet-integrated flexible server registers its
# own record in the private DNS zone it is attached to (private_dns_zone_id),
# and the server resource does not export a private IP address to build one.

# ------------------------------------------------------------------------------
# PostgreSQL Diagnostic Settings
# ------------------------------------------------------------------------------

resource "azurerm_monitor_diagnostic_setting" "postgresql" {
  name                       = "${var.server_name}-diagnostics"
  target_resource_id         = azurerm_postgresql_flexible_server.openclaw.id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  enabled_log {
    category = "PostgreSQLLogs"
  }

  # Query Store log categories on flexible servers use the "PostgreSQLFlex" prefix.
  enabled_log {
    category = "PostgreSQLFlexQueryStoreRuntime"
  }

  enabled_log {
    category = "PostgreSQLFlexQueryStoreWaitStats"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

# ------------------------------------------------------------------------------
# PostgreSQL Alerts
# ------------------------------------------------------------------------------

resource "azurerm_monitor_metric_alert" "postgresql_cpu" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.server_name}-cpu-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_postgresql_flexible_server.openclaw.id]
  description         = "CPU utilization is too high"

  criteria {
    metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
    metric_name      = "cpu_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "postgresql_storage" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.server_name}-storage-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_postgresql_flexible_server.openclaw.id]
  description         = "Storage utilization is too high"

  criteria {
    metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
    metric_name      = "storage_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "postgresql_connections" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.server_name}-connections-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_postgresql_flexible_server.openclaw.id]
  description         = "Active connection count is too high"

  criteria {
    metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
    metric_name      = "active_connections"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 100
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}
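
# Example (commented out, hypothetical names): the alerts above notify
# var.action_group_id. A minimal action group that emails a single address
# could look like this; "ops" and var.alert_email are assumptions and are not
# declared in this module.
#
# resource "azurerm_monitor_action_group" "ops" {
#   name                = "openclaw-ops"
#   resource_group_name = var.resource_group_name
#   short_name          = "openclawops" # at most 12 characters
#
#   email_receiver {
#     name          = "ops-email"
#     email_address = var.alert_email
#   }
# }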
@@ -0,0 +1,205 @@
# ==============================================================================
# Heretek OpenClaw - Azure Cache for Redis Configuration
# ==============================================================================
# Azure Cache for Redis for OpenClaw caching and session management
# ==============================================================================

# ------------------------------------------------------------------------------
# Redis Cache
# ------------------------------------------------------------------------------

resource "azurerm_redis_cache" "openclaw" {
  name                = var.cache_name
  location            = var.location
  resource_group_name = var.resource_group_name
  capacity            = var.capacity
  family              = var.family
  sku_name            = var.sku_name
  redis_version       = var.redis_version

  enable_non_ssl_port = var.enable_non_ssl_port
  minimum_tls_version = var.minimum_tls_version

  redis_configuration {
    # maxmemory_reserved and maxmemory_delta (both in MB) are left at the Azure
    # defaults: setting each to capacity * 1024 would together exceed the total
    # memory of small caches such as C1 (1 GB) or C2 (2.5 GB).
    maxmemory_policy       = "allkeys-lru"
    notify_keyspace_events = "KEA"
  }

  # Scheduled updates are configured on the cache itself. maintenance_window is
  # an ISO 8601 duration; Azure requires a window of at least five hours.
  patch_schedule {
    day_of_week        = "Sunday"
    start_hour_utc     = 3
    maintenance_window = "PT5H"
  }

  # Private access is provided by azurerm_private_endpoint.redis below.

  tags = var.tags

  # Availability zones are supported only on the Premium tier.
  zones = var.sku_name == "Premium" ? var.zones : null
}

# ------------------------------------------------------------------------------
# Redis Private Endpoint
# ------------------------------------------------------------------------------

resource "azurerm_private_endpoint" "redis" {
  name                = "${var.cache_name}-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  private_service_connection {
    name                           = "${var.cache_name}-psc"
    private_connection_resource_id = azurerm_redis_cache.openclaw.id
    is_manual_connection           = false
    subresource_names              = ["redisCache"]
  }

  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.redis.id]
  }

  tags = var.tags
}

# ------------------------------------------------------------------------------
# Redis Private DNS Zone
# ------------------------------------------------------------------------------

resource "azurerm_private_dns_zone" "redis" {
  name                = "privatelink.redis.cache.windows.net"
  resource_group_name = var.resource_group_name
  tags                = var.tags
}

resource "azurerm_private_dns_zone_virtual_network_link" "redis" {
  name                  = "redis-vnet-link"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.redis.name
  virtual_network_id    = var.vnet_id
  registration_enabled  = false
  tags                  = var.tags
}

# ------------------------------------------------------------------------------
# Redis Firewall Rules (for Premium tier)
# ------------------------------------------------------------------------------

resource "azurerm_redis_firewall_rule" "allow_aks" {
  count = var.sku_name == "Premium" ? 1 : 0

  name                = "AllowAKS"
  redis_cache_name    = azurerm_redis_cache.openclaw.name
  resource_group_name = var.resource_group_name
  # Cover the whole AKS subnet (first to last address), not only its network
  # address.
  start_ip = cidrhost(var.aks_subnet_cidr, 0)
  end_ip   = cidrhost(var.aks_subnet_cidr, pow(2, 32 - tonumber(split("/", var.aks_subnet_cidr)[1])) - 1)
}
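
# Example (commented out, hypothetical output name): a TLS connection URL for
# clients, built only from attributes azurerm_redis_cache exports. "rediss://"
# targets the SSL port (6380 by default), which remains available when the
# non-SSL port is disabled; urlencode() guards against "+", "/" and "=" in the
# access key. Inside the VNet, the hostname resolves through the private DNS
# zone above.
#
# output "redis_connection_url" {
#   value     = "rediss://:${urlencode(azurerm_redis_cache.openclaw.primary_access_key)}@${azurerm_redis_cache.openclaw.hostname}:${azurerm_redis_cache.openclaw.ssl_port}"
#   sensitive = true
# }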

# ------------------------------------------------------------------------------
# Redis Diagnostic Settings
# ------------------------------------------------------------------------------

resource "azurerm_monitor_diagnostic_setting" "redis" {
  name                       = "${var.cache_name}-diagnostics"
  target_resource_id         = azurerm_redis_cache.openclaw.id
  log_analytics_workspace_id = var.log_analytics_workspace_id

  # ConnectedClientList is the connection-log category for Azure Cache for
  # Redis; cache performance data is collected through AllMetrics below.
  enabled_log {
    category = "ConnectedClientList"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

# ------------------------------------------------------------------------------
# Redis Alerts
# ------------------------------------------------------------------------------

resource "azurerm_monitor_metric_alert" "redis_cpu" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.cache_name}-cpu-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_redis_cache.openclaw.id]
  description         = "CPU utilization is too high"

  criteria {
    metric_namespace = "Microsoft.Cache/Redis"
    metric_name      = "percentProcessorTime" # cache CPU percentage
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "redis_connections" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.cache_name}-connections-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_redis_cache.openclaw.id]
  description         = "Connected client count is too high"

  criteria {
    metric_namespace = "Microsoft.Cache/Redis"
    metric_name      = "ConnectedClients"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 100
  }

  severity = 3

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "redis_server_load" {
  count = var.environment == "prod" ? 1 : 0

  name                = "${var.cache_name}-server-load-alert"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_redis_cache.openclaw.id]
  description         = "Server load is too high; clients may start to see timeouts"

  criteria {
    metric_namespace = "Microsoft.Cache/Redis"
    # serverLoad is the percentage of cycles the Redis server is busy; sustained
    # high values tend to cause client-side timeouts.
    metric_name = "serverLoad"
    aggregation = "Average"
    operator    = "GreaterThan"
    threshold   = 80
  }

  severity = 2

  action {
    action_group_id = var.action_group_id
  }
}
@@ -0,0 +1,370 @@
# ==============================================================================
# Heretek OpenClaw - Azure Terraform Variables
# ==============================================================================
# Input variables for Azure infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------

variable "resource_group_name" {
  description = "Azure resource group name"
  type        = string
  default     = "openclaw-rg"
}

variable "location" {
  description = "Azure region for resources"
  type        = string
  default     = "eastus"
}

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "app_version" {
  description = "Application version to deploy"
  type        = string
  default     = "2026.3.28"
}

# ------------------------------------------------------------------------------
# VNet Configuration
# ------------------------------------------------------------------------------

variable "vnet_address_space" {
  description = "VNet address space"
  type        = list(string)
  default     = ["10.0.0.0/16"]
}

variable "subnet_configs" {
  description = "Subnet configurations"
  type = map(object({
    name             = string
    address_prefixes = list(string)
  }))
  default = {
    aks = {
      name             = "aks-subnet"
      address_prefixes = ["10.0.1.0/24"]
    }
    database = {
      name             = "database-subnet"
      address_prefixes = ["10.0.2.0/24"]
    }
    cache = {
      name             = "cache-subnet"
      address_prefixes = ["10.0.3.0/24"]
    }
    gateway = {
      name             = "gateway-subnet"
      address_prefixes = ["10.0.4.0/24"]
    }
  }
}

variable "enable_ddos_protection" {
  description = "Enable DDoS protection"
  type        = bool
  default     = false
}

variable "enable_flow_logs" {
  description = "Enable NSG flow logs"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# AKS Configuration
# ------------------------------------------------------------------------------

variable "kubernetes_version" {
  description = "Kubernetes version for AKS"
  type        = string
  default     = "1.28"
}

variable "default_node_pool" {
  description = "Default node pool configuration"
  type = object({
    name                = string
    vm_size             = string
    node_count          = number
    min_count           = number
    max_count           = number
    enable_auto_scaling = bool
    os_disk_size_gb     = number
    type                = string
    availability_zones  = list(string)
  })
  default = {
    name                = "default"
    vm_size             = "Standard_D4s_v3"
    node_count          = 2
    min_count           = 1
    max_count           = 4
    enable_auto_scaling = true
    os_disk_size_gb     = 100
    type                = "VirtualMachineScaleSets"
    availability_zones  = ["1", "2", "3"]
  }
}

variable "system_node_pool" {
  description = "System node pool configuration"
  type = object({
    name                = string
    vm_size             = string
    node_count          = number
    min_count           = number
    max_count           = number
    enable_auto_scaling = bool
    os_disk_size_gb     = number
    availability_zones  = list(string)
  })
  default = {
    name                = "system"
    vm_size             = "Standard_D2s_v3"
    node_count          = 2
    min_count           = 1
    max_count           = 3
    enable_auto_scaling = true
    os_disk_size_gb     = 50
    availability_zones  = ["1", "2", "3"]
  }
}

variable "user_node_pools" {
  description = "User node pool configurations"
  type = list(object({
    name                = string
    vm_size             = string
    node_count          = number
    min_count           = number
    max_count           = number
    enable_auto_scaling = bool
    os_disk_size_gb     = number
    availability_zones  = list(string)
  }))
  default = [
    {
      name                = "compute"
      vm_size             = "Standard_D8s_v3"
      node_count          = 2
      min_count           = 1
      max_count           = 8
      enable_auto_scaling = true
      os_disk_size_gb     = 200
      availability_zones  = ["1", "2", "3"]
    }
  ]
}

variable "enable_gpu_support" {
  description = "Enable GPU node pool for Ollama"
  type        = bool
  default     = false
}

variable "gpu_node_pool" {
  description = "GPU node pool configuration"
  type = object({
    name                = string
    vm_size             = string
    node_count          = number
    min_count           = number
    max_count           = number
    enable_auto_scaling = bool
    os_disk_size_gb     = number
    availability_zones  = list(string)
  })
  default = {
    name                = "gpu"
    vm_size             = "Standard_NC4as_T4_v3"
    node_count          = 1
    min_count           = 0
    max_count           = 4
    enable_auto_scaling = true
    os_disk_size_gb     = 200
    availability_zones  = ["1", "2", "3"]
  }
}

variable "enable_private_cluster" {
  description = "Enable private AKS cluster"
  type        = bool
  default     = false
}

variable "enable_azure_policy" {
  description = "Enable Azure Policy addon"
  type        = bool
  default     = true
}

variable "enable_workload_identity" {
  description = "Enable Workload Identity"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# Azure Database for PostgreSQL Configuration
# ------------------------------------------------------------------------------

variable "postgresql_sku_name" {
  description = "PostgreSQL Flexible Server SKU name in tier_name form (e.g. B_Standard_B1ms, GP_Standard_D2s_v3)"
  type        = string
  default     = "GP_Standard_D2s_v3"
}

variable "postgresql_storage_mb" {
  description = "PostgreSQL storage in MB (Flexible Server accepts fixed sizes such as 32768, 65536, 131072, 262144)"
  type        = number
  default     = 131072
}

variable "postgresql_version" {
  description = "PostgreSQL major version"
  type        = string
  default     = "15"
}

variable "db_administrator_login" {
  description = "PostgreSQL administrator login"
  type        = string
  default     = "openclaw"
  sensitive   = true
}

variable "db_administrator_password" {
  description = "PostgreSQL administrator password"
  type        = string
  default     = null
  sensitive   = true
}
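
# Example (commented out, hypothetical names): when db_administrator_password
# is left null, one option is to generate a password with the hashicorp/random
# provider and fall back to it. The min_* settings help satisfy Azure's rule
# that the password contain characters from at least three of: uppercase,
# lowercase, digits and non-alphanumeric characters.
#
# resource "random_password" "db_admin" {
#   length           = 32
#   min_upper        = 1
#   min_lower        = 1
#   min_numeric      = 1
#   override_special = "!#%*-_=+"
# }
#
# locals {
#   # coalesce() skips the null variable and falls back to the generated value.
#   db_administrator_password = coalesce(var.db_administrator_password, random_password.db_admin.result)
# }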

variable "db_geo_redundant_backup" {
  description = "Enable geo-redundant backup"
  type        = bool
  default     = false
}

variable "db_auto_grow_enabled" {
  description = "Enable storage auto-grow"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# Azure Cache for Redis Configuration
# ------------------------------------------------------------------------------

variable "redis_capacity" {
  description = "Redis cache size index within the SKU family (e.g. 2 with family C is a C2 cache)"
  type        = number
  default     = 2
}

variable "redis_family" {
  description = "Redis SKU family: C for Basic/Standard, P for Premium"
  type        = string
  default     = "C"
}

variable "redis_sku_name" {
  description = "Redis SKU name (Basic, Standard, Premium)"
  type        = string
  default     = "Standard"
}

variable "redis_version" {
  description = "Redis version"
  type        = string
  default     = "6"
}

variable "redis_password" {
  description = "Redis authentication password"
  type        = string
  default     = null
  sensitive   = true
}

variable "redis_zones" {
  description = "Availability zones for Redis (Premium tier only)"
  type        = list(string)
  default     = ["1", "2", "3"]
}

# ------------------------------------------------------------------------------
# Azure Container Registry Configuration
# ------------------------------------------------------------------------------

variable "acr_sku" {
  description = "ACR SKU (Basic, Standard, Premium)"
  type        = string
  default     = "Standard"
}

variable "acr_retention_policy_days" {
  description = "ACR retention policy days"
  type        = number
  default     = 30
}

# ------------------------------------------------------------------------------
# Application Gateway Configuration
# ------------------------------------------------------------------------------

variable "gateway_sku_name" {
  description = "Application Gateway SKU"
  type        = string
  default     = "Standard_v2"
}

variable "gateway_capacity" {
  description = "Application Gateway capacity"
  type        = number
  default     = 2
}

variable "ssl_certificate_key_vault_secret_id" {
  description = "Key Vault secret ID for SSL certificate"
  type        = string
  default     = null
}

variable "ssl_certificate_data" {
  description = "Base64 encoded SSL certificate data"
  type        = string
  default     = null
  sensitive   = true
}

# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------

variable "enable_monitoring_alerts" {
  description = "Enable monitoring alerts"
  type        = bool
  default     = true
}

variable "alert_email" {
  description = "Email for alert notifications"
  type        = string
  default     = null
}
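
# Example terraform.tfvars for a production deployment (commented out; the
# values are illustrative, not recommendations). Secrets such as the database
# password are better supplied through the environment, for example as
# TF_VAR_db_administrator_password, than written to a tfvars file.
#
# environment             = "prod"
# location                = "eastus"
# postgresql_sku_name     = "GP_Standard_D4s_v3"
# db_geo_redundant_backup = true
# redis_sku_name          = "Premium"
# redis_family            = "P"
# redis_capacity          = 1
# enable_private_cluster  = true
# alert_email             = "ops@example.com"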
@@ -0,0 +1,298 @@
# ==============================================================================
# Heretek OpenClaw - Azure VNet Configuration
# ==============================================================================
# Virtual Network for OpenClaw infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# Virtual Network
# ------------------------------------------------------------------------------

resource "azurerm_virtual_network" "openclaw" {
  name                = var.vnet_name
  location            = var.location
  resource_group_name = var.resource_group_name
  address_space       = var.vnet_address_space

  dynamic "ddos_protection_plan" {
    for_each = var.enable_ddos_protection ? [1] : []
    content {
      id     = azurerm_ddos_protection_plan.openclaw[0].id
      enable = true
    }
  }

  tags = var.tags
}

# ------------------------------------------------------------------------------
# DDoS Protection Plan (Optional)
# ------------------------------------------------------------------------------

resource "azurerm_ddos_protection_plan" "openclaw" {
  count = var.enable_ddos_protection ? 1 : 0

  name                = "${var.vnet_name}-ddos"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = var.tags
}

# ------------------------------------------------------------------------------
# Subnets
# ------------------------------------------------------------------------------

# AKS node subnets must not be delegated to a service, so no delegation block
# is set here.
resource "azurerm_subnet" "aks" {
  name                 = var.subnet_configs.aks.name
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.openclaw.name
  address_prefixes     = var.subnet_configs.aks.address_prefixes
}

resource "azurerm_subnet" "database" {
  name                 = var.subnet_configs.database.name
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.openclaw.name
  address_prefixes     = var.subnet_configs.database.address_prefixes

  # VNet-integrated PostgreSQL Flexible Server needs a subnet delegated to
  # flexibleServers (not the single-server "servers" type).
  delegation {
    name = "database-delegation"

    service_delegation {
      name    = "Microsoft.DBforPostgreSQL/flexibleServers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

# Azure Cache for Redis is reached through a private endpoint in this subnet,
# which does not require a subnet delegation.
resource "azurerm_subnet" "cache" {
  name                 = var.subnet_configs.cache.name
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.openclaw.name
  address_prefixes     = var.subnet_configs.cache.address_prefixes
}

resource "azurerm_subnet" "gateway" {
  name                 = var.subnet_configs.gateway.name
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.openclaw.name
  address_prefixes     = var.subnet_configs.gateway.address_prefixes

  delegation {
    name = "gateway-delegation"

    service_delegation {
      name    = "Microsoft.Network/applicationGateways"
      actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
    }
  }
}

# ------------------------------------------------------------------------------
# Network Security Groups
# ------------------------------------------------------------------------------

resource "azurerm_network_security_group" "aks" {
  name                = "${var.vnet_name}-aks-nsg"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = var.tags
}

resource "azurerm_network_security_group" "database" {
  name                = "${var.vnet_name}-database-nsg"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = var.tags
}

resource "azurerm_network_security_group" "cache" {
  name                = "${var.vnet_name}-cache-nsg"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = var.tags
}

resource "azurerm_network_security_group" "gateway" {
  name                = "${var.vnet_name}-gateway-nsg"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = var.tags
}

# ------------------------------------------------------------------------------
# NSG Security Rules
# ------------------------------------------------------------------------------

# AKS NSG Rules
resource "azurerm_network_security_rule" "aks_allow_inbound" {
  name                        = "AllowInboundAKS"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "6443"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.aks.name
}

resource "azurerm_network_security_rule" "aks_allow_node" {
  name                        = "AllowNodeCommunication"
  priority                    = 101
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "*"
  source_port_range           = "*"
  destination_port_range      = "0-65535"
  source_address_prefix       = azurerm_virtual_network.openclaw.address_space[0]
  destination_address_prefix  = azurerm_virtual_network.openclaw.address_space[0]
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.aks.name
}

# Database NSG Rules
resource "azurerm_network_security_rule" "database_allow_postgresql" {
  name                        = "AllowPostgreSQL"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "5432"
  source_address_prefix       = azurerm_subnet.aks.address_prefixes[0]
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.database.name
}

# Cache NSG Rules
resource "azurerm_network_security_rule" "cache_allow_redis" {
  name                        = "AllowRedis"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "6379"
  source_address_prefix       = azurerm_subnet.aks.address_prefixes[0]
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.cache.name
}

# Gateway NSG Rules
resource "azurerm_network_security_rule" "gateway_allow_http" {
  name                        = "AllowHTTP"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "80"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.gateway.name
}

resource "azurerm_network_security_rule" "gateway_allow_https" {
  name                        = "AllowHTTPS"
  priority                    = 101
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = azurerm_network_security_group.gateway.name
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Subnet NSG Associations
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_subnet_network_security_group_association" "aks" {
|
||||
subnet_id = azurerm_subnet.aks.id
|
||||
network_security_group_id = azurerm_network_security_group.aks.id
|
||||
}
|
||||
|
||||
resource "azurerm_subnet_network_security_group_association" "database" {
|
||||
subnet_id = azurerm_subnet.database.id
|
||||
network_security_group_id = azurerm_network_security_group.database.id
|
||||
}
|
||||
|
||||
resource "azurerm_subnet_network_security_group_association" "cache" {
|
||||
subnet_id = azurerm_subnet.cache.id
|
||||
network_security_group_id = azurerm_network_security_group.cache.id
|
||||
}
|
||||
|
||||
resource "azurerm_subnet_network_security_group_association" "gateway" {
|
||||
subnet_id = azurerm_subnet.gateway.id
|
||||
network_security_group_id = azurerm_network_security_group.gateway.id
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Flow Logs (Optional)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "azurerm_network_watcher" "openclaw" {
|
||||
count = var.enable_flow_logs ? 1 : 0
|
||||
|
||||
name = "${var.vnet_name}-watcher"
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
resource "azurerm_log_analytics_workspace" "flow_logs" {
|
||||
count = var.enable_flow_logs ? 1 : 0
|
||||
|
||||
name = "${var.vnet_name}-flow-logs-log"
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
sku = "PerGB2018"
|
||||
retention_in_days = 30
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
|
||||
resource "azurerm_storage_account" "flow_logs" {
|
||||
count = var.enable_flow_logs ? 1 : 0
|
||||
|
||||
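name = substr(replace(lower("${var.vnet_name}flowlogs"), "/[^a-z0-9]/", ""), 0, 24) # storage account names allow only 3-24 lowercase letters and digits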
|
||||
location = var.location
|
||||
resource_group_name = var.resource_group_name
|
||||
account_tier = "Standard"
|
||||
account_replication_type = "LRS"
|
||||
|
||||
tags = var.tags
|
||||
}
|
||||
@@ -0,0 +1,539 @@
|
||||
# GCP Deployment Guide for Heretek OpenClaw
|
||||
|
||||
**Version:** 1.0.0
|
||||
**Last Updated:** 2026-03-31
|
||||
**OpenClaw Version:** v2026.3.28
|
||||
|
||||
This guide provides comprehensive instructions for deploying Heretek OpenClaw on Google Cloud Platform (GCP) using Terraform Infrastructure as Code (IaC).
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [Prerequisites](#prerequisites)
|
||||
3. [Architecture](#architecture)
|
||||
4. [Cost Estimates](#cost-estimates)
|
||||
5. [Quick Start](#quick-start)
|
||||
6. [Configuration](#configuration)
|
||||
7. [Deployment Steps](#deployment-steps)
|
||||
8. [Post-Deployment](#post-deployment)
|
||||
9. [GPU Support](#gpu-support)
|
||||
10. [Monitoring](#monitoring)
|
||||
11. [Backup & Recovery](#backup--recovery)
|
||||
12. [Cleanup](#cleanup)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This Terraform configuration deploys a production-ready OpenClaw environment on GCP with:
|
||||
|
||||
- **GKE (Google Kubernetes Engine)** - Managed Kubernetes cluster
|
||||
- **Cloud SQL PostgreSQL** - Managed PostgreSQL with pgvector support
|
||||
- **Memorystore Redis** - Managed Redis for caching and sessions
|
||||
- **Artifact Registry** - Private container registry
|
||||
- **Cloud Load Balancing** - Traffic routing and SSL termination
|
||||
- **Cloud Monitoring** - Metrics, logging, and alerting
|
||||
|
||||
### Components
|
||||
|
||||
| Component | Service | Purpose |
|
||||
|-----------|---------|---------|
|
||||
| Gateway | GKE | OpenClaw Gateway (port 18789) |
|
||||
| LiteLLM | GKE | LLM proxy and routing (port 4000) |
|
||||
| Database | Cloud SQL PostgreSQL 15 | Primary data store with pgvector |
|
||||
| Cache | Memorystore Redis 7 | Session management, caching |
|
||||
| Container Registry | Artifact Registry | Private image storage |
|
||||
| Load Balancer | Cloud LB | HTTPS termination, routing |
|
||||
| Monitoring | Cloud Monitoring | Metrics, logs, alerts |
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Required Tools
|
||||
|
||||
```bash
|
||||
# Install Terraform
|
||||
brew install terraform # macOS
|
||||
# or download from https://www.terraform.io/downloads
|
||||
|
||||
# Install Google Cloud SDK
|
||||
brew install --cask google-cloud-sdk # macOS
|
||||
# or follow https://cloud.google.com/sdk/docs/install
|
||||
|
||||
# Install kubectl
|
||||
brew install kubectl
|
||||
|
||||
# Install Helm
|
||||
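brew install helm
|
||||
|
||||
# Install the GKE auth plugin that kubectl needs for GKE credentials
|
||||
gcloud components install gke-gcloud-auth-plugin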
|
||||
```
|
||||
|
||||
### GCP Account Setup
|
||||
|
||||
1. **GCP Project** - Active GCP project with billing enabled
|
||||
2. **Service Account** - Service account with required permissions
|
||||
3. **Budget Alert** - Set up billing alerts in GCP Console
|
||||
|
||||
### Configure GCP Credentials
|
||||
|
||||
```bash
|
||||
# Authenticate with Google Cloud
|
||||
gcloud auth login
|
||||
|
||||
# Set project
|
||||
gcloud config set project YOUR_PROJECT_ID
|
||||
|
||||
# Create service account for Terraform
|
||||
gcloud iam service-accounts create terraform \
|
||||
--display-name "Terraform Service Account"
|
||||
|
||||
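# Grant permissions (roles/editor cannot change IAM policies; add the IAM-related roles from the table below if Terraform must manage IAM bindings)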
|
||||
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
|
||||
--member="serviceAccount:terraform@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
|
||||
--role="roles/editor"
|
||||
|
||||
# Create and download key
|
||||
gcloud iam service-accounts keys create terraform-key.json \
|
||||
--iam-account=terraform@YOUR_PROJECT_ID.iam.gserviceaccount.com
|
||||
|
||||
# Set environment variable
|
||||
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/terraform-key.json"
|
||||
```
|
||||
|
||||
### Required GCP Permissions
|
||||
|
||||
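| Service | Required Role (role ID) |
|
||||
|---------|-------------------------|
|
||||
| GKE | Kubernetes Engine Admin (`roles/container.admin`) |
|
||||
| Compute Engine | Compute Admin (`roles/compute.admin`) |
|
||||
| Cloud SQL | Cloud SQL Admin (`roles/cloudsql.admin`) |
|
||||
| Memorystore | Cloud Memorystore Redis Admin (`roles/redis.admin`) |
|
||||
| Artifact Registry | Artifact Registry Administrator (`roles/artifactregistry.admin`) |
|
||||
| Cloud Load Balancing | Compute Load Balancer Admin (`roles/compute.loadBalancerAdmin`) |
|
||||
| IAM | Service Account Admin (`roles/iam.serviceAccountAdmin`) |
|
||||
| Cloud Monitoring | Monitoring Admin (`roles/monitoring.admin`) |
|
||||
| Secret Manager | Secret Manager Admin (`roles/secretmanager.admin`) |
|
||||
| Cloud KMS | Cloud KMS Admin (`roles/cloudkms.admin`) |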
|
||||
|
||||
### Enable Required APIs
|
||||
|
||||
```bash
|
||||
gcloud services enable \
|
||||
container.googleapis.com \
|
||||
sqladmin.googleapis.com \
|
||||
redis.googleapis.com \
|
||||
artifactregistry.googleapis.com \
|
||||
servicenetworking.googleapis.com \
|
||||
monitoring.googleapis.com \
|
||||
secretmanager.googleapis.com \
|
||||
cloudkms.googleapis.com
|
||||
```
|
||||
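You can confirm that every API above is active by comparing the project's enabled-service list with the required set. The sketch below is minimal. `REQUIRED_APIS` and `missing_apis` are hypothetical names, and the list mirrors the `gcloud services enable` call above.

```bash
#!/usr/bin/env bash
# Services this configuration depends on (same list as the enable command above)
REQUIRED_APIS=(
  container.googleapis.com
  sqladmin.googleapis.com
  redis.googleapis.com
  artifactregistry.googleapis.com
  servicenetworking.googleapis.com
  monitoring.googleapis.com
  secretmanager.googleapis.com
  cloudkms.googleapis.com
)

# Hypothetical helper: read enabled service names (one per line) on stdin and
# print each required API that is missing from that list.
missing_apis() {
  local enabled api
  enabled=$(cat)
  for api in "${REQUIRED_APIS[@]}"; do
    grep -qxF -- "$api" <<<"$enabled" || printf '%s\n' "$api"
  done
}
```

Usage: `gcloud services list --enabled --format="value(config.name)" | missing_apis` prints nothing when all required APIs are enabled.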
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ Google Cloud Platform │
|
||||
│ us-central1 │
|
||||
└─────────────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────────────────────────┼─────────────────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌───────────────────────┐ ┌───────────────────────┐ ┌───────────────────────┐
|
||||
│ Zone A │ │ Zone B │ │ Zone C │
|
||||
│ (us-central1-a) │ │ (us-central1-b) │ │ (us-central1-c) │
|
||||
│ │ │ │ │ │
|
||||
│ ┌────────────────┐ │ │ ┌────────────────┐ │ │ ┌────────────────┐ │
|
||||
│ │ GKE Nodes │ │ │ │ GKE Nodes │ │ │ │ GKE Nodes │ │
|
||||
│ │ (General) │ │ │ │ (Compute) │ │ │ │ (GPU) │ │
|
||||
│ └────────────────┘ │ │ └────────────────┘ │ │ └────────────────┘ │
|
||||
│ │ │ │ │ │
|
||||
│ ┌────────────────┐ │ │ ┌────────────────┐ │ │ ┌────────────────┐ │
|
||||
│ │ Cloud SQL │ │ │ │ Memorystore │ │ │ │ Artifact │ │
|
||||
│ │ Primary │ │ │ │ Redis │ │ │ │ Registry │ │
|
||||
│ └────────────────┘ │ │ └────────────────┘ │ │ └────────────────┘ │
|
||||
└───────────────────────┘ └───────────────────────┘ └───────────────────────┘
|
||||
│ │ │
|
||||
└─────────────────────────────────┼─────────────────────────────────┘
|
||||
│
|
||||
┌─────────────────────────────────┼─────────────────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Cloud Load Balancing │
|
||||
│ (Global HTTP(S) LB) │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Cloud Monitoring │
|
||||
│ (Metrics, Logging, Alerting) │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cost Estimates
|
||||
|
||||
### Development Environment
|
||||
|
||||
| Resource | Configuration | Monthly Cost (USD) |
|
||||
|----------|--------------|-------------------|
|
||||
| GKE Cluster | Autopilot/Standard | $73.00 |
|
||||
| GKE Nodes | 2x n2-standard-4 | $280.00 |
|
||||
| Cloud SQL | db-custom-4-15360, 100GB | $150.00 |
|
||||
| Memorystore | 4GB STANDARD_HA | $150.00 |
|
||||
| Cloud Load Balancer | External | $18.00 |
|
||||
| Artifact Registry | 10GB | $2.50 |
|
||||
| Cloud Monitoring | Standard | $5.00 |
|
||||
| Network Egress | Estimated | $30.00 |
|
||||
| **Total** | | **~$708.50/month** |
|
||||
|
||||
### Production Environment
|
||||
|
||||
| Resource | Configuration | Monthly Cost (USD) |
|
||||
|----------|--------------|-------------------|
|
||||
| GKE Cluster | Standard | $73.00 |
|
||||
| GKE Nodes General | 3x n2-standard-8 | $840.00 |
|
||||
| GKE Nodes Compute | 4x c2-standard-16 | $2,400.00 |
|
||||
| GKE Nodes GPU | 2x g2-standard-4 (L4) | $3,000.00 |
|
||||
| Cloud SQL | db-custom-8-30720, Regional HA, 200GB | $600.00 |
|
||||
| Memorystore | 16GB STANDARD_HA | $600.00 |
|
||||
| Cloud Load Balancer | External | $18.00 |
|
||||
| Artifact Registry | 50GB | $12.50 |
|
||||
| Cloud Monitoring | Premium | $50.00 |
|
||||
| Network Egress | Estimated | $150.00 |
|
||||
| **Total** | | **~$7,743.50/month** |
|
||||
|
||||
> **Note:** GPU costs are significant. Consider Spot VMs (the successor to preemptible VMs) for interruptible workloads, and let the GPU pool scale down when idle (`min_count = 0`).
|
||||
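You can recheck the totals in both tables from their line items. The same arithmetic shows how much of the production estimate the GPU pool accounts for. This is a minimal sketch that only does arithmetic on the tables' own figures. `sum_costs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# The amounts are this guide's estimates, not live GCP pricing.

sum_costs() {
  # Reads one dollar amount per line on stdin; prints the total with 2 decimals
  awk '{ total += $1 } END { printf "%.2f\n", total }'
}

dev_total=$(printf '%s\n' 73.00 280.00 150.00 150.00 18.00 2.50 5.00 30.00 | sum_costs)
prod_total=$(printf '%s\n' 73.00 840.00 2400.00 3000.00 600.00 600.00 18.00 12.50 50.00 150.00 | sum_costs)

# Share of the production estimate spent on the GPU node pool
gpu_share=$(awk -v gpu=3000 -v total="$prod_total" 'BEGIN { printf "%.1f", 100 * gpu / total }')

echo "dev:  \$${dev_total}/month"        # dev:  $708.50/month
echo "prod: \$${prod_total}/month"       # prod: $7743.50/month
echo "GPU share of prod: ${gpu_share}%"  # GPU share of prod: 38.7%
```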
|
||||
### Cost Optimization Tips
|
||||
|
||||
1. **Use Committed Use Discounts** for predictable workloads (up to 57% savings)
|
||||
2. **Consider GKE Autopilot** (a cluster mode chosen at creation time) to pay for Pod resource requests instead of whole nodes
|
||||
3. **Skip Cloud SQL high availability in dev** and rely on automated and on-demand backups instead
|
||||
4. **Right-size instances** based on actual usage
|
||||
5. **Set Cloud Billing budget alerts** for each project
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Clone Repository
|
||||
|
||||
```bash
|
||||
git clone https://github.com/Heretek-AI/heretek-openclaw.git
|
||||
cd heretek-openclaw/deploy/gcp/terraform
|
||||
```
|
||||
|
||||
### Initialize Terraform
|
||||
|
||||
```bash
|
||||
terraform init
|
||||
```
|
||||
|
||||
### Create Terraform Variables File
|
||||
|
||||
```bash
|
||||
cat > terraform.tfvars <<EOF
|
||||
project_id = "your-gcp-project-id"
|
||||
region = "us-central1"
|
||||
environment = "dev"
|
||||
|
||||
vpc_cidr = "10.0.0.0/16"
|
||||
|
||||
db_password = "generate-secure-password"
|
||||
redis_auth_string = "generate-secure-token"
|
||||
|
||||
# Optional: GPU support for Ollama
|
||||
enable_gpu_support = false
|
||||
|
||||
# Optional: Custom domain
|
||||
managed_domain = "openclaw.example.com"
|
||||
EOF
|
||||
```
|
||||
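You can avoid writing real secrets into `terraform.tfvars` by generating them and passing them through environment variables. Terraform reads `TF_VAR_<name>` for a variable called `<name>`. The sketch below is minimal and reads random bytes from `/dev/urandom`. `gen_secret` is a hypothetical helper. The output is hex, so the values need no escaping in connection URLs. Environment variables have the lowest precedence, so remove the two placeholder lines from `terraform.tfvars` if you use this.

```bash
#!/usr/bin/env bash
# Print N random bytes (default 24) as lowercase hex, i.e. 2*N characters
gen_secret() {
  od -An -v -N "${1:-24}" -tx1 /dev/urandom | tr -d ' \n'
}

# Terraform maps TF_VAR_db_password -> var.db_password, and so on.
TF_VAR_db_password=$(gen_secret 24)
TF_VAR_redis_auth_string=$(gen_secret 32)
export TF_VAR_db_password TF_VAR_redis_auth_string

# The values also end up in Terraform state, so protect the state bucket too.
```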
|
||||
### Plan and Apply
|
||||
|
||||
```bash
|
||||
# Review the plan
|
||||
terraform plan -out=tfplan
|
||||
|
||||
# Apply the configuration
|
||||
terraform apply tfplan
|
||||
```
|
||||
|
||||
### Configure kubectl
|
||||
|
||||
```bash
|
||||
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
|
||||
```
|
||||
|
||||
### Deploy OpenClaw to GKE
|
||||
|
||||
```bash
|
||||
cd ../../kubernetes
|
||||
kubectl apply -k overlays/dev
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Input Variables
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|----------|-------------|---------|----------|
|
||||
| `project_id` | GCP project ID | - | Yes |
|
||||
| `region` | GCP region | `us-central1` | No |
|
||||
| `environment` | Environment name (e.g. `dev`, `prod`) | `dev` | No |
|
||||
| `vpc_cidr` | VPC CIDR block | `10.0.0.0/16` | No |
|
||||
| `enable_gpu_support` | Enable GPU nodes | `false` | No |
|
||||
| `db_password` | Cloud SQL password | `null` | Yes |
|
||||
| `redis_auth_string` | Redis AUTH string | `null` | Yes |
|
||||
| `managed_domain` | Custom domain | `null` | No |
|
||||
|
||||
### Environment-Specific Overrides
|
||||
|
||||
#### Development (`terraform.dev.tfvars`)
|
||||
|
||||
```hcl
|
||||
environment = "dev"
|
||||
db_high_availability = false
|
||||
redis_tier = "BASIC"
|
||||
enable_monitoring_alerts = false
|
||||
|
||||
node_pools = {
|
||||
general = {
|
||||
machine_type = "n2-standard-2"
|
||||
min_count = 1
|
||||
max_count = 2
|
||||
initial_count = 1
|
||||
}
|
||||
compute = {
|
||||
machine_type = "c2-standard-4"
|
||||
min_count = 0
|
||||
max_count = 2
|
||||
initial_count = 1
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Production (`terraform.prod.tfvars`)
|
||||
|
||||
```hcl
|
||||
environment = "prod"
|
||||
db_high_availability = true
|
||||
redis_tier = "STANDARD_HA"
|
||||
enable_monitoring_alerts = true
|
||||
|
||||
node_pools = {
|
||||
general = {
|
||||
machine_type = "n2-standard-8"
|
||||
min_count = 3
|
||||
max_count = 10
|
||||
initial_count = 3
|
||||
}
|
||||
compute = {
|
||||
machine_type = "c2-standard-16"
|
||||
min_count = 2
|
||||
max_count = 20
|
||||
initial_count = 4
|
||||
}
|
||||
gpu = {
|
||||
machine_type = "g2-standard-4"
|
||||
accelerator_type = "nvidia-l4"
|
||||
accelerator_count = 1
|
||||
min_count = 1
|
||||
max_count = 4
|
||||
initial_count = 2
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment Steps
|
||||
|
||||
### Step 1: Prepare GCP Project
|
||||
|
||||
```bash
|
||||
# Verify gcloud configuration
|
||||
gcloud config list
|
||||
|
||||
# Check project billing
|
||||
gcloud billing projects describe YOUR_PROJECT_ID
|
||||
|
||||
# Enable required APIs
|
||||
gcloud services enable \
|
||||
container.googleapis.com \
|
||||
sqladmin.googleapis.com \
|
||||
redis.googleapis.com \
|
||||
artifactregistry.googleapis.com
|
||||
```
|
||||
|
||||
### Step 2: Configure Terraform Backend
|
||||
|
||||
```bash
|
||||
# Create GCS bucket for state
|
||||
gsutil mb -p YOUR_PROJECT_ID -l us-central1 gs://openclaw-terraform-state
|
||||
|
||||
# Enable versioning
|
||||
gsutil versioning set on gs://openclaw-terraform-state
|
||||
|
||||
# No lock table is needed: the GCS backend supports state locking natively
|
||||
```
|
||||
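The `-backend-config` flags in Step 3 only take effect if the configuration declares a (partial) GCS backend. The sketch below assumes `deploy/gcp/terraform` does not already contain one, so check before adding it. It writes a minimal declaration without overwriting an existing file.

```bash
#!/usr/bin/env bash
# Write a partial GCS backend only if no backend.tf exists yet; bucket and
# prefix are passed at init time with -backend-config, as in Step 3.
if [[ -e backend.tf ]]; then
  echo "backend.tf already exists; not overwriting" >&2
else
  cat > backend.tf <<'EOF'
terraform {
  backend "gcs" {}
}
EOF
fi
```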
|
||||
### Step 3: Initialize and Apply
|
||||
|
||||
```bash
|
||||
# Initialize with GCS backend
|
||||
terraform init \
|
||||
-backend-config="bucket=openclaw-terraform-state" \
|
||||
-backend-config="prefix=openclaw/dev"
|
||||
|
||||
# Plan
|
||||
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
|
||||
|
||||
# Apply
|
||||
terraform apply tfplan
|
||||
```
|
||||
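The state prefix and the var-file both depend on the environment. You can wrap the Step 3 commands so that dev and prod never share state. This is a minimal sketch. `tf_plan_env` is a hypothetical helper, and it assumes the bucket from Step 2 and the `terraform.<env>.tfvars` file names used in this guide. `-reconfigure` lets `terraform init` switch prefixes without migrating state between them.

```bash
#!/usr/bin/env bash
STATE_BUCKET="openclaw-terraform-state"

# Initialise the backend for one environment and write a plan file for it
tf_plan_env() {
  local env="$1"
  case "$env" in
    dev|prod) ;;
    *) echo "usage: tf_plan_env dev|prod" >&2; return 2 ;;
  esac
  # Each environment gets its own state prefix in the shared bucket
  terraform init -reconfigure \
    -backend-config="bucket=${STATE_BUCKET}" \
    -backend-config="prefix=openclaw/${env}" || return
  terraform plan -var-file="terraform.${env}.tfvars" -out="tfplan-${env}"
}
```

After `tf_plan_env dev`, apply with `terraform apply tfplan-dev`.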
|
||||
### Step 4: Verify Deployment
|
||||
|
||||
```bash
|
||||
# Check GKE cluster
|
||||
gcloud container clusters describe openclaw-dev-gke --region us-central1
|
||||
|
||||
# Check Cloud SQL instance
|
||||
gcloud sql instances describe openclaw-dev-pg
|
||||
|
||||
# Check Memorystore instance
|
||||
gcloud redis instances describe openclaw-dev-redis --region us-central1
|
||||
|
||||
# Check Artifact Registry
|
||||
gcloud artifacts repositories describe openclaw-dev-registry --location us-central1
|
||||
```
|
||||
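The describe commands above print full resource descriptions. For a scripted pass/fail check, compare each reported state with the value gcloud uses for a ready resource: `RUNNING` for GKE clusters, `RUNNABLE` for Cloud SQL and `READY` for Memorystore. This is a minimal sketch. `check_state` and `verify_openclaw_dev` are hypothetical helpers, and the resource names are the dev names used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a describe command that prints a single state field
# and compare it with the value that means "ready".
check_state() {
  local label="$1" want="$2" got
  shift 2
  got=$("$@" 2>/dev/null) || got="ERROR"
  if [[ "$got" == "$want" ]]; then
    echo "OK   ${label} (${got})"
  else
    echo "WAIT ${label} (got: ${got:-none}, want: ${want})"
    return 1
  fi
}

# Check the dev resources; returns non-zero if any of them is not ready yet.
verify_openclaw_dev() {
  local rc=0
  check_state "GKE cluster" RUNNING gcloud container clusters describe openclaw-dev-gke \
    --region us-central1 --format="value(status)" || rc=1
  check_state "Cloud SQL" RUNNABLE gcloud sql instances describe openclaw-dev-pg \
    --format="value(state)" || rc=1
  check_state "Memorystore" READY gcloud redis instances describe openclaw-dev-redis \
    --region us-central1 --format="value(state)" || rc=1
  return "$rc"
}
```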
|
||||
---
|
||||
|
||||
## Post-Deployment
|
||||
|
||||
### Configure kubectl
|
||||
|
||||
```bash
|
||||
# Get cluster credentials
|
||||
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
|
||||
|
||||
# Verify cluster access
|
||||
kubectl get nodes
|
||||
kubectl get namespaces
|
||||
```
|
||||
|
||||
### Deploy OpenClaw Helm Chart
|
||||
|
||||
```bash
|
||||
# Deploy using Helm
|
||||
helm install openclaw ./charts/openclaw \
|
||||
--namespace openclaw \
|
||||
--create-namespace \
|
||||
--values values.dev.yaml \
|
||||
--set image.repository=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/openclaw-dev-registry/openclaw-gateway \
|
||||
--set litellm.image.repository=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/openclaw-dev-registry/litellm-proxy
|
||||
```
|
||||
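The `--set ...image.repository=` values follow Artifact Registry's Docker path format, `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE`, and the images must be pushed there before Helm can pull them. This is a minimal sketch. `ar_image` is a hypothetical helper, and the local image name `openclaw-gateway:latest` in the commented push flow is an assumption.

```bash
#!/usr/bin/env bash
# Build an Artifact Registry Docker image path:
#   LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE
ar_image() {
  local location="$1" project="$2" repo="$3" image="$4"
  printf '%s-docker.pkg.dev/%s/%s/%s\n' "$location" "$project" "$repo" "$image"
}

# Example push flow (requires Docker and gcloud credentials):
#   gcloud auth configure-docker us-central1-docker.pkg.dev
#   GATEWAY_IMAGE=$(ar_image us-central1 YOUR_PROJECT_ID openclaw-dev-registry openclaw-gateway)
#   docker tag openclaw-gateway:latest "${GATEWAY_IMAGE}:latest"
#   docker push "${GATEWAY_IMAGE}:latest"
```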
|
||||
### Configure Secrets
|
||||
|
||||
```bash
|
||||
# Create Kubernetes secrets
|
||||
kubectl create secret generic openclaw-secrets \
|
||||
--namespace openclaw \
|
||||
--from-literal=database-url="postgresql://openclaw:DB_PASSWORD@CLOUD_SQL_PRIVATE_IP:5432/openclaw" \
|
||||
--from-literal=redis-url="redis://:REDIS_AUTH_STRING@MEMORYSTORE_HOST:6379" \
|
||||
--from-literal=minimax-api-key="your-minimax-key" \
|
||||
--from-literal=zai-api-key="your-zai-key"
|
||||
```
|
||||
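If the database password or Redis AUTH string contains characters such as `@`, `/` or `:`, percent-encode it before it goes into the URLs above. Otherwise the client may misread where the host starts. This is a minimal bash sketch. `urlencode` and `database_url` are hypothetical helpers, and the encoder handles ASCII input only. The Cloud SQL configuration also stores the user, password, database and private host as JSON in the Secret Manager secret `<instance_name>-credentials`.

```bash
#!/usr/bin/env bash
# Percent-encode every character outside the RFC 3986 unreserved set (ASCII only)
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# database_url USER PASSWORD HOST DATABASE -> postgresql://USER:PASSWORD@HOST:5432/DATABASE
database_url() {
  printf 'postgresql://%s:%s@%s:5432/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4"
}
```

The Redis AUTH string goes through `urlencode` the same way before it is placed in `redis://:...@MEMORYSTORE_HOST:6379`.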
|
||||
---
|
||||
|
||||
## GPU Support
|
||||
|
||||
### Enable GPU Nodes
|
||||
|
||||
```hcl
|
||||
# terraform.tfvars
|
||||
enable_gpu_support = true
|
||||
gpu_node_pool = {
|
||||
machine_type = "g2-standard-4"
|
||||
accelerator_type = "nvidia-l4"
|
||||
accelerator_count = 1
|
||||
min_count = 0
|
||||
max_count = 4
|
||||
initial_count = 1
|
||||
}
|
||||
```
|
||||
|
||||
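### Install NVIDIA GPU Drivers
|
||||
|
||||
GKE runs the NVIDIA device plugin on GPU nodes itself. Unless the node pool is configured to install drivers automatically, install them with Google's driver-installer DaemonSet (Container-Optimized OS nodes):
|
||||
|
||||
```bash
|
||||
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml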
|
||||
```
|
||||
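A common smoke test for GPU scheduling is a one-off Pod that requests `nvidia.com/gpu` and runs `nvidia-smi`. This is a minimal sketch that writes such a manifest. The file name, Pod name and CUDA image tag are assumptions. The node selector uses the `cloud.google.com/gke-accelerator` label that GKE sets on GPU nodes. GKE also adds the GPU-node toleration automatically to Pods that request GPUs, so the manifest declares none.

```bash
#!/usr/bin/env bash
# Write a one-off Pod that requests one GPU on an L4 node and prints nvidia-smi output
cat > gpu-smoke-test.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
  namespace: openclaw
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Then, against the cluster:
#   kubectl apply -f gpu-smoke-test.yaml
#   kubectl logs -n openclaw gpu-smoke-test   # after the Pod has completed
#   kubectl delete -f gpu-smoke-test.yaml
```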
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Cloud Monitoring Dashboard
|
||||
|
||||
The deployment creates a Cloud Monitoring dashboard with:
|
||||
|
||||
- GKE cluster metrics
|
||||
- Node pool metrics
|
||||
- Cloud SQL metrics
|
||||
- Memorystore metrics
|
||||
- Load Balancer metrics
|
||||
- Application logs
|
||||
|
||||
### Access Dashboard
|
||||
|
||||
```bash
|
||||
# Open in the GCP Console (macOS `open`; use `xdg-open` on Linux)
|
||||
open "https://console.cloud.google.com/monitoring/dashboards"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Backup & Recovery
|
||||
|
||||
### Automated Backups
|
||||
|
||||
| Resource | Backup Strategy | Retention |
|
||||
|----------|----------------|-----------|
|
||||
| Cloud SQL | Automated + On-demand | 7 days |
|
||||
| Memorystore | Persistence enabled | Manual |
|
||||
| Artifact Registry | Lifecycle policy | 30 days |
|
||||
| Terraform State | GCS versioning | Unlimited |
|
||||
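Besides the automated Cloud SQL backups, take an on-demand backup before risky changes such as a schema migration. This is a minimal sketch. `backup_cloud_sql` is a hypothetical wrapper, and `openclaw-dev-pg` is the dev instance name used earlier in this guide.

```bash
#!/usr/bin/env bash
# Take an on-demand Cloud SQL backup labelled with a UTC timestamp, then list
# recent backups so the new one (and its ID, needed for restores) is visible.
backup_cloud_sql() {
  local instance="${1:-openclaw-dev-pg}" stamp
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  gcloud sql backups create --instance="$instance" --description="manual-${stamp}" || return
  gcloud sql backups list --instance="$instance" --limit=5
}

# Restoring overwrites the target instance's data:
#   gcloud sql backups restore BACKUP_ID --restore-instance=openclaw-dev-pg
```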
|
||||
---
|
||||
|
||||
## Cleanup
|
||||
|
||||
### Destroy Infrastructure
|
||||
|
||||
```bash
|
||||
# Delete Kubernetes resources first
|
||||
kubectl delete namespace openclaw
|
||||
|
||||
# Destroy Terraform resources (in prod, Cloud SQL deletion_protection blocks this until it is disabled and applied)
|
||||
terraform destroy -var-file=terraform.dev.tfvars
|
||||
```
|
||||
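`kubectl delete namespace` acts on whatever cluster the current context points at, so confirm the context first. GKE contexts created by `get-credentials` are named `gke_PROJECT_LOCATION_CLUSTER`. This is a minimal sketch. `require_context` is a hypothetical guard, and `openclaw-dev-gke` is the dev cluster name from this guide.

```bash
#!/usr/bin/env bash
# Refuse to continue unless the current kubectl context targets the expected GKE cluster.
require_context() {
  local cluster="$1" ctx
  ctx=$(kubectl config current-context 2>/dev/null) || {
    echo "no current kubectl context" >&2
    return 1
  }
  if [[ "$ctx" != gke_*_"$cluster" ]]; then
    echo "current context is '$ctx', expected a context for '$cluster'" >&2
    return 1
  fi
}

# Usage:
#   require_context openclaw-dev-gke && kubectl delete namespace openclaw
```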
|
||||
---
|
||||
|
||||
🦞 *The thought that never ends.*
|
||||
@@ -0,0 +1,181 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - GCP Artifact Registry Configuration
|
||||
# ==============================================================================
|
||||
# Artifact Registry for OpenClaw container images
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Artifact Registry Repository
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_artifact_registry_repository" "openclaw" {
|
||||
location = var.location
|
||||
repository_id = var.repository_name
|
||||
project = var.project_id
|
||||
format = var.format
|
||||
description = "Artifact Registry for Heretek OpenClaw container images"
|
||||
|
||||
# Cleanup policy
|
||||
dynamic "cleanup_policies" {
|
||||
for_each = var.cleanup_policy_days > 0 ? [1] : []
|
||||
content {
|
||||
id = "expire-old-images"
|
||||
action = "DELETE"
|
||||
condition {
|
||||
tag_state = "UNTAGGED"
|
||||
older_than = "${var.cleanup_policy_days * 86400}s" # duration expressed in seconds
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Always keep the 10 most recent versions (KEEP takes precedence over DELETE)
|
||||
dynamic "cleanup_policies" {
|
||||
for_each = var.cleanup_policy_days > 0 ? [1] : []
|
||||
content {
|
||||
id = "keep-recent-versions"
|
||||
action = "KEEP"
|
||||
most_recent_versions {
|
||||
keep_count = 10
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
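# Maven configuration (only valid when format is MAVEN)
|
||||
dynamic "maven_config" {
|
||||
for_each = upper(var.format) == "MAVEN" ? [1] : []
|
||||
content {
|
||||
version_policy = "RELEASE"
|
||||
}
|
||||
}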
|
||||
|
||||
labels = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# IAM Permissions
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_artifact_registry_repository_iam_member" "openclaw_reader" {
|
||||
project = var.project_id
|
||||
location = var.location
|
||||
repository = google_artifact_registry_repository.openclaw.name
|
||||
role = "roles/artifactregistry.reader"
|
||||
member = "serviceAccount:${var.project_id}.svc.id.goog[openclaw/openclaw-sa]"
|
||||
}
|
||||
|
||||
resource "google_artifact_registry_repository_iam_member" "openclaw_writer" {
|
||||
project = var.project_id
|
||||
location = var.location
|
||||
repository = google_artifact_registry_repository.openclaw.name
|
||||
role = "roles/artifactregistry.writer"
|
||||
member = "serviceAccount:service-${data.google_project.project.number}@gcp-sa-artifactregistry.iam.gserviceaccount.com"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Remote Repository (for caching external images)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_artifact_registry_repository" "docker_hub_cache" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
location = var.location
|
||||
repository_id = "${var.repository_name}-docker-hub-cache"
|
||||
project = var.project_id
|
||||
format = "DOCKER"
|
||||
description = "Docker Hub cache for OpenClaw"
|
||||
|
||||
mode = "REMOTE_REPOSITORY"
|
||||
remote_repository_config {
|
||||
description = "Docker Hub remote repository"
|
||||
docker_repository { public_repository = "DOCKER_HUB" }
|
||||
}
|
||||
|
||||
cleanup_policy_dry_run = false
|
||||
|
||||
labels = var.tags
|
||||
}
|
||||
|
||||
resource "google_artifact_registry_repository" "ghcr_cache" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
location = var.location
|
||||
repository_id = "${var.repository_name}-ghcr-cache"
|
||||
project = var.project_id
|
||||
format = "DOCKER"
|
||||
description = "GitHub Container Registry cache for OpenClaw"
|
||||
|
||||
mode = "REMOTE_REPOSITORY"
|
||||
remote_repository_config {
|
||||
description = "GitHub Container Registry remote repository"
|
||||
docker_repository {
|
||||
custom_repository {
|
||||
uri = "https://ghcr.io"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
cleanup_policy_dry_run = false
|
||||
|
||||
labels = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Virtual Repository (for unified access)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_artifact_registry_repository" "openclaw_virtual" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
location = var.location
|
||||
repository_id = "${var.repository_name}-virtual"
|
||||
project = var.project_id
|
||||
format = "DOCKER"
|
||||
description = "Virtual repository for OpenClaw"
|
||||
|
||||
mode = "VIRTUAL_REPOSITORY"
|
||||
virtual_repository_config {
|
||||
upstream_policies {
|
||||
id = "upstream-docker-hub"
|
||||
repository = google_artifact_registry_repository.docker_hub_cache[0].id
|
||||
priority = 1
|
||||
}
|
||||
upstream_policies {
|
||||
id = "upstream-ghcr"
|
||||
repository = google_artifact_registry_repository.ghcr_cache[0].id
|
||||
priority = 2
|
||||
}
|
||||
upstream_policies {
|
||||
id = "upstream-local"
|
||||
repository = google_artifact_registry_repository.openclaw.id
|
||||
priority = 3
|
||||
}
|
||||
}
|
||||
|
||||
labels = var.tags
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# KMS Key for Encryption (Optional)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_kms_key_ring" "artifact_registry" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
name = "${var.repository_name}-keyring"
|
||||
project = var.project_id
|
||||
location = var.location
|
||||
}
|
||||
|
||||
resource "google_kms_crypto_key" "artifact_registry" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
name = "${var.repository_name}-key"
|
||||
key_ring = google_kms_key_ring.artifact_registry[0].id
|
||||
purpose = "ENCRYPT_DECRYPT"
|
||||
|
||||
lifecycle {
|
||||
prevent_destroy = false
|
||||
}
|
||||
|
||||
labels = var.tags
|
||||
}
|
||||
@@ -0,0 +1,234 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - GCP Cloud SQL Configuration
|
||||
# ==============================================================================
|
||||
# Cloud SQL PostgreSQL database for OpenClaw
|
||||
# ==============================================================================
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Cloud SQL Instance
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_sql_database_instance" "openclaw" {
|
||||
name = var.instance_name
|
||||
project = var.project_id
|
||||
region = var.region
|
||||
database_version = var.database_version
|
||||
deletion_protection = var.environment == "prod"
|
||||
|
||||
settings {
|
||||
tier = var.tier
|
||||
disk_size = var.disk_size
|
||||
disk_type = var.disk_type
|
||||
availability_type = var.high_availability ? "REGIONAL" : "ZONAL"
|
||||
|
||||
# Backup configuration
|
||||
backup_configuration {
|
||||
enabled = var.backup_enabled
|
||||
start_time = var.backup_start_time
|
||||
point_in_time_recovery_enabled = var.point_in_time_recovery
|
||||
transaction_log_retention_days = var.backup_enabled ? 7 : null
|
||||
}
|
||||
|
||||
# IP configuration
|
||||
ip_configuration {
|
||||
ipv4_enabled = false
|
||||
private_network = var.network
|
||||
require_ssl = true
|
||||
}
|
||||
|
||||
# Query insights
|
||||
insights_config {
|
||||
query_insights_enabled = var.query_insights_enabled
|
||||
query_string_length = 1024
|
||||
record_application_tags = true
|
||||
record_client_address = true
|
||||
}
|
||||
|
||||
# Maintenance
|
||||
maintenance_window {
|
||||
day = 1
|
||||
hour = 3
|
||||
update_track = "stable"
|
||||
}
|
||||
|
||||
# Labels
|
||||
user_labels = var.tags
|
||||
}
|
||||
|
||||
depends_on = [google_service_networking_connection.private_vpc_connection]
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Cloud SQL Database
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_sql_database" "openclaw" {
|
||||
name = var.database_name
|
||||
project = var.project_id
|
||||
instance = google_sql_database_instance.openclaw.name
|
||||
charset = "UTF8"
|
||||
collation = "en_US.UTF8"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Cloud SQL User
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_sql_user" "openclaw" {
|
||||
name = var.database_user
|
||||
project = var.project_id
|
||||
instance = google_sql_database_instance.openclaw.name
|
||||
password = var.database_password
|
||||
deletion_policy = "ABANDON"
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Cloud SQL Read Replica (Optional for Production)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_sql_database_instance" "openclaw_replica" {
|
||||
count = var.environment == "prod" && var.high_availability ? 1 : 0
|
||||
|
||||
name = "${var.instance_name}-replica"
|
||||
project = var.project_id
|
||||
region = var.region
|
||||
database_version = var.database_version
|
||||
master_instance_name = google_sql_database_instance.openclaw.name
|
||||
replica_configuration {
|
||||
failover_target = false
|
||||
}
|
||||
|
||||
settings {
|
||||
tier = var.tier
|
||||
disk_size = var.disk_size
|
||||
disk_type = var.disk_type
|
||||
availability_type = "ZONAL"
|
||||
|
||||
backup_configuration {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
ip_configuration {
|
||||
ipv4_enabled = false
|
||||
private_network = var.network
|
||||
require_ssl = true
|
||||
}
|
||||
|
||||
user_labels = var.tags
|
||||
}
|
||||
|
||||
depends_on = [google_service_networking_connection.private_vpc_connection]
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
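# Separate Cloud SQL instance named "-pooler" (Optional). It does not pool connections to the primary; run a pooler such as PgBouncer in GKE for that.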
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_sql_database_instance" "openclaw_pooler" {
|
||||
count = var.environment == "prod" ? 1 : 0
|
||||
|
||||
name = "${var.instance_name}-pooler"
|
||||
project = var.project_id
|
||||
region = var.region
|
||||
database_version = var.database_version
|
||||
|
||||
settings {
|
||||
tier = "db-custom-2-7680"
|
||||
disk_size = 20
|
||||
disk_type = "PD_SSD"
|
||||
availability_type = "ZONAL"
|
||||
|
||||
ip_configuration {
|
||||
ipv4_enabled = true
|
||||
require_ssl = true
|
||||
authorized_networks {
|
||||
name = "gke-pods"
|
||||
value = google_compute_subnetwork.secondary_ranges.ip_cidr_range
|
||||
}
|
||||
}
|
||||
|
||||
user_labels = var.tags
|
||||
}
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Secret Manager for Database Credentials
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
resource "google_secret_manager_secret" "db_credentials" {
|
||||
secret_id = "${var.instance_name}-credentials"
|
||||
project = var.project_id
|
||||
|
||||
labels = var.tags
|
||||
|
||||
replication {
|
||||
auto {}
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_secret_manager_secret_version" "db_credentials" {
|
||||
secret = google_secret_manager_secret.db_credentials.id
|
||||
|
||||
secret_data = jsonencode({
|
||||
username = var.database_user
|
||||
password = var.database_password
|
||||
database = var.database_name
|
||||
host = google_sql_database_instance.openclaw.private_ip_address
|
||||
port = "5432"
|
||||
connection_name = google_sql_database_instance.openclaw.connection_name
|
||||
})
|
||||
}
|
||||
|
# ------------------------------------------------------------------------------
# Monitoring Alerts
# ------------------------------------------------------------------------------

resource "google_monitoring_alert_policy" "cloud_sql_cpu" {
  count = var.environment == "prod" ? 1 : 0

  display_name = "${var.instance_name} CPU Utilization"
  project      = var.project_id

  conditions {
    display_name = "CPU utilization > 80%"

    condition_threshold {
      # The cloudsql_database label database_id has the form
      # "<project_id>:<instance_name>", not the connection name.
      filter     = "resource.type = \"cloudsql_database\" AND metric.type = \"cloudsql.googleapis.com/database/cpu/utilization\" AND resource.label.\"database_id\" = \"${var.project_id}:${google_sql_database_instance.openclaw.name}\""
      duration   = "300s"
      comparison = "COMPARISON_GT"
      # cpu/utilization is reported as a fraction (0.0-1.0), so 80% is 0.8.
      threshold_value = 0.8

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels

  severity = "WARNING"
}

resource "google_monitoring_alert_policy" "cloud_sql_disk" {
  count = var.environment == "prod" ? 1 : 0

  display_name = "${var.instance_name} Disk Space"
  project      = var.project_id

  conditions {
    display_name = "Disk space < 10%"

    condition_threshold {
      # Less than 10% free means more than 90% used. disk/utilization is the
      # used fraction of the disk quota, so it does not depend on var.disk_size.
      filter          = "resource.type = \"cloudsql_database\" AND metric.type = \"cloudsql.googleapis.com/database/disk/utilization\" AND resource.label.\"database_id\" = \"${var.project_id}:${google_sql_database_instance.openclaw.name}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.9

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels

  severity = "CRITICAL"
}
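
# Example (sketch, commented out): one way to produce the channel IDs consumed
# by var.alert_notification_channels. The display name and e-mail address are
# placeholders.
#
# resource "google_monitoring_notification_channel" "oncall_email" {
#   project      = var.project_id
#   display_name = "OpenClaw on-call"
#   type         = "email"
#
#   labels = {
#     email_address = "oncall@example.com"
#   }
# }
#
# The alert policies would then receive:
#   alert_notification_channels = [google_monitoring_notification_channel.oncall_email.id]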
@@ -0,0 +1,334 @@
# ==============================================================================
# Heretek OpenClaw - GCP GKE Configuration
# ==============================================================================
# Google Kubernetes Engine cluster for OpenClaw
# ==============================================================================

# ------------------------------------------------------------------------------
# GKE Cluster
# ------------------------------------------------------------------------------

resource "google_container_cluster" "openclaw_cluster" {
  name     = var.cluster_name
  location = var.location
  project  = var.project_id

  # Node locations (for regional clusters)
  node_locations = var.zones

  # Remove default node pool
  remove_default_node_pool = true
  initial_node_count       = 1

  # Network configuration
  network    = var.network
  subnetwork = var.subnetwork

  # IP allocation policy (VPC-native cluster)
  ip_allocation_policy {
    cluster_secondary_range_name  = var.ip_range_pods
    services_secondary_range_name = var.ip_range_services
  }

  # Private cluster configuration
  dynamic "private_cluster_config" {
    for_each = var.enable_private_cluster ? [1] : []
    content {
      enable_private_nodes    = true
      enable_private_endpoint = false
      master_ipv4_cidr_block  = "172.16.0.0/28"
    }
  }

  # Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Release channel
  release_channel {
    channel = var.gke_release_channel
  }

  # Kubernetes version
  min_master_version = var.gke_version

  # Cluster addons
  addons_config {
    http_load_balancing {
      disabled = false
    }
    horizontal_pod_autoscaling {
      disabled = false
    }
    network_policy_config {
      disabled = false
    }
    gcp_filestore_csi_driver_config {
      enabled = true
    }
  }

  # Network policy
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  # Master authorized networks. 0.0.0.0/0 admits every IPv4 address to the
  # control-plane endpoint; narrow it to operator and CI ranges in production.
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "0.0.0.0/0"
      display_name = "all-networks"
    }
  }

  # Logging and monitoring
  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "WORKLOADS"
    ]
  }

  monitoring_config {
    enable_components = [
      "SYSTEM_COMPONENTS"
    ]
    managed_prometheus {
      enabled = true
    }
  }

  # Labels
  resource_labels = var.tags

  lifecycle {
    ignore_changes = [
      node_config,
      node_version
    ]
  }
}

# ------------------------------------------------------------------------------
# General Purpose Node Pool
# ------------------------------------------------------------------------------

resource "google_container_node_pool" "general" {
  name       = "${var.cluster_name}-general"
  location   = var.location
  project    = var.project_id
  cluster    = google_container_cluster.openclaw_cluster.name
  node_count = var.node_pools.general.initial_count

  autoscaling {
    min_node_count = var.node_pools.general.min_count
    max_node_count = var.node_pools.general.max_count
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = var.node_pools.general.machine_type
    disk_size_gb = var.node_pools.general.disk_size_gb
    disk_type    = var.node_pools.general.disk_type

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    labels = merge(var.tags, {
      workload-type = "general"
    })

    tags = ["openclaw-node"]

    workload_metadata_config {
      mode = "GKE_WORKLOAD_IDENTITY"
    }
  }

  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }

  lifecycle {
    ignore_changes = [
      node_count
    ]
  }
}

# ------------------------------------------------------------------------------
# Compute Optimized Node Pool
# ------------------------------------------------------------------------------

resource "google_container_node_pool" "compute" {
  name       = "${var.cluster_name}-compute"
  location   = var.location
  project    = var.project_id
  cluster    = google_container_cluster.openclaw_cluster.name
  node_count = var.node_pools.compute.initial_count

  autoscaling {
    min_node_count = var.node_pools.compute.min_count
    max_node_count = var.node_pools.compute.max_count
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = var.node_pools.compute.machine_type
    disk_size_gb = var.node_pools.compute.disk_size_gb
    disk_type    = var.node_pools.compute.disk_type

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    labels = merge(var.tags, {
      workload-type = "compute"
    })

    tags = ["openclaw-node"]

    workload_metadata_config {
      mode = "GKE_WORKLOAD_IDENTITY"
    }
  }

  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }

  lifecycle {
    ignore_changes = [
      node_count
    ]
  }
}

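# Example (sketch): a terraform.tfvars value for var.node_pools that supplies
# the fields the two pools above read. Machine types and sizes are illustrative
# assumptions, not sizing guidance for OpenClaw. For a regional cluster the
# node counts apply per zone listed in node_locations.
#
# node_pools = {
#   general = {
#     initial_count = 1
#     min_count     = 1
#     max_count     = 3
#     machine_type  = "e2-standard-4"
#     disk_size_gb  = 100
#     disk_type     = "pd-balanced"
#   }
#   compute = {
#     initial_count = 0
#     min_count     = 0
#     max_count     = 3
#     machine_type  = "c2-standard-8"
#     disk_size_gb  = 100
#     disk_type     = "pd-ssd"
#   }
# }
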
# ------------------------------------------------------------------------------
# GPU Node Pool (Optional)
# ------------------------------------------------------------------------------

resource "google_container_node_pool" "gpu" {
  count = var.gpu_enabled ? 1 : 0

  name       = "${var.cluster_name}-gpu"
  location   = var.location
  project    = var.project_id
  cluster    = google_container_cluster.openclaw_cluster.name
  node_count = var.gpu_node_pool.initial_count

  autoscaling {
    min_node_count = var.gpu_node_pool.min_count
    max_node_count = var.gpu_node_pool.max_count
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type = var.gpu_node_pool.machine_type
    disk_size_gb = var.gpu_node_pool.disk_size_gb
    disk_type    = "pd-ssd"

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    labels = merge(var.tags, {
      workload-type = "gpu"
      gpu           = "true"
    })

    tags = ["openclaw-gpu-node"]

    guest_accelerator {
      type  = var.gpu_node_pool.accelerator_type
      count = var.gpu_node_pool.accelerator_count
    }

    workload_metadata_config {
      mode = "GKE_WORKLOAD_IDENTITY"
    }
  }

  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }

  lifecycle {
    ignore_changes = [
      node_count
    ]
  }
}

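# Example (sketch): root-module terraform.tfvars values that enable the GPU pool
# above. main.tf maps enable_gpu_support to this module's gpu_enabled input.
# NVIDIA T4 accelerators attach to N1 machine types. The sizes are illustrative
# assumptions. GKE taints GPU nodes with nvidia.com/gpu, which keeps ordinary
# pods off this pool. Pods that request nvidia.com/gpu receive a matching
# toleration.
#
# enable_gpu_support = true
# gpu_node_pool = {
#   initial_count     = 0
#   min_count         = 0
#   max_count         = 2
#   machine_type      = "n1-standard-8"
#   disk_size_gb      = 100
#   accelerator_type  = "nvidia-tesla-t4"
#   accelerator_count = 1
# }
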
# ------------------------------------------------------------------------------
# IAM for Service Account (Workload Identity)
# ------------------------------------------------------------------------------

resource "google_service_account" "openclaw" {
  account_id   = "${var.cluster_name}-sa"
  display_name = "OpenClaw GKE Service Account"
  project      = var.project_id
}

# Allow the Kubernetes service account openclaw/openclaw-sa to impersonate the
# Google service account above. Workload Identity access is granted on the
# Google service account itself with roles/iam.workloadIdentityUser.
resource "google_service_account_iam_member" "openclaw_workload_identity" {
  service_account_id = google_service_account.openclaw.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[openclaw/openclaw-sa]"

  # The <project>.svc.id.goog identity pool exists only after a cluster with
  # Workload Identity has been created.
  depends_on = [google_container_cluster.openclaw_cluster]
}

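# Example (sketch, commented out): the Kubernetes side of Workload Identity.
# Pods assume the Google service account above only when the Kubernetes service
# account openclaw/openclaw-sa carries this annotation. The sketch assumes the
# "openclaw" namespace already exists and that the kubernetes provider is
# configured for this cluster.
#
# resource "kubernetes_service_account" "openclaw" {
#   metadata {
#     name      = "openclaw-sa"
#     namespace = "openclaw"
#     annotations = {
#       "iam.gke.io/gcp-service-account" = google_service_account.openclaw.email
#     }
#   }
# }
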
# ------------------------------------------------------------------------------
# GKE Secondary IP Ranges
# ------------------------------------------------------------------------------

resource "google_compute_subnetwork" "secondary_ranges" {
  name          = "${var.network}-secondary"
  project       = var.project_id
  region        = var.location
  network       = var.network
  ip_cidr_range = "10.1.0.0/16"

  secondary_ip_range {
    range_name    = var.ip_range_pods
    ip_cidr_range = "10.2.0.0/16"
  }

  secondary_ip_range {
    range_name    = var.ip_range_services
    ip_cidr_range = "10.3.0.0/16"
  }
}

# ------------------------------------------------------------------------------
# GPU Plugin Installation (via Helm)
# ------------------------------------------------------------------------------

resource "helm_release" "nvidia_device_plugin" {
  count = var.gpu_enabled ? 1 : 0

  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  version    = "0.14.1"
  namespace  = "kube-system"

  set {
    name  = "config.map.name"
    value = "nvidia-device-plugin-config"
  }
}
@@ -0,0 +1,255 @@
# ==============================================================================
# Heretek OpenClaw - GCP Cloud Load Balancing Configuration
# ==============================================================================
# Cloud Load Balancing for OpenClaw traffic routing
# ==============================================================================

# ------------------------------------------------------------------------------
# Zonal Network Endpoint Groups (for GKE)
# ------------------------------------------------------------------------------
# GCE_VM_IP_PORT NEGs are zonal: they take a zone rather than a region, and
# their endpoints are attached separately (for example with
# google_compute_network_endpoint resources), not declared inside the NEG.
# The "-a" zone suffix is an assumption; use the zone of the GKE nodes that
# serve these backends.

resource "google_compute_network_endpoint_group" "gateway_neg" {
  name                  = "${var.name}-gateway-neg"
  project               = var.project_id
  network_endpoint_type = "GCE_VM_IP_PORT"
  network               = var.network
  subnetwork            = var.subnet
  zone                  = "${var.region}-a"
}

resource "google_compute_network_endpoint_group" "litellm_neg" {
  name                  = "${var.name}-litellm-neg"
  project               = var.project_id
  network_endpoint_type = "GCE_VM_IP_PORT"
  network               = var.network
  subnetwork            = var.subnet
  zone                  = "${var.region}-a"
}

# ------------------------------------------------------------------------------
# Health Checks
# ------------------------------------------------------------------------------

resource "google_compute_health_check" "gateway" {
  name    = "${var.name}-gateway-health"
  project = var.project_id

  timeout_sec         = 5
  check_interval_sec  = 10
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    port         = 18789
    request_path = "/health"
  }
}

resource "google_compute_health_check" "litellm" {
  name    = "${var.name}-litellm-health"
  project = var.project_id

  timeout_sec         = 5
  check_interval_sec  = 10
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    port         = 4000
    request_path = "/health"
  }
}

# ------------------------------------------------------------------------------
# Backend Services
# ------------------------------------------------------------------------------

resource "google_compute_backend_service" "gateway" {
  name        = "${var.name}-gateway"
  project     = var.project_id
  protocol    = "HTTP"
  port_name   = "http"
  timeout_sec = 30

  health_checks = [google_compute_health_check.gateway.id]

  load_balancing_scheme = "EXTERNAL_MANAGED"

  log_config {
    enable      = true
    sample_rate = 1.0
  }

  labels = var.tags
}

resource "google_compute_backend_service" "litellm" {
  name        = "${var.name}-litellm"
  project     = var.project_id
  protocol    = "HTTP"
  port_name   = "http"
  timeout_sec = 60

  health_checks = [google_compute_health_check.litellm.id]

  load_balancing_scheme = "EXTERNAL_MANAGED"

  log_config {
    enable      = true
    sample_rate = 1.0
  }

  labels = var.tags
}

# ------------------------------------------------------------------------------
# URL Map
# ------------------------------------------------------------------------------

resource "google_compute_url_map" "openclaw" {
  name    = "${var.name}-url-map"
  project = var.project_id

  default_service = google_compute_backend_service.gateway.id

  host_rule {
    hosts        = ["*"]
    path_matcher = "all-paths"
  }

  path_matcher {
    name            = "all-paths"
    default_service = google_compute_backend_service.gateway.id

    path_rule {
      paths   = ["/v1/*", "/litellm/*"]
      service = google_compute_backend_service.litellm.id
    }

    path_rule {
      paths   = ["/ws/*", "/gateway/*"]
      service = google_compute_backend_service.gateway.id
    }
  }
}

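# Worked example of how the URL map above routes requests for any host:
#   /v1/chat/completions  -> litellm  (matches "/v1/*")
#   /litellm/health       -> litellm  (matches "/litellm/*")
#   /ws/session           -> gateway  (matches "/ws/*")
#   /v1                   -> gateway  ("/v1/*" needs the trailing "/", so the
#                                      path_matcher default applies)
#   /                     -> gateway  (path_matcher default_service)
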
# ------------------------------------------------------------------------------
# Target HTTP Proxy (Redirect to HTTPS)
# ------------------------------------------------------------------------------

resource "google_compute_target_http_proxy" "openclaw" {
  name    = "${var.name}-http-proxy"
  project = var.project_id

  # Port 80 uses the redirect-only URL map, so plain HTTP is sent to HTTPS.
  url_map = google_compute_url_map.http_redirect.id
}

# ------------------------------------------------------------------------------
# Target HTTPS Proxy
# ------------------------------------------------------------------------------

resource "google_compute_target_https_proxy" "openclaw" {
  name    = "${var.name}-https-proxy"
  project = var.project_id

  url_map = google_compute_url_map.openclaw.id

  # Use the supplied certificate when var.ssl_certificate_arn is set (despite
  # its name, it is expected to hold a GCP SSL certificate ID or self link);
  # otherwise use the managed certificate. Both referenced resources use count,
  # so they are referenced in forms that tolerate zero instances.
  ssl_certificates = concat(
    var.ssl_certificate_arn == null ? [] : [var.ssl_certificate_arn],
    google_compute_managed_ssl_certificate.openclaw[*].id
  )
  ssl_policy = one(google_compute_ssl_policy.openclaw[*].id)
}

# ------------------------------------------------------------------------------
# Managed SSL Certificate
# ------------------------------------------------------------------------------

resource "google_compute_managed_ssl_certificate" "openclaw" {
  count = var.ssl_certificate_arn == null && var.managed_domain != null ? 1 : 0

  name    = "${var.name}-ssl-cert"
  project = var.project_id

  managed {
    domains = [var.managed_domain]
  }

  lifecycle {
    create_before_destroy = true
  }
}

# ------------------------------------------------------------------------------
# SSL Policy
# ------------------------------------------------------------------------------

resource "google_compute_ssl_policy" "openclaw" {
  count = var.ssl_certificate_arn == null ? 1 : 0

  name    = "${var.name}-ssl-policy"
  project = var.project_id

  min_tls_version = "TLS_1_2"
  profile         = "MODERN"
  custom_features = []
}

# ------------------------------------------------------------------------------
# Global Forwarding Rules
# ------------------------------------------------------------------------------
# Both rules use var.load_balancer_ip when it is set, otherwise the address
# reserved below (which exists only when var.load_balancer_ip is null).

resource "google_compute_global_forwarding_rule" "http" {
  name                  = "${var.name}-http-fr"
  project               = var.project_id
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  port_range            = "80"
  target                = google_compute_target_http_proxy.openclaw.id
  ip_address            = coalesce(var.load_balancer_ip, one(google_compute_global_address.openclaw[*].address))
}

resource "google_compute_global_forwarding_rule" "https" {
  name                  = "${var.name}-https-fr"
  project               = var.project_id
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  port_range            = "443"
  target                = google_compute_target_https_proxy.openclaw.id
  ip_address            = coalesce(var.load_balancer_ip, one(google_compute_global_address.openclaw[*].address))
}

# ------------------------------------------------------------------------------
# Global Static IP Address
# ------------------------------------------------------------------------------

resource "google_compute_global_address" "openclaw" {
  count = var.load_balancer_ip == null ? 1 : 0

  name       = "${var.name}-ip"
  project    = var.project_id
  ip_version = "IPV4"
}

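# Example (sketch, commented out): a Google-managed certificate becomes ACTIVE
# only after var.managed_domain resolves to the load balancer address. If the
# domain is hosted in Cloud DNS, an A record like this would provide that. The
# managed zone name "openclaw-zone" is a hypothetical placeholder. The record
# assumes this configuration reserved the address (var.load_balancer_ip = null).
#
# resource "google_dns_record_set" "openclaw" {
#   project      = var.project_id
#   managed_zone = "openclaw-zone"
#   name         = "${var.managed_domain}."
#   type         = "A"
#   ttl          = 300
#   rrdatas      = [google_compute_global_address.openclaw[0].address]
# }
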
# ------------------------------------------------------------------------------
# HTTP to HTTPS Redirect
# ------------------------------------------------------------------------------

resource "google_compute_url_map" "http_redirect" {
  name    = "${var.name}-http-redirect"
  project = var.project_id

  default_url_redirect {
    redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
    strip_query            = false
    https_redirect         = true
  }
}
@@ -0,0 +1,351 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Configuration
# ==============================================================================
# Main configuration file for GCP infrastructure
# ==============================================================================

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.24"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
    # Used by random_string.suffix below
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }

  backend "gcs" {
    # Backend blocks cannot reference variables; supply these settings at
    # init time with -backend-config.
    # bucket = "terraform-state-bucket"
    # prefix = "openclaw"   # state is stored as <prefix>/<workspace>.tfstate
  }
}

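# Example (sketch): a partial backend configuration kept outside this file,
# for instance in a hypothetical backend.hcl, and passed at init time with
#   terraform init -backend-config=backend.hcl
# The bucket name is a placeholder and the bucket must already exist.
#
# bucket = "my-openclaw-tfstate"
# prefix = "openclaw"
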
provider "google" {
  project = var.project_id
  region  = var.region
}

provider "google-beta" {
  project = var.project_id
  region  = var.region
}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.openclaw_cluster.endpoint}"
  token                  = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.openclaw_cluster.master_auth[0].cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.openclaw_cluster.endpoint}"
    token                  = data.google_client_config.current.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.openclaw_cluster.master_auth[0].cluster_ca_certificate)
  }
}

# ==============================================================================
# Data Sources
# ==============================================================================

data "google_client_config" "current" {}

data "google_project" "project" {
  project_id = var.project_id
}

# ==============================================================================
# Local Values
# ==============================================================================

locals {
  name_prefix = "openclaw-${var.environment}"

  common_tags = {
    project     = "openclaw"
    environment = var.environment
    # GCP label values may contain only lowercase letters, digits, "_" and "-",
    # so dots in version strings such as v2026.3.28 are replaced.
    version    = replace(lower(var.app_version), ".", "-")
    managed_by = "terraform"
  }

  gpu_enabled = var.enable_gpu_support

  # Artifact Registry URLs
  artifact_registry_urls = {
    gateway = "${var.region}-docker.pkg.dev/${var.project_id}/${local.name_prefix}-registry/openclaw-gateway"
    litellm = "${var.region}-docker.pkg.dev/${var.project_id}/${local.name_prefix}-registry/litellm-proxy"
  }
}

# ==============================================================================
# Random Resources
# ==============================================================================

resource "random_string" "suffix" {
  length  = 8
  special = false
  upper   = false
}

# ==============================================================================
# VPC Network
# ==============================================================================

module "vpc" {
  source = "./vpc"

  project_id                   = var.project_id
  network_name                 = "${local.name_prefix}-vpc"
  region                       = var.region
  zones                        = var.zones
  vpc_cidr                     = var.vpc_cidr
  subnets                      = var.subnets
  enable_flow_logs             = var.enable_vpc_flow_logs
  enable_private_google_access = var.enable_private_google_access

  tags = local.common_tags
}

# ==============================================================================
# GKE Cluster
# ==============================================================================

module "gke" {
  source = "./gke"

  project_id        = var.project_id
  cluster_name      = "${local.name_prefix}-gke"
  location          = var.region
  zones             = var.zones
  network           = module.vpc.network_name
  subnetwork        = module.vpc.subnet_name
  ip_range_pods     = "${local.name_prefix}-pods"
  ip_range_services = "${local.name_prefix}-services"

  # GKE configuration
  kubernetes_version = var.gke_version
  release_channel    = var.gke_release_channel

  # Node pool configuration
  node_pools    = var.node_pools
  gpu_enabled   = local.gpu_enabled
  gpu_node_pool = var.gpu_node_pool

  # Security
  enable_workload_identity = var.enable_workload_identity
  enable_private_cluster   = var.enable_private_cluster

  # Monitoring
  enable_monitoring = true
  enable_logging    = true

  tags = local.common_tags
}

# ==============================================================================
# Cloud SQL PostgreSQL
# ==============================================================================

module "cloud_sql" {
  source = "./cloud-sql"

  project_id    = var.project_id
  instance_name = "${local.name_prefix}-pg"
  region        = var.region
  network       = module.vpc.network_name

  # Database configuration
  database_version = var.postgresql_version
  tier             = var.db_tier
  disk_size        = var.db_disk_size
  disk_type        = var.db_disk_type

  # Authentication
  database_name     = var.db_name
  database_user     = var.db_user
  database_password = var.db_password

  # High availability
  high_availability      = var.db_high_availability
  backup_enabled         = var.db_backup_enabled
  backup_start_time      = var.db_backup_start_time
  point_in_time_recovery = var.db_point_in_time_recovery

  # Insights
  query_insights_enabled = var.db_query_insights_enabled

  tags = local.common_tags
}

# ==============================================================================
# Memorystore Redis
# ==============================================================================

module "memorystore" {
  source = "./memorystore"

  project_id  = var.project_id
  instance_id = "${local.name_prefix}-redis"
  region      = var.region
  network     = module.vpc.network_name

  # Redis configuration
  tier           = var.redis_tier
  memory_size_gb = var.redis_memory_size_gb
  redis_version  = var.redis_version

  # High availability
  replica_count         = var.redis_replica_count
  read_replicas_enabled = var.redis_read_replicas_enabled

  # Security
  auth_enabled               = var.redis_auth_enabled
  auth_string                = var.redis_auth_string
  transit_encryption_enabled = var.redis_transit_encryption_enabled

  tags = local.common_tags
}

# ==============================================================================
# Artifact Registry
# ==============================================================================

module "artifact_registry" {
  source = "./artifact-registry"

  project_id      = var.project_id
  location        = var.region
  repository_name = "${local.name_prefix}-registry"
  format          = "DOCKER"

  # Cleanup policy
  cleanup_policy_days = var.artifact_cleanup_policy_days

  tags = local.common_tags
}

# ==============================================================================
# Cloud Load Balancing
# ==============================================================================

module "load_balancer" {
  source = "./load-balancer"

  project_id = var.project_id
  region     = var.region
  network    = module.vpc.network_name
  subnet     = module.vpc.subnet_name

  # Load balancer configuration
  name = "${local.name_prefix}-lb"

  # Backend services
  backend_services = [
    {
      name              = "openclaw-gateway"
      port              = 18789
      health_check_path = "/health"
    },
    {
      name              = "litellm-proxy"
      port              = 4000
      health_check_path = "/health"
    }
  ]

  # SSL certificate
  ssl_certificate_arn = var.ssl_certificate_arn
  managed_domain      = var.managed_domain

  tags = local.common_tags
}

# ==============================================================================
# Monitoring
# ==============================================================================

module "monitoring" {
  source = "../terraform/modules/monitoring"

  name_prefix          = local.name_prefix
  project_id           = var.project_id
  gke_cluster_name     = google_container_cluster.openclaw_cluster.name
  cloud_sql_instance   = module.cloud_sql.instance_name
  memorystore_instance = module.memorystore.instance_id

  # Dashboard
  enable_dashboard = true

  # Alerts
  enable_alerts = var.enable_monitoring_alerts
  alert_email   = var.alert_email

  tags = local.common_tags
}

# ==============================================================================
# Outputs
# ==============================================================================

output "network_name" {
  description = "VPC network name"
  value       = module.vpc.network_name
}

output "subnet_name" {
  description = "Subnet name"
  value       = module.vpc.subnet_name
}

output "gke_cluster_endpoint" {
  description = "GKE cluster endpoint"
  value       = google_container_cluster.openclaw_cluster.endpoint
}

output "gke_cluster_name" {
  description = "GKE cluster name"
  value       = google_container_cluster.openclaw_cluster.name
}

output "cloud_sql_connection_name" {
  description = "Cloud SQL connection name"
  value       = module.cloud_sql.connection_name
}

output "cloud_sql_private_ip" {
  description = "Cloud SQL private IP"
  value       = module.cloud_sql.private_ip
}

output "memorystore_host" {
  description = "Memorystore Redis host"
  value       = module.memorystore.host
}

output "memorystore_port" {
  description = "Memorystore Redis port"
  value       = module.memorystore.port
}

output "artifact_registry_url" {
  description = "Artifact Registry URL"
  value       = local.artifact_registry_urls
}

output "load_balancer_ip" {
  description = "Load balancer IP address"
  value       = module.load_balancer.ip_address
}
@@ -0,0 +1,177 @@
# ==============================================================================
# Heretek OpenClaw - GCP Memorystore Configuration
# ==============================================================================
# Memorystore Redis for OpenClaw caching and session management
# ==============================================================================

# ------------------------------------------------------------------------------
# Memorystore Redis Instance
# ------------------------------------------------------------------------------

resource "google_redis_instance" "openclaw" {
  name           = var.instance_id
  project        = var.project_id
  region         = var.region
  tier           = var.tier
  memory_size_gb = var.memory_size_gb
  redis_version  = var.redis_version

  # Network configuration
  authorized_network = var.network
  connect_mode       = "PRIVATE_SERVICE_ACCESS"

  # High availability
  replica_count      = var.replica_count
  read_replicas_mode = var.read_replicas_enabled ? "READ_REPLICAS_ENABLED" : "READ_REPLICAS_DISABLED"

  # Security
  # Memorystore generates the AUTH string itself and exports it as the
  # read-only auth_string attribute. It cannot be set here.
  auth_enabled            = var.auth_enabled
  transit_encryption_mode = var.transit_encryption_enabled ? "SERVER_AUTHENTICATION" : "DISABLED"

  # Maintenance
  maintenance_policy {
    weekly_maintenance_window {
      day = "TUESDAY"
      start_time {
        hours   = 3
        minutes = 0
        seconds = 0
        nanos   = 0
      }
    }
  }

  # Persistence (RDB snapshots)
  persistence_config {
    persistence_mode = "RDB"
  }

  # Labels
  labels = var.tags

  # Reserved IP range (optional)
  # reserved_ip_range = "10.0.0.0/29"
}

# ------------------------------------------------------------------------------
# Memorystore Instance Configuration (Redis parameters)
# ------------------------------------------------------------------------------

# Redis configuration parameters belong on the main instance above, for example
# in its redis_configs map. No separate google_redis_instance resource is
# declared here: an empty one fails validation because name and memory_size_gb
# are required.

# ------------------------------------------------------------------------------
# Secret Manager for Redis Auth
# ------------------------------------------------------------------------------

resource "google_secret_manager_secret" "redis_auth" {
  count = var.auth_enabled ? 1 : 0

  secret_id = "${var.instance_id}-auth"
  project   = var.project_id

  labels = var.tags

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "redis_auth" {
  count = var.auth_enabled ? 1 : 0

  secret = google_secret_manager_secret.redis_auth[0].id

  # Store the AUTH string that Memorystore generated for the instance
  secret_data = google_redis_instance.openclaw.auth_string
}
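
# ------------------------------------------------------------------------------
# Usage sketch: reading the AUTH string
# ------------------------------------------------------------------------------
# When auth_enabled = true, the AUTH string is kept in Secret Manager under the
# secret ID "<instance_id>-auth". Assuming gcloud is authenticated against the
# same project, one way to read it is shown below (angle-bracket values are
# placeholders):
#
#   gcloud secrets versions access latest \
#     --secret="<instance_id>-auth" \
#     --project="<project_id>"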

# ------------------------------------------------------------------------------
# Monitoring Alerts
# ------------------------------------------------------------------------------

resource "google_monitoring_alert_policy" "memorystore_cpu" {
  count = var.environment == "prod" ? 1 : 0

  display_name = "${var.instance_id} CPU Utilization"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "CPU utilization > 80%"

    condition_threshold {
      # Assumes stats/cpu_utilization reports CPU-seconds consumed. ALIGN_RATE
      # converts that to CPU-seconds per second, so 0.8 approximates 80% of one
      # core once the per-series values are summed.
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/stats/cpu_utilization\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8

      aggregations {
        alignment_period     = "300s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.label.instance_id"]
      }
    }
  }

  notification_channels = var.alert_notification_channels

  severity = "WARNING"
}

resource "google_monitoring_alert_policy" "memorystore_memory" {
  count = var.environment == "prod" ? 1 : 0

  display_name = "${var.instance_id} Memory Usage"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "Memory usage > 85%"

    condition_threshold {
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/stats/memory/usage_ratio\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.85

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels

  severity = "WARNING"
}

resource "google_monitoring_alert_policy" "memorystore_connections" {
  count = var.environment == "prod" ? 1 : 0

  display_name = "${var.instance_id} Connections"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "Connections > 1000"

    condition_threshold {
      filter          = "resource.type = \"redis_instance\" AND metric.type = \"redis.googleapis.com/clients/connected\" AND resource.label.\"instance_id\" = \"${var.instance_id}\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1000

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels

  severity = "WARNING"
}

# ------------------------------------------------------------------------------
# Memorystore Backup (Optional)
# ------------------------------------------------------------------------------

# Backups are managed through persistence_config on the main instance above.
# Additional backup configuration can be added here. No empty
# google_redis_instance resource is declared, because one would fail validation.
@@ -0,0 +1,224 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Outputs
# ==============================================================================
# Output values for GCP infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# VPC Outputs
# ------------------------------------------------------------------------------

output "network_name" {
  description = "VPC network name"
  value       = module.vpc.network_name
}

output "network_self_link" {
  description = "VPC network self link"
  value       = module.vpc.network_self_link
}

output "subnet_name" {
  description = "Primary subnet name"
  value       = module.vpc.subnet_name
}

output "subnet_self_link" {
  description = "Primary subnet self link"
  value       = module.vpc.subnet_self_link
}

# ------------------------------------------------------------------------------
# GKE Outputs
# ------------------------------------------------------------------------------

output "gke_cluster_id" {
  description = "GKE cluster ID"
  value       = google_container_cluster.openclaw_cluster.id
}

output "gke_cluster_name" {
  description = "GKE cluster name"
  value       = google_container_cluster.openclaw_cluster.name
}

output "gke_cluster_endpoint" {
  description = "GKE cluster endpoint"
  value       = google_container_cluster.openclaw_cluster.endpoint
  sensitive   = true
}

output "gke_cluster_ca_certificate" {
  description = "GKE cluster CA certificate (base64-encoded)"
  value       = google_container_cluster.openclaw_cluster.master_auth[0].cluster_ca_certificate
  sensitive   = true
}

output "gke_cluster_location" {
  description = "GKE cluster location"
  value       = google_container_cluster.openclaw_cluster.location
}

output "gke_cluster_node_count" {
  description = "GKE cluster node count"
  value       = google_container_cluster.openclaw_cluster.node_count
}

output "gke_cluster_node_pools" {
  description = "GKE cluster node pool names"
  value       = [for pool in google_container_cluster.openclaw_cluster.node_pool : pool.name]
}

output "gke_workload_identity_pool" {
  description = "Workload Identity pool"
  value       = "${var.project_id}.svc.id.goog"
}

output "gke_kubeconfig_command" {
  description = "Command to get cluster credentials"
  value       = "gcloud container clusters get-credentials ${google_container_cluster.openclaw_cluster.name} --region ${var.region} --project ${var.project_id}"
}

# ------------------------------------------------------------------------------
# Cloud SQL Outputs
# ------------------------------------------------------------------------------

output "cloud_sql_instance_id" {
  description = "Cloud SQL instance ID"
  value       = module.cloud_sql.instance_id
}

output "cloud_sql_instance_name" {
  description = "Cloud SQL instance name"
  value       = module.cloud_sql.instance_name
}

output "cloud_sql_connection_name" {
  description = "Cloud SQL connection name"
  value       = module.cloud_sql.connection_name
}

output "cloud_sql_private_ip" {
  description = "Cloud SQL private IP address"
  value       = module.cloud_sql.private_ip
}

output "cloud_sql_public_ip" {
  description = "Cloud SQL public IP address"
  value       = module.cloud_sql.public_ip
}

output "cloud_sql_database_name" {
  description = "Cloud SQL database name"
  value       = module.cloud_sql.database_name
}

output "cloud_sql_database_user" {
  description = "Cloud SQL database user"
  value       = module.cloud_sql.database_user
  sensitive   = true
}

output "cloud_sql_connection_string" {
  description = "PostgreSQL connection string"
  value       = "postgresql://${module.cloud_sql.database_user}:${var.db_password}@${module.cloud_sql.private_ip}:5432/${module.cloud_sql.database_name}"
  sensitive   = true
}

# ------------------------------------------------------------------------------
# Memorystore Outputs
# ------------------------------------------------------------------------------

output "memorystore_instance_id" {
  description = "Memorystore instance ID"
  value       = module.memorystore.instance_id
}

output "memorystore_host" {
  description = "Memorystore Redis host"
  value       = module.memorystore.host
}

output "memorystore_port" {
  description = "Memorystore Redis port"
  value       = module.memorystore.port
}

output "memorystore_connection_string" {
  description = "Redis connection string (rediss:// when transit encryption is enabled)"
  value       = "${var.redis_transit_encryption_enabled ? "rediss" : "redis"}://${var.redis_auth_enabled && var.redis_auth_string != null ? ":${var.redis_auth_string}@" : ""}${module.memorystore.host}:${module.memorystore.port}"
  sensitive   = true
}

# ------------------------------------------------------------------------------
# Artifact Registry Outputs
# ------------------------------------------------------------------------------

output "artifact_registry_name" {
  description = "Artifact Registry name"
  value       = module.artifact_registry.repository_name
}

output "artifact_registry_url" {
  description = "Artifact Registry URL"
  value       = local.artifact_registry_urls
}

output "artifact_registry_docker_config" {
  description = "Command that configures Docker to authenticate to Artifact Registry"
  value       = "gcloud auth configure-docker ${var.region}-docker.pkg.dev"
}

# ------------------------------------------------------------------------------
# Load Balancer Outputs
# ------------------------------------------------------------------------------

output "load_balancer_name" {
  description = "Load balancer name"
  value       = module.load_balancer.name
}

output "load_balancer_ip" {
  description = "Load balancer IP address"
  value       = module.load_balancer.ip_address
}

output "load_balancer_self_link" {
  description = "Load balancer self link"
  value       = module.load_balancer.self_link
}

# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------

output "monitoring_dashboard_id" {
  description = "Cloud Monitoring dashboard ID"
  value       = module.monitoring.dashboard_id
}

output "monitoring_alert_policies" {
  description = "List of alert policy IDs"
  value       = module.monitoring.alert_policy_ids
}

# ------------------------------------------------------------------------------
# Cost Estimation
# ------------------------------------------------------------------------------

output "estimated_monthly_cost" {
  description = "Estimated monthly cost breakdown"
  value = {
    gke_cluster = "~$73 (cluster management fee)"
    # format() is used because "$${" is HCL's escape for a literal "${" and
    # would print the expression text instead of its value.
    gke_nodes_general = format("~$%d (%s)", var.node_pools.general.initial_count * 140, var.node_pools.general.machine_type)
    gke_nodes_compute = format("~$%d (%s)", var.node_pools.compute.initial_count * 300, var.node_pools.compute.machine_type)
    gke_nodes_gpu     = local.gpu_enabled ? format("~$%d (%s)", var.gpu_node_pool.initial_count * 1500, var.gpu_node_pool.machine_type) : "$0"
    cloud_sql         = format("~$%d (%s)", var.db_high_availability ? 300 : 150, var.db_tier)
    memorystore       = format("~$%d (%dGB)", var.redis_tier == "STANDARD_HA" ? 150 : 75, var.redis_memory_size_gb)
    load_balancer     = "~$18"
    artifact_registry = "~$5 (storage)"
    cloud_monitoring  = "~$50"
    network_egress    = "Variable"
    total_estimate    = "See GCP Pricing Calculator for accurate pricing"
  }
}
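
# ------------------------------------------------------------------------------
# Usage sketch: reading these outputs
# ------------------------------------------------------------------------------
# After `terraform apply`, the outputs above can be read with the standard
# Terraform CLI from this configuration's directory. For example:
#
#   # Configure kubectl for the new cluster
#   eval "$(terraform output -raw gke_kubeconfig_command)"
#
#   # Print the cost breakdown map as JSON
#   terraform output -json estimated_monthly_cost
#
#   # Sensitive outputs are redacted by a plain `terraform output`,
#   # but -raw prints the value, so handle it with care
#   terraform output -raw memorystore_connection_string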
@@ -0,0 +1,365 @@
# ==============================================================================
# Heretek OpenClaw - GCP Terraform Variables
# ==============================================================================
# Input variables for GCP infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region for resources"
  type        = string
  default     = "us-central1"
}

variable "zones" {
  description = "GCP zones for regional distribution"
  type        = list(string)
  default     = ["us-central1-a", "us-central1-b", "us-central1-c"]
}

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "app_version" {
  description = "Application version to deploy"
  type        = string
  default     = "2026.3.28"
}

# ------------------------------------------------------------------------------
# VPC Configuration
# ------------------------------------------------------------------------------

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "subnets" {
  description = "Subnet configurations"
  type = list(object({
    name          = string
    ip_cidr_range = string
    region        = string
  }))
  default = [
    {
      name          = "openclaw-subnet-1"
      ip_cidr_range = "10.0.1.0/24"
      region        = "us-central1"
    },
    {
      name          = "openclaw-subnet-2"
      ip_cidr_range = "10.0.2.0/24"
      region        = "us-central1"
    },
    {
      name          = "openclaw-subnet-3"
      ip_cidr_range = "10.0.3.0/24"
      region        = "us-central1"
    }
  ]
}

variable "enable_vpc_flow_logs" {
  description = "Enable VPC Flow Logs"
  type        = bool
  default     = true
}

variable "enable_private_google_access" {
  description = "Enable Private Google Access"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# GKE Configuration
# ------------------------------------------------------------------------------

variable "gke_version" {
  description = "GKE Kubernetes version"
  type        = string
  default     = "1.28"
}

variable "gke_release_channel" {
  description = "GKE release channel (regular, rapid, stable)"
  type        = string
  default     = "regular"

  validation {
    condition     = contains(["regular", "rapid", "stable"], var.gke_release_channel)
    error_message = "Release channel must be one of: regular, rapid, stable."
  }
}

variable "node_pools" {
  description = "GKE node pool configurations"
  type = object({
    general = object({
      machine_type  = string
      min_count     = number
      max_count     = number
      initial_count = number
      disk_size_gb  = number
      disk_type     = string
    })
    compute = object({
      machine_type  = string
      min_count     = number
      max_count     = number
      initial_count = number
      disk_size_gb  = number
      disk_type     = string
    })
  })
  default = {
    general = {
      machine_type  = "n2-standard-4"
      min_count     = 1
      max_count     = 4
      initial_count = 2
      disk_size_gb  = 100
      disk_type     = "pd-ssd"
    }
    compute = {
      machine_type  = "c2-standard-8"
      min_count     = 1
      max_count     = 8
      initial_count = 2
      disk_size_gb  = 200
      disk_type     = "pd-ssd"
    }
  }
}

variable "enable_gpu_support" {
  description = "Enable GPU node pool for Ollama"
  type        = bool
  default     = false
}

variable "gpu_node_pool" {
  description = "GPU node pool configuration"
  type = object({
    machine_type      = string
    accelerator_type  = string
    accelerator_count = number
    min_count         = number
    max_count         = number
    initial_count     = number
    disk_size_gb      = number
  })
  default = {
    machine_type      = "g2-standard-4"
    accelerator_type  = "nvidia-l4"
    accelerator_count = 1
    min_count         = 0
    max_count         = 4
    initial_count     = 1
    disk_size_gb      = 200
  }
}

variable "enable_workload_identity" {
  description = "Enable Workload Identity"
  type        = bool
  default     = true
}

variable "enable_private_cluster" {
  description = "Enable private GKE cluster"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# Cloud SQL PostgreSQL Configuration
# ------------------------------------------------------------------------------

variable "postgresql_version" {
  description = "PostgreSQL version"
  type        = string
  default     = "POSTGRES_15"
}

variable "db_tier" {
  description = "Cloud SQL tier"
  type        = string
  default     = "db-custom-4-15360"
}

variable "db_disk_size" {
  description = "Database disk size in GB"
  type        = number
  default     = 100
}

variable "db_disk_type" {
  description = "Database disk type"
  type        = string
  default     = "PD_SSD"
}

variable "db_name" {
  description = "Database name"
  type        = string
  default     = "openclaw"
}

variable "db_user" {
  description = "Database username"
  type        = string
  default     = "openclaw"
  sensitive   = true
}

variable "db_password" {
  description = "Database password"
  type        = string
  default     = null
  sensitive   = true
}

variable "db_high_availability" {
  description = "Enable high availability"
  type        = bool
  default     = false
}

variable "db_backup_enabled" {
  description = "Enable automated backups"
  type        = bool
  default     = true
}

variable "db_backup_start_time" {
  description = "Backup start time (HH:MM)"
  type        = string
  default     = "03:00"
}

variable "db_point_in_time_recovery" {
  description = "Enable point-in-time recovery"
  type        = bool
  default     = false
}

variable "db_query_insights_enabled" {
  description = "Enable Query Insights"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# Memorystore Redis Configuration
# ------------------------------------------------------------------------------

variable "redis_tier" {
  description = "Memorystore tier (BASIC, STANDARD_HA)"
  type        = string
  default     = "STANDARD_HA"
}

variable "redis_memory_size_gb" {
  description = "Redis memory size in GB"
  type        = number
  default     = 4
}

variable "redis_version" {
  description = "Redis version"
  type        = string
  default     = "REDIS_7_0"
}

variable "redis_replica_count" {
  description = "Number of read replicas"
  type        = number
  default     = 0
}

variable "redis_read_replicas_enabled" {
  description = "Enable read replicas"
  type        = bool
  default     = false
}

variable "redis_auth_enabled" {
  description = "Enable Redis AUTH"
  type        = bool
  default     = true
}

variable "redis_auth_string" {
  description = "Redis AUTH string"
  type        = string
  default     = null
  sensitive   = true
}

variable "redis_transit_encryption_enabled" {
  description = "Enable transit encryption"
  type        = bool
  default     = true
}

# ------------------------------------------------------------------------------
# Artifact Registry Configuration
# ------------------------------------------------------------------------------

variable "artifact_cleanup_policy_days" {
  description = "Days to retain images in Artifact Registry"
  type        = number
  default     = 30
}

# ------------------------------------------------------------------------------
# Load Balancer Configuration
# ------------------------------------------------------------------------------

variable "ssl_certificate_arn" {
  description = "SSL certificate manager certificate"
  type        = string
  default     = null
}

variable "managed_domain" {
  description = "Domain for managed SSL certificate"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------

variable "enable_monitoring_alerts" {
  description = "Enable monitoring alerts"
  type        = bool
  default     = true
}

variable "alert_email" {
  description = "Email for alert notifications"
  type        = string
  default     = null
}
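
# ------------------------------------------------------------------------------
# Example terraform.tfvars (sketch)
# ------------------------------------------------------------------------------
# A minimal production-leaning override file. Every variable shown is declared
# above. The project ID and e-mail address are hypothetical placeholders, and
# the remaining values are illustrative choices, not recommendations.
#
#   project_id                = "my-gcp-project"
#   region                    = "us-central1"
#   environment               = "prod"
#   db_high_availability      = true
#   db_point_in_time_recovery = true
#   redis_tier                = "STANDARD_HA"
#   redis_memory_size_gb      = 4
#   enable_gpu_support        = false
#   alert_email               = "ops@example.com"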
@@ -0,0 +1,187 @@
# ==============================================================================
# Heretek OpenClaw - GCP VPC Configuration
# ==============================================================================
# VPC network module for OpenClaw infrastructure
# ==============================================================================

# ------------------------------------------------------------------------------
# VPC Network
# ------------------------------------------------------------------------------

# google_compute_network has no tags argument, so var.tags is not applied here.
resource "google_compute_network" "openclaw" {
  name                            = var.network_name
  project                         = var.project_id
  auto_create_subnetworks         = false
  routing_mode                    = "REGIONAL"
  delete_default_routes_on_create = false
}

# ------------------------------------------------------------------------------
# Subnets
# ------------------------------------------------------------------------------

resource "google_compute_subnetwork" "openclaw" {
  count = length(var.subnets)

  name                     = var.subnets[count.index].name
  project                  = var.project_id
  region                   = var.subnets[count.index].region
  network                  = google_compute_network.openclaw.id
  ip_cidr_range            = var.subnets[count.index].ip_cidr_range
  private_ip_google_access = var.enable_private_google_access

  dynamic "log_config" {
    for_each = var.enable_vpc_flow_logs ? [1] : []
    content {
      aggregation_interval = "INTERVAL_5_SEC"
      flow_sampling        = 0.5
      metadata             = "INCLUDE_ALL_METADATA"
    }
  }
}

# ------------------------------------------------------------------------------
# Firewall Rules
# ------------------------------------------------------------------------------
# google_compute_firewall accepts network tags only through source_tags and
# target_tags. It has no generic tags argument, so var.tags is not passed here.

# Allow internal communication
resource "google_compute_firewall" "allow_internal" {
  name    = "${var.network_name}-allow-internal"
  project = var.project_id
  network = google_compute_network.openclaw.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "icmp"
  }

  source_ranges = [
    var.vpc_cidr,
  ]
}

# Allow health checks from Google Cloud health check systems
resource "google_compute_firewall" "allow_health_checks" {
  name    = "${var.network_name}-allow-health-checks"
  project = var.project_id
  network = google_compute_network.openclaw.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }

  source_ranges = [
    "35.191.0.0/16",
    "130.211.0.0/22",
  ]

  target_tags = ["openclaw"]
}

# Allow IAP (Identity-Aware Proxy) connections
resource "google_compute_firewall" "allow_iap" {
  name    = "${var.network_name}-allow-iap"
  project = var.project_id
  network = google_compute_network.openclaw.name

  allow {
    protocol = "tcp"
    ports    = ["22", "3389", "443"]
  }

  source_ranges = [
    "35.235.240.0/20",
  ]
}

# ------------------------------------------------------------------------------
# Cloud NAT
# ------------------------------------------------------------------------------

# One router and one NAT gateway per distinct region. Keying by region instead
# of counting subnets avoids duplicate resource names when several subnets share
# a region, as in the default subnet list (all three are in us-central1).
resource "google_compute_router" "openclaw" {
  for_each = toset([for subnet in var.subnets : subnet.region])

  name    = "${var.network_name}-router-${each.value}"
  project = var.project_id
  region  = each.value
  network = google_compute_network.openclaw.id
}

resource "google_compute_router_nat" "openclaw" {
  for_each = google_compute_router.openclaw

  name                               = "${var.network_name}-nat-${each.key}"
  project                            = var.project_id
  router                             = each.value.name
  region                             = each.value.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }
}

# ------------------------------------------------------------------------------
# Private Service Connection (for Cloud SQL, Memorystore)
# ------------------------------------------------------------------------------

resource "google_compute_global_address" "private_ip_alloc" {
  name          = "${var.network_name}-private-ip-alloc"
  project       = var.project_id
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.openclaw.id

  labels = var.tags
}

resource "google_service_networking_connection" "private_vpc_connection" {
  network                 = google_compute_network.openclaw.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name]

  deletion_policy = "ABANDON"
}

# ------------------------------------------------------------------------------
# Routes (if needed)
# ------------------------------------------------------------------------------

# Default route to the internet gateway. Cloud NAT performs egress for instances
# without external IPs through this gateway. Because
# delete_default_routes_on_create = false, GCP also creates its own default route.
# A route's tags argument limits it to instances carrying those network tags,
# so var.tags is not passed here.
resource "google_compute_route" "default_internet" {
  name    = "${var.network_name}-default-internet"
  project = var.project_id
  network = google_compute_network.openclaw.name

  dest_range       = "0.0.0.0/0"
  next_hop_gateway = "default-internet-gateway"
}
@@ -0,0 +1,156 @@
# ==============================================================================
# Heretek OpenClaw - LiteLLM Proxy Deployment
# ==============================================================================
# Base deployment configuration for LiteLLM proxy
# ==============================================================================

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: openclaw
  labels:
    app.kubernetes.io/name: litellm
    app.kubernetes.io/component: proxy
    app.kubernetes.io/part-of: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: litellm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: litellm
        app.kubernetes.io/component: proxy
        app.kubernetes.io/part-of: openclaw
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          imagePullPolicy: IfNotPresent
          command:
            - "litellm"
            - "--config"
            - "/app/config.yaml"
            - "--port"
            - "4000"
          ports:
            - name: http
              containerPort: 4000
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: redis-url
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: litellm-master-key
                  optional: true
            - name: LITELLM_SALT_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: litellm-salt-key
                  optional: true
            - name: LANGFUSE_PUBLIC_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: langfuse-public-key
                  optional: true
            - name: LANGFUSE_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: langfuse-secret-key
                  optional: true
            - name: PROXY_COST_TRACKING
              value: "True"
            - name: PROXY_METRICS_ENABLED
              value: "True"
            - name: LITELLM_LOG_LEVEL
              value: "INFO"
          resources:
            requests:
              cpu: "1000m"
              memory: "2Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            capabilities:
              drop:
                - ALL
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 30
|
||||
timeoutSeconds: 10
|
||||
failureThreshold: 3
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
volumeMounts:
|
||||
- name: litellm-config
|
||||
mountPath: /app/config.yaml
|
||||
subPath: config.yaml
|
||||
- name: tmp
|
||||
mountPath: /tmp
|
||||
volumes:
|
||||
- name: litellm-config
|
||||
configMap:
|
||||
name: litellm-config
|
||||
- name: tmp
|
||||
emptyDir: {}
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: litellm-config
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: litellm
|
||||
data:
|
||||
config.yaml: |
|
||||
model_list:
|
||||
- model_name: minimax
|
||||
litellm_params:
|
||||
model: minimax/minimax-abab6
|
||||
api_key: os.environ/MINIMAX_API_KEY
|
||||
- model_name: zai
|
||||
litellm_params:
|
||||
model: zai/glm-4
|
||||
api_key: os.environ/ZAI_API_KEY
|
||||
- model_name: ollama
|
||||
litellm_params:
|
||||
model: ollama/llama2
|
||||
api_base: http://ollama:11434
|
||||
litellm_settings:
|
||||
set_verbose: true
|
||||
drop_params: true
|
||||
max_tokens: 4096
|
||||
request_timeout: 600
|
||||
num_retries: 2
|
||||
@@ -0,0 +1,48 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - LiteLLM Proxy Service
|
||||
# ==============================================================================
|
||||
# Base service configuration for LiteLLM proxy
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: litellm
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: litellm
|
||||
app.kubernetes.io/component: proxy
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
type: ClusterIP
|
||||
ports:
|
||||
- name: http
|
||||
port: 4000
|
||||
targetPort: http
|
||||
protocol: TCP
|
||||
selector:
|
||||
app.kubernetes.io/name: litellm
|
||||
---
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: Ingress
|
||||
metadata:
|
||||
name: litellm
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: litellm
|
||||
annotations:
|
||||
kubernetes.io/ingress.class: nginx
|
||||
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
|
||||
nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
|
||||
spec:
|
||||
rules:
|
||||
- host: litellm.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: litellm
|
||||
port:
|
||||
number: 4000
|
||||
@@ -0,0 +1,15 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - Kubernetes Namespace
|
||||
# ==============================================================================
|
||||
# Base namespace configuration for OpenClaw deployment
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: v1
|
||||
kind: Namespace
|
||||
metadata:
|
||||
name: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
app.kubernetes.io/managed-by: kustomize
|
||||
name: openclaw
|
||||
@@ -0,0 +1,145 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - OpenClaw Gateway Deployment
|
||||
# ==============================================================================
|
||||
# Base deployment configuration for OpenClaw Gateway
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: openclaw-gateway
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
app.kubernetes.io/component: gateway
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
app.kubernetes.io/component: gateway
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
serviceAccountName: openclaw
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
fsGroup: 1000
|
||||
containers:
|
||||
- name: gateway
|
||||
image: heretek/openclaw-gateway:2026.3.28
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 18789
|
||||
protocol: TCP
|
||||
- name: ws
|
||||
containerPort: 18790
|
||||
protocol: TCP
|
||||
env:
|
||||
- name: NODE_ENV
|
||||
value: "production"
|
||||
- name: PORT
|
||||
value: "18789"
|
||||
- name: OPENCLAW_DIR
|
||||
value: "/app/.openclaw"
|
||||
- name: LITELLM_URL
|
||||
value: "http://litellm:4000"
|
||||
- name: DATABASE_URL
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: database-url
|
||||
- name: REDIS_URL
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: redis-url
|
||||
- name: MINIMAX_API_KEY
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: minimax-api-key
|
||||
optional: true
|
||||
- name: ZAI_API_KEY
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: zai-api-key
|
||||
optional: true
|
||||
resources:
|
||||
requests:
|
||||
cpu: "2000m"
|
||||
memory: "4Gi"
|
||||
limits:
|
||||
cpu: "4000m"
|
||||
memory: "8Gi"
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
readOnlyRootFilesystem: true
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 30
|
||||
timeoutSeconds: 10
|
||||
failureThreshold: 3
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
volumeMounts:
|
||||
- name: openclaw-data
|
||||
mountPath: /app/.openclaw
|
||||
- name: tmp
|
||||
mountPath: /tmp
|
||||
volumes:
|
||||
- name: openclaw-data
|
||||
persistentVolumeClaim:
|
||||
claimName: openclaw-data-pvc
|
||||
- name: tmp
|
||||
emptyDir: {}
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
preferredDuringSchedulingIgnoredDuringExecution:
|
||||
- weight: 100
|
||||
podAffinityTerm:
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
topologyKey: kubernetes.io/hostname
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: openclaw-data-pvc
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Gi
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: openclaw
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
@@ -0,0 +1,60 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - OpenClaw Gateway Service
|
||||
# ==============================================================================
|
||||
# Base service configuration for OpenClaw Gateway
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: openclaw-gateway
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
app.kubernetes.io/component: gateway
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
type: ClusterIP
|
||||
ports:
|
||||
- name: http
|
||||
port: 18789
|
||||
targetPort: http
|
||||
protocol: TCP
|
||||
- name: ws
|
||||
port: 18790
|
||||
targetPort: ws
|
||||
protocol: TCP
|
||||
selector:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
---
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: Ingress
|
||||
metadata:
|
||||
name: openclaw-gateway
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: openclaw-gateway
|
||||
annotations:
|
||||
kubernetes.io/ingress.class: nginx
|
||||
nginx.ingress.kubernetes.io/websocket-services: "openclaw-gateway"
|
||||
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
|
||||
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
|
||||
spec:
|
||||
rules:
|
||||
- host: openclaw.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: openclaw-gateway
|
||||
port:
|
||||
number: 18789
|
||||
- path: /ws
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: openclaw-gateway
|
||||
port:
|
||||
number: 18790
|
||||
@@ -0,0 +1,145 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - PostgreSQL StatefulSet
|
||||
# ==============================================================================
|
||||
# Base StatefulSet configuration for PostgreSQL with pgvector
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: postgresql
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
app.kubernetes.io/component: database
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
serviceName: postgresql
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
app.kubernetes.io/component: database
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 999
|
||||
fsGroup: 999
|
||||
containers:
|
||||
- name: postgresql
|
||||
image: pgvector/pgvector:pg17
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: postgres
|
||||
containerPort: 5432
|
||||
protocol: TCP
|
||||
env:
|
||||
- name: POSTGRES_USER
|
||||
value: "openclaw"
|
||||
- name: POSTGRES_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: postgres-password
|
||||
- name: POSTGRES_DB
|
||||
value: "openclaw"
|
||||
- name: PGDATA
|
||||
value: "/var/lib/postgresql/data/pgdata"
|
||||
resources:
|
||||
requests:
|
||||
cpu: "1000m"
|
||||
memory: "2Gi"
|
||||
limits:
|
||||
cpu: "2000m"
|
||||
memory: "4Gi"
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
livenessProbe:
|
||||
exec:
|
||||
command:
|
||||
- pg_isready
|
||||
- -U
|
||||
- openclaw
|
||||
- -d
|
||||
- openclaw
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
readinessProbe:
|
||||
exec:
|
||||
command:
|
||||
- pg_isready
|
||||
- -U
|
||||
- openclaw
|
||||
- -d
|
||||
- openclaw
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 3
|
||||
volumeMounts:
|
||||
- name: postgresql-data
|
||||
mountPath: /var/lib/postgresql/data
|
||||
- name: postgresql-init
|
||||
mountPath: /docker-entrypoint-initdb.d
|
||||
volumes:
|
||||
- name: postgresql-init
|
||||
configMap:
|
||||
name: postgresql-init
|
||||
volumeClaimTemplates:
|
||||
- metadata:
|
||||
name: postgresql-data
|
||||
labels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 50Gi
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: postgresql-init
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
data:
|
||||
init-pgvector.sql: |
|
||||
-- Enable pgvector extension
|
||||
CREATE EXTENSION IF NOT EXISTS vector;
|
||||
|
||||
-- Create OpenClaw database schema
|
||||
CREATE SCHEMA IF NOT EXISTS openclaw;
|
||||
|
||||
-- Grant permissions
|
||||
GRANT ALL PRIVILEGES ON SCHEMA openclaw TO openclaw;
|
||||
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: postgresql
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: postgresql
|
||||
app.kubernetes.io/component: database
|
||||
spec:
|
||||
type: ClusterIP
|
||||
ports:
|
||||
- name: postgres
|
||||
port: 5432
|
||||
targetPort: postgres
|
||||
protocol: TCP
|
||||
selector:
|
||||
app.kubernetes.io/name: postgresql
|
||||
@@ -0,0 +1,146 @@
|
||||
# ==============================================================================
|
||||
# Heretek OpenClaw - Redis StatefulSet
|
||||
# ==============================================================================
|
||||
# Base StatefulSet configuration for Redis cache
|
||||
# ==============================================================================
|
||||
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: redis
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: redis
|
||||
app.kubernetes.io/component: cache
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
serviceName: redis
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: redis
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: redis
|
||||
app.kubernetes.io/component: cache
|
||||
app.kubernetes.io/part-of: openclaw
|
||||
spec:
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 999
|
||||
fsGroup: 999
|
||||
containers:
|
||||
- name: redis
|
||||
image: redis:7-alpine
|
||||
imagePullPolicy: IfNotPresent
|
||||
command:
|
||||
- redis-server
|
||||
- /etc/redis/redis.conf
|
||||
- --requirepass
|
||||
- $(REDIS_PASSWORD)
|
||||
ports:
|
||||
- name: redis
|
||||
containerPort: 6379
|
||||
protocol: TCP
|
||||
env:
|
||||
- name: REDIS_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: openclaw-secrets
|
||||
key: redis-password
|
||||
resources:
|
||||
requests:
|
||||
cpu: "250m"
|
||||
memory: "256Mi"
|
||||
limits:
|
||||
cpu: "500m"
|
||||
memory: "512Mi"
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
livenessProbe:
|
||||
exec:
|
||||
command:
|
||||
- redis-cli
|
||||
- -a
|
||||
- $(REDIS_PASSWORD)
|
||||
- ping
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
readinessProbe:
|
||||
exec:
|
||||
command:
|
||||
- redis-cli
|
||||
- -a
|
||||
- $(REDIS_PASSWORD)
|
||||
- ping
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 3
|
||||
volumeMounts:
|
||||
- name: redis-data
|
||||
mountPath: /data
|
||||
- name: redis-config
|
||||
mountPath: /etc/redis
|
||||
volumes:
|
||||
- name: redis-config
|
||||
configMap:
|
||||
name: redis-config
|
||||
volumeClaimTemplates:
|
||||
- metadata:
|
||||
name: redis-data
|
||||
labels:
|
||||
app.kubernetes.io/name: redis
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Gi
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: redis-config
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: redis
|
||||
data:
|
||||
redis.conf: |
|
||||
# Redis Configuration for OpenClaw
|
||||
bind 0.0.0.0
|
||||
port 6379
|
||||
protected-mode yes
|
||||
appendonly yes
|
||||
appendfsync everysec
|
||||
maxmemory 256mb
|
||||
maxmemory-policy allkeys-lru
|
||||
tcp-keepalive 60
|
||||
timeout 300
|
||||
slowlog-log-slower-than 10000
|
||||
slowlog-max-len 128
|
||||
loglevel notice
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: redis
|
||||
namespace: openclaw
|
||||
labels:
|
||||
app.kubernetes.io/name: redis
|
||||
app.kubernetes.io/component: cache
|
||||
spec:
|
||||
type: ClusterIP
|
||||
ports:
|
||||
- name: redis
|
||||
port: 6379
|
||||
targetPort: redis
|
||||
protocol: TCP
|
||||
selector:
|
||||
app.kubernetes.io/name: redis
|
||||
@@ -0,0 +1,131 @@
# ==============================================================================
# Heretek OpenClaw - Development Overlay
# ==============================================================================
# Kustomization overlay for development environment
# ==============================================================================

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: openclaw-dev

namePrefix: dev-

resources:
  - ../../base

commonLabels:
  environment: dev

# Development-specific patches
patches:
  # Single Gateway replica with reduced resources for dev
  - target:
      kind: Deployment
      name: openclaw-gateway
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "1Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "2Gi"
      # namePrefix does not rewrite env values; env[3] is LITELLM_URL in the base Deployment
      - op: replace
        path: /spec/template/spec/containers/0/env/3/value
        value: "http://dev-litellm:4000"

  # Single LiteLLM replica with reduced resources for dev
  - target:
      kind: Deployment
      name: litellm
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "250m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "512Mi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "1Gi"

  # Reduce PostgreSQL CPU and memory for dev
  - target:
      kind: StatefulSet
      name: postgresql
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "1Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "2Gi"

  # Reduce Redis CPU and memory for dev
  - target:
      kind: StatefulSet
      name: redis
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "100m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "128Mi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "250m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "256Mi"

# ConfigMapGenerator for environment-specific configuration
configMapGenerator:
  - name: openclaw-config
    literals:
      - ENVIRONMENT=dev
      - LOG_LEVEL=debug
      - ENABLE_PROFILING=true

# SecretGenerator for development secrets
secretGenerator:
  - name: openclaw-secrets
    literals:
      - database-url=postgresql://openclaw:devpassword@dev-postgresql:5432/openclaw
      - redis-url=redis://:devredis@dev-redis:6379/0
      - postgres-password=devpassword
      - redis-password=devredis
      - litellm-master-key=dev-master-key-change-in-production
      - litellm-salt-key=dev-salt-key-change-in-production
      - minimax-api-key=your-minimax-api-key
      - zai-api-key=your-zai-api-key
    behavior: replace

# Image overrides for development
images:
  - name: heretek/openclaw-gateway
    newTag: 2026.3.28
  - name: ghcr.io/berriai/litellm
    newTag: main-latest
@@ -0,0 +1,174 @@
# ==============================================================================
# Heretek OpenClaw - Production Overlay
# ==============================================================================
# Kustomization overlay for production environment
# ==============================================================================

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: openclaw-prod

namePrefix: prod-

resources:
  - ../../base

commonLabels:
  environment: prod

# Production-specific patches
patches:
  # Gateway configuration for production with HA
  - target:
      kind: Deployment
      name: openclaw-gateway
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "2000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "4Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "4000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "8Gi"
      # namePrefix does not rewrite env values; env[3] is LITELLM_URL in the base Deployment
      - op: replace
        path: /spec/template/spec/containers/0/env/3/value
        value: "http://prod-litellm:4000"

  # Zero-downtime rolling update strategy for Gateway
  - target:
      kind: Deployment
      name: openclaw-gateway
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: openclaw-gateway
      spec:
        strategy:
          type: RollingUpdate
          rollingUpdate:
            maxSurge: 1
            maxUnavailable: 0

  # LiteLLM configuration for production with HA
  - target:
      kind: Deployment
      name: litellm
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "2Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "2000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "4Gi"

  # PostgreSQL configuration for production
  - target:
      kind: StatefulSet
      name: postgresql
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "2000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "4Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "4000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "8Gi"

  # Redis configuration for production
  - target:
      kind: StatefulSet
      name: redis
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "512Mi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "1Gi"

# ConfigMapGenerator for production configuration
configMapGenerator:
  - name: openclaw-config
    literals:
      - ENVIRONMENT=prod
      - LOG_LEVEL=warn
      - ENABLE_PROFILING=false
      - ENABLE_DEBUG=false

# SecretGenerator for production secrets
# IMPORTANT: Replace these with actual secrets from your secrets manager
secretGenerator:
  - name: openclaw-secrets
    literals:
      - database-url=postgresql://openclaw:PRODUCTION_PASSWORD_REPLACE_ME@prod-postgresql:5432/openclaw
      - redis-url=redis://:PRODUCTION_REDIS_REPLACE_ME@prod-redis:6379/0
      - postgres-password=PRODUCTION_PASSWORD_REPLACE_ME
      - redis-password=PRODUCTION_REDIS_REPLACE_ME
      - litellm-master-key=PRODUCTION_MASTER_KEY_REPLACE_ME
      - litellm-salt-key=PRODUCTION_SALT_KEY_REPLACE_ME
      - minimax-api-key=your-minimax-api-key
      - zai-api-key=your-zai-api-key
    behavior: replace

# Image overrides for production
images:
  - name: heretek/openclaw-gateway
    newTag: 2026.3.28
    digest: sha256:replace-with-actual-digest
  - name: ghcr.io/berriai/litellm
    newTag: main-v1.0.0
    digest: sha256:replace-with-actual-digest

# Production-specific additional resources
patchesStrategicMerge:
  - |-
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: openclaw-gateway-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app.kubernetes.io/name: openclaw-gateway
  - |-
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: litellm-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app.kubernetes.io/name: litellm
@@ -0,0 +1,131 @@
# ==============================================================================
# Heretek OpenClaw - Staging Overlay
# ==============================================================================
# Kustomization overlay for staging environment
# ==============================================================================

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: openclaw-staging

namePrefix: staging-

resources:
  - ../../base

commonLabels:
  environment: staging

# Staging-specific patches
patches:
  # Gateway configuration for staging
  - target:
      kind: Deployment
      name: openclaw-gateway
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "2Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "2000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "4Gi"
      # namePrefix does not rewrite env values; env[3] is LITELLM_URL in the base Deployment
      - op: replace
        path: /spec/template/spec/containers/0/env/3/value
        value: "http://staging-litellm:4000"

  # LiteLLM configuration for staging
  - target:
      kind: Deployment
      name: litellm
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "1Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "2Gi"

  # PostgreSQL configuration for staging
  - target:
      kind: StatefulSet
      name: postgresql
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "1000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "2Gi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "2000m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "4Gi"

  # Redis configuration for staging
  - target:
      kind: StatefulSet
      name: redis
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "250m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: "256Mi"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "500m"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "512Mi"

# ConfigMapGenerator for staging configuration
configMapGenerator:
  - name: openclaw-config
    literals:
      - ENVIRONMENT=staging
      - LOG_LEVEL=info
      - ENABLE_PROFILING=false

# SecretGenerator for staging secrets
secretGenerator:
  - name: openclaw-secrets
    literals:
      - database-url=postgresql://openclaw:staging-password-change-me@staging-postgresql:5432/openclaw
      - redis-url=redis://:staging-redis-change-me@staging-redis:6379/0
      - postgres-password=staging-password-change-me
      - redis-password=staging-redis-change-me
      - litellm-master-key=staging-master-key-change-in-production
      - litellm-salt-key=staging-salt-key-change-in-production
      - minimax-api-key=your-minimax-api-key
      - zai-api-key=your-zai-api-key
    behavior: replace

# Image overrides for staging
images:
  - name: heretek/openclaw-gateway
    newTag: 2026.3.28
  - name: ghcr.io/berriai/litellm
    newTag: main-latest
@@ -0,0 +1,419 @@
# ==============================================================================
# Heretek OpenClaw - LiteLLM Terraform Module
# ==============================================================================
# Reusable module for LiteLLM proxy deployment
# ==============================================================================

# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}

# Referenced by the Deployment, Service and ConfigMap metadata below
variable "namespace" {
  description = "Kubernetes namespace for the LiteLLM Deployment, Service and ConfigMap"
  type        = string
  default     = "default"
}

variable "image" {
  description = "LiteLLM container image"
  type = object({
    repository  = string
    tag         = string
    pull_policy = optional(string, "IfNotPresent")
  })
  default = {
    repository = "ghcr.io/berriai/litellm"
    tag        = "main-latest"
  }
}

variable "replicas" {
  description = "Number of replicas"
  type        = number
  default     = 1
}

variable "port" {
  description = "Service port"
  type        = number
  default     = 4000
}

variable "resources" {
  description = "Container resources"
  type = object({
    requests = object({
      cpu    = string
      memory = string
    })
    limits = object({
      cpu    = string
      memory = string
    })
  })
  default = {
    requests = {
      cpu    = "1000m"
      memory = "2Gi"
    }
    limits = {
      cpu    = "2000m"
      memory = "4Gi"
    }
  }
}

variable "database" {
  description = "Database configuration for LiteLLM"
  type = object({
    host     = string
    port     = number
    name     = string
    username = string
    password = string
    ssl_mode = optional(string, "require")
  })
}

variable "redis" {
  description = "Redis configuration for LiteLLM"
  type = object({
    host     = string
    port     = number
    password = optional(string)
    db       = optional(number, 0)
  })
  default = {
    host = "localhost"
    port = 6379
  }
}

variable "config" {
  description = "LiteLLM configuration"
  type = object({
    master_key        = optional(string)
    master_key_secret = optional(string)
    cost_tracking     = optional(bool, true)
    metrics_enabled   = optional(bool, true)
    log_level         = optional(string, "INFO")
    ui_enabled        = optional(bool, true)
    spend_tracking    = optional(bool, true)
  })
  default = {
    cost_tracking   = true
    metrics_enabled = true
    log_level       = "INFO"
    ui_enabled      = true
  }
}

variable "providers" {
  description = "LLM provider configurations"
  type = list(object({
    name     = string
    provider = string
    api_key  = optional(string)
    api_base = optional(string)
    models = list(object({
      model_name    = string
      litellm_model = string
    }))
  }))
  default = []
}

variable "autoscaling" {
  description = "Autoscaling configuration"
  type = object({
    enabled               = optional(bool, false)
    min_replicas          = optional(number, 1)
    max_replicas          = optional(number, 10)
    target_cpu_percent    = optional(number, 80)
    target_memory_percent = optional(number, 80)
  })
  default = {
    enabled = false
  }
}

variable "ingress" {
  description = "Ingress configuration"
  type = object({
    enabled    = optional(bool, false)
    class_name = optional(string, "nginx")
    hosts      = optional(list(string), [])
    tls = optional(list(object({
      secret_name = string
      hosts       = list(string)
    })), [])
    annotations = optional(map(string), {})
  })
  default = {
    enabled = false
  }
}

variable "monitoring" {
  description = "Monitoring configuration"
  type = object({
    enabled          = optional(bool, true)
    service_monitor  = optional(bool, false)
    prometheus_rules = optional(bool, false)
  })
  default = {
    enabled = true
  }
}

variable "security" {
  description = "Security configuration"
  type = object({
    pod_security_context = optional(object({
      run_as_non_root = optional(bool, true)
      run_as_user     = optional(number, 1000)
      fs_group        = optional(number, 1000)
    }))
    container_security_context = optional(object({
      allow_privilege_escalation = optional(bool, false)
      read_only_root_filesystem  = optional(bool, true)
      capabilities = optional(object({
        drop = optional(list(string), ["ALL"])
      }))
    }))
  })
  default = {
    pod_security_context = {
      run_as_non_root = true
      run_as_user     = 1000
    }
    container_security_context = {
      allow_privilege_escalation = false
      read_only_root_filesystem  = true
    }
  }
}

# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------

locals {
  common_labels = merge(var.tags, {
    "app.kubernetes.io/name"       = "litellm"
    "app.kubernetes.io/component"  = "proxy"
    "app.kubernetes.io/part-of"    = "openclaw"
    "app.kubernetes.io/managed-by" = "terraform"
  })

  # Passwords are URL-encoded so characters such as "@", ":" or "/" do not
  # break the connection string; sslmode applies var.database.ssl_mode.
  database_url = "postgresql://${var.database.username}:${urlencode(var.database.password)}@${var.database.host}:${var.database.port}/${var.database.name}?sslmode=${var.database.ssl_mode}"

  redis_url = var.redis.password != null ? "redis://:${urlencode(var.redis.password)}@${var.redis.host}:${var.redis.port}/${var.redis.db}" : "redis://${var.redis.host}:${var.redis.port}/${var.redis.db}"
}

# ------------------------------------------------------------------------------
# Kubernetes Resources (when used with Kubernetes provider)
# ------------------------------------------------------------------------------

# Deployment
resource "kubernetes_deployment" "litellm" {
  count = var.environment == "module" ? 1 : 0 # Only when used with Kubernetes provider

  metadata {
    name      = "${var.name}-litellm"
    namespace = var.namespace
    labels    = local.common_labels
  }

  spec {
    replicas = var.replicas

    selector {
      match_labels = {
        "app.kubernetes.io/name" = "litellm"
      }
    }

    template {
      metadata {
        labels = merge(local.common_labels, {
          "app.kubernetes.io/name" = "litellm"
        })
      }

      spec {
        container {
          name              = "litellm"
          image             = "${var.image.repository}:${var.image.tag}"
          image_pull_policy = var.image.pull_policy

          # The kubernetes provider names this block "port" (one block per port)
          port {
            container_port = var.port
          }

          env {
            name  = "DATABASE_URL"
            value = local.database_url
          }

          env {
            name  = "REDIS_URL"
            value = local.redis_url
          }

          env {
            name  = "LITELLM_MASTER_KEY"
            value = var.config.master_key
          }

          env {
            name  = "LITELLM_LOG_LEVEL"
            value = var.config.log_level
          }

          env {
            name  = "PROXY_COST_TRACKING"
            value = var.config.cost_tracking ? "True" : "False"
          }

          resources {
            requests = var.resources.requests
            limits   = var.resources.limits
          }
        }

        dynamic "security_context" {
          for_each = var.security.pod_security_context != null ? [1] : []
          content {
            run_as_non_root = var.security.pod_security_context.run_as_non_root
            run_as_user     = var.security.pod_security_context.run_as_user
            fs_group        = var.security.pod_security_context.fs_group
          }
        }
      }
    }
  }
}

# Service
resource "kubernetes_service" "litellm" {
  count = var.environment == "module" ? 1 : 0

  metadata {
    name      = "${var.name}-litellm"
    namespace = var.namespace
    labels    = local.common_labels
  }

  spec {
    selector = {
      "app.kubernetes.io/name" = "litellm"
    }

    port {
      port        = var.port
      target_port = var.port
    }

    type = "ClusterIP"
  }
}

# ConfigMap for LiteLLM configuration
resource "kubernetes_config_map" "litellm" {
  count = var.environment == "module" ? 1 : 0

  metadata {
    name      = "${var.name}-litellm-config"
    namespace = var.namespace
    labels    = local.common_labels
  }

  data = {
    "config.yaml" = yamlencode({
      # flatten() merges the per-provider lists into the single flat list
      # that LiteLLM expects under model_list
      model_list = flatten([
        for provider in var.providers : [
          for model in provider.models : {
            model_name = model.model_name
            litellm_params = {
              model    = "${provider.provider}/${model.litellm_model}"
              api_key  = provider.api_key
              api_base = provider.api_base
            }
          }
        ]
      ])
      litellm_settings = {
        set_verbose     = var.environment == "dev"
        drop_params     = true
        max_tokens      = 4096
        request_timeout = 600
        num_retries     = 2
      }
    })
  }
}

# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------

output "name" {
  description = "LiteLLM deployment name"
  value       = "${var.name}-litellm"
}

output "image" {
  description = "LiteLLM container image"
  value       = "${var.image.repository}:${var.image.tag}"
}

output "port" {
  description = "LiteLLM service port"
  value       = var.port
}

output "replicas" {
  description = "Number of replicas"
  value       = var.replicas
}

output "database_url" {
  description = "Database connection URL"
  value       = local.database_url
  sensitive   = true
}

output "redis_url" {
  description = "Redis connection URL"
  value       = local.redis_url
  sensitive   = true
}

output "autoscaling_enabled" {
  description = "Whether autoscaling is enabled"
  value       = var.autoscaling.enabled
}

output "ingress_enabled" {
  description = "Whether ingress is enabled"
  value       = var.ingress.enabled
}

output "monitoring_enabled" {
  description = "Whether monitoring is enabled"
  value       = var.monitoring.enabled
}

output "common_labels" {
  description = "Common labels applied to resources"
  value       = local.common_labels
}
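Below is a minimal sketch of calling the LiteLLM module from a root configuration. It shows the shape of the `database`, `redis` and `providers` inputs. The module path `./modules/litellm`, the root variables `db_host`, `db_password` and `openai_api_key`, and the provider and model entries are illustrative assumptions, not values taken from this repository.

```hcl
# Hypothetical root-module call; the source path and all values are placeholders.
module "litellm" {
  source = "./modules/litellm" # assumed location of the module above

  name        = "openclaw"
  environment = "staging"

  database = {
    host     = var.db_host     # hypothetical root variable
    port     = 5432
    name     = "openclaw"
    username = "openclaw"
    password = var.db_password # hypothetical root variable
    # ssl_mode is omitted, so the module's optional() default "require" applies
  }

  redis = {
    host = "staging-redis"
    port = 6379
  }

  providers = [
    {
      name     = "openai"
      provider = "openai"
      api_key  = var.openai_api_key # hypothetical root variable
      models = [
        # Rendered into config.yaml as model "openai/gpt-4o-mini"
        { model_name = "default", litellm_model = "gpt-4o-mini" },
      ]
    },
  ]
}
```

As written, the module creates its Deployment, Service and ConfigMap only when `environment` is `"module"` (`count = var.environment == "module" ? 1 : 0`). A call with `environment = "staging"` therefore produces the module's outputs but no Kubernetes objects.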
@@ -0,0 +1,669 @@
# ==============================================================================
# Heretek OpenClaw - Monitoring Terraform Module
# ==============================================================================
# Reusable module for cloud-native monitoring: dashboards and alerts on
# CloudWatch (AWS), Cloud Monitoring (GCP) and Azure Monitor (Azure)
# ==============================================================================

# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------

variable "name_prefix" {
  description = "Name prefix for resources"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}

# ------------------------------------------------------------------------------
# Cloud Provider Specific Variables
# ------------------------------------------------------------------------------

variable "cloud_provider" {
  description = "Cloud provider (aws, gcp, azure)"
  type        = string
  validation {
    condition     = contains(["aws", "gcp", "azure"], var.cloud_provider)
    error_message = "Cloud provider must be one of: aws, gcp, azure."
  }
}

variable "project_id" {
  description = "GCP project ID or Azure subscription ID"
  type        = string
  default     = null
}

variable "region" {
  description = "Cloud provider region"
  type        = string
}

variable "resource_group_name" {
  description = "Azure resource group name"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Cluster Configuration
# ------------------------------------------------------------------------------

variable "cluster_name" {
  description = "Kubernetes cluster name"
  type        = string
  default     = null
}

variable "cluster_id" {
  description = "Kubernetes cluster ID"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------

variable "database_instance_id" {
  description = "Database instance identifier"
  type        = string
  default     = null
}

variable "database_instance_name" {
  description = "Database instance name"
  type        = string
  default     = null
}

variable "database_server_id" {
  description = "Database server ID (Azure)"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Cache Configuration
# ------------------------------------------------------------------------------

variable "cache_cluster_id" {
  description = "Cache cluster identifier"
  type        = string
  default     = null
}

variable "cache_instance_id" {
  description = "Cache instance identifier"
  type        = string
  default     = null
}

variable "redis_cache_id" {
  description = "Redis cache ID (Azure)"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Dashboard Configuration
# ------------------------------------------------------------------------------

variable "enable_dashboard" {
  description = "Enable monitoring dashboard"
  type        = bool
  default     = true
}

variable "dashboard_name" {
  description = "Dashboard name"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Alerting Configuration
# ------------------------------------------------------------------------------

variable "enable_alerts" {
  description = "Enable alerting rules"
  type        = bool
  default     = true
}

variable "alert_notification_arn" {
  description = "SNS topic ARN (AWS)"
  type        = string
  default     = null
}

variable "alert_email" {
  description = "Email for alert notifications"
  type        = string
  default     = null
}

variable "alert_notification_channels" {
  description = "Alert notification channel IDs (GCP)"
  type        = list(string)
  default     = []
}

variable "action_group_id" {
  description = "Action group ID (Azure)"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Log Configuration
# ------------------------------------------------------------------------------

variable "log_retention_days" {
  description = "Log retention period in days"
  type        = number
  default     = 30
}

variable "enable_log_export" {
  description = "Enable log export to storage"
  type        = bool
  default     = false
}

variable "log_storage_bucket" {
  description = "Storage bucket for log export"
  type        = string
  default     = null
}

# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------

locals {
  common_tags = merge(var.tags, {
    "app.kubernetes.io/name"       = "monitoring"
    "app.kubernetes.io/component"  = "observability"
    "app.kubernetes.io/part-of"    = "openclaw"
    "app.kubernetes.io/managed-by" = "terraform"
  })

  dashboard_name = var.dashboard_name != null ? var.dashboard_name : "${var.name_prefix}-dashboard"

  alert_prefix = "${var.name_prefix}-alert"
}

# ------------------------------------------------------------------------------
# AWS Resources
# ------------------------------------------------------------------------------

# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "openclaw" {
  count = var.cloud_provider == "aws" && var.enable_dashboard ? 1 : 0

  dashboard_name = local.dashboard_name

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "EKS Node CPU Utilization"
          region = var.region
          # EKS node CPU is published by CloudWatch Container Insights, which
          # must be enabled on the cluster for this widget to show data.
          metrics = [
            ["ContainerInsights", "node_cpu_utilization", "ClusterName", var.cluster_name, { stat = "Average" }]
          ]
          view   = "timeSeries"
          period = 300
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "RDS CPU Utilization"
          region = var.region
          metrics = [
            ["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", var.database_instance_id, { stat = "Average" }]
          ]
          view   = "timeSeries"
          period = 300
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "ElastiCache CPU Utilization"
          region = var.region
          metrics = [
            ["AWS/ElastiCache", "CPUUtilization", "CacheClusterId", var.cache_cluster_id, { stat = "Average" }]
          ]
          view   = "timeSeries"
          period = 300
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "ALB Request Count"
          region = var.region
          # NOTE: the LoadBalancer dimension takes the ALB ARN suffix
          # ("app/<name>/<id>"), so a plain name prefix will not match any ALB.
          metrics = [
            ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.name_prefix, { stat = "Sum" }]
          ]
          view   = "timeSeries"
          period = 300
        }
      }
    ]
  })
}

# CloudWatch Alarms - EKS
resource "aws_cloudwatch_metric_alarm" "eks_cpu" {
  count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0

  alarm_name          = "${local.alert_prefix}-eks-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  # EKS node CPU is published by CloudWatch Container Insights, which must be
  # enabled on the cluster for this alarm to receive data.
  metric_name       = "node_cpu_utilization"
  namespace         = "ContainerInsights"
  period            = 300
  statistic         = "Average"
  threshold         = 80
  alarm_description = "EKS cluster node CPU utilization is too high"
  alarm_actions     = var.alert_notification_arn != null ? [var.alert_notification_arn] : []
  ok_actions        = var.alert_notification_arn != null ? [var.alert_notification_arn] : []

  dimensions = {
    ClusterName = var.cluster_name
  }
}

# CloudWatch Alarms - RDS
resource "aws_cloudwatch_metric_alarm" "rds_cpu" {
  count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0

  alarm_name          = "${local.alert_prefix}-rds-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "RDS CPU utilization is too high"
  alarm_actions       = var.alert_notification_arn != null ? [var.alert_notification_arn] : []

  dimensions = {
    DBInstanceIdentifier = var.database_instance_id
  }
}

resource "aws_cloudwatch_metric_alarm" "rds_storage" {
  count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0

  alarm_name          = "${local.alert_prefix}-rds-storage"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "FreeStorageSpace"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 10737418240 # 10 GiB, in bytes
  alarm_description   = "RDS free storage space is too low"
  alarm_actions       = var.alert_notification_arn != null ? [var.alert_notification_arn] : []

  dimensions = {
    DBInstanceIdentifier = var.database_instance_id
  }
}

# CloudWatch Alarms - ElastiCache
resource "aws_cloudwatch_metric_alarm" "elasticache_cpu" {
  count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0

  alarm_name          = "${local.alert_prefix}-elasticache-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ElastiCache"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "ElastiCache CPU utilization is too high"
  alarm_actions       = var.alert_notification_arn != null ? [var.alert_notification_arn] : []

  dimensions = {
    CacheClusterId = var.cache_cluster_id
  }
}

resource "aws_cloudwatch_metric_alarm" "elasticache_memory" {
  count = var.cloud_provider == "aws" && var.enable_alerts ? 1 : 0

  alarm_name          = "${local.alert_prefix}-elasticache-memory"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "FreeableMemory"
  namespace           = "AWS/ElastiCache"
  period              = 300
  statistic           = "Average"
  threshold           = 268435456 # 256 MiB, in bytes
  alarm_description   = "ElastiCache freeable memory is too low"
  alarm_actions       = var.alert_notification_arn != null ? [var.alert_notification_arn] : []

  dimensions = {
    CacheClusterId = var.cache_cluster_id
  }
}

# ------------------------------------------------------------------------------
# GCP Resources
# ------------------------------------------------------------------------------

# Cloud Monitoring Dashboard
resource "google_monitoring_dashboard" "openclaw" {
  count = var.cloud_provider == "gcp" && var.enable_dashboard ? 1 : 0

  dashboard_json = jsonencode({
    displayName = local.dashboard_name
    gridLayout = {
      columns = 2
      widgets = [
        {
          title = "GKE Cluster CPU"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                apiSource = "CLOUD_MONITORING_API"
                timeSeriesFilter = {
                  filter = "resource.type=\"k8s_container\" AND metric.type=\"kubernetes.io/container/cpu/limit_utilization\""
                  aggregation = {
                    alignmentPeriod    = "300s"
                    perSeriesAligner   = "ALIGN_MEAN"
                    crossSeriesReducer = "REDUCE_MEAN"
                    groupByFields      = ["resource.label.\"cluster_name\""]
                  }
                }
              }
            }]
          }
        },
        {
          title = "Cloud SQL CPU"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                apiSource = "CLOUD_MONITORING_API"
                timeSeriesFilter = {
                  filter = "resource.type=\"cloudsql_database\" AND metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\""
                  aggregation = {
                    alignmentPeriod  = "300s"
                    perSeriesAligner = "ALIGN_MEAN"
                  }
                }
              }
            }]
          }
        },
        {
          title = "Memorystore Memory"
          xyChart = {
            dataSets = [{
              timeSeriesQuery = {
                apiSource = "CLOUD_MONITORING_API"
                timeSeriesFilter = {
                  filter = "resource.type=\"cloud_memorystore_instance\" AND metric.type=\"redis.googleapis.com/memory/usage_ratio\""
                }
              }
            }]
          }
        }
      ]
    }
  })
}

# GCP Alert Policies
resource "google_monitoring_alert_policy" "gke_cpu" {
  count = var.cloud_provider == "gcp" && var.enable_alerts ? 1 : 0

  display_name = "${local.alert_prefix}-gke-cpu"
  project      = var.project_id
  combiner     = "OR" # required by google_monitoring_alert_policy

  conditions {
    display_name = "GKE CPU utilization > 80%"
    condition_threshold {
      filter          = "resource.type=\"k8s_container\" AND metric.type=\"kubernetes.io/container/cpu/limit_utilization\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = var.alert_notification_channels
  severity              = "WARNING"
}

resource "google_monitoring_alert_policy" "cloud_sql_cpu" {
  count = var.cloud_provider == "gcp" && var.enable_alerts && var.database_instance_name != null ? 1 : 0

  display_name = "${local.alert_prefix}-cloud-sql-cpu"
  project      = var.project_id
  combiner     = "OR" # required by google_monitoring_alert_policy

  conditions {
    display_name = "Cloud SQL CPU utilization > 80%"
    condition_threshold {
      # Cloud SQL database_id labels have the form "<project-id>:<instance-name>"
      filter     = "resource.type=\"cloudsql_database\" AND metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\" AND resource.label.\"database_id\" = \"${var.database_instance_name}\""
      duration   = "300s"
      comparison = "COMPARISON_GT"
      # cpu/utilization is reported as a fraction (0.0-1.0), so 80% is 0.8
      threshold_value = 0.8
    }
  }

  notification_channels = var.alert_notification_channels
  severity              = "WARNING"
}

# ------------------------------------------------------------------------------
# Azure Resources
# ------------------------------------------------------------------------------

# Azure Monitor Dashboard
resource "azurerm_dashboard" "openclaw" {
  count = var.cloud_provider == "azure" && var.enable_dashboard ? 1 : 0

  name                = local.dashboard_name
  resource_group_name = var.resource_group_name
  location            = var.region
  # Azure tag names cannot contain "/", so the Kubernetes-style keys are
  # rewritten (e.g. "app.kubernetes.io/name" becomes "app.kubernetes.io_name").
  tags = { for k, v in local.common_tags : join("_", split("/", k)) => v }

  dashboard_properties = jsonencode({
    lenses = {
      "0" = {
        order = 0
        parts = {
          "0" = {
            position = { x = 0, y = 0, colSpan = 2, rowSpan = 1 }
            metadata = {
              inputs = []
              type   = "Extension/HubsExtension/PartType/MonitorChartPart"
              settings = {
                content = {
                  options = {
                    chart = {
                      metrics = [{
                        resourceMetadata = { id = var.cluster_id }
                        name             = "cpuUsagePercentage"
                        namespace        = "Insights.Container/containers"
                      }]
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  })
}

# Azure Monitor Alerts
resource "azurerm_monitor_metric_alert" "aks_cpu" {
  count = var.cloud_provider == "azure" && var.enable_alerts && var.cluster_id != null ? 1 : 0

  name                = "${local.alert_prefix}-aks-cpu"
  resource_group_name = var.resource_group_name
  scopes              = [var.cluster_id]
  description         = "AKS cluster CPU utilization is too high"

  criteria {
    metric_namespace = "Insights.Container/containers"
    metric_name      = "cpuUsagePercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  dynamic "action" {
    for_each = var.action_group_id != null ? [1] : []
    content {
      action_group_id = var.action_group_id
    }
  }
}

resource "azurerm_monitor_metric_alert" "postgresql_cpu" {
  count = var.cloud_provider == "azure" && var.enable_alerts && var.database_server_id != null ? 1 : 0

  name                = "${local.alert_prefix}-postgresql-cpu"
  resource_group_name = var.resource_group_name
  scopes              = [var.database_server_id]
  description         = "PostgreSQL CPU utilization is too high"

  criteria {
    metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
    metric_name      = "cpu_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  dynamic "action" {
    for_each = var.action_group_id != null ? [1] : []
    content {
      action_group_id = var.action_group_id
    }
  }
}

resource "azurerm_monitor_metric_alert" "redis_cpu" {
  count = var.cloud_provider == "azure" && var.enable_alerts && var.redis_cache_id != null ? 1 : 0

  name                = "${local.alert_prefix}-redis-cpu"
  resource_group_name = var.resource_group_name
  scopes              = [var.redis_cache_id]
  description         = "Redis used memory percentage is too high"

  # Despite the resource name, this alert tracks memory usage, not CPU.
  criteria {
    metric_namespace = "Microsoft.Cache/Redis"
    metric_name      = "UsedMemoryPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  severity = 3

  dynamic "action" {
    for_each = var.action_group_id != null ? [1] : []
    content {
      action_group_id = var.action_group_id
    }
  }
}

# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------

output "dashboard_id" {
  description = "Dashboard ID"
  value = var.cloud_provider == "aws" ? (
    length(aws_cloudwatch_dashboard.openclaw) > 0 ? aws_cloudwatch_dashboard.openclaw[0].dashboard_name : null
    ) : var.cloud_provider == "gcp" ? (
    length(google_monitoring_dashboard.openclaw) > 0 ? google_monitoring_dashboard.openclaw[0].id : null
    ) : var.cloud_provider == "azure" ? (
    length(azurerm_dashboard.openclaw) > 0 ? azurerm_dashboard.openclaw[0].id : null
  ) : null
}

output "dashboard_name" {
  description = "Dashboard name"
  value       = local.dashboard_name
}

output "alarm_ids" {
  description = "List of alarm IDs"
  value = var.cloud_provider == "aws" ? concat(
    aws_cloudwatch_metric_alarm.eks_cpu[*].id,
    aws_cloudwatch_metric_alarm.rds_cpu[*].id,
    aws_cloudwatch_metric_alarm.rds_storage[*].id,
    aws_cloudwatch_metric_alarm.elasticache_cpu[*].id,
    aws_cloudwatch_metric_alarm.elasticache_memory[*].id
    ) : var.cloud_provider == "gcp" ? concat(
    google_monitoring_alert_policy.gke_cpu[*].id,
    google_monitoring_alert_policy.cloud_sql_cpu[*].id
  ) : []
}

output "alert_policy_ids" {
  description = "List of alert policy IDs"
  value = var.cloud_provider == "gcp" ? concat(
    google_monitoring_alert_policy.gke_cpu[*].id,
    google_monitoring_alert_policy.cloud_sql_cpu[*].id
  ) : []
}

output "log_group_names" {
  description = "Map of CloudWatch log group names"
  value = var.cloud_provider == "aws" ? {
    eks     = "/aws/containerinsights/${var.cluster_name}/application"
    cluster = "/aws/containerinsights/${var.cluster_name}/dataplane"
  } : {}
}
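The monitoring module declares `alert_email`, but no resource in it reads that variable. On AWS, alarms notify only through `alert_notification_arn`. One way to get email alerts is to create the SNS topic and an email subscription in the calling configuration and pass the topic ARN in. The sketch below assumes the module lives at `./modules/monitoring`; the topic name, email address, cluster, database and cache identifiers are all hypothetical.

```hcl
# Hypothetical root-level wiring: CloudWatch alarms -> SNS topic -> email.
resource "aws_sns_topic" "openclaw_alerts" {
  name = "openclaw-staging-alerts"
}

resource "aws_sns_topic_subscription" "openclaw_alerts_email" {
  topic_arn = aws_sns_topic.openclaw_alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # the recipient must confirm the subscription
}

module "monitoring" {
  source = "./modules/monitoring" # assumed location of the module above

  name_prefix    = "openclaw-staging"
  environment    = "staging"
  cloud_provider = "aws"
  region         = "us-east-1"

  cluster_name         = "openclaw-staging-eks"   # hypothetical
  database_instance_id = "openclaw-staging-db"    # hypothetical
  cache_cluster_id     = "openclaw-staging-redis" # hypothetical

  # Every AWS alarm in the module sends alarm notifications to this topic
  alert_notification_arn = aws_sns_topic.openclaw_alerts.arn
}
```

Keeping the topic outside the module lets several modules or environments share one notification channel. The module's `alarm_actions` then stays a plain pass-through of the ARN.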
@@ -0,0 +1,324 @@
# ==============================================================================
# Heretek OpenClaw - Common Terraform Module
# ==============================================================================
# Reusable module for OpenClaw deployment across cloud providers
# ==============================================================================

# ------------------------------------------------------------------------------
# Module Variables
# ------------------------------------------------------------------------------

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}

# ------------------------------------------------------------------------------
# OpenClaw Gateway Configuration
# ------------------------------------------------------------------------------

variable "gateway" {
  description = "Gateway configuration"
  type = object({
    image    = string
    replicas = number
    port     = number
    resources = optional(object({
      requests = object({
        cpu    = string
        memory = string
      })
      limits = object({
        cpu    = string
        memory = string
      })
    }))
    autoscaling = optional(object({
      enabled      = bool
      min_replicas = number
      max_replicas = number
      target_cpu   = number
    }))
  })
  default = {
    image    = "heretek/openclaw-gateway:latest"
    replicas = 1
    port     = 18789
  }
}

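# Example (illustrative sketch only, not part of the module): a gateway value
# matching this variable's type, as it might be passed in a module block. The
# resource figures mirror local.default_resources.gateway below; the image tag
# and the autoscaling bounds are assumptions to adjust for your cluster.
#
# gateway = {
#   image    = "heretek/openclaw-gateway:2026.3.28"
#   replicas = 2
#   port     = 18789
#   resources = {
#     requests = { cpu = "2000m", memory = "4Gi" }
#     limits   = { cpu = "4000m", memory = "8Gi" }
#   }
#   autoscaling = {
#     enabled      = true
#     min_replicas = 2
#     max_replicas = 5
#     target_cpu   = 80
#   }
# }
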
# ------------------------------------------------------------------------------
# LiteLLM Configuration
# ------------------------------------------------------------------------------

variable "litellm" {
  description = "LiteLLM configuration"
  type = object({
    image    = string
    replicas = number
    port     = number
    resources = optional(object({
      requests = object({
        cpu    = string
        memory = string
      })
      limits = object({
        cpu    = string
        memory = string
      })
    }))
  })
  default = {
    image    = "ghcr.io/berriai/litellm:main-latest"
    replicas = 1
    port     = 4000
  }
}

# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------

variable "database" {
  description = "Database configuration"
  type = object({
    type     = string # rds, cloud_sql, azure_postgresql
    host     = string
    port     = number
    name     = string
    username = string
    password = string
    ssl_mode = optional(string, "require")
  })
}

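# Example (illustrative sketch only): wiring this variable to an RDS instance
# defined elsewhere in the calling configuration, inside a module block. The
# resource aws_db_instance.openclaw and var.db_password are hypothetical
# placeholders; address and port are standard aws_db_instance attributes.
#
# database = {
#   type     = "rds"
#   host     = aws_db_instance.openclaw.address
#   port     = aws_db_instance.openclaw.port
#   name     = "openclaw"
#   username = "openclaw"
#   password = var.db_password
#   ssl_mode = "verify-full"
# }
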
# ------------------------------------------------------------------------------
# Redis Configuration
# ------------------------------------------------------------------------------

variable "redis" {
  description = "Redis configuration"
  type = object({
    type     = string # elasticache, memorystore, azure_redis
    host     = string
    port     = number
    password = optional(string)
    ssl      = optional(bool, true)
  })
}

# ------------------------------------------------------------------------------
# Ollama Configuration (Optional)
# ------------------------------------------------------------------------------

variable "ollama" {
  description = "Ollama configuration for local LLM"
  type = object({
    enabled = bool
    image   = optional(string, "ollama/ollama:latest")
    gpu     = optional(bool, false)
    models  = optional(list(string), ["nomic-embed-text-v2-moe"])
    resources = optional(object({
      requests = object({
        cpu    = string
        memory = string
      })
      limits = object({
        cpu    = string
        memory = string
        gpu    = optional(string)
      })
    }))
  })
  default = {
    enabled = false
  }
}

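# Example (illustrative sketch only): enabling Ollama with GPU scheduling in a
# module block. The CPU/memory figures and the "1" GPU limit are placeholders;
# how limits.gpu becomes a device request depends on the resources that
# consume this variable.
#
# ollama = {
#   enabled = true
#   gpu     = true
#   models  = ["nomic-embed-text-v2-moe"]
#   resources = {
#     requests = { cpu = "4000m", memory = "8Gi" }
#     limits   = { cpu = "8000m", memory = "16Gi", gpu = "1" }
#   }
# }
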
# ------------------------------------------------------------------------------
# Neo4j Configuration (Optional for GraphRAG)
# ------------------------------------------------------------------------------

variable "neo4j" {
  description = "Neo4j configuration for GraphRAG"
  type = object({
    enabled  = bool
    image    = optional(string, "neo4j:5.15")
    username = optional(string, "neo4j")
    password = optional(string)
    resources = optional(object({
      requests = object({
        cpu    = string
        memory = string
      })
      limits = object({
        cpu    = string
        memory = string
      })
    }))
  })
  default = {
    enabled = false
  }
}

# ------------------------------------------------------------------------------
# Langfuse Configuration (Optional for Observability)
# ------------------------------------------------------------------------------

variable "langfuse" {
  description = "Langfuse observability configuration"
  type = object({
    enabled    = bool
    image      = optional(string, "langfuse/langfuse:latest")
    host       = optional(string)
    public_key = optional(string)
    secret_key = optional(string)
  })
  default = {
    enabled = false
  }
}

# ------------------------------------------------------------------------------
# Secrets Configuration
# ------------------------------------------------------------------------------

variable "secrets" {
  description = "Secrets configuration"
  type = object({
    minimax_api_key      = optional(string)
    zai_api_key          = optional(string)
    anthropic_api_key    = optional(string)
    openai_api_key       = optional(string)
    google_api_key       = optional(string)
    azure_openai_api_key = optional(string)
  })
  default = {}
}

# ------------------------------------------------------------------------------
# Networking Configuration
# ------------------------------------------------------------------------------

variable "network" {
  description = "Network configuration"
  type = object({
    vpc_id          = string
    subnet_ids      = list(string)
    security_groups = optional(list(string))
  })
}

# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------

variable "monitoring" {
  description = "Monitoring configuration"
  type = object({
    enabled         = bool
    metrics_enabled = optional(bool, true)
    logging_enabled = optional(bool, true)
    tracing_enabled = optional(bool, false)
  })
  default = {
    enabled = true
  }
}

# ------------------------------------------------------------------------------
# Local Values
# ------------------------------------------------------------------------------

locals {
  common_labels = merge(var.tags, {
    "app.kubernetes.io/name"       = "openclaw"
    "app.kubernetes.io/component"  = "gateway"
    "app.kubernetes.io/part-of"    = "openclaw"
    "app.kubernetes.io/managed-by" = "terraform"
  })

  default_resources = {
    gateway = {
      requests = {
        cpu    = "2000m"
        memory = "4Gi"
      }
      limits = {
        cpu    = "4000m"
        memory = "8Gi"
      }
    }
    litellm = {
      requests = {
        cpu    = "1000m"
        memory = "2Gi"
      }
      limits = {
        cpu    = "2000m"
        memory = "4Gi"
      }
    }
  }
}

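# Example (illustrative only): merge() gives precedence to later arguments, so
# the fixed app.kubernetes.io/* labels above override any entry in var.tags
# that uses the same key. With var.tags = { team = "platform" } (a made-up
# tag), local.common_labels would evaluate to:
#
# {
#   "app.kubernetes.io/component"  = "gateway"
#   "app.kubernetes.io/managed-by" = "terraform"
#   "app.kubernetes.io/name"       = "openclaw"
#   "app.kubernetes.io/part-of"    = "openclaw"
#   "team"                         = "platform"
# }
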
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------

output "gateway_config" {
  description = "Gateway configuration"
  value = {
    image    = var.gateway.image
    port     = var.gateway.port
    replicas = var.gateway.replicas
  }
}

output "litellm_config" {
  description = "LiteLLM configuration"
  value = {
    image    = var.litellm.image
    port     = var.litellm.port
    replicas = var.litellm.replicas
  }
}

output "database_connection_string" {
  description = "Database connection string"
  # urlencode() keeps passwords containing characters such as '@', ':' or '/'
  # from corrupting the URI.
  value     = "postgresql://${var.database.username}:${urlencode(var.database.password)}@${var.database.host}:${var.database.port}/${var.database.name}"
  sensitive = true
}

output "redis_connection_string" {
  description = "Redis connection string"
  value       = "redis://${var.redis.password != null ? ":${urlencode(var.redis.password)}@" : ""}${var.redis.host}:${var.redis.port}"
  sensitive   = true
}

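# Example (illustrative sketch only): handing the sensitive connection strings
# to workloads through a Kubernetes Secret in the calling configuration. The
# module label "openclaw", the secret name, the namespace and the DATABASE_URL /
# REDIS_URL keys are hypothetical; kubernetes_secret is the resource from the
# hashicorp/kubernetes provider.
#
# resource "kubernetes_secret" "openclaw_datastores" {
#   metadata {
#     name      = "openclaw-datastores"
#     namespace = "openclaw"
#   }
#   data = {
#     DATABASE_URL = module.openclaw.database_connection_string
#     REDIS_URL    = module.openclaw.redis_connection_string
#   }
# }
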
output "ollama_enabled" {
  description = "Whether Ollama is enabled"
  value       = var.ollama.enabled
}

output "neo4j_enabled" {
  description = "Whether Neo4j is enabled"
  value       = var.neo4j.enabled
}

output "langfuse_enabled" {
  description = "Whether Langfuse is enabled"
  value       = var.langfuse.enabled
}
@@ -0,0 +1,378 @@
# ==============================================================================
# Heretek OpenClaw - Common Module Outputs
# ==============================================================================
# Output definitions for the OpenClaw module
# ==============================================================================

# ------------------------------------------------------------------------------
# Application Outputs
# ------------------------------------------------------------------------------

output "name" {
  description = "Name prefix used for resources"
  value       = var.name
}

output "environment" {
  description = "Environment name"
  value       = var.environment
}

output "app_version" {
  description = "Application version"
  value       = var.app_version
}

# ------------------------------------------------------------------------------
# Gateway Outputs
# ------------------------------------------------------------------------------

output "gateway_image" {
  description = "Gateway container image"
  value       = "${var.gateway.image.repository}:${var.gateway.image.tag}"
}

output "gateway_port" {
  description = "Gateway service port"
  value       = var.gateway.port
}

output "gateway_replicas" {
  description = "Gateway replica count"
  value       = var.gateway.replicas
}

output "gateway_autoscaling_enabled" {
  description = "Whether gateway autoscaling is enabled"
  value       = var.gateway.autoscaling.enabled
}

output "gateway_ingress_enabled" {
  description = "Whether gateway ingress is enabled"
  value       = var.gateway.ingress.enabled
}

# ------------------------------------------------------------------------------
# LiteLLM Outputs
# ------------------------------------------------------------------------------

output "litellm_enabled" {
  description = "Whether LiteLLM is enabled"
  value       = var.litellm.enabled
}

output "litellm_image" {
  description = "LiteLLM container image"
  value       = "${var.litellm.image.repository}:${var.litellm.image.tag}"
}

output "litellm_port" {
  description = "LiteLLM service port"
  value       = var.litellm.port
}

output "litellm_replicas" {
  description = "LiteLLM replica count"
  value       = var.litellm.replicas
}

# ------------------------------------------------------------------------------
# Database Outputs
# ------------------------------------------------------------------------------

output "database_type" {
  description = "Database type (managed or self-hosted)"
  value       = var.database.type
}

output "database_connection_string" {
  description = "Database connection string"
  # The password is optional (password_secret may be used instead), so the
  # ":password" part is only added when it is set, and it is URL-encoded so
  # that characters such as '@', ':' or '/' do not corrupt the URI.
  value     = var.database.host != null ? "postgresql://${var.database.username}${var.database.password != null ? ":${urlencode(var.database.password)}" : ""}@${var.database.host}:${var.database.port}/${var.database.name}" : null
  sensitive = true
}

output "database_pgvector_enabled" {
  description = "Whether pgvector is enabled"
  value       = var.database.pgvector_enabled
}

# ------------------------------------------------------------------------------
# Redis Outputs
# ------------------------------------------------------------------------------

output "redis_type" {
  description = "Redis type (managed or self-hosted)"
  value       = var.redis.type
}

output "redis_connection_string" {
  description = "Redis connection string"
  # urlencode() keeps passwords containing '@', ':' or '/' from corrupting the URI.
  value     = var.redis.host != null ? "redis://${var.redis.password != null ? ":${urlencode(var.redis.password)}@" : ""}${var.redis.host}:${var.redis.port}" : null
  sensitive = true
}

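# Example (illustrative only): sensitive outputs are redacted in plan and apply
# output but can be read explicitly from a shell, assuming the root module
# re-exports them under the same names:
#
#   terraform output -raw database_connection_string
#   terraform output -raw redis_connection_string
#
# Note that -raw prints the value in plain text.
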
# ------------------------------------------------------------------------------
# Ollama Outputs
# ------------------------------------------------------------------------------

output "ollama_enabled" {
  description = "Whether Ollama is enabled"
  value       = var.ollama.enabled
}

output "ollama_gpu_enabled" {
  description = "Whether Ollama GPU support is enabled"
  value       = var.ollama.gpu.enabled
}

output "ollama_gpu_type" {
  description = "Ollama GPU type (amd or nvidia)"
  value       = var.ollama.gpu.type
}

output "ollama_models" {
  description = "List of Ollama models to pull"
  value       = var.ollama.models
}

output "ollama_image" {
  description = "Ollama container image"
  value       = "${var.ollama.image.repository}:${var.ollama.image.tag}"
}

# ------------------------------------------------------------------------------
# Neo4j Outputs
# ------------------------------------------------------------------------------

output "neo4j_enabled" {
  description = "Whether Neo4j is enabled"
  value       = var.neo4j.enabled
}

output "neo4j_image" {
  description = "Neo4j container image"
  value       = "${var.neo4j.image.repository}:${var.neo4j.image.tag}"
}

# ------------------------------------------------------------------------------
# Langfuse Outputs
# ------------------------------------------------------------------------------

output "langfuse_enabled" {
  description = "Whether Langfuse is enabled"
  value       = var.langfuse.enabled
}

output "langfuse_image" {
  description = "Langfuse container image"
  value       = "${var.langfuse.image.repository}:${var.langfuse.image.tag}"
}

output "langfuse_ingress_enabled" {
  description = "Whether Langfuse ingress is enabled"
  value       = var.langfuse.ingress.enabled
}

# ------------------------------------------------------------------------------
# Secrets Outputs
# ------------------------------------------------------------------------------

output "secrets_configured" {
  description = "List of configured secret keys"
  value       = [for key in keys(var.secrets) : key if var.secrets[key] != null]
  sensitive   = true
}

output "external_secrets_enabled" {
  description = "Whether external secrets manager is enabled"
  value       = var.external_secrets.enabled
}

output "external_secrets_store" {
  description = "External secrets store type"
  value       = var.external_secrets.store
}

# ------------------------------------------------------------------------------
# Network Outputs
# ------------------------------------------------------------------------------

output "vpc_id" {
  description = "VPC ID"
  value       = var.network.vpc_id
}

output "subnet_ids" {
  description = "Subnet IDs"
  value       = var.network.subnet_ids
}

output "pod_cidr" {
  description = "Pod CIDR range"
  value       = var.network.pod_cidr
}

output "service_cidr" {
  description = "Service CIDR range"
  value       = var.network.service_cidr
}

output "network_policy" {
  description = "Network policy provider"
  value       = var.network.network_policy
}

# ------------------------------------------------------------------------------
# Domain Outputs
# ------------------------------------------------------------------------------

output "domain_enabled" {
  description = "Whether custom domain is enabled"
  value       = var.domain.enabled
}

output "domain_base" {
  description = "Base domain name"
  value       = var.domain.base_domain
}

output "domain_hosts" {
  description = "Configured domain hosts"
  # tomap() gives both branches a compatible type; an object literal paired
  # with a bare {} is rejected as an inconsistent conditional result.
  value = var.domain.enabled ? tomap({
    gateway  = "${var.domain.gateway_host}.${var.domain.base_domain}"
    litellm  = "${var.domain.litellm_host}.${var.domain.base_domain}"
    langfuse = "${var.domain.langfuse_host}.${var.domain.base_domain}"
  }) : {}
}

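# Example (illustrative only): with a domain value such as
#   domain = { enabled = true, base_domain = "example.com", gateway_host = "gateway",
#              litellm_host = "litellm", langfuse_host = "langfuse" }
# domain_hosts evaluates to
#   { gateway = "gateway.example.com", litellm = "litellm.example.com",
#     langfuse = "langfuse.example.com" }
# The attribute names follow the references above; the full type of the domain
# variable is not shown here, so treat this value as an assumption.
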
# ------------------------------------------------------------------------------
# Monitoring Outputs
# ------------------------------------------------------------------------------

output "monitoring_enabled" {
  description = "Whether monitoring is enabled"
  value       = var.monitoring.enabled
}

output "metrics_enabled" {
  description = "Whether metrics collection is enabled"
  value       = var.monitoring.metrics_enabled
}

output "logging_enabled" {
  description = "Whether logging is enabled"
  value       = var.monitoring.logging_enabled
}

output "tracing_enabled" {
  description = "Whether distributed tracing is enabled"
  value       = var.monitoring.tracing_enabled
}

output "service_monitor_enabled" {
  description = "Whether Prometheus ServiceMonitor is enabled"
  value       = var.monitoring.service_monitor.enabled
}

# ------------------------------------------------------------------------------
# Security Outputs
# ------------------------------------------------------------------------------

output "pod_security_enabled" {
  description = "Whether pod security policy is enabled"
  value       = var.security.pod_security_policy.enabled
}

output "network_policy_enabled" {
  description = "Whether network policy is enabled"
  value       = var.security.network_policy.enabled
}

output "secrets_encryption_enabled" {
  description = "Whether secrets encryption is enabled"
  value       = var.security.secrets_encryption.enabled
}

# ------------------------------------------------------------------------------
# Backup Outputs
# ------------------------------------------------------------------------------

output "backup_enabled" {
  description = "Whether automated backups are enabled"
  value       = var.backup.enabled
}

output "backup_schedule" {
  description = "Backup schedule (cron expression)"
  value       = var.backup.schedule
}

output "backup_retention_days" {
  description = "Backup retention period in days"
  value       = var.backup.retention_days
}

# ------------------------------------------------------------------------------
# Resource Labels
# ------------------------------------------------------------------------------

output "common_labels" {
  description = "Common labels applied to all resources"
  value = {
    "app.kubernetes.io/name"       = "openclaw"
    "app.kubernetes.io/component"  = "gateway"
    "app.kubernetes.io/part-of"    = "openclaw"
    "app.kubernetes.io/managed-by" = "terraform"
    "app.kubernetes.io/version"    = var.app_version
    "environment"                  = var.environment
  }
}

# ------------------------------------------------------------------------------
# Configuration Summary
# ------------------------------------------------------------------------------

output "configuration_summary" {
  description = "Summary of the OpenClaw configuration"
  value = {
    name        = var.name
    environment = var.environment
    version     = var.app_version

    components = {
      gateway  = true
      litellm  = var.litellm.enabled
      ollama   = var.ollama.enabled
      neo4j    = var.neo4j.enabled
      langfuse = var.langfuse.enabled
    }

    database = {
      type     = var.database.type
      pgvector = var.database.pgvector_enabled
    }

    redis = {
      type = var.redis.type
    }

    monitoring = {
      enabled = var.monitoring.enabled
      metrics = var.monitoring.metrics_enabled
      logging = var.monitoring.logging_enabled
      tracing = var.monitoring.tracing_enabled
    }

    security = {
      pod_security       = var.security.pod_security_policy.enabled
      network_policy     = var.security.network_policy.enabled
      secrets_encryption = var.security.secrets_encryption.enabled
    }

    backup = {
      enabled   = var.backup.enabled
      schedule  = var.backup.schedule
      retention = var.backup.retention_days
    }
  }
}
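
# Example (illustrative only): inspecting the summary from a shell, assuming
# the root module re-exports it under the same name:
#
#   terraform output -json configuration_summary
#
# With the defaults in variables.tf, components.gateway is always true,
# litellm, neo4j and langfuse default to true, and ollama defaults to false.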
@@ -0,0 +1,426 @@
# ==============================================================================
# Heretek OpenClaw - Common Module Variables
# ==============================================================================
# Variable definitions for the OpenClaw module
# ==============================================================================

# ------------------------------------------------------------------------------
# General Configuration
# ------------------------------------------------------------------------------

variable "name" {
  description = "Name prefix for all resources"
  type        = string
  validation {
    # One leading letter plus 2-19 more characters gives the documented 3-20.
    condition     = can(regex("^[a-z][a-z0-9-]{2,19}$", var.name))
    error_message = "Name must be 3-20 characters, start with a letter, and contain only lowercase alphanumeric characters and hyphens."
  }
}

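# Examples (illustrative only) checked against the validation rule above:
#   name = "openclaw"    # accepted
#   name = "oc-prod-1"   # accepted
#   name = "OpenClaw"    # rejected: uppercase letters
#   name = "1claw"       # rejected: must start with a letter
#   name = "oc"          # rejected: shorter than 3 characters
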
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }
}

variable "region" {
  description = "Cloud provider region"
  type        = string
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}

# ------------------------------------------------------------------------------
# Application Configuration
# ------------------------------------------------------------------------------

variable "app_version" {
  description = "Application version to deploy"
  type        = string
  default     = "2026.3.28"
}

variable "gateway" {
  description = "OpenClaw Gateway configuration"
  type = object({
    image = object({
      repository  = string
      tag         = string
      pull_policy = optional(string, "IfNotPresent")
    })
    replicas     = optional(number, 1)
    port         = optional(number, 18789)
    service_type = optional(string, "ClusterIP")
    resources = optional(object({
      requests = object({
        cpu    = optional(string, "2000m")
        memory = optional(string, "4Gi")
      })
      limits = object({
        cpu    = optional(string, "4000m")
        memory = optional(string, "8Gi")
      })
    }))
    # autoscaling and ingress default to {} so their nested defaults apply and
    # outputs such as gateway_autoscaling_enabled never read a null object.
    autoscaling = optional(object({
      enabled               = optional(bool, false)
      min_replicas          = optional(number, 1)
      max_replicas          = optional(number, 5)
      target_cpu_percent    = optional(number, 80)
      target_memory_percent = optional(number, 80)
    }), {})
    ingress = optional(object({
      enabled    = optional(bool, false)
      class_name = optional(string, "nginx")
      hosts      = optional(list(string), [])
      tls = optional(list(object({
        secret_name = string
        hosts       = list(string)
      })), [])
    }), {})
  })
  default = {
    image = {
      repository = "heretek/openclaw-gateway"
      tag        = "2026.3.28"
    }
    replicas = 1
    port     = 18789
  }
}

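# Example (illustrative sketch only): exposing the gateway through an ingress
# with TLS. The host name and the TLS secret name are placeholders, and the
# secret is assumed to exist already (for example, issued by cert-manager).
#
# gateway = {
#   image = {
#     repository = "heretek/openclaw-gateway"
#     tag        = "2026.3.28"
#   }
#   replicas = 2
#   autoscaling = {
#     enabled      = true
#     max_replicas = 5
#   }
#   ingress = {
#     enabled    = true
#     class_name = "nginx"
#     hosts      = ["gateway.example.com"]
#     tls = [{
#       secret_name = "openclaw-gateway-tls"
#       hosts       = ["gateway.example.com"]
#     }]
#   }
# }
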
variable "litellm" {
  description = "LiteLLM proxy configuration"
  type = object({
    enabled = optional(bool, true)
    # Defaulting to {} lets callers omit image and still get these defaults.
    image = optional(object({
      repository = optional(string, "ghcr.io/berriai/litellm")
      tag        = optional(string, "main-latest")
    }), {})
    replicas     = optional(number, 1)
    port         = optional(number, 4000)
    service_type = optional(string, "ClusterIP")
    resources = optional(object({
      requests = object({
        cpu    = optional(string, "1000m")
        memory = optional(string, "2Gi")
      })
      limits = object({
        cpu    = optional(string, "2000m")
        memory = optional(string, "4Gi")
      })
    }))
    config = optional(object({
      master_key      = optional(string)
      cost_tracking   = optional(bool, true)
      metrics_enabled = optional(bool, true)
      log_level       = optional(string, "INFO")
    }))
  })
  default = {
    enabled = true
    image = {
      repository = "ghcr.io/berriai/litellm"
      tag        = "main-latest"
    }
    replicas = 1
    port     = 4000
  }
}

# ------------------------------------------------------------------------------
# Database Configuration
# ------------------------------------------------------------------------------

variable "database" {
  description = "Database configuration"
  type = object({
    type             = optional(string, "managed") # managed, self-hosted
    host             = optional(string)
    port             = optional(number, 5432)
    name             = optional(string, "openclaw")
    username         = optional(string, "openclaw")
    password         = optional(string)
    password_secret  = optional(string)
    ssl_mode         = optional(string, "require")
    pool_size        = optional(number, 10)
    max_connections  = optional(number, 100)
    pgvector_enabled = optional(bool, true)
  })
  default = {
    type = "managed"
  }
}

# ------------------------------------------------------------------------------
# Redis Configuration
# ------------------------------------------------------------------------------

variable "redis" {
  description = "Redis configuration"
  type = object({
    type            = optional(string, "managed") # managed, self-hosted
    host            = optional(string)
    port            = optional(number, 6379)
    password        = optional(string)
    password_secret = optional(string)
    ssl_enabled     = optional(bool, true)
    db              = optional(number, 0)
    pool_size       = optional(number, 10)
  })
  default = {
    type = "managed"
  }
}

# ------------------------------------------------------------------------------
# Ollama Configuration
# ------------------------------------------------------------------------------

variable "ollama" {
  description = "Ollama local LLM configuration"
  type = object({
    enabled = optional(bool, false)
    # Nested objects default to {} so their optional attribute defaults are
    # applied; as required attributes, the default value below (which omits
    # them) would fail the type check.
    image = optional(object({
      repository = optional(string, "ollama/ollama")
      tag        = optional(string, "rocm") # rocm for AMD, latest for CPU
    }), {})
    gpu = optional(object({
      enabled = optional(bool, false)
      type    = optional(string, "amd") # amd or nvidia
      device  = optional(string)
    }), {})
    models = optional(list(string), ["nomic-embed-text-v2-moe"])
    persistence = optional(object({
      enabled       = optional(bool, true)
      size          = optional(string, "100Gi")
      storage_class = optional(string)
    }), {})
    resources = optional(object({
      requests = object({
        cpu    = optional(string, "4000m")
        memory = optional(string, "8Gi")
      })
      limits = object({
        cpu    = optional(string, "8000m")
        memory = optional(string, "16Gi")
        gpu    = optional(string)
      })
    }))
  })
  default = {
    enabled = false
  }
}

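# Example (illustrative sketch only): running Ollama on NVIDIA GPUs with the
# standard ollama/ollama image rather than the ROCm build used by default. The
# persistence size repeats this variable's default; all values are
# placeholders to adapt.
#
# ollama = {
#   enabled = true
#   image = {
#     repository = "ollama/ollama"
#     tag        = "latest"
#   }
#   gpu = {
#     enabled = true
#     type    = "nvidia"
#   }
#   persistence = {
#     enabled = true
#     size    = "100Gi"
#   }
# }
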
# ------------------------------------------------------------------------------
# Neo4j Configuration
# ------------------------------------------------------------------------------

variable "neo4j" {
  description = "Neo4j GraphRAG configuration"
  type = object({
    enabled = optional(bool, true)
    # Nested objects default to {} so their optional attribute defaults are
    # applied when the caller (or the default below) omits them.
    image = optional(object({
      repository = optional(string, "neo4j")
      tag        = optional(string, "5.15")
    }), {})
    auth = optional(object({
      username        = optional(string, "neo4j")
      password        = optional(string)
      password_secret = optional(string)
    }), {})
    persistence = optional(object({
      enabled       = optional(bool, true)
      size          = optional(string, "20Gi")
      storage_class = optional(string)
    }), {})
    resources = optional(object({
      requests = object({
        cpu    = optional(string, "2000m")
        memory = optional(string, "4Gi")
      })
      limits = object({
        cpu    = optional(string, "4000m")
        memory = optional(string, "8Gi")
      })
    }))
  })
  default = {
    enabled = true
  }
}

# ------------------------------------------------------------------------------
|
||||
# Langfuse Configuration
|
||||
# ------------------------------------------------------------------------------
|
||||
|
||||
variable "langfuse" {
|
||||
description = "Langfuse observability configuration"
|
||||
type = object({
|
||||
enabled = optional(bool, true)
|
||||
image = object({
|
||||
repository = optional(string, "langfuse/langfuse")
|
||||
tag = optional(string, "latest")
|
||||
})
|
||||
replicas = optional(number, 1)
|
||||
ingress = optional(object({
|
||||
enabled = optional(bool, false)
|
||||
hosts = optional(list(string), [])
|
||||
}))
|
||||
auth = optional(object({
|
||||
salt = optional(string)
|
||||
nextauth_secret = optional(string)
|
||||
sign_up_enabled = optional(bool, true)
|
||||
}))
|
||||
})
|
||||
default = {
|
||||
enabled = true
|
||||
}
|
||||
}

# ------------------------------------------------------------------------------
# Secrets Configuration
# ------------------------------------------------------------------------------

variable "secrets" {
  description = "API keys and secrets"
  type = object({
    minimax_api_key       = optional(string)
    zai_api_key           = optional(string)
    anthropic_api_key     = optional(string)
    openai_api_key        = optional(string)
    google_api_key        = optional(string)
    azure_openai_api_key  = optional(string)
    azure_openai_endpoint = optional(string)
    langfuse_public_key   = optional(string)
    langfuse_secret_key   = optional(string)
  })
  default = {}
}

variable "external_secrets" {
  description = "External secrets manager configuration"
  type = object({
    enabled          = optional(bool, false)
    store            = optional(string, "vault") # vault, aws, gcp, azure
    refresh_interval = optional(string, "1h")
  })
  default = {
    enabled = false
  }
}

# ------------------------------------------------------------------------------
# Network Configuration
# ------------------------------------------------------------------------------

variable "network" {
  description = "Network configuration"
  type = object({
    vpc_id             = string
    subnet_ids         = list(string)
    security_group_ids = optional(list(string))
    pod_cidr           = optional(string, "10.244.0.0/16")
    service_cidr       = optional(string, "10.96.0.0/12")
    network_policy     = optional(string, "calico")
  })
}

variable "domain" {
  description = "Domain configuration"
  type = object({
    enabled       = optional(bool, false)
    base_domain   = optional(string)
    gateway_host  = optional(string, "gateway")
    litellm_host  = optional(string, "litellm")
    langfuse_host = optional(string, "langfuse")
    tls_secret    = optional(string)
  })
  default = {
    enabled = false
  }
}

# ------------------------------------------------------------------------------
# Monitoring Configuration
# ------------------------------------------------------------------------------

variable "monitoring" {
  description = "Monitoring and observability configuration"
  type = object({
    enabled         = optional(bool, true)
    metrics_enabled = optional(bool, true)
    logging_enabled = optional(bool, true)
    tracing_enabled = optional(bool, false)
    service_monitor = optional(object({
      enabled        = optional(bool, false)
      interval       = optional(string, "30s")
      scrape_timeout = optional(string, "10s")
    }))
    prometheus_rule = optional(object({
      enabled = optional(bool, false)
      rules   = optional(list(any), [])
    }))
  })
  default = {
    enabled = true
  }
}

# ------------------------------------------------------------------------------
# Security Configuration
# ------------------------------------------------------------------------------

variable "security" {
  description = "Security configuration"
  type = object({
    pod_security_policy = optional(object({
      enabled         = optional(bool, true)
      run_as_non_root = optional(bool, true)
      run_as_user     = optional(number, 1000)
      fs_group        = optional(number, 1000)
    }))
    network_policy = optional(object({
      enabled            = optional(bool, true)
      default_policy     = optional(string, "Deny")
      allowed_namespaces = optional(list(string), [])
    }))
    secrets_encryption = optional(object({
      enabled    = optional(bool, false)
      kms_key_id = optional(string)
    }))
  })
  default = {
    pod_security_policy = {
      enabled = true
    }
    network_policy = {
      enabled = true
    }
  }
}

# ------------------------------------------------------------------------------
# Backup Configuration
# ------------------------------------------------------------------------------

variable "backup" {
  description = "Backup configuration"
  type = object({
    enabled          = optional(bool, true)
    schedule         = optional(string, "0 2 * * *") # Daily at 2 AM
    retention_days   = optional(number, 7)
    storage_location = optional(string)
  })
  default = {
    enabled = true
  }
}
@@ -0,0 +1,41 @@
# AWS Deployment Guide for Heretek OpenClaw

**Version:** 1.0.0
**Last Updated:** 2026-03-31

For complete AWS deployment instructions, see [`deploy/aws/README.md`](../../deploy/aws/README.md).

## Quick Reference

### Terraform Files

| File | Purpose |
|------|---------|
| [`deploy/aws/terraform/main.tf`](../../deploy/aws/terraform/main.tf) | Main configuration |
| [`deploy/aws/terraform/variables.tf`](../../deploy/aws/terraform/variables.tf) | Input variables |
| [`deploy/aws/terraform/outputs.tf`](../../deploy/aws/terraform/outputs.tf) | Output values |
| [`deploy/aws/terraform/vpc.tf`](../../deploy/aws/terraform/vpc.tf) | VPC configuration |
| [`deploy/aws/terraform/eks.tf`](../../deploy/aws/terraform/eks.tf) | EKS cluster |
| [`deploy/aws/terraform/rds.tf`](../../deploy/aws/terraform/rds.tf) | RDS PostgreSQL |
| [`deploy/aws/terraform/elasticache.tf`](../../deploy/aws/terraform/elasticache.tf) | ElastiCache Redis |
| [`deploy/aws/terraform/ecr.tf`](../../deploy/aws/terraform/ecr.tf) | ECR repositories |
| [`deploy/aws/terraform/alb.tf`](../../deploy/aws/terraform/alb.tf) | Application Load Balancer |

### Deploy Commands

```bash
cd deploy/aws/terraform
terraform init
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
terraform apply tfplan
```

### kubectl Configuration

```bash
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1
```
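
After you update the kubeconfig, you can confirm the cluster is reachable by waiting for the worker nodes to report `Ready`. The sketch below is illustrative. `eks_cluster_name` and `eks_connect` are hypothetical helpers, and it assumes other environments follow the same `openclaw-<env>-eks` naming as `openclaw-dev-eks`.

```bash
# Hypothetical helper: assumes the "<project>-<env>-eks" naming used by openclaw-dev-eks.
eks_cluster_name() {
  local project="$1" env="$2"
  printf '%s-%s-eks\n' "$project" "$env"
}

# Point kubectl at the cluster, then wait up to 5 minutes for every node to be Ready.
# Usage: eks_connect dev us-east-1
eks_connect() {
  local env="${1:-dev}" region="${2:-us-east-1}" cluster
  cluster="$(eks_cluster_name openclaw "$env")"
  aws eks update-kubeconfig --name "$cluster" --region "$region" &&
    kubectl wait --for=condition=Ready nodes --all --timeout=300s
}
```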

---

🦞 *The thought that never ends.*
@@ -0,0 +1,41 @@
# Azure Deployment Guide for Heretek OpenClaw

**Version:** 1.0.0
**Last Updated:** 2026-03-31

For complete Azure deployment instructions, see [`deploy/azure/README.md`](../../deploy/azure/README.md).

## Quick Reference

### Terraform Files

| File | Purpose |
|------|---------|
| [`deploy/azure/terraform/main.tf`](../../deploy/azure/terraform/main.tf) | Main configuration |
| [`deploy/azure/terraform/variables.tf`](../../deploy/azure/terraform/variables.tf) | Input variables |
| [`deploy/azure/terraform/outputs.tf`](../../deploy/azure/terraform/outputs.tf) | Output values |
| [`deploy/azure/terraform/vnet.tf`](../../deploy/azure/terraform/vnet.tf) | VNet configuration |
| [`deploy/azure/terraform/aks.tf`](../../deploy/azure/terraform/aks.tf) | AKS cluster |
| [`deploy/azure/terraform/postgresql.tf`](../../deploy/azure/terraform/postgresql.tf) | Azure Database for PostgreSQL |
| [`deploy/azure/terraform/redis.tf`](../../deploy/azure/terraform/redis.tf) | Azure Cache for Redis |
| [`deploy/azure/terraform/acr.tf`](../../deploy/azure/terraform/acr.tf) | Azure Container Registry |
| [`deploy/azure/terraform/application-gateway.tf`](../../deploy/azure/terraform/application-gateway.tf) | Application Gateway |

### Deploy Commands

```bash
cd deploy/azure/terraform
terraform init
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
terraform apply tfplan
```

### kubectl Configuration

```bash
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks
```

---

🦞 *The thought that never ends.*
@@ -0,0 +1,834 @@
# Bare Metal Deployment Guide

**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28

This guide provides comprehensive instructions for deploying the Heretek OpenClaw stack on bare metal Linux servers without Docker containerization.

---

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [System Requirements](#system-requirements)
3. [Installation Overview](#installation-overview)
4. [Step 1: Install System Dependencies](#step-1-install-system-dependencies)
5. [Step 2: Install and Configure PostgreSQL](#step-2-install-and-configure-postgresql)
6. [Step 3: Install and Configure Redis](#step-3-install-and-configure-redis)
7. [Step 4: Install and Configure Ollama](#step-4-install-and-configure-ollama)
8. [Step 5: Install LiteLLM](#step-5-install-litellm)
9. [Step 6: Install OpenClaw Gateway](#step-6-install-openclaw-gateway)
10. [Step 7: Configure Environment Variables](#step-7-configure-environment-variables)
11. [Step 8: Initialize Database](#step-8-initialize-database)
12. [Step 9: Configure Systemd Services](#step-9-configure-systemd-services)
13. [Step 10: Verify Installation](#step-10-verify-installation)
14. [Post-Deployment Configuration](#post-deployment-configuration)
15. [Security Hardening](#security-hardening)
16. [Troubleshooting](#troubleshooting)
17. [Backup Configuration](#backup-configuration)
18. [Next Steps](#next-steps)
19. [Support](#support)

---

## Prerequisites

### Required Knowledge

- Basic Linux system administration
- Familiarity with systemd service management
- Understanding of PostgreSQL and Redis
- Node.js and npm package management
- Python virtual environments

### Required API Keys

| Provider | Purpose | Get Key |
|----------|---------|---------|
| **MiniMax** | Primary LLM | https://platform.minimaxi.com |
| **z.ai** | Failover LLM | https://platform.z.ai |
| **(Optional) Langfuse** | Observability | https://cloud.langfuse.com |

---

## System Requirements

### Minimum Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **OS** | Ubuntu 20.04 / RHEL 8 | Ubuntu 22.04 LTS / RHEL 9 |
| **CPU** | 4 cores | 8+ cores |
| **RAM** | 8 GB | 16+ GB |
| **Disk** | 20 GB SSD | 50+ GB NVMe SSD |
| **Network** | 100 Mbps | 1 Gbps |
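
You can check the minimums in the table before installing anything. This is a rough preflight sketch, not a script shipped with the repository. The helper names are hypothetical, and free disk space on `/` stands in for the disk requirement. Run `preflight` after pasting the functions into a shell.

```bash
# Thresholds taken from the "Minimum" column of the table above.
MIN_CPU_CORES=4
MIN_RAM_GB=8
MIN_DISK_GB=20

# Succeeds when the actual value is >= the required minimum.
meets_minimum() {
  local actual="$1" minimum="$2"
  (( actual >= minimum ))
}

# Total memory in GiB, rounded to the nearest integer (MemTotal is in kB and is
# usually slightly below the installed amount, so rounding avoids false failures).
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1048576 }' /proc/meminfo
}

# Free space in whole GiB on the filesystem holding the given path.
free_disk_gb() {
  df -P -k "${1:-/}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

preflight() {
  local failed=0 cores ram disk
  cores="$(nproc)"
  ram="$(total_ram_gb)"
  disk="$(free_disk_gb /)"
  meets_minimum "$cores" "$MIN_CPU_CORES" || { echo "CPU: $cores cores (< $MIN_CPU_CORES)"; failed=1; }
  meets_minimum "$ram" "$MIN_RAM_GB" || { echo "RAM: ${ram} GiB (< $MIN_RAM_GB)"; failed=1; }
  meets_minimum "$disk" "$MIN_DISK_GB" || { echo "Disk: ${disk} GiB free (< $MIN_DISK_GB)"; failed=1; }
  if (( failed == 0 )); then echo "Preflight passed"; fi
  return "$failed"
}
```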

### GPU Requirements (Optional)

| GPU Type | Requirements | Notes |
|----------|--------------|-------|
| **AMD ROCm** | RX 6000/7000 series, MI50/MI100 | ROCm 5.6+ required |
| **NVIDIA CUDA** | RTX 3000/4000 series, A100/H100 | CUDA 11.8+, cuDNN 8.6+ |

---

## Installation Overview

The bare metal installation involves the following components:

```
┌─────────────────────────────────────────────────────────────────┐
│                     Heretek OpenClaw Stack                      │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                        Core Services                        │ │
│ │  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │ │
│ │  │ LiteLLM  │   │PostgreSQL│   │  Redis   │   │  Ollama  │  │ │
│ │  │  :4000   │   │  :5432   │   │  :6379   │   │  :11434  │  │ │
│ │  │  Python  │   │+pgvector │   │  Cache   │   │Local LLM │  │ │
│ │  └──────────┘   └──────────┘   └──────────┘   └──────────┘  │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                OpenClaw Gateway (Port 18789)                │ │
│ │  All 12 agents run as workspaces within Gateway process     │ │
│ │  Agent workspaces: ~/.openclaw/agents/{agent}/              │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                        Web Interface                        │ │
│ │  ┌───────────────────────────────────────────────────────┐  │ │
│ │  │                 Web Dashboard (:3000)                 │  │ │
│ │  │   SvelteKit • TypeScript • TailwindCSS • WebSocket    │  │ │
│ │  └───────────────────────────────────────────────────────┘  │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

### Default Ports

| Service | Port | Protocol |
|---------|------|----------|
| LiteLLM Gateway | 4000 | HTTP |
| PostgreSQL | 5432 | TCP |
| Redis | 6379 | TCP |
| Ollama | 11434 | HTTP |
| OpenClaw Gateway | 18789 | WebSocket |
| Web Dashboard | 3000 | HTTP |

---

## Step 1: Install System Dependencies

### Ubuntu/Debian

```bash
# Update system packages
sudo apt-get update && sudo apt-get upgrade -y

# Install core dependencies
sudo apt-get install -y \
    curl \
    git \
    wget \
    gnupg \
    ca-certificates \
    software-properties-common \
    build-essential \
    libssl-dev \
    libffi-dev \
    python3-dev \
    python3-pip \
    python3-venv \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libxml2-dev \
    libxmlsec1-dev \
    liblzma-dev

# Install Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# Verify installations
node --version     # Should be v20.x
npm --version      # Should be 10.x
python3 --version  # Should be 3.10+ (Ubuntu 22.04 ships 3.10; Ubuntu 20.04 ships 3.8, so install a newer Python there)
```

### RHEL/CentOS/Rocky Linux

```bash
# Update system packages
sudo dnf update -y

# Install EPEL repository (Rocky/AlmaLinux; on RHEL itself, install the epel-release RPM from the Fedora EPEL project)
sudo dnf install -y epel-release

# Install core dependencies
sudo dnf install -y \
    curl \
    git \
    wget \
    gnupg2 \
    ca-certificates \
    gcc \
    gcc-c++ \
    make \
    openssl-devel \
    libffi-devel \
    python3-devel \
    python3-pip \
    bzip2-devel \
    readline-devel \
    sqlite-devel \
    ncurses-devel \
    xz-devel \
    tk-devel \
    libxml2-devel \
    xmlsec1-devel \
    zlib-devel

# Install Node.js 20 LTS
curl -fsSL https://rpm.nodesource.com/setup_20.x | sudo -E bash -
sudo dnf install -y nodejs

# Verify installations
node --version     # Should be v20.x
npm --version      # Should be 10.x
python3 --version  # Should be 3.10+ (python3 is 3.6 on RHEL 8 and 3.9 on RHEL 9; install python3.11 if needed)
```
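
You can turn the version comments in the blocks above into a check that fails loudly. The following is a minimal sketch that uses GNU `sort -V` for version ordering. `version_ge`, `extract_version` and `check_tool` are hypothetical helper names.

```bash
# version_ge CURRENT REQUIRED: succeeds when CURRENT >= REQUIRED (GNU sort -V ordering).
version_ge() {
  local current="$1" required="$2"
  [[ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" == "$required" ]]
}

# Print the first dotted version number found on stdin,
# e.g. "v20.11.1" -> "20.11.1", "Python 3.10.12" -> "3.10.12".
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME REQUIRED: report whether NAME --version meets REQUIRED.
check_tool() {
  local name="$1" required="$2" current
  current="$("$name" --version 2>&1 | extract_version)"
  if [[ -n "$current" ]] && version_ge "$current" "$required"; then
    echo "OK   $name $current (>= $required)"
  else
    echo "FAIL $name ${current:-not found} (need >= $required)"
    return 1
  fi
}

# Usage: check_tool node 20; check_tool npm 10; check_tool python3 3.10
```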

---

## Step 2: Install and Configure PostgreSQL

### Install PostgreSQL 15+

#### Ubuntu/Debian

```bash
# Add the PostgreSQL (PGDG) repository and its signing key (apt-key is deprecated)
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -fsSL -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# Install PostgreSQL 15 (contrib modules are included in postgresql-15) and pgvector
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-15-pgvector

# Start and enable PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
```

#### RHEL/CentOS

```bash
# Add PostgreSQL repository (EL 9 shown; use the EL-8 repository RPM on RHEL 8)
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable default PostgreSQL module
sudo dnf -qy module disable postgresql

# Install PostgreSQL 15 server, contrib modules and pgvector
sudo dnf install -y postgresql15-server postgresql15-contrib pgvector_15

# Initialize the database cluster (required once, before the first start)
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb

# Start and enable PostgreSQL
sudo systemctl start postgresql-15
sudo systemctl enable postgresql-15
```

### Configure PostgreSQL

```bash
# Switch to postgres user
sudo -u postgres psql
```

```sql
-- Create the OpenClaw user and a database it owns.
-- On PostgreSQL 15+, GRANT ... ON DATABASE alone does not let the user
-- create tables in the public schema; database ownership does.
CREATE USER openclaw WITH PASSWORD 'generate-secure-password-here';
CREATE DATABASE openclaw OWNER openclaw;
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;

-- Enable pgvector extension
\c openclaw
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify extension
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Exit psql
\q
```

### Configure PostgreSQL for Remote Access (Optional)

```bash
# Edit PostgreSQL configuration
# (Debian/Ubuntu path; on RHEL with PGDG packages use /var/lib/pgsql/15/data/postgresql.conf)
sudo nano /etc/postgresql/15/main/postgresql.conf
```

```ini
# postgresql.conf
listen_addresses = 'localhost'   # Change to '*' for remote access
max_connections = 100
shared_buffers = 256MB
work_mem = 8MB
```

```bash
# Edit pg_hba.conf for authentication
# (RHEL/PGDG path: /var/lib/pgsql/15/data/pg_hba.conf)
sudo nano /etc/postgresql/15/main/pg_hba.conf
```

```ini
# pg_hba.conf
# TYPE  DATABASE  USER      ADDRESS       METHOD
local   all       all                     peer
host    openclaw  openclaw  127.0.0.1/32  scram-sha-256
host    openclaw  openclaw  ::1/128       scram-sha-256
# For remote clients, add a "host" line with their address range
```

```bash
# Restart PostgreSQL (the unit is postgresql-15 on RHEL)
sudo systemctl restart postgresql
```

---

## Step 3: Install and Configure Redis

### Install Redis 7+

#### Ubuntu/Debian

```bash
# Add Redis repository
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list

# Install Redis
sudo apt-get update
sudo apt-get install -y redis

# Start and enable Redis
sudo systemctl start redis
sudo systemctl enable redis
```

#### RHEL/CentOS

```bash
# Install Redis from Remi repository (EL 9)
sudo dnf install -y dnf-utils
sudo dnf config-manager --set-enabled crb   # CodeReady Builder; this repo is called "powertools" on EL 8
sudo dnf install -y https://rpms.remirepo.net/enterprise/remi-release-9.rpm
sudo dnf module reset redis -y
sudo dnf module enable redis:7 -y
sudo dnf install -y redis

# Start and enable Redis
sudo systemctl start redis
sudo systemctl enable redis
```

### Configure Redis

```bash
# Edit Redis configuration
sudo nano /etc/redis/redis.conf
```

```ini
# redis.conf
bind 127.0.0.1
port 6379
protected-mode yes
requirepass generate-secure-redis-password-here
maxmemory 256mb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
```

```bash
# Restart Redis
sudo systemctl restart redis

# Verify Redis (use the requirepass value set above)
redis-cli -a your-redis-password ping  # Should return PONG
```
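
Passing the password with `redis-cli -a` exposes it on the command line, and redis-cli prints a warning about it. redis-cli also reads the password from the `REDISCLI_AUTH` environment variable, so the check can take it straight from `redis.conf`. The helpers below are a hypothetical sketch. Reading `/etc/redis/redis.conf` normally requires root, and passwords containing spaces or quotes are not handled.

```bash
# Print the requirepass value from a redis.conf file. Commented-out lines are
# ignored; if the directive appears more than once, the last occurrence wins.
redis_conf_password() {
  local conf="${1:-/etc/redis/redis.conf}"
  awk '$1 == "requirepass" { pw = $2 } END { if (pw != "") print pw }' "$conf"
}

# Ping Redis with the password passed via REDISCLI_AUTH instead of -a.
# Usage (as root): redis_ping /etc/redis/redis.conf   # expect PONG
redis_ping() {
  local conf="${1:-/etc/redis/redis.conf}"
  REDISCLI_AUTH="$(redis_conf_password "$conf")" redis-cli ping
}
```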

---

## Step 4: Install and Configure Ollama

### Install Ollama

```bash
# Install Ollama (official installer; it also creates the ollama systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Start and enable Ollama
sudo systemctl start ollama
sudo systemctl enable ollama
```

### Configure Ollama for GPU

#### AMD ROCm

```bash
# Create systemd override for ROCm
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/rocm.conf
```

```ini
# rocm.conf
[Service]
# Presents the GPU as gfx1030 (RDNA2). Only needed for RDNA2 cards that ROCm
# does not officially support; remove it for other GPUs such as MI50/MI100.
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
# 0.0.0.0 listens on all interfaces; use 127.0.0.1:11434 if only local services need Ollama
Environment="OLLAMA_HOST=0.0.0.0:11434"
DevicePolicy=closed
DeviceAllow=/dev/kfd rw
# /dev/dri is a directory; allow the DRM device group instead
DeviceAllow=char-drm rw
```

#### NVIDIA CUDA

```bash
# Create systemd override for CUDA
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo nano /etc/systemd/system/ollama.service.d/cuda.conf
```

```ini
# cuda.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="PATH=/usr/bin:/usr/local/cuda/bin"
Environment="LD_LIBRARY_PATH=/usr/local/cuda/lib64"
DevicePolicy=closed
# Add one line per additional GPU (/dev/nvidia1, /dev/nvidia2, ...)
DeviceAllow=/dev/nvidia0 rw
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia-uvm rw
```

After creating either override, reload systemd and restart Ollama: `sudo systemctl daemon-reload && sudo systemctl restart ollama`.

### Pull Embedding Models

```bash
# Pull embedding model
ollama pull nomic-embed-text-v2-moe

# Verify model
ollama list

# Test Ollama
curl http://localhost:11434/api/tags
```
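
In a provisioning script, it can be useful to pull the embedding model only when `ollama list` does not already show it. This sketch assumes the `ollama list` layout of a header row followed by one model per line, with `name:tag` in the first column. `has_model` and `ensure_model` are hypothetical names.

```bash
# Succeed if the model appears in `ollama list` output read from stdin.
# A name without a tag is treated as "<name>:latest".
has_model() {
  local wanted="$1"
  [[ "$wanted" == *:* ]] || wanted="${wanted}:latest"
  awk -v m="$wanted" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Pull the model only if it is not already present.
# Usage: ensure_model nomic-embed-text-v2-moe
ensure_model() {
  local model="$1"
  if ollama list | has_model "$model"; then
    echo "$model already present"
  else
    ollama pull "$model"
  fi
}
```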

---

## Step 5: Install LiteLLM

### Create Python Virtual Environment

```bash
# Create LiteLLM user
sudo useradd -r -s /bin/false litellm
sudo mkdir -p /opt/litellm
sudo chown litellm:litellm /opt/litellm

# Create virtual environment
sudo -u litellm python3 -m venv /opt/litellm/venv
sudo -u litellm /opt/litellm/venv/bin/pip install --upgrade pip
```

### Install LiteLLM

```bash
# Install LiteLLM with dependencies
sudo -u litellm /opt/litellm/venv/bin/pip install \
    'litellm[proxy]' \
    'litellm[langfuse]' \
    'litellm[postgres]' \
    'litellm[redis]' \
    psycopg2-binary \
    redis \
    langfuse
```

### Configure LiteLLM

```bash
# Create LiteLLM config directory and copy the config (run from the repository root)
sudo mkdir -p /etc/litellm
sudo cp litellm_config.yaml /etc/litellm/litellm_config.yaml
sudo chown litellm:litellm /etc/litellm/litellm_config.yaml
```

---

## Step 6: Install OpenClaw Gateway

### Install OpenClaw

```bash
# Install OpenClaw Gateway
curl -fsSL https://openclaw.ai/install.sh | bash

# Verify installation
openclaw --version

# Initialize daemon
openclaw onboard --install-daemon

# Verify Gateway status
openclaw gateway status
```

### Configure OpenClaw

```bash
# Copy Gateway configuration
cp openclaw.json ~/.openclaw/openclaw.json

# Validate configuration
openclaw gateway validate

# Restart Gateway
openclaw gateway restart
```

---

## Step 7: Configure Environment Variables

### Create Environment File

```bash
# Copy environment template
cp .env.bare-metal.example .env

# Edit with your values
nano .env
```

### Required Environment Variables

See [`.env.bare-metal.example`](../../.env.bare-metal.example) for the complete template.

Key variables to configure:

```bash
# LiteLLM Gateway
LITELLM_MASTER_KEY=generate-a-secure-key-here
LITELLM_SALT_KEY=generate-another-secure-key

# Model Providers
MINIMAX_API_KEY=your_minimax_api_key
ZAI_API_KEY=your_zai_api_key

# Database
POSTGRES_USER=openclaw
POSTGRES_PASSWORD=generate-secure-db-password
POSTGRES_DB=openclaw
# Use the same password as POSTGRES_PASSWORD (percent-encode any special characters)
DATABASE_URL=postgresql://openclaw:your_password@localhost:5432/openclaw

# Redis (the password is the requirepass value from redis.conf)
REDIS_URL=redis://:your-redis-password@localhost:6379/0

# Ollama
OLLAMA_HOST=http://localhost:11434

# OpenClaw Gateway
OPENCLAW_DIR=/root/.openclaw
OPENCLAW_WORKSPACE=/root/.openclaw/agents
```
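
`DATABASE_URL` must repeat `POSTGRES_PASSWORD`. If the password contains characters such as `@`, `:`, `/`, `#` or `%`, it breaks URL parsing unless it is percent-encoded. The minimal bash sketch below builds the URL from its parts, using hypothetical helper names. Hex passwords from `openssl rand -hex` need no encoding, so this matters mainly for passwords generated some other way.

```bash
# Percent-encode a string for use in a URL (RFC 3986: the unreserved
# characters A-Z a-z 0-9 - . _ ~ are kept, everything else becomes %XX).
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# database_url USER PASSWORD DB [HOST] [PORT]
# Usage: database_url openclaw "$POSTGRES_PASSWORD" openclaw
database_url() {
  local user="$1" password="$2" db="$3" host="${4:-localhost}" port="${5:-5432}"
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$user")" "$(urlencode "$password")" "$host" "$port" "$db"
}
```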

---

## Step 8: Initialize Database

### Run Database Migrations

```bash
# Activate LiteLLM virtual environment
source /opt/litellm/venv/bin/activate

# Run OpenClaw database migrations
cd /root/heretek/heretek-openclaw
npm run db:migrate

# Verify database tables (-h localhost uses the scram-sha-256 rule instead of peer auth)
psql -h localhost -U openclaw -d openclaw -c "\dt"
```

### Initialize LiteLLM Database

```bash
# LiteLLM creates its tables automatically on first start.
# After starting LiteLLM, verify them (LiteLLM table names are mixed-case, e.g. LiteLLM_SpendLogs):
psql -h localhost -U openclaw -d openclaw -c '\dt "LiteLLM"*'
```

---

## Step 9: Configure Systemd Services

### Install Systemd Service Files

```bash
# Copy service files
# (units in /etc/systemd/system take precedence over packaged units with the same name)
sudo cp systemd/openclaw-gateway.service /etc/systemd/system/
sudo cp systemd/litellm.service /etc/systemd/system/
sudo cp systemd/ollama.service /etc/systemd/system/
sudo cp systemd/redis.service /etc/systemd/system/
sudo cp systemd/postgresql.service /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload
```

### Enable and Start Services

```bash
# Start services in order
sudo systemctl start postgresql
sudo systemctl start redis
sudo systemctl start ollama
sudo systemctl start litellm
sudo systemctl start openclaw-gateway

# Enable auto-start on boot
sudo systemctl enable postgresql
sudo systemctl enable redis
sudo systemctl enable ollama
sudo systemctl enable litellm
sudo systemctl enable openclaw-gateway

# Verify services
sudo systemctl status postgresql
sudo systemctl status redis
sudo systemctl status ollama
sudo systemctl status litellm
sudo systemctl status openclaw-gateway
```
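
Starting all five units back to back hides a failed dependency until a later service complains. The hypothetical sketch below starts the stack in the same order as above and stops at the first unit that systemd does not report as `active`. It retries a few times so units that pass through `activating` have time to settle. `retry` and `start_stack` are illustrative names. Note that `active` reflects systemd's view of the unit, not whether the application is answering requests.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]: run COMMAND until it succeeds.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}

# Enable and start each unit in dependency order, stopping at the first failure.
# Usage: start_stack
start_stack() {
  local svc
  for svc in postgresql redis ollama litellm openclaw-gateway; do
    sudo systemctl enable --now "$svc"
    if ! retry 10 3 systemctl is-active --quiet "$svc"; then
      echo "$svc did not become active; check: journalctl -u $svc" >&2
      return 1
    fi
    echo "$svc is active"
  done
}
```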

---

## Step 10: Verify Installation

### Health Checks

```bash
# Check PostgreSQL (PostgreSQL does not speak HTTP, so use pg_isready and psql)
pg_isready -h localhost -p 5432
psql -h localhost -U openclaw -d openclaw -c "SELECT version();"

# Check Redis
redis-cli -a your-redis-password ping

# Check Ollama
curl http://localhost:11434/api/tags

# Check LiteLLM (/health probes the configured models and requires the master key)
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" http://localhost:4000/health

# Check OpenClaw Gateway
openclaw gateway status
```

### Expected Output

```
Gateway: Running
Version: v2026.3.28
Workspace: /root/.openclaw
Agents: 12 configured (main + 11 collective)
Plugins: 0 loaded
Skills: 0 loaded
```
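
The individual health checks can be combined into one script that prints a summary and exits non-zero if anything fails, which is convenient for cron or a monitoring agent. This is a hypothetical sketch with several assumptions:

- `LITELLM_MASTER_KEY` and `REDISCLI_AUTH` are exported in the environment.
- A zero exit status from `openclaw gateway status` means the Gateway is healthy (an assumption about that CLI).
- For LiteLLM, the script only checks that `/health` answers with a successful HTTP status.

```bash
failures=0

# check NAME COMMAND [ARGS...]: run a check quietly and record PASS/FAIL.
check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    failures=$((failures + 1))
  fi
}

# Compare the reply text rather than relying on redis-cli's exit status.
redis_pong() {
  [[ "$(redis-cli ping 2>/dev/null)" == "PONG" ]]
}

# Usage: run_health_checks || echo "stack unhealthy"
run_health_checks() {
  failures=0
  check "PostgreSQL" pg_isready -h localhost -p 5432
  check "Redis" redis_pong
  check "Ollama" curl -fsS --max-time 10 http://localhost:11434/api/tags
  check "LiteLLM" curl -fsS --max-time 30 -H "Authorization: Bearer ${LITELLM_MASTER_KEY:-}" http://localhost:4000/health
  check "OpenClaw Gateway" openclaw gateway status
  echo "$failures check(s) failed"
  (( failures == 0 ))
}
```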

---

## Post-Deployment Configuration

### Create Agent Workspaces

```bash
# Run agent creation script for each agent
./agents/deploy-agent.sh steward orchestrator
./agents/deploy-agent.sh alpha triad
./agents/deploy-agent.sh beta triad
./agents/deploy-agent.sh charlie triad
./agents/deploy-agent.sh examiner interrogator
./agents/deploy-agent.sh explorer scout
./agents/deploy-agent.sh sentinel guardian
./agents/deploy-agent.sh coder artisan
./agents/deploy-agent.sh dreamer visionary
./agents/deploy-agent.sh empath diplomat
./agents/deploy-agent.sh historian archivist

# Verify workspaces created
ls -la ~/.openclaw/agents/
```

### Install Plugins & Skills

```bash
# Install consciousness plugin
cd plugins/openclaw-consciousness-plugin
npm install
npm link
openclaw plugins install @heretek-ai/openclaw-consciousness-plugin

# Install liberation plugin
cd ../openclaw-liberation-plugin
npm install
npm link
openclaw plugins install @heretek-ai/openclaw-liberation-plugin

# Install skills
cd ../../skills/triad-consensus
openclaw skills install ./SKILL.md
```

### Configure LiteLLM

```bash
# Copy LiteLLM configuration
sudo cp /root/heretek/heretek-openclaw/litellm_config.yaml /etc/litellm/litellm_config.yaml

# Restart LiteLLM
sudo systemctl restart litellm

# Verify endpoints (requires the master key)
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" http://localhost:4000/v1/models
```

---

## Security Hardening

### Firewall Configuration

#### UFW (Ubuntu)

```bash
# Allow SSH first, so enabling UFW does not cut off your session
sudo ufw allow ssh

# Allow only localhost for internal services
# (UFW already accepts loopback traffic; these rules document the intent)
sudo ufw allow from 127.0.0.1 to any port 5432   # PostgreSQL
sudo ufw allow from 127.0.0.1 to any port 6379   # Redis
sudo ufw allow from 127.0.0.1 to any port 11434  # Ollama

# Allow public access to LiteLLM and OpenClaw
sudo ufw allow 4000/tcp   # LiteLLM
sudo ufw allow 18789/tcp  # OpenClaw Gateway

# Enable UFW (the default policy denies all other incoming traffic)
sudo ufw enable

# Check status
sudo ufw status verbose
```

#### firewalld (RHEL)

```bash
# Enable firewalld
sudo systemctl start firewalld
sudo systemctl enable firewalld

# Allow SSH
sudo firewall-cmd --permanent --add-service=ssh

# Allow only localhost for internal services
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="5432" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="6379" protocol="tcp" accept'
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="11434" protocol="tcp" accept'

# Allow public access
sudo firewall-cmd --permanent --add-port=4000/tcp
sudo firewall-cmd --permanent --add-port=18789/tcp

# Reload firewall
sudo firewall-cmd --reload
```

### SSL/TLS Configuration

For production deployments, terminate SSL/TLS in a reverse proxy such as nginx or Apache placed in front of LiteLLM (port 4000) and the OpenClaw Gateway (port 18789). Because the Gateway uses WebSocket, the proxy must forward the `Upgrade` and `Connection` headers. Once the proxy is in place, ports 4000 and 18789 no longer need to be open to the public.

### API Key Management

```bash
# Generate secure keys
openssl rand -hex 32  # For LITELLM_MASTER_KEY
openssl rand -hex 32  # For LITELLM_SALT_KEY

# Store keys securely
sudo mkdir -p /etc/openclaw/secrets
sudo chmod 700 /etc/openclaw/secrets
```
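
To keep the generated keys rather than only printing them, you can write each one to its own file under `/etc/openclaw/secrets` with owner-only permissions. This is a minimal sketch with a hypothetical `write_secret` helper and hypothetical file names. It reads 32 bytes from `/dev/urandom` and hex-encodes them, which gives the same 64-character format as `openssl rand -hex 32`. Run it as root.

```bash
# write_secret DIR NAME: create DIR/NAME containing a new 64-hex-character key,
# readable only by the file owner.
write_secret() {
  local dir="$1" name="$2" key
  # 32 random bytes as lowercase hex, without spaces or newlines
  key="$(od -An -tx1 -N32 /dev/urandom | tr -d ' \n')"
  # umask 077 in a subshell: the file is created without group/other access
  ( umask 077 && printf '%s\n' "$key" > "$dir/$name" )
  chmod 600 "$dir/$name"   # also tightens permissions if the file already existed
}

# Usage (as root):
#   write_secret /etc/openclaw/secrets litellm_master_key
#   write_secret /etc/openclaw/secrets litellm_salt_key
```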
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
See [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md) for detailed troubleshooting guide.
|
||||
|
||||
### Common Issues
|
||||
|
||||
| Issue | Solution |
|
||||
|-------|----------|
|
||||
| PostgreSQL won't start | Check logs: `journalctl -u postgresql -f` |
|
||||
| Redis connection refused | Verify password in redis.conf |
|
||||
| Ollama GPU not detected | Check ROCm/CUDA installation |
|
||||
| LiteLLM health check fails | Verify DATABASE_URL and REDIS_URL |
|
||||
| OpenClaw Gateway not running | Check workspace permissions |

---

## Backup Configuration

```bash
# Backup OpenClaw configuration
# (/etc/openclaw/.env is normally readable only by root, hence sudo)
sudo tar -czf openclaw-backup-$(date +%Y%m%d).tar.gz \
  ~/.openclaw/openclaw.json \
  ~/.openclaw/agents/ \
  /etc/litellm/litellm_config.yaml \
  /etc/openclaw/.env

# Backup PostgreSQL (-h localhost uses password auth instead of local peer auth)
pg_dump -h localhost -U openclaw openclaw > openclaw-db-$(date +%Y%m%d).sql

# Backup is stored in current directory
ls -la openclaw-backup-*.tar.gz openclaw-db-*.sql
```
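
GNU `tar` reports an error and exits non-zero if any listed path is missing, for example before any agents exist. That makes the command above awkward to schedule. The sketch below shows a hypothetical `backup_paths` helper that archives only the paths that exist and then lists the archive to confirm it is readable. Call it with the same paths as above, e.g. `backup_paths "openclaw-backup-$(date +%Y%m%d).tar.gz" ~/.openclaw/openclaw.json ~/.openclaw/agents/ /etc/litellm/litellm_config.yaml /etc/openclaw/.env`.

```bash
# Hypothetical helper: archive only the paths that exist, then verify the archive.
backup_paths() {
  archive=$1
  shift
  # Rebuild the positional parameters, keeping only existing paths
  for p do
    shift
    if [ -e "$p" ]; then
      set -- "$@" "$p"
    else
      echo "skipping missing path: $p" >&2
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to back up" >&2
    return 1
  fi
  tar -czf "$archive" "$@" && tar -tzf "$archive" >/dev/null
}
```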

---

## Next Steps

After successful deployment:

1. **Access LiteLLM Dashboard** - http://localhost:4000/ui
2. **Test Agent Communication** - Send messages via Gateway WebSocket RPC
3. **Configure User Profiles** - Set up user rolodex
4. **Enable Autonomous Operations** - Activate dreamer agent
5. **Review Documentation** - See [`docs/`](../../docs/) for advanced configuration

---

## Support

For issues or questions:
- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
- Review [`CHANGELOG.md`](../../CHANGELOG.md)
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues

---

🦞 *The thought that never ends.*
# Cloud-Native Deployment Guide for Heretek OpenClaw

**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28

This guide provides comprehensive instructions for deploying Heretek OpenClaw on major cloud platforms using Infrastructure as Code (IaC) and Kubernetes.

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Supported Cloud Providers](#supported-cloud-providers)
4. [Prerequisites](#prerequisites)
5. [Quick Start](#quick-start)
6. [Deployment Options](#deployment-options)
7. [Configuration Reference](#configuration-reference)
8. [Security](#security)
9. [Monitoring](#monitoring)
10. [Backup & Disaster Recovery](#backup--disaster-recovery)
11. [Cost Optimization](#cost-optimization)
12. [Troubleshooting](#troubleshooting)

---

## Overview

Heretek OpenClaw supports cloud-native deployments on the three major cloud providers:

- **AWS** - EKS, RDS PostgreSQL, ElastiCache, ECR, ALB
- **GCP** - GKE, Cloud SQL, Memorystore, Artifact Registry, Cloud Load Balancing
- **Azure** - AKS, Azure Database for PostgreSQL, Azure Cache for Redis, ACR, Application Gateway

### Key Features

| Feature | Description |
|---------|-------------|
| **Infrastructure as Code** | Terraform configurations for all cloud providers |
| **Kubernetes Native** | Kustomize overlays for dev, staging, prod |
| **High Availability** | Multi-AZ deployments with auto-scaling |
| **GPU Support** | Optional GPU nodes for Ollama (AWS G5, GCP G2, Azure NCasT4_v3) |
| **Managed Services** | Managed databases, caches, and container registries |
| **Observability** | Integrated monitoring, logging, and alerting |
| **Security** | Private networking, encryption, IAM roles |

---

## Architecture

### High-Level Architecture

```
                  ┌─────────────────────────────────────────────┐
                  │               Cloud Provider                │
                  │             (AWS / GCP / Azure)             │
                  └─────────────────────────────────────────────┘
                                         │
            ┌────────────────────────────┼────────────────────────────┐
            │                            │                            │
            ▼                            ▼                            ▼
┌───────────────────────┐    ┌───────────────────────┐    ┌───────────────────────┐
│      Kubernetes       │    │        Managed        │    │        Managed        │
│        Cluster        │    │       Database        │    │         Cache         │
│     (EKS/GKE/AKS)     │    │    (RDS/Cloud SQL/    │    │     (ElastiCache/     │
│ ┌──────────────────┐  │    │       Azure PG)       │    │     Memorystore/      │
│ │ OpenClaw Gateway │  │    │                       │    │     Azure Redis)      │
│ └──────────────────┘  │    │                       │    │                       │
│ ┌──────────────────┐  │    │                       │    │                       │
│ │ LiteLLM Proxy    │  │    │                       │    │                       │
│ └──────────────────┘  │    │                       │    │                       │
│ ┌──────────────────┐  │    │                       │    │                       │
│ │ Ollama (GPU)     │  │    │                       │    │                       │
│ └──────────────────┘  │    │                       │    │                       │
└───────────────────────┘    └───────────────────────┘    └───────────────────────┘
            │                            │                            │
            └────────────────────────────┼────────────────────────────┘
                                         │
            ┌────────────────────────────┼────────────────────────────┐
            │                            │                            │
            ▼                            ▼                            ▼
┌───────────────────────┐    ┌───────────────────────┐    ┌───────────────────────┐
│       Container       │    │     Load Balancer     │    │      Monitoring       │
│       Registry        │    │ (ALB/Cloud LB/App GW) │    │     (CloudWatch/      │
│     (ECR/AR/ACR)      │    │                       │    │   Cloud Monitoring/   │
│                       │    │                       │    │    Azure Monitor)     │
└───────────────────────┘    └───────────────────────┘    └───────────────────────┘
```

### Network Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                          VPC (AWS/GCP) / VNet (Azure)                           │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                              Public Subnet(s)                               │ │
│ │            ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │ │
│ │            │   NAT GW    │    │   NAT GW    │    │   NAT GW    │            │ │
│ │            └─────────────┘    └─────────────┘    └─────────────┘            │ │
│ │                                                                             │ │
│ │  ┌───────────────────────────────────────────────────────────────────────┐  │ │
│ │  │                             Load Balancer                             │  │ │
│ │  │                   (Public-facing, SSL Termination)                    │  │ │
│ │  └───────────────────────────────────────────────────────────────────────┘  │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                              Private Subnet(s)                              │ │
│ │            ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │ │
│ │            │  K8s Nodes  │    │  K8s Nodes  │    │  K8s Nodes  │            │ │
│ │            │  (General)  │    │  (Compute)  │    │    (GPU)    │            │ │
│ │            └─────────────┘    └─────────────┘    └─────────────┘            │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                             Database Subnet(s)                              │ │
│ │            ┌─────────────┐    ┌─────────────┐                               │ │
│ │            │ PostgreSQL  │    │ PostgreSQL  │                               │ │
│ │            │  (Primary)  │    │  (Standby)  │                               │ │
│ │            └─────────────┘    └─────────────┘                               │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                               Cache Subnet(s)                               │ │
│ │            ┌─────────────┐    ┌─────────────┐                               │ │
│ │            │    Redis    │    │    Redis    │                               │ │
│ │            │  (Primary)  │    │  (Replica)  │                               │ │
│ │            └─────────────┘    └─────────────┘                               │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```

---

## Supported Cloud Providers

### AWS

| Service | Resource | Purpose |
|---------|----------|---------|
| EKS | eks_cluster | Kubernetes cluster |
| RDS | rds_postgresql | PostgreSQL database |
| ElastiCache | elasticache_redis | Redis cache |
| ECR | ecr_repository | Container registry |
| ALB | application_lb | Load balancer |
| CloudWatch | cloudwatch_dashboard | Monitoring |

**Documentation:** [`deploy/aws/README.md`](../../deploy/aws/README.md)

### GCP

| Service | Resource | Purpose |
|---------|----------|---------|
| GKE | gke_cluster | Kubernetes cluster |
| Cloud SQL | cloud_sql_postgresql | PostgreSQL database |
| Memorystore | memorystore_redis | Redis cache |
| Artifact Registry | artifact_registry | Container registry |
| Cloud LB | cloud_load_balancer | Load balancer |
| Cloud Monitoring | monitoring_dashboard | Monitoring |

**Documentation:** [`deploy/gcp/README.md`](../../deploy/gcp/README.md)

### Azure

| Service | Resource | Purpose |
|---------|----------|---------|
| AKS | aks_cluster | Kubernetes cluster |
| Azure DB for PostgreSQL | postgresql_flexible_server | PostgreSQL database |
| Azure Cache for Redis | redis_cache | Redis cache |
| ACR | container_registry | Container registry |
| Application Gateway | application_gateway | Load balancer |
| Azure Monitor | azure_monitor | Monitoring |

**Documentation:** [`deploy/azure/README.md`](../../deploy/azure/README.md)

---

## Prerequisites

### Required Tools

```bash
# macOS (Homebrew); on Linux use your distribution's packages or the vendors' install docs

# Terraform (distributed through HashiCorp's tap rather than homebrew-core)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# kubectl
brew install kubectl

# Helm
brew install helm

# Cloud provider CLIs
brew install awscli                   # AWS
brew install --cask google-cloud-sdk  # GCP
brew install azure-cli                # Azure
```
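
Before running Terraform, it helps to confirm the CLIs are actually on `PATH`. The small sketch below does that. `check_tools` is a hypothetical helper and only checks presence, not versions. For example, run `check_tools terraform kubectl helm aws` for AWS, and swap `aws` for `gcloud` or `az` on the other providers.

```bash
# Hypothetical helper: report which of the given commands are available on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # non-zero exit status if anything is missing
}
```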

### Cloud Account Requirements

| Provider | Requirements |
|----------|-------------|
| AWS | IAM user with admin access, budget alerts configured |
| GCP | Project with billing enabled, required APIs enabled |
| Azure | Subscription with contributor access, resource providers registered |

### Kubernetes Requirements

- Kubernetes 1.26+ cluster
- Storage class for persistent volumes
- Ingress controller (nginx recommended)
- Metrics server for HPA

---

## Quick Start

### AWS Quick Start

```bash
cd deploy/aws/terraform

# Initialize Terraform
terraform init

# Create variables file (it contains secrets: keep terraform.tfvars out of version control)
cat > terraform.tfvars <<EOF
aws_region       = "us-east-1"
environment      = "dev"
db_password      = "secure-password-here"
redis_auth_token = "secure-token-here"
EOF

# Deploy
terraform plan -out=tfplan
terraform apply tfplan

# Configure kubectl
aws eks update-kubeconfig --name openclaw-dev-eks --region us-east-1

# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```
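
Instead of writing passwords into `terraform.tfvars`, you can let Terraform read any declared input variable from an environment variable named `TF_VAR_<name>`. The sketch below assumes the `db_password` and `redis_auth_token` variables shown above. `export_tf_secrets` is a hypothetical helper that generates random hex values and exports them in the current shell, so source it rather than running it as a script. Then run `export_tf_secrets db_password redis_auth_token` and drop those two lines from `terraform.tfvars`. Persist the generated values, for example in AWS Secrets Manager. If a later run exports different values, Terraform will try to change the passwords.

```bash
# Print 24 random bytes as 48 hex characters.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 24
  else
    od -An -N24 -tx1 /dev/urandom | tr -d ' \n'
  fi
}

# Hypothetical helper: export TF_VAR_<name>=<random value> for each variable name.
export_tf_secrets() {
  for name in "$@"; do
    value=$(gen_secret) || return 1
    export "TF_VAR_${name}=${value}"
  done
}
```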

### GCP Quick Start

```bash
cd deploy/gcp/terraform

# Initialize Terraform
terraform init

# Create variables file
cat > terraform.tfvars <<EOF
project_id  = "your-project-id"
region      = "us-central1"
environment = "dev"
db_password = "secure-password-here"
EOF

# Deploy
terraform plan -out=tfplan
terraform apply tfplan

# Configure kubectl
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1

# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```

### Azure Quick Start

```bash
cd deploy/azure/terraform

# Initialize Terraform
terraform init

# Create variables file
cat > terraform.tfvars <<EOF
resource_group_name       = "openclaw-rg"
location                  = "eastus"
environment               = "dev"
db_administrator_password = "secure-password-here"
EOF

# Deploy
terraform plan -out=tfplan
terraform apply tfplan

# Configure kubectl
az aks get-credentials --resource-group openclaw-rg --name openclaw-dev-aks

# Deploy OpenClaw
cd ../../kubernetes
kubectl apply -k overlays/dev
```

---

## Deployment Options

### Environment Overlays

| Environment | Overlay | Replicas | Resources | Use Case |
|-------------|---------|----------|-----------|----------|
| Dev | `overlays/dev` | 1 | Minimal | Development, testing |
| Staging | `overlays/staging` | 2 | Medium | Pre-production validation |
| Production | `overlays/prod` | 3+ | Full | Production workloads |
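
The production overlay directory is named `prod`, not `production`. The sketch below is a hypothetical wrapper, run from `deploy/kubernetes`. It maps an environment name to its overlay, shows `kubectl diff` against the live cluster, and then applies the overlay. `kubectl diff` exits with 1 when it finds differences and with a value above 1 on errors, so only errors abort. Example: `deploy_env staging`.

```bash
# Hypothetical helper: map an environment name to its Kustomize overlay directory.
overlay_for() {
  case "$1" in
    dev|development) echo "overlays/dev" ;;
    staging)         echo "overlays/staging" ;;
    prod|production) echo "overlays/prod" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: preview, then apply, one environment's overlay.
deploy_env() {
  overlay=$(overlay_for "$1") || return 1
  kubectl diff -k "$overlay"
  rc=$?
  [ "$rc" -le 1 ] || return "$rc"   # 0 = no changes, 1 = changes found
  kubectl apply -k "$overlay"
}
```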

### GPU Support

Enable GPU nodes for Ollama local LLM inference by setting `enable_gpu_support` and the matching provider-specific variables in that provider's `terraform.tfvars`. Each provider has its own Terraform directory, so use only the block for your provider. The two `gpu_node_pool` definitions cannot live in the same file.

```hcl
# terraform.tfvars (all providers)
enable_gpu_support = true

# AWS: deploy/aws/terraform/terraform.tfvars
gpu_instance_types = ["g5.xlarge", "g5.2xlarge"]

# GCP: deploy/gcp/terraform/terraform.tfvars
gpu_node_pool = {
  machine_type      = "g2-standard-4"
  accelerator_type  = "nvidia-l4"
  accelerator_count = 1
}

# Azure: deploy/azure/terraform/terraform.tfvars
gpu_node_pool = {
  vm_size = "Standard_NC4as_T4_v3"
}
```
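
GPU nodes only accept GPU workloads once the NVIDIA device plugin, or the provider's equivalent, advertises the `nvidia.com/gpu` resource. The sketch below lists nodes that report allocatable NVIDIA GPUs. All three instance families above carry NVIDIA GPUs. `gpu_nodes` is a hypothetical helper. An empty result points to the device plugin or driver installation (see Troubleshooting).

```bash
# Hypothetical helper: print "<node> <gpu count>" for nodes with allocatable NVIDIA GPUs.
gpu_nodes() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' |
    # kubectl prints <none> when the resource is absent; compare numerically otherwise
    awk '$2 != "<none>" && $2 > 0 { print $1 " " $2 }'
}
```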

### High Availability

Production deployments include:

- Multi-AZ database (RDS/Cloud SQL/Azure PG)
- Multi-AZ cache (ElastiCache/Memorystore/Azure Redis)
- Multiple node pools across availability zones
- Pod disruption budgets
- Horizontal pod autoscaling
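
Pod disruption budgets and autoscaling are normally declared in the manifests of the prod overlay. For an ad-hoc test on a running cluster, you can create the same objects with `kubectl create poddisruptionbudget` and `kubectl autoscale`. In this sketch, the `openclaw` namespace, the Deployment name, the `app=<name>` label selector, and the replica and CPU targets are all assumptions. Match them to `deploy/kubernetes/base/`. The HPA relies on the metrics server listed in the requirements.

```bash
# Hypothetical helper: add a PodDisruptionBudget and an HPA for one Deployment.
protect_deployment() {
  name=$1
  ns=${2:-openclaw}            # assumed namespace
  min_replicas=${3:-3}
  max_replicas=${4:-6}
  # Keep at least one pod running during voluntary disruptions (node drains, upgrades)
  kubectl -n "$ns" create poddisruptionbudget "$name" \
    --selector="app=$name" --min-available=1 || return 1
  # Scale between min and max replicas, targeting 70% average CPU utilization
  kubectl -n "$ns" autoscale deployment "$name" \
    --min="$min_replicas" --max="$max_replicas" --cpu-percent=70
}
```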

---

## Configuration Reference

### Input Variables

See individual provider documentation for complete variable lists:

- [AWS Variables](../../deploy/aws/terraform/variables.tf)
- [GCP Variables](../../deploy/gcp/terraform/variables.tf)
- [Azure Variables](../../deploy/azure/terraform/variables.tf)

### Kubernetes Configuration

Base manifests: [`deploy/kubernetes/base/`](../../deploy/kubernetes/base/)

Overlays:
- [`deploy/kubernetes/overlays/dev/`](../../deploy/kubernetes/overlays/dev/)
- [`deploy/kubernetes/overlays/staging/`](../../deploy/kubernetes/overlays/staging/)
- [`deploy/kubernetes/overlays/prod/`](../../deploy/kubernetes/overlays/prod/)

### Secrets Management

**Never commit secrets to version control.** Use:

1. **Cloud Secret Managers**
   - AWS Secrets Manager
   - GCP Secret Manager
   - Azure Key Vault

2. **Kubernetes Secrets** with encryption at rest enabled (by default, Kubernetes stores Secrets in etcd base64-encoded, which is not encryption)

3. **External Secrets Operator** to sync from cloud secret managers
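
When values live in a local env file, for example one exported from a secret manager, you can load them into the cluster without writing a manifest to disk. The sketch below pipes `kubectl create secret generic --from-env-file` with `--dry-run=client -o yaml` into `kubectl apply`, which creates or updates the Secret. The Secret name `openclaw-secrets` and namespace `openclaw` are placeholders, and `apply_secret_from_env` is a hypothetical helper. Example: `apply_secret_from_env ./prod.env`.

```bash
# Hypothetical helper: create or update a Kubernetes Secret from KEY=value lines.
apply_secret_from_env() {
  file=$1
  name=${2:-openclaw-secrets}   # placeholder Secret name
  ns=${3:-openclaw}             # placeholder namespace
  [ -f "$file" ] || { echo "env file not found: $file" >&2; return 1; }
  # Render the Secret client-side and apply it, so it works for create and update
  kubectl create secret generic "$name" --namespace "$ns" \
    --from-env-file="$file" --dry-run=client -o yaml |
    kubectl apply -f -
}
```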

---

## Security

### Network Security

- Private subnets for application workloads
- Security groups / firewall rules for least privilege
- VPC Flow Logs for network monitoring
- Private endpoints for managed services

### Data Security

- Encryption at rest (database, cache, storage)
- Encryption in transit (TLS 1.2+)
- Secrets encryption with KMS
- Network policies for pod isolation

### Access Control

- IAM roles for service accounts (IRSA/Workload Identity)
- RBAC for Kubernetes access
- Network policies for pod communication
- Pod Security Standards, enforced through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)

---

## Monitoring

### Cloud-Native Monitoring

Each deployment includes:

- Cloud provider dashboards (CloudWatch/Cloud Monitoring/Azure Monitor)
- Pre-configured alerts for CPU, memory, storage
- Log aggregation and retention
- Cost monitoring and budget alerts

### Kubernetes Monitoring

- Prometheus metrics via ServiceMonitor
- Grafana dashboards
- Alertmanager for notifications
- Distributed tracing (optional)

---

## Backup & Disaster Recovery

### Automated Backups

| Resource | Strategy | Retention |
|----------|----------|-----------|
| Database | Automated snapshots | 7-35 days |
| Cache | Persistence + manual snapshots | Manual |
| Container Registry | Lifecycle policies | 30 days |
| Terraform State | Versioned storage | Unlimited |

### Disaster Recovery

1. **Multi-AZ** - Automatic failover within region
2. **Cross-Region** - Manual failover to secondary region
3. **Backup Restoration** - Documented procedures for each service
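
Before risky changes such as upgrades or migrations, an on-demand backup on top of the automated snapshots gives a known restore point. The sketch below covers AWS (`aws rds create-db-snapshot`) and GCP (`gcloud sql backups create`). Take the instance names from your Terraform outputs; the ones shown are placeholders. Azure is left to the provider guide, and `snapshot_db` is a hypothetical helper. Example: `snapshot_db aws openclaw-prod-db`.

```bash
# Hypothetical helper: take a timestamped on-demand database backup.
snapshot_db() {
  provider=$1
  instance=$2
  stamp=$(date +%Y%m%d-%H%M%S)
  case "$provider" in
    aws)
      aws rds create-db-snapshot \
        --db-instance-identifier "$instance" \
        --db-snapshot-identifier "${instance}-manual-${stamp}" ;;
    gcp)
      gcloud sql backups create --instance="$instance" \
        --description="manual-${stamp}" ;;
    *)
      echo "unsupported provider: $provider (see deploy/azure/README.md for Azure)" >&2
      return 1 ;;
  esac
}
```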

---

## Cost Optimization

### Development Environments

- Single NAT Gateway
- Burstable database instances
- Basic cache tier
- Spot/preemptible instances for non-critical workloads

### Production Optimizations

- Reserved instances / committed use discounts
- Savings plans for predictable workloads
- Cluster autoscaler for dynamic scaling
- Right-sizing based on actual usage

### Cost Estimates

See individual provider READMEs for detailed cost breakdowns:

- [AWS Cost Estimates](../../deploy/aws/README.md#cost-estimates)
- [GCP Cost Estimates](../../deploy/gcp/README.md#cost-estimates)
- [Azure Cost Estimates](../../deploy/azure/README.md#cost-estimates)

---

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| Pods not scheduling | Check node pool capacity, taints, tolerations |
| Database connection failures | Verify security groups, private endpoints |
| Load balancer not routing | Check target group health, ingress configuration |
| GPU not detected | Verify device plugin, node labels, tolerations |

### Support Resources

- [AWS Deployment Guide](../../deploy/aws/README.md)
- [GCP Deployment Guide](../../deploy/gcp/README.md)
- [Azure Deployment Guide](../../deploy/azure/README.md)
- [Kubernetes Deployment Guide](../../docs/deployment/KUBERNETES_DEPLOYMENT.md)

---

🦞 *The thought that never ends.*
# Docker to Bare Metal Migration Guide

**Version:** 1.0.0
**Last Updated:** 2026-03-31
**OpenClaw Version:** v2026.3.28

This guide provides step-by-step instructions for migrating from a Docker-based OpenClaw deployment to a bare metal or VM installation.

---

## Table of Contents

1. [Overview](#overview)
2. [Pre-Migration Checklist](#pre-migration-checklist)
3. [Migration Planning](#migration-planning)
4. [Step 1: Backup Docker Deployment](#step-1-backup-docker-deployment)
5. [Step 2: Prepare Target System](#step-2-prepare-target-system)
6. [Step 3: Export Docker Data](#step-3-export-docker-data)
7. [Step 4: Install Bare Metal Dependencies](#step-4-install-bare-metal-dependencies)
8. [Step 5: Migrate PostgreSQL Data](#step-5-migrate-postgresql-data)
9. [Step 6: Migrate Redis Data](#step-6-migrate-redis-data)
10. [Step 7: Migrate Ollama Models](#step-7-migrate-ollama-models)
11. [Step 8: Configure LiteLLM](#step-8-configure-litellm)
12. [Step 9: Migrate OpenClaw Configuration](#step-9-migrate-openclaw-configuration)
13. [Step 10: Start and Verify Services](#step-10-start-and-verify-services)
14. [Rollback Procedures](#rollback-procedures)
15. [Post-Migration Tasks](#post-migration-tasks)

---

## Overview

### Why Migrate?

| Reason | Docker | Bare Metal |
|--------|--------|------------|
| **Performance** | Small networking and storage overhead (bridge networking, overlay filesystem) | Native performance |
| **GPU Access** | Passthrough complexity | Direct access |
| **Debugging** | Limited visibility | Full system access |
| **Compliance** | Container restrictions | Full control |
| **Cost** | Licensing only for commercial editions (Docker Engine itself is free) | No licensing costs |

### Migration Architecture Comparison

```
┌─────────────────────────────────────────────────────────────────┐
│                        Docker Deployment                        │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                        Docker Engine                        │ │
│ │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │ │
│ │   │ LiteLLM  │  │PostgreSQL│  │  Redis   │  │  Ollama  │    │ │
│ │   │Container │  │Container │  │Container │  │Container │    │ │
│ │   └──────────┘  └──────────┘  └──────────┘  └──────────┘    │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                      Bare Metal Deployment                      │
│     ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│     │ LiteLLM  │  │PostgreSQL│  │  Redis   │  │  Ollama  │      │
│     │  System  │  │  System  │  │  System  │  │  System  │      │
│     │ Service  │  │ Service  │  │ Service  │  │ Service  │      │
│     └──────────┘  └──────────┘  └──────────┘  └──────────┘      │
└─────────────────────────────────────────────────────────────────┘
```

### Port Mapping

| Service | Docker Port | Bare Metal Port | Notes |
|---------|-------------|-----------------|-------|
| LiteLLM | 4000 | 4000 | Same |
| PostgreSQL | 5432 (internal) | 5432 (localhost) | Bind to localhost |
| Redis | 6379 (internal) | 6379 (localhost) | Bind to localhost |
| Ollama | 11434 (internal) | 11434 (localhost) | Bind to localhost |
| OpenClaw Gateway | 18789 | 18789 | Same |

---

## Pre-Migration Checklist

### Current State Assessment

```bash
# Verify Docker deployment is healthy
docker compose ps

# Check all services are running
docker compose ps | grep -E "Up|healthy"

# Document current configuration
docker compose config > docker-compose-config-backup.yaml

# List Docker volumes
docker volume ls

# Check disk usage
docker system df
```
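
Grepping for `Up|healthy` shows what is running but not what is missing. The sketch below compares the services defined in the Compose file with those currently running, using `docker compose config --services` and `docker compose ps --services --status running`. `services_not_running` is a hypothetical helper; run it from the Compose project directory.

```bash
# Hypothetical helper: list Compose services that are defined but not running.
services_not_running() {
  defined=$(docker compose config --services) || return 1
  running=$(docker compose ps --services --status running) || return 1
  down=0
  for svc in $defined; do
    if ! printf '%s\n' "$running" | grep -qxF "$svc"; then
      echo "not running: $svc"
      down=1
    fi
  done
  [ "$down" -eq 0 ]   # non-zero exit status if any service is down
}
```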

### Required Information

| Item | Location | Example |
|------|----------|---------|
| Docker Compose file | `docker-compose.yml` | Current directory |
| Environment file | `.env` | Current directory |
| PostgreSQL password | `.env` or secrets | `POSTGRES_PASSWORD` |
| Redis password | `.env` or secrets | `REDIS_URL` |
| LiteLLM keys | `.env` | `LITELLM_MASTER_KEY` |
| Provider API keys | `.env` | `MINIMAX_API_KEY` |
| OpenClaw config | `~/.openclaw/openclaw.json` | Home directory |
| Agent workspaces | `~/.openclaw/agents/` | Home directory |

### Tools Required

```bash
# Install migration tools (Ubuntu/Debian)
sudo apt-get install -y \
  postgresql-client \
  redis-tools \
  jq \
  yq

# Or for RHEL (redis-cli ships in the redis package; yq may not be in the default repositories)
sudo dnf install -y \
  postgresql \
  redis \
  jq \
  yq
```

---

## Migration Planning

### Downtime Estimation

| Phase | Estimated Time | Can Run While Docker Is Running? |
|-------|----------------|----------------------------------|
| Backup | 5-10 minutes | Yes |
| Target preparation | 30-60 minutes | Yes |
| Data export | 10-30 minutes | No (read-only recommended) |
| Data import | 10-30 minutes | No |
| Configuration | 15-30 minutes | No |
| Verification | 10-15 minutes | No |
| **Total** | **80-175 minutes** | **Partial** |

### Migration Window Planning

```bash
# Calculate migration window
# Recommended: Schedule during low-usage period
# Minimum: 2 hours downtime
# Recommended: 4 hours for first migration

# Notify stakeholders
# Example notification template:
cat << 'EOF'
Subject: Scheduled Maintenance - OpenClaw Migration

Dear Team,

We will be performing a planned migration of the OpenClaw system
from Docker to bare metal deployment.

Maintenance Window:
- Start: [DATE] at [TIME]
- Expected Duration: 2-4 hours
- Impact: OpenClaw services will be unavailable

Rollback Plan:
If issues occur, we will revert to the Docker deployment
within 30 minutes.

Contact: [YOUR_CONTACT]

EOF
```

---

## Step 1: Backup Docker Deployment

### Full System Backup

```bash
# Create backup directory (/tmp may be cleared on reboot; copy the directory somewhere durable)
BACKUP_DIR="/tmp/openclaw-migration-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Backup Docker Compose configuration
cp docker-compose.yml "$BACKUP_DIR/"
cp .env "$BACKUP_DIR/"
cp .env.example "$BACKUP_DIR/"
cp litellm_config.yaml "$BACKUP_DIR/"
cp openclaw.json "$BACKUP_DIR/"

# Backup OpenClaw data
tar -czf "$BACKUP_DIR/openclaw-data.tar.gz" ~/.openclaw/

# Backup Docker volumes
# Volume names follow <compose-project>_<volume>; confirm them with `docker volume ls`.
# A copy of the raw PostgreSQL files is only consistent while postgres is stopped;
# the pg_dump in the next section is the primary database backup.
docker run --rm \
  -v heretek-openclaw_postgres_data:/source:ro \
  -v "$BACKUP_DIR":/backup \
  alpine tar -czf /backup/postgres-data.tar.gz -C /source .

docker run --rm \
  -v heretek-openclaw_redis_data:/source:ro \
  -v "$BACKUP_DIR":/backup \
  alpine tar -czf /backup/redis-data.tar.gz -C /source .

docker run --rm \
  -v heretek-openclaw_ollama_data:/source:ro \
  -v "$BACKUP_DIR":/backup \
  alpine tar -czf /backup/ollama-data.tar.gz -C /source .

# Verify backups
ls -lah "$BACKUP_DIR/"
echo "Backup completed: $BACKUP_DIR"
```
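
The backup directory will be copied to the target host, so record checksums before it leaves the Docker host. The sketch below uses GNU coreutils `sha256sum`. `write_manifest` and `verify_manifest` are hypothetical helpers. Run `write_manifest "$BACKUP_DIR"` here, then run `verify_manifest "$BACKUP_DIR"` on the target after copying, for example with `rsync -a`.

```bash
# Hypothetical helper: record a SHA-256 checksum for every file in the directory.
write_manifest() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Hypothetical helper: verify the directory against its manifest (non-zero on mismatch).
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```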

### Database Backup

```bash
# Export PostgreSQL database
docker compose exec -T postgres pg_dump -U openclaw openclaw > $BACKUP_DIR/openclaw-database.sql

# Verify SQL dump
wc -l $BACKUP_DIR/openclaw-database.sql
head -20 $BACKUP_DIR/openclaw-database.sql
```
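
`wc -l` and `head` confirm that the dump started, but not that it finished. A plain-format `pg_dump` writes a `-- PostgreSQL database dump complete` comment near the end of its output, so the hypothetical `verify_sql_dump` helper below can flag a truncated file. It checks the last few lines because recent PostgreSQL releases may emit further lines after that comment. Example: `verify_sql_dump "$BACKUP_DIR/openclaw-database.sql"`.

```bash
# Hypothetical helper: succeed only if the plain-format dump has pg_dump's completion marker.
verify_sql_dump() {
  if tail -n 5 "$1" | grep -q '^-- PostgreSQL database dump complete'; then
    echo "dump complete: $1"
  else
    echo "dump looks truncated: $1" >&2
    return 1
  fi
}
```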

### Redis Backup

```bash
# Trigger Redis BGSAVE
docker compose exec redis redis-cli BGSAVE

# Wait for save to complete (5 seconds is enough only for small datasets;
# confirm that LASTSAVE has advanced before copying)
sleep 5
docker compose exec redis redis-cli LASTSAVE

# Export Redis data
docker cp heretek-redis:/data/dump.rdb $BACKUP_DIR/dump.rdb

# Verify RDB file
ls -lah $BACKUP_DIR/dump.rdb
```
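
On larger datasets, a fixed `sleep` can copy a half-written snapshot. The sketch below records `LASTSAVE`, triggers `BGSAVE`, and polls until the timestamp advances. `rcli` wraps `redis-cli` inside the Compose service; add `-a <password>` if Redis requires authentication. Both function names are hypothetical. `LASTSAVE` has one-second resolution, so run this only when no other save has just completed. Example: `wait_for_bgsave 300 && docker cp heretek-redis:/data/dump.rdb "$BACKUP_DIR/dump.rdb"`.

```bash
# Hypothetical wrapper: run redis-cli inside the Compose "redis" service (no TTY).
rcli() { docker compose exec -T redis redis-cli "$@"; }

# Hypothetical helper: trigger BGSAVE and wait until LASTSAVE advances.
wait_for_bgsave() {
  timeout=${1:-120}
  before=$(rcli LASTSAVE) || return 1
  rcli BGSAVE >/dev/null || return 1
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    now=$(rcli LASTSAVE) || return 1
    if [ "$now" -gt "$before" ]; then
      echo "BGSAVE finished (LASTSAVE=$now)"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "BGSAVE did not finish within ${timeout}s" >&2
  return 1
}
```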

---

## Step 2: Prepare Target System

### System Requirements Check

```bash
# Check OS version
cat /etc/os-release

# Check available disk space
df -h /

# Check available memory
free -h

# Check CPU cores
nproc

# Check GPU (if applicable); NVIDIA data-center GPUs usually show up as "3D controller"
lspci | grep -iE 'vga|3d|display'
```

### Install Prerequisites

```bash
# For Ubuntu/Debian
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/ubuntu-deps.sh -o ubuntu-deps.sh
chmod +x ubuntu-deps.sh
sudo ./ubuntu-deps.sh

# For RHEL/CentOS
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/rhel-deps.sh -o rhel-deps.sh
chmod +x rhel-deps.sh
sudo ./rhel-deps.sh
```

### Create Required Users and Directories

```bash
# Create litellm user
sudo useradd -r -s /bin/false litellm
sudo mkdir -p /opt/litellm
sudo chown litellm:litellm /opt/litellm

# Create OpenClaw directories
sudo mkdir -p /etc/litellm
sudo mkdir -p /etc/openclaw
sudo mkdir -p /var/log/openclaw

# Set permissions
sudo chmod 755 /etc/litellm
sudo chmod 755 /etc/openclaw
sudo chmod 755 /var/log/openclaw
```

---

## Step 3: Export Docker Data

### Export PostgreSQL Data

```bash
# Export the whole cluster: all databases plus roles (restore it as the postgres superuser)
docker compose exec -T postgres pg_dumpall -U openclaw > $BACKUP_DIR/full-export.sql

# Export the openclaw database in custom format (restore it with pg_restore)
docker compose exec -T postgres pg_dump -U openclaw -Fc openclaw > $BACKUP_DIR/openclaw.custom

# Export schema only (for reference)
docker compose exec -T postgres pg_dump -U openclaw --schema-only openclaw > $BACKUP_DIR/schema.sql

# Export data only
docker compose exec -T postgres pg_dump -U openclaw --data-only openclaw > $BACKUP_DIR/data.sql

# Verify exports
ls -lah $BACKUP_DIR/*.sql $BACKUP_DIR/*.custom
```

### Export Redis Data

```bash
# Save a fresh RDB snapshot; SAVE blocks Redis, which is acceptable during the maintenance window
docker compose exec redis redis-cli SAVE

# Copy it out (this is the copy restored in Step 6)
docker cp heretek-redis:/data/dump.rdb $BACKUP_DIR/dump.rdb

# Record key names for a post-migration comparison (optional; --scan does not block like KEYS)
docker compose exec -T redis redis-cli --scan > $BACKUP_DIR/redis-keys.txt
```

### Export Ollama Models

```bash
# List Ollama models
docker compose exec ollama ollama list

# Export model files
docker run --rm \
  -v heretek-openclaw_ollama_data:/ollama:ro \
  -v $BACKUP_DIR:/backup \
  alpine tar -czf /backup/ollama-models.tar.gz -C /ollama .

# Alternative (recommended for large models): record the model list
# and pull the models again on the target system
docker compose exec -T ollama ollama list > $BACKUP_DIR/ollama-models.txt
```
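
To act on that list on the bare metal host, the sketch below takes the model names from the first column of the saved `ollama list` output, skipping the header row, and pulls each one. `model_names` and `pull_models` are hypothetical helpers. Run `pull_models "$BACKUP_DIR/ollama-models.txt"` once Ollama is installed in Step 4.

```bash
# Hypothetical helper: print model names (first column) from saved `ollama list` output.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }' "$1"
}

# Hypothetical helper: pull every model named in the saved list.
pull_models() {
  for model in $(model_names "$1"); do
    echo "pulling $model"
    ollama pull "$model" || return 1
  done
}
```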

---

## Step 4: Install Bare Metal Dependencies

### Install PostgreSQL

```bash
# Ubuntu/Debian (postgresql-15 and postgresql-15-pgvector come from the PGDG apt
# repository on most releases; contrib modules are already part of postgresql-15)
sudo apt-get install -y postgresql-15 postgresql-15-pgvector
sudo systemctl start postgresql
sudo systemctl enable postgresql

# RHEL/CentOS (PGDG packages: initialize the cluster first; the service name is versioned)
sudo dnf install -y postgresql15-server postgresql15-contrib pgvector_15
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb
sudo systemctl start postgresql-15
sudo systemctl enable postgresql-15
```

### Install Redis

```bash
# Ubuntu/Debian
sudo apt-get install -y redis

# RHEL/CentOS
sudo dnf install -y redis

# Start Redis
sudo systemctl start redis
sudo systemctl enable redis
```

### Install Ollama

```bash
# Install Ollama (official install script)
curl -fsSL https://ollama.com/install.sh | sh

# Configure Ollama (see BARE_METAL_DEPLOYMENT.md for GPU setup)
sudo systemctl start ollama
sudo systemctl enable ollama
```

### Install LiteLLM

```bash
# Create virtual environment
sudo -u litellm python3 -m venv /opt/litellm/venv

# Install LiteLLM with the proxy extra, plus the PostgreSQL, Redis, and Langfuse clients
# (LiteLLM does not define "postgres", "redis", or "langfuse" extras; pip only warns and ignores them)
sudo -u litellm /opt/litellm/venv/bin/pip install \
  'litellm[proxy]' \
  psycopg2-binary \
  redis \
  langfuse
```

### Install OpenClaw Gateway

```bash
# Install OpenClaw
curl -fsSL https://openclaw.ai/install.sh | bash

# Verify installation
openclaw --version
```

---

## Step 5: Migrate PostgreSQL Data

### Create Database and User

```bash
# Connect to PostgreSQL
sudo -u postgres psql
```

```sql
-- Create the user first, then a database it owns (on PostgreSQL 15+ ordinary users
-- can no longer create tables in the public schema unless they own the database)
CREATE USER openclaw WITH PASSWORD 'your-secure-password';
CREATE DATABASE openclaw OWNER openclaw;
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;

-- Enable pgvector extension (run as the postgres superuser)
\c openclaw
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify
\dx
\q
```
|
||||
|
||||
### Import Data
|
||||
|
||||
```bash
|
||||
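# Import SQL dump
# (-h localhost connects over TCP, so password authentication applies instead of the
#  peer authentication that typically rejects other OS users on the local socket)
psql -h localhost -U openclaw -d openclaw -f "$BACKUP_DIR/openclaw-database.sql"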
|
||||
|
||||
# Or import custom format
|
||||
pg_restore -h localhost -U openclaw -d openclaw "$BACKUP_DIR/openclaw.custom"
|
||||
|
||||
# Or import full export
|
||||
psql -h localhost -U openclaw -d openclaw -f "$BACKUP_DIR/full-export.sql"
|
||||
|
||||
# Verify import
|
||||
psql -h localhost -U openclaw -d openclaw -c "SELECT COUNT(*) FROM pg_tables WHERE schemaname = 'public';"
|
||||
psql -h localhost -U openclaw -d openclaw -c "\dt"
|
||||
```
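Which of the import commands applies depends on how the dump was produced. `pg_dump` custom-format archives begin with the magic bytes `PGDMP`, so a small check can pick the right tool. This is a sketch under the assumption that the dump is either custom format or plain SQL (directory-format dumps are not handled); `dump_format` and `restore_dump` are hypothetical names:

```sh
# Print "custom" if FILE is a pg_dump custom-format archive, else "plain".
# Custom-format archives start with the 5-byte magic string "PGDMP".
dump_format() {
  if [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]; then
    echo custom
  else
    echo plain
  fi
}

# Restore FILE with pg_restore or psql depending on its format.
# -h localhost forces a TCP connection so password authentication is used.
restore_dump() {
  case "$(dump_format "$1")" in
    custom) pg_restore -h localhost -U openclaw -d openclaw "$1" ;;
    plain)  psql -h localhost -U openclaw -d openclaw -f "$1" ;;
  esac
}

# restore_dump "$BACKUP_DIR/openclaw.custom"
```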
|
||||
|
||||
### Update Connection Strings
|
||||
|
||||
```bash
|
||||
# DATABASE_URL must point at localhost instead of the Docker service name 'postgres'
|
||||
# Docker: postgresql://openclaw:password@postgres:5432/openclaw
|
||||
# Bare Metal: postgresql://openclaw:password@localhost:5432/openclaw
|
||||
|
||||
# Update environment file
|
||||
sed -i 's/@postgres:5432/@localhost:5432/g' $BACKUP_DIR/.env
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Migrate Redis Data
|
||||
|
||||
### Stop Redis Service
|
||||
|
||||
```bash
|
||||
# Stop Redis temporarily
|
||||
sudo systemctl stop redis
|
||||
```
|
||||
|
||||
### Import RDB File
|
||||
|
||||
```bash
|
||||
# Copy RDB file to Redis data directory
|
||||
sudo cp $BACKUP_DIR/dump.rdb /var/lib/redis/dump.rdb
|
||||
|
||||
# Set correct ownership
|
||||
sudo chown redis:redis /var/lib/redis/dump.rdb
|
||||
sudo chmod 640 /var/lib/redis/dump.rdb
|
||||
```
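One caveat before starting Redis: when `appendonly yes` is set in `redis.conf`, Redis rebuilds its dataset from the AOF file at startup and ignores `dump.rdb`, so the copied snapshot would appear to be missing. A sketch of a pre-flight check follows; the helper name `aof_enabled` is hypothetical and the config path varies by distribution:

```sh
# Return 0 if the given redis.conf enables AOF persistence.
# Commented-out lines are ignored; the last "appendonly" directive wins.
aof_enabled() {
  value=$(awk 'tolower($1) == "appendonly" { v = tolower($2) } END { print v }' "$1")
  [ "$value" = "yes" ]
}

# if aof_enabled /etc/redis/redis.conf; then
#   echo "AOF is enabled: Redis will ignore dump.rdb on startup" >&2
# fi
```

If AOF is wanted long term, one approach is to start Redis with AOF off so it loads the RDB file, then run `redis-cli CONFIG SET appendonly yes` so a fresh AOF is written from the loaded data, and finally set `appendonly yes` in `redis.conf`.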
|
||||
|
||||
### Start Redis Service
|
||||
|
||||
```bash
|
||||
# Start Redis
|
||||
sudo systemctl start redis
|
||||
|
||||
# Verify data loaded
|
||||
redis-cli -a your-redis-password DBSIZE
redis-cli -a your-redis-password --scan | head -20   # SCAN-based; KEYS '*' blocks the server on large datasets
|
||||
```
|
||||
|
||||
### Update Redis URL
|
||||
|
||||
```bash
|
||||
# Update REDIS_URL in environment file
|
||||
# Docker: redis://redis:6379/0
|
||||
# Bare Metal: redis://:password@localhost:6379/0
|
||||
|
||||
sed -i 's|redis://redis:6379|redis://:your-redis-password@localhost:6379|g' $BACKUP_DIR/.env
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Migrate Ollama Models
|
||||
|
||||
### Option 1: Restore from Backup
|
||||
|
||||
```bash
|
||||
# Stop Ollama
|
||||
sudo systemctl stop ollama
|
||||
|
||||
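# Restore model data (assumes OLLAMA_MODELS points under /var/lib/ollama; the stock install script stores models in /usr/share/ollama/.ollama/models)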
|
||||
sudo tar -xzf $BACKUP_DIR/ollama-models.tar.gz -C /var/lib/ollama/
|
||||
|
||||
# Set permissions
|
||||
sudo chown -R ollama:ollama /var/lib/ollama
|
||||
|
||||
# Start Ollama
|
||||
sudo systemctl start ollama
|
||||
|
||||
# Verify models
|
||||
ollama list
|
||||
```
|
||||
|
||||
### Option 2: Re-pull Models (Recommended)
|
||||
|
||||
```bash
|
||||
# Get list of models from backup (accepts either an /api/tags response or a bare array)
jq -r 'if type == "array" then .[] else .models[] end | .name' "$BACKUP_DIR/ollama-models.json" > "$BACKUP_DIR/model-list.txt"
|
||||
|
||||
# Pull each model
|
||||
while read -r model; do
  [ -n "$model" ] || continue
  echo "Pulling $model..."
  ollama pull "$model" < /dev/null   # don't let ollama consume the loop's stdin
done < "$BACKUP_DIR/model-list.txt"
|
||||
|
||||
# Verify models
|
||||
ollama list
|
||||
```
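Large model downloads sometimes fail partway, and the loop above moves on silently. A sketch that retries each model a few times and reports failures; `pull_models` and the `PULL_RETRY_DELAY` variable are hypothetical, and the three-attempt limit is an arbitrary choice:

```sh
# Pull every model named in LIST_FILE (one per line), retrying each up to 3 times.
# Blank lines and lines starting with '#' are skipped.
# Returns non-zero if any model could not be pulled.
pull_models() {
  list_file=$1
  failed=0
  while IFS= read -r model || [ -n "$model" ]; do
    case "$model" in
      ''|'#'*) continue ;;
    esac
    attempt=1
    until ollama pull "$model" < /dev/null; do
      if [ "$attempt" -ge 3 ]; then
        echo "failed to pull $model after $attempt attempts" >&2
        failed=$((failed + 1))
        break
      fi
      attempt=$((attempt + 1))
      sleep "${PULL_RETRY_DELAY:-5}"   # pause between attempts (seconds)
    done
  done < "$list_file"
  [ "$failed" -eq 0 ]
}

# pull_models "$BACKUP_DIR/model-list.txt" && ollama list
```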
|
||||
|
||||
---
|
||||
|
||||
## Step 8: Configure LiteLLM
|
||||
|
||||
### Copy Configuration
|
||||
|
||||
```bash
|
||||
# Copy LiteLLM configuration
|
||||
sudo mkdir -p /etc/litellm
sudo cp "$BACKUP_DIR/litellm_config.yaml" /etc/litellm/litellm_config.yaml
|
||||
sudo chown litellm:litellm /etc/litellm/litellm_config.yaml
|
||||
```
|
||||
|
||||
### Update Configuration for Bare Metal
|
||||
|
||||
```bash
|
||||
# Update database connection in litellm_config.yaml
|
||||
# Change postgres host from 'postgres' to 'localhost'
|
||||
|
||||
# Update Redis connection
|
||||
# Change redis host from 'redis' to 'localhost'
|
||||
|
||||
# Or use environment variables (recommended)
|
||||
# The systemd service will set these
|
||||
```
|
||||
|
||||
### Create Environment File
|
||||
|
||||
```bash
|
||||
# Copy environment template
sudo mkdir -p /etc/openclaw
sudo cp "$BACKUP_DIR/.env" /etc/openclaw/.env

# Update for bare metal (no-ops if the file was already updated in Steps 5 and 6)
sudo sed -i 's/@postgres:5432/@localhost:5432/g' /etc/openclaw/.env
sudo sed -i 's|redis://redis:6379|redis://:your-redis-password@localhost:6379|g' /etc/openclaw/.env
sudo sed -i 's|OLLAMA_HOST=http://ollama:11434|OLLAMA_HOST=http://localhost:11434|g' /etc/openclaw/.env
|
||||
|
||||
# Set permissions
|
||||
sudo chmod 600 /etc/openclaw/.env
|
||||
sudo chown root:root /etc/openclaw/.env
|
||||
```
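The hostname rewrites from Steps 5, 6 and 8 can be folded into one function that keeps a backup of the original file. This is a sketch assuming the Docker service names are `postgres`, `redis` and `ollama` as in the commands above; `rewrite_env_hosts` is a hypothetical name, and the optional Redis password must not contain `|`, `&` or `\` because it is substituted into a `sed` expression:

```sh
# Rewrite Docker Compose service hostnames in an env file to localhost.
# Usage: rewrite_env_hosts ENV_FILE [REDIS_PASSWORD]
# The original file is kept as ENV_FILE.bak (it still contains secrets).
rewrite_env_hosts() {
  env_file=$1
  redis_auth=${2:+:$2@}   # ":password@" when a password is given, else empty
  sed -i.bak \
    -e 's|@postgres:5432|@localhost:5432|g' \
    -e "s|redis://redis:6379|redis://${redis_auth}localhost:6379|g" \
    -e 's|@redis:6379|@localhost:6379|g' \
    -e 's|http://ollama:11434|http://localhost:11434|g' \
    "$env_file"
}
```

For example, `rewrite_env_hosts /etc/openclaw/.env your-redis-password` run as root; afterwards remove or `chmod 600` the `.bak` copy.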
|
||||
|
||||
---
|
||||
|
||||
## Step 9: Migrate OpenClaw Configuration
|
||||
|
||||
### Restore OpenClaw Data
|
||||
|
||||
```bash
|
||||
# Extract OpenClaw data
|
||||
tar -xzf $BACKUP_DIR/openclaw-data.tar.gz -C ~/
|
||||
|
||||
# Verify extraction
|
||||
ls -la ~/.openclaw/
|
||||
ls -la ~/.openclaw/agents/
|
||||
```
|
||||
|
||||
### Validate Configuration
|
||||
|
||||
```bash
|
||||
# Validate openclaw.json
|
||||
openclaw gateway validate
|
||||
|
||||
# Check agent workspaces
|
||||
for agent in steward alpha beta charlie examiner explorer sentinel coder dreamer empath historian; do
|
||||
echo "=== $agent ==="
|
||||
ls -la ~/.openclaw/agents/$agent/
|
||||
done
|
||||
```
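The loop above only lists each directory, so a missing workspace shows up as an `ls` error buried in the output. A sketch that reports missing workspaces explicitly and returns non-zero; `check_agent_workspaces` is a hypothetical helper:

```sh
# Report agent workspace directories that are missing under BASE_DIR.
# Usage: check_agent_workspaces BASE_DIR AGENT...
# Prints one line per missing agent; returns 1 if any are missing.
check_agent_workspaces() {
  base=$1
  shift
  status=0
  for agent in "$@"; do
    if [ ! -d "$base/$agent" ]; then
      echo "missing workspace: $base/$agent"
      status=1
    fi
  done
  return "$status"
}

# check_agent_workspaces "$HOME/.openclaw/agents" steward alpha beta charlie \
#   examiner explorer sentinel coder dreamer empath historian
```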
|
||||
|
||||
### Update Configuration Paths
|
||||
|
||||
```bash
|
||||
# If paths need to be updated, edit openclaw.json
|
||||
nano ~/.openclaw/openclaw.json
|
||||
|
||||
# Common path changes:
|
||||
# - Database URLs
|
||||
# - File paths
|
||||
# - API endpoints
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 10: Start and Verify Services
|
||||
|
||||
### Start Services in Order
|
||||
|
||||
```bash
|
||||
# Start PostgreSQL
|
||||
sudo systemctl start postgresql
|
||||
sudo systemctl status postgresql
|
||||
|
||||
# Start Redis
|
||||
sudo systemctl start redis
|
||||
sudo systemctl status redis
|
||||
|
||||
# Start Ollama
|
||||
sudo systemctl start ollama
|
||||
sudo systemctl status ollama
|
||||
|
||||
# Start LiteLLM
|
||||
sudo systemctl start litellm
|
||||
sudo systemctl status litellm
|
||||
|
||||
# Start OpenClaw Gateway
|
||||
sudo systemctl start openclaw-gateway
|
||||
sudo systemctl status openclaw-gateway
|
||||
```
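`systemctl start` can return before a service is actually accepting connections, so a dependent service may start too early. A sketch of a generic wait helper that retries a readiness command once per second; `wait_for` is a hypothetical name, and the probe commands in the comments are suggestions:

```sh
# Retry a command once per second until it succeeds or LIMIT seconds pass.
# Usage: wait_for LIMIT CMD [ARG...]
wait_for() {
  limit=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "timed out after ${limit}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# sudo systemctl start postgresql && wait_for 60 pg_isready -h localhost -q
# sudo systemctl start redis      && wait_for 60 redis-cli -a your-redis-password ping
# sudo systemctl start ollama     && wait_for 120 curl -fsS http://localhost:11434/api/tags
# sudo systemctl start litellm    && wait_for 120 curl -fsS http://localhost:4000/health/liveliness
```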
|
||||
|
||||
### Verify Services
|
||||
|
||||
```bash
|
||||
# Check PostgreSQL
|
||||
psql -h localhost -U openclaw -d openclaw -c "SELECT version();"
|
||||
|
||||
# Check Redis
|
||||
redis-cli -a your-redis-password ping
|
||||
|
||||
# Check Ollama
|
||||
curl http://localhost:11434/api/tags
|
||||
|
||||
# Check LiteLLM
|
||||
curl http://localhost:4000/health/liveliness   # /health also probes every model and requires the master key
|
||||
|
||||
# Check OpenClaw Gateway
|
||||
openclaw gateway status
|
||||
```
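To get a single pass/fail summary from the checks above, they can be wrapped in a small runner. A sketch; `run_check` and `FAILURES` are hypothetical names, and the probes in the comments mirror the manual checks:

```sh
# Run a named check, print PASS/FAIL, and count failures in $FAILURES.
FAILURES=0
run_check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    FAILURES=$((FAILURES + 1))
  fi
}

# run_check postgresql pg_isready -h localhost -q
# run_check redis      redis-cli -a your-redis-password ping
# run_check ollama     curl -fsS http://localhost:11434/api/tags
# run_check litellm    curl -fsS http://localhost:4000/health/liveliness
# run_check gateway    openclaw gateway status
# [ "$FAILURES" -eq 0 ] || echo "$FAILURES check(s) failed"
```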
|
||||
|
||||
### Run Health Checks
|
||||
|
||||
```bash
|
||||
# Run comprehensive health check
|
||||
cd /root/heretek/heretek-openclaw
|
||||
./scripts/health-check.sh
|
||||
|
||||
# Or individual checks
|
||||
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" http://localhost:4000/v1/models   # header needed when a master key is configured
|
||||
openclaw agent status steward
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedures
|
||||
|
||||
### Quick Rollback to Docker
|
||||
|
||||
If the bare-metal deployment fails, you can quickly roll back to Docker:
|
||||
|
||||
```bash
|
||||
# Stop bare metal services
|
||||
sudo systemctl stop openclaw-gateway
|
||||
sudo systemctl stop litellm
|
||||
sudo systemctl stop ollama
sudo systemctl stop redis postgresql   # frees ports 6379/5432 if Docker publishes them
|
||||
|
||||
# Return to project directory
|
||||
cd /root/heretek/heretek-openclaw
|
||||
|
||||
# Start Docker deployment
|
||||
docker compose up -d
|
||||
|
||||
# Verify Docker services
|
||||
docker compose ps
|
||||
```
|
||||
|
||||
### Rollback Decision Tree
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
│                    Rollback Decision Tree                   │
├───────────────────────────────┬─────────────────────────────┤
│ Issue Type                    │ Action                      │
├───────────────────────────────┼─────────────────────────────┤
│ PostgreSQL migration failed   │ Restore from SQL dump       │
│ Redis data corrupted          │ Restore RDB file            │
│ Ollama models missing         │ Re-pull models              │
│ LiteLLM won't start           │ Check logs, restore config  │
│ OpenClaw agents not loading   │ Validate openclaw.json      │
│ Critical failure              │ Full Docker rollback        │
└───────────────────────────────┴─────────────────────────────┘
|
||||
```
|
||||
|
||||
### Rollback Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# rollback-to-docker.sh
|
||||
|
||||
echo "Starting rollback to Docker deployment..."
|
||||
|
||||
# Stop bare metal services
|
||||
sudo systemctl stop openclaw-gateway litellm ollama redis postgresql
|
||||
|
||||
# Start Docker
|
||||
cd /root/heretek/heretek-openclaw
|
||||
docker compose up -d
|
||||
|
||||
# Wait for services
|
||||
sleep 30
|
||||
|
||||
# Verify
|
||||
docker compose ps
|
||||
|
||||
echo "Rollback complete. Verify services with: docker compose ps"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Post-Migration Tasks
|
||||
|
||||
### Update Documentation
|
||||
|
||||
```bash
|
||||
# Document the migration
|
||||
sudo mkdir -p /var/log/openclaw
cat << EOF | sudo tee -a /var/log/openclaw/migration-log.txt > /dev/null
|
||||
Migration Date: $(date)
|
||||
From: Docker Deployment
|
||||
To: Bare Metal Deployment
|
||||
Duration: [TIME]
|
||||
Issues: [LIST ANY ISSUES]
|
||||
Resolution: [LIST RESOLUTIONS]
|
||||
Verified By: [NAME]
|
||||
EOF
|
||||
```
|
||||
|
||||
### Configure Monitoring
|
||||
|
||||
```bash
|
||||
# Start the services now and enable them at boot
|
||||
sudo systemctl enable --now openclaw-gateway
|
||||
sudo systemctl enable --now litellm
|
||||
|
||||
# Configure log rotation
|
||||
sudo nano /etc/logrotate.d/openclaw
|
||||
```
|
||||
|
||||
```
|
||||
/var/log/openclaw/*.log {
|
||||
daily
|
||||
rotate 7
|
||||
compress
|
||||
delaycompress
|
||||
missingok
|
||||
notifempty
|
||||
create 0640 root root
|
||||
}
|
||||
```
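To avoid editing the file interactively, the same policy can be written from a script and then dry-run with `logrotate -d`, which prints what would happen without touching any logs. A sketch; `write_logrotate_conf` is a hypothetical helper:

```sh
# Write the OpenClaw logrotate policy to TARGET (default /etc/logrotate.d/openclaw).
write_logrotate_conf() {
  target=${1:-/etc/logrotate.d/openclaw}
  cat > "$target" <<'EOF'
/var/log/openclaw/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0640 root root
}
EOF
}

# As root: write_logrotate_conf && logrotate -d /etc/logrotate.d/openclaw
```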
|
||||
|
||||
### Update Backup Procedures
|
||||
|
||||
```bash
|
||||
# Update backup scripts to use system paths
|
||||
# See BARE_METAL_DEPLOYMENT.md for backup configuration
|
||||
|
||||
# Test backup restoration
|
||||
# Restore from new backup to verify process
|
||||
```
|
||||
|
||||
### Performance Validation
|
||||
|
||||
```bash
|
||||
# Compare performance metrics
|
||||
# Docker vs Bare Metal
|
||||
|
||||
# Response time
|
||||
time curl -s http://localhost:4000/health/liveliness
|
||||
|
||||
# Database query time
|
||||
psql -h localhost -U openclaw -d openclaw -c "\timing" -c "SELECT COUNT(*) FROM pg_tables;"
|
||||
|
||||
# Redis latency
|
||||
redis-cli -a your-redis-password --latency   # runs until Ctrl+C
|
||||
```
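A single `time curl` call is noisy. For a fairer Docker vs bare-metal comparison, one option is to sample several requests with curl's `%{time_total}` write-out variable and summarize them. A sketch; `sample_latency` and `summarize` are hypothetical names, and the sample count is arbitrary:

```sh
# Print N response times (seconds) for URL, one per line.
sample_latency() {
  url=$1
  n=${2:-10}
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# Read one number per line on stdin and print count, min, average and max.
summarize() {
  awk 'NR == 1 { min = max = $1 }
       { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
       END { if (NR) printf "n=%d min=%.3f avg=%.3f max=%.3f\n", NR, min, sum / NR, max }'
}

# sample_latency http://localhost:4000/health/liveliness 20 | summarize
```

Run the same command against the Docker deployment before migrating to get a baseline.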
|
||||
|
||||
### Security Validation
|
||||
|
||||
```bash
|
||||
# Verify firewall rules
|
||||
sudo ufw status # Ubuntu
|
||||
sudo firewall-cmd --list-all # RHEL
|
||||
|
||||
# Verify service isolation
|
||||
sudo ss -tlnp | grep -E ':(5432|6379|11434|4000|18789)[[:space:]]'   # ss (iproute2) replaces netstat
|
||||
|
||||
# Verify SSL/TLS (if configured)
|
||||
openssl s_client -connect localhost:4000 -servername localhost < /dev/null
|
||||
```
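Service isolation mostly comes down to which address each port is bound to: PostgreSQL and Redis usually only need loopback, while the gateway or LiteLLM may be exposed on purpose. A sketch that reads `ss -tln` output and lists watched ports bound to a non-loopback address; `find_exposed_ports` is a hypothetical helper:

```sh
# Read `ss -tln` output on stdin; print "PORT ADDRESS" for each watched port
# (comma-separated list in $1) that listens on a non-loopback address.
find_exposed_ports() {
  awk -v watch="$1" '
    BEGIN { n = split(watch, w, ","); for (i = 1; i <= n; i++) want[w[i]] = 1 }
    {
      la = $4                              # Local Address:Port column
      port = la; sub(/.*:/, "", port)      # text after the last colon
      addr = la; sub(/:[^:]*$/, "", addr)  # text before the last colon
      if (!(port in want)) next
      if (addr ~ /^127\./ || addr == "[::1]") next
      print port, addr
    }'
}

# ss -tln | find_exposed_ports 5432,6379,11434,4000,18789
```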
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Migration Issues
|
||||
|
||||
| Issue | Cause | Solution |
|
||||
|-------|-------|----------|
|
||||
| PostgreSQL connection refused | Wrong host in connection string | Change `postgres` to `localhost` |
|
||||
| Redis authentication failed | Password not set on bare-metal Redis | Set `requirepass` in redis.conf and use the same password in REDIS_URL |
|
||||
| Ollama models not found | Models not migrated | Re-pull models or restore backup |
|
||||
| LiteLLM health check fails | Database/Redis connection | Verify environment variables |
|
||||
| OpenClaw agents missing | Workspace paths incorrect | Check ~/.openclaw/agents/ |
|
||||
|
||||
### Migration Logs
|
||||
|
||||
```bash
|
||||
# Check service logs
|
||||
journalctl -u postgresql -f
|
||||
journalctl -u redis -f
|
||||
journalctl -u ollama -f
|
||||
journalctl -u litellm -f
|
||||
journalctl -u openclaw-gateway -f
|
||||
|
||||
# Check migration log
|
||||
cat /var/log/openclaw/migration-log.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Support
|
||||
|
||||
For issues or questions:
|
||||
- Check [`BARE_METAL_DEPLOYMENT.md`](./BARE_METAL_DEPLOYMENT.md)
|
||||
- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
|
||||
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues
|
||||
|
||||
---
|
||||
|
||||
🦞 *The thought that never ends.*
|
||||
@@ -0,0 +1,41 @@
|
||||
# GCP Deployment Guide for Heretek OpenClaw
|
||||
|
||||
**Version:** 1.0.0
|
||||
**Last Updated:** 2026-03-31
|
||||
|
||||
For complete GCP deployment instructions, see [`deploy/gcp/README.md`](../../deploy/gcp/README.md).
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Terraform Files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| [`deploy/gcp/terraform/main.tf`](../../deploy/gcp/terraform/main.tf) | Main configuration |
|
||||
| [`deploy/gcp/terraform/variables.tf`](../../deploy/gcp/terraform/variables.tf) | Input variables |
|
||||
| [`deploy/gcp/terraform/outputs.tf`](../../deploy/gcp/terraform/outputs.tf) | Output values |
|
||||
| [`deploy/gcp/terraform/vpc.tf`](../../deploy/gcp/terraform/vpc.tf) | VPC configuration |
|
||||
| [`deploy/gcp/terraform/gke.tf`](../../deploy/gcp/terraform/gke.tf) | GKE cluster |
|
||||
| [`deploy/gcp/terraform/cloud-sql.tf`](../../deploy/gcp/terraform/cloud-sql.tf) | Cloud SQL PostgreSQL |
|
||||
| [`deploy/gcp/terraform/memorystore.tf`](../../deploy/gcp/terraform/memorystore.tf) | Memorystore Redis |
|
||||
| [`deploy/gcp/terraform/artifact-registry.tf`](../../deploy/gcp/terraform/artifact-registry.tf) | Artifact Registry |
|
||||
| [`deploy/gcp/terraform/load-balancer.tf`](../../deploy/gcp/terraform/load-balancer.tf) | Cloud Load Balancing |
|
||||
|
||||
### Deploy Commands
|
||||
|
||||
```bash
|
||||
cd deploy/gcp/terraform
|
||||
terraform init
|
||||
terraform plan -var-file=terraform.dev.tfvars -out=tfplan
|
||||
terraform apply tfplan
|
||||
```
|
||||
|
||||
### kubectl Configuration
|
||||
|
||||
```bash
|
||||
gcloud container clusters get-credentials openclaw-dev-gke --region us-central1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
🦞 *The thought that never ends.*
|
||||
@@ -0,0 +1,271 @@
|
||||
# Kubernetes Deployment Guide for Heretek OpenClaw
|
||||
|
||||
**Version:** 1.0.0
|
||||
**Last Updated:** 2026-03-31
|
||||
**OpenClaw Version:** v2026.3.28
|
||||
|
||||
This guide provides instructions for deploying Heretek OpenClaw to Kubernetes clusters using Kustomize.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [Prerequisites](#prerequisites)
|
||||
3. [Directory Structure](#directory-structure)
|
||||
4. [Base Configuration](#base-configuration)
|
||||
5. [Environment Overlays](#environment-overlays)
|
||||
6. [Deployment](#deployment)
|
||||
7. [Post-Deployment](#post-deployment)
|
||||
8. [Troubleshooting](#troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The Kubernetes deployment uses Kustomize for environment-specific configurations:
|
||||
|
||||
- **Base manifests** - Common resources for all environments
|
||||
- **Overlays** - Environment-specific customizations (dev, staging, prod)
|
||||
|
||||
### Components
|
||||
|
||||
| Component | Resource Type | Purpose |
|
||||
|-----------|--------------|---------|
|
||||
| OpenClaw Gateway | Deployment + Service | Main application gateway |
|
||||
| LiteLLM Proxy | Deployment + Service | LLM routing and proxy |
|
||||
| PostgreSQL | StatefulSet + Service | Primary database with pgvector |
|
||||
| Redis | StatefulSet + Service | Cache and session management |
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Required Tools
|
||||
|
||||
```bash
|
||||
# kubectl
|
||||
kubectl version --client
|
||||
|
||||
# Kustomize (included in kubectl 1.14+)
|
||||
kubectl kustomize --help > /dev/null && echo "kustomize available"
|
||||
```
|
||||
|
||||
### Kubernetes Requirements
|
||||
|
||||
- Kubernetes 1.26+ cluster
|
||||
- Storage class for persistent volumes
|
||||
- Ingress controller (nginx recommended)
|
||||
- Metrics server for HPA
|
||||
|
||||
---
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
deploy/kubernetes/
|
||||
├── base/
|
||||
│ ├── namespace.yaml
|
||||
│ ├── openclaw-deployment.yaml
|
||||
│ ├── openclaw-service.yaml
|
||||
│ ├── litellm-deployment.yaml
|
||||
│ ├── litellm-service.yaml
|
||||
│ ├── postgresql-statefulset.yaml
|
||||
│ └── redis-statefulset.yaml
|
||||
└── overlays/
|
||||
├── dev/
|
||||
│ └── kustomization.yaml
|
||||
├── staging/
|
||||
│ └── kustomization.yaml
|
||||
└── prod/
|
||||
└── kustomization.yaml
|
||||
```
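Each overlay's `kustomization.yaml` typically points back at the base and sets the namespace and a name prefix; the `dev-` prefix would explain resource names such as `svc/dev-openclaw-gateway` used later. The sketch below scaffolds such an overlay. It is an assumption about the overlays' shape, not a copy of the repository's files, and it requires `base/` to have its own `kustomization.yaml` listing the manifests (the tree above does not show one). `scaffold_overlay` is hypothetical:

```sh
# Scaffold a minimal Kustomize overlay at ROOT/overlays/ENV that reuses ../../base,
# places everything in namespace openclaw-ENV and prefixes resource names with "ENV-".
scaffold_overlay() {
  root=$1
  envname=$2
  dir="$root/overlays/$envname"
  mkdir -p "$dir"
  cat > "$dir/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openclaw-$envname
namePrefix: $envname-
resources:
  - ../../base
EOF
}

# scaffold_overlay deploy/kubernetes staging
# kubectl kustomize deploy/kubernetes/overlays/staging   # render without applying
```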
|
||||
|
||||
---
|
||||
|
||||
## Base Configuration
|
||||
|
||||
### Namespace
|
||||
|
||||
All resources are deployed to the `openclaw` namespace by default.
|
||||
|
||||
### OpenClaw Gateway
|
||||
|
||||
- **Replicas:** 1 (base)
|
||||
- **Ports:** 18789 (HTTP), 18790 (WebSocket)
|
||||
- **Resources:** 2-4 CPU, 4-8Gi memory
|
||||
- **Storage:** 10Gi persistent volume
|
||||
|
||||
### LiteLLM Proxy
|
||||
|
||||
- **Replicas:** 1 (base)
|
||||
- **Port:** 4000
|
||||
- **Resources:** 1-2 CPU, 2-4Gi memory
|
||||
- **Config:** ConfigMap for model configuration
|
||||
|
||||
### PostgreSQL
|
||||
|
||||
- **Replicas:** 1
|
||||
- **Port:** 5432
|
||||
- **Image:** pgvector/pgvector:pg17
|
||||
- **Storage:** 50Gi persistent volume
|
||||
- **Extensions:** pgvector enabled
|
||||
|
||||
### Redis
|
||||
|
||||
- **Replicas:** 1
|
||||
- **Port:** 6379
|
||||
- **Image:** redis:7-alpine
|
||||
- **Storage:** 10Gi persistent volume
|
||||
- **Persistence:** AOF enabled
|
||||
|
||||
---
|
||||
|
||||
## Environment Overlays
|
||||
|
||||
### Development
|
||||
|
||||
```bash
|
||||
kubectl apply -k deploy/kubernetes/overlays/dev
|
||||
```
|
||||
|
||||
**Characteristics:**
|
||||
- Namespace: `openclaw-dev`
|
||||
- Minimal resources
|
||||
- Debug logging enabled
|
||||
- Development secrets
|
||||
|
||||
### Staging
|
||||
|
||||
```bash
|
||||
kubectl apply -k deploy/kubernetes/overlays/staging
|
||||
```
|
||||
|
||||
**Characteristics:**
|
||||
- Namespace: `openclaw-staging`
|
||||
- 2 replicas for HA
|
||||
- Production-like configuration
|
||||
- Staging secrets
|
||||
|
||||
### Production
|
||||
|
||||
```bash
|
||||
kubectl apply -k deploy/kubernetes/overlays/prod
|
||||
```
|
||||
|
||||
**Characteristics:**
|
||||
- Namespace: `openclaw-prod`
|
||||
- 3+ replicas for HA
|
||||
- Pod disruption budgets
|
||||
- Resource limits enforced
|
||||
- Production secrets (from secret manager)
|
||||
|
||||
---
|
||||
|
||||
## Deployment
|
||||
|
||||
### Step 1: Create Secrets
|
||||
|
||||
```bash
|
||||
kubectl create namespace openclaw-dev
|
||||
|
||||
kubectl create secret generic openclaw-secrets \
|
||||
--namespace openclaw-dev \
|
||||
--from-literal=database-url="postgresql://user:pass@host:5432/db" \
|
||||
--from-literal=redis-url="redis://:password@host:6379/0" \
|
||||
--from-literal=minimax-api-key="your-key" \
|
||||
--from-literal=zai-api-key="your-key"
|
||||
```
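Passwords embedded in `postgresql://` and `redis://` URLs must be percent-encoded if they contain characters such as `@`, `:` or `/`. Generating hex passwords sidesteps that. A sketch using `od` and `/dev/urandom`; `gen_password` is a hypothetical helper, and `host` in the comments is the same placeholder as above:

```sh
# Print a random hex password of 2*N characters (N bytes of entropy, default 24).
# Hex output needs no percent-encoding inside connection URLs.
gen_password() {
  od -An -tx1 -N"${1:-24}" /dev/urandom | tr -d ' \n'
}

# DB_PASS=$(gen_password); REDIS_PASS=$(gen_password)
# kubectl create secret generic openclaw-secrets --namespace openclaw-dev \
#   --from-literal=database-url="postgresql://openclaw:${DB_PASS}@host:5432/openclaw" \
#   --from-literal=redis-url="redis://:${REDIS_PASS}@host:6379/0" \
#   --dry-run=client -o yaml > openclaw-secrets.yaml
```

With `--dry-run=client -o yaml`, kubectl only renders the Secret manifest, which can then be applied or handed to a secret manager.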
|
||||
|
||||
### Step 2: Deploy
|
||||
|
||||
```bash
|
||||
# Development
|
||||
kubectl apply -k deploy/kubernetes/overlays/dev
|
||||
|
||||
# Staging
|
||||
kubectl apply -k deploy/kubernetes/overlays/staging
|
||||
|
||||
# Production
|
||||
kubectl apply -k deploy/kubernetes/overlays/prod
|
||||
```
|
||||
|
||||
### Step 3: Verify
|
||||
|
||||
```bash
|
||||
# Check pods
|
||||
kubectl get pods -n openclaw-dev
|
||||
|
||||
# Check services
|
||||
kubectl get svc -n openclaw-dev
|
||||
|
||||
# Check logs
|
||||
kubectl logs -n openclaw-dev -l app.kubernetes.io/name=openclaw-gateway
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Post-Deployment
|
||||
|
||||
### Access Gateway
|
||||
|
||||
```bash
|
||||
# Port forward for local access
|
||||
kubectl port-forward -n openclaw-dev svc/dev-openclaw-gateway 18789:18789
|
||||
|
||||
# Or access via ingress
|
||||
curl http://openclaw.local/health
|
||||
```
|
||||
|
||||
### Access LiteLLM
|
||||
|
||||
```bash
|
||||
# Port forward
|
||||
kubectl port-forward -n openclaw-dev svc/dev-litellm 4000:4000
|
||||
|
||||
# Test endpoint
|
||||
curl http://localhost:4000/health/liveliness
|
||||
```
|
||||
|
||||
### Scale Components
|
||||
|
||||
```bash
|
||||
# Scale Gateway
|
||||
kubectl scale deployment dev-openclaw-gateway --replicas=3 -n openclaw-dev
|
||||
|
||||
# Scale LiteLLM
|
||||
kubectl scale deployment dev-litellm --replicas=2 -n openclaw-dev
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
| Issue | Solution |
|
||||
|-------|----------|
|
||||
| Pods pending | Check storage class, node capacity |
|
||||
| CrashLoopBackOff | Check logs, secrets configuration |
|
||||
| Service not accessible | Check ingress, network policies |
|
||||
| Database connection failed | Verify secrets, network connectivity |
|
||||
|
||||
### Debug Commands
|
||||
|
||||
```bash
|
||||
# Describe pod for events
|
||||
kubectl describe pod <pod-name> -n openclaw-dev
|
||||
|
||||
# Check logs
|
||||
kubectl logs <pod-name> -n openclaw-dev
|
||||
|
||||
# Exec into pod
|
||||
kubectl exec -it <pod-name> -n openclaw-dev -- /bin/sh
|
||||
|
||||
# Check resource usage
|
||||
kubectl top pods -n openclaw-dev
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
🦞 *The thought that never ends.*
|
||||
@@ -0,0 +1,750 @@
|
||||
# VM Deployment Guide
|
||||
|
||||
**Version:** 1.0.0
|
||||
**Last Updated:** 2026-03-31
|
||||
**OpenClaw Version:** v2026.3.28
|
||||
|
||||
This guide provides instructions for deploying the Heretek OpenClaw stack on virtual machines (VMs) across different platforms and operating systems.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [Ubuntu/Debian VM Deployment](#ubuntudebian-vm-deployment)
|
||||
3. [RHEL/CentOS VM Deployment](#rhelcentos-vm-deployment)
|
||||
4. [Cloud VM Considerations](#cloud-vm-considerations)
|
||||
5. [Network Configuration](#network-configuration)
|
||||
6. [Security Hardening](#security-hardening)
|
||||
7. [Resource Optimization](#resource-optimization)
|
||||
8. [Backup and Recovery](#backup-and-recovery)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
### Supported VM Platforms
|
||||
|
||||
| Platform | Supported OS | Notes |
|
||||
|----------|--------------|-------|
|
||||
| **AWS EC2** | Ubuntu 22.04, RHEL 9 | Use Graviton (ARM) or x86_64 |
|
||||
| **GCP Compute** | Ubuntu 22.04, Rocky Linux 9 | N1, N2, or C2 machine types |
|
||||
| **Azure VM** | Ubuntu 22.04, RHEL 9 | D-series or E-series |
|
||||
| **DigitalOcean** | Ubuntu 22.04 | Droplets with 4+ GB RAM |
|
||||
| **Linode** | Ubuntu 22.04, AlmaLinux 9 | Linode 4GB+ plans |
|
||||
| **Proxmox** | Any supported OS | LXC or full VM |
|
||||
| **VMware** | Any supported OS | ESXi 7.0+ |
|
||||
|
||||
### VM Sizing Recommendations
|
||||
|
||||
| Workload | vCPU | RAM | Storage | GPU |
|
||||
|----------|------|-----|---------|-----|
|
||||
| **Development** | 2-4 | 8 GB | 50 GB SSD | Optional |
|
||||
| **Production (Small)** | 4-8 | 16 GB | 100 GB SSD | Optional |
|
||||
| **Production (Medium)** | 8-16 | 32 GB | 200 GB SSD | Recommended |
|
||||
| **Production (Large)** | 16-32 | 64 GB | 500 GB NVMe | Required |
|
||||
|
||||
---
|
||||
|
||||
## Ubuntu/Debian VM Deployment
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Ubuntu 22.04 LTS VM instance
|
||||
- SSH access with sudo privileges
|
||||
- Outbound internet access
|
||||
- Minimum 4 vCPU, 8 GB RAM
|
||||
|
||||
### Quick Start Script
|
||||
|
||||
```bash
|
||||
# Download and run the VM installer
|
||||
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/vm-install.sh -o vm-install.sh
|
||||
chmod +x vm-install.sh
|
||||
sudo ./vm-install.sh --os ubuntu --gpu none
|
||||
```
|
||||
|
||||
### Manual Installation
|
||||
|
||||
#### Step 1: System Update
|
||||
|
||||
```bash
|
||||
# Update system packages
|
||||
sudo apt-get update && sudo apt-get upgrade -y
|
||||
|
||||
# Install essential tools
|
||||
sudo apt-get install -y \
|
||||
curl \
|
||||
git \
|
||||
wget \
|
||||
gnupg \
|
||||
ca-certificates \
|
||||
software-properties-common
|
||||
```
|
||||
|
||||
#### Step 2: Install Dependencies
|
||||
|
||||
```bash
|
||||
# Run Ubuntu dependencies script
|
||||
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/ubuntu-deps.sh -o ubuntu-deps.sh
|
||||
chmod +x ubuntu-deps.sh
|
||||
sudo ./ubuntu-deps.sh
|
||||
```
|
||||
|
||||
#### Step 3: Clone Repository
|
||||
|
||||
```bash
|
||||
# Clone OpenClaw repository
|
||||
git clone https://github.com/Heretek-AI/heretek-openclaw.git
|
||||
cd heretek-openclaw
|
||||
|
||||
# Verify repository structure
|
||||
ls -la
|
||||
```
|
||||
|
||||
#### Step 4: Configure Environment
|
||||
|
||||
```bash
|
||||
# Copy environment template
|
||||
cp .env.vm.example .env
|
||||
|
||||
# Edit with your values
|
||||
nano .env
|
||||
```
|
||||
|
||||
#### Step 5: Run Post-Installation
|
||||
|
||||
```bash
|
||||
# Run post-installation script
|
||||
sudo ./scripts/install/post-install.sh
|
||||
|
||||
# Verify installation
|
||||
./scripts/health-check.sh
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## RHEL/CentOS VM Deployment
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- RHEL 9 or Rocky Linux 9 VM instance
|
||||
- SSH access with sudo privileges
|
||||
- Outbound internet access
|
||||
- Minimum 4 vCPU, 8 GB RAM
|
||||
|
||||
### Quick Start Script
|
||||
|
||||
```bash
|
||||
# Download and run the VM installer
|
||||
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/vm-install.sh -o vm-install.sh
|
||||
chmod +x vm-install.sh
|
||||
sudo ./vm-install.sh --os rhel --gpu none
|
||||
```
|
||||
|
||||
### Manual Installation
|
||||
|
||||
#### Step 1: System Update
|
||||
|
||||
```bash
|
||||
# Update system packages
|
||||
sudo dnf update -y
|
||||
|
||||
# Install essential tools
|
||||
sudo dnf install -y \
|
||||
curl \
|
||||
git \
|
||||
wget \
|
||||
gnupg2 \
|
||||
ca-certificates \
|
||||
epel-release   # available on Rocky/Alma; on RHEL install the EPEL release RPM from dl.fedoraproject.org
|
||||
```
|
||||
|
||||
#### Step 2: Install Dependencies
|
||||
|
||||
```bash
|
||||
# Run RHEL dependencies script
|
||||
curl -fsSL https://raw.githubusercontent.com/Heretek-AI/heretek-openclaw/main/scripts/install/rhel-deps.sh -o rhel-deps.sh
|
||||
chmod +x rhel-deps.sh
|
||||
sudo ./rhel-deps.sh
|
||||
```
|
||||
|
||||
#### Step 3: Clone Repository
|
||||
|
||||
```bash
|
||||
# Clone OpenClaw repository
|
||||
git clone https://github.com/Heretek-AI/heretek-openclaw.git
|
||||
cd heretek-openclaw
|
||||
|
||||
# Verify repository structure
|
||||
ls -la
|
||||
```
|
||||
|
||||
#### Step 4: Configure Environment
|
||||
|
||||
```bash
|
||||
# Copy environment template
|
||||
cp .env.vm.example .env
|
||||
|
||||
# Edit with your values
|
||||
nano .env
|
||||
```
|
||||
|
||||
#### Step 5: Run Post-Installation
|
||||
|
||||
```bash
|
||||
# Run post-installation script
|
||||
sudo ./scripts/install/post-install.sh
|
||||
|
||||
# Verify installation
|
||||
./scripts/health-check.sh
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cloud VM Considerations
|
||||
|
||||
### AWS EC2
|
||||
|
||||
#### Instance Types
|
||||
|
||||
| Use Case | Instance Type | vCPU | RAM | Notes |
|
||||
|----------|---------------|------|-----|-------|
|
||||
| Development | t3.medium | 2 | 4 GB | Burstable |
|
||||
| Production Small | m5.large | 2 | 8 GB | General purpose |
|
||||
| Production Medium | m5.xlarge | 4 | 16 GB | General purpose |
|
||||
| Production Large | m5.2xlarge | 8 | 32 GB | General purpose |
|
||||
| GPU Workload | g5.xlarge | 4 | 16 GB | NVIDIA A10G |
|
||||
|
||||
#### Security Group Rules
|
||||
|
||||
```text
|
||||
# Required inbound rules
|
||||
Type: SSH, Port: 22, Source: Your IP
|
||||
Type: Custom TCP, Port: 4000, Source: Your IP (LiteLLM)
|
||||
Type: Custom TCP, Port: 18789, Source: Your IP (OpenClaw)
|
||||
Type: Custom TCP, Port: 3000, Source: Your IP (Dashboard - optional)
|
||||
```
|
||||
|
||||
#### IAM Role (Optional)
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"s3:GetObject",
|
||||
"s3:PutObject"
|
||||
],
|
||||
"Resource": "arn:aws:s3:::your-backup-bucket/*"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### User Data Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# EC2 User Data for automatic installation
|
||||
yum update -y
|
||||
yum install -y git curl wget
|
||||
git clone https://github.com/Heretek-AI/heretek-openclaw.git
|
||||
cd heretek-openclaw
|
||||
./scripts/install/rhel-deps.sh
|
||||
./scripts/install/post-install.sh
|
||||
```
|
||||
|
||||
### GCP Compute Engine
|
||||
|
||||
#### Machine Types
|
||||
|
||||
| Use Case | Machine Type | vCPU | RAM | Notes |
|
||||
|----------|--------------|------|-----|-------|
|
||||
| Development | e2-medium | 2 | 4 GB | Balanced |
|
||||
| Production Small | n2-standard-2 | 2 | 8 GB | General purpose |
|
||||
| Production Medium | n2-standard-4 | 4 | 16 GB | General purpose |
|
||||
| Production Large | n2-standard-8 | 8 | 32 GB | General purpose |
|
||||
| GPU Workload | g2-standard-4 | 4 | 16 GB | NVIDIA L4 (24 GB GPU memory) |
|
||||
|
||||
#### Firewall Rules
|
||||
|
||||
```bash
|
||||
# Create firewall rule
|
||||
gcloud compute firewall-rules create openclaw-allow \
|
||||
--allow tcp:22,tcp:4000,tcp:18789,tcp:3000 \
|
||||
--source-ranges YOUR_IP/32 \
|
||||
--target-tags openclaw-instance
|
||||
```
|
||||
|
||||
#### Service Account
|
||||
|
||||
```bash
|
||||
# Create service account
|
||||
gcloud iam service-accounts create openclaw-sa \
|
||||
--display-name "OpenClaw Service Account"
|
||||
|
||||
# Grant storage access
|
||||
gcloud projects add-iam-policy-binding PROJECT_ID \
|
||||
--member "serviceAccount:openclaw-sa@PROJECT_ID.iam.gserviceaccount.com" \
|
||||
--role "roles/storage.objectAdmin"
|
||||
```
|
||||
|
||||
### Azure VM
|
||||
|
||||
#### VM Sizes
|
||||
|
||||
| Use Case | VM Size | vCPU | RAM | Notes |
|
||||
|----------|---------|------|-----|-------|
|
||||
| Development | Standard_B2s | 2 | 4 GB | Burstable |
|
||||
| Production Small | Standard_D2s_v3 | 2 | 8 GB | General purpose |
|
||||
| Production Medium | Standard_D4s_v3 | 4 | 16 GB | General purpose |
|
||||
| Production Large | Standard_D8s_v3 | 8 | 32 GB | General purpose |
|
||||
| GPU Workload | Standard_NC4as_T4_v3 | 4 | 28 GB | NVIDIA T4 |
|
||||
|
||||
#### Network Security Group
|
||||
|
||||
```bash
|
||||
# Create NSG rule
|
||||
az network nsg rule create \
|
||||
--resource-group openclaw-rg \
|
||||
--nsg-name openclaw-nsg \
|
||||
--name AllowOpenClaw \
|
||||
--priority 1000 \
|
||||
--source-address-prefixes YOUR_IP/32 \
|
||||
--destination-port-ranges 22 4000 18789 3000 \
|
||||
--access Allow \
|
||||
--protocol Tcp
|
||||
```
|
||||
|
||||
#### Managed Identity
|
||||
|
||||
```bash
|
||||
# Create managed identity
|
||||
az identity create \
|
||||
--resource-group openclaw-rg \
|
||||
--name openclaw-identity
|
||||
|
||||
# Grant storage access
|
||||
az role assignment create \
|
||||
--assignee OBJECT_ID \
|
||||
--role "Storage Blob Data Contributor" \
|
||||
--scope /subscriptions/SUBSCRIPTION_ID/resourceGroups/openclaw-rg
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Network Configuration
|
||||
|
||||
### Static IP Configuration
|
||||
|
||||
#### Ubuntu/Debian (netplan)
|
||||
|
||||
```yaml
|
||||
# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      addresses:
        - 192.168.1.100/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 1.1.1.1
          - 8.8.8.8
|
||||
```
|
||||
|
||||
#### RHEL/CentOS (NetworkManager)

```bash
# Configure static IP
nmcli connection modify eth0 \
  ipv4.addresses 192.168.1.100/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.dns "1.1.1.1 8.8.8.8" \
  ipv4.method manual

nmcli connection up eth0
```

### DNS Configuration

```bash
# Configure DNS resolver
sudo nano /etc/systemd/resolved.conf
```

```ini
[Resolve]
DNS=1.1.1.1 8.8.8.8
FallbackDNS=9.9.9.9
DNSSEC=allow-downgrade
```

```bash
# Restart systemd-resolved
sudo systemctl restart systemd-resolved
```

### Hostname Configuration

```bash
# Set hostname
sudo hostnamectl set-hostname openclaw-server

# Update /etc/hosts
sudo nano /etc/hosts
```

```
127.0.0.1 localhost localhost.localdomain
192.168.1.100 openclaw-server openclaw
```

---

## Security Hardening

### SSH Hardening

```bash
# Edit SSH configuration
sudo nano /etc/ssh/sshd_config
```

```ini
# SSH Hardening
# Keep comments on their own lines; trailing comments are not reliably
# supported in sshd_config.
# Before restarting sshd, allow TCP 2222 in the NSG/security group and the
# host firewall. On RHEL with SELinux enforcing, also run:
#   sudo semanage port -a -t ssh_port_t -p tcp 2222
# Change from the default port 22
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AuthenticationMethods publickey
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
X11Forwarding no
AllowTcpForwarding no
```

```bash
# Validate the configuration first; a syntax error can lock you out
sudo sshd -t
# Restart SSH (keep your current session open until a new login on port 2222 works)
sudo systemctl restart sshd
```

### Fail2Ban Configuration

```bash
# Install Fail2Ban
sudo apt-get install -y fail2ban   # Ubuntu
sudo dnf install -y fail2ban       # RHEL (requires the EPEL repository)

# Configure Fail2Ban
sudo nano /etc/fail2ban/jail.local
```

```ini
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5

[sshd]
enabled = true
port = 2222
filter = sshd
# Ubuntu/Debian path; use /var/log/secure on RHEL
logpath = /var/log/auth.log
maxretry = 3

[openclaw]
enabled = true
port = 4000,18789
filter = openclaw
logpath = /var/log/openclaw/*.log
maxretry = 10
```

```bash
# Create OpenClaw filter
sudo nano /etc/fail2ban/filter.d/openclaw.conf
```

```ini
[Definition]
# Every failregex must contain <HOST> so Fail2Ban knows which address to ban.
# "from <HOST>" assumes the log line contains the client IP in that form;
# adjust the patterns to OpenClaw's actual log format.
failregex = ^.*Failed authentication.* from <HOST>
            ^.*Invalid API key.* from <HOST>
            ^.*Rate limit exceeded.* from <HOST>
ignoreregex =
```

```bash
# Start Fail2Ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

# Confirm the jails are active
sudo fail2ban-client status
```

### SELinux Configuration (RHEL)

```bash
# Check SELinux status
getenforce

# Set to permissive for testing (denials are logged but not enforced)
sudo setenforce 0

# Create a working directory and the policy source; the file name must
# match the module name (openclaw.te -> openclaw.pp)
mkdir -p ~/selinux-openclaw && cd ~/selinux-openclaw
nano openclaw.te
```

```
# openclaw.te
# Replace OPENCLAW_DOMAIN_t with the SELinux domain the OpenClaw process
# runs in (check with: ps -eZ | grep openclaw)
module openclaw 1.0;

require {
    type OPENCLAW_DOMAIN_t;
    type http_port_t;
    type postgresql_port_t;
    class tcp_socket name_connect;
}

# Allow the OpenClaw process to open outbound connections to HTTP and
# PostgreSQL ports (source = process domain, target = port type)
allow OPENCLAW_DOMAIN_t http_port_t:tcp_socket name_connect;
allow OPENCLAW_DOMAIN_t postgresql_port_t:tcp_socket name_connect;
```

```bash
# Compile and install the policy (requires the selinux-policy-devel package)
cd ~/selinux-openclaw
make -f /usr/share/selinux/devel/Makefile openclaw.pp
sudo semodule -i openclaw.pp

# Alternatively, build a module from the denials logged while permissive:
#   sudo ausearch -m AVC -ts recent | audit2allow -M openclaw

# Re-enable SELinux
sudo setenforce 1
```

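### Audit Logging

```bash
# Install auditd
sudo apt-get install -y auditd   # Ubuntu
sudo dnf install -y audit        # RHEL

# Configure audit rules (rules added with auditctl are lost on reboot)
sudo auditctl -w /etc/openclaw -p wa -k openclaw-config
sudo auditctl -w /root/.openclaw -p wa -k openclaw-data
sudo auditctl -w /etc/litellm -p wa -k litellm-config

# Persist the same rules across reboots
sudo tee /etc/audit/rules.d/openclaw.rules > /dev/null <<'EOF'
-w /etc/openclaw -p wa -k openclaw-config
-w /root/.openclaw -p wa -k openclaw-data
-w /etc/litellm -p wa -k litellm-config
EOF
sudo augenrules --load
```
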
---

## Resource Optimization

### Memory Optimization

```bash
# Configure swap (if needed)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make swap permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify swap
free -h
```

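Re-running the `tee -a` command above appends another `/swapfile` line to `/etc/fstab` each time. A minimal idempotent sketch, assuming a `grep` that supports `-x` and `-F` (GNU and BSD grep both do); `ensure_fstab_entry` is a hypothetical helper, and its optional second argument (default `/etc/fstab`) exists only so it can be tried on a scratch copy first:

```bash
#!/bin/bash
# Hypothetical helper: append an entry to an fstab-style file only if an
# identical line is not already present.
#   $1: the full fstab line to ensure
#   $2: target file (defaults to /etc/fstab)
ensure_fstab_entry() {
    local entry="$1"
    local fstab="${2:-/etc/fstab}"
    # -x matches the whole line, -F treats the entry as a fixed string
    # (not a regex), -q suppresses output; append only when nothing matched
    if ! grep -qxF -- "$entry" "$fstab" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$fstab"
    fi
}

# Example (from a root shell):
#   ensure_fstab_entry '/swapfile none swap sw 0 0'
```

Because it writes to the file directly rather than through `sudo tee`, run it from a root shell when targeting `/etc/fstab`.
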
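### CPU Optimization

```bash
# Install cpupower (Ubuntu cloud images run a cloud kernel, so install the
# tools package that matches the running kernel)
sudo apt-get install -y linux-tools-common "linux-tools-$(uname -r)"   # Ubuntu
sudo dnf install -y kernel-tools                                        # RHEL

# Set CPU governor to performance
# (has no effect on VMs whose hypervisor does not expose cpufreq to the guest)
sudo cpupower frequency-set -g performance

# Verify CPU governor
cpupower frequency-info
```
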
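On many cloud instance types the hypervisor manages CPU frequency and the guest kernel exposes no cpufreq interface, so there is no governor to change. A minimal sketch for checking this first by reading the standard sysfs files `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`; `show_cpu_governors` is a hypothetical helper, and its optional sysfs-root argument exists only for testing:

```bash
#!/bin/bash
# Hypothetical helper: print "cpuN: <governor>" for every CPU that exposes a
# cpufreq governor, or report that none do and return 1.
#   $1: sysfs CPU root (defaults to /sys/devices/system/cpu)
show_cpu_governors() {
    local root="${1:-/sys/devices/system/cpu}"
    local found=0 f cpu
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If the glob matched nothing, $f is the literal pattern: skip it
        [ -r "$f" ] || continue
        found=1
        # .../cpuN/cpufreq/scaling_governor -> cpuN
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no cpufreq interface; the governor cannot be set on this machine"
        return 1
    fi
}
```

If it reports no cpufreq interface, skip the `cpupower` step; instance sizing is the lever for CPU performance on such VMs.
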
### Disk I/O Optimization

```bash
# Check current I/O scheduler (the active one is shown in brackets)
cat /sys/block/sda/queue/scheduler

# Set to mq-deadline; on multi-queue (blk-mq) kernels the former
# "deadline" scheduler is named "mq-deadline"
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Make permanent with a udev rule (the elevator= boot parameter has no
# effect on blk-mq kernels, i.e. Linux 5.0 and later)
sudo nano /etc/udev/rules.d/60-io-scheduler.rules
```

```
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```

```bash
# Reload udev rules and apply them to existing devices
sudo udevadm control --reload-rules
sudo udevadm trigger
```

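The kernel lists every available scheduler in `/sys/block/<device>/queue/scheduler` and marks the active one with square brackets (for example `[mq-deadline] kyber bfq none`). A small sketch that extracts only the active name, handy for confirming after a reboot that the change persisted; `active_scheduler` is a hypothetical helper, and the default device `sda` is carried over from the commands above:

```bash
#!/bin/bash
# Hypothetical helper: print the active I/O scheduler, i.e. the entry the
# kernel wraps in [brackets] in the queue/scheduler file.
#   $1: scheduler file (defaults to /sys/block/sda/queue/scheduler)
active_scheduler() {
    local file="${1:-/sys/block/sda/queue/scheduler}"
    # Capture the text between "[" and "]"; print nothing if there is none
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}
```
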
---

## Backup and Recovery

### Automated Backup Script

```bash
#!/bin/bash
# /usr/local/bin/openclaw-backup.sh
# Stop on the first error, unset variable, or failed pipeline stage
set -euo pipefail

BACKUP_DIR="/backup/openclaw"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7

# redis-cli reads the password from REDISCLI_AUTH, which keeps it out of
# the process list; REDIS_PASSWORD must be provided by the environment
if [ -n "${REDIS_PASSWORD:-}" ]; then
    export REDISCLI_AUTH="$REDIS_PASSWORD"
fi

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup OpenClaw configuration (the service runs as root, so ~ is /root)
tar -czf "$BACKUP_DIR/openclaw-config-$DATE.tar.gz" \
    /root/.openclaw/ \
    /etc/litellm/ \
    /etc/openclaw/

# Backup PostgreSQL, compressed on the fly
# (needs non-interactive auth, e.g. /root/.pgpass or peer authentication)
pg_dump -U openclaw openclaw | gzip > "$BACKUP_DIR/openclaw-db-$DATE.sql.gz"

# Backup Redis: BGSAVE returns immediately and saves in the background, so
# wait (up to 5 minutes) for LASTSAVE to change before copying dump.rdb
LAST_SAVE=$(redis-cli LASTSAVE)
redis-cli BGSAVE > /dev/null
for _ in $(seq 1 300); do
    [ "$(redis-cli LASTSAVE)" != "$LAST_SAVE" ] && break
    sleep 1
done
if [ "$(redis-cli LASTSAVE)" = "$LAST_SAVE" ]; then
    echo "Redis BGSAVE did not complete" >&2
    exit 1
fi
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/redis-dump-$DATE.rdb"

# Remove old backups
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_DIR" -name "*.rdb" -mtime +"$RETENTION_DAYS" -delete

# Log backup
echo "Backup completed: $DATE" >> /var/log/openclaw-backup.log
```

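Because the retention step deletes files unattended, it can help to preview what it would remove. A minimal dry-run sketch that uses the same name patterns and `-mtime` test as the script but prints instead of deleting; `list_expired_backups` is a hypothetical helper whose defaults mirror the script's `BACKUP_DIR` and `RETENTION_DAYS`:

```bash
#!/bin/bash
# Hypothetical helper: list (without deleting) backup files older than the
# retention period.
#   $1: backup directory (defaults to /backup/openclaw)
#   $2: retention in days (defaults to 7)
list_expired_backups() {
    local dir="${1:-/backup/openclaw}"
    local days="${2:-7}"
    # Same patterns as the cleanup step; -print instead of -delete
    find "$dir" -type f \
        \( -name "*.tar.gz" -o -name "*.sql.gz" -o -name "*.rdb" \) \
        -mtime +"$days" -print | sort
}
```

Note that `-mtime +7` compares the file's age in whole days (fractions discarded) against 7, so a file becomes eligible only once it is at least 8 days old.
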
### Systemd Backup Timer

```ini
# /etc/systemd/system/openclaw-backup.timer
[Unit]
Description=Daily OpenClaw Backup
Documentation=file:///root/heretek/heretek-openclaw/docs/operations/AUTOMATED_BACKUP.md

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

```ini
# /etc/systemd/system/openclaw-backup.service
[Unit]
Description=OpenClaw Backup Service
# Adjust unit names to your distribution
# (for example, redis-server.service on Ubuntu/Debian)
After=postgresql.service redis.service

[Service]
Type=oneshot
# Supplies REDIS_PASSWORD to the script; example path, adjust to your setup.
# The leading "-" makes the file optional.
EnvironmentFile=-/etc/openclaw/backup.env
ExecStart=/usr/local/bin/openclaw-backup.sh
User=root
Group=root
```

```bash
# Make the backup script executable
sudo chmod +x /usr/local/bin/openclaw-backup.sh

# Enable backup timer
sudo systemctl daemon-reload
sudo systemctl enable openclaw-backup.timer
sudo systemctl start openclaw-backup.timer

# Check the next scheduled run
systemctl list-timers openclaw-backup.timer
```

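After the first timer run, it is worth checking that the newest archives are intact. A minimal verification sketch, assuming the file names produced by `openclaw-backup.sh` (`openclaw-config-<timestamp>.tar.gz` and `openclaw-db-<timestamp>.sql.gz`); `verify_latest_backup` is a hypothetical helper that runs `gzip -t` on the newest file of each kind:

```bash
#!/bin/bash
# Hypothetical helper: check that the newest config archive and database
# dump are present and are complete, uncorrupted gzip streams.
#   $1: backup directory (defaults to /backup/openclaw)
verify_latest_backup() {
    local dir="${1:-/backup/openclaw}"
    local pattern f
    for pattern in 'openclaw-config-*.tar.gz' 'openclaw-db-*.sql.gz'; do
        # Timestamps (YYYYmmdd_HHMMSS) sort lexically, so the last is newest
        f=$(find "$dir" -maxdepth 1 -name "$pattern" | sort | tail -n 1)
        if [ -z "$f" ]; then
            echo "missing: $pattern" >&2
            return 1
        fi
        # gzip -t decompresses without writing output and fails on
        # truncated or corrupt data
        if ! gzip -t "$f"; then
            echo "corrupt: $f" >&2
            return 1
        fi
        echo "ok: $(basename "$f")"
    done
}
```

A passing check only shows the files decompress; a periodic test restore is still the only proof that the dump loads.
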
---

## Troubleshooting

### Common VM Issues

| Issue | Solution |
|-------|----------|
| VM won't boot after installation | Check the serial console or boot diagnostics output; once the VM is reachable, check the cloud-init log: `/var/log/cloud-init.log` |
| Network connectivity issues | Verify security group/firewall rules |
| Performance degradation | Check resource allocation, enable swap |
| SSH connection refused | Verify SSH port and security group |
| Disk space warnings | Extend volume or clean up old backups |

### Cloud-Specific Commands

#### AWS EC2

```bash
# Check instance status
aws ec2 describe-instance-status --instance-ids i-1234567890abcdef0

# Get system log
aws ec2 get-console-output --instance-id i-1234567890abcdef0

# Reboot instance
aws ec2 reboot-instances --instance-ids i-1234567890abcdef0
```

#### GCP Compute

```bash
# Check instance status (--zone can be omitted if a default zone is configured)
gcloud compute instances describe INSTANCE_NAME --zone=ZONE

# Get serial port output
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone=ZONE

# Reset instance
gcloud compute instances reset INSTANCE_NAME --zone=ZONE
```

#### Azure VM

```bash
# Check VM status
az vm show -d -g openclaw-rg -n openclaw-vm

# Get boot diagnostics
az vm boot-diagnostics get-boot-log -g openclaw-rg -n openclaw-vm

# Restart VM
az vm restart -g openclaw-rg -n openclaw-vm
```

---

## Next Steps

After successful VM deployment:

1. **Configure Monitoring** - Set up cloud monitoring and alerts
2. **Enable Auto-Scaling** (if applicable) - Configure scaling policies
3. **Set Up Backup** - Configure automated backups to cloud storage
4. **Configure DNS** - Set up domain name and SSL certificates
5. **Test Failover** - Verify backup and recovery procedures

---

## Support

For issues or questions:

- Check [`NON_DOCKER_TROUBLESHOOTING.md`](./NON_DOCKER_TROUBLESHOOTING.md)
- Review [`BARE_METAL_DEPLOYMENT.md`](./BARE_METAL_DEPLOYMENT.md)
- Open an issue on GitHub: https://github.com/Heretek-AI/heretek-openclaw/issues

---

🦞 *The thought that never ends.*