mirror of
https://github.com/openclaw/clawdinators.git
synced 2026-07-01 20:24:02 -04:00
280744ce0c
What:
- bound CLAWDINATOR image artifact retention with S3 lifecycle, AMI pruning, and import provenance tags
- reduce the AWS fleet to Babelfish-only and make GitHub credentials opt-in per host
- disable the AMI build, nix-openclaw bump, and release workflows by moving them out of .github/workflows/
- update operator docs for the new explicit build and deploy model

Why:
- stop unbounded S3 and snapshot growth from image builds
- remove unattended resurrection paths and shut down the unused t3.large instances
- keep the remaining Babelfish host running without GitHub App credentials or sync timers

Tests:
- `nix shell nixpkgs#shellcheck nixpkgs#shfmt -c bash scripts/lint-shell.sh` (pass)
- `nix build .#nixosConfigurations.clawdinator-babelfish.config.system.build.toplevel .#nixosConfigurations.clawdinator-1.config.system.build.toplevel .#nixosConfigurations.clawdinator-2.config.system.build.toplevel` (pass)
- `AWS_PROFILE=homelab-admin TF_VAR_aws_region=eu-central-1 TF_VAR_ami_id=ami-0a9abe17feeee0079 TF_VAR_ssh_public_key="$(cat ~/.ssh/id_ed25519.pub)" nix shell nixpkgs#opentofu -c sh -lc 'tofu fmt -check && tofu validate'` (pass)
- live AWS apply: destroyed `clawdinator-1` and `clawdinator-2`, replaced Babelfish, and verified only `Fleet Deploy` remains active in GitHub Actions
Deployment model (fast + declarative)
This repo uses a two-lane delivery model:
- Lane A: Base AMI (slow path, rare)
  - Purpose: reliable boot substrate (Nix + systemd + networking + EFS + SSM + bootstrap services).
  - Built by: explicit operator flow. The old `.github/workflows/image-build.yml` workflow is intentionally disabled under `.github/workflows-disabled/`.
  - Tradeoff: EC2 VM Import is slow and variable in duration; do not run it per commit.
- Lane B: Release + Fleet switch (fast path, manual)
  - Purpose: ship config/app changes quickly while staying reproducible.
  - Built by: explicit operator flow. The old `.github/workflows/release.yml` workflow is intentionally disabled under `.github/workflows-disabled/`.
  - Steps:
    - Fail-fast eval of NixOS configs.
    - Upload bootstrap bundles to S3 (repo seeds, workspace, secrets references).
    - Deploy via SSM: `nixos-rebuild switch --flake github:openclaw/clawdinators/<rev>#<host>`.
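The Lane B steps can be sketched for a single host as follows. This is a hypothetical helper, not the disabled `release.yml`. It assumes flakes are enabled and that instances carry a `Name` tag equal to the host name, so SSM can target them by tag. It omits the S3 bundle upload because the bucket layout is not documented here. Setting `DRY_RUN=1` prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical Lane B helper for one host (not the repo's own tooling).
# Assumes flakes are enabled and instances are tagged Name=<host>.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pinned flake reference for a host: github:openclaw/clawdinators/<rev>#<host>.
flake_ref() {
  printf 'github:openclaw/clawdinators/%s#%s\n' "$1" "$2"
}

# Step 1: fail-fast eval. Forcing the toplevel drvPath evaluates the system
# configuration, so a broken config aborts here before any host is touched.
eval_host() {
  run nix eval --raw ".#nixosConfigurations.$1.config.system.build.toplevel.drvPath"
}

# Step 3: ask SSM to run nixos-rebuild on the tagged instance; prints the command ID.
deploy_host() {
  local rev=$1 host=$2
  run aws ssm send-command \
    --document-name AWS-RunShellScript \
    --targets "Key=tag:Name,Values=${host}" \
    --parameters "commands=[\"nixos-rebuild switch --flake $(flake_ref "$rev" "$host")\"]" \
    --query Command.CommandId --output text
}
```

A typical call would be `eval_host clawdinator-babelfish && deploy_host "$(git rev-parse HEAD)" clawdinator-babelfish`. Because the flake reference points at GitHub, the revision must already be pushed before the host can fetch it.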
Primitives
- Source of truth: git SHA + `flake.lock`.
- Artifact: NixOS system closure for each host config.
- Distribution: Nix substituters + S3 bootstrap bundle.
- Activation: `nixos-rebuild switch`.
- Rollout: canary order (clawdinator-1 then clawdinator-2).
- Rollback: redeploy an older git SHA.
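The rollout and rollback primitives reduce to a loop over hosts with a pinned revision. The sketch below is minimal and assumes a `deploy_host <rev> <host>` step exists. Here it is a hypothetical stub that only prints what it would do; swap in the real SSM deploy. The helper refuses anything but a full 40-character SHA. It deploys hosts in the order given, so the first host acts as the canary, and it stops at the first failure. Rolling back is the same call with an older SHA. A real canary gate would also check bot health before moving on, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical rollout/rollback helper; not the repo's own tooling.
set -euo pipefail

# Stand-in for the real per-host deploy step (assumption): prints the
# pinned flake reference it would switch the host to.
deploy_host() {
  printf 'deploy %s -> github:openclaw/clawdinators/%s#%s\n' "$2" "$1" "$2"
}

# Accept only a full 40-character lowercase hex SHA, so a deploy can never
# float with a branch name such as "main".
is_pinned_rev() {
  [[ $1 =~ ^[0-9a-f]{40}$ ]]
}

# Usage: rollout <sha> <canary-host> [more hosts...]
# Hosts are deployed in the order given; the first failure stops the rollout.
# Rollback is the same call with an older SHA.
rollout() {
  local rev=$1 host
  shift
  if ! is_pinned_rev "$rev"; then
    echo "refusing unpinned revision: $rev" >&2
    return 1
  fi
  for host in "$@"; do
    if ! deploy_host "$rev" "$host"; then
      echo "deploy failed on $host; stopping rollout" >&2
      return 1
    fi
  done
}
```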
Tradeoffs
- Pros:
  - Fast deploys (minutes) vs AMI import (tens of minutes).
  - Cattle-friendly: hosts stay disposable; state lives on EFS.
  - Reproducible: deploys are pinned to a git SHA.
- Cons:
  - `nixos-rebuild switch` restarts services whose configuration changed; expect brief bot downtime per release.
  - Requires AWS SSM permissions for the CI user (see `infra/opentofu/aws/main.tf`).
  - If Nix caches miss, deploys can be slower (though still typically faster than AMI import).
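Because `nixos-rebuild switch` restarts changed services, it can help to preview a release on the host before switching. `nixos-rebuild dry-activate` builds the target system and reports which units activation would stop, start, or restart, without activating anything. Since it builds first, it also surfaces slow cache misses before any downtime window. The sketch below is minimal and assumes it runs on the host itself (for example, in an SSM session) with flakes enabled. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, run on the target host with flakes enabled.
set -euo pipefail

# Build the pinned system for <host> and list the units that switching to it
# would stop, start, or restart. Nothing is activated.
preview_switch() {
  local rev=$1 host=$2
  nixos-rebuild dry-activate --flake "github:openclaw/clawdinators/${rev}#${host}"
}
```

If the listed restarts look acceptable, run `nixos-rebuild switch` with the same flake reference. The closure is already in the local store at that point, so the switch itself should mostly be activation.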
Infra requirement: CI SSM permissions
The old `release.yml` workflow used `aws ssm send-command`; that path is now intentionally disabled.
After pulling these changes, run `tofu apply` in `infra/opentofu/aws` (with admin credentials)
so the CI IAM policy includes the `FleetDeploySSM` statement.
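Once `tofu apply` has updated the policy, an operator running a deploy by hand usually wants to wait for the SSM command to finish. The sketch below polls for that. It assumes the caller also holds `ssm:GetCommandInvocation`; whether `FleetDeploySSM` grants that is not stated in this doc. The function names, poll interval, and attempt limit are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll one SSM command invocation until it finishes.
# Assumes the caller may call ssm:GetCommandInvocation (not confirmed here).
set -euo pipefail

# Return 0 = succeeded, 2 = still running, 1 = failed, cancelled, or timed out.
classify_status() {
  case $1 in
    Success) return 0 ;;
    Pending | InProgress | Delayed | Cancelling) return 2 ;;
    *) return 1 ;;
  esac
}

# Usage: wait_for_command <command-id> <instance-id> [max-attempts]
wait_for_command() {
  local command_id=$1 instance_id=$2 attempts=${3:-180} status rc i
  for ((i = 0; i < attempts; i++)); do
    # Right after send-command the invocation may not exist yet, so lookup
    # errors are treated as "still pending" and retried.
    status=$(aws ssm get-command-invocation \
      --command-id "$command_id" --instance-id "$instance_id" \
      --query Status --output text 2>/dev/null) || status=Pending
    rc=0
    classify_status "$status" || rc=$?
    case $rc in
      0) return 0 ;;
      2) sleep "${POLL_SECONDS:-10}" ;;
      *) echo "command $command_id ended with status: $status" >&2; return 1 ;;
    esac
  done
  echo "gave up waiting for command $command_id" >&2
  return 1
}
```

Pass it the command ID printed by `aws ssm send-command ... --query Command.CommandId --output text`, together with the target instance ID.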