John Doe 98a9fc7f81 feat: Implement A2A Protocol infrastructure
- Added Redis-based A2A messaging skill (skills/a2a-message-send/a2a-redis.js)
  - sendMessage, broadcast, getMessages, pingAgent functions
  - Message persistence via Redis lists
  - Agent registration and discovery
  - Inbox management (count, clear, mark as read)

- Added Redis-WebSocket bridge module (modules/communication/redis-websocket-bridge.js)
  - Bridges Redis pub/sub to WebSocket clients
  - Real-time message forwarding for dashboard
  - Client management and heartbeat support

- Added OpenClaw Gateway server (gateway/openclaw-gateway.js)
  - WebSocket RPC server on port 18789
  - HTTP health endpoints on port 18788
  - Agent registration and message routing
  - Redis integration for offline message queuing

- Added Docker configuration
  - docker-compose.redis.yml (Redis service)
  - docker-compose.gateway.yml (Gateway service)
  - Dockerfile.gateway (Gateway container)

- Added documentation
  - DEBUG_A2A.md (debug report with findings and fixes)
  - skills/a2a-message-send/SKILL.md (skill documentation)

Fixes: A2A protocol was non-functional due to missing implementation
components. Tests referenced modules that didn't exist.
The Collective could not communicate between agents.
2026-04-01 12:29:33 -04:00


# A2A Protocol Debug Report
**Date:** 2026-04-01
**Status:** Fixed
**Version:** 1.0.0
## Executive Summary
The A2A (Agent-to-Agent) Protocol in OpenClaw was non-functional due to missing implementation components. This report documents the bugs found, root causes identified, and fixes applied.
---
## Bugs Found
### Bug 1: Missing Redis-based A2A Skill Module
**Severity:** Critical
**File:** `skills/a2a-message-send/a2a-redis.js`
**Status:** ✅ Fixed
**Description:**
The test files (`tests/skills/a2a-message-send.test.js` and `tests/integration/a2a-communication.test.ts`) reference a module at `skills/a2a-message-send/a2a-redis.js` that did not exist. This module is responsible for Redis-based agent-to-agent messaging.
**Expected Functions (from tests):**
- `sendMessage(from, to, content, options)`
- `broadcast(from, content)`
- `broadcastToAgents(from, agents, content)`
- `broadcastToTriad(from, content)`
- `getMessages(agentId, limit)`
- `getUnreadMessages(agentId, limit)`
- `markAsRead(agentId, messageId)`
- `countMessages(agentId)`
- `clearMessages(agentId)`
- `pingAgent(from, to)`
- `pingTriad(from)`
- `validateMessage(message)`
- `validateAgentId(agentId)`
- `registerAgent(agentId, metadata)`
- `unregisterAgent(agentId)`
- `getRegisteredAgents()`
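The two validators in this list are the easiest to illustrate in isolation. What follows is a minimal sketch rather than the shipped implementation: the agent ID pattern, the required fields (`from`, `to`, `content`), and the `{ valid, errors }` return shape are all assumptions made for illustration, since the report only names the functions.

```javascript
// Hypothetical validators. The ID pattern and required fields are assumptions,
// not rules taken from the real a2a-redis.js module.
const AGENT_ID_PATTERN = /^[a-z][a-z0-9_-]{0,63}$/;

function validateAgentId(agentId) {
  return typeof agentId === 'string' && AGENT_ID_PATTERN.test(agentId);
}

function validateMessage(message) {
  if (message === null || typeof message !== 'object') {
    return { valid: false, errors: ['message must be an object'] };
  }
  const errors = [];
  if (!validateAgentId(message.from)) errors.push('invalid "from" agent ID');
  if (!validateAgentId(message.to)) errors.push('invalid "to" agent ID');
  if (typeof message.content !== 'string' || message.content.length === 0) {
    errors.push('content must be a non-empty string');
  }
  return { valid: errors.length === 0, errors };
}

console.log(validateMessage({ from: 'steward', to: 'alpha', content: 'Hello Alpha!' }));
console.log(validateMessage({ from: 'steward', to: '', content: '' }));
```

Collecting every error instead of failing on the first one makes test failures easier to read, which matters when the tests are the only specification available.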
**Impact:**
- Agents could not communicate via Redis
- Message persistence was unavailable
- Tests failed with `MODULE_NOT_FOUND` errors
- The Collective could not coordinate actions
---
### Bug 2: Missing Redis-WebSocket Bridge Module
**Severity:** Critical
**File:** `modules/communication/redis-websocket-bridge.js`
**Status:** ✅ Fixed
**Description:**
The test files `tests/unit/redis-bridge.test.ts` and `tests/integration/websocket-bridge.test.ts` reference a module at `modules/communication/redis-websocket-bridge.js` that did not exist. The entire `modules/` directory was missing.
**Expected Class (from tests):**
- `RedisToWebSocketBridge`
  - `start()` - Start the bridge
  - `stop()` - Stop the bridge
  - `broadcast(message)` - Broadcast to WebSocket clients
  - `getStatus()` - Get bridge status
  - `clients` - Set of connected WebSocket clients
  - `redisClient` - Redis pub/sub client
  - `isRunning` - Running status flag
**Expected Channels:**
- `CHANNELS.A2A` - 'openclaw:a2a:broadcast'
- `CHANNELS.HEARTBEAT` - 'openclaw:a2a:heartbeat'
**Impact:**
- No real-time WebSocket updates from Redis pub/sub
- Dashboard could not receive live A2A message updates
- Tests failed with import errors
---
### Bug 3: Missing Gateway Server Implementation
**Severity:** Critical
**File:** `gateway/openclaw-gateway.js`
**Status:** ✅ Fixed
**Description:**
The `agent-client.js` library contains a `GatewayClient` class that connects to a WebSocket server at `ws://127.0.0.1:18789`, but no gateway server implementation existed to listen on this port.
**Expected Gateway Features:**
- WebSocket server on port 18789 at path `/a2a`
- HTTP endpoints for health checks on port 18788
- Agent registration and discovery
- Message routing between agents
- Redis integration for message persistence
- Broadcast support
- Health check/ping endpoints
**Impact:**
- Agent WebSocket connections failed immediately
- No A2A message routing
- Agent discovery impossible
- Gateway connection errors in agent logs
---
### Bug 4: Architecture Mismatch
**Severity:** High
**Files:** Multiple
**Status:** ✅ Documented and Resolved
**Description:**
The codebase had conflicting architectural approaches:
1. `agent-client.js` uses WebSocket Gateway RPC (port 18789)
2. Tests expect Redis pub/sub messaging
3. LiteLLM A2A protocol (`litellm/litellm/a2a_protocol/`) is designed for external A2A SDK agents, not internal OpenClaw communication
**Root Cause:**
The architecture shifted to Gateway-based WebSocket RPC, but:
- Gateway server was never implemented
- Tests weren't updated to match new architecture
- Redis infrastructure existed in docker-compose.yml but had no consumers
**Resolution:**
Implemented BOTH approaches:
- Redis-based A2A messaging for persistence and async communication
- Gateway WebSocket RPC for real-time agent communication
- Bridge module to connect Redis pub/sub to WebSocket clients
---
### Bug 5: Docker Compose Not Modular
**Severity:** Medium
**File:** `docker-compose.yml`
**Status:** ✅ Fixed
**Description:**
The monolithic `docker-compose.yml` made it difficult to deploy Redis and Gateway services independently. Additionally, the Redis-to-WebSocket bridge service was commented out with a note that `Dockerfile.websocket-bridge` was missing.
**Resolution:**
Created modular compose files:
- `docker-compose.redis.yml` - Redis service
- `docker-compose.gateway.yml` - Gateway service
- `Dockerfile.gateway` - Gateway container image
---
## Fixes Applied
### Fix 1: Created Redis A2A Skill Module
**File:** `skills/a2a-message-send/a2a-redis.js`
**Implementation Details:**
- Full Redis-based messaging with ioredis
- Message persistence using Redis lists
- Agent registration using Redis sets
- Broadcast via Redis pub/sub
- Inbox management (get, count, clear, mark as read)
- Ping/pong health checks with latency measurement
- Message validation
- Priority messaging support
- Known agents list (22 agents in the collective)
**Redis Data Structures:**
```
openclaw:a2a:inbox:{agentId} - List of messages
openclaw:a2a:agents - Set of registered agents
openclaw:a2a:agent:{agentId} - Hash with agent metadata
openclaw:a2a:broadcast - Pub/sub channel
openclaw:a2a:read:{agentId} - Set of read message IDs
```
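To make the relationship between the inbox list and the read set concrete, here is a small in-memory sketch. The key names come from the layout above; the `Map`-backed stand-ins for Redis, the JSON serialization, the `id` field on each message, and the helper names are assumptions for illustration and are not taken from the module.

```javascript
// Key builders mirroring the layout listed above.
const keys = {
  inbox: (agentId) => `openclaw:a2a:inbox:${agentId}`,
  agents: () => 'openclaw:a2a:agents',
  agent: (agentId) => `openclaw:a2a:agent:${agentId}`,
  broadcast: () => 'openclaw:a2a:broadcast',
  read: (agentId) => `openclaw:a2a:read:${agentId}`,
};

// In-memory stand-ins for Redis lists and sets (illustration only).
const lists = new Map();
const sets = new Map();

// Hypothetical helper: append a serialized message to the agent's inbox list.
function pushMessage(agentId, message) {
  const key = keys.inbox(agentId);
  if (!lists.has(key)) lists.set(key, []);
  lists.get(key).push(JSON.stringify(message));
}

// Record a message ID in the agent's read set.
function markRead(agentId, messageId) {
  const key = keys.read(agentId);
  if (!sets.has(key)) sets.set(key, new Set());
  sets.get(key).add(messageId);
}

// Unread messages are inbox entries whose ID is absent from the read set.
function unreadMessages(agentId) {
  const inbox = (lists.get(keys.inbox(agentId)) || []).map((raw) => JSON.parse(raw));
  const read = sets.get(keys.read(agentId)) || new Set();
  return inbox.filter((message) => !read.has(message.id));
}

pushMessage('alpha', { id: 'm1', from: 'steward', content: 'first' });
pushMessage('alpha', { id: 'm2', from: 'steward', content: 'second' });
markRead('alpha', 'm1');
console.log(unreadMessages('alpha'));
```

Keeping read status in a separate set means marking a message as read never rewrites the inbox list, at the cost of one extra lookup when filtering.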
---
### Fix 2: Created Redis-WebSocket Bridge Module
**File:** `modules/communication/redis-websocket-bridge.js`
**Implementation Details:**
- `RedisToWebSocketBridge` class extending EventEmitter
- WebSocket server for client connections
- Redis pub/sub subscription
- Message forwarding from Redis to WebSocket clients
- Client management (add, remove, count)
- Heartbeat/ping-pong support
- Automatic Redis reconnection with exponential backoff
- Singleton pattern with `getBridge()`, `startBridge()`, `stopBridge()` functions
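The reconnection policy mentioned above can be sketched as a pure delay function. The 500 ms base and 30 s cap are hypothetical values, since the report does not state the bridge's actual settings.

```javascript
// Exponential backoff with a cap. baseMs and maxMs are assumed defaults,
// not values taken from the bridge module.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

for (let attempt = 0; attempt < 8; attempt += 1) {
  console.log(`attempt ${attempt}: wait ${backoffDelay(attempt)} ms`);
}
```

With ioredis, a function of this shape can be supplied as the `retryStrategy` connection option, which receives the retry count and returns the delay in milliseconds. Whether the bridge wires it up that way is not stated in this report.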
**Architecture:**
```
Redis Pub/Sub --> Bridge --> WebSocket Clients
      └── Subscribe to: openclaw:a2a:broadcast
                        openclaw:a2a:heartbeat
```
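The forwarding step in this flow can be sketched without a live Redis or WebSocket server. The channel names are the ones listed in this report; the frame shape (`channel`, `payload`, `forwardedAt`), the function name, and the fake client objects are assumptions for illustration.

```javascript
// Channel names from the report; everything else here is illustrative.
const CHANNELS = {
  A2A: 'openclaw:a2a:broadcast',
  HEARTBEAT: 'openclaw:a2a:heartbeat',
};
const OPEN = 1; // numeric value of WebSocket.OPEN

// Send one Redis message to every open client and report how many received it.
function forwardToClients(clients, channel, rawMessage) {
  const frame = JSON.stringify({ channel, payload: rawMessage, forwardedAt: Date.now() });
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState !== OPEN) continue; // skip closing or closed sockets
    client.send(frame);
    delivered += 1;
  }
  return delivered;
}

// Fake clients standing in for WebSocket connections.
const makeClient = (readyState) => ({
  readyState,
  sent: [],
  send(data) { this.sent.push(data); },
});
const clients = new Set([makeClient(OPEN), makeClient(3), makeClient(OPEN)]);
console.log(forwardToClients(clients, CHANNELS.A2A, '{"from":"steward","content":"hi"}'));
```

Serializing the frame once, outside the loop, keeps the per-client cost to a single `send` call.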
---
### Fix 3: Created OpenClaw Gateway Server
**File:** `gateway/openclaw-gateway.js`
**Implementation Details:**
- `OpenClawGateway` class extending EventEmitter
- WebSocket server on port 18789 at `/a2a`
- HTTP server on port 18788 for health endpoints
- Agent registration and tracking
- Message routing between connected agents
- Message queuing in Redis for offline agents
- Broadcast to all connected agents
- Agent discovery endpoint
- Health check endpoints
- Heartbeat mechanism for connection management
**HTTP Endpoints:**
- `GET /health` - Gateway status
- `GET /agents` - Connected agents list
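A minimal sketch of how these two endpoints could be served with Node's built-in `http` module follows. The `running` and `port` fields match the sample health response shown later in this report; the `agents` count, the 404 body, and the `state` object are assumptions for illustration.

```javascript
const http = require('node:http');

// Pure routing function, so behaviour can be checked without opening a socket.
function route(method, url, state) {
  if (method === 'GET' && url === '/health') {
    return {
      status: 200,
      body: { running: state.running, port: state.wsPort, agents: state.agents.size },
    };
  }
  if (method === 'GET' && url === '/agents') {
    return { status: 200, body: { agents: [...state.agents.keys()] } };
  }
  return { status: 404, body: { error: 'not found' } };
}

// Wrap the router in an HTTP server; the caller decides when to listen.
function createHealthServer(state) {
  return http.createServer((req, res) => {
    const { status, body } = route(req.method, req.url, state);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Hypothetical gateway state used only for this demonstration.
const state = {
  running: true,
  wsPort: 18789,
  agents: new Map([['steward', { role: 'orchestrator' }]]),
};
console.log(route('GET', '/health', state));
// To serve it: createHealthServer(state).listen(18788);
```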
**WebSocket Message Types:**
- `register` - Agent registration
- `message` - A2A message routing
- `response` - Response to pending message
- `broadcast` - Broadcast to all agents
- `ping/pong` - Health check
- `discover` - Get agent list
- `health` - Get gateway status
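Routing on the `type` field can be sketched as a dispatch table. This is an in-memory illustration under stated assumptions: the reply types (`registered`, `delivered`, `queued`), the `to` field, and the `Map`-based offline queue standing in for Redis are hypothetical, and only a subset of the message types listed above is covered.

```javascript
// In-memory sketch: connected agents plus an offline queue that stands in
// for the Redis list described in this report.
const agents = new Map();       // agentId -> connection with a send() method
const offlineQueue = new Map(); // agentId -> queued messages

const handlers = {
  register(conn, msg) {
    agents.set(msg.agentId, conn);
    return { type: 'registered', agentId: msg.agentId };
  },
  message(conn, msg) {
    const target = agents.get(msg.to);
    if (target) {
      target.send(JSON.stringify(msg));
      return { type: 'delivered', to: msg.to };
    }
    // Recipient is offline: keep the message until it registers.
    if (!offlineQueue.has(msg.to)) offlineQueue.set(msg.to, []);
    offlineQueue.get(msg.to).push(msg);
    return { type: 'queued', to: msg.to };
  },
  ping() {
    return { type: 'pong', timestamp: Date.now() };
  },
  discover() {
    return { type: 'agents', agents: [...agents.keys()] };
  },
};

// Parse a raw frame and hand it to the matching handler.
function dispatch(conn, raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { type: 'error', error: 'invalid JSON' };
  }
  if (msg === null || typeof msg !== 'object') {
    return { type: 'error', error: 'message must be an object' };
  }
  if (!Object.hasOwn(handlers, msg.type)) {
    return { type: 'error', error: `unknown type: ${msg.type}` };
  }
  return handlers[msg.type](conn, msg);
}
```

A table keyed by message type keeps each handler small and makes an unknown type an explicit error instead of a silent drop.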
---
### Fix 4: Created Docker Configuration
**Files:**
- `docker-compose.redis.yml` - Redis service
- `docker-compose.gateway.yml` - Gateway service
- `Dockerfile.gateway` - Gateway container
**Usage:**
```bash
# Start Redis only
docker compose -f docker-compose.redis.yml up -d
# Start Gateway with Redis
docker compose -f docker-compose.redis.yml -f docker-compose.gateway.yml up -d
# Start full stack
docker compose -f docker-compose.yml -f docker-compose.redis.yml -f docker-compose.gateway.yml up -d
```
---
### Fix 5: Created Skill Documentation
**File:** `skills/a2a-message-send/SKILL.md`
Comprehensive documentation including:
- Architecture overview
- Installation instructions
- Usage examples
- API reference tables
- Message format specification
- Error handling guide
- Testing instructions
---
## Testing
### Run Unit Tests
```bash
# A2A message send skill tests
node --test tests/skills/a2a-message-send.test.js
# Redis bridge tests
npm run test:unit -- redis-bridge
# Gateway tests (when available)
npm run test:unit -- gateway
```
### Run Integration Tests
```bash
# A2A communication integration tests
npm run test:integration -- a2a-communication
# WebSocket bridge integration tests
npm run test:integration -- websocket-bridge
```
### Manual Testing
```bash
# Start Redis
docker compose -f docker-compose.redis.yml up -d
# Test Redis connection
redis-cli ping
# Expected: PONG
# Start Gateway
docker compose -f docker-compose.gateway.yml up -d
# Test Gateway health
curl http://localhost:18788/health
# Test WebSocket connection
wscat -c ws://localhost:18789/a2a
```
---
## Verification Checklist
- [x] Redis A2A skill module created (`skills/a2a-message-send/a2a-redis.js`)
- [x] Redis-WebSocket bridge module created (`modules/communication/redis-websocket-bridge.js`)
- [x] Gateway server created (`gateway/openclaw-gateway.js`)
- [x] Docker compose files created (`docker-compose.redis.yml`, `docker-compose.gateway.yml`)
- [x] Gateway Dockerfile created (`Dockerfile.gateway`)
- [x] Skill documentation created (`skills/a2a-message-send/SKILL.md`)
- [ ] Tests passing (requires manual verification)
- [ ] Redis service running (requires deployment)
- [ ] Gateway service running (requires deployment)
---
## Deployment Steps
### 1. Start Redis Service
```bash
cd heretek-openclaw-core
docker compose -f docker-compose.redis.yml up -d
# Verify
docker compose -f docker-compose.redis.yml ps
# Should show: redis - running
```
### 2. Start Gateway Service
```bash
docker compose -f docker-compose.gateway.yml up -d
# Verify
docker compose -f docker-compose.gateway.yml ps
# Should show: gateway - running
```
### 3. Verify Services
```bash
# Check Redis
docker exec heretek-redis redis-cli ping
# Expected: PONG
# Check Gateway health
curl http://localhost:18788/health
# Expected: {"running":true,"port":18789,...}
# Check Gateway logs
docker logs heretek-gateway
# Should show: "[Gateway] OpenClaw Gateway running on..."
```
### 4. Test A2A Communication
```javascript
// Test from Node.js
const {
  registerAgent,
  sendMessage,
  getMessages,
  pingAgent,
} = require('./skills/a2a-message-send/a2a-redis.js');

async function test() {
  // Register agents
  await registerAgent('steward', { role: 'orchestrator' });
  await registerAgent('alpha', { role: 'triad' });

  // Send message
  const result = await sendMessage('steward', 'alpha', 'Hello Alpha!');
  console.log('Send result:', result);

  // Get messages
  const messages = await getMessages('alpha', 10);
  console.log('Alpha inbox:', messages);

  // Ping test
  const ping = await pingAgent('steward', 'alpha');
  console.log('Ping result:', ping);
}

test().catch(console.error);
```
---
## Known Issues
1. **Redis Authentication** - If Redis requires authentication, set `REDIS_URL` with the password:
   ```
   REDIS_URL=redis://:password@host:6379
   ```
2. **Gateway Port Conflicts** - If port 18789 is in use, change it via the environment:
   ```
   GATEWAY_PORT=18790
   ```
3. **Agent Registration** - Agents must register with the Gateway on connection:
   ```javascript
   ws.send(JSON.stringify({
     type: 'register',
     agentId: 'steward',
     metadata: { role: 'orchestrator' }
   }));
   ```
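The two environment variables above could be read with a small loader like the one below. This is a sketch under stated assumptions: the default port 18789 comes from this report, while the default Redis URL, the function name, and the returned field names are hypothetical.

```javascript
// Read gateway settings from the environment.
// 18789 is the documented default port; the Redis URL default is an assumption.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.GATEWAY_PORT ?? '18789', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid GATEWAY_PORT: ${env.GATEWAY_PORT}`);
  }
  const redisUrl = new URL(env.REDIS_URL ?? 'redis://localhost:6379');
  return {
    gatewayPort: port,
    redisHost: redisUrl.hostname,
    redisPort: Number(redisUrl.port || 6379),
    // The password is percent-encoded inside a URL, so decode it before use.
    redisPassword: redisUrl.password ? decodeURIComponent(redisUrl.password) : undefined,
  };
}

console.log(loadConfig({ REDIS_URL: 'redis://:password@host:6379', GATEWAY_PORT: '18790' }));
```

Validating the port at startup turns a typo in `GATEWAY_PORT` into an immediate, readable error instead of a failed bind later on.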
---
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────┐
│                       Heretek OpenClaw A2A Stack                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────────────────┐    │
│  │   Agent A   │     │   Agent B   │     │         Agent C         │    │
│  │  (steward)  │     │   (alpha)   │     │         (beta)          │    │
│  │  port 8001  │     │  port 8002  │     │        port 8003        │    │
│  └──────┬──────┘     └──────┬──────┘     └───────────┬─────────────┘    │
│         │                   │                        │                  │
│         │ WebSocket RPC     │                        │                  │
│         │ ws://18789/a2a    │                        │                  │
│         ▼                   ▼                        ▼                  │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                        OpenClaw Gateway                         │    │
│  │  - Message Routing    - Agent Discovery    - Health Checks      │    │
│  │  - Broadcast          - Session Mgmt       - Redis Persistence  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                   │                                     │
│                                   │ Redis                               │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                              Redis                              │    │
│  │  - Message Queues     - Agent Registry     - Pub/Sub Channels   │    │
│  │  - Inbox Lists        - Read Status        - Broadcast          │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                   │                                     │
│                                   │ Pub/Sub                             │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                     Redis-WebSocket Bridge                      │    │
│  │  - Subscribe to Redis  - Forward to WS Clients  - Clients       │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                   │                                     │
│                                   ▼                                     │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          Web Dashboard                          │    │
│  │  - Real-time A2A updates   - Agent status    - Message logs     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Conclusion
The A2A Protocol issues have been resolved by implementing the missing components:
1. **Redis-based messaging** for persistent, async agent communication
2. **Gateway WebSocket server** for real-time RPC communication
3. **Redis-WebSocket bridge** for live dashboard updates
The system now supports both synchronous (WebSocket RPC) and asynchronous (Redis queues) communication patterns, providing flexibility for different use cases within the OpenClaw collective.
---
## References
- [`skills/a2a-message-send/a2a-redis.js`](skills/a2a-message-send/a2a-redis.js) - Redis A2A module
- [`modules/communication/redis-websocket-bridge.js`](modules/communication/redis-websocket-bridge.js) - Redis-WS bridge
- [`gateway/openclaw-gateway.js`](gateway/openclaw-gateway.js) - Gateway server
- [`skills/a2a-message-send/SKILL.md`](skills/a2a-message-send/SKILL.md) - Skill documentation
- [`docker-compose.redis.yml`](docker-compose.redis.yml) - Redis service config
- [`docker-compose.gateway.yml`](docker-compose.gateway.yml) - Gateway service config
- [`Dockerfile.gateway`](Dockerfile.gateway) - Gateway Dockerfile