Ramon Nogueira 3a0e2b4672 feat: plan mode with model-driven entry and collaborative review (#1580)
* feat: add plan mode for read-only research and planning

Adds a per-run plan_mode flag that puts the agent in a read-only
research phase: a strong prompt section is injected and mutating tools
are stripped via ExcludeToolsMiddleware so the agent proposes a
reviewable implementation plan before any edits. Surfaced in the
dashboard UI with a Plan toggle (Shift+Tab) wired through the thread API.
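
A minimal sketch of the tool-stripping idea, not the actual ExcludeToolsMiddleware. The `Tool` class, the `tools_for_run` helper, and the tool names in `MUTATING_TOOLS` are all hypothetical and exist only for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical tool names, for illustration only.
MUTATING_TOOLS = frozenset({"write_file", "edit_file", "commit_and_open_pr"})


@dataclass(frozen=True)
class Tool:
    """Stand-in for a real tool object; only the name matters here."""

    name: str


def tools_for_run(tools: Iterable[Tool], plan_mode: bool) -> List[Tool]:
    """Return the tools the model may see for this run."""
    if not plan_mode:
        return list(tools)
    # In plan mode, hide every tool that could change state.
    return [t for t in tools if t.name not in MUTATING_TOOLS]
```

Filtering by name keeps the check independent of how each tool is implemented, at the cost of having to keep the exclusion set in sync with the tool registry.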

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* fix: enforce plan-mode read-only at tool layer and disable subagents

Addresses PR review: plan mode previously relied on prompt text to keep
the shell read-only and left the task subagent (built with its own
write/PR/Linear tools) unrestricted. Now `task` is excluded so research
cannot be delegated to a mutating subagent, and a new
PlanModeShellGuardMiddleware enforces a read-only command allowlist on
`execute`, blocking writes, git state changes, installs, redirection,
and command substitution regardless of model/prompt-injection compliance.
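
A rough sketch of what a read-only allowlist check for `execute` could look like. This is an assumption-laden illustration, not the PlanModeShellGuardMiddleware code: the allowlists, the forbidden-character list, and the `is_read_only` name are all made up here, and a real guard would need to cover many more cases (for example, options that make an otherwise harmless command write a file).

```python
import shlex

# Hypothetical allowlists; the real lists are not given in the commit message.
READ_ONLY_COMMANDS = frozenset({"ls", "cat", "grep", "head", "tail", "wc", "pwd"})
READ_ONLY_GIT_SUBCOMMANDS = frozenset({"status", "log", "show", "blame"})

# Characters that enable redirection, chaining, or command substitution.
# Checked on the raw string, so quoted occurrences are rejected too (fail closed).
FORBIDDEN_FRAGMENTS = (">", "<", "|", ";", "&", "`", "$(")


def is_read_only(command: str) -> bool:
    """Return True only if the command is clearly on the read-only allowlist."""
    if any(fragment in command for fragment in FORBIDDEN_FRAGMENTS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return False
    if not argv:
        return False
    if argv[0] == "git":
        # Only accept "git <subcommand> ..." with an allowlisted subcommand.
        return len(argv) > 1 and argv[1] in READ_ONLY_GIT_SUBCOMMANDS
    return argv[0] in READ_ONLY_COMMANDS
```

The check defaults to rejection: anything it cannot parse or does not recognize is treated as mutating.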

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* fix: harden plan-mode shell guard against wrapped mutations

Block git global options that take values (-C, --git-dir, ...) from being
misread as the subcommand, reject config-injection options (-c,
--config-env, --exec-path), and drop the env command wrapper that could
run arbitrary commands.
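
The parsing problem described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the guard's actual code: `git_subcommand` is a hypothetical helper, and the `--work-tree` and `--namespace` entries are filled in from git's documented global options rather than from the commit message, which elides the full list.

```python
import shlex
from typing import List, Optional

# Global options that consume the following argument as their value.
VALUE_OPTIONS = frozenset({"-C", "--git-dir", "--work-tree", "--namespace"})
# Options that can inject configuration or swap the git executables.
REJECTED_OPTIONS = frozenset({"-c", "--config-env", "--exec-path"})


def git_subcommand(argv: List[str]) -> Optional[str]:
    """Return the git subcommand, or None if the invocation is rejected."""
    i = 1  # argv[0] is "git"
    while i < len(argv):
        arg = argv[i]
        name = arg.split("=", 1)[0]  # "--git-dir=/x" -> "--git-dir"
        if name in REJECTED_OPTIONS:
            return None
        if arg in VALUE_OPTIONS:
            i += 2  # skip the option and its separate value
            continue
        if arg.startswith("-"):
            i += 1  # plain flag, or "--option=value" in one token
            continue
        return arg  # first non-option token is the subcommand
    return None


print(git_subcommand(shlex.split("git -C /repo status")))
```

Without the value-skipping step, `git -C commit status` would be read as the subcommand `-C` or `commit` instead of `status`, which is the misreading this fix targets.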

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* feat: add plan mode with enter_plan_mode tool, profile/team defaults, Slack commands and approval flow

- enter_plan_mode tool: agent self-activates plan mode via Command(update={'plan_mode': True})
- Plan mode resolution: per-thread > profile default > team default > False
- PLAN_MODE_GUIDANCE_SECTION: always-present prompt section telling agent about the tool
- profile_plan_mode_default and team plan_mode_default settings
- Slack plan on/off/status commands with thread metadata persistence
- slack_thread_reply plan_approval=True renders Approve/Revise/Cancel buttons
- Interactivity handler: approve triggers implementation run, cancel posts confirmation
- Frontend: plan_mode_default in Profile/ProfileUpdate/TeamSettings types and UI toggles
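
The resolution order listed above (per-thread > profile default > team default > False) can be sketched as a first-explicit-value-wins lookup. The function name and signature are hypothetical; `None` is assumed to mean "not set at this level":

```python
from typing import Optional


def resolve_plan_mode(
    thread: Optional[bool],
    profile_default: Optional[bool],
    team_default: Optional[bool],
) -> bool:
    """Pick the most specific explicitly set value, falling back to False."""
    for value in (thread, profile_default, team_default):
        # Compare against None rather than truthiness, so an explicit
        # False at a more specific level is not overridden by a True default.
        if value is not None:
            return value
    return False
```

Chaining the levels with `or` instead would treat an explicit per-thread `False` as unset and fall through to the defaults.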

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* test: add tests for enter_plan_mode tool, profile/team defaults, Slack plan commands, approval blocks

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* refactor(plan-mode): drop shell guard, rely on prompt for read-only discipline

Remove PlanModeShellGuardMiddleware and its enforcement of read-only shell
commands during plan mode. Plan mode now relies on the system prompt to
instruct the agent not to run mutating commands; the mutating-tool exclusion
(ExcludeToolsMiddleware) is retained.

* test(open-swe): add Playwright E2E for the Slack → PR → web handoff

Local, secrets-free end-to-end suite that drives the full happy path through mock Slack/GitHub control panels and the real dashboard UI. Only the LLM and external SaaS HTTP boundaries (GitHub/Slack APIs, OAuth token mint) are faked — the real process_slack_mention, get_agent, deepagents loop, tools, middleware, and dashboard authorization all run under `langgraph dev` with a scripted fake chat model and a local temp-dir sandbox.

- full_flow: a Slack mention runs the agent, which implements a change in the sandbox, opens a PR against a fake GitHub remote, and replies with the PR link in the same thread.
- dashboard: clicking the bot's real "Open in Web" link loads the built ui/ app (served same-origin); the thread owner can continue the conversation, while a different user sees the same thread read-only (no composer).

Wired into Agent CI as a `Playwright E2E` job that runs on pull requests.

* fix(open-swe): serve E2E UI assets via explicit route; pin Playwright

The dashboard E2E served the built ui/ SPA's /assets via app.mount(StaticFiles), but LangGraph's custom-app loader serves APIRoutes and drops sub-app Mounts, so /assets 404'd under `langgraph dev` in CI — the React app never booted and the composer/transcript never rendered. Serve assets via an explicit route instead.

Also pin @playwright/test to the latest (1.61.0) for reproducible runs, and make the owner composer assertion tolerant of either hydration state.

* test(open-swe): record Playwright trace + video on every E2E run

Capture a replayable trace (DOM snapshots, network, console, source) and a screen recording for every test, not just retries, plus a screenshot on failure. The CI job already uploads playwright-report/ and test-results/, so each run now has a downloadable replay; documented how to open it.

* feat(plan-mode): collaborative plan review with BlockNote + Yjs

When the agent enters plan mode it writes the plan as a markdown file in the
sandbox (save_plan tool), publishes it, and posts a review link to the source
channel. Reviewers open the plan inside the dashboard (under the /agents shell),
read it rendered in a BlockNote editor, and leave inline comments synced live
over Yjs. Only the thread owner can approve; any reviewer can request changes.
On approve/reject the comments are harvested and handed to the agent for the
follow-up run; the agent never sees comments mid-review.

- agent: enter_plan_mode persists plan state; new save_plan tool; prompt shares
  the plan-review link.
- dashboard: Yjs WebSocket collab server (pycrdt-websocket) with store-backed
  snapshots; plan content/status store; plan REST API (get/approve/reject,
  owner-only approve, client-harvested comments); planStatus on thread summaries.
- ui: BlockNote native comments (CommentsExtension + YjsThreadStore) plan page
  mounted under the agents shell, with a "Review plan" banner in the thread view
  and a back-link; theme-aware (dark mode) using the dashboard tokens.
- e2e: Playwright coverage of the full Slack -> plan -> review -> approve -> PR
  flow, including cross-user comment sync and owner-only approval.

* fix(plan-mode): address review feedback (authz, overrides, leaks, deps)

- plan-collab WS: authorize per-thread before joining a room (same read gate as
  the REST API) — previously any logged-in user could join any thread (IDOR).
- plan-collab: tie the snapshot flusher to active connections (refcount) so each
  opened plan no longer leaks a permanent 1.5s task on the shared event loop.
- plan decisions: include thread_id in the follow-up run configurable so the run
  resumes the existing thread; set plan_mode explicitly so approve forces it off.
- get_agent: an explicit per-thread plan_mode (Slack `plan off`, approved plan,
  dashboard toggle) now overrides profile/team defaults instead of falling back.
- plan mode tool gating moved to a state-aware PlanModeMiddleware installed
  unconditionally, so a mid-run enter_plan_mode restricts the next model turn;
  before_agent resets stale plan_mode so a later run isn't forced back into it.
- exclude write-capable http_request from plan mode.
- pin pycrdt / pycrdt-websocket with upper bounds.
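
The refcounted snapshot flusher mentioned in the list above might look roughly like this. It is a sketch under assumptions, not the plan_collab.py implementation: the `RoomFlusher` class and its `flush` callback are hypothetical, and the final flush on last disconnect is a design choice of this example rather than something the commit message states.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class RoomFlusher:
    """Runs a periodic flush only while at least one client is connected."""

    def __init__(self, flush: Callable[[], Awaitable[None]], interval: float = 1.5):
        self._flush = flush
        self._interval = interval
        self._connections = 0
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            await self._flush()

    def connect(self) -> None:
        """Register a connection; the first one starts the flusher task."""
        self._connections += 1
        if self._task is None:
            self._task = asyncio.create_task(self._run())

    async def disconnect(self) -> None:
        """Drop a connection; the last one stops the task and flushes once."""
        self._connections -= 1
        if self._connections == 0 and self._task is not None:
            task, self._task = self._task, None
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected: we cancelled the task ourselves
            await self._flush()  # assumption: persist a final snapshot
```

`connect()` must be called from within a running event loop, since it uses `asyncio.create_task`.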

Includes the latest base (#1583): E2E UI assets served via explicit route
(fixes the Playwright CI failure — LangGraph's app loader drops sub-app mounts).

* style: ruff format plan_collab.py

* fix(plan-mode): owner-gate Slack approval + same-origin check on collab WS

- Slack "Approve & Implement" now verifies the clicking user is the plan
  requester (owner, via the stored triggering_user_id) before implementing —
  matching the dashboard API's owner-only approval. Non-owners are pointed to
  Revise / feedback.
- The plan-collab WebSocket validates the handshake Origin against the dashboard
  allowlist before accept() (no-op when unconfigured, e.g. local/dev), mirroring
  the REST require_same_origin CSRF defense.
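
A small sketch of the handshake Origin check described above, written as a pure function so it is independent of any web framework. The `origin_allowed` name is hypothetical, and rejecting a missing Origin header when an allowlist is configured is an assumption of this example:

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def _normalize(origin: str) -> str:
    """Reduce an origin to lowercase scheme://host[:port]."""
    parts = urlsplit(origin.strip())
    return f"{parts.scheme}://{parts.netloc}".lower()


def origin_allowed(origin: Optional[str], allowlist: Iterable[str]) -> bool:
    """Decide whether a WebSocket handshake should be accepted."""
    allowed = {_normalize(entry) for entry in allowlist}
    if not allowed:
        # No allowlist configured (e.g. local/dev): the check is a no-op.
        return True
    if not origin:
        # Assumption: with an allowlist set, a missing Origin is rejected.
        return False
    return _normalize(origin) in allowed
```

A caller would run this against the handshake's Origin header before accepting the connection and close the socket when it returns False.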

* fix(plan-mode): enter plan mode only via the model + local mock dev harness

Plan mode is now entered solely when the model calls enter_plan_mode.
Removed the per-user and team plan_mode_default settings (backend + UI)
and the Slack `plan on/off/status` toggle.

- enter_plan_mode returns a terminating ToolMessage, fixing the missing
  ToolMessage error that silently dropped plan mode mid-run.
- PlanReview: defer Yjs provider/doc teardown so React StrictMode's dev
  remount doesn't destroy and then reuse the collaboration provider.
- e2e plan_review spec asserts plan_mode actually engages.
- LangSmith trace-url resolution is best-effort: bail before any API
  call when the tenant is unset, cache failures, log at debug.
- Add `pnpm run dev:mock`: same-origin Vite HMR harness with a real LLM,
  Alice/Bob mock users, and a GitHub login picker.

* docs(plan-mode): drop stale references to removed profile/team defaults

The plan_mode middleware docstring and the approve/reject dispatch comment
still described the profile/team plan_mode_default resolution that no longer
exists; reword to match model-driven entry + the per-thread carry.

* feat(plan-mode): let any reviewer edit the plan, not just comment

Drop the owner/commenter split for the plan document: everyone with read
access edits and comments alike (DefaultThreadStoreAuth "editor" for all,
editor always editable until a decision, anyone seeds the empty doc). This
matches the collab WS, which already relays frames to every readable user.
Plan approval stays owner-gated.

* test(plan-mode): assert plan-mode entry via the tool's success message

plan_mode lives only in run state for tool gating; it is not a persisted
thread-state channel, so the previous `values.plan_mode === true` poll
could never pass. Assert instead that enter_plan_mode's success ToolMessage
("Plan mode is active …") lands in the thread — which only happens when the
tool's Command applies cleanly, the exact regression this guards.

---------

Co-authored-by: Johannes du Plessis <51395795+johannes117@users.noreply.github.com>
Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
2026-06-23 12:06:58 -07:00


{
  "name": "open-swe",
  "private": true,
  "scripts": {
    "dev:mock": "bash tests/e2e/dev-mock.sh"
  }
}