mirror of
https://github.com/langchain-ai/open-swe.git
synced 2026-07-01 20:24:09 -04:00
3a0e2b4672
* feat: add plan mode for read-only research and planning

Adds a per-run plan_mode flag that puts the agent in a read-only research phase: a strong prompt section is injected and mutating tools are stripped via ExcludeToolsMiddleware so the agent proposes a reviewable implementation plan before any edits. Surfaced in the dashboard UI with a Plan toggle (Shift+Tab) wired through the thread API.

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* fix: enforce plan-mode read-only at tool layer and disable subagents

Addresses PR review: plan mode previously relied on prompt text to keep the shell read-only and left the task subagent (built with its own write/PR/Linear tools) unrestricted. Now `task` is excluded so research cannot be delegated to a mutating subagent, and a new PlanModeShellGuardMiddleware enforces a read-only command allowlist on `execute`, blocking writes, git state changes, installs, redirection, and command substitution regardless of model/prompt-injection compliance.

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* fix: harden plan-mode shell guard against wrapped mutations

Block git global options that take values (-C, --git-dir, ...) from being misread as the subcommand, reject config-injection options (-c, --config-env, --exec-path), and drop the env command wrapper that could run arbitrary commands.
Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* feat: add plan mode with enter_plan_mode tool, profile/team defaults, Slack commands and approval flow

- enter_plan_mode tool: agent self-activates plan mode via Command(update={'plan_mode': True})
- Plan mode resolution: per-thread > profile default > team default > False
- PLAN_MODE_GUIDANCE_SECTION: always-present prompt section telling agent about the tool
- profile_plan_mode_default and team plan_mode_default settings
- Slack plan on/off/status commands with thread metadata persistence
- slack_thread_reply plan_approval=True renders Approve/Revise/Cancel buttons
- Interactivity handler: approve triggers implementation run, cancel posts confirmation
- Frontend: plan_mode_default in Profile/ProfileUpdate/TeamSettings types and UI toggles

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* test: add tests for enter_plan_mode tool, profile/team defaults, Slack plan commands, approval blocks

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>

* refactor(plan-mode): drop shell guard, rely on prompt for read-only discipline

Remove PlanModeShellGuardMiddleware and its enforcement of read-only shell commands during plan mode. Plan mode now relies on the system prompt to instruct the agent not to run mutating commands; the mutating-tool exclusion (ExcludeToolsMiddleware) is retained.

* test(open-swe): add Playwright E2E for the Slack → PR → web handoff

Local, secrets-free end-to-end suite that drives the full happy path through mock Slack/GitHub control panels and the real dashboard UI. Only the LLM and external SaaS HTTP boundaries (GitHub/Slack APIs, OAuth token mint) are faked; the real process_slack_mention, get_agent, deepagents loop, tools, middleware, and dashboard authorization all run under `langgraph dev` with a scripted fake chat model and a local temp-dir sandbox.
- full_flow: a Slack mention runs the agent, which implements a change in the sandbox, opens a PR against a fake GitHub remote, and replies with the PR link in the same thread.
- dashboard: clicking the bot's real "Open in Web" link loads the built ui/ app (served same-origin); the thread owner can continue the conversation, while a different user sees the same thread read-only (no composer).

Wired into Agent CI as a `Playwright E2E` job that runs on pull requests.

* fix(open-swe): serve E2E UI assets via explicit route; pin Playwright

The dashboard E2E served the built ui/ SPA's /assets via app.mount(StaticFiles), but LangGraph's custom-app loader serves APIRoutes and drops sub-app Mounts, so /assets 404'd under `langgraph dev` in CI; the React app never booted and the composer/transcript never rendered. Serve assets via an explicit route instead. Also pin @playwright/test to the latest (1.61.0) for reproducible runs, and make the owner composer assertion tolerant of either hydration state.

* test(open-swe): record Playwright trace + video on every E2E run

Capture a replayable trace (DOM snapshots, network, console, source) and a screen recording for every test, not just retries, plus a screenshot on failure. The CI job already uploads playwright-report/ and test-results/, so each run now has a downloadable replay; documented how to open it.

* feat(plan-mode): collaborative plan review with BlockNote + Yjs

When the agent enters plan mode it writes the plan as a markdown file in the sandbox (save_plan tool), publishes it, and posts a review link to the source channel. Reviewers open the plan inside the dashboard (under the /agents shell), read it rendered in a BlockNote editor, and leave inline comments synced live over Yjs. Only the thread owner can approve; any reviewer can request changes. On approve/reject the comments are harvested and handed to the agent for the follow-up run; the agent never sees comments mid-review.
- agent: enter_plan_mode persists plan state; new save_plan tool; prompt shares the plan-review link.
- dashboard: Yjs WebSocket collab server (pycrdt-websocket) with store-backed snapshots; plan content/status store; plan REST API (get/approve/reject, owner-only approve, client-harvested comments); planStatus on thread summaries.
- ui: BlockNote native comments (CommentsExtension + YjsThreadStore) plan page mounted under the agents shell, with a "Review plan" banner in the thread view and a back-link; theme-aware (dark mode) using the dashboard tokens.
- e2e: Playwright coverage of the full Slack -> plan -> review -> approve -> PR flow, including cross-user comment sync and owner-only approval.

* fix(plan-mode): address review feedback (authz, overrides, leaks, deps)

- plan-collab WS: authorize per-thread before joining a room (same read gate as the REST API); previously any logged-in user could join any thread (IDOR).
- plan-collab: tie the snapshot flusher to active connections (refcount) so each opened plan no longer leaks a permanent 1.5s task on the shared event loop.
- plan decisions: include thread_id in the follow-up run configurable so the run resumes the existing thread; set plan_mode explicitly so approve forces it off.
- get_agent: an explicit per-thread plan_mode (Slack `plan off`, approved plan, dashboard toggle) now overrides profile/team defaults instead of falling back.
- plan mode tool gating moved to a state-aware PlanModeMiddleware installed unconditionally, so a mid-run enter_plan_mode restricts the next model turn; before_agent resets stale plan_mode so a later run isn't forced back into it.
- exclude write-capable http_request from plan mode.
- pin pycrdt / pycrdt-websocket with upper bounds.

Includes the latest base (#1583): E2E UI assets served via explicit route (fixes the Playwright CI failure: LangGraph's app loader drops sub-app mounts).
* style: ruff format plan_collab.py

* fix(plan-mode): owner-gate Slack approval + same-origin check on collab WS

- Slack "Approve & Implement" now verifies the clicking user is the plan requester (owner, via the stored triggering_user_id) before implementing, matching the dashboard API's owner-only approval. Non-owners are pointed to Revise / feedback.
- The plan-collab WebSocket validates the handshake Origin against the dashboard allowlist before accept() (no-op when unconfigured, e.g. local/dev), mirroring the REST require_same_origin CSRF defense.

* fix(plan-mode): enter plan mode only via the model + local mock dev harness

Plan mode is now entered solely when the model calls enter_plan_mode. Removed the per-user and team plan_mode_default settings (backend + UI) and the Slack `plan on/off/status` toggle.

- enter_plan_mode returns a terminating ToolMessage, fixing the missing ToolMessage error that silently dropped plan mode mid-run.
- PlanReview: defer Yjs provider/doc teardown so React StrictMode's dev remount doesn't destroy and then reuse the collaboration provider.
- e2e plan_review spec asserts plan_mode actually engages.
- LangSmith trace-url resolution is best-effort: bail before any API call when the tenant is unset, cache failures, log at debug.
- Add `pnpm run dev:mock`: same-origin Vite HMR harness with a real LLM, Alice/Bob mock users, and a GitHub login picker.

* docs(plan-mode): drop stale references to removed profile/team defaults

The plan_mode middleware docstring and the approve/reject dispatch comment still described the profile/team plan_mode_default resolution that no longer exists; reword to match model-driven entry + the per-thread carry.

* feat(plan-mode): let any reviewer edit the plan, not just comment

Drop the owner/commenter split for the plan document: everyone with read access edits and comments alike (DefaultThreadStoreAuth "editor" for all, editor always editable until a decision, anyone seeds the empty doc).
This matches the collab WS, which already relays frames to every readable user. Plan approval stays owner-gated.

* test(plan-mode): assert plan-mode entry via the tool's success message

plan_mode lives only in run state for tool gating; it is not a persisted thread-state channel, so the previous `values.plan_mode === true` poll could never pass. Assert instead that enter_plan_mode's success ToolMessage ("Plan mode is active …") lands in the thread, which only happens when the tool's Command applies cleanly: the exact regression this guards.

---------

Co-authored-by: Johannes du Plessis <51395795+johannes117@users.noreply.github.com>
Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
8 lines
109 B
JSON
{
  "name": "open-swe",
  "private": true,
  "scripts": {
    "dev:mock": "bash tests/e2e/dev-mock.sh"
  }
}