mirror of
https://github.com/langchain-ai/open-swe.git
synced 2026-07-01 20:24:09 -04:00
ace71b0fd0
* feat: add reviewer graph + eval target wiring
- New `reviewer` graph (`agent/reviewer.py`) registered in langgraph.json
alongside the main `agent` graph. Reuses the same sandbox lifecycle,
GH proxy auth, and middleware primitives from `agent.server`, but with
a narrower tool set, a reviewer-specific system prompt, no
commit/push, and the `task` (subagent) tool stripped via
`_ToolExclusionMiddleware` so review stays in one context.
- New `github_comment` tool: agents call it once per issue with
`(file, line, body, severity)` and the eval scores those calls
against golden comments.
- `ensure_no_empty_msg` middleware (the no_op nudge) is intentionally
*not* on the reviewer's stack — that middleware exists to enforce the
main agent's "always finalize via Slack/Linear/PR" contract, which
the reviewer doesn't have. The main agent's behavior is unchanged.
- `evals/reviewer/target.py`: send PR info as a user message, extract
every `github_comment` tool call (multiple expected per review) into
the run output.
- `evals/reviewer/judge.py`: per-example evaluator now returns a list
of metrics under `{"results": [...]}` so LangSmith averages each
numeric key (f1/precision/recall/tp/fp/fn) across the experiment in
the UI. Dropped the broken `aggregate_pr` summary evaluator that
reached for an attribute that doesn't exist on `RunTree`.
- `evals/reviewer/run_eval.py`: `--limit` now slices the dataset via
`client.list_examples(limit=N)` since `aevaluate` doesn't accept
`max_examples`.
- Makefile: `dev` and `run` targets now use `uv run` so they work
without an activated venv.
* resolve comments
---------
Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
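
The `target.py` and `judge.py` bullets above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names `extract_comments` and `score_comments`, the plain-dict message shape, the rule that a comment matches a golden comment when `file` and `line` agree, and the `{"key": ..., "score": ...}` item shape inside `results` are all assumptions made for the example.

```python
from typing import Any


def extract_comments(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect the arguments of every `github_comment` tool call.

    Assumes each message is a dict that may carry a `tool_calls` list of
    {"name": ..., "args": {...}} entries (hypothetical shape).
    """
    comments = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            if call.get("name") == "github_comment":
                comments.append(call.get("args", {}))
    return comments


def score_comments(
    predicted: list[dict[str, Any]], golden: list[dict[str, Any]]
) -> dict[str, list[dict[str, Any]]]:
    """Return per-example metrics wrapped in {"results": [...]}.

    Hypothetical matching rule: a predicted comment is a true positive
    when a golden comment has the same (file, line) pair.
    """
    predicted_keys = {(c["file"], c["line"]) for c in predicted}
    golden_keys = {(c["file"], c["line"]) for c in golden}

    tp = len(predicted_keys & golden_keys)
    fp = len(predicted_keys - golden_keys)
    fn = len(golden_keys - predicted_keys)

    # Guard every division so an empty review scores 0.0 instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    metrics = {
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "tp": tp,
        "fp": fp,
        "fn": fn,
    }
    # One entry per numeric key, so each can be averaged independently.
    return {"results": [{"key": k, "score": v} for k, v in metrics.items()]}
```

Returning one entry per metric, rather than a single combined score, is what lets each key be averaged on its own across an experiment.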
69 lines
1.6 KiB
Makefile
.PHONY: all format format-check lint test tests integration_tests help run dev install

# Default target executed when no arguments are given to make.
all: help

######################
# DEVELOPMENT
######################

dev:
	uv run langgraph dev

run:
	uv run uvicorn agent.webapp:app --reload --port 8000

install:
	uv pip install -e .

######################
# TESTING
######################

TEST_FILE ?= tests/

test tests:
	@if [ -d "$(TEST_FILE)" ] || [ -f "$(TEST_FILE)" ]; then \
		uv run pytest -vvv $(TEST_FILE); \
	else \
		echo "Skipping tests: path not found: $(TEST_FILE)"; \
	fi

integration_tests:
	@if [ -d "tests/integration_tests/" ] || [ -f "tests/integration_tests/" ]; then \
		uv run pytest -vvv tests/integration_tests/; \
	else \
		echo "Skipping integration tests: path not found: tests/integration_tests/"; \
	fi

######################
# LINTING AND FORMATTING
######################

PYTHON_FILES=.

lint:
	uv run ruff check $(PYTHON_FILES)
	uv run ruff format $(PYTHON_FILES) --diff

format:
	uv run ruff format $(PYTHON_FILES)
	uv run ruff check --fix $(PYTHON_FILES)

format-check:
	uv run ruff format $(PYTHON_FILES) --check

######################
# HELP
######################

help:
	@echo '----'
	@echo 'dev                          - run LangGraph dev server'
	@echo 'run                          - run webhook server'
	@echo 'install                      - install dependencies'
	@echo 'format                       - run code formatters'
	@echo 'lint                         - run linters'
	@echo 'test                         - run unit tests'
	@echo 'integration_tests            - run integration tests'