Files
Johannes du Plessis ace71b0fd0 feat: add reviewer graph + eval target wiring (#1241)
* feat: add reviewer graph + eval target wiring

- New `reviewer` graph (`agent/reviewer.py`) registered in langgraph.json
  alongside the main `agent` graph. Reuses the same sandbox lifecycle,
  GH proxy auth, and middleware primitives from `agent.server`, but with
  a narrower tool set, a reviewer-specific system prompt, no
  commit/push, and the `task` (subagent) tool stripped via
  `_ToolExclusionMiddleware` so review stays in one context.

- New `github_comment` tool: agents call it once per issue with
  `(file, line, body, severity)` and the eval scores those calls
  against golden comments.

- `ensure_no_empty_msg` middleware (the no_op nudge) is intentionally
  *not* on the reviewer's stack — that middleware exists to enforce the
  main agent's "always finalize via Slack/Linear/PR" contract, which
  the reviewer doesn't have. The main agent's behavior is unchanged.

- `evals/reviewer/target.py`: send PR info as a user message, extract
  every `github_comment` tool call (multiple expected per review) into
  the run output.

- `evals/reviewer/judge.py`: per-example evaluator now returns a list
  of metrics under `{"results": [...]}` so LangSmith averages each
  numeric key (f1/precision/recall/tp/fp/fn) across the experiment in
  the UI. Dropped the broken `aggregate_pr` summary evaluator that
  reached for an attribute that doesn't exist on `RunTree`.

- `evals/reviewer/run_eval.py`: `--limit` now slices the dataset via
  `client.list_examples(limit=N)` since `aevaluate` doesn't accept
  `max_examples`.

- Makefile: `dev` and `run` targets now use `uv run` so they work
  without an activated venv.

* resolve comments

---------

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
2026-05-06 10:15:58 -07:00

69 lines
1.6 KiB
Makefile

.PHONY: all format format-check lint test tests integration_tests help run dev
# Default target executed when no arguments are given to make.
all: help
######################
# DEVELOPMENT
######################
dev:
uv run langgraph dev
run:
uv run uvicorn agent.webapp:app --reload --port 8000
install:
uv pip install -e .
######################
# TESTING
######################
TEST_FILE ?= tests/
test tests:
@if [ -d "$(TEST_FILE)" ] || [ -f "$(TEST_FILE)" ]; then \
uv run pytest -vvv $(TEST_FILE); \
else \
echo "Skipping tests: path not found: $(TEST_FILE)"; \
fi
integration_tests:
@if [ -d "tests/integration_tests/" ] || [ -f "tests/integration_tests/" ]; then \
uv run pytest -vvv tests/integration_tests/; \
else \
echo "Skipping integration tests: path not found: tests/integration_tests/"; \
fi
######################
# LINTING AND FORMATTING
######################
PYTHON_FILES=.
lint:
uv run ruff check $(PYTHON_FILES)
uv run ruff format $(PYTHON_FILES) --diff
format:
uv run ruff format $(PYTHON_FILES)
uv run ruff check --fix $(PYTHON_FILES)
format-check:
uv run ruff format $(PYTHON_FILES) --check
######################
# HELP
######################
help:
@echo '----'
@echo 'dev - run LangGraph dev server'
@echo 'run - run webhook server'
@echo 'install - install dependencies'
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'integration_tests - run integration tests'