Contributing and Local Quality Gates
This guide explains how to install, develop, test, and run the local quality gates before opening a PR. The goal is to help maintainers and contributors answer one question before review: did this change break the core Coding Agent workflow?
Setup
Install root dependencies with Bun:
bun installIf your change touches desktop/, also install desktop dependencies:
cd desktop
bun installIf your change touches adapters/, or if you run check:adapters / check:native, install adapter dependencies:
cd adapters
bun installDo not commit local artifacts such as artifacts/quality-runs/, node_modules/, or desktop/node_modules/.
Required PR Gate
Before opening a normal PR, contributors and AI coding agents should run the single entrypoint:
bun run verifybun run verify is the one-command entrypoint and is equivalent to bun run quality:pr. This gate does not call real models, so every contributor can run it locally. It starts with an impact report, then actually runs the selected local gates for the changed paths: policy, desktop, server, adapters, native, docs, quarantine, and coverage. A non-zero exit means the branch is not ready for PR or push.
The main quality report embeds the current test scope, result matrix, coverage summary, and links to the full coverage/JUnit/log artifacts:
artifacts/quality-runs/<timestamp>/report.md
artifacts/quality-runs/<timestamp>/report.json
artifacts/quality-runs/<timestamp>/junit.xml
artifacts/quality-runs/<timestamp>/logs/*.log
artifacts/coverage/<timestamp>/coverage-report.md
artifacts/coverage/<timestamp>/coverage-report.jsonInclude the commands you ran and the report summary in your PR description. quality:pr / quality:verify remain available for contributors who prefer explicit quality command names, but docs and AI prompts should prefer bun run verify.
The coverage gate does four things: measures source-only coverage, enforces the baseline ratchet, reports target gaps against 75-80%+ maintained-area goals, and enforces changed-line coverage for new or modified executable production lines. The current baseline lives in scripts/quality-gate/coverage-baseline.json, and CI compares against the base branch baseline when available. New PRs must not lower coverage beyond the allowed window. Changes to coverage-baseline.json or coverage-thresholds.json require the maintainer-only allow-coverage-baseline-change label. Quarantine entries must keep an owner, reviewAfter date, and exit criteria; once reviewAfter expires, the default server and coverage gates fail until maintainers review the entry.
AI Coding Agent Fix Loop
When asking an AI coding agent to work in this repo, use this as the acceptance instruction:
Run `bun run verify`. If it fails, read the latest
`artifacts/quality-runs/<timestamp>/report.md` and the relevant lane log,
fix the missing tests, coverage failures, type/lint/build errors, or docs/native
failures, then rerun `bun run verify` until it passes. Do not lower coverage
baselines or thresholds unless a maintainer explicitly requested it.Agents should handle failures in this order:
- Start with the Summary and Result Matrix in
artifacts/quality-runs/<timestamp>/report.mdto identify the failing lane. - If
Path-aware PR checksfailed, check for missing same-area tests, CLI core changes, or coverage policy changes. Do not bypass normal feature PRs with maintainer overrides. - If
Coverage gatefailed, openartifacts/coverage/<timestamp>/coverage-report.mdorcoverage-report.json, then fixchangedLines.failuresandfailuresfirst.targetGapsare technical-debt signals; touched areas should still improve. - If desktop/server/adapters/native/docs failed, read
artifacts/quality-runs/<timestamp>/logs/<lane>.log, add tests or fix the build, then rerun the narrow command. - After narrow checks pass, run
bun run verifyagain. The agent may only claim ready when the final Summary hasfailed=0.
External reference points:
- Google Testing Blog: 60% acceptable, 75% commendable, 90% exemplary; 90% is a reasonable lower threshold for changed/per-commit coverage.
- Microsoft Visual Studio / Azure DevOps docs: teams typically target about 80%, typical project requirements can be 75%, and generated code may be relaxed.
- ChromiumOS EC: new or changed lines require at least 80% coverage.
Feature Quality Contract
Every feature, bugfix, and behavior change must ship with verifiable evidence. This rule applies to human authors and AI coding agents:
- Name the changed surface first:
desktop,server,adapter,native,docs,provider/runtime,agent-loop, orrelease. - Production changes under
desktop/src,src/server,src/tools,src/utils, oradaptersmust include same-area tests in the same PR unless a maintainer explicitly appliesallow-missing-tests. - Pure logic needs unit tests. Server/API/provider/runtime behavior needs API or request-shape tests. Desktop UI/store/API behavior needs Vitest or Testing Library coverage. Cross-boundary user flows through UI, WebSocket, provider proxying, native sidecars, or release packaging need E2E or agent-browser smoke.
- Agent loop, tool execution, provider routing, model selection, file editing, permissions, session resume, and desktop chat changes need mock/fixture tests in PR, plus live smoke or baseline evidence when provider access is available.
- Coverage is part of the feature. This project follows a Google/Microsoft-style policy: generated/build output is not counted as product coverage, maintained product areas should move toward 75-80%+, and new or changed executable production lines must pass the changed-line coverage threshold in
coverage-thresholds.json. - Do not lower
coverage-baseline.jsonorcoverage-thresholds.jsonjust to pass the gate; real baseline/threshold changes requireallow-coverage-baseline-changeand a reason. Legacy low-coverage areas are debt; new PRs must leave touched areas better than they found them. - The PR description must record changed files, tests added, coverage report path, E2E/live report path or blocker, and remaining risk.
Local Pre-Push Gate
Git hooks are local, so each clone needs to install the hook once:
bun run hooks:installAfter installation, every git push runs the fast local gate internally (bun run quality:push). It reuses the PR gate impact, policy, and path-aware checks, but skips the expensive coverage lane by default; full coverage remains in bun run verify, bun run quality:pr, and CI. If unit tests, docs/native/adapter checks, or any other selected fast path-aware lane fails, the local hook blocks the push.
Maintainers or contributors with model quota can also add real provider smoke and desktop agent-browser smoke to the pre-push hook:
bun run quality:providers
bun run hooks:install -- --live-provider-model minimax:main:minimax-mainTo run the full live baseline before every push, use:
bun run hooks:install -- --live-provider-model minimax:main:minimax-main --live-mode baselineThese options are stored in local .git/config as quality.prePush* keys, so provider selectors and secrets are not committed. smoke mode covers real provider connectivity plus the desktop UI chat smoke; baseline mode also runs every real Coding Agent baseline case.
Maintainer-level overrides must also be explicit local config before the hook passes them to quality:push:
bun run hooks:install -- --allow-cli-core-change --allow-coverage-baseline-changeThis only affects the current clone and is not committed; PR CI still requires the matching labels.
PR CI Merge Gate
.github/workflows/pr-quality.yml runs for PR opened, synchronize, reopened, ready_for_review, labeled, and unlabeled events. It starts with change-policy, which maps changed files to the desktop, server, adapter, native, docs, and coverage lanes. The final pr-quality-gate job aggregates every selected job: failed selected jobs fail pr-quality-gate, while unselected jobs may be skipped.
Repository settings should protect main with GitHub branch protection / rulesets and require the pr-quality-gate status check. The local hook blocks low-quality pushes; the PR gate blocks low-quality merges.
Area-Specific Checks
Run the checks that match the files you changed:
bun run check:server # Server API, WebSocket, providers, sessions, and related tests
bun run check:desktop # Desktop lint, Vitest, and production build
bun run check:adapters # IM adapter tests
bun run check:native # Desktop sidecars and Tauri native checks
bun run check:docs # Docs build, using npm ci + docs:build
bun run check:quarantine # Quarantine owners, exit criteria, and review windows
bun run check:coverage # Root, desktop, and adapter coverage reports plus ratchet enforcementFocused tests are fine while developing, but run bun run verify before sending the PR.
Production code changes must include matching tests. Changes under desktop/src/**, src/server/**, src/tools/**, src/utils/**, or adapters/** without a same-area test file are blocked unless a maintainer applies allow-missing-tests. Coverage baseline/threshold changes are also blocked unless a maintainer applies allow-coverage-baseline-change.
Live Model Baseline
quality:baseline runs real Coding Agent tasks: it starts the local server, creates isolated fixtures, asks a model through chat to fix code, runs tests, and saves transcripts, diffs, verification logs, and a report. It also runs provider live smoke: saved or active OpenAI-compatible providers validate connectivity, proxy conversion, and streaming proxy behavior; env-only provider smoke validates upstream connectivity and the transform pipeline.
The default baseline command does not call real models:
bun run quality:baselineTo actually call models, pass --allow-live and choose a local provider.
First list your local providers and copyable selectors:
bun run quality:providersExample output:
Saved providers:
MiniMax
selector: minimax
main: MiniMax-M2.7-highspeed
--provider-model minimax:main:minimax-mainCopy one of the listed values:
bun run quality:gate --mode baseline --allow-live --provider-model minimax:main:minimax-mainTo run only provider smoke plus desktop agent-browser smoke, use:
bun run quality:smoke --provider-model minimax:main:minimax-mainYou can run multiple models in one pass:
bun run quality:gate --mode baseline --allow-live \
--provider-model codingplan:main:codingplan-main \
--provider-model minimax:main:minimax-mainProvider selectors come from the providers saved in your local Desktop Settings > Providers page. Contributors do not need the maintainer's provider UUIDs or vendor accounts. They can add their own provider locally, run bun run quality:providers, and choose their own model.
If you do not have a saved provider, you can run one unsaved provider smoke with environment variables:
QUALITY_GATE_PROVIDER_BASE_URL=https://example.com \
QUALITY_GATE_PROVIDER_API_KEY=... \
QUALITY_GATE_PROVIDER_MODEL=model-id \
QUALITY_GATE_PROVIDER_API_FORMAT=openai_chat \
bun run quality:gate --mode baseline --allow-liveWhen To Run The Baseline
Run the live baseline for changes touching:
- Desktop chat, session resume, WebSocket, or the CLI bridge
- Provider, model, or runtime selection
- Permissions, tool calls, file edits, and task execution
- agent-browser smoke, Computer Use, Skills, or MCP
- Release preparation or broad cross-module refactors
If you do not have model access, still run bun run verify and state in the PR why the live baseline was not run.
Release Gate
Before a release, run release mode:
bun run quality:gate --mode release --allow-live --provider-model <selector>:mainRelease mode composes PR checks, baseline catalog validation, live baseline cases, provider smoke, desktop smoke, and native checks. Reports are written to artifacts/quality-runs/<timestamp>/. The hosted release workflow now runs quality:gate --mode pr as a non-live preflight before the packaging matrix and uploads a release-quality-gate artifact; maintainers still need to run the live release gate explicitly with an available provider.
In release mode, live lanes are not allowed to be silently skipped. Missing providers, model quota, or external account access will fail the gate and must be recorded as a release blocker.
PR Workflow
- Create a product branch such as
fix/session-reconnectorfeat/provider-quality-gate. - Install dependencies and make the change.
- Add tests for behavior changes.
- Run focused checks for the affected area.
- Optional but recommended: run
bun run hooks:installso later pushes are blocked by failing gates. - Run
bun run verify. - Run the live baseline for high-risk changes.
- In the PR description, include user impact, verification commands, coverage/quality report summary, and known risks.
FAQ
Can I run checks without a provider?
Yes. Run the normal PR gate:
bun run verifyOnly the live baseline needs a real model. Add your provider in Desktop Settings > Providers, then run:
bun run quality:providersWhat if provider selectors conflict?
If two provider names produce the same selector, quality:providers falls back to the provider ID. Copy the --provider-model ... value it prints.
What if a model ID contains a colon?
Prefer role selectors:
--provider-model custom:haiku:custom-haikuThe runner resolves haiku to the real model ID from your local provider configuration.