The handbook.

The five-phase loop that drives every agent run. State machine, transition rules, retry caps.

Perception strategies

DOM-first via Playwright a11y tree; vision fallback via Qwen2.5-VL + Tesseract OCR.

Bench corpus

How regressions are detected. The gate for any PR touching agent / llm / perception / execution.

Installation

Get the desktop app running and pull a planner model. Five minutes end-to-end.

Quickstart

From a blank machine to a green sample bench run. Five steps, copy-pastable.

Download

macOS DMG, Windows MSI, Linux DEB. Signed installers.

System requirements

8 GB RAM minimum, 16 GB recommended, 32 GB for 14B+ planners. Apple Silicon and Intel both supported.

Configuration

AppPreferences JSON shape, environment variables, and per-project overrides.

AppPreferences (~/.automator/prefs.json)

plannerModel, embeddingModel, visionModel, browserEngine, browserDevice, headless.
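
As a rough illustration of that shape, the fields could map onto a Kotlin data class like the one below. Only the six field names come from this page; the class name `AppPreferencesSketch`, the types, and the defaults are assumptions, not the shipped definition.

```kotlin
// Hypothetical sketch: field names are from the docs, types and defaults are assumed.
data class AppPreferencesSketch(
    val plannerModel: String,
    val embeddingModel: String,
    val visionModel: String,
    val browserEngine: String = "chromium", // assumed default
    val browserDevice: String? = null,      // assumed: null means no device emulation
    val headless: Boolean = true,           // assumed default
)
```

A caller would fill in the three model names and override the rest as needed, for example `AppPreferencesSketch("planner", "embedder", "vision").copy(headless = false)`.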

Environment variables

AUTOMATOR_LLM_PROVIDER, OLLAMA_URL, OLLAMA_MODEL, ANTHROPIC_API_KEY, AGENT_SMART_AUTOFILL.
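
One way to picture how these variables might be consumed is the sketch below. The variable names are from this page; the `LlmEnvSketch` type, the `readLlmEnv` helper, the default provider, and the accepted truthy values are assumptions. `http://localhost:11434` is Ollama's standard local address.

```kotlin
// Hypothetical resolved view of the LLM-related environment variables.
data class LlmEnvSketch(
    val provider: String,
    val ollamaUrl: String,
    val ollamaModel: String?,
    val hasAnthropicKey: Boolean,
    val smartAutofill: Boolean,
)

// Takes the environment as a map so the lookup logic stays testable.
fun readLlmEnv(env: Map<String, String>): LlmEnvSketch = LlmEnvSketch(
    provider = env["AUTOMATOR_LLM_PROVIDER"] ?: "ollama",      // assumed default
    ollamaUrl = env["OLLAMA_URL"] ?: "http://localhost:11434", // Ollama's standard local port
    ollamaModel = env["OLLAMA_MODEL"],                         // null when unset
    hasAnthropicKey = !env["ANTHROPIC_API_KEY"].isNullOrBlank(),
    // Assumed truthy values: "1" or "true", case-insensitive.
    smartAutofill = (env["AGENT_SMART_AUTOFILL"] ?: "").lowercase() in setOf("1", "true"),
)
```

On the JVM this would be called as `readLlmEnv(System.getenv())`.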

Per-project overrides

.automator.json in your project root overrides global preferences for runs from that directory.
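
The override rule can be sketched as a shallow merge in which project keys win and everything else falls back to the global file. Whether the real merge is shallow or deep is not stated here, so treat `mergePreferences` as a hypothetical helper under that assumption.

```kotlin
// Sketch: project values win; keys absent from the project file fall back to global.
// Assumes a shallow, key-by-key merge of already-parsed JSON objects.
fun mergePreferences(
    global: Map<String, Any?>,
    project: Map<String, Any?>,
): Map<String, Any?> = global + project // right-hand entries replace left-hand ones
```

For example, a project file that only sets `headless` leaves `plannerModel` and the other global values untouched.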

Agent loop & states

The internals. Read after you have run the sample flow.

AgentState enum

IDLE → PERCEIVING → PLANNING → EXECUTING → VERIFYING → RECORDING → IDLE. Plus FAILED, STUCK, COMPLETE.
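
The happy-path cycle above can be sketched as an enum plus a successor function. The state names are from this page; the `AgentStateSketch` and `nextOnSuccess` names, and the choice to treat FAILED, STUCK, and COMPLETE as terminal, are assumptions.

```kotlin
enum class AgentStateSketch {
    IDLE, PERCEIVING, PLANNING, EXECUTING, VERIFYING, RECORDING,
    FAILED, STUCK, COMPLETE,
}

// Successor on the documented happy path; null for states assumed to be terminal.
fun nextOnSuccess(state: AgentStateSketch): AgentStateSketch? = when (state) {
    AgentStateSketch.IDLE -> AgentStateSketch.PERCEIVING
    AgentStateSketch.PERCEIVING -> AgentStateSketch.PLANNING
    AgentStateSketch.PLANNING -> AgentStateSketch.EXECUTING
    AgentStateSketch.EXECUTING -> AgentStateSketch.VERIFYING
    AgentStateSketch.VERIFYING -> AgentStateSketch.RECORDING
    AgentStateSketch.RECORDING -> AgentStateSketch.IDLE
    AgentStateSketch.FAILED,
    AgentStateSketch.STUCK,
    AgentStateSketch.COMPLETE -> null // assumption: no automatic transition out
}
```

The transitions into FAILED, STUCK, and COMPLETE, and any retry caps, are left out because this page does not specify them.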

AgentAction sealed class

Every action the planner can emit. Click, Type, Navigate, Scroll, Hover, Drag, Screenshot, Wait, Done.
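
A sealed hierarchy for these nine actions might look roughly like the sketch below. The action names are from this page; every payload field (`selector`, `text`, `url`, and so on) and the `describe` helper are illustrative assumptions.

```kotlin
// Hypothetical sketch: action names from the docs, payload fields assumed.
sealed class AgentActionSketch {
    data class Click(val selector: String) : AgentActionSketch()
    data class Type(val selector: String, val text: String) : AgentActionSketch()
    data class Navigate(val url: String) : AgentActionSketch()
    data class Scroll(val deltaY: Int) : AgentActionSketch()
    data class Hover(val selector: String) : AgentActionSketch()
    data class Drag(val from: String, val to: String) : AgentActionSketch()
    object Screenshot : AgentActionSketch()
    data class Wait(val millis: Long) : AgentActionSketch()
    data class Done(val summary: String) : AgentActionSketch()
}

// An exhaustive `when` means the compiler flags any action left unhandled.
fun describe(action: AgentActionSketch): String = when (action) {
    is AgentActionSketch.Click -> "click ${action.selector}"
    is AgentActionSketch.Type -> "type into ${action.selector}"
    is AgentActionSketch.Navigate -> "navigate to ${action.url}"
    is AgentActionSketch.Scroll -> "scroll by ${action.deltaY}"
    is AgentActionSketch.Hover -> "hover ${action.selector}"
    is AgentActionSketch.Drag -> "drag ${action.from} to ${action.to}"
    AgentActionSketch.Screenshot -> "screenshot"
    is AgentActionSketch.Wait -> "wait ${action.millis} ms"
    is AgentActionSketch.Done -> "done: ${action.summary}"
}
```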

Verifier semantics

URL + DOM hash + visible text comparison. When does a step count as success?
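
To make the three signals concrete, the sketch below reports which of them changed between two snapshots and leaves the success policy to the caller. `PageSnapshotSketch`, `VerifierSignal`, and `changedSignals` are hypothetical names; the actual success rule is what the Verifier semantics page defines.

```kotlin
// Hypothetical snapshot of the three signals named above.
data class PageSnapshotSketch(val url: String, val domHash: String, val visibleText: String)

enum class VerifierSignal { URL, DOM_HASH, VISIBLE_TEXT }

// Returns the signals that differ between the snapshots taken before and after a step.
fun changedSignals(before: PageSnapshotSketch, after: PageSnapshotSketch): Set<VerifierSignal> =
    setOfNotNull(
        VerifierSignal.URL.takeIf { before.url != after.url },
        VerifierSignal.DOM_HASH.takeIf { before.domHash != after.domHash },
        VerifierSignal.VISIBLE_TEXT.takeIf { before.visibleText != after.visibleText },
    )
```

A caller could then decide, for instance, that a Navigate step needs a URL change while a Type step only needs a DOM hash change; those policies are assumptions, not documented behaviour.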

Scenarios DSL

Describe what you want the agent to do. This section refers ahead to commonMain abstractions that are still being built in parallel.

Goal-driven scenarios

One natural-language goal. Planner figures out steps. Best for exploratory testing.

Pinned-action scenarios

Lock specific actions to specific steps. Best for deterministic regression replay.

Pre / post conditions

Status: not started

Setup actions before the run starts; teardown actions after. Failure modes.

MCP server

Expose the agent's 31 tools to any MCP-compatible client (Claude Desktop, Cursor, Zed, custom).

Enable the MCP server

Settings → Integrations → MCP. Choose stdio or HTTP transport, set the port.

Tool reference

All 31 tools with their JSON Schemas. Copy-paste into your MCP client config.

Authentication

Status: not started

Bearer-token gate for HTTP transport. stdio is process-isolated.

Bench corpus

Lock in good behaviour. AssertionEvaluator compares new runs to the canonical bench.

Bench overview

Why bench gates engine PRs and how to add your own flows.

AssertionEvaluator

URL match, DOM hash, visible-text match, custom predicates. Fuzzy vs strict.
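
The fuzzy versus strict distinction could be sketched for visible-text matching as follows. The definition of "fuzzy" used here (case-insensitive containment after collapsing whitespace) is an assumption for illustration, as are the `normalizeText` and `textMatches` names; the real AssertionEvaluator may apply a different rule.

```kotlin
// Collapses runs of whitespace and lowercases, so cosmetic differences are ignored.
fun normalizeText(s: String): String =
    s.trim().replace(Regex("\\s+"), " ").lowercase()

// Strict: exact equality. Fuzzy (assumed): normalized actual text contains normalized expected text.
fun textMatches(expected: String, actual: String, fuzzy: Boolean): Boolean =
    if (fuzzy) normalizeText(actual).contains(normalizeText(expected))
    else expected == actual
```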

SeedBenchFlows

The starter corpus that ships with Tanvrit Automator.

API reference

For library consumers. Public Kotlin API exposed via commonMain interfaces.

AgentCore

runAgentLoop(scenario, onLog, onScreenshot). The entry point.

TrajectoryRecorder / TrajectoryExporter

Record every step; export as JSONL for SFT or human review.
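
JSONL means one JSON object per line, one line per recorded step. The sketch below shows that layout with a hand-rolled serializer; `StepRecordSketch` and its fields are hypothetical and are not the exporter's real schema.

```kotlin
// Hypothetical per-step record; the real trajectory schema is not shown on this page.
data class StepRecordSketch(val index: Int, val action: String, val success: Boolean)

// Minimal JSON string escaping for quotes, backslashes, and control characters.
fun jsonEscape(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else ->
                if (c < ' ') append("\\u").append(c.code.toString(16).padStart(4, '0'))
                else append(c)
        }
    }
}

// One JSON object per line, in step order.
fun toJsonl(steps: List<StepRecordSketch>): String =
    steps.joinToString("\n") { s ->
        "{\"index\":${s.index},\"action\":\"${jsonEscape(s.action)}\",\"success\":${s.success}}"
    }
```

A production exporter would more likely use a JSON library; the manual escaping here only keeps the example dependency-free.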

BenchRunner

Programmatic bench execution and assertion checking.

Examples

Worked examples, end-to-end. Clone, run, modify.

GitHub login flow

The default sample. Two-factor handled, success verified by URL hash.

SPA with mostly-canvas DOM

When DOM perception returns nothing useful and vision fallback takes over.

Cross-browser parity