Feature · Agent loop

Five phases. Bounded. Auditable.

AgentCore.runAgentLoop() drives every Tanvrit Automator run. Each step cycles through five phases in a fixed order, each bounded by explicit caps and failure modes. Nothing happens that the trajectory recorder does not see.

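// Agent constants
MAX_STEPS = 50                     // hard cap on steps per run
MAX_RETRIES_PER_STEP = 3           // executor retries within one step
MAX_CONSECUTIVE_LLM_FAILURES = 5   // failed or unparseable LLM replies in a row
STUCK_THRESHOLD = 3                // consecutive unchanged-page verdicts before STUCK
DOM_EMPTY_THRESHOLD = 2            // consecutive empty-DOM steps before vision fallback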

Phase 1

Perceive

DOMPerception extracts the Playwright accessibility tree of the current page — every interactive element, its role, its label, its bounds. The tree is canonicalised into a UIElement list and condensed into a PageSummary the planner can fit in its context window.
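A minimal Kotlin sketch of the condensation step, assuming hypothetical shapes for UIElement, PageSummary, and a Bounds type (the real classes are not shown on this page). The set of interactive roles, the zero-size visibility filter, and the 40-element cap are illustrative choices, not the shipped values.

```kotlin
// Hypothetical shapes: the real UIElement and PageSummary types may differ.
data class Bounds(val x: Int, val y: Int, val width: Int, val height: Int)

data class UIElement(val role: String, val label: String, val bounds: Bounds)

data class PageSummary(val url: String, val lines: List<String>, val truncated: Boolean)

// Roles treated as interactive in this sketch (an assumption, not the shipped list).
val INTERACTIVE_ROLES = setOf("button", "link", "textbox", "checkbox", "combobox", "menuitem")

/**
 * Condenses an element list into a numbered summary: keeps interactive roles with a
 * non-zero box, drops duplicate role/label pairs, and caps the count at [maxElements]
 * so the summary fits a fixed prompt budget.
 */
fun summarize(url: String, elements: List<UIElement>, maxElements: Int = 40): PageSummary {
    val interactive = elements
        .filter { it.role in INTERACTIVE_ROLES && it.bounds.width > 0 && it.bounds.height > 0 }
        .distinctBy { it.role to it.label }
    val kept = interactive.take(maxElements)
    val lines = kept.mapIndexed { i, e -> "[$i] ${e.role} \"${e.label}\"" }
    return PageSummary(url, lines, truncated = interactive.size > kept.size)
}
```

Numbering each kept element gives the planner a short handle to refer to in its reply; whether PageSummary actually numbers elements this way is an assumption.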

When the DOM is empty for DOM_EMPTY_THRESHOLD=2 consecutive steps, perception falls back to VisionPerception (Qwen2.5-VL via Ollama → Tesseract OCR). The threshold is conservative: DOM perception is preferred because it is cheaper, faster, and more accurate than vision.
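The fallback rule reduces to a streak counter. This sketch assumes a hypothetical PerceptionSelector fed the DOM element count once per step, and assumes that a non-empty DOM resets the streak and switches back to DOM perception; the page does not say whether that switch back happens.

```kotlin
enum class PerceptionMode { DOM, VISION }

/**
 * Tracks consecutive empty-DOM steps and selects vision once the streak reaches
 * [emptyThreshold]. Assumption: any non-empty DOM resets the streak.
 */
class PerceptionSelector(private val emptyThreshold: Int = 2) {
    private var emptyStreak = 0

    /** Call once per step with the number of elements DOM perception found. */
    fun choose(domElementCount: Int): PerceptionMode {
        emptyStreak = if (domElementCount == 0) emptyStreak + 1 else 0
        return if (emptyStreak >= emptyThreshold) PerceptionMode.VISION else PerceptionMode.DOM
    }
}
```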

Phase 2

Plan

OllamaClient (or AnthropicClient when AUTOMATOR_LLM_PROVIDER=anthropic and ANTHROPIC_API_KEY is set) selects the next AgentAction from a PromptBuilder-assembled prompt that includes RAG exemplars from TrajectoryRAG.
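Provider selection can be written as a pure function of the environment, so it is testable without touching real variables. LlmProvider and selectProvider are hypothetical names. Falling back to Ollama when AUTOMATOR_LLM_PROVIDER=anthropic is set without a key is one reading of the sentence above (the real code might raise an error instead), and the case-insensitive match is an assumption.

```kotlin
enum class LlmProvider { OLLAMA, ANTHROPIC }

/**
 * Anthropic only when AUTOMATOR_LLM_PROVIDER=anthropic AND ANTHROPIC_API_KEY is
 * non-blank; Ollama otherwise. Defaults to the process environment.
 */
fun selectProvider(env: Map<String, String> = System.getenv()): LlmProvider {
    val wantsAnthropic =
        env["AUTOMATOR_LLM_PROVIDER"]?.trim()?.equals("anthropic", ignoreCase = true) == true
    val hasKey = !env["ANTHROPIC_API_KEY"].isNullOrBlank()
    return if (wantsAnthropic && hasKey) LlmProvider.ANTHROPIC else LlmProvider.OLLAMA
}
```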

ResponseParser turns the LLM's textual reply into a strongly-typed AgentAction. If the parse fails (malformed JSON, unrecognised action), the failure counts toward MAX_CONSECUTIVE_LLM_FAILURES=5 and the loop retries with a tightened prompt.
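A sketch of parse-then-retry, assuming a hypothetical three-variant AgentAction and a caller-supplied callLlm function that takes a tightened flag. The regex field extractor is a toy stand-in for a real JSON parser and handles only flat string fields without escapes. The failure counter here lives inside one planning call; the real MAX_CONSECUTIVE_LLM_FAILURES counter may persist across steps.

```kotlin
// Hypothetical action model; the real AgentAction is likely richer.
sealed class AgentAction {
    data class Click(val target: String) : AgentAction()
    data class Type(val target: String, val text: String) : AgentAction()
    object Done : AgentAction()
}

// Toy field extractor standing in for real JSON parsing (no escapes, no nesting).
fun field(json: String, name: String): String? =
    Regex("\"$name\"\\s*:\\s*\"([^\"]*)\"").find(json)?.groupValues?.get(1)

/** Returns a typed action, or null for malformed or unrecognised replies. */
fun parseAction(reply: String): AgentAction? = when (field(reply, "action")) {
    "click" -> field(reply, "target")?.let { AgentAction.Click(it) }
    "type" -> {
        val target = field(reply, "target")
        val text = field(reply, "text")
        if (target != null && text != null) AgentAction.Type(target, text) else null
    }
    "done" -> AgentAction.Done
    else -> null
}

/**
 * Asks the LLM until a reply parses, requesting a tightened prompt after the
 * first failure. Returns null after [maxFailures] consecutive failures, the
 * point at which the loop would stop planning.
 */
fun planWithRetries(maxFailures: Int = 5, callLlm: (tightened: Boolean) -> String): AgentAction? {
    var failures = 0
    while (failures < maxFailures) {
        parseAction(callLlm(failures > 0))?.let { return it }
        failures++
    }
    return null
}
```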

Phase 3

Execute

DOMExecutor runs the action via Playwright — click, type, hover, scroll, drag, navigate, evaluate, etc. VisionExecutor is the fallback when the chosen action targets pixels rather than DOM nodes.
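Routing between the two executors can be expressed as an exhaustive when over a sealed target type. Target and ActionRouter are hypothetical. In a Playwright for Java setup the two lambdas could wrap calls such as page.locator(selector).click() and page.mouse().click(x, y), but that wiring is an assumption about how DOMExecutor and VisionExecutor are built.

```kotlin
// Hypothetical target model: an action addresses either a DOM node or a screen point.
sealed class Target {
    data class Dom(val selector: String) : Target()
    data class Pixel(val x: Int, val y: Int) : Target()
}

/** Sends DOM targets to the DOM executor and pixel targets to the vision executor. */
class ActionRouter(
    private val domClick: (selector: String) -> Unit,
    private val visionClick: (x: Int, y: Int) -> Unit
) {
    // when used as an expression must be exhaustive, so a new Target kind needs a branch.
    fun click(target: Target): Unit = when (target) {
        is Target.Dom -> domClick(target.selector)
        is Target.Pixel -> visionClick(target.x, target.y)
    }
}
```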

Each action runs under a bounded timeout; on transient failure the agent retries up to MAX_RETRIES_PER_STEP=3 times within the same step before propagating the failure to the planner.
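A per-step retry helper, assuming a hypothetical TransientFailure exception marks retryable errors, and that the per-action timeout is enforced by the action itself (Playwright actions accept a timeout option) and surfaces as such an exception. The sketch treats MAX_RETRIES_PER_STEP as the total number of attempts; if the constant counts retries after the first attempt, pass it plus one.

```kotlin
// Hypothetical marker for retryable failures (a timeout, a detached element).
class TransientFailure(message: String) : RuntimeException(message)

/**
 * Runs [attempt] up to [maxAttempts] times within one step. Transient failures are
 * retried; any other exception, or the last transient one, is returned as a failed
 * Result for the planner to handle.
 */
fun <T> runWithRetries(maxAttempts: Int = 3, attempt: (n: Int) -> T): Result<T> {
    var last: Throwable? = null
    for (n in 1..maxAttempts) {
        try {
            return Result.success(attempt(n))
        } catch (e: TransientFailure) {
            last = e // retry on the next loop iteration
        } catch (e: Exception) {
            return Result.failure(e) // not retryable: hand back immediately
        }
    }
    return Result.failure(last ?: IllegalStateException("maxAttempts must be >= 1"))
}
```

Non-transient errors return immediately instead of consuming the remaining attempts, since retrying a selector that does not exist is unlikely to help.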

Phase 4

Verify

Verifier compares before/after PageState (URL + DOM hash + visible text). The comparison is pure and stateless — the same inputs always produce the same verdict.
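A pure verifier in Kotlin. PageState's field names and the use of SHA-256 for the DOM hash are assumptions; the page only says the comparison uses the URL, a DOM hash, and visible text.

```kotlin
import java.security.MessageDigest

// Hypothetical snapshot shape built from the three inputs named above.
data class PageState(val url: String, val domHash: String, val visibleText: String)

/** Hex-encoded SHA-256: one plausible way to produce the DOM hash. */
fun sha256(s: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(s.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

enum class Verdict { CHANGED, UNCHANGED }

/** Pure: reads only its arguments, so equal inputs always give equal verdicts. */
fun verify(before: PageState, after: PageState): Verdict =
    if (before.url == after.url &&
        before.domHash == after.domHash &&
        before.visibleText == after.visibleText
    ) Verdict.UNCHANGED else Verdict.CHANGED
```

Because verify depends on nothing but its two arguments, it can be unit-tested with hand-built states and re-run against recorded trajectories.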

If Verify reports no page change for STUCK_THRESHOLD=3 consecutive steps, the agent enters STUCK state and asks the planner for a recovery action (refresh, scroll, alternate selector). If recovery fails, the run halts with diagnostic output rather than spinning forever.
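The STUCK condition is another streak counter, this time over Verify results. StuckDetector is a hypothetical name; requesting a recovery action from the planner and halting after a failed recovery are left to the caller.

```kotlin
/** Counts consecutive "page unchanged" verdicts and reports when STUCK is reached. */
class StuckDetector(private val threshold: Int = 3) {
    var unchangedStreak = 0
        private set

    /** Feed one Verify result; returns true once the streak reaches [threshold]. */
    fun record(pageChanged: Boolean): Boolean {
        unchangedStreak = if (pageChanged) 0 else unchangedStreak + 1
        return unchangedStreak >= threshold
    }
}
```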

Phase 5

Record

TrajectoryRecorder persists every observation, LLM call, action, and verification result to SQLite at ~/.automator/automator.db. Screenshots are stored in ~/.automator/screenshots/<run-id>/.
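Resolving the storage layout with java.nio.file keeps both locations in one place. AutomatorPaths, screenshotFile, and the step-NNN.png naming are hypothetical; only the two directory locations come from the text. Opening the database itself would also need a SQLite JDBC driver, which this sketch leaves out.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// The two storage locations described above, resolved against a home directory.
data class AutomatorPaths(val database: Path, val screenshots: Path)

fun automatorPaths(home: Path = Paths.get(System.getProperty("user.home"))): AutomatorPaths {
    val root = home.resolve(".automator")
    return AutomatorPaths(
        database = root.resolve("automator.db"),
        screenshots = root.resolve("screenshots")
    )
}

/** ~/.automator/screenshots/<run-id>/, refusing ids that could escape the directory. */
fun screenshotDir(paths: AutomatorPaths, runId: String): Path {
    require(runId.isNotBlank() && runId != "." && runId != ".." && '/' !in runId && '\\' !in runId) {
        "unsafe run id: $runId"
    }
    return paths.screenshots.resolve(runId)
}

// Hypothetical per-step file naming, e.g. step-007.png.
fun screenshotFile(paths: AutomatorPaths, runId: String, step: Int): Path =
    screenshotDir(paths, runId).resolve("step-%03d.png".format(step))
```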

TrajectoryExporter dumps the same data as JSONL for supervised fine-tuning corpora. TrajectoryRAG indexes successful runs so future plans can retrieve them as in-context exemplars — the data flywheel.
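A sketch of both halves of the flywheel under stated assumptions: StepRecord's fields are hypothetical, the JSON is hand-escaped to stay dependency-free, and retrieveExemplars ranks by plain word overlap as a stand-in for whatever retrieval TrajectoryRAG actually performs (it may well use embeddings).

```kotlin
// Hypothetical record shape; the real exporter's fields are not documented here.
data class StepRecord(
    val runId: String, val step: Int, val observation: String, val action: String, val success: Boolean
)

/** Minimal JSON string escaping: quotes, backslashes, and control characters. */
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) when {
        c == '"' -> append("\\\"")
        c == '\\' -> append("\\\\")
        c == '\n' -> append("\\n")
        c == '\r' -> append("\\r")
        c == '\t' -> append("\\t")
        c < ' ' -> append("\\u%04x".format(c.code))
        else -> append(c)
    }
    append('"')
}

/** One JSON object per line, the JSONL layout used for fine-tuning corpora. */
fun toJsonl(records: List<StepRecord>): String = records.joinToString("\n") { r ->
    "{\"run_id\":${jsonString(r.runId)},\"step\":${r.step}," +
        "\"observation\":${jsonString(r.observation)},\"action\":${jsonString(r.action)}," +
        "\"success\":${r.success}}"
}

/** Toy retrieval: successful records ranked by word overlap with the goal. */
fun retrieveExemplars(goal: String, records: List<StepRecord>, k: Int = 3): List<StepRecord> {
    fun words(s: String) = s.lowercase().split(Regex("\\W+")).filter { it.isNotEmpty() }.toSet()
    val goalWords = words(goal)
    return records.filter { it.success }
        .map { it to (words(it.observation) intersect goalWords).size }
        .filter { it.second > 0 }
        .sortedByDescending { it.second }
        .map { it.first }
        .take(k)
}
```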

State machine

AgentState transitions are explicit. Because this page is a static export, the diagram is described as text rather than rendered; the source of truth is agent/AgentState.kt.

IDLE
  ↓ run()
PERCEIVING
  ↓ pageSummary
PLANNING
  ↓ action (or LLM_FAIL → PLANNING up to MAX_CONSECUTIVE_LLM_FAILURES)
EXECUTING
  ↓ executed (or RETRY → EXECUTING up to MAX_RETRIES_PER_STEP)
VERIFYING
  ↓ verdict
RECORDING
  ↓ persisted
  ↓ if step < MAX_STEPS and not Done → PERCEIVING
  ↓ if Done → COMPLETE
  ↓ if STUCK_THRESHOLD reached → STUCK → recovery → PERCEIVING or FAILED
  ↓ if exhausted retries → FAILED
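The RECORDING branches of the diagram can be written as a single decision function. This is a sketch, not agent/AgentState.kt: StepOutcome is hypothetical, the order in which conditions are checked is an assumption, and the diagram does not say what happens when MAX_STEPS is reached without Done, so the sketch treats that case as FAILED.

```kotlin
enum class AgentState { IDLE, PERCEIVING, PLANNING, EXECUTING, VERIFYING, RECORDING, STUCK, COMPLETE, FAILED }

// Hypothetical summary of one step, as seen from RECORDING.
data class StepOutcome(
    val step: Int,                 // steps completed so far in this run
    val done: Boolean,             // planner emitted Done
    val stuck: Boolean,            // STUCK_THRESHOLD reached
    val retriesExhausted: Boolean  // a retry cap was hit during the step
)

/** Next state after RECORDING. The check order is an assumption. */
fun afterRecording(o: StepOutcome, maxSteps: Int = 50): AgentState = when {
    o.done -> AgentState.COMPLETE
    o.retriesExhausted -> AgentState.FAILED
    o.stuck -> AgentState.STUCK
    o.step < maxSteps -> AgentState.PERCEIVING
    else -> AgentState.FAILED // step cap reached without Done (assumed FAILED)
}

/** STUCK asks the planner for a recovery action; its result decides the next state. */
fun afterRecovery(recovered: Boolean): AgentState =
    if (recovered) AgentState.PERCEIVING else AgentState.FAILED
```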

Why bounded loops matter

An autonomous browser agent that loops forever is a liability. Every retry cap above is a deliberate halt condition — the agent stops and surfaces diagnostics rather than spending your CPU and tokens chasing a phantom selector.

When the agent halts in STUCK or FAILED state, the trajectory is still recorded. You can replay it, diff it against a known-good bench run, or upload it to share a regression case.