How SentienceAPI Works

Rendered DOM snapshots (post-hydration), ordinality, and Jest-style assertions for AI web agents.

"Think Jest for AI web agents — deterministic selection, pass/fail verification, explainable failures."

Not HTML parsing. Sentience snapshots the rendered DOM + layout from a live browser after SPA hydration. It works on JS-heavy sites because it captures post-hydration state.

Local-first by default. When the page is unstable, we resnapshot, then reset to checkpoint. Only if snapshot attempts are exhausted do we use optional vision fallback or model escalation.

Most tools help agents load pages or read content.
SentienceAPI helps agents act — and verify — reliably.

The Problem: Agents fail at execution, not reasoning

Modern LLMs are good at deciding what they want to do.
They are unreliable at deciding where to do it — and whether it worked.

Clicking invisible or occluded elements
Guessing between similar buttons
No way to verify actions succeeded
Defaulting to vision loops when DOM structure is available
Non-reproducible behavior across runs
No deterministic way to select "3rd result"
No explicit failure policy (retry/reset/escalate)

Sentience fixes this with semantic snapshots, ordinal selection, and built-in assertions.

Failure modes are explicit: resnapshot → reset → fallback → escalate.

Two Paths: Local or Gateway

Sentience works entirely locally with the browser extension, or you can add the optional Gateway for ML-powered reranking.

Local Mode

Extension-only • Free

Rendered DOM snapshots
Post-hydration state, not static HTML
Ordinal selection
group_key, group_index, in_dominant_group
Layout detection
Grid positions, regions, parent/child
Jest-style assertions
assert_, assert_done for verification

Works on SPAs because snapshots are taken from the live rendered page, not static HTML.

Best for: Development, testing, cost-sensitive production, frameworks like browser-use

Gateway Mode

Cloud API • Pro/Enterprise

Everything in Local, plus:
ML-powered reranking
ONNX model for optimal element selection
Goal-conditioned reranking
Improves target ordering when you provide a goal
Cloud trace storage
Sentience Studio for team debugging

Best for: Production agents, maximum accuracy, team collaboration, observability

Both Modes Include Full Tracing

Every snapshot recorded
Replay, diff, debug any run
Local JSON or Studio
Confidence + reasons on instability

What's Inside a Semantic Snapshot

Each snapshot contains ~0.6–1.2k tokens per step — enough for your LLM to make deterministic decisions.

With structured snapshots, 3B-class models become viable. Larger models (7B/14B+) still help with planning and recovery, but they're no longer required just to operate a browser.

Example Element in Snapshot

{
  "id": 42,
  "role": "button",
  "text": "Add to Cart",
  "importance": 95,
  "bbox": { "x": 320, "y": 480, "width": 120, "height": 40 },
  "in_viewport": true,
  "is_occluded": false,
  "visual_cues": {
    "is_primary": true,
    "is_clickable": true,
    "background_color_name": "green"
  },
  // Ordinal fields for "click 3rd result" queries
  "group_key": "480_main",
  "group_index": 2,
  "in_dominant_group": true,
  // Layout detection
  "layout": {
    "grid_id": 1,
    "grid_pos": { "row_index": 0, "col_index": 2 },
    "region": "main"
  }
}

Snapshot Metadata

{
  "snapshot_confidence": 0.92,
  "stability_reasons": [],  // empty = stable
  // On unstable pages:
  // "stability_reasons": ["dom_unstable", "layout_shifting"]
}
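
Your agent can gate actions on this metadata before it commits to a target. A minimal sketch in Python, assuming dict-style access to the fields shown above (the access pattern and the 0.8 threshold are illustrative assumptions):

# Minimal sketch: only act on a stable snapshot.
# Dict-style access and the 0.8 threshold are assumptions for illustration.
from sentience import SentienceBrowser, snapshot

browser = SentienceBrowser(api_key="sk_live_...")
browser.start()
browser.page.goto("https://example.com")

snap = snapshot(browser)
if snap["snapshot_confidence"] < 0.8 or snap["stability_reasons"]:
    # e.g. ["dom_unstable", "layout_shifting"]: the page is still settling,
    # so take a fresh snapshot before selecting a target
    snap = snapshot(browser)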

Core Fields

  • id – stable element identifier
  • role – button, link, input, etc.
  • text – visible label
  • bbox – exact pixel coordinates

Visibility

  • in_viewport – currently visible
  • is_occluded – covered by overlay
  • importance – relevance score
  • is_primary – main CTA

Ordinal Selection

  • group_key – geometric bucket
  • group_index – position in group
  • in_dominant_group – main content
  • grid_pos – row/column indices

~0.6–1.2k tokens per snapshot — compare to 10–50k+ tokens for vision-based approaches
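
These ordinal fields are what make a request like "click the 3rd result" deterministic. A minimal sketch, assuming the snapshot exposes its elements as a list of dicts shaped like the example element above (that list access is the assumption here):

# Minimal sketch: resolve "the 3rd result" from ordinal fields.
# Assumes snap["elements"] yields dicts shaped like the example element above.
from sentience import SentienceBrowser, snapshot, click

browser = SentienceBrowser(api_key="sk_live_...")
browser.start()
browser.page.goto("https://example.com/search?q=headphones")

snap = snapshot(browser)
third_result = next(
    el for el in snap["elements"]
    if el["in_dominant_group"] and el["group_index"] == 2  # group_index is 0-based
)
click(browser, third_result["id"])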

How It Works (5 Steps + Failure Policy)

Step 1: Your Agent Defines the Goal

Your agent (LLM + logic) decides what it wants to do:

  • "Click the third search result"
  • "Add the item to cart"
  • "Assert the confirmation message appears"

Sentience does not replace planning or reasoning.

Step 2: The Sentience SDK Controls the Browser

Using the Sentience SDK (Python or TypeScript), your agent:

  • launches a real browser
  • navigates pages (waits for SPA hydration)
  • requests a snapshot()

This snapshot is not raw HTML. Not screenshots by default. It's the rendered DOM + layout signals captured from the live page after hydration.
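
In code, that flow looks roughly like this, using the same SDK calls as the full example at the end of this page (the URL is a placeholder):

# Minimal sketch of Step 2: launch, navigate, snapshot.
from sentience import SentienceBrowser, snapshot

browser = SentienceBrowser(api_key="sk_live_...")
browser.start()                                   # launches a real browser
browser.page.goto("https://example.com/search")   # navigation waits for SPA hydration
snap = snapshot(browser)                          # rendered DOM + layout, not raw HTML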

Step 3: Browser + WASM Capture Post-Hydration State

A lightweight browser extension captures from the rendered page:

  • post-hydration DOM state (JS-rendered content)
  • element bounding boxes (x, y, w, h)
  • visibility and occlusion at time of action
  • layout structure and stable coordinates

No inference. No guessing. Ground truth from the live browser.

Step 4: Deterministic Actions Execute

Your agent selects a target from the snapshot and executes:

  • click("Add to Cart")
  • type("search input", "query")
  • Use ordinals: group_index=2 for "3rd result"
  • Use grids: row_index=0, col_index=2

Actions execute exactly where intended — no coordinate guessing.
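
Grid coordinates give the same determinism for card layouts. A minimal sketch, again assuming list/dict access to the snapshot's elements (the specific row/column filter is illustrative):

# Minimal sketch of Step 4: pick a target by grid position, then act on it.
# Assumes snap["elements"] carries the "layout" block shown earlier.
from sentience import SentienceBrowser, snapshot, click

browser = SentienceBrowser(api_key="sk_live_...")
browser.start()
browser.page.goto("https://example.com/catalog")

snap = snapshot(browser)
target = next(
    el for el in snap["elements"]
    if el.get("layout", {}).get("grid_pos") == {"row_index": 0, "col_index": 2}
)
click(browser, target["id"])   # executes at the element's exact bbox, no coordinate guessing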

Failure Policy: Explicit Recovery

When actions fail or pages are unstable, Sentience follows a deterministic escalation path:

1. Resnapshot: when the DOM is unstable, take a fresh snapshot
2. Reset to checkpoint: on repeated failures, return to a known state
3. Vision fallback (optional): only when snapshot attempts are exhausted
4. Model escalation (optional): a bigger local model or cloud API when needed

Assertions remain the verifier throughout — pass or fail, not "maybe."
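
Wired into an agent loop, the first two rungs of that ladder might look like the sketch below. The retry count, the checkpoint URL, and the assumption that find() returns None when nothing matches are all illustrative, not SDK guarantees:

# Minimal sketch of the escalation ladder: resnapshot -> reset -> escalate.
# Retry count, checkpoint URL, and find()-returns-None are assumptions.
from sentience import SentienceBrowser, snapshot, find, click

browser = SentienceBrowser(api_key="sk_live_...")
browser.start()
CHECKPOINT_URL = "https://example.com/cart"        # known-good state to reset to

def try_click(query: str, attempts: int = 3) -> bool:
    for _ in range(attempts):
        snap = snapshot(browser)                   # 1. resnapshot on every attempt
        if snap["stability_reasons"]:
            continue                               #    page unstable: snapshot again
        target = find(snap, query)
        if target is not None:
            click(browser, target.id)
            return True
        browser.page.goto(CHECKPOINT_URL)          # 2. reset to a known checkpoint
    return False                                   # 3./4. caller falls back to vision
                                                   #       or escalates the model

if not try_click("role=button text~'checkout'"):
    raise RuntimeError("snapshot attempts exhausted; escalate")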

Step 5: Verify with Jest-Style Assertions

Like Jest for web automation — assert expectations, not hope:

  • assert_("Order confirmed") — verify text appears
  • assert_("cart badge", text="3") — verify content
  • assert_done("checkout complete") — task completion

Assertions use the same semantic snapshot — deterministic, traceable verification.
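
Strung together after a checkout flow, the verification step reads like a test. A minimal sketch mirroring the calls above; the import path and whether these helpers also take a browser or snapshot handle are assumptions:

# Minimal sketch of Step 5: verify, don't hope.
# Import path and exact signatures are assumptions; calls mirror the bullets above.
from sentience import assert_, assert_done

assert_("Order confirmed")          # fails the run if the text never appears
assert_("cart badge", text="3")     # verify specific content, not just presence
assert_done("checkout complete")    # mark the task verified complete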

Want to see this in action?

Run a live example using the Sentience SDK — no setup required.
👉 Try it live

What Makes This Different

Sentience vs Browser Infrastructure

Browser infrastructure gives you a place to run code.

Sentience gives your agent certainty about where to act.

Without grounded action selection, agents still guess.

Sentience vs Scrapers / Read APIs

Scrapers parse static HTML. Sentience snapshots the rendered DOM after SPA hydration.

Scrapers don't tell agents:

  • what is clickable
  • what is visible
  • where it is on screen

Reading ≠ acting. Sentience is for agents that must interact.

Works with browser-use

Already using browser-use? Sentience integrates seamlessly via BrowserUseAdapter — just swap your backend and get semantic snapshots + assertions.

View browser-use integration guide
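
A rough sketch of the swap; the import path and constructor arguments are assumptions, so follow the integration guide above for the exact wiring:

# Minimal sketch: route an existing browser-use agent through Sentience.
# The import path and constructor arguments here are assumptions.
from sentience.integrations.browser_use import BrowserUseAdapter

adapter = BrowserUseAdapter(api_key="sk_live_...")
# Hand `adapter` to your browser-use agent as its browser backend; the agent
# then receives semantic snapshots and can call Sentience assertions.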

What the Agent Actually Receives

Instead of pixels (~10–50k tokens) or raw DOM, your agent gets a compact semantic snapshot:

Ranked actionable elements
~0.6–1.2k tokens per step
Ordinal selection fields
group_key, group_index, in_dominant_group
Layout detection
Grid positions, regions, parent/child
Visibility signals
in_viewport, is_occluded, is_primary

Token Efficiency Comparison:

Vision: 10–50k tokens/step
Raw DOM: 5–20k tokens
Sentience: ~0.6–1.2k tokens

Built-In Observability (Traces & Studio)

Every step is recorded automatically — use local JSON traces or Sentience Studio:

snapshots
ranked targets
chosen action
execution result

These traces power:

step-by-step replay
visual debugging
determinism diffing
CI-style validation

When something fails, you get a reasoned failure artifact — not a vague LLM apology.
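
For local runs, that artifact is plain JSON you can load, replay, and diff. A minimal sketch; the file location and field names are hypothetical placeholders illustrating the kind of record listed above, not a documented schema:

# Minimal sketch: inspect a local JSON trace for a failed step.
# File path and keys are hypothetical placeholders, not a documented schema.
import json

with open("traces/run-001.json") as f:
    trace = json.load(f)

for step in trace["steps"]:
    if not step["result"]["success"]:
        print(step["action"], step["result"])   # the reasoned failure artifact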

When You Should Use Sentience

Sentience is designed for:

  • Agents that must act, not just read
  • Production workflows where retries are expensive
  • Systems that need auditability and replay
  • Teams debugging real-world agent failures

If your agent only reads text, Sentience is unnecessary.

If your agent must click, type, scroll, or submit — Sentience is the missing layer.

Try It Live

If you're building agents that must act, SentienceAPI is the missing layer.

Explore interactive SDK examples or test the API directly with real automation scenarios

Navigate to a login page, find email/password fields semantically, and submit the form.

# No selectors. No vision. Stable semantic targets.
from sentience import SentienceBrowser, snapshot, find, click, type_text, wait_for

# Initialize browser with API key
browser = SentienceBrowser(api_key="sk_live_...")
browser.start()

# Navigate to login page
browser.page.goto("https://example.com/login")

# PERCEPTION: Find elements semantically
snap = snapshot(browser)
email_field = find(snap, "role=textbox text~'email'")
password_field = find(snap, "role=textbox text~'password'")
submit_btn = find(snap, "role=button text~'sign in'")

# ACTION: Interact with the page
type_text(browser, email_field.id, "user@example.com")
type_text(browser, password_field.id, "secure_password")
click(browser, submit_btn.id)

# VERIFICATION: Wait for navigation
wait_for(browser, "role=heading text~'Dashboard'", timeout=5.0)

print("✅ Login successful!")
browser.close()

🎯 Semantic Discovery

Find elements by role, text, and visual cues - not fragile CSS selectors

⚡ Token Optimization

Intelligent filtering reduces token usage by up to 73% vs vision models

🔒 Deterministic

Same input produces same output every time - no random failures

SentienceAPI focuses on execution intelligence. Browser runtimes and navigation engines are intentionally decoupled.