
Sentience SDK Overview

Sentience is like Jest for AI web agents. It provides semantic snapshots and assertions so agents can act deterministically and verify results — without vision models or brittle CSS selectors.

The SDK runs standalone with Playwright or embeds under agent frameworks like browser-use. Gateway API is optional — the core features work locally with just the browser extension.

How It Works

Sentience operates in two modes:

Mode A: Local (Extension-Only)

  1. SDK calls the browser extension to collect raw + semantic geometry
  2. SDK produces a snapshot prompt (structured text, ~0.6–1.2k tokens)
  3. Agent selects an action (click, type, scroll)
  4. SDK executes via the backend protocol
  5. SDK runs assert_ checks and records the trace

No server calls required. Works offline and with local LLMs.
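The five steps above can be sketched as a simple loop. Everything here (StubBackend, pick_action, the element shapes) is a hypothetical stand-in to illustrate the flow, not the Sentience API:

```python
def collect_snapshot(backend):
    # Step 1: the extension returns elements with semantic geometry
    return backend.snapshot()

def to_prompt(elements):
    # Step 2: compact structured text (~0.6-1.2k tokens)
    return "\n".join(f"{e['id']}|{e['role']}|{e['text']}|{e['importance']}" for e in elements)

def pick_action(elements):
    # Step 3: stand-in "agent" policy: click the most important button
    target = max((e for e in elements if e["role"] == "button"), key=lambda e: e["importance"])
    return {"action": "click", "id": target["id"]}

class StubBackend:
    """Hypothetical backend; the real SDK talks to the browser extension."""
    def snapshot(self):
        return [
            {"id": 42, "role": "button", "text": "Add to Cart", "importance": 95},
            {"id": 43, "role": "link", "text": "Product Details", "importance": 70},
        ]
    def execute(self, action):
        self.last = action  # Step 4: dispatch via the backend protocol

backend = StubBackend()
elements = collect_snapshot(backend)
prompt = to_prompt(elements)
action = pick_action(elements)
backend.execute(action)
# Step 5: a trivial assert_-style check over a fresh snapshot
assert any(e["role"] == "button" for e in collect_snapshot(backend))
print(action)  # {'action': 'click', 'id': 42}
```

The real loop would record each step to the trace as well; the stub keeps only the last executed action.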

Mode B: Gateway (Optional)

Same flow as above, with an additional Gateway API call between steps 1 and 2.

Gateway is optional — use it for production accuracy boosts, not required for development.

What's in a Semantic Snapshot

Each snapshot contains elements with these key fields:

Field               Description
id                  Stable element identifier
role                Semantic role (button, link, input, etc.)
text                Normalized visible text
importance          Relevance score (0–100)
bbox                Bounding box coordinates
doc_y / center_y    Document and viewport Y positions
group_key           Geometric bucket for ordinal grouping
group_index         Position within group (0-indexed)
in_dominant_group   Whether element is in main content area
in_viewport         Currently visible on screen
is_occluded         Covered by overlay/modal
href                Link URL (for anchor elements)
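One way to model these fields is as a typed record. This sketch uses Python's TypedDict with names taken from the table above; the SDK's actual types may differ:

```python
from typing import TypedDict

class SnapshotElement(TypedDict, total=False):
    """One element from a semantic snapshot (fields as listed above)."""
    id: int                  # stable element identifier
    role: str                # button, link, input, ...
    text: str                # normalized visible text
    importance: int          # relevance score, 0-100
    bbox: tuple[float, float, float, float]  # bounding box coordinates
    doc_y: float             # document Y position
    center_y: float          # viewport Y position
    group_key: str           # geometric bucket for ordinal grouping
    group_index: int         # position within group, 0-indexed
    in_dominant_group: bool  # in main content area
    in_viewport: bool        # currently visible on screen
    is_occluded: bool        # covered by overlay/modal
    href: str                # link URL (anchor elements only)

el: SnapshotElement = {"id": 42, "role": "button", "text": "Add to Cart", "importance": 95}
print(el["text"])  # Add to Cart
```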

Compact format example:

ID|role|text|imp|docYq|ord|DG|href
42|button|Add to Cart|95|480|0|true|
43|link|Product Details|70|520|1|true|/products/123
44|button|Buy Now|90|480|2|true|
45|link|Reviews|60|560|3|true|#reviews
46|input|Search|80|100|0|false|

This structured format lets LLMs make deterministic selections without parsing raw HTML.
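A few lines of Python can parse the compact format back into records. This is only a sketch of the wire format shown above; the SDK emits and consumes this format for you:

```python
def parse_compact(snapshot: str) -> list[dict]:
    """Parse ID|role|text|imp|docYq|ord|DG|href lines into dicts."""
    header, *rows = snapshot.strip().splitlines()
    keys = header.split("|")
    out = []
    for row in rows:
        rec = dict(zip(keys, row.split("|")))
        # Coerce the numeric and boolean columns
        for k in ("ID", "imp", "docYq", "ord"):
            rec[k] = int(rec[k])
        rec["DG"] = rec["DG"] == "true"
        out.append(rec)
    return out

snapshot = """ID|role|text|imp|docYq|ord|DG|href
42|button|Add to Cart|95|480|0|true|
43|link|Product Details|70|520|1|true|/products/123"""

elements = parse_compact(snapshot)
print(elements[0]["text"])   # Add to Cart
print(elements[1]["href"])   # /products/123
```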

Token Usage

Semantic snapshots are designed for token efficiency: instead of raw HTML, the prompt contains one compact line per element, keeping the whole snapshot in the ~0.6–1.2k token range noted above.

Token savings vary by page complexity, but the bounded format consistently keeps prompts small enough for local LLMs like Qwen 2.5 3B.
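As a rough illustration, using the common (and only approximate) ~4 characters per token heuristic, even a 100-element snapshot lands near the lower end of the stated token range:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English-like text.
    # Real tokenizers vary; this is for illustration only.
    return max(1, len(text) // 4)

line = "42|button|Add to Cart|95|480|0|true|"
page = "\n".join(line for _ in range(100))  # a 100-element snapshot

print(estimate_tokens(line))         # 9: one compact line costs a handful of tokens
print(estimate_tokens(page) < 2000)  # True: well within a small prompt budget
```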

Assertions (assert_)

Assertions are machine-verifiable checks over fresh snapshots. They help agents know when they're "done" and prevent drift.

Available assertions:

Assertion                        Description
assert_(selector)                Verify element exists and is visible
assert_(selector, text="...")    Verify element contains expected text
assert_done(condition)           Mark task as complete when condition met
url_matches(pattern)             Check current URL matches regex
url_contains(substring)          Check URL contains string
exists(selector)                 Element exists in snapshot

Example usage:

await runtime.snapshot()
runtime.assert_(exists("role=button"), label="has_buttons")
runtime.assert_(url_contains("checkout"), label="on_checkout")

if runtime.assert_done(exists("text~'Order confirmed'"), label="complete"):
    print("Task finished!")

browser-use Integration

Sentience integrates with browser-use via a backend adapter, so an existing browser-use session gains semantic snapshots and assert_ checks.

Installation:

pip install "sentience[browser-use]"

Usage:

from browser_use import BrowserSession, BrowserProfile
from sentience import get_extension_dir
from sentience.backends import BrowserUseAdapter
from sentience.agent_runtime import AgentRuntime

# Set up browser-use with the Sentience extension loaded
profile = BrowserProfile(args=[f"--load-extension={get_extension_dir()}"])
session = BrowserSession(browser_profile=profile)
await session.start()

# Create the backend adapter
adapter = BrowserUseAdapter(session)
backend = await adapter.create_backend()

# Create a runtime for assertions
# (tracer: a trace recorder created elsewhere; see Traces and Replay)
runtime = AgentRuntime(backend=backend, tracer=tracer)
await runtime.snapshot()
runtime.assert_(exists("role=button"), label="ready")

See the browser-use integration guide for full details.


Traces and Replay

Every agent run produces a trace: a step-by-step record of the actions taken and the assertion results, which can be replayed and inspected after the run.

Traces can be stored locally (JSON/JSONL) or uploaded to Sentience Studio for visual debugging.
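A local trace can be as simple as one JSON object per step appended to a JSONL file. This sketches the storage idea only; the SDK's actual trace schema may differ:

```python
import json
from pathlib import Path

def append_step(path: Path, step: dict) -> None:
    """Append one step record to a JSONL trace file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(step) + "\n")

def load_trace(path: Path) -> list[dict]:
    """Read the whole trace back for replay or debugging."""
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]

trace = Path("run_trace.jsonl")
trace.unlink(missing_ok=True)  # start a fresh run
append_step(trace, {"step": 1, "action": "click", "target": 42, "ok": True})
append_step(trace, {"step": 2, "assert": "on_checkout", "passed": True})
print(len(load_trace(trace)))  # 2
```

Appending line-by-line means a crashed run still leaves a readable partial trace.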

Open Source Repositories

Next Steps