PydanticAI docs: Pydantic AI

Use PydanticAI as the orchestration layer while keeping Sentience as the browser capability layer with typed tools, bounded context, and action verification.

This guide shows:

- Bounded snapshots: snapshot uses limit=50 by default.
- Typed actions (click, type_text, press_key) plus lightweight guards (verify_url_matches, verify_text_present) to build reliable flows.
- Structured tracing of every tool call (remember to flush with tracer.close()).

From the Python SDK:
```bash
pip install sentienceapi[pydanticai]
```

Sentience provides a small integration layer:
- `SentiencePydanticDeps`: a deps container (DI) for PydanticAI
- `register_sentience_tools(agent)`: registers the Sentience tools on your PydanticAI agent

Imports:

```python
from sentience import AsyncSentienceBrowser
from sentience.integrations.pydanticai import SentiencePydanticDeps, register_sentience_tools
```

PydanticAI passes dependencies through `ctx.deps`. We inject:
- `browser: AsyncSentienceBrowser`
- `tracer: sentience.tracing.Tracer`

```python
deps = SentiencePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)
```

Registered tools include:

**Observation**
| Tool | Description |
|---|---|
| `snapshot_state(limit=50, include_screenshot=False)` | Bounded `BrowserState(url, elements[])` |
| `read_page(format="text" \| "markdown" \| "raw")` | Returns `ReadResult` |
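As a rough sketch of how these can be called outside the agent loop, using the `tools` dict and `ctx` shim from the direct-call example near the end of this guide (only `url` and `elements[]` are documented fields, so treat anything else as an assumption):

```python
# Sketch only: "tools" and "ctx" follow the direct-call pattern shown later
# in this guide (register_sentience_tools returns the tool callables).
state = await tools["snapshot_state"](ctx, limit=50)
print(state.url)            # documented: BrowserState.url
print(len(state.elements))  # documented: the bounded elements[] list

page = await tools["read_page"](ctx, format="markdown")
print(page)                 # a ReadResult; inspect it for the extracted content
```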
**Actions**

| Tool | Description |
|---|---|
| `click(element_id)` | Click a specific element by ID |
| `type_text(element_id, text)` | Type text into an element |
| `press_key(key)` | Send a keypress (e.g., `"Enter"`) |
| `scroll_to(element_id, behavior, block)` | Scroll an element into view |
| `navigate(url)` | Navigate to a URL |
| `click_rect(x, y, width, height, button, click_count)` | Click by pixel coordinates |
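For instance, a type-and-submit flow can be sketched with the same direct-call pattern (the element ID here is hypothetical; in practice it comes from a prior `snapshot_state()` call, and the keyword names follow the table above):

```python
# Sketch only: search_box_id is a hypothetical ID from snapshot_state().
await tools["navigate"](ctx, url="https://example.com/search")
await tools["type_text"](ctx, element_id=search_box_id, text="sentience sdk")
await tools["press_key"](ctx, key="Enter")
```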
**Text finding**

| Tool | Description |
|---|---|
| `find_text_rect(text, case_sensitive=False, whole_word=False, max_results=10)` | Find text coordinates on the page |
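A quick sketch of a lookup, using the result fields that appear in the full direct-call example later in this guide (`status`, `results`, `rect`, `in_viewport`):

```python
# Sketch only: same direct-call pattern as the full example below.
matches = await tools["find_text_rect"](ctx, "Sign In", max_results=5)
for m in matches.results:
    # Each match carries a rectangle and a viewport-visibility flag.
    print(m.rect.x, m.rect.y, m.rect.width, m.rect.height, m.in_viewport)
```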
**Verification**

| Tool | Description |
|---|---|
| `verify_url_matches(pattern)` | Check that the URL contains a pattern |
| `verify_text_present(text, format, case_sensitive)` | Check that text appears on the page |
| `assert_eventually_url_matches(pattern, timeout_s, poll_s)` | Wait for the URL to match a pattern |
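A post-click guard might look like the sketch below (parameter names follow the table above; the return shapes aren't documented here, so treat them as opaque results to inspect):

```python
# Sketch only: guard a click that should land on the checkout page.
await tools["click"](ctx, element_id=checkout_button_id)  # hypothetical ID

# Immediate check: does the current URL contain the pattern?
await tools["verify_url_matches"](ctx, pattern="/checkout")

# Or poll until the URL matches (or timeout_s elapses).
await tools["assert_eventually_url_matches"](
    ctx, pattern="/checkout", timeout_s=10, poll_s=0.5
)
```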
Notes:
- Keep `limit` capped unless you explicitly need more.
- `type_text` tracing intentionally avoids recording the full text payload to reduce accidental PII leakage.

**snapshot_state(...)**

- Bounds the snapshot size (`limit`, default 50) and returns a typed summary of interactive elements.
- Use the returned element IDs with `click`, `type_text`, `scroll_to`.

**read_page(...)**

- `format="text"` for simple checks
- `format="markdown"` for more structured extraction

All tools are async because they drive a live browser session and often wait for navigation/DOM updates, so always `await` them (e.g., `await browser.goto(...)`, `await scroll_to_async(...)`). Sync variants exist (`click(...)`, `type_text(...)`, `scroll_to(...)`, `snapshot(...)`) for non-PydanticAI usage, but the PydanticAI toolset is designed to be async-first.

**click(element_id)**

- Use after `snapshot_state()` when you have a target button/link.
- Async (the integration will `await` it internally).

**type_text(element_id, text)**

- Types text into the element identified by `element_id`.
- Async (the integration will `await` it internally).

**press_key(key)**

- Sends a named key (e.g., `"Enter"`, `"Escape"`, `"Tab"`).
- Common pattern: `type_text(...)` followed by `press_key("Enter")`.
- Async (the integration will `await` it internally).

**scroll_to(element_id, ...)**

- Use when `snapshot_state()` contains your element but it's not in the viewport.
- Async (the integration will `await` it internally).

**navigate(url)**

- Navigates the browser (wraps `page.goto` through `AsyncSentienceBrowser.goto`).
- Async (the integration will `await` it internally).

**click_rect(x, y, width, height, ...)**

- Use when you know where something is (e.g., from `find_text_rect`) but don't have a stable element id.
- Example: `find_text_rect("Sign In")` → click the first visible match's rectangle center.
- Async (the integration will `await` it internally).

The verification tools are best used after an action to confirm the browser is now in the expected state:
- `verify_url_matches(pattern)`: e.g., confirm the URL now contains `/checkout`.
- `verify_text_present(text, ...)`: e.g., confirm `"Thank you"` appears.
- `assert_eventually_url_matches(pattern, timeout_s=..., poll_s=...)`: runs `verify_url_matches` in a loop until:
  - the pattern matches, or
  - `timeout_s` is reached.

Every `poll_s` seconds, it re-checks the URL.

This is the minimal working pattern:
```python
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent

from sentience import AsyncSentienceBrowser
from sentience.integrations.pydanticai import SentiencePydanticDeps, register_sentience_tools


class PageSummary(BaseModel):
    url: str
    headline: str


async def main():
    browser = AsyncSentienceBrowser(headless=False)
    await browser.start()
    await browser.page.goto("https://example.com")

    agent = Agent(
        "openai:gpt-5",
        deps_type=SentiencePydanticDeps,
        output_type=PageSummary,
        instructions="Use the Sentience tools to read the page and return a typed summary.",
    )
    register_sentience_tools(agent)

    deps = SentiencePydanticDeps(browser=browser)
    result = await agent.run("Return the url and the main headline.", deps=deps)
    print(result.output)

    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

This pattern is ideal when you care about validated structured data.
See also: sdk-python/examples/pydantic_ai/pydantic_ai_typed_extraction.py
High-level approach:
- `read_page(format="markdown")` or `read_page(format="text")`

See also: sdk-python/examples/pydantic_ai/pydantic_ai_self_correcting_click.py
Pattern:
1. `snapshot_state()` → find the element ID
2. `click(element_id)`
3. `assert_eventually_url_matches(...)` to confirm the click really navigated

This is a common "reliable interaction" sequence when the target element is off-screen:
1. `navigate(url)` to force a known starting state
2. `snapshot_state()` to get element IDs
3. `scroll_to(element_id)` to bring the target into view
4. `click(element_id)` to interact
5. `assert_eventually_url_matches(...)` to confirm the state transition

Concrete (copy/paste) example:
```python
import asyncio

from pydantic_ai import Agent

from sentience import AsyncSentienceBrowser
from sentience.integrations.pydanticai import SentiencePydanticDeps, register_sentience_tools


async def main():
    browser = AsyncSentienceBrowser(headless=False)
    await browser.start()

    agent = Agent(
        "openai:gpt-5",
        deps_type=SentiencePydanticDeps,
        output_type=str,
        instructions=(
            "Use these tools in order: "
            "navigate(url), snapshot_state(), scroll_to(element_id), click(element_id), "
            "then assert_eventually_url_matches(...) if navigation is expected."
        ),
    )
    register_sentience_tools(agent)

    deps = SentiencePydanticDeps(browser=browser)
    result = await agent.run(
        "Go to https://example.com, find a link, scroll to it if needed, click it, and confirm URL changed.",
        deps=deps,
    )
    print(result.output)

    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Use `find_text_rect("Sign In")` when the best handle is visible text.
```python
from pydantic_ai import Agent

# ... create browser + agent + register tools ...

# In your agent instructions, encourage:
# 1) find_text_rect("Sign In")
# 2) click_rect(...) using the returned coordinates
```

Concrete pattern:
find_text_rect("Sign In")in_viewportclick_rect(x=match.rect.x, y=match.rect.y, width=match.rect.width, height=match.rect.height)Concrete (copy/paste) example (direct tool calls, no LLM decision-making):
```python
import asyncio

from pydantic_ai import Agent

from sentience import AsyncSentienceBrowser
from sentience.integrations.pydanticai import SentiencePydanticDeps, register_sentience_tools


async def main():
    browser = AsyncSentienceBrowser(headless=False)
    await browser.start()
    await browser.goto("https://example.com")

    agent = Agent(
        "openai:gpt-5",
        deps_type=SentiencePydanticDeps,
        output_type=str,
        instructions="You may call Sentience tools, but the Python code will also demonstrate direct tool usage.",
    )
    tools = register_sentience_tools(agent)

    # Minimal stand-in for PydanticAI's run context: the tools only need .deps
    ctx = type("Ctx", (), {})()
    ctx.deps = SentiencePydanticDeps(browser=browser)

    # 1) Locate text on screen
    matches = await tools["find_text_rect"](ctx, "Sign In")
    if matches.status != "success" or not matches.results:
        raise RuntimeError(f"Text not found: {matches.error}")

    # 2) Click the first in-viewport match by rectangle
    m0 = next((m for m in matches.results if m.in_viewport), matches.results[0])
    await tools["click_rect"](
        ctx,
        x=m0.rect.x,
        y=m0.rect.y,
        width=m0.rect.width,
        height=m0.rect.height,
    )

    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Notes:
- Prefer element-ID interaction (`snapshot_state` → `click(element_id)`), since it's usually more stable.
- Use `find_text_rect` + `click_rect` when the best handle is visible text and no stable element ID is available.
When you pass a tracer via SentiencePydanticDeps(..., tracer=tracer), each tool call emits structured trace events:
- `run_start` — marks the beginning of an agent run
- `step_start` — before each tool invocation
- `step_end` — after each tool completes
- `error` — when exceptions occur

This gives you a clean, replayable timeline of what the agent actually did in the browser, separate from PydanticAI's orchestration layer.
Sentience tracing supports two modes:
Local tracing writes JSONL to disk (JsonlTraceSink) for debugging and development:
```python
from sentience import create_tracer
from sentience.integrations.pydanticai import SentiencePydanticDeps

# Create local tracer
tracer = create_tracer(run_id="pydanticai-demo")

deps = SentiencePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)

# Always close to flush events
tracer.close()
```

Cloud tracing (Pro/Enterprise) buffers JSONL locally and uploads once on `tracer.close()`:
```python
from sentience import create_tracer
from sentience.integrations.pydanticai import SentiencePydanticDeps

# Create cloud tracer
tracer = create_tracer(
    api_key="sk_pro_...",
    upload_trace=True,
    goal="PydanticAI + Sentience run",
    agent_type="PydanticAI",
)

deps = SentiencePydanticDeps(browser=browser, tracer=tracer)
result = await agent.run("...", deps=deps)

# Uploads trace on close
tracer.close()
```

Key insight: Your framework (PydanticAI) owns LLM orchestration, while Sentience owns browser execution + structured state.
You can (and often should) instrument both:

- PydanticAI's layer, for what the agent decided (model calls, tool selection)
- Sentience tracing, for what actually happened in the browser (tool calls, page state)
This dual-layer observability gives you complete visibility into both what the agent decided and what it actually did in the browser.
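For local runs, you can replay the browser-side timeline by reading the JSONL file your sink produced. A minimal sketch (the path and per-event schema here are assumptions; check your JsonlTraceSink output for the actual location and field names):

```python
import json
from pathlib import Path

# Hypothetical path: point this at the file your JsonlTraceSink wrote.
trace_path = Path("traces/pydanticai-demo.jsonl")

for line in trace_path.read_text().splitlines():
    event = json.loads(line)
    # Event types documented above: run_start, step_start, step_end, error.
    print(event)
```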
| Issue | Solution |
|---|---|
| `window.sentience` is not available | Ensure the Sentience extension is loaded and injected into the Playwright session. |
| Tool calls succeed but nothing changes | Add guards: `verify_url_matches`, `verify_text_present`, and/or `assert_eventually_url_matches`. |
| Extraction is flaky | Prefer `read_page(format="markdown")` for extraction and keep `snapshot_state(limit=50)` for interaction targeting. |
Last updated: January 2026