Execution Intelligence for AI Agents
Browser infrastructure gives you a browser. Scrapers give you text. SentienceAPI gives agents a grounded action space: visible, clickable elements with deterministic geometry and lightweight visual cues.
Note: This page compares action grounding (deciding what to click and where). Full browser execution is handled by an external runtime.
- Browserbase → Infrastructure (cloud browsers)
- Firecrawl → Data Extraction (text/markdown)
- SentienceAPI → Execution Intelligence (grounded action space)
Feature Comparison
Focus: what an agent can reliably use to choose actions (not marketing checkboxes).
| Capability | Browserbase | Firecrawl | SentienceAPI |
|---|---|---|---|
| Primary Use Case | Browser runtime | Read / RAG | Agent action grounding |
| Output an agent can act on | Screenshots / DOM | Markdown / JSON | Grounded action space |
| Element coordinates (x, y, w, h) | Not native | N/A | Yes |
| Visibility & occlusion awareness | Requires inference | Read-only | Explicit signals |
| Visual cues for action choice (e.g. is_primary) | Model-inferred | None | Computed |
| Determinism (same page → same grounding) | Varies by model | High (read-only) | High |
| Retries needed to choose a target | Often 1–3 | N/A | Often 0 |
| Integration surface | Browser + model | Scraper API | One grounding API |
| Best For | Execution runtime | Reading & extraction | Agents that must act |
Benchmark Methodology
We benchmark the decision layer: how much cost and uncertainty it takes to identify the correct action target (what to click/type, and where). Full browser execution (navigation, JavaScript side effects, session state) is handled by an external runtime and is reported separately.
Test Cases
- Link Click: example.com → click “Learn more”
- Search Input: wikipedia.org → locate search box and submit
- Commerce Flow: amazon.com → Best Sellers → PDP → Add to Cart (when accessible)
Metrics
- Decision cost: tokens/credits per successful target selection
- Retries per action: how many re-attempts before selecting a valid element
- Misclick rate: wrong target selected (e.g., image instead of CTA)
- Access reliability: block/throttle rate and tokens wasted on blocked pages
- Determinism: same page snapshot → same grounded map
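As a rough illustration, the metrics above could be computed from per-attempt records along the following lines. The `Trial` record, its field names, and the exact formulas are assumptions made for this sketch; they are one reasonable reading of the bullet points, not the benchmark harness itself.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One attempt to select a target. All field names are illustrative."""
    tokens: int     # tokens/credits spent on this attempt
    retries: int    # re-attempts before an element was selected
    blocked: bool   # page was blocked or throttled
    correct: bool   # the selected element was the intended target


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Aggregate trial records into the metrics listed above."""
    reachable = [t for t in trials if not t.blocked]
    blocked = [t for t in trials if t.blocked]
    successes = [t for t in reachable if t.correct]
    misclicks = [t for t in reachable if not t.correct]
    n = len(reachable)
    return {
        # Tokens spent on reachable pages per successful target selection.
        "decision_cost": (
            sum(t.tokens for t in reachable) / len(successes)
            if successes else float("inf")
        ),
        "retries_per_action": sum(t.retries for t in reachable) / n if n else 0.0,
        "misclick_rate": len(misclicks) / n if n else 0.0,
        # Access reliability: share of blocked pages and tokens lost on them.
        "block_rate": len(blocked) / len(trials) if trials else 0.0,
        "tokens_wasted_on_blocked": float(sum(t.tokens for t in blocked)),
    }
```

Blocked attempts are kept out of the misclick and retry figures here so that access problems do not get counted as grounding errors.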
Fairness Notes
- We report blocked pages explicitly (no stack can click “Add to Cart” if the page is throttled).
- SentienceAPI can simulate multi-step flows by chaining observations across pages; execution is optional.
- Where execution is involved, we separate “grounding success” from “navigation success.”
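One way to keep these cases apart when scoring a run is to classify every attempt into exactly one outcome. The sketch below is hypothetical: the `Outcome` names and the `classify` helper are invented for illustration and are not part of any SDK.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    BLOCKED = "blocked"                      # page throttled; no stack could act
    GROUNDING_FAILED = "grounding_failed"    # wrong or no target selected
    GROUNDED_ONLY = "grounded_only"          # correct target, execution skipped
    NAVIGATION_FAILED = "navigation_failed"  # correct target, execution failed
    SUCCESS = "success"                      # correct target and execution worked


def classify(blocked: bool, target_correct: bool,
             navigated: Optional[bool]) -> Outcome:
    """Classify one attempt.

    `navigated` is None when execution was not attempted, which matches
    the note that execution is optional.
    """
    if blocked:
        return Outcome.BLOCKED
    if not target_correct:
        return Outcome.GROUNDING_FAILED
    if navigated is None:
        return Outcome.GROUNDED_ONLY
    return Outcome.SUCCESS if navigated else Outcome.NAVIGATION_FAILED
```

Checking `blocked` first means a throttled page is never charged to either the grounding or the navigation layer.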
Why browser infrastructure isn't enough
Browser infrastructure gives you a place to run a session, but agents still need a reliable way to choose actions. Vision-first loops often guess and retry. SentienceAPI provides a grounded action space upfront: visible elements, deterministic geometry, and lightweight visual cues for action selection.
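To make "choosing from a grounded action space" concrete, here is a minimal sketch of selecting a target without guessing or retrying. The `Element` shape and the `choose_target` helper are assumptions for illustration; the fields only mirror the signals named on this page (geometry, visibility, occlusion, `is_primary`) and are not the actual response schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical grounded element; not the real SentienceAPI schema."""
    id: int
    role: str
    text: str
    x: float
    y: float
    w: float
    h: float
    visible: bool
    occluded: bool
    is_primary: bool


def choose_target(elements: list[Element], role: str, text: str) -> Optional[Element]:
    """Pick one actionable element matching role and text, or None."""
    needle = text.lower()
    candidates = [
        e for e in elements
        if e.role == role
        and needle in e.text.lower()
        and e.visible
        and not e.occluded
    ]
    if not candidates:
        return None
    # Deterministic tie-break: primary elements first, then reading order
    # (top to bottom, left to right), then id.
    return min(candidates, key=lambda e: (not e.is_primary, e.y, e.x, e.id))
```

Because the tie-break is a fixed sort key rather than a model judgment, the same element list always yields the same target.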
Why reading ≠ acting
Content extraction tools are excellent for reading, summarization, and RAG. But agents that must interact need more than text — they need to know what is clickable, what is visible, and where it is on the screen.
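The "where it is on the screen" part can be illustrated by turning a bounding box into a click point. This is a sketch under stated assumptions: coordinates are taken to be viewport-relative pixels in `(x, y, w, h)` form, and `click_point` is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def click_point(box: tuple[float, float, float, float],
                viewport: tuple[float, float]) -> Optional[tuple[float, float]]:
    """Return the center of the part of `box` that lies inside the viewport.

    `box` is (x, y, w, h) and `viewport` is (width, height), both assumed
    to be in viewport-relative pixels. Returns None if the box is entirely
    off screen.
    """
    x, y, w, h = box
    vw, vh = viewport
    # Clip the box to the viewport rectangle.
    left, top = max(x, 0.0), max(y, 0.0)
    right, bottom = min(x + w, vw), min(y + h, vh)
    if right <= left or bottom <= top:
        return None
    return ((left + right) / 2, (top + bottom) / 2)
```

Plain text or markdown output carries none of the inputs this function needs, which is the gap the paragraph above describes.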
Why determinism matters for production agents
Production agents must be debuggable and reproducible. Deterministic grounding reduces retries, reduces cost variance, and makes failures explainable (blocked page, occlusion, ambiguity) instead of mysterious.
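A simple way to check reproducibility in practice is to fingerprint the grounded map and compare fingerprints across runs on the same page snapshot. The sketch below assumes the map can be represented as a list of JSON-serializable dicts and that element order is not significant; both are assumptions, and `grounding_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def grounding_fingerprint(elements: list[dict]) -> str:
    """Return a stable SHA-256 hex digest for a list of grounded elements."""
    # Serialize each element canonically, then sort so that the order in
    # which elements were returned does not affect the digest.
    serialized = sorted(
        json.dumps(e, sort_keys=True, separators=(",", ":")) for e in elements
    )
    payload = "[" + ",".join(serialized) + "]"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs with equal fingerprints produced the same grounding; a mismatch points to a concrete diff that can be inspected instead of an unexplained failure.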
If you're building agents that must act, SentienceAPI is the missing layer.
Try the Playground — Explore SDK Examples or Test the API Directly
Navigate to a login page, find email/password fields semantically, and submit the form.
```python
# No selectors. No vision. Stable semantic targets.
from sentience import SentienceBrowser, snapshot, find, click, type_text, wait_for

# Initialize browser with API key
browser = SentienceBrowser(api_key="sk_live_...")
browser.start()

# Navigate to login page
browser.page.goto("https://example.com/login")

# PERCEPTION: Find elements semantically
snap = snapshot(browser)
email_field = find(snap, "role=textbox text~'email'")
password_field = find(snap, "role=textbox text~'password'")
submit_btn = find(snap, "role=button text~'sign in'")

# ACTION: Interact with the page
type_text(browser, email_field.id, "user@example.com")
type_text(browser, password_field.id, "secure_password")
click(browser, submit_btn.id)

# VERIFICATION: Wait for navigation
wait_for(browser, "role=heading text~'Dashboard'", timeout=5.0)

print("✅ Login successful!")
browser.close()
```
🎯 Semantic Discovery
Find elements by role, text, and visual cues, not fragile CSS selectors
⚡ Token Optimization
Intelligent filtering reduces token usage by up to 73% vs vision models
🔒 Deterministic
Same input produces the same output every time, with no random failures