How to Run Web Agents with a Local LLM (3B)
Run web agents on 3B–14B local models by replacing screenshots + raw DOM with structure-first snapshots — cutting token usage by ~50% per step.
Running web agents locally used to feel unrealistic.
Most agents today:
- rely on vision models
- send screenshots every step
- dump large DOMs into the prompt
- assume cloud-scale inference
That works — but it’s expensive, slow, and hard to trust.
At Sentience, we took a different approach:
remove pixels, reduce DOM noise, and verify outcomes structurally.
The result:
- agents that run on 3B–14B local models
- with ~50% lower token usage per step
- while still completing real browser tasks correctly
This post shows how.
The core problem: pixels and raw DOMs are expensive
A typical vision-first browser agent does something like:
- Take a screenshot
- Serialize a large DOM
- Ask the model “what do you see?”
- Guess what to click next
That means:
- thousands of tokens per step
- vision tokens every iteration
- lots of irrelevant UI noise
- retries that silently burn cost
This is why small local models struggle:
they’re overwhelmed by perception, not reasoning.
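To make the contrast concrete, here is a minimal sketch of that loop in Python. The helper functions are placeholders for illustration only, not any real agent API.

```python
# Minimal sketch of the vision-first loop described above.
# The helpers are placeholders for illustration, not a real agent API.

def take_screenshot(page) -> bytes:
    return b"..."                              # vision payload, sent every step

def serialize_dom(page, limit: int = 40_000) -> str:
    return "<html>...</html>"[:limit]          # raw DOM dump, mostly noise

def ask_model(screenshot: bytes, dom: str, goal: str) -> str:
    return "click #submit"                     # the model guesses the next action

def naive_step(page, goal: str) -> str:
    screenshot = take_screenshot(page)         # vision tokens on every iteration
    dom = serialize_dom(page)                  # thousands of raw-DOM text tokens
    return ask_model(screenshot, dom, goal)    # one expensive call per click
```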
The alternative: structure-first snapshots
Instead of sending pixels or raw HTML, Sentience snapshots the rendered DOM after hydration and formats it for LLM reasoning.
Example (Hacker News “Show HN”):
| id | role | name | importance | ordinal | doc_y | dominant_group |
|---|---|---|---|---|---|---|
| 49 | link | Show HN: 15 Years of StarCraft II… | 173 | 0 | 15 | 1 |
| 454 | link | Show HN: InfiniteGPU… | 192 | 3 | 230 | 1 |
| 550 | link | Show HN: ElixirBrowser… | 189 | 4 | 282 | 1 |
What the agent sees:
- only interactive elements
- ordered by importance and position
- grouped into dominant repeated structures (feeds)
- no screenshots
- no full DOM dump
This is semantic geometry, not scraping.
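To make the shape of a snapshot concrete, here is a rough Python model of a single row. The field names mirror the table above, but the class itself is an illustration, not the SDK's actual schema.

```python
from dataclasses import dataclass

@dataclass
class SnapshotElement:
    """One row of a structure-first snapshot (illustrative, not the SDK's real schema)."""
    id: int              # stable element id the agent can act on
    role: str            # ARIA-style role, e.g. "link" or "button"
    name: str            # accessible name / visible text
    importance: int      # ranking score used to order elements
    ordinal: int         # position within its group (0 = first)
    doc_y: int           # vertical position in the rendered document
    dominant_group: int  # id of the repeated structure (feed) it belongs to

# The first "Show HN" row from the table above, expressed as data:
top_post = SnapshotElement(
    id=49, role="link", name="Show HN: 15 Years of StarCraft II…",
    importance=173, ordinal=0, doc_y=15, dominant_group=1,
)
```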
Token usage: before vs after
Before: Vision + full DOM
- DOM limit: 40,000 chars
- DOM state: ~10,195 chars (~2,548 tokens)
- Screenshots sent to model
- ~3,166 tokens per step
Total tokens (task): 37,051
Total cost: $0.0096
After: Sentience SDK (no vision, ranked DOM)
- DOM limit: 5,000 chars
- DOM state: 5,000 chars (~1,250 tokens)
- Vision disabled (0 screenshots)
- ~1,604 tokens per step
Total tokens (task): 14,143
Total cost: $0.0043
- ~50% reduction in tokens per step
- ~55% reduction in total cost
- Same task completed successfully
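The chars-to-tokens figures above follow the common rule of thumb of roughly 4 characters per token. A quick back-of-the-envelope check of the numbers:

```python
def approx_tokens(chars: int) -> int:
    """Rough estimate using the common ~4 characters per token heuristic."""
    return chars // 4

print(approx_tokens(10_195))  # ~2548 tokens for the raw DOM state
print(approx_tokens(5_000))   # ~1250 tokens for the ranked snapshot

per_step_before, per_step_after = 3_166, 1_604
print(1 - per_step_after / per_step_before)   # ~0.49, i.e. ~50% fewer tokens per step

cost_before, cost_after = 0.0096, 0.0043
print(1 - cost_after / cost_before)           # ~0.55, i.e. ~55% lower total cost
```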
What the agent actually did
With Sentience integrated into browser-use, the agent:
- received 50 ranked semantic elements (1,557 chars)
- reasoned over structure, not pixels
- identified the top Show HN post
- completed the task in fewer steps
Log excerpt:
```
🧠 Sentience: Injected 50 semantic elements (1557 chars)
📊 DOM state truncated to 5000 chars (~1250 tokens)
✅ Vision DISABLED
▶️ done: The number 1 post on Show HN is:
“Show HN: 15 Years of StarCraft II Balance Changes…”
```
No screenshots.
No guessing.
No retries.
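As an illustration of what "injected semantic elements" could look like on the prompt side, here is a sketch that renders ranked rows into a compact text block under a character budget, reusing the SnapshotElement sketch from earlier. The real integration lives in the browser-use fork linked at the end of this post.

```python
def render_snapshot(elements: list[SnapshotElement], char_budget: int = 5_000) -> str:
    """Render ranked elements as compact prompt lines, stopping at a character budget.
    Illustrative only; see the linked browser-use fork for the actual integration."""
    lines: list[str] = []
    used = 0
    for el in sorted(elements, key=lambda e: e.importance, reverse=True):
        line = f"[{el.id}] {el.role} '{el.name}' (group {el.dominant_group}, y={el.doc_y})"
        if used + len(line) + 1 > char_budget:
            break                              # stay inside the 5,000-char DOM limit
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)

prompt_block = render_snapshot([top_post])     # e.g. "[49] link 'Show HN: …' (group 1, y=15)"
```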
Why this enables local models
Small models aren’t bad at reasoning.
They’re bad at filtering noise.
By removing:
- irrelevant DOM nodes
- layout boilerplate
- pixel-level perception
…we reduce the reasoning load to something a 3B model can handle.
We’ve validated multi-step browser tasks using Qwen 2.5 3B locally with this approach:
- fewer tokens
- predictable behavior
- deterministic completion
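For reference, here is a minimal sketch of how such a run could be wired up, assuming Qwen 2.5 3B is served locally via Ollama and that your browser-use version accepts a LangChain chat model and a use_vision flag; exact parameter names and the Sentience hook depend on the fork linked below.

```python
import asyncio

from browser_use import Agent              # assumes the browser-use fork linked below
from langchain_ollama import ChatOllama    # assumes Ollama is serving qwen2.5:3b locally

async def main() -> None:
    llm = ChatOllama(model="qwen2.5:3b")   # 3B local model, no cloud inference
    agent = Agent(
        task="Find the number 1 post on Show HN",
        llm=llm,
        use_vision=False,                  # structure-first: no screenshots sent
    )
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```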
Vision is optional — and used last
Vision isn’t banned.
But in Sentience:
- vision is disabled by default
- structure is tried first
- vision can be used only after snapshot exhaustion
- assertions stay the same
This keeps costs low without sacrificing correctness.
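A policy like that can be stated in a few lines. The sketch below uses plain dicts whose keys mirror the snapshot columns, and the function names are illustrative rather than the SDK's real API.

```python
def matches_goal(element: dict, goal: str) -> bool:
    """Toy matcher: does the element's accessible name mention the goal keyword?"""
    return goal.lower() in element["name"].lower()

def choose_action(snapshot_elements: list[dict], goal: str, vision_fallback=None) -> dict:
    """Structure first, vision last. Keys mirror the snapshot columns (id, role,
    name, importance, ...). Illustrative sketch, not the SDK's real API."""
    candidates = [el for el in snapshot_elements if matches_goal(el, goal)]
    if candidates:
        best = max(candidates, key=lambda el: el["importance"])
        return {"action": "click", "element_id": best["id"]}
    if vision_fallback is not None:            # only after the snapshot is exhausted
        return vision_fallback(goal)           # e.g. a screenshot-based locator
    return {"action": "fail", "reason": "no structural candidate"}
```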
The real takeaway
Token efficiency isn’t just about cost.
It’s about making agents practical:
- local execution
- privacy-friendly
- predictable CI runs
- smaller models that actually work
The biggest gains didn’t come from better prompts.
They came from changing what the model sees.
Try it yourself
- browser-use + Sentience integration: github.com/SentienceAPI/browser-use/pull/1
- multi-step tasks on Qwen 2.5 3B: github.com/SentienceAPI/browser-use/pull/6
If you care about running agents locally, this is the direction.
One-line summary
To run agents locally, stop sending pixels.
Send structure, verify outcomes, and let small models reason.
Want to run agents locally?
Start with the SDK quickstart and see how structure-first snapshots change what the model sees.
Read the SDK Quickstart