
Ordinality & Layout Support

Semantic Geometry is the foundational feature of the Sentience SDK that helps AI agents perceive and interact with web pages. Ordinality and layout support extend it by letting agents resolve positional goals such as "click the first result", "select the 2nd item", or "choose the first product with rating >= 4".

Overview

When users give AI agents positional instructions, the agent needs to know:

Which elements belong together (e.g., all search results vs navigation links)
What order they're in (1st, 2nd, 3rd, last)
Where they are on the page (row/column position in a grid)

Sentience SDK solves this with two complementary features:

Ordinality - Assigns position indices to elements within groups ("1st", "2nd", "last")
Layout Detection - Identifies grids, lists, and page regions (header, nav, main, footer)

Why This Matters for LLMs

Layout detection is the critical missing link that allows an LLM to reliably answer "list + constraint" queries like "Find the first product with rating >= 4".

Without layout detection, an LLM sees a flat soup of text nodes and has to guess which rating belongs to which product based on proximity. With layout support, that soup becomes a set of structured objects, and the query reduces to filtering a list.

The Association Problem

The biggest challenge for an LLM on a web page is knowing where one item ends and the next begins.

Without Layout: The LLM sees a text sequence like:

...[Product A]... [Price]... [3.5 Stars]... [Product B]...

It might incorrectly assume "3.5 stars" belongs to Product B if the DOM structure is messy.

With Layout: You can present the LLM with structured objects:

{
  "grid_id": 101,
  "label": "Product Card",
  "children": [
    { "text": "Wireless Headphones", "role": "title" },
    { "text": "4.0", "role": "rating" }
  ]
}

The rating is explicitly linked to its parent product - no guessing required.

The Ordering Problem

"First" is ambiguous in a responsive grid.

Without Layout: DOM order often differs from visual order (e.g., in masonry layouts or flex-direction columns). The "first" product in the DOM might actually be in the top-right corner visually.

With Layout: The grid detection algorithm sorts items by visual rows first, then columns. This guarantees that "first" means "top-left," matching human intuition.
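The row-then-column ordering described above can be sketched in a few lines. This is a minimal illustration, not the SDK's actual algorithm: the `Item` class, the `visual_order` helper, and the `row_tolerance` value are assumptions made for the example. Only `center_x` and `center_y` mirror field names from this page.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Stand-in for an SDK element; only the center coordinates are used here.
    name: str
    center_x: float
    center_y: float


def visual_order(items, row_tolerance=20.0):
    """Sort items top-to-bottom, then left-to-right within each row."""
    rows = []  # each row is a list of items with similar center_y
    for item in sorted(items, key=lambda i: i.center_y):
        # Join the current row if the item is vertically close to the row's
        # first item; otherwise start a new row.
        if rows and abs(item.center_y - rows[-1][0].center_y) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [i for row in rows for i in sorted(row, key=lambda i: i.center_x)]


# DOM order puts "C" first, but visually it sits in the top-right corner.
dom_order = [
    Item("C", 500, 102),
    Item("A", 100, 100),
    Item("B", 300, 98),
    Item("D", 100, 300),
]
ordered_names = [i.name for i in visual_order(dom_order)]
print(ordered_names)
```

Here "A" comes out first because it is the leftmost item in the top row, even though "C" precedes it in DOM order.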

Summary of Benefits

Feature	Benefit for "List + Constraints"
Dominant Group	Filters out noise (nav, footer) so the LLM only checks the relevant list
Container Inference	Solves the "Association Problem" - knowing which price/rating belongs to which item
Grid Sorting	Solves the "Ordering Problem" - correctly identifying the "first" item visually

How It Works

Dominant Group Detection

The SDK automatically identifies the main content group on a page. This is typically the primary list or grid that users want to interact with (search results, product listings, article feeds).

Each element gets a group_key that indicates which visual group it belongs to. The key of the most common group is exposed on the snapshot as dominant_group_key.

from sentience import SentienceBrowser, snapshot

with SentienceBrowser() as browser:
    browser.page.goto("https://news.ycombinator.com")
    snap = snapshot(browser)

    # The dominant group is the main content area
    print(f"Dominant group: {snap.dominant_group_key}")

    # Find elements in the dominant group
    main_items = [e for e in snap.elements if e.in_dominant_group]
    print(f"Found {len(main_items)} items in main content")
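To make "most common group" concrete, the selection can be approximated by counting group_key values. This is a sketch under assumptions, not the SDK's implementation: `Element` is a stand-in class, `find_dominant_group_key` is a hypothetical helper, and the key strings are illustrative values in the `x{bucket}-h{bucket}` format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Element:
    # Stand-in for an SDK element with only the fields this sketch needs.
    text: str
    group_key: str


def find_dominant_group_key(elements):
    """Return the group_key shared by the most elements, or None if there is none."""
    counts = Counter(e.group_key for e in elements if e.group_key)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


elements = [
    Element("Home", "x0-h1"),
    Element("Login", "x9-h1"),
    Element("Story 1", "x2-h1"),
    Element("Story 2", "x2-h1"),
    Element("Story 3", "x2-h1"),
]
dominant = find_dominant_group_key(elements)
main_items = [e.text for e in elements if e.group_key == dominant]
print(dominant, main_items)
```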

Ordinal Selection

Each element in a group has a group_index (0-based position). This enables selecting elements by ordinal position:

# Continues from the snapshot taken above (`snap`).
# Get elements sorted by position in the dominant group
dominant_elements = sorted(
    [e for e in snap.elements if e.in_dominant_group],
    key=lambda e: e.group_index or 0
)

# Select by position
first_item = dominant_elements[0]   # "click the first result"
second_item = dominant_elements[1]  # "select the 2nd item"
last_item = dominant_elements[-1]   # "click the last one"

print(f"First item: {first_item.text}")
print(f"Last item: {last_item.text}")
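An agent still has to turn a phrase such as "2nd" or "last" into a list index before it can use the sorted list. The helper below is a hypothetical sketch of that step. `ordinal_to_index` is not part of the SDK, and it returns None instead of raising when the position does not exist, which avoids an IndexError on short lists.

```python
import re


def ordinal_to_index(phrase, length):
    """Map 'first', '2nd', 'last', ... to a 0-based index, or None if out of range."""
    words = {"first": 0, "second": 1, "third": 2, "last": length - 1}
    token = phrase.strip().lower()
    if token in words:
        index = words[token]
    else:
        # Accept numeric ordinals such as "2", "2nd", or "10th" (1-based).
        match = re.fullmatch(r"(\d+)(st|nd|rd|th)?", token)
        if not match:
            return None
        index = int(match.group(1)) - 1
    return index if 0 <= index < length else None


items = ["Result A", "Result B", "Result C"]
second = ordinal_to_index("2nd", len(items))
last = ordinal_to_index("last", len(items))
print(items[second], items[last])
```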

Element Position Fields

Each element includes position data for ordinal selection:

Field	Type	Description
`center_x`	`number`	X coordinate of element center (viewport-relative)
`center_y`	`number`	Y coordinate of element center (viewport-relative)
`doc_y`	`number`	Absolute Y position in document (includes scroll offset)
`group_key`	`string`	Geometric bucket key for grouping (format: `x{bucket}-h{bucket}`)
`group_index`	`number`	Position within group (0-indexed, sorted by doc_y)
`in_dominant_group`	`boolean`	Whether element is in the main content group
`href`	`string`	Hyperlink URL (for link elements)
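The relationship between group_key, doc_y, and group_index in the table can be illustrated with stand-in data. This is only a sketch of the documented rule (0-indexed, sorted by doc_y within each group); the `Element` class and `assign_group_indices` helper are assumptions for the example, and the SDK computes these values for you.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    # Stand-in for an SDK element with only the fields this sketch needs.
    text: str
    group_key: str
    doc_y: float
    group_index: Optional[int] = None


def assign_group_indices(elements):
    """Number the elements of each group from 0 in order of increasing doc_y."""
    groups = defaultdict(list)
    for e in elements:
        groups[e.group_key].append(e)
    for members in groups.values():
        for index, e in enumerate(sorted(members, key=lambda m: m.doc_y)):
            e.group_index = index
    return elements


elements = assign_group_indices([
    Element("Story 2", "x2-h1", doc_y=240.0),
    Element("Home", "x0-h1", doc_y=10.0),
    Element("Story 1", "x2-h1", doc_y=180.0),
])
indices = {e.text: e.group_index for e in elements}
print(indices)
```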

Layout Detection

Layout detection provides detailed grid and region information for complex page structures.

Layout Fields on Elements

Elements may include a layout field with geometric metadata:

Field	Type	Description
`grid_id`	`number`	Unique ID for the grid this element belongs to
`grid_pos`	`GridPosition`	Row and column indices (0-based)
`parent_index`	`number`	Index of inferred parent element in the elements array
`children_indices`	`number[]`	List of child element indices (capped at 30)
`region`	`string`	Page region: `header`, `nav`, `main`, `aside`, or `footer`
`grid_confidence`	`number`	Confidence score for grid assignment (0.0-1.0)

Grid Coordinates API

Get bounding boxes and metadata for detected grids:

from sentience import SentienceBrowser, snapshot

with SentienceBrowser() as browser:
    browser.page.goto("https://example.com/products")
    snap = snapshot(browser)

    # Get all detected grids
    all_grids = snap.get_grid_bounds()
    for grid in all_grids:
        print(f"Grid {grid.grid_id}: {grid.item_count} items")
        print(f"  Size: {grid.row_count} rows x {grid.col_count} cols")
        print(f"  Position: ({grid.bbox.x}, {grid.bbox.y})")
        print(f"  Label: {grid.label}")  # e.g., "product_grid", "search_results"
        print(f"  Is dominant: {grid.is_dominant}")

    # Get a specific grid by ID (the call still returns a list)
    main_grid = snap.get_grid_bounds(grid_id=0)
    if main_grid:
        grid = main_grid[0]
        print(f"Main grid: {grid.bbox.width}x{grid.bbox.height} pixels")

GridInfo Properties

Property	Type	Description
`grid_id`	`number`	Unique identifier for the grid
`bbox`	`BBox`	Bounding box (x, y, width, height) in document coordinates
`row_count`	`number`	Number of rows in the grid
`col_count`	`number`	Number of columns in the grid
`item_count`	`number`	Total number of items in the grid
`label`	`string \| null`	Inferred semantic label (see below)
`is_dominant`	`boolean`	Whether this is the main content grid

Grid Labels

The SDK automatically infers grid labels based on content patterns:

Label	Detected When
`product_grid`	Price patterns ($, €, £), "Add to cart", ratings
`search_results`	Snippets, ellipses, mostly links
`article_feed`	Timestamps ("2 hours ago"), bylines, dates
`navigation`	Short text, homogeneous links, nav keywords
`button_grid`	All elements are buttons
`link_list`	80%+ of elements are links
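Two of the rules in the table are stated precisely enough to sketch: "all elements are buttons" and "80%+ of elements are links". The function below restates only those two rules. The `infer_simple_label` name, the plain role strings, and the order in which the rules are checked are assumptions for illustration, not the SDK's actual heuristics.

```python
def infer_simple_label(roles):
    """Apply the button_grid and link_list rules to a list of element roles."""
    if not roles:
        return None
    if all(role == "button" for role in roles):
        return "button_grid"
    link_share = sum(role == "link" for role in roles) / len(roles)
    if link_share >= 0.8:
        return "link_list"
    return None  # the content-based labels (product_grid, etc.) are not modeled


print(infer_simple_label(["button", "button", "button"]))
print(infer_simple_label(["link", "link", "link", "link", "text"]))
```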

Working with Grid Positions

Access individual element positions within a grid:

# Access element layout data
for elem in snap.elements:
    if elem.layout and elem.layout.grid_id is not None:
        print(f"Element '{elem.text}' is in grid {elem.layout.grid_id}")

        if elem.layout.grid_pos:
            row = elem.layout.grid_pos.row_index
            col = elem.layout.grid_pos.col_index
            print(f"  Position: row {row}, column {col}")

        if elem.layout.region:
            print(f"  Region: {elem.layout.region}")

Practical Examples

Example 1: Click the First Search Result

from sentience import SentienceBrowser, snapshot, click

with SentienceBrowser() as browser:
    browser.page.goto("https://google.com")
    # ... perform search ...

    snap = snapshot(browser)

    # Find all items in the dominant group (search results)
    results = sorted(
        [e for e in snap.elements if e.in_dominant_group],
        key=lambda e: e.group_index or 0
    )

    if results:
        first_result = results[0]
        click(browser, first_result.id)
        print(f"Clicked: {first_result.text}")

Example 2: Select Product in Grid by Row/Column

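# Reuses `snap`, `browser`, and `click` from Example 1.
# Find the product at row 1, column 2 (both 0-indexed).
# Note: this matches the first element at that position in any grid;
# also compare elem.layout.grid_id to target one specific grid.
target_row, target_col = 1, 2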

for elem in snap.elements:
    if elem.layout and elem.layout.grid_pos:
        pos = elem.layout.grid_pos
        if pos.row_index == target_row and pos.col_index == target_col:
            print(f"Found product at ({target_row}, {target_col}): {elem.text}")
            click(browser, elem.id)
            break

Example 3: Filter by Region

# Reuses `snap` from Example 1.
# Get only elements in the main content area
main_elements = [
    e for e in snap.elements
    if e.layout and e.layout.region == "main"
]

# Get navigation links
nav_links = [
    e for e in snap.elements
    if e.layout and e.layout.region == "nav"
]

print(f"Main content: {len(main_elements)} elements")
print(f"Navigation: {len(nav_links)} links")
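The "list + constraint" query from the introduction ("first product with rating >= 4") combines grid_pos and children_indices. The sketch below runs on stand-in dataclasses so that it works without a browser. The `role` values ("card", "title", "rating") and both helper functions are hypothetical; only grid_pos, row_index, col_index, and children_indices come from the tables on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GridPosition:
    row_index: int
    col_index: int


@dataclass
class Layout:
    grid_pos: Optional[GridPosition] = None
    children_indices: List[int] = field(default_factory=list)


@dataclass
class Element:
    text: str
    role: str  # hypothetical semantic role: "card", "title", "rating"
    layout: Optional[Layout] = None


def card_fields(elements, card):
    """Collect a card's children into a {role: text} dict via children_indices."""
    return {elements[i].role: elements[i].text for i in card.layout.children_indices}


def first_card_with_rating(elements, min_rating):
    """Return the fields of the first card (row-major order) rated >= min_rating."""
    cards = [e for e in elements if e.layout and e.layout.grid_pos]
    cards.sort(key=lambda e: (e.layout.grid_pos.row_index, e.layout.grid_pos.col_index))
    for card in cards:
        fields = card_fields(elements, card)
        try:
            rating = float(fields.get("rating", ""))
        except ValueError:
            continue  # no parsable rating on this card
        if rating >= min_rating:
            return fields
    return None


# The list order deliberately differs from the visual order.
elements = [
    Element("", "card", Layout(GridPosition(1, 0), [1, 2])),
    Element("Studio Monitors", "title"),
    Element("4.8", "rating"),
    Element("", "card", Layout(GridPosition(0, 0), [4, 5])),
    Element("Budget Earbuds", "title"),
    Element("3.5", "rating"),
    Element("", "card", Layout(GridPosition(0, 1), [7, 8])),
    Element("Wireless Headphones", "title"),
    Element("4.0", "rating"),
]
match = first_card_with_rating(elements, 4.0)
print(match)
```

The card at row 0, column 0 is skipped because its rating is 3.5, so the match is the card at row 0, column 1, even though a higher-rated card appears earlier in the list.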

Important Notes

The layout field is optional and may not be present in all snapshots
Grid labels are best-effort heuristics and may not always be accurate
children_indices is capped at 30 elements to prevent large payloads
Confidence scores (grid_confidence, region_confidence) indicate detection reliability
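Because layout is optional and confidence varies, agent code may want to guard both before trusting a grid assignment. The following sketch uses stand-in dataclasses; the `confident_grid_members` helper and the 0.7 threshold are arbitrary assumptions for illustration, not recommended values from the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layout:
    grid_id: Optional[int] = None
    grid_confidence: float = 0.0


@dataclass
class Element:
    text: str
    layout: Optional[Layout] = None


def confident_grid_members(elements, min_confidence=0.7):
    """Keep elements that have a layout, a grid_id, and sufficient grid_confidence."""
    return [
        e for e in elements
        if e.layout is not None
        and e.layout.grid_id is not None
        and e.layout.grid_confidence >= min_confidence
    ]


elements = [
    Element("Card 1", Layout(grid_id=0, grid_confidence=0.95)),
    Element("Card 2", Layout(grid_id=0, grid_confidence=0.40)),
    Element("Stray link"),  # no layout field at all
]
kept = [e.text for e in confident_grid_members(elements)]
print(kept)
```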
