Ordinality & Layout Support

Semantic Geometry is the foundational feature of the Sentience SDK that helps AI agents perceive and interact with web pages. Ordinality and layout support extend it so that agents can resolve positional goals such as "click the first result", "select the 2nd item", or "choose the first product with rating >= 4".

Overview

When a user gives an AI agent a positional instruction, the agent needs to know:

  1. Which elements belong together (e.g., all search results vs navigation links)
  2. What order they're in (1st, 2nd, 3rd, last)
  3. Where they are on the page (row/column position in a grid)

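The Sentience SDK addresses this with two complementary features: ordinality (group membership and ordinal position, described under Dominant Group Detection and Ordinal Selection) and layout detection (grid, region, and parent/child structure, described under Layout Detection).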

Why This Matters for LLMs

Layout detection is the critical missing link that allows an LLM to reliably answer "list + constraint" queries like "Find the first product with rating >= 4".

Without layout detection, an LLM sees a flat soup of text nodes and has to guess which rating belongs to which product based on proximity. With layout support, that soup becomes a list of structured objects, and the query reduces to filtering and ordering that list.

The Association Problem

The biggest challenge for an LLM on a web page is knowing boundaries between items.

Without Layout: The LLM sees a text sequence like:

...[Product A]... [Price]... [3.5 Stars]... [Product B]...

It might incorrectly assume "3.5 stars" belongs to Product B if the DOM structure is messy.

With Layout: You can present the LLM with structured objects:

{
  "grid_id": 101,
  "label": "Product Card",
  "children": [
    { "text": "Wireless Headphones", "role": "title" },
    { "text": "4.0", "role": "rating" }
  ]
}

The rating is explicitly linked to its parent product - no guessing required.

The Ordering Problem

"First" is ambiguous in a responsive grid.

Without Layout: DOM order often differs from visual order (e.g., in masonry layouts or flex-direction columns). The "first" product in the DOM might actually be in the top-right corner visually.

With Layout: The grid detection algorithm sorts items by visual rows first, then columns. This guarantees that "first" means "top-left," matching human intuition.
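To make the idea of visual ordering concrete, here is a minimal sketch that groups boxes into rows and then sorts each row left to right. It is an illustration under stated assumptions, not the SDK's grid detection algorithm: `Item`, `visual_order`, and the `row_tolerance` parameter are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a page element (not an SDK type)."""
    name: str
    center_x: float
    center_y: float


def visual_order(items: list[Item], row_tolerance: float = 20.0) -> list[Item]:
    """Sort items row by row (top to bottom), then left to right within a row.

    Items whose center_y lies within `row_tolerance` pixels of the first item
    in the current row are treated as being on the same row.
    """
    rows: list[list[Item]] = []
    for item in sorted(items, key=lambda i: i.center_y):
        if rows and abs(item.center_y - rows[-1][0].center_y) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [item for row in rows for item in sorted(row, key=lambda i: i.center_x)]


# DOM order here differs from visual order: "C" comes first in the DOM
# but is rendered in the top-right corner.
dom_order = [
    Item("C", center_x=500, center_y=102),
    Item("A", center_x=100, center_y=100),
    Item("D", center_x=100, center_y=300),
    Item("B", center_x=300, center_y=98),
]
ordered_names = [item.name for item in visual_order(dom_order)]
print(ordered_names)
```

The tolerance absorbs small vertical offsets between cards on the same row; a real implementation would likely need to derive it from element heights rather than use a fixed pixel value.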

Summary of Benefits

Feature              Benefit for "List + Constraints"
-------------------  --------------------------------
Dominant Group       Filters out noise (nav, footer) so the LLM only checks the relevant list
Container Inference  Solves the "Association Problem" - knowing which price/rating belongs to which item
Grid Sorting         Solves the "Ordering Problem" - correctly identifying the "first" item visually

How It Works

Dominant Group Detection

The SDK automatically identifies the main content group on a page. This is typically the primary list or grid that users want to interact with (search results, product listings, article feeds).

Each element gets a group_key that indicates which visual group it belongs to. The key of the most common group is exposed as dominant_group_key on the snapshot.

from sentience import SentienceBrowser, snapshot

with SentienceBrowser() as browser:
    browser.page.goto("https://news.ycombinator.com")
    snap = snapshot(browser)

    # The dominant group is the main content area
    print(f"Dominant group: {snap.dominant_group_key}")

    # Find elements in the dominant group
    main_items = [e for e in snap.elements if e.in_dominant_group]
    print(f"Found {len(main_items)} items in main content")
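The notion of a "most common group" can be sketched as a simple frequency count over group keys. The snippet below is an illustration only: it operates on plain strings rather than SDK objects, `pick_dominant_group` is a hypothetical helper, and the SDK's actual selection logic may weigh other signals.

```python
from collections import Counter
from typing import Optional


def pick_dominant_group(group_keys: list[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty group key, or None if there is none."""
    counts = Counter(key for key in group_keys if key)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Example keys in the x{bucket}-h{bucket} format; None marks an ungrouped element
keys = ["x1-h2", "x1-h2", "x0-h1", "x1-h2", None, "x5-h1"]
dominant = pick_dominant_group(keys)

# Equivalent of the in_dominant_group flag for each element in this sketch
in_dominant = [key == dominant for key in keys]
print(dominant, in_dominant)
```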

Ordinal Selection

Each element in a group has a group_index (0-based position). This enables selecting elements by ordinal position:

# Continues from the snapshot above: `snap` is the result of snapshot(browser).
# Get elements sorted by position in the dominant group
dominant_elements = sorted(
    [e for e in snap.elements if e.in_dominant_group],
    key=lambda e: e.group_index or 0  # treat a missing index as 0
)

# Select by position (guard against short or empty groups)
if len(dominant_elements) >= 2:
    first_item = dominant_elements[0]   # "click the first result"
    second_item = dominant_elements[1]  # "select the 2nd item"
    last_item = dominant_elements[-1]   # "click the last one"

    print(f"First item: {first_item.text}")
    print(f"Last item: {last_item.text}")
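An agent usually receives the position as a phrase rather than an index. One possible way to map phrases such as "first", "2nd", or "last" onto an ordered list is sketched below. `pick_by_ordinal` and `ORDINAL_WORDS` are hypothetical names for this example and are not part of the SDK; the function works on any ordered sequence, for instance the sorted dominant-group elements.

```python
import re
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Small illustrative vocabulary; extend as needed
ORDINAL_WORDS = {"first": 0, "second": 1, "third": 2, "last": -1}


def pick_by_ordinal(items: Sequence[T], phrase: str) -> Optional[T]:
    """Resolve 'first', 'last', '2nd', '10th', ... against an ordered sequence.

    Returns None when the phrase is not recognized or the position is out of range.
    """
    phrase = phrase.strip().lower()
    if phrase in ORDINAL_WORDS:
        index = ORDINAL_WORDS[phrase]
    else:
        match = re.fullmatch(r"(\d+)(?:st|nd|rd|th)", phrase)
        if not match or int(match.group(1)) == 0:
            return None
        index = int(match.group(1)) - 1  # "2nd" -> index 1
    try:
        return items[index]
    except IndexError:
        return None


results = ["Result A", "Result B", "Result C"]
print(pick_by_ordinal(results, "first"))
print(pick_by_ordinal(results, "2nd"))
print(pick_by_ordinal(results, "last"))
print(pick_by_ordinal(results, "7th"))
```

Returning None instead of raising keeps the caller's control flow simple: the agent can report that the requested position does not exist rather than clicking the wrong element.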

Element Position Fields

Each element includes position data for ordinal selection:

Field              Type     Description
-----------------  -------  -----------
center_x           number   X coordinate of element center (viewport-relative)
center_y           number   Y coordinate of element center (viewport-relative)
doc_y              number   Absolute Y position in document (includes scroll offset)
group_key          string   Geometric bucket key for grouping (format: x{bucket}-h{bucket})
group_index        number   Position within group (0-indexed, sorted by doc_y)
in_dominant_group  boolean  Whether element is in the main content group
href               string   Hyperlink URL (for link elements)

Layout Detection

Layout detection provides detailed grid and region information for complex page structures.

Layout Fields on Elements

Elements may include a layout field with geometric metadata:

Field             Type          Description
----------------  ------------  -----------
grid_id           number        Unique ID for the grid this element belongs to
grid_pos          GridPosition  Row and column indices (0-based)
parent_index      number        Index of inferred parent element in the elements array
children_indices  number[]      List of child element indices (capped at 30)
region            string        Page region: header, nav, main, aside, or footer
grid_confidence   number        Confidence score for grid assignment (0.0-1.0)
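The parent/child indices are what make a "list + constraint" query such as "find the first product with rating >= 4" tractable. The sketch below shows the idea on plain dictionaries that mimic these fields. It does not use SDK objects: the `kind` key, the tuple form of `grid_pos`, and both helper functions are assumptions made for this example.

```python
from typing import Optional

# Hypothetical flat element list. Each dict mimics the layout fields;
# the "kind" key is an assumption made for this example only.
elements = [
    {"text": "Card 1", "kind": "card", "children_indices": [1, 2], "grid_pos": (0, 0)},
    {"text": "Budget Earbuds", "kind": "title", "parent_index": 0},
    {"text": "3.5", "kind": "rating", "parent_index": 0},
    {"text": "Card 2", "kind": "card", "children_indices": [4, 5], "grid_pos": (0, 1)},
    {"text": "Wireless Headphones", "kind": "title", "parent_index": 3},
    {"text": "4.0", "kind": "rating", "parent_index": 3},
]


def child_text(card: dict, kind: str) -> Optional[str]:
    """Return the text of the first child of `card` with the given kind."""
    for index in card.get("children_indices", []):
        if elements[index]["kind"] == kind:
            return elements[index]["text"]
    return None


def first_product_with_rating(min_rating: float) -> Optional[str]:
    """Walk cards in (row, column) order and return the first qualifying title."""
    cards = sorted(
        (e for e in elements if e["kind"] == "card"),
        key=lambda e: e["grid_pos"],
    )
    for card in cards:
        rating = child_text(card, "rating")
        if rating is not None and float(rating) >= min_rating:
            return child_text(card, "title")
    return None


print(first_product_with_rating(4.0))
```

Because each rating is reached through its card's children_indices, the 3.5 rating can never be attributed to the second product, regardless of how the elements are interleaved in the flat list.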

Grid Coordinates API

Get bounding boxes and metadata for detected grids:

from sentience import SentienceBrowser, snapshot

with SentienceBrowser() as browser:
    browser.page.goto("https://example.com/products")
    snap = snapshot(browser)

    # Get all detected grids
    all_grids = snap.get_grid_bounds()
    for grid in all_grids:
        print(f"Grid {grid.grid_id}: {grid.item_count} items")
        print(f"  Size: {grid.row_count} rows x {grid.col_count} cols")
        print(f"  Position: ({grid.bbox.x}, {grid.bbox.y})")
        print(f"  Label: {grid.label}")  # e.g., "product_grid", "search_results"
        print(f"  Is dominant: {grid.is_dominant}")

    # Get a specific grid by ID
    main_grid = snap.get_grid_bounds(grid_id=0)
    if main_grid:
        grid = main_grid[0]
        print(f"Main grid: {grid.bbox.width}x{grid.bbox.height} pixels")

GridInfo Properties

Property     Type           Description
-----------  -------------  -----------
grid_id      number         Unique identifier for the grid
bbox         BBox           Bounding box (x, y, width, height) in document coordinates
row_count    number         Number of rows in the grid
col_count    number         Number of columns in the grid
item_count   number         Total number of items in the grid
label        string | null  Inferred semantic label (see below)
is_dominant  boolean        Whether this is the main content grid
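As a rough illustration of how these properties relate to the items of a grid, the sketch below derives a bounding box and the row, column, and item counts from a list of cells. `Cell` and `summarize_grid` are hypothetical names; this is not how the SDK computes GridInfo, only one plausible reading of the fields.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical grid item: position indices plus a document-space box."""
    row_index: int
    col_index: int
    x: float
    y: float
    width: float
    height: float


def summarize_grid(cells: list[Cell]) -> dict:
    """Derive bbox, row_count, col_count, and item_count from a non-empty cell list."""
    left = min(c.x for c in cells)
    top = min(c.y for c in cells)
    right = max(c.x + c.width for c in cells)
    bottom = max(c.y + c.height for c in cells)
    return {
        # The grid bbox is the union of all cell boxes
        "bbox": {"x": left, "y": top, "width": right - left, "height": bottom - top},
        # Indices are 0-based, so counts are the largest index plus one
        "row_count": max(c.row_index for c in cells) + 1,
        "col_count": max(c.col_index for c in cells) + 1,
        "item_count": len(cells),
    }


cells = [
    Cell(0, 0, x=0, y=100, width=200, height=150),
    Cell(0, 1, x=220, y=100, width=200, height=150),
    Cell(1, 0, x=0, y=270, width=200, height=150),
]
summary = summarize_grid(cells)
print(summary)
```

Note that item_count can be smaller than row_count times col_count when the last row is only partially filled, as in this example.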

Grid Labels

The SDK automatically infers grid labels based on content patterns:

Label           Detected When
--------------  -------------
product_grid    Price patterns ($, €, £), "Add to cart", ratings
search_results  Snippets, ellipses, mostly links
article_feed    Timestamps ("2 hours ago"), bylines, dates
navigation      Short text, homogeneous links, nav keywords
button_grid     All elements are buttons
link_list       80%+ of elements are links
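A heuristic of this kind could look roughly like the sketch below. It is not the SDK's implementation: `infer_label`, the regular expressions, the `role`/`text` item shape, and the order in which rules are checked are all assumptions, and only four of the six labels are covered. The 80% link threshold is the one figure taken from the table.

```python
import re
from typing import Optional

# Assumed patterns: a currency symbol followed by a digit, and "N hours ago" style timestamps
PRICE_PATTERN = re.compile(r"[$€£]\s?\d")
RELATIVE_TIME_PATTERN = re.compile(r"\b\d+\s+(minute|hour|day)s?\s+ago\b", re.IGNORECASE)


def infer_label(items: list[dict]) -> Optional[str]:
    """Guess a grid label from items shaped like {"role": str, "text": str}."""
    if not items:
        return None
    texts = [item.get("text", "") for item in items]
    roles = [item.get("role", "") for item in items]

    if all(role == "button" for role in roles):
        return "button_grid"
    if any(PRICE_PATTERN.search(t) or "add to cart" in t.lower() for t in texts):
        return "product_grid"
    if any(RELATIVE_TIME_PATTERN.search(t) for t in texts):
        return "article_feed"
    if sum(role == "link" for role in roles) / len(items) >= 0.8:
        return "link_list"
    return None  # no rule matched; mirrors the nullable label


products = [
    {"role": "link", "text": "Wireless Headphones $59.99"},
    {"role": "button", "text": "Add to cart"},
]
buttons = [{"role": "button", "text": "OK"}, {"role": "button", "text": "Cancel"}]
feed = [{"role": "link", "text": "Release notes"}, {"role": "text", "text": "2 hours ago"}]
links = [{"role": "link", "text": t} for t in ("Home", "About", "Docs", "Blog")]
links.append({"role": "text", "text": "Menu"})

print(infer_label(products), infer_label(buttons), infer_label(feed), infer_label(links))
```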

Working with Grid Positions

Access individual element positions within a grid:

# Access element layout data
for elem in snap.elements:
    if elem.layout and elem.layout.grid_id is not None:
        print(f"Element '{elem.text}' is in grid {elem.layout.grid_id}")

        if elem.layout.grid_pos:
            row = elem.layout.grid_pos.row_index
            col = elem.layout.grid_pos.col_index
            print(f"  Position: row {row}, column {col}")

        if elem.layout.region:
            print(f"  Region: {elem.layout.region}")
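When handing a grid to an LLM, it can help to lay the elements out as rows and columns first. The sketch below groups records by grid ID and arranges each grid as a list of rows. It works on plain tuples standing in for (text, grid_id, row_index, col_index) rather than SDK elements, and `build_matrices` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records mimicking (text, grid_id, row_index, col_index)
records = [
    ("Laptop", 0, 0, 0),
    ("Phone", 0, 0, 1),
    ("Tablet", 0, 1, 0),
    ("Home", 1, 0, 0),
    ("About", 1, 0, 1),
]


def build_matrices(rows):
    """Group records by grid_id and lay each grid out as a list of rows."""
    cells = defaultdict(dict)
    for text, grid_id, row, col in rows:
        cells[grid_id][(row, col)] = text

    matrices = {}
    for grid_id, by_pos in cells.items():
        n_rows = max(r for r, _ in by_pos) + 1
        n_cols = max(c for _, c in by_pos) + 1
        # Missing cells (e.g. a short last row) are filled with None
        matrices[grid_id] = [
            [by_pos.get((r, c)) for c in range(n_cols)] for r in range(n_rows)
        ]
    return matrices


matrices = build_matrices(records)
print(matrices[0])
print(matrices[1])
```

With real snapshot data, the same tuples could be produced from each element's text, layout.grid_id, and layout.grid_pos.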

Practical Examples

Example 1: Click the First Search Result

from sentience import SentienceBrowser, snapshot, click

with SentienceBrowser() as browser:
    browser.page.goto("https://google.com")
    # ... perform search ...

    snap = snapshot(browser)

    # Find all items in the dominant group (search results)
    results = sorted(
        [e for e in snap.elements if e.in_dominant_group],
        key=lambda e: e.group_index or 0
    )

    if results:
        first_result = results[0]
        click(browser, first_result.id)
        print(f"Clicked: {first_result.text}")

Example 2: Select Product in Grid by Row/Column

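# Find the product at row 1, column 2 (0-indexed) in the dominant grid.
# Continues from Example 1: `browser`, `snap`, and `click` are already defined.
target_row, target_col = 1, 2

# Row/column indices repeat across grids, so restrict the search to the dominant grid
dominant_grid_ids = {g.grid_id for g in snap.get_grid_bounds() if g.is_dominant}

for elem in snap.elements:
    if elem.layout and elem.layout.grid_pos and elem.layout.grid_id in dominant_grid_ids:
        pos = elem.layout.grid_pos
        if pos.row_index == target_row and pos.col_index == target_col:
            print(f"Found product at ({target_row}, {target_col}): {elem.text}")
            click(browser, elem.id)
            break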

Example 3: Filter by Region

# Get only elements in the main content area
main_elements = [
    e for e in snap.elements
    if e.layout and e.layout.region == "main"
]

# Get navigation links
nav_links = [
    e for e in snap.elements
    if e.layout and e.layout.region == "nav"
]

print(f"Main content: {len(main_elements)} elements")
print(f"Navigation: {len(nav_links)} links")

Important Notes