iframes #778

Merged
merged 31 commits into from
Jun 10, 2025
seanmcguire12
Member

@seanmcguire12 seanmcguire12 commented May 28, 2025

why

What changed

This PR adds full support for interacting with nested iframes in
Stagehand, across extract, observe, and act, by:

  • Building a combined accessibility tree (and URL/XPath mappings)
    spanning the main document and any nested iframes.
  • Tracking frame‑scoped element IDs so that element identifiers are
    globally unique across frames. This is needed because backendNodeIds are
    not guaranteed to be unique across OOPIFs (out‑of‑process iframes).
  • Extending CDP session management to correctly target Out‑Of‑Process
    iframes (OOPIFs) and fall back to same‑process frames.
  • Introducing a “deep” XPath‑based locator that can step into <iframe>
    elements when performing Playwright actions.
  • Updating Zod schema transforms to expect and handle the new
    “frameId-backendId” string format for element IDs.
  • Adding new error types for improved diagnostics when iframe resolution
    or XPath lookups fail.
  • Updating internal types (context.ts, stagehand.ts,
    stagehandErrors.ts) and utility modules (a11y/utils.ts, utils.ts,
    handlers) to accommodate frame‑aware operations.
  • Adding three end‑to‑end eval tasks testing iframe support:
    • iframe_hn (extract)
    • iframe_same_proc (act)
    • iframe_form_filling (act).

Details

1. Frame‑Scoped Element IDs (EncodedId)

  • lib/StagehandPage.ts
    • Introduced encodeWithFrameId(…), ordinalForFrameId(…), and
      resetFrameOrdinals() to assign and track per‑frame ordinals.
    • CDP session caching moved to a WeakMap<Page|Frame, CDPSession> so we
      can open sessions against arbitrary frames.
  • types/context.ts
    • Defined EncodedId = `${number}-${number}` for
      “frameOrdinal-backendNodeId” IDs.
    • Updated TreeResult to key xpathMap/idToUrl by EncodedId.
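
The ID scheme above can be sketched roughly as follows. The function names mirror the ones listed for lib/StagehandPage.ts, but the signatures, the FrameOrdinals wrapper class, and the convention that the main frame gets ordinal 0 are assumptions made for illustration, not the merged implementation.

```typescript
// Hypothetical sketch: names follow the PR description; signatures and the
// "main frame = ordinal 0" convention are assumptions.
type EncodedId = `${number}-${number}`;

class FrameOrdinals {
  // CDP frameId -> small per-page ordinal
  private ordinals = new Map<string, number>();

  /** Returns a stable ordinal for a frameId; undefined means the main frame. */
  ordinalForFrameId(frameId: string | undefined): number {
    if (frameId === undefined) return 0;
    let ordinal = this.ordinals.get(frameId);
    if (ordinal === undefined) {
      ordinal = this.ordinals.size + 1;
      this.ordinals.set(frameId, ordinal);
    }
    return ordinal;
  }

  /** Combines the frame ordinal and backendNodeId into one ID string. */
  encodeWithFrameId(frameId: string | undefined, backendNodeId: number): EncodedId {
    return `${this.ordinalForFrameId(frameId)}-${backendNodeId}` as EncodedId;
  }

  /** Forgets all assignments, e.g. before taking a fresh snapshot. */
  resetFrameOrdinals(): void {
    this.ordinals.clear();
  }
}

const frameOrdinals = new FrameOrdinals();
const mainFrameId = frameOrdinals.encodeWithFrameId(undefined, 42);
const childFrameId = frameOrdinals.encodeWithFrameId("FRAME_A", 42);
console.log(mainFrameId, childFrameId); // same backendNodeId, distinct IDs
```

Two nodes that share a backendNodeId but live in different frames end up with different encoded IDs, which is the property the combined tree relies on.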

2. Combined Accessibility Tree Across Frames

  • lib/a11y/utils.ts
    • Added getAccessibilityTreeWithFrames() which walks the CDP frame
      tree, captures accessibility sub‑trees for each frame, and concatenates them
      into a single “combinedTree” string plus combined URL/XPath maps keyed by
      EncodedId.
    • Updated formatSimplifiedTree() to emit the new encodedId in tree
      lines.
    • Updated buildBackendIdMaps() to traverse nested frame DOM nodes
      (OOPIF and same‑process iframes) and include the frame’s frameId when encoding
      backend IDs.
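
As a rough illustration of the merge step, the sketch below concatenates per-frame outlines and re-keys the per-frame maps by EncodedId. The FrameSnapshot shape, the combineSnapshots name, the hostXPath field, and the "[id] role: name" line format are all hypothetical; the real getAccessibilityTreeWithFrames() gathers its input from CDP and may combine it differently.

```typescript
// Hypothetical sketch of merging per-frame snapshots into one combined result.
type EncodedId = `${number}-${number}`;

interface FrameSnapshot {
  ordinal: number;                  // per-frame ordinal (0 = main document)
  tree: string;                     // simplified outline for this frame
  xpathMap: Record<number, string>; // backendNodeId -> XPath inside the frame
  urlMap: Record<number, string>;   // backendNodeId -> URL
  hostXPath: string;                // XPath of the hosting <iframe> ("" for main)
}

interface CombinedSnapshot {
  combinedTree: string;
  combinedXpathMap: Map<EncodedId, string>;
  combinedUrlMap: Map<EncodedId, string>;
}

function combineSnapshots(snapshots: FrameSnapshot[]): CombinedSnapshot {
  const combinedXpathMap = new Map<EncodedId, string>();
  const combinedUrlMap = new Map<EncodedId, string>();
  const trees: string[] = [];

  for (const snap of snapshots) {
    trees.push(snap.tree);
    for (const [backendId, xpath] of Object.entries(snap.xpathMap)) {
      const id = `${snap.ordinal}-${Number(backendId)}` as EncodedId;
      // Prefix with the hosting iframe's path so the XPath is page-absolute.
      combinedXpathMap.set(id, snap.hostXPath + xpath);
    }
    for (const [backendId, url] of Object.entries(snap.urlMap)) {
      const id = `${snap.ordinal}-${Number(backendId)}` as EncodedId;
      combinedUrlMap.set(id, url);
    }
  }

  return { combinedTree: trees.join("\n"), combinedXpathMap, combinedUrlMap };
}

const combined = combineSnapshots([
  {
    ordinal: 0,
    tree: "[0-5] button: Open",
    xpathMap: { 5: "/html/body/button[1]" },
    urlMap: {},
    hostXPath: "",
  },
  {
    ordinal: 1,
    tree: "[1-5] link: Docs",
    xpathMap: { 5: "/html/body/a[1]" },
    urlMap: { 5: "https://example.com/docs" },
    hostXPath: "/html/body/iframe[1]",
  },
]);
console.log(combined.combinedTree);
```

Both frames contain a node with backendNodeId 5, yet the combined maps keep them apart because the keys carry the frame ordinal.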

3. Deep XPath Locator for Frame Actions

  • lib/handlers/handlerUtils/actHandlerUtils.ts
    • Added deepLocator(root, rawXPath) which splits an XPath on <iframe>
      steps to descend into FrameLocators automatically before applying the
      remainder of the path.
  • lib/handlers/actHandler.ts
    • Uses deepLocator() instead of a flat page.locator(...) so that
      Playwright actions can target elements inside iframes when
      options.iframes is set.
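
A minimal sketch of the splitting idea follows. It assumes absolute XPaths with single-slash steps and positional predicates only, and it types the root through a small structural interface (LocatorRoot, a hypothetical name) instead of importing Playwright; Playwright's Page and FrameLocator are expected to fit that shape. The helper splitOnIframeSteps is also hypothetical, and the merged deepLocator may handle more cases.

```typescript
// Hypothetical sketch: descend into frames at each iframe step of an XPath.
interface LocatorRoot {
  locator(selector: string): unknown;
  frameLocator(selector: string): LocatorRoot;
}

interface SplitXPath {
  framePaths: string[]; // one XPath per <iframe> element to step into
  rest: string;         // XPath of the target inside the innermost frame
}

function splitOnIframeSteps(rawXPath: string): SplitXPath {
  const path = rawXPath.replace(/^xpath=/i, "").trim();
  const steps = path.split("/").filter((step) => step.length > 0);
  const framePaths: string[] = [];
  let buffer: string[] = [];

  for (const step of steps) {
    buffer.push(step);
    // An "iframe" or "iframe[n]" step closes the path to a frame element.
    if (/^iframe(\[\d+\])?$/i.test(step)) {
      framePaths.push("/" + buffer.join("/"));
      buffer = [];
    }
  }
  return { framePaths, rest: buffer.length > 0 ? "/" + buffer.join("/") : "" };
}

function deepLocator(root: LocatorRoot, rawXPath: string): unknown {
  const { framePaths, rest } = splitOnIframeSteps(rawXPath);
  if (rest === "") {
    throw new Error(`XPath ends at an iframe step: ${rawXPath}`);
  }
  let scope: LocatorRoot = root;
  for (const framePath of framePaths) {
    scope = scope.frameLocator(`xpath=${framePath}`);
  }
  return scope.locator(`xpath=${rest}`);
}

const exampleSplit = splitOnIframeSteps(
  "/html/body/div[2]/iframe[1]/html/body/form/input[1]",
);
console.log(exampleSplit.framePaths, exampleSplit.rest);
```

Each entry in framePaths becomes one frameLocator() hop, and only the remaining path is resolved inside the innermost frame, which is why a flat page.locator(...) cannot reach the same element.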

4. Frame‑Aware Extract & Observe Handlers

  • lib/handlers/extractHandler.ts
    • Imports and uses getAccessibilityTreeWithFrames() when iframes: true; otherwise falls back to the legacy single‑frame tree.
    • Passes through the iframes flag into its internal calls.
  • lib/handlers/observeHandler.ts
    • Similarly leverages getAccessibilityTreeWithFrames() and builds its
      element‑to‑XPath mapping from the combined tree.
    • Removes the old ad‑hoc iframe injection logic in favor of the unified
      combined tree approach.

5. Inference Schema & Element ID Changes

  • lib/inference.ts
    • Changed the Observe LLM output schema so elementId is now a
      string matching the regex /^\d+-\d+$/ (frame‑ID plus backend‑ID) instead
      of a raw number.
    • Updated parsing logic to no longer coerce to Number.
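
To make the new ID format concrete, here is a dependency-free sketch of validating and splitting an elementId. The PR does this through a Zod schema; the plain regex check and the parseElementId helper below are stand-ins written for illustration, not code from lib/inference.ts.

```typescript
// Hypothetical helper: validate an LLM-returned elementId and split it into
// its frame ordinal and backendNodeId parts.
const ENCODED_ID_PATTERN = /^\d+-\d+$/;

interface ParsedElementId {
  frameOrdinal: number;
  backendNodeId: number;
}

function parseElementId(raw: unknown): ParsedElementId {
  // Reject raw numbers and malformed strings instead of coercing them.
  if (typeof raw !== "string" || !ENCODED_ID_PATTERN.test(raw)) {
    throw new Error(`Invalid elementId: ${String(raw)}`);
  }
  const [frame, backend] = raw.split("-");
  return { frameOrdinal: Number(frame), backendNodeId: Number(backend) };
}

const parsedId = parseElementId("2-137");
console.log(parsedId.frameOrdinal, parsedId.backendNodeId);
```

A bare number such as 137, which the old numeric schema would have accepted, is rejected here because it no longer says which frame the node belongs to.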

6. Zod URL‑Field Transformation

  • lib/utils.ts
    • Renamed makeIdNumberSchema() to makeIdStringSchema(), which emits
      z.string().regex(/^\d+-\d+$/) for fields that were formerly string().url(),
      so extracted URL placeholders match the new EncodedId format.
    • Updated injectUrls() to map both numeric and frame‑ID strings back
      into real URLs once extraction is complete.
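
The placeholder round trip can be sketched as below. This version is an assumption-heavy simplification: it treats any string that looks like an ID and appears in the map as a placeholder and returns a new structure, whereas the real injectUrls() presumably knows which schema fields were URL fields. The Map-based signature is also hypothetical.

```typescript
// Hypothetical sketch: swap ID placeholders in extracted data for real URLs.
// Matches both legacy numeric IDs ("137") and frame-scoped IDs ("1-137").
const ID_LIKE = /^\d+(-\d+)?$/;

function injectUrls(value: unknown, idToUrl: Map<string, string>): unknown {
  if (typeof value === "string") {
    const url = ID_LIKE.test(value) ? idToUrl.get(value) : undefined;
    return url ?? value; // unknown IDs and ordinary strings pass through
  }
  if (Array.isArray(value)) {
    return value.map((item) => injectUrls(item, idToUrl));
  }
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      result[key] = injectUrls(child, idToUrl);
    }
    return result;
  }
  return value;
}

const idToUrl = new Map<string, string>([
  ["0-12", "https://example.com/"],
  ["1-12", "https://example.com/embedded"],
]);
const extracted = {
  title: "Top story",
  links: [{ href: "0-12" }, { href: "1-12" }, { href: "9-99" }],
};
const withUrls = injectUrls(extracted, idToUrl);
console.log(JSON.stringify(withUrls));
```

Because the model only ever sees short IDs in place of URLs, the two hrefs that share backendNodeId 12 still resolve to different pages once the frame ordinal is part of the key.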

to do:

  • clean up code
  • add JSDoc
  • add comprehensive PR desc
  • parameterize iframe traversal, i.e., require setting iframes: true
  • ignore iframes without content
  • add evals

changeset-bot bot commented May 28, 2025

🦋 Changeset detected

Latest commit: c17a209

The changes in this PR will be included in the next version bump.

@seanmcguire12 seanmcguire12 added labels on Jun 2, 2025: act (These changes pertain to the act function), extract (These changes pertain to the extract function), observe (These changes pertain to the observe function), combination (These changes affect multiple Stagehand functions), targeted-extract (These changes pertain to targeted extract)
@seanmcguire12 seanmcguire12 changed the title from "[WIP] iframes" to "iframes" on Jun 3, 2025
@seanmcguire12 seanmcguire12 marked this pull request as ready for review June 3, 2025 00:50
Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR implements comprehensive iframe support in Stagehand, enabling interaction with nested iframes across extract, observe, and act operations. The changes introduce frame-scoped element IDs, combined accessibility trees, and CDP session management for both same-process and out-of-process iframes.

  • Added frame-scoped element IDs using new EncodedId format (${frameOrdinal}-${backendNodeId}) for unique element identification across frames
  • Implemented getAccessibilityTreeWithFrames() to build combined accessibility trees spanning main document and nested iframes
  • Added deepLocator function in actHandlerUtils.ts to handle XPath selectors traversing through iframes
  • Added three comprehensive eval tasks (iframe_hn, iframe_same_proc, iframe_form_filling) to test iframe functionality
  • Fixed redundant error check in iframe_hn.ts where the same condition is checked twice

16 file(s) reviewed, 16 comment(s)

@miguelg719
Collaborator

working through it, another note before approving is adding this behind experimental:true

Collaborator

@miguelg719 miguelg719 left a comment

CDP Goat status

@seanmcguire12 seanmcguire12 merged commit df570b6 into main Jun 10, 2025
13 checks passed
@github-actions github-actions bot mentioned this pull request Jun 10, 2025
@seanmcguire12 seanmcguire12 mentioned this pull request Jun 10, 2025
kamath pushed a commit that referenced this pull request Jun 17, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#796](#796)
[`12a99b3`](12a99b3)
Thanks [@miguelg719](https://github.com/miguelg719)! - Added an
experimental flag to enable the newest and most experimental features

- [#807](#807)
[`2451797`](2451797)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - include
version number in StagehandDefaultError message

- [#803](#803)
[`1d631a5`](1d631a5)
Thanks [@miguelg719](https://github.com/miguelg719)! - Enable session
affinity for cache optimization

- [#804](#804)
[`9c398bb`](9c398bb)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
operatorResponseSchema based on new openai spec

- [#786](#786)
[`c19ad7f`](c19ad7f)
Thanks [@miguelg719](https://github.com/miguelg719)! - Handle reroute to
account for rollout

## @browserbasehq/[email protected]

### Minor Changes

- [#778](#778)
[`df570b6`](df570b6)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - iframe
support

### Patch Changes

- [#809](#809)
[`03ebebc`](03ebebc)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - log
NoObjectGenerated error details

- [#801](#801)
[`1d4f0ab`](1d4f0ab)
Thanks [@miguelg719](https://github.com/miguelg719)! - Default use API
to true

- [#798](#798)
[`d86200b`](d86200b)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix pino logging
memory leak by reusing worker

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies
\[[`12a99b3`](12a99b3),
[`2451797`](2451797),
[`1d631a5`](1d631a5),
[`9c398bb`](9c398bb),
[`c19ad7f`](c19ad7f)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies
\[[`12a99b3`](12a99b3),
[`2451797`](2451797),
[`1d631a5`](1d631a5),
[`9c398bb`](9c398bb),
[`c19ad7f`](c19ad7f)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>