iFrames support #362

alexzarbn · 2024-05-24T10:43:13Z

Problem

Crawling iframes is currently not supported, but to properly navigate some websites it is crucial to interact with iframes.

Solution

Crawl iframes by injecting domUtils.js into each iframe and building element tree by combining elements from the main page and iframes.
When processing an action, resolve locator instances based on the frame the action has to take place (main page or a specific iframe).

Note: the proposed implementation works with iframes one level deep, meaning iframes in iframes are not supported.

The PR is based on the #361

Closes #8

ykeremy

Thank you so much for looking into this! I was excited to even review it 🤩

I have some questions and concerns around some major points. Let me know if you want to hop on a call and discuss things. We can definitely resolve it async as well. Both works for me.

ykeremy · 2024-05-26T00:18:51Z

skyvern/forge/prompts/skyvern/extract-action.j2

@@ -14,6 +14,7 @@ Reply in JSON format with the following keys:
 "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
 "action_type": str, // It's a string enum: "CLICK", "INPUT_TEXT", "UPLOAD_FILE", "SELECT_OPTION", "WAIT", "SOLVE_CAPTCHA", "COMPLETE", "TERMINATE". "CLICK" is an element you'd like to click. "INPUT_TEXT" is an element you'd like to input text into. "UPLOAD_FILE" is an element you'd like to upload a file into. "SELECT_OPTION" is an element you'd like to select an option from. "WAIT" action should be used if there are no actions to take and there is some indication on screen that waiting could yield more actions. "WAIT" should not be used if there are actions to take. "SOLVE_CAPTCHA" should be used if there's a captcha to solve on the screen. "COMPLETE" is used when the user goal has been achieved AND if there's any data extraction goal, you should be able to get data from the page. Never return a COMPLETE action unless the user goal is achieved. "TERMINATE" is used to terminate the whole task with a failure when it doesn't seem like the user goal can be achieved. Do not use "TERMINATE" if waiting could lead the user towards the goal. Only return "TERMINATE" if you are on a page where the user goal cannot be achieved. All other actions are ignored when "TERMINATE" is returned.
 "id": str, // The id of the element to take action on. The id has to be one from the elements list
+ "frame": str, // It's either "main" or the id of the iframe element from the elements list


I'd like to keep the LLM unaware of frames. Can we create a mapping unique_id -> frame and decide which frame we're going to use based on the unique_id LLM returns?

Do you see any potential problems with the proposed approach ☝🏼 ?

Not at all. It should also reduce any potential errors when LLM would not return the correct frame.

ykeremy · 2024-05-26T00:23:11Z

skyvern/webeye/actions/actions.py

@@ -36,6 +36,7 @@ class Action(BaseModel):


 class WebAction(Action, abc.ABC):
+ frame: str | int


Same here, I think we only need to update the action handling logic. Specifically how playwright interacts with the page.

We can build and store the element_id_to_frame mapping in the ScrapedPage object we build as a result of scrape_web_unsafe.

Then, we can pass that info to resolve_locator and trace back to the original frame there without changing the way LLM, Action objects work.

WDYT?

Sounds good! I have just pushed the update with id -> frame mapping. Please let me know, if everything is in order.

ykeremy · 2024-05-26T00:28:29Z

skyvern/webeye/scraper/domUtils.js

+ } else if (element.tagName.toLowerCase() === 'iframe') {
+ let iframeElementObject = buildElementObject(element, true);
+
+ elements.push(iframeElementObject);
+ resultArray.push(iframeElementObject);


Should we skip the iframes here?

function buildElementObject(element, interactable) {

Interactable here means that LLM can take an action such as CLICK/INPUT_TEXT/SELECT_OPTION etc on this specific element. Is that possible for this root iframe element here? Or are we making it possible via building an element tree for each of the iframes?

The intention of including iframes in the elements list was to set unique id on the iframe for mapping later on and build element tree for each iframe. Not sure how to implement that without building the element object.

…access in O(1) instead

…ames-support

LawyZheng · 2024-06-03T13:14:28Z

close to #405

alexzarbn added 2 commits May 24, 2024 14:43

feat: 4-char element ids

e48e8c0

feat: iframes support

08db040

ykeremy requested changes May 26, 2024

View reviewed changes

alexzarbn added 4 commits May 28, 2024 12:31

fix: handling select options via xpath, code styling issues

83b4258

chore: add todo about using dict/object so that element children are …

9daf02d

…access in O(1) instead

Merge branch 'refs/heads/feature/4-char-element-ids' into feature/ifr…

7344fd1

…ames-support

feat: use id -> frame mapping for resolving locators

cee1c91

alexzarbn requested a review from ykeremy May 28, 2024 09:27

LawyZheng closed this Jun 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

iFrames support #362

iFrames support #362

alexzarbn commented May 24, 2024

ykeremy left a comment

ykeremy May 26, 2024

alexzarbn May 28, 2024

ykeremy May 26, 2024

alexzarbn May 28, 2024

ykeremy May 26, 2024

alexzarbn May 28, 2024

LawyZheng commented Jun 3, 2024

		@@ -36,6 +36,7 @@ class Action(BaseModel):


		class WebAction(Action, abc.ABC):
		frame: str \| int

iFrames support #362

iFrames support #362

Conversation

alexzarbn commented May 24, 2024

Problem

Solution

ykeremy left a comment

Choose a reason for hiding this comment

ykeremy May 26, 2024

Choose a reason for hiding this comment

alexzarbn May 28, 2024

Choose a reason for hiding this comment

ykeremy May 26, 2024

Choose a reason for hiding this comment

alexzarbn May 28, 2024

Choose a reason for hiding this comment

ykeremy May 26, 2024

Choose a reason for hiding this comment

alexzarbn May 28, 2024

Choose a reason for hiding this comment

LawyZheng commented Jun 3, 2024