feat: UI structured representation #495

LaPetiteSouris · 2023-09-14T07:30:24Z

What kind of change does this PR introduce?

Introduces the DOMElement class, which fully represents a screen state regardless of OS type. DOMElement is the atomic unit for representing any UI and serves as the lowest level of representation. It is the adapter that translates the window state into a uniform representational format.
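
To make the idea concrete, here is a minimal sketch of what such an OS-agnostic node could look like. It is not the class added in this PR: the field names (`role`, `name`, `children`), the `from_dict` constructor, and the shape of the input dict are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class DOMElement:
    """Hypothetical OS-agnostic UI node; the field names are assumptions."""

    role: str
    name: str = ""
    children: list[DOMElement] = field(default_factory=list)

    @classmethod
    def from_dict(cls, state: dict[str, Any]) -> DOMElement:
        """Recursively build a tree from a nested window-state dict."""
        return cls(
            role=state.get("role", "unknown"),
            name=state.get("name", ""),
            children=[cls.from_dict(c) for c in state.get("children", [])],
        )

    def walk(self) -> Iterator[DOMElement]:
        """Yield this element, then every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Assumed input shape: a nested dict as an OS accessibility layer might report it.
window_state = {
    "role": "window",
    "name": "Calculator",
    "children": [
        {"role": "button", "name": "7"},
        {"role": "button", "name": "="},
    ],
}
root = DOMElement.from_dict(window_state)
```

Because every platform-specific window state would be reduced to the same node type, later stages (prompt generation, evaluation) only need to know how to traverse one tree shape.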

Summary

As described in the RFC on UI Representation, predicting a single action to take on a single screen can be broken down into the following steps:

Represent the screen as a UI Tree
Translate the UI Tree into the universal prompt format
Run inference with the LLM using the translated prompt
Translate the LLM's output into a valid action
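
The four steps above can be sketched as a small pipeline. Everything here is illustrative: the function names, the indented-text prompt format, the JSON action format, and the set of allowed action types are assumptions, and the model is a stub callable rather than a real LLM client.

```python
import json
from typing import Callable


def represent(window_state: dict) -> dict:
    """Step 1: normalise a raw window state into a uniform UI tree."""
    return {
        "role": window_state.get("role", "unknown"),
        "name": window_state.get("name", ""),
        "children": [represent(c) for c in window_state.get("children", [])],
    }


def to_prompt(tree: dict, depth: int = 0) -> str:
    """Step 2: render the UI tree as indented text (assumed prompt format)."""
    lines = [f"{'  ' * depth}{tree['role']}: {tree['name']}"]
    for child in tree["children"]:
        lines.append(to_prompt(child, depth + 1))
    return "\n".join(lines)


def parse_action(completion: str) -> dict:
    """Step 4: turn the model output into an action, rejecting unknown types."""
    action = json.loads(completion)
    if action.get("type") not in {"click", "type"}:
        raise ValueError(f"unsupported action: {action!r}")
    return action


def predict_action(window_state: dict, model: Callable[[str], str]) -> dict:
    """Run steps 1 to 4; `model` performs step 3 (inference)."""
    prompt = to_prompt(represent(window_state))
    return parse_action(model(prompt))


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    return '{"type": "click", "target": "OK"}'


dialog = {
    "role": "window",
    "name": "Dialog",
    "children": [{"role": "button", "name": "OK"}],
}
action = predict_action(dialog, fake_model)
```

Keeping the model behind a plain callable means the representation and translation steps can be tested without any provider configured.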

Conversely, the evaluation process used in model training/tuning/RLHF can be decomposed into several steps.

Given a window state and the action taken as the reference pair, and a current window state and a predicted action as the inference pair:

Represent the reference window state as a UI Tree
Based on the action, find the element (usually an actionable component such as a Button, Text Area, or Link) that was interacted with
Represent the current window state as a UI Tree
Find the actionable elements in the UI Tree that the predicted action interacts with
Compare the reference elements with the predicted elements
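
A rough sketch of this comparison follows, assuming each element carries a bounding box and each action is a click at screen coordinates. Neither assumption comes from the PR; `Element`, `find_target`, `same_target`, and the role names in `ACTIONABLE` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

ACTIONABLE = {"button", "textarea", "link"}  # assumed role names


@dataclass
class Element:
    role: str
    name: str
    box: tuple[int, int, int, int]  # left, top, right, bottom
    children: list[Element] = field(default_factory=list)


def find_target(root: Element, x: int, y: int) -> Optional[Element]:
    """Return the deepest actionable element containing (x, y), if any."""
    left, top, right, bottom = root.box
    if not (left <= x < right and top <= y < bottom):
        return None
    for child in root.children:
        hit = find_target(child, x, y)
        if hit is not None:
            return hit
    return root if root.role in ACTIONABLE else None


def same_target(ref_root, ref_xy, cur_root, pred_xy) -> bool:
    """Compare the element hit by the reference action with the predicted one."""
    ref = find_target(ref_root, *ref_xy)
    pred = find_target(cur_root, *pred_xy)
    if ref is None or pred is None:
        return False
    return (ref.role, ref.name) == (pred.role, pred.name)


# The Save button sits at different positions in the two window states.
reference = Element("window", "Editor", (0, 0, 200, 100),
                    [Element("button", "Save", (10, 10, 60, 30))])
current = Element("window", "Editor", (0, 0, 200, 100),
                  [Element("button", "Save", (100, 10, 150, 30))])
```

Comparing elements rather than raw coordinates means a prediction counts as correct when it hits the same control, even if the control has moved between the two window states.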

DOMElement is the low-level mapping of every window state into a single unified UI representation, which can later be translated into prompt language regardless of UI state or OS type.

Checklist

My code follows the style guidelines of OpenAdapt
I have performed a self-review of my code
If applicable, I have added tests to prove my fix is functional/effective
[] I have linted my code locally prior to submission
I have commented my code, particularly in hard-to-understand areas
[] I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
New and existing unit tests pass locally with my changes

How can your code be run and tested?

Other information

Next steps:

Define a uniform/base prompt structure, i.e. a class such as:

class PromptGenerator:
    """Generate the prediction pipeline.

    The structure of the prompt and of the prompt pipeline is TBD.
    """

griptape AI is a very good candidate for this

With CompletionProvider implemented, a CompletionProvider should in general take the generated generic prompt (or the generated pipeline) and run it against a specific provider.

Write a UITranslator class, which takes a DOMElement and translates it into an operational prompt. This is the "secret glue" that translates the representation of the UI into LLM language.
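
One possible shape for these pieces is sketched below. Since the prompt structure is still TBD, treat this purely as an illustration: the `translate` and `complete` method names, the numbered-options prompt layout, the role names, and the use of plain dicts in place of DOMElement are all assumptions, and `FirstOptionProvider` is a stand-in, not a real provider.

```python
from abc import ABC, abstractmethod
from typing import Iterator

ACTIONABLE_ROLES = {"button", "link", "textarea"}  # assumed role names


def walk(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


class UITranslator:
    """Hypothetical translator: renders a UI tree as a question for an LLM."""

    def translate(self, root: dict, goal: str) -> str:
        targets = [n for n in walk(root) if n.get("role") in ACTIONABLE_ROLES]
        lines = [f"Goal: {goal}", "Actionable elements:"]
        lines += [f'{i}. {n["role"]} "{n["name"]}"'
                  for i, n in enumerate(targets, 1)]
        lines.append("Reply with the number of the element to act on.")
        return "\n".join(lines)


class CompletionProvider(ABC):
    """Assumed interface: one method that sends a prompt to a specific backend."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's raw completion for the prompt."""


class FirstOptionProvider(CompletionProvider):
    """Offline stand-in that always answers with option 1."""

    def complete(self, prompt: str) -> str:
        return "1"


ui = {
    "role": "window",
    "name": "Login",
    "children": [
        {"role": "textarea", "name": "Username"},
        {"role": "button", "name": "Sign in"},
    ],
}
prompt = UITranslator().translate(ui, "Log in")
answer = FirstOptionProvider().complete(prompt)
```

Separating the translator from the provider keeps the UI-to-prompt logic independent of whichever model backend ends up running the prompt.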

NOTE: The main effort is first to unify and standardize the way the UI is interacted with, then to unify and standardize the way the models are interacted with, i.e. to translate and represent the UI in the LLMs' language.

LaPetiteSouris added 17 commits July 9, 2023 17:21

feat(crud): Compute and save screenshot diff

e2d30df

* Add 2 columns to the screenshot table to store png_diff_data and png_mask_diff_data.
* CRUD now supports computing and saving screenshot diff data on the fly.

feat(config): Add SAVE_SCREENSHOT_DIFF environment variable

cb14ec3

* SAVE_SCREENSHOT_DIFF indicates that each pair of neighboring screenshots will be compared and the difference saved to the db

Merge remote-tracking branch 'upstream/main'

f5047e8

Merge remote-tracking branch 'upstream/main'

35fb336

Merge remote-tracking branch 'upstream/main'

27db3ec

Merge remote-tracking branch 'upstream/main'

c6712b6

feat(crud): add missing import after merge

09281c1

refactor(crud): add missing type annotations

a7f6542

refactor(crud): add missing type annotations

79a8cce

Merge remote-tracking branch 'upstream/main'

05d9999

Merge remote-tracking branch 'upstream/main'

76a4865

feat(window/ui): Add Element structure

8d46604

feat(window/ui): Refactor unit tests for UI elements

26ce44b

feat(window/ui) refactor dom tests

34b292a

feat(window/ui): Add method to convert window dict to DOMElement

a230fab

(window/ui) Add docstring

15f80c4

feat(window/ui) Refactor tests

50f1bf0

LaPetiteSouris marked this pull request as draft September 14, 2023 12:35
