When I start app.py and ask it to find me a private room in New York, I get the following error:
$ python app.py
[INFO] computer_use_demo.tools.logger - Starting the gradio app
[INFO] computer_use_demo.tools.logger - Found 2 screens
[INFO] computer_use_demo.tools.logger - loaded initial api_key for openai: sk-proj-<redacted>
/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/components/chatbot.py:288: UserWarning: The 'tuples' format for chatbot messages is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
warnings.warn(
* Running on local URL: http://127.0.0.1:7888
To create a public link, set `share=True` in `launch()`.
[INFO] computer_use_demo.tools.logger - loaded initial api_key for openai: sk-proj-<redacted>
[INFO] computer_use_demo.tools.logger - Model inited on device: mps.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Screen BBox: (0, 0, 2560, 1440)
[INFO] computer_use_demo.tools.logger - Start the message loop. User messages: [{'role': 'user', 'content': [TextBlock(citations=None, text='Find a private room in New York for next week', type='text')]}]
filtered_messages: ['Find a private room in New York for next week']
[INFO] computer_use_demo.tools.logger - _render_message: Screenshot for **VLMPlanner**:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAA
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...')] (truncated)
Sending messages to VLMPlanner: ['Find a private room in New York for next week']
[oai] sending messages: [{'role': 'system', 'content': '\nYou are using an Darwin device.\nYou are able to use a mouse and keyboard to interact with the computer based on the given task and screenshot.\nYou can only interact with the desktop GUI (no terminal or application menu access).\n\nYou may be given some history plan and actions, this is the response from the previous loop.\nYou should carefully consider your plan base on the task, screenshot, and history actions.\n\nYour available "Next Action" only include:\n- ENTER: Press an enter key.\n- ESCAPE: Press an ESCAPE key.\n- INPUT: Input a string of text.\n- CLICK: Describe the ui element to be clicked.\n- HOVER: Describe the ui element to be hovered.\n- SCROLL: Scroll the screen, you must specify up or down.\n- PRESS: Describe the ui element to be pressed.\n\n\nOutput format:\n```json\n{\n "Thinking": str, # describe your thoughts on how to achieve the task, choose one action from available actions at a time.\n "Next Action": "action_type, action description" | "None" # one action at a time, describe it in short and precisely. \n}\n```\n\nOne Example:\n```json\n{ \n "Thinking": "I need to search and navigate to amazon.com.",\n "Next Action": "CLICK \'Search Google or type a URL\'."\n}\n```\n\nIMPORTANT NOTES:\n1. Carefully observe the screenshot to understand the current state and read history actions.\n2. You should only give a single action at a time. for example, INPUT text, and ENTER can\'t be in one Next Action.\n3. Attach the text to Next Action, if there is text or any description for the button. \n4. You should not include other actions, such as keyboard shortcuts.\n5. When the task is completed, you should say "Next Action": "None" in the json field.\n\n\nNOTE: you are operating a Mac machine'}, {'role': 'user', 'content': [{'type': 'text', 'text': 'Find a private room in New York for next week'}]}]
oai token usage: 458
VLMPlanner response: {
"Thinking": "I need to open a web browser to search for a private room in New York for next week.",
"Next Action": "CLICK 'Safari' or any web browser icon on the desktop."
}
VLMPlanner total token usage so far: 458. Total cost so far: $USD0.00007
[INFO] computer_use_demo.tools.logger - _render_message: **VLMPlanner**:
I need to open a web browser to search for a private room in New York for next week.
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop.")] (truncated)
[INFO] computer_use_demo.tools.logger - _render_message: **VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop."), (None, '**VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\nCLICK \'Safari\' or any web browser icon on the desktop.')] (truncated)
[INFO] computer_use_demo.tools.logger - _render_message: Screenshot for **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop."), (None, '**VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\nCLICK \'Safari\' or any web browser icon on the desktop.'), (None, 'Screenshot for **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6...')] (truncated)
Traceback (most recent call last):
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/blocks.py", line 2044, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1603, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 728, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 722, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 705, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 866, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/app.py", line 222, in process_input
for loop_msg in sampling_loop_sync(
File "/Users/nateaune/Documents/code/computer_use_ootb/computer_use_demo/loop.py", line 233, in sampling_loop_sync
actor_response = actor(messages=next_action)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/computer_use_demo/gui_agent/actor/showui_agent.py", line 131, in __call__
generated_ids = self.model.generate(**inputs, max_new_tokens=128)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1644, in forward
image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1013, in forward
hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, rotary_pos_emb=rotary_pos_emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 426, in forward
hidden_states = hidden_states + self.attn(
^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 399, in forward
attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
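The range `[-3, 2]` in the message indicates that the tensors reaching `F.scaled_dot_product_attention` are 3-D, while something in the call on the `mps` device indexes a fourth dimension. A possible workaround, sketched below, is to add a leading batch dimension before the call and remove it afterwards. This is only a sketch under assumptions: the helper name `sdpa_with_batch_dim` is hypothetical, the `(heads, seq_len, head_dim)` layout is inferred from the error rather than confirmed in `modeling_qwen2_vl.py`, and the example runs on CPU, so it does not show that the error goes away on MPS.

```python
import torch
import torch.nn.functional as F


def sdpa_with_batch_dim(q, k, v, attn_mask=None):
    """Hypothetical helper: run SDPA on 3-D inputs via a temporary batch dim.

    q, k, v are assumed to have shape (heads, seq_len, head_dim).
    attn_mask, if given, must broadcast to (1, heads, seq_len, seq_len).
    """
    out = F.scaled_dot_product_attention(
        q.unsqueeze(0),  # -> (1, heads, seq_len, head_dim)
        k.unsqueeze(0),
        v.unsqueeze(0),
        attn_mask=attn_mask,
        dropout_p=0.0,
    )
    return out.squeeze(0)  # back to (heads, seq_len, head_dim)


# Small CPU demo with made-up sizes (2 heads, 5 tokens, head_dim 4).
torch.manual_seed(0)
q = torch.randn(2, 5, 4)
k = torch.randn(2, 5, 4)
v = torch.randn(2, 5, 4)
mask = torch.ones(1, 5, 5, dtype=torch.bool)  # True = position may be attended
out = sdpa_with_batch_dim(q, k, v, mask)
print(out.shape)
```

If this matches the cause, the same reshaping would have to be applied where the library makes the call (line 399 of `modeling_qwen2_vl.py` in the traceback above), or the installed `torch` and `transformers` versions changed to a combination that handles 3-D inputs on MPS.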