IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3) #75

natea opened this issue Feb 10, 2025 · 0 comments

natea commented Feb 10, 2025

When I start up app.py and ask it to find me a private room in New York, I get the following error:

$ python app.py
[INFO] computer_use_demo.tools.logger - Starting the gradio app
[INFO] computer_use_demo.tools.logger - Found 2 screens
[INFO] computer_use_demo.tools.logger - loaded initial api_key for openai: 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
/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/components/chatbot.py:288: UserWarning: The 'tuples' format for chatbot messages is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7888

To create a public link, set `share=True` in `launch()`.
[INFO] computer_use_demo.tools.logger - loaded initial api_key for openai: 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
[INFO] computer_use_demo.tools.logger - Model inited on device: mps.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Screen BBox: (0, 0, 2560, 1440)
[INFO] computer_use_demo.tools.logger - Start the message loop. User messages: [{'role': 'user', 'content': [TextBlock(citations=None, text='Find a private room in New York for next week', type='text')]}]
filtered_messages: ['Find a private room in New York for next week']
[INFO] computer_use_demo.tools.logger - _render_message: Screenshot for **VLMPlanner**:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAA
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...')] (truncated)
Sending messages to VLMPlanner: ['Find a private room in New York for next week']
[oai] sending messages: [{'role': 'system', 'content': '\nYou are using an Darwin device.\nYou are able to use a mouse and keyboard to interact with the computer based on the given task and screenshot.\nYou can only interact with the desktop GUI (no terminal or application menu access).\n\nYou may be given some history plan and actions, this is the response from the previous loop.\nYou should carefully consider your plan base on the task, screenshot, and history actions.\n\nYour available "Next Action" only include:\n- ENTER: Press an enter key.\n- ESCAPE: Press an ESCAPE key.\n- INPUT: Input a string of text.\n- CLICK: Describe the ui element to be clicked.\n- HOVER: Describe the ui element to be hovered.\n- SCROLL: Scroll the screen, you must specify up or down.\n- PRESS: Describe the ui element to be pressed.\n\n\nOutput format:\n```json\n{\n    "Thinking": str, # describe your thoughts on how to achieve the task, choose one action from available actions at a time.\n    "Next Action": "action_type, action description" | "None" # one action at a time, describe it in short and precisely. \n}\n```\n\nOne Example:\n```json\n{  \n    "Thinking": "I need to search and navigate to amazon.com.",\n    "Next Action": "CLICK \'Search Google or type a URL\'."\n}\n```\n\nIMPORTANT NOTES:\n1. Carefully observe the screenshot to understand the current state and read history actions.\n2. You should only give a single action at a time. for example, INPUT text, and ENTER can\'t be in one Next Action.\n3. Attach the text to Next Action, if there is text or any description for the button. \n4. You should not include other actions, such as keyboard shortcuts.\n5. When the task is completed, you should say "Next Action": "None" in the json field.\n\n\nNOTE: you are operating a Mac machine'}, {'role': 'user', 'content': [{'type': 'text', 'text': 'Find a private room in New York for next week'}]}]
oai token usage: 458
VLMPlanner response: {
    "Thinking": "I need to open a web browser to search for a private room in New York for next week.",
    "Next Action": "CLICK 'Safari' or any web browser icon on the desktop."
}
VLMPlanner total token usage so far: 458. Total cost so far: $USD0.00007
[INFO] computer_use_demo.tools.logger - _render_message: **VLMPlanner**:
I need to open a web browser to search for a private room in New York for next week.
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop.")] (truncated)
[INFO] computer_use_demo.tools.logger - _render_message: **VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop."), (None, '**VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\nCLICK \'Safari\' or any web browser icon on the desktop.')] (truncated)
[INFO] computer_use_demo.tools.logger - _render_message: Screenshot for **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)
[INFO] computer_use_demo.tools.logger - chatbot_output_callback chatbot_state: [('Find a private room in New York for next week', None), (None, 'Screenshot for **VLMPlanner**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6+4PQohqG5gGU9Z05PhP1z8wt+9zNOmvDRjU9Rs0YoFotKC99a/4o3OGy97rV35dtGn+4PUxMbmrPwtrtz+re/7jUnU1lV9EMqqBpmDhwB4dhazrB5U9hrylHC+zanK2zv9SYqfF7umZ2OCF8Jt6oZJSn8KNyTqPHTNbysrapfN9jXe1J6fEbUJ9XBGONE5fuJE2SYfoaYlIz+nxksz0RYwWAdkyXSZMjJdEgcA42U8BQ6Kn30CAcJSA3ZWf/OsOqtH...'), (None, "**VLMPlanner**:\nI need to open a web browser to search for a private room in New York for next week.\nNext Action: CLICK 'Safari' or any web browser icon on the desktop."), (None, '**VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\nCLICK \'Safari\' or any web browser icon on the desktop.'), (None, 'Screenshot for **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span style="color:rgb(209, 100, 94)">o</span><span style="color:rgb(238, 171, 106)">w</span><span style="color:rgb(0, 0, 0)">U</span><span style="color:rgb(0, 0, 0)">I</span>**:\n<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAB4AAAAQ4CAYAAADo08FDAAABYGlDQ1BJQ0MgUHJvZmlsZQAAeJxtkMErg3EYxz+badEKtZaDww6SA5ohjrYRaoeFFW7v3s2mXq+3d5OUgwM3h4mLi8TFX2AXB/8BpRyElJOz7ID1et4N2/D79fT99O15np6...')] (truncated)
Traceback (most recent call last):
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/blocks.py", line 2044, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1603, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 728, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 722, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 705, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/gradio/utils.py", line 866, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/app.py", line 222, in process_input
    for loop_msg in sampling_loop_sync(
  File "/Users/nateaune/Documents/code/computer_use_ootb/computer_use_demo/loop.py", line 233, in sampling_loop_sync
    actor_response = actor(messages=next_action)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/computer_use_demo/gui_agent/actor/showui_agent.py", line 131, in __call__
    generated_ids = self.model.generate(**inputs, max_new_tokens=128)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1644, in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1013, in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, rotary_pos_emb=rotary_pos_emb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 426, in forward
    hidden_states = hidden_states + self.attn(
                                    ^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nateaune/Documents/code/computer_use_ootb/venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 399, in forward
    attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
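For reference, the failure originates in `F.scaled_dot_product_attention` inside the Qwen2-VL vision blocks while the actor model is running on `mps`. A possible workaround to try (a minimal, untested sketch; the checkpoint name below is my assumption about what `showui_agent.py` loads, and the repo may wire this up differently) is to force the eager attention implementation when loading the model, which bypasses the SDPA call that raises the IndexError:

```python
# Untested workaround sketch: load the ShowUI actor model with eager attention
# so the Qwen2-VL vision tower skips F.scaled_dot_product_attention on MPS.
# "showlab/ShowUI-2B" is an assumption about the checkpoint used by showui_agent.py.
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "showlab/ShowUI-2B",            # assumed checkpoint
    torch_dtype=torch.float16,
    attn_implementation="eager",    # avoid the SDPA path that raises the IndexError
).to("mps")

processor = AutoProcessor.from_pretrained("showlab/ShowUI-2B")
```

If the model is loaded elsewhere in the repo, the same `attn_implementation="eager"` argument could be threaded through that `from_pretrained` call instead.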