IndexError: index 1 is out of bounds for dimension 0 with size 1 #2875

Open · cs-mshah opened this issue Jan 7, 2025 · 1 comment

cs-mshah commented Jan 7, 2025

Describe the bug
I tried running the inference script https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_mllm.py, but it fails at runtime. I also tried setting limit_mm_per_prompt={"video": 1}, but I still get the IndexError shown in the logs below. A minimal reproduction sketch follows.
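For reference, this is roughly what I ran; a minimal sketch adapted from demo_mllm.py (the video path is a placeholder, and the VllmEngine/InferRequest argument names reflect my reading of the ms-swift 3.x API, so adjust as needed):

# Minimal reproduction sketch (assumed ms-swift 3.x API; names may need adjusting).
import os
os.environ['VIDEO_MAX_PIXELS'] = str(50176)  # keep per-video token count small

from swift.llm import VllmEngine, InferRequest, RequestConfig

# limit_mm_per_prompt={"video": 1} is the vLLM option I tried overriding.
engine = VllmEngine('Qwen/Qwen2-VL-2B-Instruct', limit_mm_per_prompt={'video': 1})

infer_request = InferRequest(
    messages=[{'role': 'user', 'content': '<video>Describe this video.'}],
    videos=['path/to/video.mp4'],  # placeholder for any short local mp4
)
request_config = RequestConfig(max_tokens=512, temperature=0)

resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')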

Your hardware and system info
CUDA 12.1
vllm 0.6.6.post1
torch 2.5.1+cu121
torchaudio 2.5.0
torchvision 0.20.1+cu121
ms-swift 3.0.1.post1
sentence-transformers 3.3.1
transformers 4.47.1
transformers-stream-generator 0.0.5
qwen-vl-utils 0.0.8

Additional context
Logs:

[INFO:swift] Successfully registered `~/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/dataset/data/dataset_info.json`                                               
[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2-VL-2B-Instruct                                                                                           
Downloading Model to directory: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct                                                                                                 
[WARNING:modelscope] Using branch: master as version is unstable, use with caution                                                                                                            
[INFO:swift] Loading the model using model_dir: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct                                                                                 
[INFO:swift] Setting torch_dtype: torch.bfloat16                                                                                                                                              
[INFO:swift] Setting image_factor: 28. You can adjust this hyperparameter through the environment variable: `IMAGE_FACTOR`.                                                                   
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.                                                                     
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.                                                                 
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.                                                                        
[INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_PIXELS`.                                                       
[INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_PIXELS`.                                                       
[INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `VIDEO_TOTAL_PIXELS`.                                                 
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.                                                                    
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.                                                                                    
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.                                                                
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.                                                              
INFO 01-07 09:12:51 config.py:510] This model supports multiple tasks: {'score', 'reward', 'classify', 'generate', 'embed'}. Defaulting to 'generate'.                                        
INFO 01-07 09:12:51 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct', speculative_config=None, tokenizer='~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 01-07 09:12:52 selector.py:120] Using Flash Attention backend.                                                                                                                           
INFO 01-07 09:12:52 model_runner.py:1094] Starting to load model /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct...                                                             
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]                                                                                                                  
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.53it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.35it/s]

INFO 01-07 09:12:54 model_runner.py:1099] Loading model weights took 4.1273 GB
INFO 01-07 09:13:06 worker.py:241] Memory profiling takes 11.36 seconds
INFO 01-07 09:13:06 worker.py:241] the current vLLM instance can use total_gpu_memory (39.38GiB) x gpu_memory_utilization (0.90) = 35.44GiB
INFO 01-07 09:13:06 worker.py:241] model weights take 4.13GiB; non_torch_memory takes 0.72GiB; PyTorch activation peak memory takes 3.30GiB; the rest of the memory reserved for KV Cache is 27.30GiB.
INFO 01-07 09:13:06 gpu_executor.py:76] # GPU blocks: 63904, # CPU blocks: 9362
INFO 01-07 09:13:06 gpu_executor.py:80] Maximum concurrency for 32768 tokens per request: 31.20x
INFO 01-07 09:13:10 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00,  1.84it/s]
INFO 01-07 09:13:29 model_runner.py:1535] Graph capturing finished in 19 secs, took 0.33 GiB
INFO 01-07 09:13:29 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 34.25 seconds
query: <video>Describe this video.
response: qwen-vl-utils using decord to read video.
WARNING 01-07 09:13:30 preprocess.py:262] Passing `multi_modal_data` in TokensPrompt isdeprecated and will be removed in a future update
Exception in callback VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=<bound method...151000ea1ee0>>)(<Task finishe...with size 1')>) at /vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py:397
handle: <Handle VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=<bound method...151000ea1ee0>>)(<Task finishe...with size 1')>) at /vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py:397>
Traceback (most recent call last):
  File "~/miniconda3/envs/vllm/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 888, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 799, in engine_step
    await self.engine.add_request_async(**new_request)
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 493, in add_request_async
    preprocessed_inputs = await self.input_preprocessor.preprocess_async(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 678, in preprocess_async
    return await self._process_decoder_only_prompt_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 618, in _process_decoder_only_prompt_async
    prompt_comps = await self._prompt_to_llm_inputs_async(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 387, in _prompt_to_llm_inputs_async
    return await self._process_multimodal_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 268, in _process_multimodal_async
    return mm_processor.apply(prompt, mm_data, mm_processor_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 807, in apply
    all_placeholders = self._find_placeholders(all_prompt_repls,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 651, in _find_placeholders
    return list(
           ^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 578, in iter_placeholders
    yield from _iter_modality_placeholders(
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 536, in _iter_modality_placeholders
    replacement = repl_info.get_replacement(item_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 188, in get_replacement
    replacement = replacement(item_idx)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_vl.py", line 834, in get_replacement_qwen2vl
    grid_thw = hf_inputs[f"{modality}_grid_thw"][item_idx]
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
IndexError: index 1 is out of bounds for dimension 0 with size 1
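For what it's worth, the traceback boils down to an indexing mismatch: get_replacement_qwen2vl asks for a second video's grid while video_grid_thw only contains one row. A hypothetical illustration (the grid values are made up):

import torch

# hf_inputs["video_grid_thw"] ends up with one row (one video), but the
# placeholder iteration requests item_idx == 1, i.e. a second video.
video_grid_thw = torch.tensor([[4, 18, 32]])  # hypothetical t/h/w grid for a single video
item_idx = 1
grid_thw = video_grid_thw[item_idx]  # IndexError: index 1 is out of bounds for dimension 0 with size 1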
cs-mshah (Author) commented Jan 7, 2025

Using the LMDeploy backend also misbehaves: instead of a description, the model emits 512 tokens of garbage (see the log below; a rough sketch of how I switched backends follows the log).

[INFO:swift] Successfully registered `/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/dataset/data/dataset_info.json`
[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2-VL-2B-Instruct
Downloading Model to directory: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct
[INFO:swift] Setting torch_dtype: torch.bfloat16
[INFO:swift] Setting image_factor: 28. You can adjust this hyperparameter through the environment variable: `IMAGE_FACTOR`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.
[INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_PIXELS`.
[INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_PIXELS`.
[INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `VIDEO_TOTAL_PIXELS`.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.
2025-01-07 11:03:07,813 - lmdeploy - WARNING - archs.py:53 - Fallback to pytorch engine because `/vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct` not supported by turbomind engine.
[INFO:swift] backend_config: PytorchEngineConfig(dtype='auto', tp=1, session_len=None, max_batch_size=None, cache_max_entry_count=0.8, prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=8192, thread_safe=True, enable_prefix_caching=False, device_type='cuda', eager_mode=False, custom_module_map=None, download_dir=None, revision=None, quant_policy=0)
[INFO:swift] vision_config: VisionConfig(max_batch_size=8, thread_safe=False)
2025-01-07 11:03:09,016 - lmdeploy - WARNING - model.py:76 - Could not find qwen2_vl in registered models. Register qwen2_vl using the BaseChatTemplate.
2025-01-07 11:03:10,466 - lmdeploy - WARNING - transformers.py:22 - LMDeploy requires transformers version: [4.33.0 ~ 4.46.1], but found version: 4.48.0.dev0
2025-01-07 11:03:10,469 - lmdeploy - WARNING - config.py:35 - Model config does not have `torch_dtype`, use: float16
2025-01-07 11:03:10,469 - lmdeploy - WARNING - engine_checker.py:70 - thread safe mode has been deprecated and it would be removed in the future.
query: <video>Describe this video.

query: <video>Describe this video.
response: qwen-vl-utils using decord to read video.
2025-01-07 11:03:14,513 - lmdeploy - WARNING - messages.py:82 - `temperature` is 0, set to 1e-6
!#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
metric: {'num_prompt_tokens': 26, 'num_generated_tokens': 512, 'num_samples': 1, 'runtime': 5.579125477001071, 'samples/s': 0.1792395607738736, 'tokens/s': 91.77065511622328}
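
For completeness, this is roughly how I switched backends (LmdeployEngine is my reading of the ms-swift 3.x API; the video path is a placeholder):

# Rough sketch of the LMDeploy backend run (assumed ms-swift 3.x API; adjust as needed).
from swift.llm import LmdeployEngine, InferRequest, RequestConfig

engine = LmdeployEngine('Qwen/Qwen2-VL-2B-Instruct')  # falls back to the pytorch engine, per the warning above

infer_request = InferRequest(
    messages=[{'role': 'user', 'content': '<video>Describe this video.'}],
    videos=['path/to/video.mp4'],  # placeholder
)
resp_list = engine.infer([infer_request], RequestConfig(max_tokens=512, temperature=0))
print(resp_list[0].choices[0].message.content)  # prints the '!#!!!...' garbage shown above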
