IndexError: index 1 is out of bounds for dimension 0 with size 1 #2875

Open · cs-mshah opened this issue Jan 7, 2025 · 1 comment

cs-mshah commented Jan 7, 2025

Describe the bug
I tried running the inference script https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_mllm.py, but it fails at runtime. I also tried setting limit_mm_per_prompt={"video": 1}, but I still get the IndexError shown in the logs below. A minimal reproduction sketch follows.
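For reference, this is roughly what I ran; a minimal sketch adapted from demo_mllm.py (the video path is a placeholder, and the VllmEngine/InferRequest argument names reflect my reading of the ms-swift 3.x API, so adjust as needed):

# Minimal reproduction sketch (assumed ms-swift 3.x API; names may need adjusting).
import os
os.environ['VIDEO_MAX_PIXELS'] = str(50176)  # keep per-video token count small

from swift.llm import VllmEngine, InferRequest, RequestConfig

# limit_mm_per_prompt={"video": 1} is the vLLM option I tried overriding.
engine = VllmEngine('Qwen/Qwen2-VL-2B-Instruct', limit_mm_per_prompt={'video': 1})

infer_request = InferRequest(
    messages=[{'role': 'user', 'content': '<video>Describe this video.'}],
    videos=['path/to/video.mp4'],  # placeholder for any short local mp4
)
request_config = RequestConfig(max_tokens=512, temperature=0)

resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')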

Your hardware and system info
CUDA 12.1
vllm 0.6.6.post1
torch 2.5.1+cu121
torchaudio 2.5.0
torchvision 0.20.1+cu121
ms-swift 3.0.1.post1
sentence-transformers 3.3.1
transformers 4.47.1
transformers-stream-generator 0.0.5
qwen-vl-utils 0.0.8

Additional context
Logs:

[INFO:swift] Successfully registered `~/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/dataset/data/dataset_info.json`                                               
[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2-VL-2B-Instruct                                                                                           
Downloading Model to directory: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct                                                                                                 
[WARNING:modelscope] Using branch: master as version is unstable, use with caution                                                                                                            
[INFO:swift] Loading the model using model_dir: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct                                                                                 
[INFO:swift] Setting torch_dtype: torch.bfloat16                                                                                                                                              
[INFO:swift] Setting image_factor: 28. You can adjust this hyperparameter through the environment variable: `IMAGE_FACTOR`.                                                                   
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.                                                                     
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.                                                                 
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.                                                                        
[INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_PIXELS`.                                                       
[INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_PIXELS`.                                                       
[INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `VIDEO_TOTAL_PIXELS`.                                                 
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.                                                                    
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.                                                                                    
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.                                                                
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.                                                              
INFO 01-07 09:12:51 config.py:510] This model supports multiple tasks: {'score', 'reward', 'classify', 'generate', 'embed'}. Defaulting to 'generate'.                                        
INFO 01-07 09:12:51 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct', speculative_config=None, tokenizer='~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=~/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 01-07 09:12:52 selector.py:120] Using Flash Attention backend.                                                                                                                           
INFO 01-07 09:12:52 model_runner.py:1094] Starting to load model /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct...                                                             
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]                                                                                                                  
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.53it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.35it/s]

INFO 01-07 09:12:54 model_runner.py:1099] Loading model weights took 4.1273 GB
INFO 01-07 09:13:06 worker.py:241] Memory profiling takes 11.36 seconds
INFO 01-07 09:13:06 worker.py:241] the current vLLM instance can use total_gpu_memory (39.38GiB) x gpu_memory_utilization (0.90) = 35.44GiB
INFO 01-07 09:13:06 worker.py:241] model weights take 4.13GiB; non_torch_memory takes 0.72GiB; PyTorch activation peak memory takes 3.30GiB; the rest of the memory reserved for KV Cache is 27.30GiB.
INFO 01-07 09:13:06 gpu_executor.py:76] # GPU blocks: 63904, # CPU blocks: 9362
INFO 01-07 09:13:06 gpu_executor.py:80] Maximum concurrency for 32768 tokens per request: 31.20x
INFO 01-07 09:13:10 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00,  1.84it/s]
INFO 01-07 09:13:29 model_runner.py:1535] Graph capturing finished in 19 secs, took 0.33 GiB
INFO 01-07 09:13:29 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 34.25 seconds
query: <video>Describe this video.
response: qwen-vl-utils using decord to read video.
WARNING 01-07 09:13:30 preprocess.py:262] Passing `multi_modal_data` in TokensPrompt isdeprecated and will be removed in a future update
Exception in callback VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=<bound method...151000ea1ee0>>)(<Task finishe...with size 1')>) at /vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py:397
handle: <Handle VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=<bound method...151000ea1ee0>>)(<Task finishe...with size 1')>) at /vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py:397>
Traceback (most recent call last):
  File "~/miniconda3/envs/vllm/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
    return_value = task.result()
                   ^^^^^^^^^^^^^
  File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 888, in run_engine_loop
    result = task.result()
             ^^^^^^^^^^^^^
File "~/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 799, in engine_step
    await self.engine.add_request_async(**new_request)
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 493, in add_request_async
    preprocessed_inputs = await self.input_preprocessor.preprocess_async(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 678, in preprocess_async
    return await self._process_decoder_only_prompt_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 618, in _process_decoder_only_prompt_async
    prompt_comps = await self._prompt_to_llm_inputs_async(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 387, in _prompt_to_llm_inputs_async
    return await self._process_multimodal_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/inputs/preprocess.py", line 268, in _process_multimodal_async
    return mm_processor.apply(prompt, mm_data, mm_processor_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 807, in apply
    all_placeholders = self._find_placeholders(all_prompt_repls,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 651, in _find_placeholders
    return list(
           ^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 578, in iter_placeholders
    yield from _iter_modality_placeholders(
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 536, in _iter_modality_placeholders
    replacement = repl_info.get_replacement(item_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/multimodal/processing.py", line 188, in get_replacement
    replacement = replacement(item_idx)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_vl.py", line 834, in get_replacement_qwen2vl
    grid_thw = hf_inputs[f"{modality}_grid_thw"][item_idx]
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
IndexError: index 1 is out of bounds for dimension 0 with size 1
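For what it's worth, the traceback boils down to an indexing mismatch: get_replacement_qwen2vl asks for a second video's grid while video_grid_thw only contains one row. A hypothetical illustration (the grid values are made up):

import torch

# hf_inputs["video_grid_thw"] ends up with one row (one video), but the
# placeholder iteration requests item_idx == 1, i.e. a second video.
video_grid_thw = torch.tensor([[4, 18, 32]])  # hypothetical t/h/w grid for a single video
item_idx = 1
grid_thw = video_grid_thw[item_idx]  # IndexError: index 1 is out of bounds for dimension 0 with size 1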
cs-mshah (Author) commented Jan 7, 2025

Using the LMDeploy backend also misbehaves: instead of a description, the model emits 512 tokens of garbage (see the log below; a rough sketch of how I switched backends follows the log).

[INFO:swift] Successfully registered `/vmdata/manan/miniconda3/envs/vllm/lib/python3.12/site-packages/swift/llm/dataset/data/dataset_info.json`
[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2-VL-2B-Instruct
Downloading Model to directory: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct
[INFO:swift] Setting torch_dtype: torch.bfloat16
[INFO:swift] Setting image_factor: 28. You can adjust this hyperparameter through the environment variable: `IMAGE_FACTOR`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.
[INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_PIXELS`.
[INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_PIXELS`.
[INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `VIDEO_TOTAL_PIXELS`.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.
2025-01-07 11:03:07,813 - lmdeploy - WARNING - archs.py:53 - Fallback to pytorch engine because `/vmdata/manan/.cache/modelscope/hub/Qwen/Qwen2-VL-2B-Instruct` not supported by turbomind engine.
[INFO:swift] backend_config: PytorchEngineConfig(dtype='auto', tp=1, session_len=None, max_batch_size=None, cache_max_entry_count=0.8, prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=8192, thread_safe=True, enable_prefix_caching=False, device_type='cuda', eager_mode=False, custom_module_map=None, download_dir=None, revision=None, quant_policy=0)
[INFO:swift] vision_config: VisionConfig(max_batch_size=8, thread_safe=False)
2025-01-07 11:03:09,016 - lmdeploy - WARNING - model.py:76 - Could not find qwen2_vl in registered models. Register qwen2_vl using the BaseChatTemplate.
2025-01-07 11:03:10,466 - lmdeploy - WARNING - transformers.py:22 - LMDeploy requires transformers version: [4.33.0 ~ 4.46.1], but found version: 4.48.0.dev0
2025-01-07 11:03:10,469 - lmdeploy - WARNING - config.py:35 - Model config does not have `torch_dtype`, use: float16
2025-01-07 11:03:10,469 - lmdeploy - WARNING - engine_checker.py:70 - thread safe mode has been deprecated and it would be removed in the future.
query: <video>Describe this video.

query: <video>Describe this video.
response: qwen-vl-utils using decord to read video.
2025-01-07 11:03:14,513 - lmdeploy - WARNING - messages.py:82 - `temperature` is 0, set to 1e-6
!#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
metric: {'num_prompt_tokens': 26, 'num_generated_tokens': 512, 'num_samples': 1, 'runtime': 5.579125477001071, 'samples/s': 0.1792395607738736, 'tokens/s': 91.77065511622328}
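
For completeness, this is roughly how I switched backends (LmdeployEngine is my reading of the ms-swift 3.x API; the video path is a placeholder):

# Rough sketch of the LMDeploy backend run (assumed ms-swift 3.x API; adjust as needed).
from swift.llm import LmdeployEngine, InferRequest, RequestConfig

engine = LmdeployEngine('Qwen/Qwen2-VL-2B-Instruct')  # falls back to the pytorch engine, per the warning above

infer_request = InferRequest(
    messages=[{'role': 'user', 'content': '<video>Describe this video.'}],
    videos=['path/to/video.mp4'],  # placeholder
)
resp_list = engine.infer([infer_request], RequestConfig(max_tokens=512, temperature=0))
print(resp_list[0].choices[0].message.content)  # prints the '!#!!!...' garbage shown above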
