Qwen2.5-VL failing #192

Closed · tmoroney opened this issue Jan 29, 2025 · 17 comments

@tmoroney

I am getting this error when using the new Qwen2.5-VL models.

Command:

python -m mlx_vlm.generate --model mlx-community/Qwen2.5-VL-3B-Instruct-3bit --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image "frames/frame-01.jpg"
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 55056.50it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/mlx_vlm/generate.py", line 156, in <module>
    main()
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/mlx_vlm/generate.py", line 101, in main
    prompt = apply_chat_template(processor, config, prompt, num_images=len(args.image))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/mlx_vlm/prompt_utils.py", line 159, in apply_chat_template
    return processor.apply_chat_template(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1683, in apply_chat_template
    rendered_chat = compiled_template.render(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 1295, in render
    self.environment.handle_exception()
  File "/Users/moroneyt/Documents/VLM-Testing/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 23, in top-level template code
TypeError: can only concatenate str (not "list") to str
@Blaizzy (Owner) commented Jan 30, 2025

This was fixed. The chat template was missing, but it's back now.

@Blaizzy (Owner) commented Jan 30, 2025

Just download it again
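
If deleting the cache by hand is a pain, here is a minimal sketch of forcing a re-download through the huggingface_hub API (assuming the model was fetched through the Hub cache):

# Sketch: force a fresh snapshot download so the updated chat template
# replaces any stale cached copy.
from huggingface_hub import snapshot_download

snapshot_download(
    "mlx-community/Qwen2.5-VL-3B-Instruct-3bit",
    force_download=True,  # ignore cached files and re-fetch everything
)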

@tmoroney (Author) commented Jan 30, 2025

I have tried deleting the model and downloading it again, but unfortunately I am getting the same issue.

https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-3bit

@tmoroney (Author)

I have checked and the chat template is included with the downloaded model files, so it seems that something else is causing this error.

@Blaizzy (Owner) commented Jan 30, 2025

Could you share a reproducible script? Also, please share the PyPI versions.

@abaranovskis-redsamurai

Same error here.

Running:

 python -m mlx_vlm.generate --model mlx-community/Qwen2.5-VL-7B-Instruct-bf16 --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image http://images.cocodataset.org/val2017/000000039769.jpg
transformers==4.48.1
torchvision==0.21.0
torch==2.6.0
mlx==0.22.0
mlx-vlm==0.1.12

Error:

Traceback (most recent call last):
  File "/Users/andrejb/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/andrejb/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/mlx_vlm/generate.py", line 156, in <module>
    main()
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/mlx_vlm/generate.py", line 101, in main
    prompt = apply_chat_template(processor, config, prompt, num_images=len(args.image))
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/mlx_vlm/prompt_utils.py", line 159, in apply_chat_template
    return processor.apply_chat_template(
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1687, in apply_chat_template
    rendered_chat = compiled_template.render(
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/jinja2/environment.py", line 1295, in render
    self.environment.handle_exception()
  File "/Users/andrejb/Work/katana-git/sparrow/sparrow-data/parse/.env_sparrow_parse/lib/python3.10/site-packages/jinja2/environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 23, in top-level template code
TypeError: can only concatenate str (not "list") to str

@evdc commented Jan 30, 2025

This appears to be because the Qwen2.5-VL models ship with a chat_template.json file (cf. here), which holds the jinja2 template as a string value inside a JSON object.

The transformers library expects, by default, to find the chat template as raw jinja2 (not inside a JSON object) in a file called chat_template.jinja (cf. here) under the model path. Not finding one, it falls back to a generic template, which doesn't match the format in which mlx_vlm.generate passes its messages.

This can be temporarily worked around by copying the chat template from within the JSON file I linked above and saving it as chat_template.jinja in the same directory as the model (e.g. /Users/<your user>/.cache/huggingface/hub/models--mlx-community--Qwen2.5-VL-3B-Instruct-3bit/snapshots/<some snapshot id>/ -- poke around with ls to find the right directory).
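
For reference, a minimal sketch of that file-copy step (the snapshot path is an example placeholder; adjust it to your own cache):

# Sketch: pull the jinja template out of chat_template.json and save it
# as chat_template.jinja next to the model files.
import json
from pathlib import Path

snapshot = (Path.home() / ".cache/huggingface/hub"
            / "models--mlx-community--Qwen2.5-VL-3B-Instruct-3bit"
            / "snapshots" / "<some snapshot id>")

template = json.loads((snapshot / "chat_template.json").read_text())["chat_template"]
(snapshot / "chat_template.jinja").write_text(template)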

I'll post a minimal reproducible test in a minute

@evdc commented Jan 30, 2025

Alternatively, this seems to work without having to modify downloaded files:

Change the function mlx_vlm.generate.get_model_and_processors to the following:

def get_model_and_processors(model_path, adapter_path):
    # Assumes json, get_model_path, load_config and load are in scope in mlx_vlm.generate
    path = get_model_path(model_path)
    # Read the jinja template out of chat_template.json ourselves...
    with open(path / "chat_template.json") as f:
        templ = json.load(f)["chat_template"]

    config = load_config(model_path, trust_remote_code=True)
    # ...and pass it explicitly so the processor doesn't fall back to a generic template
    model, processor = load(
        model_path, adapter_path=adapter_path, lazy=False, trust_remote_code=True, chat_template=templ
    )
    return model, processor, config

We explicitly load and specify the chat template when loading the processor, so it picks up the correct one.

Just verified working (latest main branch of this repo, transformers==4.48.1)

@Blaizzy (Owner) commented Jan 30, 2025

That's interesting, because if you install transformers from source it works without any changes:

v4.49.0.dev0

pip install git+https://github.com/huggingface/transformers

@Blaizzy (Owner) commented Jan 30, 2025

v4.48.1 indeed causes the issue.

But if you install transformers from source it should work 👌🏽

@abaranovskis-redsamurai

> v4.48.1 indeed causes the issue.
>
> But if you install transformers from source it should work 👌🏽

Yes, indeed it works with transformers from source. Thanks.

@simonw commented Jan 30, 2025

Thanks to that tip this worked for me:

uv run --with 'numpy<2' \
  --with 'git+https://github.com/huggingface/transformers' \
  --with mlx-vlm \
  python -m mlx_vlm.generate \
    --model mlx-community/Qwen2.5-VL-7B-Instruct-8bit \
    --max-tokens 100 \
    --temp 0.0 \
    --prompt "Describe this image." \
    --image path-to-image.png

Result on my blog: https://simonwillison.net/2025/Jan/27/qwen25-vl-qwen25-vl-qwen25-vl/#qwen-vl-mlx-vlm

@Blaizzy (Owner) commented Jan 30, 2025

Most welcome!

Great article 🔥🙌🏽

I would love it if we had a cookbook like that for mlx-vlm.

We already have a few recipes (here) but we definitely need more.

@jrp2014 commented Jan 31, 2025

This model is one of the few that blows up if you give it a large pic:

Terminating due to uncaught exception: [metal::malloc] Attempting to allocate 134767706112 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.

zsh: abort      python -m mlx_vlm.generate --model mlx-community/Qwen2.5-VL-7B-Instruct-bf16 
/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
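
A minimal workaround sketch, assuming Pillow is available: downscale the image before passing it in (the 1024 px cap is an arbitrary example, not a model limit).

# Sketch: shrink an oversized image so the vision encoder doesn't request
# an enormous Metal buffer. The 1024 px cap is an arbitrary example value.
from PIL import Image

def shrink(src, dst, max_side=1024):
    img = Image.open(src)
    scale = max_side / max(img.size)
    if scale < 1:  # only ever shrink, never upscale
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    img.save(dst)

shrink("large-photo.jpg", "large-photo-small.jpg")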

@dcup commented Feb 6, 2025

I did some digging into the error messages: the preprocessor class for Qwen2.5-VL is "Qwen2_5_VLProcessor", and this class is not included in the current transformers release, v4.48.2.

The class is on the dev branch (v4.49.0.dev0), so installing from GitHub works.
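
A quick way to check whether the installed build ships the class (a minimal sketch):

# Sketch: check whether the installed transformers provides Qwen2_5_VLProcessor
# (present from v4.49.0.dev0 onwards).
import transformers

print(transformers.__version__)
try:
    from transformers import Qwen2_5_VLProcessor
    print("Qwen2_5_VLProcessor available")
except ImportError:
    print("Qwen2_5_VLProcessor missing; install transformers from source")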

@RArchered

Same error; installing transformers from GitHub works:

pip install git+https://github.com/huggingface/transformers

@Blaizzy (Owner) commented Mar 3, 2025

Issue has been resolved.

Closing it for now.

Blaizzy closed this as completed on Mar 3, 2025