
Requesting support in Pipeline using Florence-2 models and tasks #36106

Open · mediocreatmybest opened this issue Feb 10, 2025 · 8 comments
Labels: Feature request (Request for a new feature)

Comments

@mediocreatmybest

Feature request

Hi!

Currently, microsoft/Florence-2-large-ft and related models cannot be loaded with the HF pipeline("image-to-text"), as their config is not recognised by AutoModelForVision2Seq.
When attempting to load one, Transformers raises:

“Unrecognized configuration class Florence2Config for this kind of AutoModel: AutoModelForVision2Seq.”

Florence-2 also requires trust_remote_code=True to be passed to these functions.
The current standard method is to load Florence-2 with AutoModelForCausalLM and AutoProcessor, but this adds another flow if you are already using pipeline. LoRA support also works well, so having these models in the pipeline would make it an amazing addition for its capable tasks.
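For reference, here is a minimal sketch of that AutoModelForCausalLM + AutoProcessor flow. The task prompts are the special tokens documented on the Florence-2 model card; the helper names and the default model id are just for illustration, and loading the model downloads the weights:

```python
# Sketch of the current (non-pipeline) Florence-2 flow, assuming the
# documented AutoModelForCausalLM + AutoProcessor loading path.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
}

def build_inputs(processor, image, task="caption"):
    # Florence-2 is prompted with a task token rather than free-form text.
    return processor(text=TASK_PROMPTS[task], images=image, return_tensors="pt")

def load_florence2(model_id="microsoft/Florence-2-large-ft"):
    # Heavy: downloads weights. trust_remote_code=True is required because
    # Florence-2 ships custom modeling/processing code on the Hub.
    from transformers import AutoModelForCausalLM, AutoProcessor
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor

def caption_image(model, processor, image, task="caption"):
    inputs = build_inputs(processor, image, task)
    generated = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=64,
    )
    return processor.batch_decode(generated, skip_special_tokens=False)[0]
```

Compared with a one-line pipeline(...) call, this is the extra flow the request is about.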

Thanks!

Model:
https://huggingface.co/microsoft/Florence-2-large

Motivation

Adding pipeline support for these models would open up another great set of task options while lowering the barrier to entry, as the pipeline is a great feature that simplifies writing and reusing code for people. (Like me!)

Thanks again for all the amazing work.

Your contribution

I can test any proposed updates.

@mediocreatmybest added the Feature request label Feb 10, 2025
@zucchini-nlp
Member

It should be image-text-to-text pipeline, though not sure if it is loadable. cc @yonigozlan for VLM pipelines

@mediocreatmybest
Author

Thanks! Just checked, same issue with the image-text-to-text task.

ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large-ft.bb44b80c15e943b1bf7cec6e076359cec6e40178.configuration_florence2.Florence2Config'> for this kind of AutoModel: AutoModelForImageTextToText.
Model type should be one of AriaConfig, BlipConfig, Blip2Config, ChameleonConfig, Emu3Config, FuyuConfig, GitConfig, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, Kosmos2Config, LlavaConfig, LlavaNextConfig, LlavaOnevisionConfig, MllamaConfig, PaliGemmaConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2VLConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig.

@zucchini-nlp
Member

I overlooked this in the morning: the Florence config on the Hub has an auto-mapping to AutoModelForCausalLM. But since the pipeline expects an auto model for image-text-to-text, the code errors out. So the auto-map has to be AutoModelForImageTextToText.

@mediocreatmybest
Author

I overlooked this in the morning: the Florence config on the Hub has an auto-mapping to AutoModelForCausalLM. But since the pipeline expects an auto model for image-text-to-text, the code errors out. So the auto-map has to be AutoModelForImageTextToText.

Would that mean the issue is with the models configs? Or currently not in pipeline?

@zucchini-nlp
Member

zucchini-nlp commented Feb 11, 2025

The problem is in the config, yes

Update: I found that we pass an unused argument (legacy=False) to the processor, which errors out for Florence2. The argument is under deprecation afaik. @yonigozlan when are we planning to remove it?

Otherwise, the code below worked for me, with a small hack to register the processor with AutoProcessor:

from transformers import AutoConfig, AutoModelForImageTextToText, AutoProcessor, pipeline
from transformers.image_utils import load_image

# Remap the Hub config so AutoModelForImageTextToText resolves the custom class
config = AutoConfig.from_pretrained('microsoft/Florence-2-large', trust_remote_code=True)
config.auto_map['AutoModelForImageTextToText'] = 'microsoft/Florence-2-large--modeling_florence2.Florence2ForConditionalGeneration'

model = AutoModelForImageTextToText.from_pretrained("microsoft/Florence-2-large", config=config, trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
# Small hack: register the processor so pipeline() can resolve it for this config
AutoProcessor.register(type(config), processor)

pipe = pipeline(
    'image-text-to-text',
    model='microsoft/Florence-2-large',
    config=config,
    trust_remote_code=True
)

image = load_image("https://www.ilankelman.org/stopsigns/australia.jpg")
print(pipe(images=image, text="What do you see here?"))

@mediocreatmybest
Author

Thanks, I can confirm that I'm also seeing the error with the legacy argument.

TypeError: Florence2Processor.__call__() got an unexpected keyword argument 'legacy'

Do we have a workaround within pipeline to stop this being passed on to the processor?

@zucchini-nlp
Copy link
Member

Unfortunately no, unless the code itself changes and stops passing legacy. Another way is to enable Florence to accept **kwargs in __call__.
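To illustrate that second option, here is a toy sketch (plain Python, not the real Florence2Processor) of why the extra legacy=False kwarg raises, and how accepting **kwargs in __call__ would absorb it:

```python
# Toy illustration only: StrictProcessor mirrors a __call__ with a fixed
# signature (like Florence2Processor today); LenientProcessor shows the
# suggested fix of accepting **kwargs so deprecated arguments pass through.
class StrictProcessor:
    def __call__(self, images=None, text=None):
        return {"images": images, "text": text}

class LenientProcessor:
    def __call__(self, images=None, text=None, **kwargs):
        # Unknown kwargs such as legacy=False are silently absorbed.
        return {"images": images, "text": text}

def try_call(processor):
    try:
        processor(images="img", text="caption", legacy=False)
        return "ok"
    except TypeError:
        return "TypeError"

print(try_call(StrictProcessor()))   # TypeError
print(try_call(LenientProcessor()))  # ok
```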

mediocreatmybest added a commit to mediocreatmybest/ComfyUI-Transformers-Pipeline that referenced this issue Feb 17, 2025
Issue with pipeline and Florence at the moment, skipping pipeline for the moment on this node. huggingface/transformers#36106
@yonigozlan
Member

Update: I found that we pass an unused argument (legacy=False) to the processor, which errors out for Florence2. The argument is under deprecation afaik. @yonigozlan when are we planning to remove it?

This should be fixed now :), and I opened a PR to fully deprecate the legacy kwarg here: #36307
