Prompts handling with Sentence Transformers #2896

@arthurbr11

Describe the bug

Hello! With Sentence Transformers v5, the default prompt initialization creates empty prompts that cause MTEB to reject valid model prompts.

Root Cause:
In ST v5, models are initialized with default empty prompts (for consistency with other updates included in the release), as you can see here:

self.prompts = {"query": "", "document": ""}
if prompts:
    self.prompts.update(prompts)
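
To make the effect of this merge concrete, here is a minimal standalone sketch. It does not import Sentence Transformers: `build_prompts` is a hypothetical helper that only mirrors the three lines above, and the task name and prompt text are made up for illustration.

```python
def build_prompts(prompts=None):
    """Mirror the default-then-update pattern quoted above (hypothetical helper)."""
    merged = {"query": "", "document": ""}
    if prompts:
        merged.update(prompts)
    return merged


# A model config that defines only one task-specific prompt (illustrative name).
model_prompts = build_prompts({"NFCorpus": "Retrieve relevant documents: "})

# The two default keys are always present, even though the config never set them.
print(model_prompts)
```

The resulting dictionary holds the task-specific prompt alongside the empty "query" and "document" entries, which is the mixed state described in the next section.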

The Problem:
Even when a model defines valid prompts (e.g. task-specific ones), the empty defaults mean that self.prompts now always contains the two keys "query" and "document", whether or not they are empty. As a result, self.prompts might contain a mix of valid prompts, empty ones, and ones in a format MTEB does not accept.

When MTEB validates these prompts, it calls self.validate_task_to_prompt_name(self.model.prompts). This appears to raise a KeyError, which is caught in this code block:

try:
    model_prompts = self.validate_task_to_prompt_name(self.model.prompts)
except KeyError:
    model_prompts = None
    logger.warning(
        "Model prompts are not in the expected format. Ignoring them."
    )

The KeyError exception causes MTEB to set model_prompts = None and log the warning message: "Model prompts are not in the expected format. Ignoring them."
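
The all-or-nothing effect of this try/except can be reproduced with a toy validator. This is only a sketch under stated assumptions: `ACCEPTED_NAMES`, `validate_prompt_names`, and `load_prompts` are hypothetical stand-ins, the allow-list is not MTEB's real list of accepted names, and which key actually triggers the KeyError in MTEB is not established in this report.

```python
import logging

logger = logging.getLogger("prompt_demo")

# Assumption: an illustrative allow-list, NOT MTEB's real set of accepted names.
ACCEPTED_NAMES = {"query", "Retrieval", "Classification"}


def validate_prompt_names(prompts):
    """Toy validator: raise KeyError on the first unrecognized key."""
    for name in prompts:
        if name not in ACCEPTED_NAMES:
            raise KeyError(f"Unrecognized prompt name: {name}")
    return prompts


def load_prompts(prompts):
    """Mirror the try/except above: one bad key discards every prompt."""
    try:
        return validate_prompt_names(prompts)
    except KeyError:
        logger.warning("Model prompts are not in the expected format. Ignoring them.")
        return None


# One valid task-type prompt plus the two default keys.
prompts = {"query": "", "document": "", "Retrieval": "Retrieve relevant passages: "}
print(load_prompts(prompts))  # None: the valid "Retrieval" prompt is dropped too
```

In this sketch a single unrecognized key is enough to discard the valid prompt as well, which matches the symptom reported here.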

So the current behavior is:

  • No prompts in config → No warning message
  • Wrong format prompts in config → Warning message "ignoring prompts"

What I think would be nice to have instead is:

  • If only empty default prompts are present ({"query": "", "document": ""}) => Ignore them silently without raising the warning message
  • If there are prompts with valid formats => Use them for evaluation
  • If there are generic prompts for "query" and "document" in the config but no task-specific prompts => Use the generic ones as defaults
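
The three rules above could be sketched roughly as follows. `resolve_prompt` is a hypothetical helper written for this issue, not existing MTEB code, and it assumes prompts are stored as a flat name-to-string dictionary with task names and "query"/"document" as keys.

```python
def resolve_prompt(prompts, task_name, prompt_type):
    """Sketch of the proposed behavior (hypothetical helper, not MTEB code).

    Returns the prompt string to use, or None if no usable prompt exists.
    """
    # Drop empty prompts so ST v5's defaults are ignored silently, without a warning.
    usable = {name: text for name, text in (prompts or {}).items() if text}
    if not usable:
        return None
    # Prefer a task-specific prompt, then fall back to the generic one.
    if task_name in usable:
        return usable[task_name]
    return usable.get(prompt_type)


# Only the empty defaults: nothing to use.
only_defaults = resolve_prompt({"query": "", "document": ""}, "NFCorpus", "query")

# A generic "query" prompt and no task-specific one: the generic prompt is used.
generic = resolve_prompt({"query": "q: ", "document": ""}, "NFCorpus", "query")

# A task-specific prompt takes precedence over the generic one.
specific = resolve_prompt(
    {"query": "q: ", "document": "", "NFCorpus": "task: "}, "NFCorpus", "query"
)
```

Filtering out empty strings first keeps the "no prompts" and "only default prompts" cases indistinguishable to the rest of the code, which is what lets the warning stay reserved for prompts that are actually malformed.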

This approach would allow MTEB to handle ST v5's initialization pattern while still using any valid prompts that are available. I'm not really aware of the bigger picture of the repo, so I might be missing some complications in my solution.

Thanks!

  • Arthur BRESNU

To reproduce

Use the latest version of MTEB together with Sentence Transformers v5 and a model that defines task-specific prompts.

Additional information

No response

Are you interested to contribute a fix for this bug?

No
