feat: Allow the scores returned by AI Search to be populated in the Document.meta #1907

Open
wants to merge 3 commits into
base: main

Seth-Peters

Proposed Changes:

Expose the scores returned by AI Search in Document.meta when search results are converted back into Document objects. Users of this integration need these scores to decide what actions to take with the returned results.
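To illustrate the idea, here is a minimal sketch of how a raw search result could be split into content and meta while keeping the score keys. It is not the code in this PR: the helper name `result_to_content_and_meta` and the choice of which fields to drop (`id`, `embedding`) are assumptions made for illustration, and the result is modeled as a plain dict.

```python
from typing import Any, Dict, Optional, Tuple

# Score-related keys shown in the example output of this PR.
SCORE_KEYS = (
    "@search.score",
    "@search.reranker_score",
    "@search.highlights",
    "@search.captions",
)


def result_to_content_and_meta(result: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: split a raw result dict into (content, meta)."""
    fields = dict(result)  # copy so the caller's dict is not mutated
    content = fields.pop("content", None)
    # Assumption: these fields map to Document attributes rather than meta.
    fields.pop("id", None)
    fields.pop("embedding", None)
    # Always include every score key, defaulting to None when absent.
    meta = {key: fields.pop(key, None) for key in SCORE_KEYS}
    # Remaining index fields are carried over into meta unchanged.
    meta.update(fields)
    return content, meta
```

Defaulting missing score keys to None mirrors the example output in this PR, where only "@search.score" carries a value for a plain embedding query.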

How did you test it?

Tested by searching an index with the AzureAISearchHybridRetriever, AzureAISearchBM25Retriever, and AzureAISearchEmbeddingRetriever and confirming that Document.meta contains the search scores from the results.

Notes for the reviewer

With this change, Document objects returned from an AI Search index will be populated like this:

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.writers import DocumentWriter

from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchEmbeddingRetriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

"""
This example demonstrates how to use the AzureAISearchEmbeddingRetriever to retrieve documents
using embeddings based on a query. To run this example, you'll need an Azure Search service endpoint
and API key. Set them as environment variables (AZURE_AI_SEARCH_ENDPOINT and AZURE_AI_SEARCH_API_KEY)
or pass them directly to AzureAISearchDocumentStore (as the "api_key" and "azure_endpoint" params).
Alternatively, use DefaultAzureCredential to authenticate with Azure services.
See more details at https://learn.microsoft.com/en-us/azure/search/keyless-connections?tabs=python%2Cazure-cli
"""

document_store = AzureAISearchDocumentStore(index_name="retrieval-example")

model = "sentence-transformers/all-mpnet-base-v2"

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="""Elephants have been observed to behave in a way that indicates a
         high level of self-awareness, such as recognizing themselves in mirrors."""
    ),
    Document(
        content="""In certain parts of the world, like the Maldives, Puerto Rico, and
          San Diego, you can witness the phenomenon of bioluminescent waves."""
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)
document_embedder.warm_up()

# Indexing Pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="doc_embedder")
indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name="doc_writer")
indexing_pipeline.connect("doc_embedder", "doc_writer")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})

# Query Pipeline
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model))
query_pipeline.add_component("retriever", AzureAISearchEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Output:
Document(
    content="There are over 7,000 languages spoken around the world today.",
    meta={
        "@search.score": 9.270645,
        "@search.reranker_score": None,
        "@search.highlights": None,
        "@search.captions": None,
        # Other index fields here...
    }
)
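As a sketch of what a caller might do with these scores, the snippet below keeps only documents whose "@search.score" meets a threshold. The `Doc` dataclass is a stand-in for haystack's Document so the example runs without dependencies, and both `filter_by_score` and the threshold value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for haystack's Document (assumption: only content and meta matter here)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def filter_by_score(documents: List[Doc], min_score: float) -> List[Doc]:
    """Hypothetical helper: keep documents whose '@search.score' is at least min_score."""
    kept = []
    for doc in documents:
        score = doc.meta.get("@search.score")
        # Skip documents with a missing or None score instead of raising.
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept


docs = [
    Doc("There are over 7,000 languages spoken around the world today.", {"@search.score": 9.270645}),
    Doc("Elephants recognize themselves in mirrors.", {"@search.score": 0.5}),
    Doc("A document without any score."),
]
relevant = filter_by_score(docs, min_score=1.0)
```

A sensible threshold will likely depend on the retriever in use, since the BM25, embedding, and hybrid retrievers may report scores on different scales.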

…exposing these scores (as they are very critical information) to users of this integration.
@Seth-Peters Seth-Peters requested a review from a team as a code owner June 6, 2025 08:16
@Seth-Peters Seth-Peters requested review from Amnah199 and removed request for a team June 6, 2025 08:16
CLAassistant commented Jun 6, 2025

CLA assistant check
All committers have signed the CLA.

2 participants