Python: Introduced a new condition to yield `StreamingChatMessageContent` directly when usage data is available. #9753

ymuichiro · 2024-11-19T09:17:15Z

Motivation and Context

This pull request addresses a bug where setting stream_options.include_usage to True does not return token usage, resulting in None for the usage field.

The issue occurs when using Azure OpenAI's GPT-4o and GPT-4o mini models. In particular, if the last chunk of the response has an empty choices list, the chunk is skipped entirely, and the token usage it carries is never processed.

In the Azure OpenAI implementation, if usage information is included, the chunk should be processed appropriately. However, the current code skips processing when choices is empty. This pull request fixes this behavior so that the chunk is processed when usage is present, even if choices is empty.

Description

This fix includes the following changes:

Modified the relevant section in azure_chat_completion.py to ensure that chunks with empty choices are not skipped if usage information is present.
Specifically, the condition if len(chunk.choices) == 0: was updated to allow chunks with usage data to be processed correctly.

With these changes, setting stream_options.include_usage to True will correctly return token usage data, even for chunks where the choices list is empty.
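The condition change described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual Semantic Kernel code: `Chunk`, `Choice`, `Usage`, and `iter_processed` are hypothetical stand-ins for the SDK chunk types and the connector's streaming loop.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Choice:
    text: str


@dataclass
class Chunk:
    choices: list[Choice] = field(default_factory=list)
    usage: Optional[Usage] = None


def iter_processed(chunks: Iterable[Chunk]) -> Iterator[tuple[str, Optional[Usage]]]:
    """Yield (text, usage) per chunk, skipping only chunks that carry nothing."""
    for chunk in chunks:
        # Before the fix (as described): any chunk with empty choices was skipped.
        # After the fix: skip only when there is no usage data either.
        if len(chunk.choices) == 0 and chunk.usage is None:
            continue
        text = "".join(choice.text for choice in chunk.choices)
        yield text, chunk.usage


# Illustrative stream; the token counts are made up.
stream = [
    Chunk(choices=[Choice("Hel")]),
    Chunk(choices=[Choice("lo")]),
    Chunk(),  # empty chunk with no usage: skipped
    Chunk(usage=Usage(prompt_tokens=8, completion_tokens=2)),  # usage-only final chunk
]
results = list(iter_processed(stream))
```

With this condition, the usage-only final chunk reaches the caller as an empty-text item that still carries the usage object.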

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

TaoChenOSU · 2024-11-19T20:03:35Z

Hi @ymuichiro, thank you for your contribution!

If you read the comments on this particular _inner_get_streaming_chat_message_contents method, you will see the reason why stream_options is not allowed with Azure OpenAI.

Did you observe a different behavior with Azure OpenAI?

yuichiromukaiyama · 2024-11-20T06:16:49Z

@TaoChenOSU
Of course. I have verified this for each API version. In my environment, regardless of which API version I choose, no errors occur, and the token usage for the stream is returned.

Am I misunderstanding something? It does indeed feel odd that it works even with older API versions.

↓ API versions that succeeded, and sample code

2024-10-01-preview
2024-09-01-preview
2024-07-01-preview
2024-10-21
2024-06-01

I ran the following directly in the shell to rule out any other causes. Even when using Semantic Kernel, I could not reproduce the error.

payload="{\
  \"messages\": [\
    {\
      \"role\": \"user\",\
      \"content\": [\
        {\
          \"type\": \"text\",\
          \"text\": \"hi.\"\
        }\
      ]\
    }\
  ],\
  \"temperature\": 0.7,\
  \"top_p\": 0.95,\
  \"stream\": true,\
  \"stream_options\": { \"include_usage\": true },\
  \"max_tokens\": 10\
}"

curl "https://${***********************}.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: **************************" \
  -d "$payload"
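
As a rough aid for reading the response to a request like the one above, the sketch below picks the usage object out of a stream of server-sent event lines. It assumes the usual `data: {...}` line framing terminated by `data: [DONE]`; `extract_usage` is a hypothetical helper, and the sample lines and token counts are invented for illustration, not output captured from the service.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-null `usage` object found in a stream of SSE lines."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        payload = json.loads(data)
        if payload.get("usage"):
            usage = payload["usage"]
    return usage


# Made-up sample lines in the shape of a streamed chat completion response.
sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 1, "total_tokens": 10}}',
    "",
    "data: [DONE]",
]
usage = extract_usage(sample)
```

Note that in this sample the usage arrives in a chunk whose `choices` list is empty, which is the case discussed in this pull request.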

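import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import KernelArguments

# The AZURE_OPENAI_COMPLETION_* constants are assumed to be defined elsewhere (e.g. read from the environment).


async def stream_sample() -> None: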
    kernel = sk.Kernel()
    service_id: str = "dummy"

    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            deployment_name=AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
            endpoint=AZURE_OPENAI_COMPLETION_ENDPOINT,
            api_key=AZURE_OPENAI_COMPLETION_API_KEY,
            api_version="2024-06-01",
        )
    )

    service = kernel.get_service(service_id=service_id)
    settings = service.get_prompt_execution_settings_class()(service_id=service_id)

    if isinstance(settings, AzureChatPromptExecutionSettings):
        settings.extra_body = {
            "stream_options": {
                "include_usage": True,
            }
        }

    history = ChatHistory()
    history.add_user_message("hello")

    async for chunk in service.get_streaming_chat_message_contents(
        chat_history=history,
        settings=settings,
        kernel=kernel,
        arguments=KernelArguments(settings=settings),
    ):
        print(chunk)

ymuichiro · 2024-11-20T14:05:11Z

Sorry, I used the wrong account but it's the same person.

TaoChenOSU · 2024-11-20T18:17:15Z

Hi @ymuichiro,

I just verified. Seems like they have resolved the issue. Could you remove the override of _inner_get_streaming_chat_message_contents in AzureChatCompletion? The default implementation is already in OpenAIChatCompletionBase which handles streaming tokens correctly.
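
Why removing the override is enough can be illustrated with a toy example. The classes below are hypothetical stand-ins (not the real OpenAIChatCompletionBase or AzureChatCompletion), and the override's filtering is a simplification of the behavior described in this thread.

```python
import asyncio
from typing import AsyncIterator


class BaseChatCompletion:
    """Hypothetical stand-in for the shared base class."""

    async def _inner_stream(self) -> AsyncIterator[dict]:
        yield {"text": "hi", "usage": None}
        # The base implementation passes the usage-only chunk through.
        yield {"text": "", "usage": {"total_tokens": 10}}


class AzureWithOverride(BaseChatCompletion):
    """Subclass whose override drops chunks without content, losing the usage."""

    async def _inner_stream(self) -> AsyncIterator[dict]:
        async for chunk in super()._inner_stream():
            if not chunk["text"]:
                continue
            yield chunk


class AzureWithoutOverride(BaseChatCompletion):
    """Subclass with the override removed; it inherits the base behavior."""


async def collect(service: BaseChatCompletion) -> list[dict]:
    return [chunk async for chunk in service._inner_stream()]


with_override = asyncio.run(collect(AzureWithOverride()))
without_override = asyncio.run(collect(AzureWithoutOverride()))
```

Once the subclass no longer defines the method, attribute lookup falls through to the base class, so the usage-only chunk is delivered without any Azure-specific code.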

ymuichiro · 2024-11-20T23:44:04Z

hi @TaoChenOSU

sure, is this ok?
I have confirmed that it works.

076c792

TaoChenOSU · 2024-11-21T15:59:57Z

python/semantic_kernel/connectors/ai/open_ai/services/azure_chat_completion.py

@@ -3,17 +3,17 @@
 import json
 import logging
 import sys
-from collections.abc import AsyncGenerator, Mapping
+from collections.abc import Mapping
 from copy import deepcopy
 from typing import Any, TypeVar
 from uuid import uuid4

 if sys.version_info >= (3, 12):


Please also remove this if block.

ymuichiro · 2024-11-22

@TaoChenOSU
Got it, deleted it!

TaoChenOSU · 2024-11-21T16:01:31Z

> hi @TaoChenOSU
>
> sure, is this ok? I have confirmed that it works.
>
> 076c792

Yes, this is right. Just one minor comment and we are good!

ymuichiro requested a review from a team as a code owner November 19, 2024 09:17

markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Nov 19, 2024

yuichiromukaiyama mentioned this pull request Nov 19, 2024

Python: Bug: When using Azure OpenAI, even if stream options are enabled, it is not reflected in the usage. #9751

Open

TaoChenOSU linked an issue Nov 20, 2024 that may be closed by this pull request

Python: Token usage from Azure OpenAI streaming chat completion #8996

Open

Python: In Azure OpenAI, stream_options is now enabled, so the override of _inner_get_streaming_chat_message_contents has been removed.

076c792

ymuichiro force-pushed the main branch from 2d6af6a to 076c792 Compare November 20, 2024 23:40

TaoChenOSU reviewed Nov 21, 2024

Python: Removed unnecessary imports from azure_chat_completion.py

40587be
