Would it be possible to expose the usage payload of the OpenAI response? #74

Open
Lawouach opened this issue Dec 5, 2023 · 6 comments · May be fixed by #214

Lawouach commented Dec 5, 2023

It would be really useful to track the number of tokens consumed. But the information is not bubbled up. I gather this may not be feasible across providers though?

jackmpcollins (Owner) commented:

Hi @Lawouach, do you mean surfacing the usage data from OpenAI API responses?

This looks like

"usage": { "prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10 }

Since prompt-functions return a value corresponding to the return type annotation, this information would have to be reported through some other method.

One option would be to add hooks that allow you to register functions to be run before/after OpenaiChatModel.complete. Something like

from magentic import AssistantMessage, prompt

token_usage = 0


def increment_token_usage(message: AssistantMessage):
    global token_usage  # required to update the module-level counter
    token_usage += message.usage.total_tokens


@prompt(
    "Tell me a joke",
    post_completion=increment_token_usage,
)
def tell_joke():
    ...

Other options might be possible too, but it seems like adding usage to the AssistantMessage class is necessary/useful in general. And if that were present, you could add the hooks by subclassing OpenaiChatModel, modifying .complete to update the token counter, and then passing this class as the model to @prompt. I would support this approach for the moment, until there are more use cases to justify adding a more complex solution.
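
A rough, minimal sketch of that subclassing approach (the usage attribute on AssistantMessage is hypothetical at this point, and TokenCountingChatModel is just an illustrative name):

from magentic import OpenaiChatModel, prompt

token_usage = 0


class TokenCountingChatModel(OpenaiChatModel):
    def complete(self, *args, **kwargs):
        global token_usage  # update the module-level counter
        message = super().complete(*args, **kwargs)
        token_usage += message.usage.total_tokens  # hypothetical .usage attribute
        return message


@prompt("Tell me a joke", model=TokenCountingChatModel("gpt-3.5-turbo"))
def tell_joke() -> str: ...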

Lawouach (Author) commented Dec 6, 2023

Hi @jackmpcollins, that would be enough of a solution for my use case, indeed. I think it would generalize well too.

jackmpcollins (Owner) commented:

Usage stats are not currently returned by the OpenAI API when streaming responses (which magentic does for all responses under the hood).

A comment on the JavaScript package issue suggests this is coming soon: openai/openai-node#506 (comment)

Developer community post requesting this: https://community.openai.com/t/openai-api-get-usage-tokens-in-response-when-set-stream-true/141866?u=jackmpcollins

jackmpcollins (Owner) commented:

The corresponding openai-python client issue is openai/openai-python#1053.

Lawouach (Author) commented May 7, 2024

Yay, they seem to have shipped it.

jackmpcollins linked a pull request May 16, 2024 that will close this issue
jackmpcollins (Owner) commented:

@Lawouach I've published a prerelease to test having a .usage attribute on AssistantMessage. Could you test it out and let me know if it works for your use case, please? One thing to note is that usage only becomes available (not None) once the streamed response has reached its end. This happens before returning for most output types, but for streamed types like StreamedStr and Iterable it only happens after they have been fully iterated over.

pip install "magentic==0.25.0a0"

I have some notes on the PR #214
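
To illustrate the timing note above, with a streamed output type the flow would look roughly like this (a sketch against the prerelease; StreamedStr is magentic's streaming type, and the exact Usage output is illustrative):

from magentic import OpenaiChatModel, StreamedStr, UserMessage

chat_model = OpenaiChatModel("gpt-3.5-turbo")
message = chat_model.complete(
    messages=[UserMessage("Say hello!")],
    output_types=[StreamedStr],
)
print(message.usage)  # > None, because the stream has not been consumed yet
for chunk in message.content:  # iterate the StreamedStr to the end
    pass
print(message.usage)  # > Usage(input_tokens=..., output_tokens=...)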

For the solution above, to create a wrapper ChatModel that does something with usage, your code would look something like the example below. You could pass this model as the model argument to @prompt etc.

from typing import Any, Callable, Iterable, TypeVar

from magentic import AssistantMessage, OpenaiChatModel, UserMessage
from magentic.chat_model.base import ChatModel
from magentic.chat_model.message import Message


R = TypeVar("R")


class LoggingChatModel(ChatModel):
    def __init__(self, chat_model: ChatModel):
        self.chat_model = chat_model

    def complete(
        self,
        messages: Iterable[Message[Any]],
        functions: Iterable[Callable[..., Any]] | None = None,
        output_types: Iterable[type[R]] | None = None,
        *,
        stop: list[str] | None = None,
    ) -> AssistantMessage[str] | AssistantMessage[R]:
        response = self.chat_model.complete(
            messages=messages,
            functions=functions,
            output_types=output_types,
            stop=stop,
        )
        print("usage:", response.usage)  # "Logging"
        return response

    async def acomplete(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError  # Stub to bypass the ChatModel ABC error


chat_model = LoggingChatModel(OpenaiChatModel("gpt-3.5-turbo", seed=42))
message = chat_model.complete(messages=[UserMessage("Say hello!")])
# > usage: Usage(input_tokens=10, output_tokens=9)
print(message.content)
# > Hello! How can I assist you today?
