Would it be possible to expose the usage payload of the OpenAI response? #74

Open
Lawouach opened this issue Dec 5, 2023 · 6 comments · May be fixed by #214

Lawouach commented Dec 5, 2023

It would be really useful to track the number of tokens consumed. But the information is not bubbled up. I gather this may not be feasible across providers though?

jackmpcollins (Owner) commented:

Hi @Lawouach, do you mean surfacing the usage data from OpenAI API responses?

This looks like

"usage": { "prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10 }

Since prompt-functions return a value corresponding to the return type annotation, this information would have to be reported through some other method.

One option would be to add hooks that allow you to register functions to be run before/after OpenaiChatModel.complete. Something like

from magentic import AssistantMessage, prompt

token_usage = 0


def increment_token_usage(message: AssistantMessage):
    global token_usage  # required to update the module-level counter
    token_usage += message.usage.total_tokens


@prompt(
    "Tell me a joke",
    post_completion=increment_token_usage,
)
def tell_joke():
    ...

Other options might be possible too, but it seems like adding usage to the AssistantMessage class is necessary/useful in general. And if that were present, you could add the hooks by subclassing OpenaiChatModel, modifying .complete to update the token counter, and then passing this class as the model to @prompt. I would support this approach for the moment, until there are more use cases to justify adding a more complex solution.
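
A rough, minimal sketch of that subclassing approach (the usage attribute on AssistantMessage is hypothetical at this point, and TokenCountingChatModel is just an illustrative name):

from magentic import OpenaiChatModel, prompt

token_usage = 0


class TokenCountingChatModel(OpenaiChatModel):
    def complete(self, *args, **kwargs):
        global token_usage  # update the module-level counter
        message = super().complete(*args, **kwargs)
        token_usage += message.usage.total_tokens  # hypothetical .usage attribute
        return message


@prompt("Tell me a joke", model=TokenCountingChatModel("gpt-3.5-turbo"))
def tell_joke() -> str: ...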

Lawouach (Author) commented Dec 6, 2023

Hi @jackmpcollins, that would be enough of a solution for my use case, indeed. I think it would generalize well too.

jackmpcollins (Owner) commented:

Usage stats are not currently returned by the OpenAI API when streaming responses (which magentic does for all responses under the hood).

A comment on the JavaScript package issue suggests this is coming soon: openai/openai-node#506 (comment)

Developer community post requesting this: https://community.openai.com/t/openai-api-get-usage-tokens-in-response-when-set-stream-true/141866?u=jackmpcollins

jackmpcollins (Owner) commented:

The corresponding openai-python client issue is openai/openai-python#1053.

Lawouach (Author) commented May 7, 2024

Yay, they seem to have shipped it.

jackmpcollins linked a pull request May 16, 2024 that will close this issue
jackmpcollins (Owner) commented:

@Lawouach I've published a prerelease to test having a .usage attribute on AssistantMessage. Could you test it out and let me know if it works for your use case, please? One thing to note is that usage only becomes available (not None) once the streamed response has reached its end. This happens before returning for most output types, but for streamed types like StreamedStr and Iterable it only happens after they have been fully iterated over.

pip install "magentic==0.25.0a0"

I have some notes on the PR #214
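
To illustrate the timing note above, with a streamed output type the flow would look roughly like this (a sketch against the prerelease; StreamedStr is magentic's streaming type, and the exact Usage output is illustrative):

from magentic import OpenaiChatModel, StreamedStr, UserMessage

chat_model = OpenaiChatModel("gpt-3.5-turbo")
message = chat_model.complete(
    messages=[UserMessage("Say hello!")],
    output_types=[StreamedStr],
)
print(message.usage)  # > None, because the stream has not been consumed yet
for chunk in message.content:  # iterate the StreamedStr to the end
    pass
print(message.usage)  # > Usage(input_tokens=..., output_tokens=...)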

For the solution above, to create a wrapper ChatModel that does something with usage, your code would look something like the example below. You could pass this model as the model argument to @prompt etc.

from typing import Any, Callable, Iterable, TypeVar

from magentic import AssistantMessage, OpenaiChatModel, UserMessage
from magentic.chat_model.base import ChatModel
from magentic.chat_model.message import Message


R = TypeVar("R")


class LoggingChatModel(ChatModel):
    def __init__(self, chat_model: ChatModel):
        self.chat_model = chat_model

    def complete(
        self,
        messages: Iterable[Message[Any]],
        functions: Iterable[Callable[..., Any]] | None = None,
        output_types: Iterable[type[R]] | None = None,
        *,
        stop: list[str] | None = None,
    ) -> AssistantMessage[str] | AssistantMessage[R]:
        response = self.chat_model.complete(
            messages=messages,
            functions=functions,
            output_types=output_types,
            stop=stop,
        )
        print("usage:", response.usage)  # "Logging"
        return response

    async def acomplete(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError  # Stub to bypass the ChatModel ABC error


chat_model = LoggingChatModel(OpenaiChatModel("gpt-3.5-turbo", seed=42))
message = chat_model.complete(messages=[UserMessage("Say hello!")])
# > usage: Usage(input_tokens=10, output_tokens=9)
print(message.content)
# > Hello! How can I assist you today?
