
OpenAI API Proxy with OpenTelemetry

This project is a FastAPI-based proxy server for OpenAI's Chat API. It generates traces and metrics with OpenTelemetry and sends them to a collector for further processing. Metrics include the total number of tokens generated by the model, as well as the number of prompt tokens and completion tokens in each chat completion.
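For orientation, here is a minimal sketch of what such a proxy endpoint could look like. The route and the httpx-based forwarding are illustrative assumptions, not the actual code in main.py:

import os

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # Forward the client's JSON body to OpenAI, attaching the server-side key.
    body = await request.json()
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(OPENAI_URL, json=body, headers=headers)
    # This is where token-usage metrics can be recorded (see Metrics below).
    return resp.json()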

Setup and Installation

Step 1: Create a Virtual Environment

To isolate this project's dependencies from other Python projects, create a virtual environment:

python3 -m venv venv

Activate the environment:

  • On macOS and Linux:
source venv/bin/activate
  • On Windows:
.\venv\Scripts\activate

Step 2: Install Dependencies

Install the dependencies from the requirements.txt file:

pip install -r requirements.txt

Step 3: Configure OpenAI API Key

Set the OpenAI API key as an environment variable:

export OPENAI_API_KEY='your-api-key'
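On Windows (PowerShell), the equivalent is:

$env:OPENAI_API_KEY = 'your-api-key'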

Running the Service

Start the service and point it at an OpenTelemetry (OTEL) collector. The console exporters below are handy for troubleshooting:

OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy opentelemetry-instrument --traces_exporter console --metrics_exporter otlp_proto_http,console uvicorn main:app

This command sets the service name to example-openai-proxy and starts the app with uvicorn. The opentelemetry-instrument wrapper automatically instruments the app for OpenTelemetry: traces are exported to the console for debugging, while metrics are exported to both the OTLP collector and the console.
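By default, the otlp_proto_http exporter sends data to http://localhost:4318. If your collector runs elsewhere, you can point the exporter at it with the standard OTLP endpoint variable (my-collector below is a placeholder hostname):

OTEL_EXPORTER_OTLP_ENDPOINT=http://my-collector:4318 OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy opentelemetry-instrument --traces_exporter console --metrics_exporter otlp_proto_http,console uvicorn main:app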

Running the Tests

In a different terminal, you can run the tests. Make sure you're in the example-openai-proxy directory:

python tests/main_test.py
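If you prefer to exercise the proxy by hand, a request along these lines should work. The endpoint path and port are assumptions: uvicorn listens on 127.0.0.1:8000 by default, and the payload mirrors the example response below:

import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Who won the World Series in 2020?"}],
    },
)
print(resp.json())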

Here is an example response from the script:

{
  "id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
  "object": "chat.completion",
  "created": 1688664323,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 13,
    "total_tokens": 40
  }
}

Metrics

The proxy server produces several metrics, which are printed to the console where the service is running. These metrics include:

  • The number of tokens generated by the model (tokens_counter)
  • The number of prompt tokens in a chat completion (prompt_tokens)
  • The number of completion tokens in a chat completion (completion_tokens)

These metrics are labeled with the model name, the chat completion ID, and a partially masked API key.
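Here is a sketch of how such a counter could be created and recorded with the OpenTelemetry metrics API. The meter and metric names match the output below, but record_usage is a hypothetical helper; the real wiring lives in main.py:

from opentelemetry import metrics

# The meter name matches the "openai.meter" scope in the example output below.
meter = metrics.get_meter("openai.meter")
tokens_counter = meter.create_counter(
    "tokens_counter",
    unit="1",
    description="The number of tokens generated by model",
)

def record_usage(response: dict, masked_api_key: str) -> None:
    # Label each data point with the model, completion ID, and masked key.
    attributes = {
        "model": response["model"],
        "chat_completion_id": response["id"],
        "api_key": masked_api_key,
    }
    tokens_counter.add(response["usage"]["total_tokens"], attributes)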

Example Metrics generated by the service

Here are the metrics produced for the response above, as printed in the service console:

{
    "resource_metrics": [
        ...
        {
            "scope": {
                "name": "openai.meter",
                "version": "",
                "schema_url": ""
            },
            "metrics": [
                {
                    "name": "tokens_counter",
                    "description": "The number of tokens generated by model",
                    "unit": "1",
                    "data": {
                        "data_points": [
                            {
                                "attributes": {
                                    "model": "gpt-3.5-turbo",
                                    "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
                                    "api_key": "None"
                                },
                                "start_time_unix_nano": 1688664323731330000,
                                "time_unix_nano": 1688664571647765000,
                                "value": 40
                            }
                        ],
                        "aggregation_temporality": 2,
                        "is_monotonic": true
                    }
                }
            ]
        }
    ]
}

This metric indicates that the model gpt-3.5-turbo generated 40 tokens in response to a chat completion request with the ID chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab.

The http.server.active_requests and http.server.duration metrics are automatically produced by the OpenTelemetry FastAPI instrumentation and provide information about the HTTP requests received by the server.

Remember to replace 'your-api-key' with your actual OpenAI API key. Keep it secret, and don't share it online.

Happy coding!
