This project is a FastAPI-based proxy server for OpenAI's Chat API. It can generate traces and metrics using OpenTelemetry and send them to a collector for further processing. Metrics include the total number of tokens generated by the model, the number of prompt tokens in a chat completion, and the number of completion tokens in a chat completion.
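To make the rest of the walkthrough concrete, here is a minimal sketch of how such a proxy can be structured. It is illustrative rather than the project's actual `main.py`: the route path and attribute names are assumptions, and it targets the pre-1.0 `openai` Python SDK that matches the `gpt-3.5-turbo-0613`-era response shown later.

```python
# Minimal sketch of an OpenAI chat proxy with an OpenTelemetry counter.
# Illustrative only -- the project's real main.py may differ.
import os

import openai  # pre-1.0 SDK assumed
from fastapi import FastAPI, Request
from opentelemetry import metrics

openai.api_key = os.environ["OPENAI_API_KEY"]

app = FastAPI()
meter = metrics.get_meter("openai.meter")

# Counts the total tokens generated across chat completions.
tokens_counter = meter.create_counter(
    "tokens_counter",
    unit="1",
    description="The number of tokens generated by model",
)


@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> dict:
    payload = await request.json()
    # Forward the request body to OpenAI unchanged.
    response = openai.ChatCompletion.create(**payload)
    # Label the measurement so it can be sliced per model and per call.
    tokens_counter.add(
        response["usage"]["total_tokens"],
        attributes={
            "model": payload.get("model", "unknown"),
            "chat_completion_id": response["id"],
        },
    )
    return response
```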
To isolate this project's dependencies from other Python projects, create a virtual environment:
```
python3 -m venv venv
```
Activate the environment:
- On macOS and Linux:

  ```
  source venv/bin/activate
  ```

- On Windows:

  ```
  .\venv\Scripts\activate
  ```
Install the dependencies from the `requirements.txt` file:

```
pip install -r requirements.txt
```
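The exact pins live in `requirements.txt`; a setup like this typically needs at least the packages below. This list is illustrative, not the file's verbatim contents:

```
fastapi
uvicorn
openai
requests
opentelemetry-distro
opentelemetry-exporter-otlp
opentelemetry-instrumentation-fastapi
```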
Set the OpenAI API key as an environment variable:
```
export OPENAI_API_KEY='your-api-key'
```
Start the service, pointing it at an OpenTelemetry (OTel) collector. Exporting to the console as well is useful for troubleshooting:

```
OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy opentelemetry-instrument --traces_exporter console --metrics_exporter otlp_proto_http,console uvicorn main:app
```
This command sets the service name to `example-openai-proxy` and starts the app with `uvicorn`. The `opentelemetry-instrument` command automatically instruments the app for OpenTelemetry. Traces are exported to the console for debugging, while metrics are exported to both the OTLP collector and the console.
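By default, the OTLP HTTP exporter sends data to `http://localhost:4318`. If your collector listens elsewhere, set the standard OpenTelemetry endpoint variable before starting the service (`my-collector` is a placeholder):

```
export OTEL_EXPORTER_OTLP_ENDPOINT='http://my-collector:4318'
```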
In a different terminal, you can run the tests. Make sure you're in the `example-openai-proxy` directory:

```
python tests/main_test.py
```
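If you'd rather see what the test does than read its source, here is a hypothetical stand-in for `tests/main_test.py`; the question, port, and use of `requests` are assumptions:

```python
# Hypothetical stand-in for tests/main_test.py: send one chat request to the
# proxy (assumed to run on uvicorn's default port 8000) and print the reply.
import json

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ],
    },
    timeout=30,
)
print(json.dumps(resp.json(), indent=2))
```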
Here is an example response from the script:
```json
{
  "id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
  "object": "chat.completion",
  "created": 1688664323,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 13,
    "total_tokens": 40
  }
}
```

Note that `total_tokens` (40) is the sum of `prompt_tokens` (27) and `completion_tokens` (13); this is the value you'll see recorded by `tokens_counter` below.
The proxy server produces several metrics, which are printed to the console where the service is running. These metrics include:
- The number of tokens generated by the model (`tokens_counter`)
- The number of prompt tokens in a chat completion (`prompt_tokens`)
- The number of completion tokens in a chat completion (`completion_tokens`)
These metrics are labeled with the model name, the chat completion ID, and a partially masked API key.
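One plausible way to build those labels is sketched below; `mask_api_key` is a hypothetical helper, not necessarily the project's actual code:

```python
from typing import Optional


def mask_api_key(key: Optional[str]) -> str:
    """Keep only the last four characters of the key; stringify None."""
    if not key:
        return "None"  # matches the "None" label in the output below
    return "*" * (len(key) - 4) + key[-4:]


attributes = {
    "model": "gpt-3.5-turbo",
    "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
    "api_key": mask_api_key(None),
}
```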
And here are the metrics produced for this response (in the service console):
```json
{
  "resource_metrics": [
    ...
    {
      "scope": {
        "name": "openai.meter",
        "version": "",
        "schema_url": ""
      },
      "metrics": [
        {
          "name": "tokens_counter",
          "description": "The number of tokens generated by model",
          "unit": "1",
          "data": {
            "data_points": [
              {
                "attributes": {
                  "model": "gpt-3.5-turbo",
                  "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
                  "api_key": "None"
                },
                "start_time_unix_nano": 1688664323731330000,
                "time_unix_nano": 1688664571647765000,
                "value": 40
              }
            ],
            "aggregation_temporality": 2,
            "is_monotonic": true
          }
        }
      ]
    }
  ]
}
```
This metric indicates that the model `gpt-3.5-turbo` generated 40 tokens in response to a chat completion request with the ID `chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab`.
The `http.server.active_requests` and `http.server.duration` metrics are automatically produced by the OpenTelemetry FastAPI instrumentation and provide information about the HTTP requests received by the server.
Remember to replace `'your-api-key'` with your actual OpenAI API key. Keep it secret, and don't share it online.
Happy coding!