This project is a FastAPI-based proxy server for OpenAI's Chat API. It can generate traces and metrics using OpenTelemetry and send them to a collector for further processing. Metrics include the total number of tokens generated by the model, the number of prompt tokens in a chat completion, and the number of completion tokens in a chat completion.
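To make the rest of the walkthrough concrete, here is a minimal sketch of how such a proxy can be structured. It is illustrative rather than the project's actual `main.py`: the route path and attribute names are assumptions, and it targets the pre-1.0 `openai` Python SDK that matches the `gpt-3.5-turbo-0613`-era response shown later.

```python
# Minimal sketch of an OpenAI chat proxy with an OpenTelemetry counter.
# Illustrative only -- the project's real main.py may differ.
import os

import openai  # pre-1.0 SDK assumed
from fastapi import FastAPI, Request
from opentelemetry import metrics

openai.api_key = os.environ["OPENAI_API_KEY"]

app = FastAPI()
meter = metrics.get_meter("openai.meter")

# Counts the total tokens generated across chat completions.
tokens_counter = meter.create_counter(
    "tokens_counter",
    unit="1",
    description="The number of tokens generated by model",
)


@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> dict:
    payload = await request.json()
    # Forward the request body to OpenAI unchanged.
    response = openai.ChatCompletion.create(**payload)
    # Label the measurement so it can be sliced per model and per call.
    tokens_counter.add(
        response["usage"]["total_tokens"],
        attributes={
            "model": payload.get("model", "unknown"),
            "chat_completion_id": response["id"],
        },
    )
    return response
```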
To isolate this project's dependencies from other Python projects, create a virtual environment:
```
python3 -m venv venv
```
Activate the environment:
- On macOS and Linux:

  ```
  source venv/bin/activate
  ```

- On Windows:

  ```
  .\venv\Scripts\activate
  ```
Install the dependencies from the `requirements.txt` file:

```
pip install -r requirements.txt
```
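The exact pins live in `requirements.txt`; a setup like this typically needs at least the packages below. This list is illustrative, not the file's verbatim contents:

```
fastapi
uvicorn
openai
requests
opentelemetry-distro
opentelemetry-exporter-otlp
opentelemetry-instrumentation-fastapi
```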
Set the OpenAI API key as an environment variable:
```
export OPENAI_API_KEY='your-api-key'
```
Start the service, pointing it at an OpenTelemetry (OTel) collector. Exporting to the console as well is useful for troubleshooting:

```
OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy opentelemetry-instrument --traces_exporter console --metrics_exporter otlp_proto_http,console uvicorn main:app
```
This command sets the service name to `example-openai-proxy` and starts the app with `uvicorn`. The `opentelemetry-instrument` command automatically instruments the app for OpenTelemetry. Traces are exported to the console for debugging, while metrics are exported to both the OTLP collector and the console.
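By default, the OTLP HTTP exporter sends data to `http://localhost:4318`. If your collector listens elsewhere, set the standard OpenTelemetry endpoint variable before starting the service (`my-collector` is a placeholder):

```
export OTEL_EXPORTER_OTLP_ENDPOINT='http://my-collector:4318'
```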
In a different terminal, you can run the tests. Make sure you're in the `example-openai-proxy` directory:

```
python tests/main_test.py
```
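If you'd rather see what the test does than read its source, here is a hypothetical stand-in for `tests/main_test.py`; the question, port, and use of `requests` are assumptions:

```python
# Hypothetical stand-in for tests/main_test.py: send one chat request to the
# proxy (assumed to run on uvicorn's default port 8000) and print the reply.
import json

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ],
    },
    timeout=30,
)
print(json.dumps(resp.json(), indent=2))
```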
Here is an example response from the script:
```json
{
  "id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
  "object": "chat.completion",
  "created": 1688664323,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 13,
    "total_tokens": 40
  }
}
```

Note that `total_tokens` (40) is the sum of `prompt_tokens` (27) and `completion_tokens` (13); this is the value you'll see recorded by `tokens_counter` below.
The proxy server produces several metrics, which are printed to the console where the service is running. These metrics include:
- The number of tokens generated by the model (`tokens_counter`)
- The number of prompt tokens in a chat completion (`prompt_tokens`)
- The number of completion tokens in a chat completion (`completion_tokens`)
These metrics are labeled with the model name, the chat completion ID, and a partially masked API key.
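One plausible way to build those labels is sketched below; `mask_api_key` is a hypothetical helper, not necessarily the project's actual code:

```python
from typing import Optional


def mask_api_key(key: Optional[str]) -> str:
    """Keep only the last four characters of the key; stringify None."""
    if not key:
        return "None"  # matches the "None" label in the output below
    return "*" * (len(key) - 4) + key[-4:]


attributes = {
    "model": "gpt-3.5-turbo",
    "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
    "api_key": mask_api_key(None),
}
```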
And here are the metrics produced for this response (in the service console):
```json
{
  "resource_metrics": [
    ...
    {
      "scope": {
        "name": "openai.meter",
        "version": "",
        "schema_url": ""
      },
      "metrics": [
        {
          "name": "tokens_counter",
          "description": "The number of tokens generated by model",
          "unit": "1",
          "data": {
            "data_points": [
              {
                "attributes": {
                  "model": "gpt-3.5-turbo",
                  "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
                  "api_key": "None"
                },
                "start_time_unix_nano": 1688664323731330000,
                "time_unix_nano": 1688664571647765000,
                "value": 40
              }
            ],
            "aggregation_temporality": 2,
            "is_monotonic": true
          }
        }
      ]
    }
  ]
}
```
This metric indicates that the model `gpt-3.5-turbo` generated 40 tokens in response to a chat completion request with the ID `chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab`.
The `http.server.active_requests` and `http.server.duration` metrics are automatically produced by the OpenTelemetry FastAPI instrumentation and provide information about the HTTP requests received by the server.
Remember to replace `'your-api-key'` with your actual OpenAI API key. Keep it secret, and don't share it online.
Happy coding!