This project is a FastAPI-based proxy server for OpenAI's Chat API. It can generate traces and metrics using OpenTelemetry and send them to a collector for further processing. Metrics include the number of tokens generated by the model, the number of prompt tokens in a chat completion, and the number of completion tokens in a chat completion.
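At its core, the proxy receives a chat request, forwards it to OpenAI, and returns the response while recording token usage. Here is a rough sketch of that flow; the route, the use of httpx, and the error handling are illustrative assumptions, not the actual contents of main.py:

```python
import os

import httpx
from fastapi import FastAPI, Request

app = FastAPI()

OPENAI_URL = "https://api.openai.com/v1/chat/completions"  # OpenAI's Chat API


@app.post("/chat/completions")  # assumed route; check main.py for the real one
async def chat_completions(request: Request):
    # Forward the caller's JSON body to OpenAI using the server-side API key.
    body = await request.json()
    headers = {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"}
    async with httpx.AsyncClient(timeout=60.0) as client:
        upstream = await client.post(OPENAI_URL, json=body, headers=headers)
    completion = upstream.json()
    # The "usage" block of the completion is what feeds the token metrics
    # described later in this README.
    return completion
```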
To isolate this project's dependencies from other Python projects, create a virtual environment:
```bash
python3 -m venv venv
```

Activate the environment:
- On macOS and Linux:

  ```bash
  source venv/bin/activate
  ```

- On Windows:

  ```
  .\venv\Scripts\activate
  ```

Install the dependencies from the `requirements.txt` file:
```bash
pip install -r requirements.txt
```

Set the OpenAI API key as an environment variable:
```bash
export OPENAI_API_KEY='your-api-key'
```

Start the service by pointing it to the OpenTelemetry (OTEL) collector. The console exporters are useful for troubleshooting:
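Inside the service, the key is presumably read back from the environment. A minimal sketch of that, assuming the proxy fails fast when the key is absent (the real handling in main.py may differ):

```python
import os

# Read the key the proxy will attach to upstream OpenAI requests.
# Failing fast on a missing key is an assumption, not main.py's documented behavior.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
```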
```bash
OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy opentelemetry-instrument --traces_exporter console --metrics_exporter otlp_proto_http,console uvicorn main:app
```

This command sets the service name to `example-openai-proxy` and starts the app with uvicorn. The `opentelemetry-instrument` command automatically instruments the app for OpenTelemetry. Traces are exported to the console for debugging, and metrics are exported to both the OTLP collector and the console.
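If you would rather not use the `opentelemetry-instrument` agent, roughly the same exporter wiring can be done in code with the OpenTelemetry SDK. This is a sketch of the equivalent setup, not what main.py actually does; the OTLP/HTTP exporter defaults to http://localhost:4318 and can be redirected with the OTEL_EXPORTER_OTLP_ENDPOINT environment variable:

```python
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.resources import Resource

# Mirrors OTEL_RESOURCE_ATTRIBUTES=service.name=example-openai-proxy.
resource = Resource.create({"service.name": "example-openai-proxy"})

# Mirrors --metrics_exporter otlp_proto_http,console: send metrics both to an
# OTLP/HTTP collector and to stdout.
readers = [
    PeriodicExportingMetricReader(OTLPMetricExporter()),
    PeriodicExportingMetricReader(ConsoleMetricExporter()),
]
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=readers))
```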
In a different terminal, you can run the tests (make sure you're in the example-openai-proxy directory). The test script sends a chat completion request to the proxy, roughly as in the sketch below.
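This is a hypothetical recreation, assuming the proxy listens on uvicorn's default port 8000 and exposes a POST /chat/completions route; the real tests/main_test.py may use a different route or payload:

```python
# Hypothetical stand-in for tests/main_test.py.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/chat/completions",  # assumed route and default uvicorn port
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Who won the world series in 2020?"}],
    },
)
print(resp.json())
```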
Run the tests:

```bash
python tests/main_test.py
```

Here is an example response from the script:
```json
{
  "id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
  "object": "chat.completion",
  "created": 1688664323,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 13,
    "total_tokens": 40
  }
}
```

The proxy server produces several metrics, which are printed to the console where the service is running. These metrics include:
- The number of tokens generated by the model (`tokens_counter`)
- The number of prompt tokens in a chat completion (`prompt_tokens`)
- The number of completion tokens in a chat completion (`completion_tokens`)
These metrics are labeled with the model name, the chat completion ID, and a partially masked API key.
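The meter name (openai.meter), the metric names, and the attribute keys are taken from the console output below; the masking helper and the exact recording function are illustrative assumptions, not the actual main.py:

```python
from opentelemetry import metrics

# "openai.meter" matches the scope name in the console output below.
meter = metrics.get_meter("openai.meter")

tokens_counter = meter.create_counter(
    "tokens_counter", unit="1", description="The number of tokens generated by model"
)
prompt_tokens_counter = meter.create_counter("prompt_tokens", unit="1")
completion_tokens_counter = meter.create_counter("completion_tokens", unit="1")


def mask_key(key):
    """Hypothetical helper: keep only the edges of the key, e.g. 'sk-...abcd'."""
    return f"{key[:3]}...{key[-4:]}" if key else "None"


def record_usage(completion, api_key):
    # `completion` is a parsed chat completion response like the example above.
    attributes = {
        "model": completion["model"],
        "chat_completion_id": completion["id"],
        "api_key": mask_key(api_key),
    }
    usage = completion["usage"]
    tokens_counter.add(usage["total_tokens"], attributes)
    prompt_tokens_counter.add(usage["prompt_tokens"], attributes)
    completion_tokens_counter.add(usage["completion_tokens"], attributes)
```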
And here are the metrics produced for this response (in the service console):
```json
{
  "resource_metrics": [
    ...
    {
      "scope": {
        "name": "openai.meter",
        "version": "",
        "schema_url": ""
      },
      "metrics": [
        {
          "name": "tokens_counter",
          "description": "The number of tokens generated by model",
          "unit": "1",
          "data": {
            "data_points": [
              {
                "attributes": {
                  "model": "gpt-3.5-turbo",
                  "chat_completion_id": "chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab",
                  "api_key": "None"
                },
                "start_time_unix_nano": 1688664323731330000,
                "time_unix_nano": 1688664571647765000,
                "value": 40
              }
            ],
            "aggregation_temporality": 2,
            "is_monotonic": true
          }
        }
      ]
    }
  ]
}
```

This metric indicates that the model `gpt-3.5-turbo` generated 40 tokens in response to the chat completion request with the ID `chatcmpl-7ZN1Hb2SAMIesB509m73bE3pf5Sab`. The value 40 matches the `total_tokens` in the `usage` block of the example response above (27 prompt tokens + 13 completion tokens).
The `http.server.active_requests` and `http.server.duration` metrics are automatically produced by the OpenTelemetry FastAPI instrumentation and provide information about the HTTP requests received by the server.
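If you run the app without the `opentelemetry-instrument` agent, you can get the same HTTP metrics by instrumenting the app directly; a small sketch using the standard opentelemetry-instrumentation-fastapi package:

```python
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

# Produces the http.server.active_requests and http.server.duration metrics
# (plus request spans) without the opentelemetry-instrument agent.
FastAPIInstrumentor.instrument_app(app)
```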
Remember to replace `'your-api-key'` with your actual OpenAI API key. Keep it secret, and don't share it online.
Happy coding!