OpenTelemetry support #186
base: main
Conversation
This is heavily borrowed from the OpenAI implementation. Fixes awaescher#185
@lmolkova - can you take a look?
```csharp
private static readonly Meter _chatMeter = new("OllamaSharp.ChatClient");

// TODO: add explicit histogram buckets once System.Diagnostics.DiagnosticSource 9.0 is used
private static readonly Histogram<double> _duration = _chatMeter.CreateHistogram<double>(GEN_AI_CLIENT_OPERATION_DURATION_METRIC_NAME, "s", "Measures GenAI operation duration.");
```
You may want to create suitable buckets for the histogram based on what a typical duration would be, using exponential-ish bucket sizes. E.g. https://github.com/dotnet/aspnetcore/blob/cfcb73ab7e63ad193d20566efb23b8bf93e494ba/src/Hosting/Hosting/src/Internal/HostingMetrics.cs#L30
Given that you could have a model that's only a few hundred parameters, or one that is several hundreds of billions, I'm not sure that it's really possible to determine what the typical duration would be.
@samsp-msft is this on .NET 10 only? I want to use it too.
@aaronpowell users can customize buckets; the advice is part of the spec (https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md#metric-gen_aiclientoperationduration). Without it, everyone needs to customize all the time.
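For reference, that per-app customization looks something like this with the OpenTelemetry .NET SDK; a sketch assuming the meter name from this PR and the boundaries advised in the semconv doc linked above:

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Consumer-side override via a View: this is what every app has to repeat
// when the library itself doesn't carry bucket advice.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("OllamaSharp.ChatClient")
    .AddView("gen_ai.client.operation.duration", new ExplicitBucketHistogramConfiguration
    {
        // Boundaries (in seconds) from the gen-ai semconv advice.
        Boundaries = new double[] { 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92 }
    })
    .Build();
```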
@aaronpowell - The spec seems to have good exponential buckets from 100ms to 80s. You should set those as the default buckets using the Advice option.
@lmolkova - that code is present in the .NET 9 branch, so you shouldn't have to wait for .NET 10.
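A minimal sketch of what that could look like on the library side, assuming the .NET 9 CreateHistogram overload that accepts an InstrumentAdvice (the overload the TODO above refers to):

```csharp
using System.Diagnostics.Metrics;

internal static class ChatMetrics
{
    private static readonly Meter _chatMeter = new("OllamaSharp.ChatClient");

    // Spec-advised default buckets (in seconds); consumers can still override via Views.
    private static readonly Histogram<double> _duration = _chatMeter.CreateHistogram<double>(
        "gen_ai.client.operation.duration",
        unit: "s",
        description: "Measures GenAI operation duration.",
        tags: null,
        advice: new InstrumentAdvice<double>
        {
            HistogramBucketBoundaries = new[] { 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92 }
        });
}
```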
OllamaSharp supports netstandard, so the implementation will be restricted to that.
Do you build just one version, or can you have version-specific ifdefs?
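If it helps, a sketch of the multi-targeting route: add net9.0 alongside netstandard2.0 in `<TargetFrameworks>` and gate the advice overload behind a preprocessor symbol. DurationAdvice here is a hypothetical static field holding the InstrumentAdvice from the sketch above:

```csharp
// Inside the same class as _chatMeter above.
#if NET9_0_OR_GREATER
    // net9.0 build: DiagnosticSource 9.0's advice overload is available.
    private static readonly Histogram<double> _duration = _chatMeter.CreateHistogram<double>(
        GEN_AI_CLIENT_OPERATION_DURATION_METRIC_NAME, "s", "Measures GenAI operation duration.",
        tags: null, advice: DurationAdvice);
#else
    // netstandard2.0 build: the advice overload doesn't exist; default buckets apply.
    private static readonly Histogram<double> _duration = _chatMeter.CreateHistogram<double>(
        GEN_AI_CLIENT_OPERATION_DURATION_METRIC_NAME, "s", "Measures GenAI operation duration.");
#endif
```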
To enable the instrumentation:

1. Set the instrumentation feature flag using one of the following options:
Is it really necessary to have a feature flag to enable telemetry? It forms yet another hurdle that developers need to jump through to get telemetry working, and frankly there are already too many of those as it is.
A simplification would be to take the same approach as .NET did with a couple of networking activities. As they are experimental, we added that to the name, so it would be clear when you enable them that they are not yet stable. E.g. https://github.com/dotnet/runtime/blob/6fe852a43152e45dc2e98bb747187d5bfe9f19e3/src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionTelemetry.cs#L161
Short answer: I don't know; I was just mirroring the approach from https://github.com/openai/openai-dotnet/blob/main/docs/observability.md
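For context, the OpenAI approach is an AppContext switch checked at startup; a sketch with a hypothetical OllamaSharp switch name, mirroring OpenAI's "OpenAI.Experimental.EnableOpenTelemetry":

```csharp
// Opt in from the consuming app; the switch name is hypothetical for OllamaSharp.
AppContext.SetSwitch("OllamaSharp.Experimental.EnableOpenTelemetry", true);

// Inside the library, instrumentation is gated on the switch:
bool enabled = AppContext.TryGetSwitch("OllamaSharp.Experimental.EnableOpenTelemetry", out var value) && value;
```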
```csharp
internal class OpenTelemetryScope : IDisposable
{
    private static readonly ActivitySource _chatSource = new("OllamaSharp.ChatClient");
```
See the comment on the README: maybe rather than having a feature flag, just prepend "Experimental" to the names of the metric and activity sources, so it's explicit that the data may change.
The OpenAI SDK doesn't include that - https://github.com/openai/openai-dotnet/blob/main/src/Utility/Telemetry/OpenTelemetryScope.cs#L13-L14
Only because I never got around to merging openai/openai-dotnet#187, but it's approved, and I highly recommend using the approach @samsp-msft suggested.
Ok, I'll update based on that.
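A sketch of that rename, following the dotnet/runtime pattern (e.g. "Experimental.System.Net.NameResolution"); the prefix would be dropped once the gen-ai conventions stabilize:

```csharp
using System.Diagnostics;
using System.Diagnostics.Metrics;

internal class OpenTelemetryScope : IDisposable
{
    // The "Experimental." prefix signals the telemetry shape may still change.
    private static readonly ActivitySource _chatSource = new("Experimental.OllamaSharp.ChatClient");
    private static readonly Meter _chatMeter = new("Experimental.OllamaSharp.ChatClient");

    public void Dispose() { }
}
```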
@lmolkova can probably give the latest status on how the content of the chat should be added to telemetry and what the best knob is for the app to turn that on/off. One scenario for emitting the chat content is to be able to use a separate service to score the responses, so that you can get a sense of correctness over time as models change, etc.
```csharp
public const string GEN_AI_RESPONSE_MODEL_KEY = "gen_ai.response.model";

public const string GEN_AI_SYSTEM_KEY = "gen_ai.system";
public const string GEN_AI_SYSTEM_VALUE = "ollamasharp";
```
This property is intended to be language-agnostic. Should it be just ollama? The language-specific part is essentially inside OllamaSharp.ChatClient.
Also, if you're interested in contributing to OTel, we'd be more than happy to add ollama to the list of systems in https://github.com/open-telemetry/semantic-conventions/blob/v1.30.0/docs/gen-ai/gen-ai-spans.md
Suggested change:

```diff
- public const string GEN_AI_SYSTEM_VALUE = "ollamasharp";
+ public const string GEN_AI_SYSTEM_VALUE = "ollama";
```
That makes sense, I can change it to be just ollama. If this does end up being implemented, then I'll add it to that list.
```csharp
public const string GEN_AI_REQUEST_PRESENCE_PENALTY_KEY = "gen_ai.request.presence_penalty";
public const string GEN_AI_REQUEST_STOP_SEQUENCES_KEY = "gen_ai.request.stop_sequences";

public const string GEN_AI_PROMPT_KEY = "gen_ai.prompt";
```
We've reworked the OTel semantic conventions a bit, and we no longer support this way of reporting prompts and completions. Check out https://github.com/open-telemetry/semantic-conventions/blob/v1.30.0/docs/gen-ai/gen-ai-events.md
You can find a creative way to report those new events in https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/OpenTelemetryChatClient.cs (and it should get easier over time).
I'll review that implementation (I was going off the OpenAI one).
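For reference, the event-based shape looks roughly like this; a simplified sketch assuming an ILogger is at hand, with the event name taken from the semconv events doc above (Microsoft.Extensions.AI emits these as structured log records):

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

internal static class GenAiEvents
{
    // Emits a semconv "gen_ai.user.message" event as a structured log record.
    public static void LogUserMessage(ILogger logger, string content) =>
        logger.Log(
            LogLevel.Information,
            new EventId(1, "gen_ai.user.message"),
            new List<KeyValuePair<string, object?>>
            {
                new("gen_ai.system", "ollama"),
                new("content", content), // would sit behind a content-capture opt-in
            },
            exception: null,
            formatter: static (_, _) => "gen_ai.user.message");
}
```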
Fixes #185
This is heavily borrowed from the OpenAI implementation.
It focuses on tracing through the chat and chat-streaming parts of the Ollama API, adding an opt-in activity source. If the feature isn't enabled (it isn't by default), the method calls are essentially no-ops.
Here's what it looks like in the Aspire dashboard: you can see the metrics in the trace from the request and response.
Included docs on how to use it.
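For anyone who wants to try it, hooking the new source and meter up from an app would look something like this; a sketch assuming the names from the diffs above plus the OpenTelemetry and OTLP exporter packages:

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Listen for the PR's ActivitySource and Meter and export via OTLP
// (e.g. to the Aspire dashboard).
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("OllamaSharp.ChatClient")
    .AddOtlpExporter()
    .Build();

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("OllamaSharp.ChatClient")
    .AddOtlpExporter()
    .Build();
```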