
feat: 🦠 Metrics prototype for Rack instrumentation #1129

Status: Open. Wants to merge 13 commits into main.
Conversation

kaylareopelle (Contributor) commented on Aug 19, 2024:

Here's a prototype to add the http.server.request.duration metric to Rack instrumentation.

The base gem checks for the presence of the opentelemetry-metrics-api gem by testing whether OpenTelemetry::Metrics is defined before it creates a meter. The instrumentation then checks that the :send_metrics configuration option is true. Metrics are generated only when both conditions hold; guard clauses determine whether the metrics-related code should run.
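The two-condition check can be sketched as plain Ruby. The module and method names below are illustrative, not the PR's exact API:

```ruby
# Hypothetical sketch of the "two keys" guard: metrics code runs only when
# the metrics API is loaded AND the instrumentation config opts in.
# MetricsGuardSketch, metrics_defined?, and metrics_enabled? are
# illustrative names, not the actual names used in the PR.
module MetricsGuardSketch
  module_function

  # Key 1: the application has opentelemetry-metrics-api loaded.
  def metrics_defined?
    defined?(OpenTelemetry::Metrics) ? true : false
  end

  # Key 2: the instrumentation was configured with send_metrics: true.
  def metrics_enabled?(config)
    metrics_defined? && config[:send_metrics] == true
  end
end
```

If either key is missing (the gem absent, or the config off), the guard returns false and the metrics-related code paths are skipped.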

To test this with an application, you need to patch OpenTelemetry::SDK::Configurator#metrics_configuration_hook to add a meter provider and metric reader before configuring the SDK. Here's an example that uses the periodic metric reader and the OTLP metrics exporter:

```ruby
module MetricsPatch
  def metrics_configuration_hook
    OpenTelemetry.meter_provider = OpenTelemetry::SDK::Metrics::MeterProvider.new(resource: @resource)
    OpenTelemetry.meter_provider
      .add_metric_reader(
        OpenTelemetry::SDK::Metrics::Export::PeriodicMetricReader.new(
          exporter: OpenTelemetry::Exporter::OTLP::Metrics::MetricsExporter.new,
          export_interval_millis: 3000,
          export_timeout_millis: 10000
        )
      )
  end
end

OpenTelemetry::SDK::Configurator.prepend(MetricsPatch)

OpenTelemetry::SDK.configure do |c|
  c.use 'OpenTelemetry::Instrumentation::Rack', { send_metrics: true }
end
```

The attributes assigned to the http.server.request.duration metric are drawn from the span attributes. When OTEL_SEMCONV_STABILITY_OPT_IN is ready, we should update the instrumentation to emit the stability-related attributes for both spans and metrics.

To use this with another instrumentation that installs Rack (e.g., Sinatra), you'll need to update the gemspec of the instrumentation that installs Rack so that the opentelemetry-instrumentation-base and opentelemetry-instrumentation-rack dependencies aren't installed.

Closes #1127

Commits (titles/excerpts):

- Emit http.server.request.duration metric if the application has the metrics API installed and enables the feature via config
- Otherwise it may round to zero
- Like the Metrics ConfigurationPatch, include the MetricsPatch modules if Metrics is enabled/present
- @kaylareopelle changed the title to "WIP: 🦠 Metrics prototype: Prepend edition" (Sep 10, 2024)
- @kaylareopelle changed the title back to "🦠 Metrics prototype: Prepend edition" (Sep 10, 2024)
- @kaylareopelle linked an issue that may be closed by this pull request (Sep 20, 2024)
- @kaylareopelle changed the title to "feat: 🦠 Metrics prototype: prepend edition" (Sep 20, 2024)
- @kaylareopelle marked this pull request as ready for review (Sep 20, 2024 16:28)
```ruby
  @http_server_request_duration_histogram ||= meter.create_histogram('http.server.request.duration', unit: 's', description: 'Duration of HTTP server requests.')
end

def record_http_server_request_duration_metric(span)
```
kaylareopelle (Contributor, author) commented:
This method is very long because it leaves all the options open for attribute names. The attribute names we currently use are not the ones marked as stable.

I think we could use OTEL_SEMCONV_STABILITY_OPT_IN as a way to determine what attributes to send. Or, we could only emit the metric if that env var is set to http, which would mean only the new attributes would be sent. I'd love feedback here.

xuan-cao-swi (Contributor) commented:

If the metric only cares about HTTP-related attributes, can it emit metrics with attributes based on the existing span.attributes? I'm thinking something like:

```ruby
# Assume only one of url.scheme and http.scheme will appear
span.attributes.select { |key, value| key.match(/http\..*/) || key.match(/url\..*/) }
```

kaylareopelle (author) replied:

I like where you're headed with this! There's a lot of duplication in the current code. However, some http.* and url.* attributes present on the span should not be attached to metrics (e.g., url.full is on spans, but not metrics).

I wonder if we could use a key-based allowlist to select attributes instead? Maybe add a constant with all the HTTP metric keys, and we search for those?

Things might need to shift again when the OTEL_SEMCONV_STABILITY_OPT_IN variable is out in the world, but we can address that later.
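A key-based allowlist could look something like the sketch below. The constant name follows the suggestion in this thread, but the exact key set is an assumption loosely drawn from the HTTP semantic conventions, not from this PR:

```ruby
# Hypothetical allowlist of span attribute keys that are also valid on the
# http.server.request.duration metric. The exact set of keys here is an
# assumption; consult the HTTP semantic conventions for the real list.
HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN = %w[
  http.request.method
  http.response.status_code
  http.route
  url.scheme
  network.protocol.version
  server.address
  server.port
  error.type
].freeze

# Select only allowlisted keys; span-only attributes such as url.full
# are dropped rather than leaking onto the metric.
def duration_metric_attributes(span_attributes)
  span_attributes.select do |key, _value|
    HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN.include?(key)
  end
end
```

Unlike the regex-based select, this never picks up keys like url.full that match the `url.*` pattern but aren't part of the metric's attribute set.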

xuan-cao-swi replied:

Thanks for the updates. Yes, I like the key-based allowlist HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN.
In the future, I think we could even have something like HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN_V1_2 (V1_2 = semantic conventions version 1.2).

kaylareopelle (author) replied:

ooh, nice idea!

```diff
@@ -7,6 +7,7 @@
 require 'opentelemetry'
 require 'opentelemetry-registry'
 require 'opentelemetry/instrumentation/base'
+require 'opentelemetry/instrumentation/metrics_patch' if defined?(OpenTelemetry::Metrics) # maybe also add Env var check?
```
kaylareopelle (author) commented:

We talked about requiring "two keys to be turned", to enable metrics on instrumentation:

  1. Detect the metrics api is installed by the application (don't list it as a dependency)
  2. Add a configuration option to send the metrics (turned off by default)

Since we do some of our setup in base, base checks to see if the library is installed. If it is installed, we require the patch. The config isn't available at this time, so we don't use it in this context. The instrumentation for a particular library, however, checks for both the installation of the API and that the config option is true.

Is this sufficient to protect non-metrics users from any potential problems? Are there other approaches we'd like to explore?

xuan-cao-swi replied:

I think these two safeguards are enough for users who don’t want metrics in instrumentation or wish to create custom metrics themselves (e.g., include the metrics SDK but not use it in instrumentation).

Is the patch (e.g., prepend) only because metrics are still experimental features? I was thinking of including them in the base but not installing if OpenTelemetry::Metrics is missing (similar to line 25 in metrics_patch.rb).

kaylareopelle (author) replied:

Thank you! I think that could work! Yes, the patch is just because the metrics are experimental features. I thought this approach might feel "safer" (since all our instrumentation runs through base), but I don't know if it truly provides a benefit.

It also tries to follow the configurator patches for metrics in the SDK.

I'm happy to cut out the patch and just put everything in base. I sorta tried that with the other prototype.

kaylareopelle (author) followed up:

Whoops, forgot to follow up here. I've changed the structure to use conditions instead of the patches.

@kaylareopelle kaylareopelle self-assigned this Sep 27, 2024
@kaylareopelle kaylareopelle changed the title feat: 🦠 Metrics prototype: prepend edition feat: 🦠 Metrics prototype for Rack instrumentation Oct 3, 2024
kaylareopelle (author) commented:
@xuan-cao-swi - Thank you for the feedback! I've updated the prototype to use conditions instead of patches and to add the attributes from the spans using select instead of adding each to a hash individually. Please take another look when you have a chance!

```ruby
end

def record_http_server_request_duration_metric(span)
  return unless metrics_enabled?
```
arielvalentin (Collaborator) commented:

I have a bunch of ideas and questions floating around in my head that could use your input, since you're a metrics SDK expert.

It may result in a separate dispatch call, but have you considered creating a separate handler for metric generation? That would remove the need for having conditionals in the tracer middleware.

What if our user has an o11y 1.0 mindset and only wanted to generate metrics and not traces? Would that not mean the span would be non_recording and the metric would always generate invalid histogram entries?

Is there anything in the spec that states this should be generated from span timestamps?

Is there an alternative implementation that generates metrics in the span processor pipeline?

E.g., a processor that implements on_finish and generates metrics for server and client spans regardless of the instrumentation?

What about one that decorates the BatchSpanProcessor export loop and generates metrics in the BSP thread, to keep metric generation out of the critical path of the user's request?

xuan-cao-swi replied:

> Is there anything in the spec that states this should be generated from span timestamps?

I don't think there is a spec about where we should generate metrics. AFAIK, the metrics spec only talks about what the metrics should look like and how to export them.

> Is there an alternative implementation that generates metrics in the span processor pipeline?

To generate the metrics from a span processor, the processor needs to have a meter and create the instrument in on_start or on_finish. I'm thinking something like this:

```ruby
def initialize
  # ...
  @meter = ::OpenTelemetry.meter_provider.meter('sample_meter')
end

def on_start(span, parent_context)
  # ...
  @histogram ||= @meter.create_histogram('histogram_name', unit: 'ms', description: 'measure sample data')
end

def on_finish(span)
  # ...
  # Span timestamps are integer nanoseconds, so convert to the histogram's unit (ms)
  span_time = (span.end_timestamp - span.start_timestamp) / 1_000_000.0
  @histogram.record(span_time, attributes: {})
end
```

arielvalentin replied:

Perhaps.

What are other language SIGs doing?

Is what I'm suggesting seemingly out of touch with what is the consistent pattern across language implementations?

kaylareopelle (author) replied:

Love these questions! I have some research to do to follow-up, but here are my current thoughts.

The instrumentation is based on Python's approach with the aiohttp-server. The asgi instrumentation has an example of what we could do when OTEL_SEMCONV_STABILITY_OPT_IN comes around.

> It may result in a separate dispatch call, but have you considered creating a separate handler for metric generation? That would remove the need for having conditionals in the tracer middleware.

I think we could take that approach! The examples I've seen so far tend to record the metric as part of the span creation, but we don't have to and that might be safer until metrics are stable.

> What if our user has an o11y 1.0 mindset and only wanted to generate metrics and not traces? Would that not mean the span would be non_recording and the metric would always generate invalid histogram entries?

This is a great thing to consider. I think that'd be the case with this current implementation, and I'd like to make some tweaks for this scenario.

I don't see an option to create only metrics and not traces in the Python instrumentation, but perhaps this exists in other languages. I'll do a bit more digging. My read on the spec is that it should leverage the span if it's available, but if the span is not available, it should still record the metric.

> Is there anything in the spec that states this should be generated from span timestamps?

The spec says: "When this metric is reported alongside an HTTP server span, the metric value SHOULD be the same as the HTTP server span duration."

AFAIK, we don't save a duration on a span, just start and end timestamps, so I figured this would be the way to create a duration that's the same as the HTTP server span duration. I might be interpreting the spec incorrectly. We could also capture the start and end timestamps independently; that may be closer to what Python does.
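For reference, deriving a seconds-valued duration from the span's two timestamps is a one-liner. This sketch assumes, as in the Ruby SDK, that start and end timestamps are integer nanoseconds since the epoch, and that the histogram's unit is seconds ('s'):

```ruby
# Assumption: span timestamps are integer nanoseconds since epoch.
NANOSECONDS_PER_SECOND = 1_000_000_000.0

# Returns a Float duration in seconds, matching the span's duration,
# suitable for a histogram created with unit: 's'.
def http_server_request_duration_seconds(start_timestamp, end_timestamp)
  (end_timestamp - start_timestamp) / NANOSECONDS_PER_SECOND
end
```

Dividing by a Float avoids the integer-division rounding that could otherwise truncate sub-second requests to zero.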

> Is there an alternative implementation that generates metrics in the span processor pipeline?

I haven't seen this, and this isn't something I considered. I'll poke around other languages and see if I can find an example. I haven't looked too much at the BSP metrics, I'll take a look to get a better sense of how they work.

kaylareopelle (author) followed up:

@arielvalentin - Thanks for your patience!

I spent some time looking at metrics instrumentation in other languages. The most common design I found was to create metrics within the same instrumentation file as spans, but to calculate the duration and collect the attributes in separate ways. It seems like metrics are stable in those languages, so this might not have always been the case.

When it comes to creating a separate handler for metrics, I don't feel strongly one way or the other. Would you prefer to keep them in separate handlers?

Here are some of the files I looked at. The links should be to the code that records the metric. In most cases, the same file also creates the histogram.

For the span processor question, I didn't come across any use of a span processor to collect metrics. Since span processors are part of the SDK, and instrumentation should only rely on the API, this doesn't seem like quite the right fit. In addition, since spans and metrics should be independent of each other, I don't think we want a span processor to control the creation of another signal. What do you think?

It looks like there are some features coming with the meter configurator that allow users to control what data is collected by instrumentation scope. This could help someone control what telemetry they want from a given instrumentation scope, and support that o11y 1.0 mindset of sending only metrics, but not traces. I'll look into prototyping the experimental spec and use that to control whether metrics and traces are sent. We could probably do this with configs in the instrumentation too.

So as a follow-up, I'd like to:

  • Separate metrics from traces in the instrumentation. The metrics data should not use the span directly for its attributes/duration.
  • See what it's like to make an event handler specifically for metrics
  • Prototype the experimental metrics configuration features to control whether metrics/traces are sent

Let me know if there's anything I missed! Thanks again for your feedback!

kaylareopelle (author) followed up:

@arielvalentin @xuan-cao-swi - I have a prototype ready that creates metrics in a separate event handler: #1213

I decided to put it in a different PR to start to make it easier to compare the two approaches.

Next on my list is to prototype the metrics configuration work, but that PR will likely live in the core repo.

github-actions (bot) commented:
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

github-actions bot added the "stale" label on Nov 23, 2024.
Labels: stale (marks an issue/PR stale)

Successfully merging this pull request may close: Determine the structure for Metrics instrumentation

3 participants