Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metrics sorted/deduped bug AND extract common functionality for better code usage/consistency. #2097

Conversation

fraillt
Copy link
Contributor

@fraillt fraillt commented Sep 10, 2024

Fixes #
#2085 (comment)

Basically the issue was, that introduced in #1989.
It made success path super fast, by avoiding any copying/sorting/deduplicating of attributes, which is very cool,
BUT...
in order to achieve that in attribute set tracker it was inserted two records: original_list AND sorted_deduped_list.
Later, during aggregation, there was introduced deduplication mechanism, so that only one attribute set would be selected for specific tracker.
However, since rust uses randomness in hashing, the order in which these two attribute sets (original and deduped) is returned is random.
So the end result is that everytime application is restarted, MetricReader will get original_list OR sorted_dedup_list at random.

Changes

Bug fix

  • by introducing separate list for sorted_deduped attributes.
    This should also improve performance for collection stage as it only needs to go through this list of already sorted_deduped attributes.
  • added attributes_should_be_sorted_and_deduplicated test, to make sure this wouldn't happen again.

Other improvements

  • extracted common functionality used across all (except one) aggregators into few functions and classes/struts.
    • this reduces code duplication by 4x, and makes implementations more consistent.
    • all logic related to updating aggregates for specific attribute sets is more isolated an in one place.
  • several improvements related to AttributeSet
    • use faster hashing function rustc-hash.
    • improved sorting/deduping implementation.
    • sorted_deduped list uses AttributeSet as key, for faster hashing/lookup.
    • moved AttributeSet to it's own file, in order to better protect internal implementation to guarantee sorting&deduping.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@fraillt fraillt requested a review from a team September 10, 2024 14:52
@cijothomas
Copy link
Member

This looks like same as #2093?

@cijothomas
Copy link
Member

@fraillt Good work here! Just wondering if you hit the issue in a real application OR just that the prom tests started failing due to this, that led you to dig deep into this?

@fraillt
Copy link
Contributor Author

fraillt commented Sep 10, 2024

Indeed solves same issue as #2093,
I just added few extra improvements, but probably it makes sense to borrow idea of sorting/deduping only when assigning to DataPoints.

I don't use this version in production yet, but in production we have custom MetricReader that expects sorted attributes, so I would be caching it in prod as well :)

@cijothomas
Copy link
Member

I don't use this version in production yet, but in production we have custom MetricReader that expects sorted attributes, so I would be caching it in prod as well :)

Can you clarify if you expect sorted attributes or sorted+de-duped attributes? OTel spec ensures uniqueness (i.e no de-duplications), but not ordering.

My main question was - how did the instrumentation end up with duplicate attributes in the first place? That feels like an instrumentation bug to me to be fixed.

@fraillt
Copy link
Contributor Author

fraillt commented Sep 10, 2024

At the moment, our production setup is a bit custom...
We have custom exporter (MetricReader) that exports metrics in more concise way (compared to prometheus exporter).
So for this reason we need to output sorted+deduped attributes.
Our current implementation relies on fields to be sorted+deduped from Opentelemetry... so it's either works this way in opentelemetry, or we have some issues that we don't notice :))

@cijothomas
Copy link
Member

At the moment, our production setup is a bit custom... We have custom exporter (MetricReader) that exports metrics in more concise way (compared to prometheus exporter). So for this reason we need to output sorted+deduped attributes. Our current implementation relies on fields to be sorted+deduped from Opentelemetry... so it's either works this way in opentelemetry, or we have some issues that we don't notice :))

Got it. You need to do the sorting yourself, as that is not something guaranteed by OTel spec. It may work, but that is not a guarantee, so its best if your custom MetricReader does not rely on undocumented behavior.

@fraillt fraillt force-pushed the fix-issue-duplicates-attributes-are-not-removed branch from 804f325 to 89d270d Compare September 11, 2024 10:02
Copy link

codecov bot commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 96.96049% with 10 lines in your changes missing coverage. Please review.

Project coverage is 78.0%. Comparing base (5269660) to head (500c759).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...pentelemetry-sdk/src/metrics/internal/histogram.rs 95.5% 2 Missing ⚠️
...entelemetry-sdk/src/metrics/internal/last_value.rs 90.9% 2 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/mod.rs 98.8% 2 Missing ⚠️
...emetry-sdk/src/metrics/internal/precomputed_sum.rs 93.1% 2 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/sum.rs 92.8% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2097   +/-   ##
=====================================
  Coverage   78.0%   78.0%           
=====================================
  Files        121     122    +1     
  Lines      20936   20873   -63     
=====================================
- Hits       16332   16283   -49     
+ Misses      4604    4590   -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fraillt
Copy link
Contributor Author

fraillt commented Sep 11, 2024

Updated revision, with few notable changes:

  • grouped some functionality related to attribute set tracking in new classes: AttributeSetTracker, and LockFreeNoAttribsTracker.
  • introduced collect_data_points_reset and collect_data_points_readonly reusable functions that is responsible for collecting all data points. This reduced code duplication 5x and now all optimizations will be defined in one file.
  • borrowed ideas from Fix metrics dedup/sort bug #2093 and introduced separate sorted_deduped list for fast iteration at collection time.
  • restored back performance, by keeping unmodified attribs set in separate list.

Overall performance has increased, at least in few places:

  • in collecting phase, (as we don't need to check if attribute set was seen, and attributes are already sorted_deduped)
  • also performance in crease in
Counter_Overflow        time:   [165.23 ns 165.31 ns 165.38 ns]
                        change: [-74.110% -74.031% -73.967%] (p = 0.00 < 0.05)
                        Performance has improved.

/// This must implement [Hash], [PartialEq], and [Eq] so it may be used as
/// HashMap keys and other de-duplication methods.
#[derive(Clone, Default, Debug, PartialEq, Eq)]
pub(crate) struct AttributeSet(pub(crate) Vec<KeyValue>, u64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not clear what is the reason why this is being changed? can you update the PR desc. with the reasoning.
if the goal is to get predictable attribute set in readers, then #2093 looks sufficient to achieve that, without having to touch the hot path code.

@fraillt fraillt force-pushed the fix-issue-duplicates-attributes-are-not-removed branch from 89d270d to 500c759 Compare September 11, 2024 19:30
@fraillt fraillt changed the title Fix issue where attribute set for DataPoint is either sorted/deduped or original at random. Fix metrics sorted/deduped bug AND extract common functionality for better code usage/consistency. Sep 11, 2024
@fraillt
Copy link
Contributor Author

fraillt commented Sep 13, 2024

I'll close this PR for few reasons:

@fraillt fraillt closed this Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants