-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jaeger-v2] Add remotesampling
extension
#5389
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main open-telemetry/opentelemetry-collector-contrib#5389 +/- ##
===========================================
- Coverage 94.54% 41.12% -53.42%
===========================================
Files 346 171 -175
Lines 16947 9091 -7856
===========================================
- Hits 16022 3739 -12283
- Misses 724 5018 +4294
- Partials 201 334 +133
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
I am thinking of this approach.:
That's how it works in the current Jaeger collector, but we are serving this API in the collector server. Here, we have to create a new one. suggestions |
One more question: How can I run hotrod with jaeger-v2 binary? |
Signed-off-by: Pushkar Mishra <[email protected]>
We need a
We also need an
All of this should require very minimal implementation, primarily just wiring up the existing components already implemented in |
The existing component (from jaeger v1) are in blue: flowchart LR
Receiver --> AdaptiveSamplingProcessor --> BatchProcessor --> Exporter
Exporter -->|"(1) get storage"| JaegerStorageExension
Exporter -->|"(2) write trace"| TraceStorage
AdaptiveSamplingProcessor -->|"getStorage()"| StorageConfig
OTEL_SDK[OTEL
SDK]
OTEL_SDK -->|"(1) GET /sampling"| HTTP_endpoint
HTTP_endpoint -->|"(2) getStrategy()"| StrategiesProvider
style HTTP_endpoint fill:blue,color:white
subgraph Jaeger Collector
Receiver
BatchProcessor[Batch
Processor]
Exporter
TraceStorage[(Trace
Storage)]
AdaptiveSamplingProcessor[Adaptive
Sampling
Processor]
AdaptiveSamplingProcessorV1[Adaptive
Sampling
Processor_v1]
style AdaptiveSamplingProcessorV1 fill:blue,color:white
AdaptiveSamplingProcessor -->|"[]*model.Span"| AdaptiveSamplingProcessorV1
AdaptiveSamplingProcessorV1 ---|use| SamplingStorage
subgraph JaegerStorageExension[Jaeger Storage Exension]
Storage[[Storage
Config]]
end
subgraph RemoteSamplingExtension[Remote Sampling Extension]
StrategiesProvider -->|"(3b) getStrategy()"| AdaptiveProvider
StrategiesProvider -->|"(3a) getStrategy()"| FileProvider
FileProvider --> FileConfig
AdaptiveProvider --> StorageConfig
HTTP_endpoint[HTTP
endpoint]
StrategiesProvider[Strategies
Provider]
FileProvider[File
Provider]
AdaptiveProvider[Adaptive
Provider]
style StrategiesProvider fill:blue,color:white
style FileProvider fill:blue,color:white
style AdaptiveProvider fill:blue,color:white
subgraph Config
FileConfig[[File Config]]
StorageConfig[[Storage Config]]
end
StorageConfig --- SamplingStorage
SamplingStorage[(Sampling
Storage)]
style SamplingStorage fill:blue,color:white
end
end
|
remotesampling
extension
…acing#5382) This PR started as a dependabot upgrade of OTEL Collector dependencies, but the [tests were failing](https://github.com/jaegertracing/jaeger/actions/runs/8854720626/job/24318352528?pr=5382). The main change here is adding a ping from `e2eInitialize` to the jaeger-v2 binary to make sure that it started before proceeding with tests (ideally OTEL Collector should have a healthcheck endpoint that signals when all components have been started successfully). Also added some better logging. --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Yuri Shkuro <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Yuri Shkuro <[email protected]>
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.15.10 to 0.15.11. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/anchore/sbom-action/releases">anchore/sbom-action's releases</a>.</em></p> <blockquote> <h2>v0.15.11</h2> <h2>Changes in v0.15.11</h2> <ul> <li>chore(deps): update Syft to v1.3.0 (<a href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>) [<a href="https://github.com/anchore-actions-token-generator">anchore-actions-token-generator</a>]</li> <li>chore: remove outdated snapshot workflow (<a href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>) [<a href="https://github.com/spiffcs">spiffcs</a>]</li> <li>fix: don't pass in a separate env. This makes it impossible to pass env vars via the action context to syft. (<a href="https://redirect.github.com/anchore/sbom-action/issues/455">#455</a>) [<a href="https://github.com/iNoahNothing">iNoahNothing</a>]</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/anchore/sbom-action/commit/7ccf588e3cf3cc2611714c2eeae48550fbc17552"><code>7ccf588</code></a> chore(deps): update Syft to v1.3.0 (<a href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>)</li> <li><a href="https://github.com/anchore/sbom-action/commit/7f33cf5b409c25dd63ce12b34f585feaed60b5bf"><code>7f33cf5</code></a> chore: remove outdated snapshot workflow (<a href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>)</li> <li><a href="https://github.com/anchore/sbom-action/commit/04a486a88617c017d26e11a4d30b3cd198f44824"><code>04a486a</code></a> fix: extend existing environment when invoking syft instead of creating a new...</li> <li>See full diff in <a href="https://github.com/anchore/sbom-action/compare/ab5d7b5f48981941c4c5d6bf33aeb98fe3bae38c...7ccf588e3cf3cc2611714c2eeae48550fbc17552">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=anchore/sbom-action&package-manager=github_actions&previous-version=0.15.10&new-version=0.15.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes - Use consistent naming for config files ## How was this change tested? - CI Signed-off-by: Yuri Shkuro <[email protected]>
…gertracing#5399) ## Which problem is this PR solving? - Part of jaegertracing#5334 ## Description of the changes - An API design of Storage V2 spanstore and its proto file. - For the detailed discussion and how we arrived to this design, please take a look at https://docs.google.com/document/d/1IvUcDsdRxMPK-xTUE32w3NSAGTkUcmnDQttN6ijUnWs/edit?usp=sharing ## How was this change tested? - This PR hasn't been tested yet since it only contains interfaces and no actual implementation to be tested. ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [ ] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: James Ryans <[email protected]>
…5401) Bumps google.golang.org/protobuf from 1.33.0 to 1.34.0. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=google.golang.org/protobuf&package-manager=go_modules&previous-version=1.33.0&new-version=1.34.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes - This is the first jaeger release where Jaeger UI changes are included in the release notes and where we have version parity between the two repos. Signed-off-by: Albert Teoh <[email protected]> Co-authored-by: Albert Teoh <[email protected]>
## Which problem is this PR solving? - part of jaegertracing#5345 ## Description of the changes - Added Purge method for ES/OS - optimized integration test for es/os storage ## How was this change tested? - `STORAGE=elasticsearch make storage-integration-test` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: Pushkar Mishra <[email protected]>
## Which problem is this PR solving? - The cache was initialized with incorrect parameter, and when I looked around it wasn't even used anywhere ## Description of the changes - delete indexCache and indexCacheTTL argument ## How was this change tested? ``` $ go test ./plugin/storage/es/spanstore ok github.com/jaegertracing/jaeger/plugin/storage/es/spanstore 0.471s ``` Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, in the right direction
cmd/jaeger/internal/extension/remotesampling/testdata/strategy.json
Outdated
Show resolved
Hide resolved
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
## Which problem is this PR solving? - part of #5389 ## Description of the changes - processor is co-located in strategy_store and aggregator. - In aggregator to run `generateStrategyResponses`, `runCalculationLoop`. - In strategy_store to run `loadProbabilities`, `runUpdateProbabilitiesLoop` ## How was this change tested? - `make test` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: Pushkar Mishra <[email protected]> Signed-off-by: pushkarm029 <[email protected]>
Signed-off-by: pushkarm029 <[email protected]>
Signed-off-by: pushkarm029 <[email protected]>
Signed-off-by: pushkarm029 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you plan to test it?
var componentType = component.MustNewType("remote_sampling") | ||
|
||
const ( | ||
HTTPPort = 12345 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have constants for ports defined in the /ports
package. Don't we already have one for sampling endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, currently we dont have any sampling endpoint in /ports
.
But in V1 Collector
: cmd/collector/app/flags/flags.go
:L193
: we are using ports.CollectorHTTP(14268)
as default
. I think we should use the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14268 is reasonable, since its other function in v1 is receiving spans in Jaeger format and we don't really want to support that anymore in v2, only OTLP. But did you check which protocol is used by OTEL SDKs? I remember that Java SDK was using gRPC for sampling, not the HTTP. If nothing uses HTTP then we'd only want to stand up a gRPC endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you do check OTEL SDKs please post links to the code.
|
||
ext.telemetry.Logger.Info("Starting HTTP server", zap.String("endpoint", ext.Cfg.HTTP.ServerConfig.Endpoint)) | ||
var hln net.Listener | ||
if hln, err = ext.Cfg.HTTP.ServerConfig.ToListener(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does OTEL collector ever allow multiple handlers to be attached to a single server? In Jaeger-v1 we often reused many ports for different services, e.g. for both submitting traces into backend and for retrieving sampling strategies.
Signed-off-by: pushkarm029 <[email protected]>
Signed-off-by: pushkarm029 <[email protected]>
# file: | ||
# path: ./cmd/jaeger/sampling-strategies.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should default to file, not to more complex adaptive?
## Which problem is this PR solving? - #5389 (comment) ## Description of the changes - Refactored `handleRootSpan` logic into a helper method in aggregator.go.
Adaptive AdaptiveConfig `mapstructure:"adaptive"` | ||
HTTP HTTPConfig `mapstructure:"http"` | ||
GRPC GRPCConfig `mapstructure:"grpc"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi I opened a ticket https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33232
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we follow the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do the same. We should define all 4 fields in the config as pointers, so that both File/Adaptive can be optional, and HTTP/GRPC can also be disabled independently.
We need to have a unit test for this behavior in the config, i.e. feed it different YAML strings and check that the corresponding sections are there or not there. I think we also have other configs that could use this pattern, I will take a look.
Lastly, the Validate function below should change to reflect this pointer change, the logic there can be different, e.g. instead of if cfg.File.Path != "" && cfg.Adaptive.StrategyStore != ""
we should check if the whole struct is defined or not.
I also think the sub-structs should have their own Validate method that can be called from the main Validate if the struct is not nil. For example, if someone defines file:
without any arguments we should raise an error that the config file name is missing (I think we can use struct tags to add these requirements).
} | ||
|
||
func (tp *traceProcessor) start(_ context.Context, host component.Host) error { | ||
extCfg, err := remotesampling.GetExtensionConfig(host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me. Can you create a README with a diagram showing which components are running inside processor and extension? E.g. is aggregator needed for the extension? If not, why do you need to call extension to create the aggregator? I think extension only need to instantiate the storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for bit delay
Can you create a README with a diagram showing which components are running inside processor and extension?
yes
is aggregator needed for the extension?
no
why do you need to call extension to create the aggregator?
i am calling GetExtensionConfig
, to get these configs (to avoid config duplication) : InitialSamplingProbability
, AggregationBuckets
, LeaderLeaseRefreshInterval
, FollowerLeaseRefreshInterval
.
I think extension only need to instantiate the storage.
Yes, it instantiates a sampling store, loads probabilities from it, and then serves responses through HTTP and gRPC.
I think the confusion may arise due to this 'GetExtensionConfig' naming. Its only purpose is to share InitialSamplingProbability
, AggregationBuckets
, LeaderLeaseRefreshInterval
, and FollowerLeaseRefreshInterval'
with the processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its only purpose is to share InitialSamplingProbability, AggregationBuckets, LeaderLeaseRefreshInterval, and FollowerLeaseRefreshInterval' with the processor.
Does the processor actually need these values directly or does it need an object that is already built with these settings like the leader/follower manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [Needed Directly] InitialSamplingProbability: used in postAggregator.calculateProbability()
- AggregationBuckets: used in sampling store initialization
- LeaderLeaseRefreshInterval: used in leader election
- FollowerLeaseRefreshInterval: used in leader election
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #5389 (comment)
That comment takes care of 3 out of 4 flags, leaving only InitialSamplingProbability
. That one I do not see being used by the query-side of adaptive sampling , only by the processor, so we can move the parameter there
plugin/sampling//strategystore/adaptive/processor.go:398: oldProbability := p.InitialSamplingProbability
plugin/sampling//strategystore/adaptive/processor.go:450: if FloatEquals(probability, p.InitialSamplingProbability) {
plugin/sampling//strategystore/adaptive/processor.go:466: return e.UsingAdaptive && !FloatEquals(e.Probability, p.InitialSamplingProbability)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix the DCO and keep it green
func GetSamplingStorage( | ||
samplingStorage string, | ||
host component.Host, | ||
opts adaptive.Options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be passing any arguments here aside from Host.
The remotesampling
extension needs storage and leader/follower manager to function. It should instantiate those during its Start method. It will use storage name and options to do so. The processor does not need to know any of that, all it needs is to get those two components from the extension, the storage and leader/follower.
The reason it needs Host is because this is a static method, so we can use the Host to locate remotesampling extension and ask it to provide these two components, similar to how jaegerquery extension asks jaegerstorage for the storage (except in this case we don't even need the storage name):
func (s *server) Start(ctx context.Context, host component.Host) error {
f, err := jaegerstorage.GetStorageFactory(s.config.TraceStoragePrimary, host)
## Which problem is this PR solving? - jaegertracing#5389 (comment) ## Description of the changes - Refactored `handleRootSpan` logic into a helper method in aggregator.go. Signed-off-by: Vamshi Maskuri <[email protected]>
Description of the changes
How was this change tested?
go run -tags=ui ./cmd/jaeger --config ./cmd/jaeger/config-badger.yaml
make test
Checklist
jaeger
:make lint test
jaeger-ui
:yarn lint
andyarn test