Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement a Keptn metrics provider #1630

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

bacherfl
Copy link

Closes #1627

This PR adds the implementation of a keptn metrics provider

The main difference to other providers in Flagger is that this one is using a Kubernetes client to interact with Keptn resources (KeptnMetrics and Analyses). Therefore, in addition to the code in keptn.go, the ClusterRole needed to be adapted to be able to access the required Keptn resources.
The provider supports the use of the following resources:

  • KeptnMetric: This resource represents the current value of a certain metric. The metric is retrieved by Keptn from one of the following data sources: prometheus, thanos, cortex, dynatrace, datadog.
  • Analysis: This Keptn resource can be used to apply a more complex grading logic over a set of different metrics. More information about this concept can be found in the Keptn Docs

Comment on lines 694 to 695
query: |
keptnmetric/my-namespace/response-time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
query: |
keptnmetric/my-namespace/response-time
query: "keptnmetric/my-namespace/response-time"

I would make this a one-liner to suggest you can't have more than one object reference in there. I would also add an example for a real Istio metric like Istio 404 rate https://docs.flagger.app/usage/metrics#prometheus

@@ -18,6 +18,7 @@ package controller

import (
"fmt"
"k8s.io/client-go/rest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort the imports into groups, run make fmt

pkg/metrics/providers/keptn.go Show resolved Hide resolved
Copy link
Member

@aryan9600 aryan9600 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @bacherfl for this! i did a quick first pass, will take a second look soon.


// delete the created analysis at the end of the function
defer func() {
_ = k.client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of ignoring the error, we should log it.

}
return &KeptnProvider{
client: client,
analysisTimeout: 10 * time.Second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has this been chosen randomly? how does this play with the Canary analysis's interval?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the maximum time the provider will wait for the result of a Keptn Analysis to be available - I added this to prevent the provider from blocking the flagger controller indefinitely in case the created Analysis cannot be completed.
I chose this value as this should be sufficient in most cases for the created Keptn Analysis to be reconciled by the Keptn Analysis controller, however I can also adjust this If you'd like, or if there are some guidelines on how long a provider might take to retrieve a value.

@bacherfl
Copy link
Author

thanks @bacherfl for this! i did a quick first pass, will take a second look soon.

Thanks for the review @aryan9600! I just addressed your comments and adapted the code accordingly - If there are some further questions feel free to reach out :)

Copy link
Member

@aryan9600 aryan9600 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a second pass, its looking good, just had some minor nits.

can the docs be improved a bit more? a few suggestions:

  • the query field in MetricTemplate example mentions the timeframe and arguments as well
  • make it clearer that resource name is actually that of the AnalysisDefinition and not the to be created Analysis.
  • an example around Analysis, which shows a sample AnalysisDefinition and what the Analysis object generated by Flagger would look like

}

var analysisResource = schema.GroupVersionResource{
Group: "metrics.keptn.sh",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets define a constant for this as well

}
}
}
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName)
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found in resource %s/%s", queryObj.Namespace, queryObj.ResourceName)

select {
case <-ctx.Done():
return 0, fmt.Errorf("encountered timeout while waiting for Keptn Analysis %s/%s to be finished", obj.Namespace, obj.ResourceName)
case <-time.After(500 * time.Millisecond):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case <-time.After(500 * time.Millisecond):
case <-time.After(time.Second):

as mentioned in the above comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keptn Metrics Provider
3 participants