Rotating a node pool can result in broken graphs #35

Open
felder opened this issue Mar 29, 2022 · 2 comments
Labels
bug Something isn't working

Comments

felder commented Mar 29, 2022

Bug description

After rotating a node pool, the "Running Users", "Memory Commitment %", and "CPU Commitment %" graphs fail to render because the PromQL queries backing them produce illegal many-to-one matches.

For example, with "Running Users":

sum(kube_pod_status_phase{phase="Running"} * on (namespace, pod) group_right() kube_pod_labels{label_app="jupyterhub", label_component="singleuser-server"}) by (namespace)

For that to work, kube_pod_status_phase must return exactly one series for each (namespace, pod) pair. After a node pool rotation, however, it can return multiple series with different "kubernetes_node" labels for the same namespace and pod.
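A quick way to confirm the duplication is to count series per (namespace, pod) pair. This is only a diagnostic sketch built from the metric and selector in the query above; it has not been run against this cluster:

```promql
# Lists every (namespace, pod) pair for which more than one
# kube_pod_status_phase{phase="Running"} series exists.
# An empty result means the "one" side of the join is unique.
count by (namespace, pod) (kube_pod_status_phase{phase="Running"}) > 1
```

Any pair this returns would break the `on (namespace, pod) group_right()` match.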

Expected behaviour

Graphs render.

Actual behaviour

Graphs show an error.

How to reproduce

Probably not easy to reproduce, but a user may encounter it after rotating node pools.

felder added the bug label Mar 29, 2022
welcome bot commented Mar 29, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

felder commented Mar 29, 2022

The fix is to use the group() aggregator in PromQL to ensure uniqueness.

For "Running Users", this modification to the query does the trick:

sum(group(kube_pod_status_phase{phase="Running"}) by (pod,namespace) * on (namespace, pod) group_right() kube_pod_labels{label_app="jupyterhub", label_component="singleuser-server"}) by (namespace)
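One caveat that may be worth checking: group() sets every resulting series to 1, whatever the input value was. If kube_pod_status_phase{phase="Running"} reports 0 for pods that are not running (an assumption about how the metric is exported here), those pods would be counted too. A possible variant, offered as an untested sketch, deduplicates with max instead so the original 0/1 value is kept:

```promql
# Same join as above, but max by (namespace, pod) collapses duplicate
# series while preserving the 0/1 phase value instead of forcing it to 1.
sum(
  max by (namespace, pod) (kube_pod_status_phase{phase="Running"})
  * on (namespace, pod) group_right()
  kube_pod_labels{label_app="jupyterhub", label_component="singleuser-server"}
) by (namespace)
```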
