[Spike] [2h] Explore and understand 'Split Cost Allocation Data' on EKS #4465

Closed · Tracked by #4453

yuvipanda opened this issue Jul 20, 2024 · 10 comments

yuvipanda commented Jul 20, 2024

AWS has recently enabled better kubernetes integration for its cost data exports. We should explore it to see if it will serve our needs - I suspect it may.

I've enabled it for the openscapes cluster, so we will have data to work with shortly.

Things to investigate:

  1. Setting up an AWS Athena DB with this information via cloudformation (https://docs.aws.amazon.com/cur/latest/userguide/use-athena-cf.html)
  2. The Grafana AWS Athena datasource https://grafana.com/grafana/plugins/grafana-athena-datasource/

Things we wanna track

| Thing | Trackable with split cost allocation? | Trackable with AWS tags (multiple hubs sharing nodes)? | Trackable with AWS tags (hubs having dedicated nodes)? |
| --- | --- | --- | --- |
| EC2 memory / CPU cost | yes | no | yes |
| EC2 GPU | ? | NA | yes |
| EC2 base disk | no | no | yes |
| EFS | no | yes | yes |
| Network egress (via browser) | no | no | no |
| Network egress (via user programmatically sending out data) | no | no | yes |
| Persistent and Scratch Bucket S3 use | NA | yes | yes |
| Requestor Pays S3 use | NA | no | no |

Spike outcome

Based on my exploration here, and on what was determined to be the things that would be valuable to admins right now (per #4384), I've made the following choices:

  1. We can use AWS Athena for these queries, so yay.
  2. We cannot use the split cost allocation feature, because it doesn't cover a couple of resources that are important to us (disk, primarily).
  3. For clusters where we want to offer 'per-hub cost tracking', this means each hub must be on its own tagged nodepool.

yuvipanda commented:

I tried to follow https://docs.aws.amazon.com/cur/latest/userguide/use-athena-cf.html but couldn't find the .yml file with the CloudFormation template. Also, I had set the object prefix to raw/, and it turns out the extra / meant there were two /s in the object name.

I set up another export in the meantime.

yuvipanda commented:

I can't do the manual Athena attempt either, because https://docs.aws.amazon.com/cur/latest/userguide/create-manual-table.html asks for a .sql file I don't have in the export. I wonder if the extra / is related. I'll take a look once the new export sets up again and delivers data.

I've spent 15min on this so far.

yuvipanda commented:

Aaaah, looking at https://docs.aws.amazon.com/cur/latest/userguide/dataexports-processing.html, I see:

Currently, Data Exports doesn't provide the SQL file for setting up Athena to query your exports like Cost and Usage Reports (CUR) does.

yuvipanda commented:

I've now enabled this export via the Cost and Usage Reports 'legacy' page.

yuvipanda commented:

I can run SQL queries with Athena now!

SELECT line_item_product_code,
sum(line_item_blended_cost) AS cost, month
FROM athenacurcfn_2i2c_cost_export."2i2c_cost_export"
WHERE year='2024'
GROUP BY  line_item_product_code, month
HAVING sum(line_item_blended_cost) > 0
ORDER BY  line_item_product_code;

which gives as output:

(screenshot of query output: per-product monthly costs)

So that's great.

Next is to examine the split output columns to see if we can use those.

yuvipanda commented:

I can verify that individual pod names actually do make it in here, as part of line_item_resource_id. It looks like arn:aws:eks:us-west-2:783616723547:pod/openscapeshub/prod/jupyter-<username>/591c2cc5-6d89-455f-9c90-c8ddced97357, which is exciting but out of scope for right now.
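
A minimal sketch of how that could be used later, assuming Athena's regexp_extract and the ARN layout above (pod/&lt;cluster&gt;/&lt;namespace&gt;/&lt;pod&gt;/&lt;uid&gt;); untested, so treat it as illustrative only:

-- Sketch: extract namespace and pod name from the EKS pod ARN in
-- line_item_resource_id, then sum split costs per pod.
SELECT regexp_extract(line_item_resource_id, 'pod/[^/]+/([^/]+)/([^/]+)/', 1) AS namespace,
regexp_extract(line_item_resource_id, 'pod/[^/]+/([^/]+)/([^/]+)/', 2) AS pod_name,
sum(split_line_item_split_cost) AS cost
FROM athenacurcfn_2i2c_cost_export."2i2c_cost_export"
WHERE line_item_resource_id LIKE '%:pod/%'
GROUP BY 1, 2
ORDER BY cost DESC
LIMIT 20;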

yuvipanda commented:

Running this query to see what kind of costs get allocated:

SELECT line_item_usage_type,
sum(split_line_item_split_cost) AS "cost",
resource_tags_aws_eks_namespace
FROM "athenacurcfn_2i2c_cost_export"."2i2c_cost_export"
WHERE split_line_item_split_cost > 0.0
GROUP BY line_item_usage_type, resource_tags_aws_eks_namespace
LIMIT 100;

I see:

  • USW2-EKS-EC2-vCPU-Hours
  • USW2-EKS-EC2-GB-Hours

And unfortunately, only that. This means the following costs are unattributed:

  1. Disks (for hub db disks, prometheus, for sure - not sure about instance base disks)
  2. Network egress costs

While we could tag the hub db disks and prometheus disks, I'm not sure we can do the same for instance base disks.

Not being able to tag network requests presents both a smaller and a bigger challenge. Smaller, because almost all our egress goes through the proxy and ingress pods anyway, so per-namespace networking would be somewhat 'off' regardless (everything would get attributed to nginx-ingress). Bigger, because it's possible for this to get really expensive, and we need to be careful to make sure we can track this information.
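
Even if egress can't be attributed per hub, we should at least be able to watch it at the cluster level. A rough sketch (the LIKE pattern for data-transfer usage types is an assumption and would need checking against actual line items):

-- Sketch: cluster-level egress cost per month. The usage-type pattern
-- is an assumption about how data transfer shows up in this export.
SELECT month,
line_item_usage_type,
sum(line_item_blended_cost) AS cost
FROM athenacurcfn_2i2c_cost_export."2i2c_cost_export"
WHERE line_item_usage_type LIKE '%DataTransfer-Out%'
AND year = '2024'
GROUP BY month, line_item_usage_type
ORDER BY month;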

yuvipanda commented:

I've activated the tags kubernetes.io/created-for/pvc/name and kubernetes.io/created-for/pvc/namespace (set by the kubernetes automatic provisioner) to see if we can incorporate those into cost calculations.
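
If those tags flow through, a query along these lines should attribute EBS volume costs per namespace. The resource_tags_* column name below is a guess at how Athena maps the tag key, and the EBS usage-type filter is an assumption; both need verifying once tagged data shows up in the export:

-- Sketch: EBS cost per PVC namespace. Assumes Athena exposes the tag
-- kubernetes.io/created-for/pvc/namespace as the column used below, and
-- that EBS charges have 'EBS' in line_item_usage_type.
SELECT resource_tags_user_kubernetes_io_created_for_pvc_namespace AS namespace,
sum(line_item_blended_cost) AS cost
FROM athenacurcfn_2i2c_cost_export."2i2c_cost_export"
WHERE line_item_product_code = 'AmazonEC2'
AND line_item_usage_type LIKE '%EBS%'
AND resource_tags_user_kubernetes_io_created_for_pvc_namespace != ''
GROUP BY 1
ORDER BY cost DESC;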

I'll explore our networking situation, as well as our 'requestor pays' situation.

yuvipanda commented:

For requestor pays, we can look at line_item_resource_id to see which bucket the operations are against. This would allow us to account for charges against our buckets vs. buckets elsewhere. However, it doesn't mention which entity made the request that caused the charge, so we can't really distinguish this per hub.
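
For the record, a sketch of the per-bucket breakdown that line_item_resource_id makes possible (bucket-level only - it can't tell us which hub or user triggered the charge):

-- Sketch: S3 cost per bucket and operation, using the bucket name in
-- line_item_resource_id.
SELECT line_item_resource_id AS bucket,
line_item_operation,
sum(line_item_blended_cost) AS cost
FROM athenacurcfn_2i2c_cost_export."2i2c_cost_export"
WHERE line_item_product_code = 'AmazonS3'
AND year = '2024'
GROUP BY line_item_resource_id, line_item_operation
ORDER BY cost DESC
LIMIT 50;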

yuvipanda commented:

Based on my exploration here, and on what was determined to be the things that would be valuable to admins right now (per #4384), I've made the following choices:

  1. We can use AWS Athena for these queries, so yay.
  2. We cannot use the split cost allocation feature, because it doesn't cover a couple of resources that are important to us (disk, primarily).
  3. For clusters where we want to offer 'per-hub cost tracking', this means each hub must be on its own tagged nodepool.

I'll proceed to refine more tasks based on this.

Looking at my time tracking, this has taken about 90 minutes spread out over 3 days, which isn't so bad :)
