[COST-4745] OCPGCP Network data processing SQL #5058

Open: wants to merge 13 commits into `main`
cgoodfred (Contributor) commented Apr 22, 2024

Jira Ticket

COST-4745

Description

This change adds OCP on GCP network processing. It does the following:

- Identifies network records in the GCP bill that are associated with a specific Compute Engine instance that can be tied to an OCP node.
- When aggregating the gcp_openshift_daily records, separates the usage and cost of these records into distinct rows per day: one for inbound traffic and one for outbound traffic.
- Filters out the network records when grouping by namespace, because these values cannot be attributed to a specific namespace/project (hence the `Network unattributed` project).
- Performs a new insert into the project daily summary table for the network records, grouped by OCP node.
- Back-populates these records into the OCPUsage table, adding a data transfer direction to the GROUP BY with three possible values: `IN`, `OUT`, and `NULL`.
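The split between network rows and per-namespace rows can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not this PR's actual Trino SQL: the `gcp_openshift_daily` schema below is heavily simplified and hypothetical, the SKU descriptions are made up, and direction is decided by matching "data transfer in" in the description.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for gcp_openshift_daily: each row is
# one GCP line item already matched to an OpenShift node via its resource id.
SCHEMA = """
CREATE TABLE gcp_openshift_daily (
    usage_start TEXT,
    node TEXT,
    sku_description TEXT,
    usage_amount REAL,
    cost REAL
)
"""

# Network rows: one row per day, node, and direction, attributed to the
# "Network unattributed" project instead of being spread across namespaces.
# lower() mirrors the Trino queries (Trino LIKE is case-sensitive).
NETWORK_SQL = """
SELECT usage_start, node, direction,
       'Network unattributed' AS namespace,
       sum(usage_amount) AS usage_amount,
       sum(cost) AS cost
FROM (
    SELECT usage_start, node, usage_amount, cost,
           CASE
               WHEN lower(sku_description) LIKE '%data transfer in'
                 OR lower(sku_description) LIKE '%data transfer in %' THEN 'IN'
               ELSE 'OUT'
           END AS direction
    FROM gcp_openshift_daily
    WHERE lower(sku_description) LIKE '%data transfer%'
)
GROUP BY usage_start, node, direction
ORDER BY usage_start, node, direction
"""

# Non-network rows keep flowing into the per-namespace (project) distribution.
NON_NETWORK_SQL = """
SELECT usage_start, node, sum(cost) AS cost
FROM gcp_openshift_daily
WHERE lower(sku_description) NOT LIKE '%data transfer%'
GROUP BY usage_start, node
ORDER BY usage_start, node
"""


def split_network_rows(rows):
    """Return (network_rows, non_network_rows) for the given line items."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO gcp_openshift_daily VALUES (?, ?, ?, ?, ?)", rows
        )
        return (
            conn.execute(NETWORK_SQL).fetchall(),
            conn.execute(NON_NETWORK_SQL).fetchall(),
        )
    finally:
        conn.close()


# Illustrative line items for one day; descriptions are hypothetical and the
# amounts follow the daily figures from the Testing section.
rows = [
    ("2024-05-01", "compute_1", "Network Data Transfer In", 1200.0, 2400.0),
    ("2024-05-01", "compute_1", "Network Data Transfer Out", 240.0, 7200.0),
    ("2024-05-01", "compute_1", "Instance Core running", 24.0, 48.0),
]
network, non_network = split_network_rows(rows)
print(network)
print(non_network)
```

Grouping the network rows by node rather than namespace is what keeps them out of the per-project distribution while still tying the cost to a specific node.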

NOTE: When GCP renamed Ingress to "Data Transfer In", Egress was renamed to "Data Transfer", which sometimes carries an "Out" qualifier and sometimes does not. Based on my understanding of this GCP article, Ingress was simply renamed to Data Transfer In, and any other data transfer is egress (outbound).
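The direction rule in the note above can be expressed as a small classifier. This is a sketch under assumptions, not this PR's actual logic: it assumes a SKU is network traffic whenever its description contains "data transfer", treats only "data transfer in" as inbound, and returns `None` for everything else (matching the `NULL` direction). The example descriptions are hypothetical.

```python
import re
from typing import Optional

# Assumed matching rules (a sketch, not the PR's SQL):
#   * any SKU whose description mentions "data transfer" is network traffic;
#   * "data transfer in" (the renamed Ingress SKUs) is inbound;
#   * every other data transfer SKU, with or without "out", is outbound.
_NETWORK = re.compile(r"\bdata transfer\b", re.IGNORECASE)
_INBOUND = re.compile(r"\bdata transfer in\b", re.IGNORECASE)


def data_transfer_direction(sku_description: str) -> Optional[str]:
    """Return "IN", "OUT", or None when the SKU is not network traffic."""
    if not _NETWORK.search(sku_description):
        return None
    if _INBOUND.search(sku_description):
        return "IN"
    return "OUT"


for description in (
    "Network Data Transfer In",          # hypothetical SKU descriptions
    "Network Data Transfer Out",
    "Network Inter Zone Data Transfer",  # no "out", still outbound
    "Instance Core running",             # not network traffic
):
    print(f"{description!r}: {data_transfer_direction(description)}")
```

The word boundary after "in" keeps a description such as "Data Transfer Internet" from being misread as inbound.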

Nise has been updated, and the test customer YAML files now include network in and out records.

Testing

1. Using nise > 4.5.3, create GCP compute data that has networking SKUs defined for the same resource ID as an OpenShift node, for example:

```yaml
---
generators:
  - ComputeEngineGenerator:
      start_date: {{start_date}}
      end_date: {{end_date}}
      price: 2
      sku_id: CF4E-A0C7-E3BF
      usage.amount_in_pricing_units: 1
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}]
  - ComputeEngineGenerator:
      start_date: {{start_date}}
      end_date: {{end_date}}
      price: 2
      sku_id: BBF8-C07D-1DF4
      usage.amount_in_pricing_units: 50
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}]
  - ComputeEngineGenerator:
      start_date: 2024-05-01
      end_date: 2024-05-31
      price: 30
      sku_id: 9DE9-9092-B3BC
      usage.amount_in_pricing_units: 10
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}]
```
2. Create a source and load the OCP data.
3. Create a source and load the GCP data you just created.
4. Let summary run, then check the OCP and OCP on GCP database records and verify that the network records are visible and distinct, with infrastructure_data_in_gigabytes or infrastructure_data_out_gigabytes filled in for each day and for each Network unattributed project.
5. Run a few SQL queries to verify that the costs before and after the OCPGCP summary line up:

```shell
docker exec -it trino trino --server localhost:8080 --catalog hive --schema org1234567 --user admin --debug
```
```text
trino:org1234567> SELECT sum(cost) as cost FROM gcp_openshift_daily WHERE month='05';
   cost   
----------
 306528.0 
(1 row)

trino:org1234567> select sum(unblended_cost) from reporting_ocpgcpcostlineitem_project_daily_summary WHERE month = '5';
  _col0   
----------
 306528.0 
(1 row)

trino:org1234567> SELECT sum(cost) as cost FROM gcp_openshift_daily WHERE lower(sku_description) LIKE '%data transfer%' AND month='05';
   cost   
----------
 297600.0 
(1 row)

trino:org1234567> SELECT sum(unblended_cost) as cost FROM reporting_ocpgcpcostlineitem_project_daily_summary WHERE data_transfer_direction IS NOT NULL AND month='05';
   cost   
----------
 297600.0 
(1 row)

trino:org1234567> select sum(unblended_cost) as cost, data_transfer_direction from reporting_ocpgcpcostlineitem_project_daily_summary WHERE data_transfer_direction IS NOT NULL AND month='05' GROUP BY data_transfer_direction;
   cost   | data_transfer_direction 
----------+-------------------------
 223200.0 | OUT                     
  74400.0 | IN                      
(2 rows)

trino:org1234567> select SUM(unblended_cost) from reporting_ocpgcpcostlineitem_project_daily_summary where data_transfer_direction IS NOT NULL AND month='05';
  _col0   
----------
 297600.0 
(1 row)

trino:org1234567> select usage_start, unblended_cost, infrastructure_data_in_gigabytes, infrastructure_data_out_gigabytes, usage_amount from postgres.org1234567.reporting_ocpgcpcostlineitem_project_daily_summary_p_2024_05 WHERE namespace = 'Network unattributed' ORDER BY usage_start;
 usage_start |    unblended_cost    | infrastructure_data_in_gigabytes | infrastructure_data_out_gigabytes |     usage_amount     
-------------+----------------------+----------------------------------+-----------------------------------+----------------------
 2024-05-01  | 7200.000000000000000 |                0.000000000000000 |               257.697599999999970 |  240.000000000000000 
 2024-05-01  | 2400.000000000000000 |             1288.487999999999800 |                 0.000000000000000 | 1200.000000000000000 
 2024-05-02  | 7200.000000000000000 |                0.000000000000000 |               257.697599999999970 |  240.000000000000000 
 2024-05-02  | 2400.000000000000000 |             1288.487999999999800 |                 0.000000000000000 | 1200.000000000000000 
```
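The monthly totals in the query output can be cross-checked against the per-day `Network unattributed` rows. This is a minimal sketch that only re-does the arithmetic on the figures printed above; it assumes the test data has a single node, so each direction contributes exactly one row per day across the 31 days of May 2024.

```python
from decimal import Decimal

# Figures copied from the query output above. Assumption: one node in the
# test data, so each direction contributes one row per day for 31 days.
DAYS = 31
DAILY_COST = {"IN": Decimal("2400"), "OUT": Decimal("7200")}
MONTHLY_BY_DIRECTION = {"IN": Decimal("74400.0"), "OUT": Decimal("223200.0")}
MONTHLY_NETWORK_TOTAL = Decimal("297600.0")
MONTHLY_GCP_OPENSHIFT_TOTAL = Decimal("306528.0")


def check_month() -> Decimal:
    """Assert the network figures reconcile; return the non-network remainder."""
    for direction, daily in DAILY_COST.items():
        # Monthly cost per direction should be the daily row cost times 31.
        assert daily * DAYS == MONTHLY_BY_DIRECTION[direction], direction
    # IN + OUT must equal the "data_transfer_direction IS NOT NULL" total,
    # which in turn matches the "data transfer" SKUs in gcp_openshift_daily.
    assert sum(MONTHLY_BY_DIRECTION.values()) == MONTHLY_NETWORK_TOTAL
    return MONTHLY_GCP_OPENSHIFT_TOTAL - MONTHLY_NETWORK_TOTAL


print(check_month())  # 8928.0: the non-network cost for the month
```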

Inbound math:
- Cost: 2400 = 50 (usage) * 2 (rate) * 24 hours
- Quantity: 1288.488 = 50 (usage) * 24 hours * 1.07374 (gibibyte to gigabyte conversion)

Outbound math:
- Cost: 7200 = 10 (usage) * 30 (rate) * 24 hours
- Quantity: 257.6976 = 10 (usage) * 24 hours * 1.07374 (gibibyte to gigabyte conversion)
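The per-day figures can be reproduced from the generator settings in the YAML: the inbound SKU `BBF8-C07D-1DF4` has `usage.amount_in_pricing_units: 50` and `price: 2`, and the outbound SKU `9DE9-9092-B3BC` has `usage.amount_in_pricing_units: 10` and `price: 30`. This sketch assumes nise emits a constant hourly usage at a fixed price and uses the rounded 1.07374 factor that the database values reflect; the exact GiB-to-GB factor is 2^30 / 10^9 = 1.073741824.

```python
# Rounded GiB -> GB factor that matches the stored values; the exact value
# is 2**30 / 10**9 = 1.073741824.
GIB_TO_GB = 1.07374
HOURS_PER_DAY = 24


def daily_network_row(usage_per_hour, rate):
    """Return (usage_amount, cost, gigabytes) for one day of a network SKU.

    Assumes the generator emits a constant hourly usage at a fixed price.
    """
    usage_amount = usage_per_hour * HOURS_PER_DAY
    cost = usage_amount * rate
    gigabytes = usage_amount * GIB_TO_GB
    return usage_amount, cost, gigabytes


inbound = daily_network_row(usage_per_hour=50, rate=2)    # BBF8-C07D-1DF4
outbound = daily_network_row(usage_per_hour=10, rate=30)  # 9DE9-9092-B3BC
print(inbound)   # usage 1200, cost 2400, about 1288.488 GB
print(outbound)  # usage 240, cost 7200, about 257.6976 GB
```

The `usage_amount` column in the query output (1200 and 240) is the hourly usage times 24, which is why the gigabyte columns are `usage_amount * 1.07374`.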

Release Notes

Proposed release note:

* [COST-4745](https://issues.redhat.com/browse/COST-4745) This PR **changes the numbers shown for OpenShift, and for GCP filtered by OpenShift, when grouped by project**, as long as the OpenShift costs come from a GCP cloud source.
* Previously, the networking cost of a node was distributed among the projects on that node; now those networking costs are moved into a separate, new project called `Network unattributed`.
* Example with numbers:
  - I have a node called `compute_1`, and this node has 2 projects, `projectA` and `projectB`, that each use 50% of the cluster, leaving no unallocated cost.
  - When I look at the costs for this node grouped by project today, `projectA` costs $15 and `projectB` costs $5, for a total of $20.
  - Of that $20, I know that $5 is networking cost.
  - After this change, there will be 3 projects with costs for this node: `projectA`, `projectB`, and `Network unattributed`.
  - The cost for `projectA` would now be $12.50, `projectB` would now be $2.50, and `Network unattributed` would be $5.
  - The new `Network unattributed` project holds the networking costs that can be tied specifically to this node but cannot be broken down to the project level.
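The net effect of the release-note example can be sketched as a before/after calculation. The pipeline does not literally subtract anything; it simply stops distributing network rows to projects. This sketch assumes, as the example implies, that the $5 of networking cost was previously split 50/50 between the two projects; `split_out_network_cost` is a hypothetical helper for illustration only.

```python
def split_out_network_cost(project_costs, network_shares, network_cost):
    """Remove each project's former share of network cost and report the
    full network cost under a separate "Network unattributed" project.

    network_shares maps project -> fraction of the network cost it used to
    receive (assumed 50/50 here, following the example).
    """
    after = {
        project: cost - network_cost * network_shares.get(project, 0.0)
        for project, cost in project_costs.items()
    }
    after["Network unattributed"] = network_cost
    return after


before = {"projectA": 15.0, "projectB": 5.0}
after = split_out_network_cost(before, {"projectA": 0.5, "projectB": 0.5}, 5.0)
print(after)  # {'projectA': 12.5, 'projectB': 2.5, 'Network unattributed': 5.0}
assert sum(after.values()) == sum(before.values())  # node total stays $20
```

The node total is unchanged; only the attribution moves from the projects to `Network unattributed`.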


codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.1%. Comparing base (1c4bf68) to head (c94ea8c).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##            main   #5058     +/-   ##
=======================================
- Coverage   94.1%   94.1%   -0.0%     
=======================================
  Files        375     375             
  Lines      31191   31191             
  Branches    3731    3731             
=======================================
- Hits       29346   29343      -3     
- Misses      1174    1177      +3     
  Partials     671     671             
```

cgoodfred added the `gcp-smoke-tests` label (pr_check will build the image and run GCP + OCP on GCP smoke tests) Jun 3, 2024
cgoodfred self-assigned this Jun 3, 2024
cgoodfred marked this pull request as ready for review June 3, 2024 15:56
cgoodfred requested review from a team as code owners June 3, 2024 15:56
Labels: `gcp-smoke-tests`, `smokes-required`
4 participants