Implement "best case" and "worst case" for `emission_profile()` on product-level #242

maurolepore · 2023-09-13T22:07:26Z

Currently the tiltIndicators are calculated using only an equal-weight approach which assumes that the company produces products with low, medium, and high risk equally.

Anne shared code (here) what implements it. And I reflected on it here.

AnneSchoenauer · 2023-09-28T15:53:31Z

Update

@maurolepore please note that this issue refers to all indicators. So to the emission profile, emission upstream profile, sector profile, sector upstream profile. You can find how I would implement it in this output table here always on the company level data for the emission profile and sector profile. For the upstream profiles I still need to clarify the methodological questions.

Please keep this still in the backlog and we will also have further disucssions once we decide to focus on this again.

AnneSchoenauer · 2024-01-16T10:37:25Z

For emission_share_wc find this reprex:

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#> 
#>     filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

# Sample data
data <- tribble(
  ~companies_id, ~company_name, ~country, ~emission_share_ew, ~emission_category, ~benchmark,
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0,   "high", "all",
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 1,   "medium", "all",
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0,   "low", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0.5, "high", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0,   "medium", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0.5, "low", "all"
)

# Automatically calculate emission_share_wc
add_worst_case_emissions <- function(data) {
  risk_order <- c("low", "medium", "high")
  
  grouped_data <- data %>%
    group_by(benchmark, companies_id, company_name) %>%
    mutate(
      max_risk_category = risk_order[max(match(emission_category[emission_share_ew > 0], risk_order))],
      emission_share_wc = ifelse(emission_category == max_risk_category & emission_share_ew > 0,
                                  1, 0)
    ) %>%
    ungroup()
  
  return(grouped_data)
}


result <- add_worst_case_emissions(data)
print(result)
#> # A tibble: 6 × 8
#>   companies_id          company_name country emission_share_ew emission_category
#>   <chr>                 <chr>        <chr>               <dbl> <chr>            
#> 1 040-tegelzetter_0000… 040 tegelze… nether…               0   high             
#> 2 040-tegelzetter_0000… 040 tegelze… nether…               1   medium           
#> 3 040-tegelzetter_0000… 040 tegelze… nether…               0   low              
#> 4 a-beverwijk-en-zonen… a. beverwij… nether…               0.5 high             
#> 5 a-beverwijk-en-zonen… a. beverwij… nether…               0   medium           
#> 6 a-beverwijk-en-zonen… a. beverwij… nether…               0.5 low              
#> # ℹ 3 more variables: benchmark <chr>, max_risk_category <chr>,
#> #   emission_share_wc <dbl>

^{Created on 2024-01-16 with reprex v2.0.2}

For emission_share_bc find this reprex:

#for the best case

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#> 
#>     filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

# Sample data
data <- tribble(
  ~companies_id, ~company_name, ~country, ~emission_share_ew, ~emission_category, ~benchmark,
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0,   "high", "all",
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 1,   "medium", "all",
  "040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0,   "low", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0.5, "high", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0,   "medium", "all",
  "a-beverwijk-en-zonen-bv_nld009707-00101",  "a. beverwijk en zonen b.v.", "netherlands", 0.5, "low", "all"
)

# Automatically calculate emission_share_wc
add_worst_case_emissions <- function(data) {
  risk_order <- c("high", "medium", "low")
  
  grouped_data <- data %>%
    group_by(benchmark, companies_id, company_name) %>%
    mutate(
      max_risk_category = risk_order[max(match(emission_category[emission_share_ew > 0], risk_order))],
      emission_share_wc = ifelse(emission_category == max_risk_category & emission_share_ew > 0,
                                  1, 0)
    ) %>%
    ungroup()
  
  return(grouped_data)
}


result <- add_worst_case_emissions(data)
print(result)
#> # A tibble: 6 × 8
#>   companies_id          company_name country emission_share_ew emission_category
#>   <chr>                 <chr>        <chr>               <dbl> <chr>            
#> 1 040-tegelzetter_0000… 040 tegelze… nether…               0   high             
#> 2 040-tegelzetter_0000… 040 tegelze… nether…               1   medium           
#> 3 040-tegelzetter_0000… 040 tegelze… nether…               0   low              
#> 4 a-beverwijk-en-zonen… a. beverwij… nether…               0.5 high             
#> 5 a-beverwijk-en-zonen… a. beverwij… nether…               0   medium           
#> 6 a-beverwijk-en-zonen… a. beverwij… nether…               0.5 low              
#> # ℹ 3 more variables: benchmark <chr>, max_risk_category <chr>,
#> #   emission_share_wc <dbl>

^{Created on 2024-01-16 with reprex v2.0.2}

Is this making it a bit clearer?

kalashsinghal · 2024-02-12T11:43:05Z

@AnneSchoenauer I have some issues in understanding the above reprexes.

You only added the function add_worst_case_emissions in both reprexes above. Please review above reprexes and add the add_best_case_emissions function in the reprex so that I can understand the methodology better.
We have emissions_share column in the company level output and not the emissions_share_ew. It seems like emissions_share_ew is the output column after applying the code. If yes, then please update the column names.
Please note that the input to the function add_worst_case_emissions is same as the output. Hence, its difficult to understand what the code is actually changing. Please use different input so that its easy to understand the code.

Thanks for spending time on creating the reprex! :)

AnneSchoenauer · 2024-02-13T10:53:41Z

Hi @kalashsinghal, yes you are right - this was a very very bad reprex. Please have a look if this works for you below. Please note that I didn't write any code but showed you in a sample data set how the result should look like.

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#> 
#>     filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

# Create the initial dataset
companies_co2 <- tibble(
  companies_id = rep("a", 13),
  grouped_by = c("all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector", "NA", 
                 "all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector"),
  risk_category = c("high", NA, "high", "high", NA, "high", NA, "medium", "medium", "medium", "medium", "medium", "medium"),
  profile_ranking = c(1, NA, 1, 1, NA, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
  clustered = c(rep("a", 6), "b", rep("c", 6)),
  activity_uuid_product_uuid = c(rep("a", 6), NA, rep("c", 6)),
  co2_footprint = c(rep(1, 6), NA, rep(0.5, 6))
)

# Function to add amount of distinct products
add_amount_of_distinct_products <- function(data) {
  products <- data %>% 
    group_by(companies_id) %>% 
    summarise(amount_of_distinct_products = n_distinct(clustered))
  
  data <- data %>% 
    left_join(products, by = "companies_id")
  
  return(data)
}

companies_co2 <- add_amount_of_distinct_products(companies_co2)

# You can directly use 'companies_co2' to further process or create new datasets as needed.
# Make sure to adjust the final dataset creation part as per your specific requirements.

^{Created on 2024-02-13 with reprex v2.0.2}

Linda used with financial data similiar functions. Here are the functions that she used:

add_worst_case_finance <- function(data) {
  risk_order <- c("low", "medium", "high")
  
  grouped_data <- data %>%
    group_by(company_name) %>%
    mutate(
      max_risk_category = risk_order[max(match(pctr_risk_category, risk_order))],
      product_count = sum(pctr_risk_category == max_risk_category),
      worst_case_finance = ifelse(pctr_risk_category == max_risk_category,
                                  amount_total / product_count, 0)
    ) %>%
    ungroup() %>%
    select(-product_count, -max_risk_category)
  
  return(grouped_data)
}

add_best_case_finance <- function(data) {
  risk_order <- c("high", "medium", "low")
  
  grouped_data <- data |> 
    group_by(company_name) |> 
    mutate(
      max_risk_category = risk_order[max(match(pctr_risk_category, risk_order))],
      product_count = sum(pctr_risk_category == max_risk_category),
      best_case_finance = ifelse(pctr_risk_category == max_risk_category,
                                  amount_total / product_count, 0)
    ) |> 
    ungroup() |> 
    select(-product_count, -max_risk_category) 
  
  return(grouped_data)
}

You can find her work here in this zip folder:
weighing_loanbook (2).zip

However, please note that she used the following toy dataset here which also has financial data. The logic however is the same. In the worst case scenario you only count the worst case products (so with the highest emission profile) and in the best case you only count the products with the lowest emission profile.

Don't know how best to implement this in code. But let me know if there are questions.
Thanks!

Tilmon · 2024-02-20T15:23:37Z

Hi @kalashsinghal , just FYI, @lindadelacombaz already worked with best case and worst case toy data a lot in tiltPlot and can support you with code or general knowledge if needed!

kalashsinghal · 2024-02-26T09:44:39Z

Hi @AnneSchoenauer, What does the variable amount_total signifies in the above reprex?

AnneSchoenauer · 2024-02-26T12:39:59Z

This is only important when it comes to credit data, so with financial data. So without credit data you can leave it like this.

kalashsinghal · 2024-02-26T12:43:50Z

This is only important when it comes to credit data, so with financial data. So without credit data you can leave it like this.

@AnneSchoenauer So then what should be the replacement of this expression amount_total / product_count if I don't have to use the amount_total?

AnneSchoenauer · 2024-02-26T12:54:13Z

Amount total is the total amount of the products of the company, and product count the amount of the products in a certain risk category. So for example if the company has 4 products in total and in the equal weight scenario 2 are in the high and 2 are in the low risk category, you would have 2/4 for low, and 2/4 for high. If it is the best case it is 4/4 for low and 4/4 for high.

kalashsinghal · 2024-02-27T14:26:01Z

@maurolepore Output of this task is not required for Bundesbank data, however it is required to create toy dataset for financials . I have to delegate this work to you too. Thanks!

Tilmon · 2024-04-09T15:01:59Z

@maurolepore see the file "weighing_loanbook (2).zip" in this comment #242

AnneSchoenauer · 2024-04-15T16:20:35Z

UPDATE:

I read through the chain I noticed that over the months this issue and the understanding of the complexity changed. Therfore, here is now the final ticket that we need to look for.

Starting point: emission_profile product_level. Tickets will follow for sector profile and transition risk as well as how to calculate the company level. There will also be edge cases with NAs which I would like to discuss if you say that you understand this ticket here.

Task Breakdown:

Calculate Equal Weights:

For each product in a company, calculate the equal_weight by dividing one by the number of distinct products per company. This represents the relative weight of each product.

Determine Emission Risk Extremes:
Identify the lowest and highest emission profiles (e.g., "low", "medium", "high") for each company.
Calculate Best and Worst Case Scenarios:
Best Case: Compute the proportion of each product that matches the lowest emission profile in its company.
Worst Case: Compute the proportion of each product that matches the highest emission profile in its company.
Implement Additional Columns:
amount_of_distinct_products: count all ep_products
equal_weight: calculate 1/all ep_products
dummy_best: Binary column (1 or 0) indicating if the product's emission profile matches the company's lowest risk category.
dummy_worst: Binary column (1 or 0) indicating if the product's emission profile matches the company's highest risk category.
count_best_case_products: Count of all products that match the lowest emission profile within the company.
count_worst_case_products: Count of all products that match the highest emission profile within the company.
best_case: Calculate as dummy_best divided by count_best_case_products.
worst_case: Calculate as dummy_worst divided by count_worst_case_products.

Please note that the dummy_best solution and the count solution was my solution on how I would implement the best_case and worst_case column. The dummy columns are not needed as well as the count columns for best case and worst case. However in the new outputs we should see amount_of_distinct_products, equal_weight, best_case and worst_case.

Here is the reprex to the above:

# Load necessary libraries
library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#> 
#>     filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Example data
data <- tibble(
  company_name = c("bruno", "bruno", "bruno", "mauro", "mauro", "mirja", "mirja", "pasant", "pasant", "peter", "peter", "peter peasant", "peter peasant", "tilman", "tilman", "tilman", "tilman", "tilman", "tilman"),
  postcode = c(27568, 27568, 27568, 39221, 39221, 34117, 34117, 80337, 80337, 88131, 88131, 53773, 53773, 12043, 12043, 12043, 12043, 12043, 12043),
  emission_profile = c("high", "high", "medium", "high", "high", "high", "high", "high", "low", "high", "medium", "high", "high", "high", "high", "high", "low", "low", "medium")
)

# Calculate the number of products per company
products <- data %>% 
  group_by(company_name) %>% 
  summarise(amount_of_distinct_products = n()) %>%
  ungroup()

# Join this count back to the main data frame
data <- data %>%
  left_join(products, by = "company_name") %>%
  mutate(equal_weight = 1 / amount_of_distinct_products)

# Function to add case scenarios and calculate best_case and worst_case
add_case_scenarios <- function(data) {
  risk_order <- c("low", "medium", "high")
  
  data %>%
    group_by(company_name) %>%
    mutate(
      max_risk_category = risk_order[max(match(emission_profile, risk_order))],
      min_risk_category = risk_order[min(match(emission_profile, risk_order))],
      count_best_case_products = sum(emission_profile == min_risk_category),
      count_worst_case_products = sum(emission_profile == max_risk_category),
      dummy_best = as.integer(emission_profile == min_risk_category),
      dummy_worst = as.integer(emission_profile == max_risk_category),
      best_case = dummy_best / count_best_case_products,
      worst_case = dummy_worst / count_worst_case_products
    ) %>%
    ungroup()
}

# Apply function
data <- add_case_scenarios(data)

# Print data to check the new columns
print(data)
#> # A tibble: 19 × 13
#>    company_name  postcode emission_profile amount_of_distinct_pro…¹ equal_weight
#>    <chr>            <dbl> <chr>                               <int>        <dbl>
#>  1 bruno            27568 high                                    3        0.333
#>  2 bruno            27568 high                                    3        0.333
#>  3 bruno            27568 medium                                  3        0.333
#>  4 mauro            39221 high                                    2        0.5  
#>  5 mauro            39221 high                                    2        0.5  
#>  6 mirja            34117 high                                    2        0.5  
#>  7 mirja            34117 high                                    2        0.5  
#>  8 pasant           80337 high                                    2        0.5  
#>  9 pasant           80337 low                                     2        0.5  
#> 10 peter            88131 high                                    2        0.5  
#> 11 peter            88131 medium                                  2        0.5  
#> 12 peter peasant    53773 high                                    2        0.5  
#> 13 peter peasant    53773 high                                    2        0.5  
#> 14 tilman           12043 high                                    6        0.167
#> 15 tilman           12043 high                                    6        0.167
#> 16 tilman           12043 high                                    6        0.167
#> 17 tilman           12043 low                                     6        0.167
#> 18 tilman           12043 low                                     6        0.167
#> 19 tilman           12043 medium                                  6        0.167
#> # ℹ abbreviated name: ¹amount_of_distinct_products
#> # ℹ 8 more variables: max_risk_category <chr>, min_risk_category <chr>,
#> #   count_best_case_products <int>, count_worst_case_products <int>,
#> #   dummy_best <int>, dummy_worst <int>, best_case <dbl>, worst_case <dbl>

^{Created on 2024-04-15 with reprex v2.0.2}

Tilmon · 2024-04-23T14:49:22Z

Conclusion from Sprint 2024-04-23
@kalashsinghal to come up with first draft PR and share it with @maurolepore for review.

AnneSchoenauer · 2024-05-07T12:49:26Z

@Tilmon please find here an XLS sheet (sheet 1), where I implemented the best case and worst case for the sector profile and transition risk score (@kalashsinghal please ignore this comment for now please). I included now the NAs for the ew but I excluded the NAs for best_case and worst_case. This could lead to the fact that the best_case are worse than the equal_weight. It would be great if we could discuss this pretty soon so that we are both aligned on this issue and that we can tell the tech team how to handle NAs :) I put it to discuss it on Wednesday!

kalashsinghal · 2024-05-08T06:50:21Z

count_best_case_products: Count of all products that match the lowest emission profile within the company.
count_worst_case_products: Count of all products that match the highest emission profile within the company.

@AnneSchoenauer A single product have six benchmarks (like: "all", "tilt_sector", ..etc.). Each benchmark can have different risk categories. So, on the basis of which benchmark will we judge that the product matches the lowest risk category of the company? For example:

Risk categories of a product with benchmark "all" is low and for benchmark "tilt_sector" is high, and company's lowest risk category is high, then for benchmark "all", the product doesn't match to the company's high risk category, but for benchmark "tilt_sector" it does match. Is this an issue? or am i missing something?

AnneSchoenauer · 2024-05-08T06:52:30Z

Hi @kalashsinghal we will do it for all benchmarks individually so before you do it you would need to group it for each benchmark.

AnneSchoenauer · 2024-05-08T14:32:12Z

@kalashsinghal --> push already a draft work without expectations that it is clean but at least the latest work is already pushed.

maurolepore · 2024-05-08T16:18:32Z

From our conversation during sprint planning I understand this issue moves from tiltIndicator to tiltIndicatorAfter.

Moving it now ...

Tilmon · 2024-06-04T12:14:50Z

@kalashsinghal @AnneSchoenauer adjusted the ticket name to reflect that it's only for product-level required

maurolepore self-assigned this Sep 13, 2023

maurolepore changed the title ~~Add the ability to calculate indicators using "best case", "worst case", and "equal weight" (current default)~~ Implement "best case", "worst case" Sep 13, 2023

maurolepore changed the title ~~Implement "best case", "worst case"~~ Implement "best case" and "worst case" Sep 13, 2023

AnneSchoenauer added the enhancement New feature or request label Jan 4, 2024

Tilmon assigned kalashsinghal and unassigned maurolepore Feb 6, 2024

kalashsinghal assigned maurolepore and unassigned kalashsinghal Feb 27, 2024

Tilmon modified the milestones: Enhancement of tilt methodology, Database launch Apr 16, 2024

Tilmon assigned kalashsinghal and maurolepore and unassigned maurolepore Apr 23, 2024

kalashsinghal mentioned this issue May 8, 2024

Creates "best case" and "worst case" for product level output of emission profile indicator 2DegreesInvesting/tiltIndicator#778

Closed

11 tasks

AnneSchoenauer unassigned maurolepore May 8, 2024

maurolepore transferred this issue from 2DegreesInvesting/tiltIndicator May 8, 2024

kalashsinghal mentioned this issue May 10, 2024

Creates amount_of_distinct_products, equal_weight, best case, and worst case for product level output of emission profile indicator #243

Merged

11 tasks

Tilmon mentioned this issue May 14, 2024

Generate all tilt output data after completion of webtool-relevant tickets 2DegreesInvesting/tiltIndicator#783

Closed

4 tasks

maurolepore mentioned this issue May 21, 2024

Refactor #242 (and merges into it) #246

Merged

This was referenced May 21, 2024

Implement "best case" and "worst case" for sector_profile() on product-level #247

Closed

Implement "best case" and "worst case" for score_transition_risk() on product-level #248

Closed

Tilmon changed the title ~~Implement "best case" and "worst case"~~ Implement "best case" and "worst case" for emission_profile() on product-level Jun 4, 2024

Tilmon closed this as completed Jun 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement "best case" and "worst case" for `emission_profile()` on product-level #242

Implement "best case" and "worst case" for `emission_profile()` on product-level #242

maurolepore commented Sep 13, 2023 •

edited

Loading

AnneSchoenauer commented Sep 28, 2023 •

edited

Loading

AnneSchoenauer commented Jan 16, 2024

kalashsinghal commented Feb 12, 2024

AnneSchoenauer commented Feb 13, 2024

Tilmon commented Feb 20, 2024

kalashsinghal commented Feb 26, 2024

AnneSchoenauer commented Feb 26, 2024

kalashsinghal commented Feb 26, 2024

AnneSchoenauer commented Feb 26, 2024

kalashsinghal commented Feb 27, 2024

Tilmon commented Apr 9, 2024 •

edited

Loading

AnneSchoenauer commented Apr 15, 2024

Tilmon commented Apr 23, 2024

AnneSchoenauer commented May 7, 2024

kalashsinghal commented May 8, 2024

AnneSchoenauer commented May 8, 2024

AnneSchoenauer commented May 8, 2024

maurolepore commented May 8, 2024

Tilmon commented Jun 4, 2024

Implement "best case" and "worst case" for emission_profile() on product-level #242

Implement "best case" and "worst case" for emission_profile() on product-level #242

Comments

maurolepore commented Sep 13, 2023 • edited Loading

AnneSchoenauer commented Sep 28, 2023 • edited Loading

AnneSchoenauer commented Jan 16, 2024

kalashsinghal commented Feb 12, 2024

AnneSchoenauer commented Feb 13, 2024

Tilmon commented Feb 20, 2024

kalashsinghal commented Feb 26, 2024

AnneSchoenauer commented Feb 26, 2024

kalashsinghal commented Feb 26, 2024

AnneSchoenauer commented Feb 26, 2024

kalashsinghal commented Feb 27, 2024

Tilmon commented Apr 9, 2024 • edited Loading

AnneSchoenauer commented Apr 15, 2024

Tilmon commented Apr 23, 2024

AnneSchoenauer commented May 7, 2024

kalashsinghal commented May 8, 2024

AnneSchoenauer commented May 8, 2024

AnneSchoenauer commented May 8, 2024

maurolepore commented May 8, 2024

Tilmon commented Jun 4, 2024

Implement "best case" and "worst case" for `emission_profile()` on product-level #242

Implement "best case" and "worst case" for `emission_profile()` on product-level #242

maurolepore commented Sep 13, 2023 •

edited

Loading

AnneSchoenauer commented Sep 28, 2023 •

edited

Loading

Tilmon commented Apr 9, 2024 •

edited

Loading