-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement "best case" and "worst case" for emission_profile()
on product-level
#242
Comments
Update @maurolepore please note that this issue refers to all indicators. So to the Please keep this still in the backlog and we will also have further disucssions once we decide to focus on this again. |
For emission_share_wc find this reprex: library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#>
#> filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# Sample data
data <- tribble(
~companies_id, ~company_name, ~country, ~emission_share_ew, ~emission_category, ~benchmark,
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0, "high", "all",
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 1, "medium", "all",
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0, "low", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0.5, "high", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0, "medium", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0.5, "low", "all"
)
# Automatically calculate emission_share_wc
add_worst_case_emissions <- function(data) {
risk_order <- c("low", "medium", "high")
grouped_data <- data %>%
group_by(benchmark, companies_id, company_name) %>%
mutate(
max_risk_category = risk_order[max(match(emission_category[emission_share_ew > 0], risk_order))],
emission_share_wc = ifelse(emission_category == max_risk_category & emission_share_ew > 0,
1, 0)
) %>%
ungroup()
return(grouped_data)
}
result <- add_worst_case_emissions(data)
print(result)
#> # A tibble: 6 × 8
#> companies_id company_name country emission_share_ew emission_category
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 040-tegelzetter_0000… 040 tegelze… nether… 0 high
#> 2 040-tegelzetter_0000… 040 tegelze… nether… 1 medium
#> 3 040-tegelzetter_0000… 040 tegelze… nether… 0 low
#> 4 a-beverwijk-en-zonen… a. beverwij… nether… 0.5 high
#> 5 a-beverwijk-en-zonen… a. beverwij… nether… 0 medium
#> 6 a-beverwijk-en-zonen… a. beverwij… nether… 0.5 low
#> # ℹ 3 more variables: benchmark <chr>, max_risk_category <chr>,
#> # emission_share_wc <dbl> Created on 2024-01-16 with reprex v2.0.2 For emission_share_bc find this reprex: #for the best case
library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#>
#> filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# Sample data
data <- tribble(
~companies_id, ~company_name, ~country, ~emission_share_ew, ~emission_category, ~benchmark,
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0, "high", "all",
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 1, "medium", "all",
"040-tegelzetter_00000005336021-634273001", "040 tegelzetter", "netherlands", 0, "low", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0.5, "high", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0, "medium", "all",
"a-beverwijk-en-zonen-bv_nld009707-00101", "a. beverwijk en zonen b.v.", "netherlands", 0.5, "low", "all"
)
# Automatically calculate emission_share_wc
add_worst_case_emissions <- function(data) {
risk_order <- c("high", "medium", "low")
grouped_data <- data %>%
group_by(benchmark, companies_id, company_name) %>%
mutate(
max_risk_category = risk_order[max(match(emission_category[emission_share_ew > 0], risk_order))],
emission_share_wc = ifelse(emission_category == max_risk_category & emission_share_ew > 0,
1, 0)
) %>%
ungroup()
return(grouped_data)
}
result <- add_worst_case_emissions(data)
print(result)
#> # A tibble: 6 × 8
#> companies_id company_name country emission_share_ew emission_category
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 040-tegelzetter_0000… 040 tegelze… nether… 0 high
#> 2 040-tegelzetter_0000… 040 tegelze… nether… 1 medium
#> 3 040-tegelzetter_0000… 040 tegelze… nether… 0 low
#> 4 a-beverwijk-en-zonen… a. beverwij… nether… 0.5 high
#> 5 a-beverwijk-en-zonen… a. beverwij… nether… 0 medium
#> 6 a-beverwijk-en-zonen… a. beverwij… nether… 0.5 low
#> # ℹ 3 more variables: benchmark <chr>, max_risk_category <chr>,
#> # emission_share_wc <dbl> Created on 2024-01-16 with reprex v2.0.2 Is this making it a bit clearer? |
@AnneSchoenauer I have some issues in understanding the above reprexes.
Thanks for spending time on creating the reprex! :) |
Hi @kalashsinghal, yes you are right - this was a very very bad reprex. Please have a look if this works for you below. Please note that I didn't write any code but showed you in a sample data set how the result should look like. library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#>
#> filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
# Create the initial dataset
companies_co2 <- tibble(
companies_id = rep("a", 13),
grouped_by = c("all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector", "NA",
"all", "isic_4digit", "tilt_sector", "unit", "unit_isic_4digit", "unit_tilt_sector"),
risk_category = c("high", NA, "high", "high", NA, "high", NA, "medium", "medium", "medium", "medium", "medium", "medium"),
profile_ranking = c(1, NA, 1, 1, NA, 1, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
clustered = c(rep("a", 6), "b", rep("c", 6)),
activity_uuid_product_uuid = c(rep("a", 6), NA, rep("c", 6)),
co2_footprint = c(rep(1, 6), NA, rep(0.5, 6))
)
# Function to add amount of distinct products
add_amount_of_distinct_products <- function(data) {
products <- data %>%
group_by(companies_id) %>%
summarise(amount_of_distinct_products = n_distinct(clustered))
data <- data %>%
left_join(products, by = "companies_id")
return(data)
}
companies_co2 <- add_amount_of_distinct_products(companies_co2)
# You can directly use 'companies_co2' to further process or create new datasets as needed.
# Make sure to adjust the final dataset creation part as per your specific requirements. Created on 2024-02-13 with reprex v2.0.2 Linda used with financial data similiar functions. Here are the functions that she used:
You can find her work here in this zip folder: However, please note that she used the following toy dataset here which also has financial data. The logic however is the same. In the worst case scenario you only count the worst case products (so with the highest emission profile) and in the best case you only count the products with the lowest emission profile. Don't know how best to implement this in code. But let me know if there are questions. |
Hi @kalashsinghal , just FYI, @lindadelacombaz already worked with best case and worst case toy data a lot in tiltPlot and can support you with code or general knowledge if needed! |
Hi @AnneSchoenauer, What does the variable |
This is only important when it comes to credit data, so with financial data. So without credit data you can leave it like this. |
@AnneSchoenauer So then what should be the replacement of this expression |
Amount total is the total amount of the products of the company, and product count the amount of the products in a certain risk category. So for example if the company has 4 products in total and in the equal weight scenario 2 are in the high and 2 are in the low risk category, you would have 2/4 for low, and 2/4 for high. If it is the best case it is 4/4 for low and 4/4 for high. |
@maurolepore Output of this task is not required for Bundesbank data, however it is required to create toy dataset for financials . I have to delegate this work to you too. Thanks! |
@maurolepore see the file "weighing_loanbook (2).zip" in this comment #242 |
UPDATE: I read through the chain I noticed that over the months this issue and the understanding of the complexity changed. Therfore, here is now the final ticket that we need to look for. Starting point: emission_profile product_level. Tickets will follow for sector profile and transition risk as well as how to calculate the company level. There will also be edge cases with NAs which I would like to discuss if you say that you understand this ticket here. Task Breakdown:
For each product in a company, calculate the equal_weight by dividing one by the number of distinct products per company. This represents the relative weight of each product.
Please note that the dummy_best solution and the count solution was my solution on how I would implement the best_case and worst_case column. The dummy columns are not needed as well as the count columns for best case and worst case. However in the new outputs we should see amount_of_distinct_products, equal_weight, best_case and worst_case. Here is the reprex to the above: # Load necessary libraries
library(dplyr)
#>
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#>
#> filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#>
#> intersect, setdiff, setequal, union
# Example data
data <- tibble(
company_name = c("bruno", "bruno", "bruno", "mauro", "mauro", "mirja", "mirja", "pasant", "pasant", "peter", "peter", "peter peasant", "peter peasant", "tilman", "tilman", "tilman", "tilman", "tilman", "tilman"),
postcode = c(27568, 27568, 27568, 39221, 39221, 34117, 34117, 80337, 80337, 88131, 88131, 53773, 53773, 12043, 12043, 12043, 12043, 12043, 12043),
emission_profile = c("high", "high", "medium", "high", "high", "high", "high", "high", "low", "high", "medium", "high", "high", "high", "high", "high", "low", "low", "medium")
)
# Calculate the number of products per company
products <- data %>%
group_by(company_name) %>%
summarise(amount_of_distinct_products = n()) %>%
ungroup()
# Join this count back to the main data frame
data <- data %>%
left_join(products, by = "company_name") %>%
mutate(equal_weight = 1 / amount_of_distinct_products)
# Function to add case scenarios and calculate best_case and worst_case
add_case_scenarios <- function(data) {
risk_order <- c("low", "medium", "high")
data %>%
group_by(company_name) %>%
mutate(
max_risk_category = risk_order[max(match(emission_profile, risk_order))],
min_risk_category = risk_order[min(match(emission_profile, risk_order))],
count_best_case_products = sum(emission_profile == min_risk_category),
count_worst_case_products = sum(emission_profile == max_risk_category),
dummy_best = as.integer(emission_profile == min_risk_category),
dummy_worst = as.integer(emission_profile == max_risk_category),
best_case = dummy_best / count_best_case_products,
worst_case = dummy_worst / count_worst_case_products
) %>%
ungroup()
}
# Apply function
data <- add_case_scenarios(data)
# Print data to check the new columns
print(data)
#> # A tibble: 19 × 13
#> company_name postcode emission_profile amount_of_distinct_pro…¹ equal_weight
#> <chr> <dbl> <chr> <int> <dbl>
#> 1 bruno 27568 high 3 0.333
#> 2 bruno 27568 high 3 0.333
#> 3 bruno 27568 medium 3 0.333
#> 4 mauro 39221 high 2 0.5
#> 5 mauro 39221 high 2 0.5
#> 6 mirja 34117 high 2 0.5
#> 7 mirja 34117 high 2 0.5
#> 8 pasant 80337 high 2 0.5
#> 9 pasant 80337 low 2 0.5
#> 10 peter 88131 high 2 0.5
#> 11 peter 88131 medium 2 0.5
#> 12 peter peasant 53773 high 2 0.5
#> 13 peter peasant 53773 high 2 0.5
#> 14 tilman 12043 high 6 0.167
#> 15 tilman 12043 high 6 0.167
#> 16 tilman 12043 high 6 0.167
#> 17 tilman 12043 low 6 0.167
#> 18 tilman 12043 low 6 0.167
#> 19 tilman 12043 medium 6 0.167
#> # ℹ abbreviated name: ¹amount_of_distinct_products
#> # ℹ 8 more variables: max_risk_category <chr>, min_risk_category <chr>,
#> # count_best_case_products <int>, count_worst_case_products <int>,
#> # dummy_best <int>, dummy_worst <int>, best_case <dbl>, worst_case <dbl> Created on 2024-04-15 with reprex v2.0.2 |
Conclusion from Sprint 2024-04-23 |
@Tilmon please find here an XLS sheet (sheet 1), where I implemented the best case and worst case for the sector profile and transition risk score (@kalashsinghal please ignore this comment for now please). I included now the NAs for the ew but I excluded the NAs for best_case and worst_case. This could lead to the fact that the best_case are worse than the equal_weight. It would be great if we could discuss this pretty soon so that we are both aligned on this issue and that we can tell the tech team how to handle NAs :) I put it to discuss it on Wednesday! |
@AnneSchoenauer A single product have six benchmarks (like: "all", "tilt_sector", ..etc.). Each benchmark can have different risk categories. So, on the basis of which benchmark will we judge that the product matches the lowest risk category of the company? For example: Risk categories of a product with benchmark "all" is |
Hi @kalashsinghal we will do it for all benchmarks individually so before you do it you would need to group it for each benchmark. |
@kalashsinghal --> push already a draft work without expectations that it is clean but at least the latest work is already pushed. |
From our conversation during sprint planning I understand this issue moves from tiltIndicator to tiltIndicatorAfter. Moving it now ... |
emission_profile()
on product-level
@kalashsinghal @AnneSchoenauer adjusted the ticket name to reflect that it's only for product-level required |
Currently the tiltIndicators are calculated using only an equal-weight approach which assumes that the company produces products with low, medium, and high risk equally.
Anne shared code (here) what implements it. And I reflected on it here.
The text was updated successfully, but these errors were encountered: