Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Bundesbank data (focus on Germany and exclude *_upstream()) #153

Closed
7 of 10 tasks
AnneSchoenauer opened this issue Jan 30, 2024 · 66 comments
Closed
7 of 10 tasks
Assignees

Comments

@AnneSchoenauer
Copy link

AnneSchoenauer commented Jan 30, 2024

We don’t need ... upstream profiles -- Anne #178 (comment)

We are only interested in german companies --Anne https://github.com/2DegreesInvesting/tiltOutput/issues/1#issuecomment-1991262963

Task list:

TODO AFTER DELIVERY


Hi @kalashsinghal,

Please find attached a Google Sheet which schows you which data I would need for the Bundesbank. I think it is easiest to start with the product information. Please note that Mauro still works on the lower_bound and higher_bound plus the NAs. However, I think you could already start however with creating the dataset and prepare them and then add the columns and row thereafter.

There is one new feature in the data namely having the averages of the products when aggregating it to the company level. It is the average_profile_rank, the average_reductions and the average_transition_risk_score. For this I will create some reprexes @kalashsinghal so that you know exactly on how to solve it. However, to create the reprex I would need your input @Tilmon. My problem is that if I multipy profile_rank and reduction targets I get the transition risk score for an individual product. If I now do this for each product I can calculate on company level the average profile rank and the average reduction targets of the company. Now the problem is if I multiple the average I get a different average transition risk score than I would get if I average the product based transition risk scores. Would be great if we could discuss this so that I can create a reprex for @kalashsinghal.

Thanks both of you!

@AnneSchoenauer
Copy link
Author

This issue is blocked by this issue here

@maurolepore
Copy link
Contributor

Wrong. Repo. I'm unsure if this is for tiltIndicatorBefore or tiltToyData. I'll tentatively transfer to tiltToyData but @kalashsinghal please discuss with @AnneSchoenauer to identify the best repo.

@maurolepore maurolepore transferred this issue from 2DegreesInvesting/tiltIndicator Feb 11, 2024
@kalashsinghal kalashsinghal transferred this issue from 2DegreesInvesting/tiltToyData Feb 12, 2024
@kalashsinghal
Copy link
Collaborator

kalashsinghal commented Feb 27, 2024

Dear @AnneSchoenauer @Tilmon @maurolepore

This script only creates the product level output for Bundesbank data:

library(readr)
library(dplyr)
devtools::load_all(".")

# Ensure the output is reproducible
set.seed(1)

options(readr.show_col_types = FALSE)
toy_emissions_profile_products_ecoinvent <- read_csv(toy_emissions_profile_products_ecoinvent())
toy_emissions_profile_any_companies <- read_csv(toy_emissions_profile_any_companies())
toy_sector_profile_any_scenarios <- read_csv(toy_sector_profile_any_scenarios())
toy_sector_profile_companies <- read_csv(toy_sector_profile_companies())
toy_europages_companies <- read_csv(toy_europages_companies())
toy_ecoinvent_activities <- read_csv(toy_ecoinvent_activities())
toy_ecoinvent_europages <- read_csv(toy_ecoinvent_europages())
toy_ecoinvent_inputs <- read_csv(toy_ecoinvent_inputs())
toy_isic_name <- read_csv(toy_isic_name())

co2_data <- read_csv("ecoinvent-v3.9.1.csv") |>
  select(c("activity_uuid_product_uuid", "ipcc_2021_climate_change_global_warming_potential_gwp100_kg_co2_eq"))
headcount <- read_csv("headcount.csv")

emissions_profile <- profile_emissions(
  companies = toy_emissions_profile_any_companies,
  co2 = toy_emissions_profile_products_ecoinvent,
  europages_companies = toy_europages_companies,
  ecoinvent_activities = toy_ecoinvent_activities,
  ecoinvent_europages = toy_ecoinvent_europages,
  isic = toy_isic_name)

emissions_profile_at_product_level <- unnest_product(emissions_profile)
emissions_profile_at_company_level <- unnest_company(emissions_profile)

sector_profile <- profile_sector(
  companies = toy_sector_profile_companies,
  scenarios = toy_sector_profile_any_scenarios,
  europages_companies = toy_europages_companies,
  ecoinvent_activities = toy_ecoinvent_activities,
  ecoinvent_europages = toy_ecoinvent_europages,
  isic = toy_isic_name)

sector_profile_at_product_level <- unnest_product(sector_profile)
sector_profile_at_company_level <- unnest_company(sector_profile)

transition_risk_score <- score_transition_risk(emissions_profile_at_product_level, sector_profile_at_product_level)
transition_risk_score_at_product_level <- unnest_product(transition_risk_score)
transition_risk_score_at_company_level <- unnest_company(transition_risk_score)

select_emissions_profile_at_product_level <- emissions_profile_at_product_level |>
  select(c("companies_id", "country", "main_activity", "ep_product", "activity_uuid_product_uuid", 
           "matched_activity_name", "matched_reference_product", "unit", "co2e_lower", 
           "co2e_upper", "emission_profile", "benchmark", "profile_ranking", "tilt_sector", 
           "tilt_subsector"))

select_sector_profile_at_product_level <- sector_profile_at_product_level |>
  select(c("companies_id", "ep_product", "activity_uuid_product_uuid", "sector_profile",
           "scenario", "year", "reduction_targets"))

select_transition_risk_score_at_product_level <- transition_risk_score_at_product_level |>
  select(c("companies_id", "ep_product", "activity_uuid_product_uuid", "benchmark_tr_score",
           "transition_risk_score"))

bundesbank_data_at_product_level <- select_emissions_profile_at_product_level |>
  left_join(select_sector_profile_at_product_level, by = c("companies_id", "ep_product", "activity_uuid_product_uuid")) |>
  mutate(benchmark_tr_score = paste(scenario, year, benchmark, sep = "_")) |>
  left_join(select_transition_risk_score_at_product_level, by = c("companies_id", "ep_product", "activity_uuid_product_uuid", "benchmark_tr_score")) |>
  left_join(co2_data, by = c("activity_uuid_product_uuid")) |>
  rename(co2_footprint = "ipcc_2021_climate_change_global_warming_potential_gwp100_kg_co2_eq") |>
  left_join(headcount, by = c("companies_id"))

bundesbank_data_at_product_level

Points to note:

  1. Please feel free to change the toy input data with the real data from this folder.
  2. Please use the ecoinvent co2 data (ecoinvent-v3.9.1.csv) from here.
  3. I didn't get the time to figure out which dataset is needed to get the employees_size column. Hence, I have to delegate this work to you guyz to identify its original source. However, I have included the code for the file which may require modification if column names are different in the original source.

I can't create a reprex of the output because I didn't have the dataset for employees_size. However, I have tested this code and it creates the right output :)

Mauro's notes:

  • Find headcount.csv here
  • The script above does not use the latest version of tiltOutput-v0.0.0.9002 -- which is 03-sector-profile.zip and 06-emissions-profile.zip. I assume this script is outdated and the real script indeed used the latest data.

@maurolepore
Copy link
Contributor

Thanks Kalash.

This also relates to 2DegreesInvesting/tiltIndicator#657 (comment)

Happy vacations.

@kalashsinghal
Copy link
Collaborator

@maurolepore Thanks for taking this task over! I would like to mention that company level output is not easy to create as it requires output column emission_category in a wide format from the original long format. Also, there is a new column average_co2, for which which you might need more insight from @AnneSchoenauer. Good luck with the task! :)

@maurolepore
Copy link
Contributor

BTW, following our long conversation about setting the seed, I edited your reprex above to ensure the output is reproducible.

@AnneSchoenauer
Copy link
Author

@maurolepore Do you need any support on this here?

With regard to employee_size. I think it is fine to also simple add the columns min_headcount and max_headcount. These variables should be included in the tiltData that you webscraped from europages. Please note that I think we still use the information in 2022 - so it would be great if the employee data matches with this point in time of when we webscraped the information. I changed it in this data sheet here: https://docs.google.com/spreadsheets/d/1Gsz2N2-gqrttAooYuNgWrKfjLyA73Uj4w0xm_wN7fp4/edit#gid=1500707263.

With regard to the Co2_footprint variable. We have this variable in our dataset on the product variable. However, this is licensed data. As I have the license it is fine that we share it with the Bundesbank as no one else has access to these datasets. Therefore, on hte product level it is the "standard" Co2 variable data point that you guys know. On the company level (sheet 2 in the Bundesbank data file) is the average Co2 variable of all products. I think Kalash wrote a function for the profile ranking and sector targets which is called profile_ranking_avg and reduction_targets_avg - what we would need in the same logic is a co2_avg on the company level. Do you want me to write a reprex on this? This might be the best right? :) I will follow up in a separate ticket. If you could however already look at the Google Sheet and how the porduct and company data should look like at the end that would be fantasti!

@maurolepore
Copy link
Contributor

maurolepore commented Mar 5, 2024

would be great to get a rough estimation for it. -- @AnneSchoenauer via Slack (private)

I roughly estimate 1-6 work days.

@maurolepore maurolepore removed their assignment Mar 5, 2024
@maurolepore maurolepore changed the title Create Bundesbank data Create Bundesbank data (exclude _upstream()) Mar 6, 2024
@maurolepore maurolepore changed the title Create Bundesbank data (exclude _upstream()) Create Bundesbank data (exclude *_upstream()) Mar 6, 2024
@maurolepore maurolepore changed the title Create Bundesbank data (exclude *_upstream()) Create Bundesbank data (focus on Germany and exclude *_upstream()) Mar 12, 2024
@kalashsinghal
Copy link
Collaborator

@AnneSchoenauer In the google sheet, you added activity_uuid_product_uuid column at the company level. This column is only used to produce and access the product level results. Hence, I would like you to recheck and please let me know why this column is needed at the company level. Thanks!

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 19, 2024

Hi there! I did a thorough review of the newest dataset and all seems in order except for one thing. But I'm not sure if it's really a problem with the data or I'm just missing something. Therefore, could you @kalashsinghal please check out the following?

When I select for companies_id "alakus-naturdarme-gmbh_00000004935321-001" the benchmark "unit_tilt_sector", year "2050" and scenario "NZ 2050", I get on product-level 2 products, where one has a low and one a medium emission_profile. I would expect that this translates into equal shares for emission_category_medium and emission_category_low on company-level. However, after applying the same filters, the shares for the two categories are quite unequal on company-level. Please see screenshots for reference plus piece of code below to select the respective data yourself in the code.

Can you make sense of the difference @kalashsinghal or is that a bug?

Product-level
image

company-level
image

You can generate the example yourself with this piece of code:

subset_prod <- buba_prod |> 
  filter(
  companies_id == "alakus-naturdarme-gmbh_00000004935321-001", 
  benchmark == "unit_tilt_sector", 
  year == "2050", 
  scenario == "NZ 2050")

subset_comp <- buba_comp |> 
  filter(
  companies_id == "alakus-naturdarme-gmbh_00000004935321-001", 
  benchmark == "unit_tilt_sector", 
  year == "2050", 
  scenario == "NZ 2050"
  )

cc' @AnneSchoenauer

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 19, 2024

@AnneSchoenauer , re

@Tilmon there are some activities that have a co2_footprint of 0. However, they don't sound like activities without emitting any CO2 emissions. Here are some examples. Would be great if you could check if it "makes sense" or if Kalash or Mauro need to check on it. I was thinking if they all would have to do something with recylcing but that we cannot see it as they would have the ecionvent sector "recycling" which we don't have in our tilt_sectors?

Anne, I checked the emission data in the original lcia file from Ecoinvent and it's correct that for the co2 indicator we use, the emissions are 0. For some of the alternatives, they have emissions, but something like 0.00005 or so, so in any case very small.

The activity "clinker production" relates to the reference product "waste plastic, mixture", that may help to understand. For the other two, I can't explain, would need to do some more product-specific research. But I think for now it's sufficient to know the data are correct, no?

@AnneSchoenauer
Copy link
Author

Yes Perfect! Thanks a lot Tilman for the explanation!

@kalashsinghal
Copy link
Collaborator

kalashsinghal commented Mar 19, 2024

When I select for companies_id "alakus-naturdarme-gmbh_00000004935321-001" the benchmark "unit_tilt_sector", year "2050" and scenario "NZ 2050", I get on product-level 2 products, where one has a low and one a medium emission_profile. I would expect that this translates into equal shares for emission_category_medium and emission_category_low on company-level. However, after applying the same filters, the shares for the two categories are quite unequal on company-level. Please see screenshots for reference plus piece of code below to select the respective data yourself in the code.
Can you make sense of the difference @kalashsinghal or is that a bug?

@Tilmon Yes it is a bug and it originates from the input files I received from @maurolepore. In the product level input file, you can see that we have "NA" benchmark which has "NA" emission_profile, however in the company level result all benchmarks are assigned "NA" emission_profile. This should not be the case. We should have a separate "NA" benchmark at company level to provide the emission_profile_share of 0.63. Do you agree?

Product level result: (Company_id = "alakus-naturdarme-gmbh_00000004935321-001")

image

Company level result:

image

cc' @AnneSchoenauer

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 19, 2024

Hi @maurolepore and @kalashsinghal,

@AnneSchoenauer and I discussed what we could do to get the data she needs on product-level relatively fast. For context, we discussed in the meeting today that the fact that NAs are not preserved on product-level is a problem for the Bundesbank Analysis. For some of the charts Anne needs to know the total number of products of a company vs. the number of products with results.

Therefore, what about the following quick fix for the product-level results:

  1. You produce the profile_emission() from tiltIndicatorAfter. Is it correct, that NAs are preserved there?
  2. You use the output table from 1. as "sceleton" as it preserves NAs. Then, you
  3. join the sector_profile and transition_risk_score to table from output 1. This join should be done by = c("companies_id", "ep_product")

Company-level results are perfect as they are! What do you think? Who's the best to do that?

@AnneSchoenauer
Copy link
Author

To be clear - "from output 1" Tilman meant the results that you already gave us on product-level.

@maurolepore
Copy link
Contributor

maurolepore commented Mar 19, 2024

1.

I'm best to answer this question

I guess you refer to both of the two ways to yield NAs in the new output of tiltIndicator::emissions_profile() -- i.e. (1.a.) unmatched products and (1.b.) missing benchmarks?

  • 1.a. Yes. We discussed this in profile_emissions() now preserves unmatched products #193 (comment) and documented it in the changelog here. This was fixed before I produced the latest output in tiltOutput-v0.0.0.9002. (Caveat: This fix is more comprehensively tested in tiltIndicator than tiltIndicatorAfter since it was all very rushed but I will take this as an opportunity to extend the tests.)

  • 1.b. I don't know yet since I didn't test for this specific case. We all had assumed that the new features of tiltIndicator where passed to the output of tiltIndicatorAfter but the item above 1.a. demonstrates we were wrong. So it's wise to test this now. (Caveat: Testing tiltIndicatorAfter is particluarly hard so it can take me some time.)

2. and 3.

Kalash may be more familiar with this but I can also help here if you point me exactly to:

  • 2.a. The file that contains the "output table from 1".

  • 2,b. The file that contains the columns sector_profile and trainsition_risk_score that you want to join to the "output table from 1".

(Likely I can track those things if I follow Kalash's comments but there is room for confusion.)

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 19, 2024

Thanks @maurolepore !

Re

1.

We all had assumed that the new features of tiltIndicator where passed to the output of tiltIndicatorAfter but the item above 1.a. demonstrates we were wrong.

If it's not passed over, we may can use just the output from emission_profile() instead? Ultimately, we need the emission_profile results on product level with preserved NAs for unmatched products and missing benchmarks. tiltIndicatorAfter would be nice because of all the other columns that are included but I think tiltIndicator would also do the job. And the other columns can probably also be used from the Bundesbank data if we join it anyways (see below for link)?

2. and 3.

Re

I can also help here if you point me exactly to:

2.a. The file that contains the "output table from 1".:

This would be the output of either emission_profile() from tiltIndicator or profile_emission() from tiltIndicatorAfter.

2,b. The file that contains the columns sector_profile and trainsition_risk_score that you want to join to the "output table from 1".

That's the Bundesbank data on product level. You can find the latest version in this folder "bundesbank_data_product_level.csv.zip"

@maurolepore
Copy link
Contributor

@Tilmon

can use just the output from emission_profile() instead?

I can explore this idea but one big challenge I see is that the way I generate outputs is via tiltWorkflows, which supports a chunking option that allows computing on the large real datasets we have. tiltWorkflows wraps tiltIndicatorAfter -- not tiltIndicator alone. And calling tiltIndicator directly would almost certainly saturate the available memory and crash because the real data is too large to fit in a single pass.

@kalashsinghal
Copy link
Collaborator

  1. You use the output table from 1. as "sceleton" as it preserves NAs. Then, you
  2. join the sector_profile and transition_risk_score to table from output 1. This join should be done by = c("companies_id", "ep_product")

@Tilmon I will complete these two tasks after I get the "output from table 1" from @maurolepore :)

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 20, 2024

@maurolepore

I can explore this idea but one big challenge I see is that the way I generate outputs is via tiltWorkflows, which supports a chunking option that allows computing on the large real datasets we have. tiltWorkflows wraps tiltIndicatorAfter -- not tiltIndicator alone. And calling tiltIndicator directly would almost certainly saturate the available memory and crash because the real data is too large to fit in a single pass.

Ok, I see. Then maybe first check if NAs (both unmatched and missing benchmark) are preserved on product level in tiltIndicatorAfter? Because if that's the case, you could simply run tiltWorkflows. If not, the alternative solution with only running tiltIndicator comes into play.

If you need support, maybe @kalashsinghal can help? We already know a company that has unmatched products: "alakus-naturdarme-gmbh_00000004935321-001". Running tiltIndicatorAfter for that company should give clarity on whether it preserves unmatched products. For a company with missing benchmarks, we'd need to look for another example.

Thanks you both

@maurolepore
Copy link
Contributor

maurolepore commented Mar 20, 2024

@Tilmon

1.

  • 1.a. Preserves unmatched product (see above).
  • 1.b. Preserves missing benchmark?

I don't think so. In the reprex below I see:

  • The output of tiltIndicator includes a row for the missing benchmark at product level, and reflects it in the output at product level (via the corresponding value).
  • The output of tiltIndicatorAfter does NOT include a row for the missing benchmark at product level, and I'm not sure if it reflects it in the output at product level (you're best to judge).

image

reprex
# SETUP --------------------------------------------------------------------
library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(tiltToyData)
library(tiltIndicator)
library(tiltIndicatorAfter)

packageVersion("tiltIndicator")
#> [1] '0.0.0.9213'
packageVersion("tiltIndicatorAfter")
#> [1] '0.0.0.9024'

withr::local_options(readr.show_col_types = FALSE, width = 500)

companies <- read_csv(toy_emissions_profile_any_companies())
id <- unique(companies$companies_id)[[1]]
uuid <- unique(companies$activity_uuid_product_uuid)[[1]]
companies <- companies |>
  filter(companies_id == id) |>
  filter(activity_uuid_product_uuid == uuid)

co2 <- read_csv(toy_emissions_profile_products_ecoinvent()) |>
  filter(activity_uuid_product_uuid == uuid)
co2$isic_4digit <- NA

europages_companies <- read_csv(toy_europages_companies()) |>
  filter(companies_id == id)
ecoinvent_activities <- read_csv(toy_ecoinvent_activities()) |>
  filter(activity_uuid_product_uuid == uuid)
ecoinvent_europages <- read_csv(toy_ecoinvent_europages()) |>
  filter(activity_uuid_product_uuid == uuid)
isic_name <- read_csv(toy_isic_name()) |>
  filter(isic_4digit == co2$isic_4digit)



# INTERESTING STUFF --------------------------------------------------------
companies |> relocate(matches(c("id")))
#> # A tibble: 1 × 7
#>   activity_uuid_product_uuid           companies_id        clustered country ei_activity_name                                              main_activity unit 
#>   <chr>                                <chr>               <chr>     <chr>   <chr>                                                         <chr>         <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 antimonarchy_canine tent      germany market for shed, large, wood, non-insulated, fire-unprotected distributor   m2

co2 |> relocate(matches(c("id", "isic")))
#> # A tibble: 1 × 8
#>   activity_uuid_product_uuid           isic_4digit co2_footprint ei_activity_name                                              ei_geography tilt_sector  tilt_subsector           unit 
#>   <chr>                                <lgl>               <dbl> <chr>                                                         <chr>        <chr>        <chr>                    <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 NA                   303. market for shed, large, wood, non-insulated, fire-unprotected GLO          construction construction residential m2


# For reference, remember the behaviour of tiltIndicator
out_tilt_indicator <- tiltIndicator::emissions_profile(companies, co2)

# At product level, preserves missing benchmark
out_tilt_indicator |> unnest_product()
#> # A tibble: 6 × 7
#>   companies_id        grouped_by       risk_category profile_ranking clustered activity_uuid_product_uuid           co2_footprint
#>   <chr>               <chr>            <chr>                   <dbl> <chr>     <chr>                                        <dbl>
#> 1 antimonarchy_canine all              high                        1 tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 2 antimonarchy_canine isic_4digit      <NA>                       NA tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 3 antimonarchy_canine tilt_sector      high                        1 tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 4 antimonarchy_canine unit             high                        1 tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 5 antimonarchy_canine unit_isic_4digit <NA>                       NA tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 6 antimonarchy_canine unit_tilt_sector high                        1 tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.

# Zoom into the missing benchmark
out_tilt_indicator |> 
  unnest_product() |> 
  filter(grepl("isic", grouped_by))
#> # A tibble: 2 × 7
#>   companies_id        grouped_by       risk_category profile_ranking clustered activity_uuid_product_uuid           co2_footprint
#>   <chr>               <chr>            <chr>                   <dbl> <chr>     <chr>                                        <dbl>
#> 1 antimonarchy_canine isic_4digit      <NA>                       NA tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.
#> 2 antimonarchy_canine unit_isic_4digit <NA>                       NA tent      76269c17-78d6-420b-991a-aa38c51b45b7          303.

# At company level, reflects missing benchmark
out_tilt_indicator |> unnest_company()
#> # A tibble: 24 × 4
#>    companies_id        grouped_by  risk_category value
#>    <chr>               <chr>       <chr>         <dbl>
#>  1 antimonarchy_canine all         high              1
#>  2 antimonarchy_canine all         medium            0
#>  3 antimonarchy_canine all         low               0
#>  4 antimonarchy_canine all         <NA>              0
#>  5 antimonarchy_canine isic_4digit high              0
#>  6 antimonarchy_canine isic_4digit medium            0
#>  7 antimonarchy_canine isic_4digit low               0
#>  8 antimonarchy_canine isic_4digit <NA>              1
#>  9 antimonarchy_canine tilt_sector high              1
#> 10 antimonarchy_canine tilt_sector medium            0
#> # ℹ 14 more rows

# Zoom into the missing benchmark
out_tilt_indicator |> 
  unnest_company() |> 
  filter(grepl("isic", grouped_by))
#> # A tibble: 8 × 4
#>   companies_id        grouped_by       risk_category value
#>   <chr>               <chr>            <chr>         <dbl>
#> 1 antimonarchy_canine isic_4digit      high              0
#> 2 antimonarchy_canine isic_4digit      medium            0
#> 3 antimonarchy_canine isic_4digit      low               0
#> 4 antimonarchy_canine isic_4digit      <NA>              1
#> 5 antimonarchy_canine unit_isic_4digit high              0
#> 6 antimonarchy_canine unit_isic_4digit medium            0
#> 7 antimonarchy_canine unit_isic_4digit low               0
#> 8 antimonarchy_canine unit_isic_4digit <NA>              1

out_tilt_indicator_after <- profile_emissions(
  companies,
  co2,
  europages_companies,
  ecoinvent_activities,
  ecoinvent_europages,
  isic_name
)
#> ℹ Adding 55% and 92% noise to `co2e_lower` and `co2e_upper`, respectively.



# Now examine the the behaviour of tiltIndicator

# ANSWER: At product level, does NOT preserve missing benchmark as you expect???
out_tilt_indicator_after |> unnest_product()
#> # A tibble: 6 × 25
#>   companies_id        company_name        country emission_profile benchmark        ep_product matched_activity_name                                         matched_reference_product                          co2e_lower co2e_upper unit  multi_match matching_certainty matching_certainty_company_average tilt_sector  tilt_subsector           isic_4digit isic_4digit_name company_city postcode address                                    main_activity activity_uuid_product…¹ profile_ranking ei_geography
#>   <chr>               <chr>               <chr>   <chr>            <chr>            <chr>      <chr>                                                         <chr>                                                   <dbl>      <dbl> <chr> <lgl>       <chr>              <chr>                              <chr>        <chr>                    <chr>       <chr>            <chr>        <chr>    <chr>                                      <chr>         <chr>                             <dbl> <chr>       
#> 1 antimonarchy_canine antimonarchy_canine germany high             all              tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected      -16.5       451. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…               1 tilt_world  
#> 2 antimonarchy_canine antimonarchy_canine germany <NA>             <NA>             tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected       52.8       542. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…              NA tilt_world  
#> 3 antimonarchy_canine antimonarchy_canine germany high             tilt_sector      tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected      274.        679. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…               1 tilt_world  
#> 4 antimonarchy_canine antimonarchy_canine germany high             unit             tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected     -303.        405. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…               1 tilt_world  
#> 5 antimonarchy_canine antimonarchy_canine germany <NA>             <NA>             tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected       88.9       640. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…              NA tilt_world  
#> 6 antimonarchy_canine antimonarchy_canine germany high             unit_tilt_sector tent       market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected       90.2       780. m2    FALSE       low                low                                construction construction residential <NA>        <NA>             hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor   76269c17-78d6-420b-991…               1 tilt_world  
#> # ℹ abbreviated name: ¹​activity_uuid_product_uuid

# Zoom into the missing benchmark
out_tilt_indicator_after |> 
  unnest_product() |> 
  filter(grepl("isic", benchmark))
#> # A tibble: 0 × 25
#> # ℹ 25 variables: companies_id <chr>, company_name <chr>, country <chr>, emission_profile <chr>, benchmark <chr>, ep_product <chr>, matched_activity_name <chr>, matched_reference_product <chr>, co2e_lower <dbl>, co2e_upper <dbl>, unit <chr>, multi_match <lgl>, matching_certainty <chr>, matching_certainty_company_average <chr>, tilt_sector <chr>, tilt_subsector <chr>, isic_4digit <chr>, isic_4digit_name <chr>, company_city <chr>, postcode <chr>, address <chr>, main_activity <chr>,
#> #   activity_uuid_product_uuid <chr>, profile_ranking <dbl>, ei_geography <chr>

# ANSWER: At company level, reflects missing benchmark???
out_tilt_indicator_after |> unnest_company()
#> # A tibble: 24 × 14
#>    companies_id        company_name        country emission_profile_share emission_profile benchmark   co2e_lower co2e_upper matching_certainty_company_average company_city postcode address                                    main_activity profile_ranking_avg
#>    <chr>               <chr>               <chr>                    <dbl> <chr>            <chr>            <dbl>      <dbl> <chr>                              <chr>        <chr>    <chr>                                      <chr>                       <dbl>
#>  1 antimonarchy_canine antimonarchy_canine germany                      1 high             all              -16.5       451. low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#>  2 antimonarchy_canine antimonarchy_canine germany                      0 medium           all               NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#>  3 antimonarchy_canine antimonarchy_canine germany                      0 low              all               NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#>  4 antimonarchy_canine antimonarchy_canine germany                      0 <NA>             all               NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#>  5 antimonarchy_canine antimonarchy_canine germany                      0 high             isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#>  6 antimonarchy_canine antimonarchy_canine germany                      0 medium           isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#>  7 antimonarchy_canine antimonarchy_canine germany                      0 low              isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#>  8 antimonarchy_canine antimonarchy_canine germany                      1 <NA>             isic_4digit       52.8       542. low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#>  9 antimonarchy_canine antimonarchy_canine germany                      1 high             tilt_sector      274.        679. low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#> 10 antimonarchy_canine antimonarchy_canine germany                      0 medium           tilt_sector       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                     1
#> # ℹ 14 more rows

# Zoom into the missing benchmark
out_tilt_indicator_after |> 
  unnest_company() |> 
  filter(grepl("isic", benchmark))
#> # A tibble: 8 × 14
#>   companies_id        company_name        country emission_profile_share emission_profile benchmark        co2e_lower co2e_upper matching_certainty_company_average company_city postcode address                                    main_activity profile_ranking_avg
#>   <chr>               <chr>               <chr>                    <dbl> <chr>            <chr>                 <dbl>      <dbl> <chr>                              <chr>        <chr>    <chr>                                      <chr>                       <dbl>
#> 1 antimonarchy_canine antimonarchy_canine germany                      0 high             isic_4digit            NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 2 antimonarchy_canine antimonarchy_canine germany                      0 medium           isic_4digit            NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 3 antimonarchy_canine antimonarchy_canine germany                      0 low              isic_4digit            NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 4 antimonarchy_canine antimonarchy_canine germany                      1 <NA>             isic_4digit            52.8       542. low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 5 antimonarchy_canine antimonarchy_canine germany                      0 high             unit_isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 6 antimonarchy_canine antimonarchy_canine germany                      0 medium           unit_isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 7 antimonarchy_canine antimonarchy_canine germany                      0 low              unit_isic_4digit       NA          NA  low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN
#> 8 antimonarchy_canine antimonarchy_canine germany                      1 <NA>             unit_isic_4digit       88.9       640. low                                hamburg      22525    schnackenburgallee 217-223 | 22525 hamburg distributor                   NaN

As much as possible my exploration is with toy data. This allows me to explore the bug most generally, and to share the reprex with you publicly here. If there is something to fix, then I can easily turn my reprex into a test. If I fail to find a bug, then we may want to confirm with the real data -- but that's harder to do and harder to discuss in a public thread.

--

Sorry, something went wrong.

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 20, 2024

Hey @maurolepore thanks for the reprex. Shows clearly that how tiltIndicatorAfter handles missing benchmarks on product-level in emission_profile() right now, is not how it should be.

Alternative option was, as discussed, to run tiltIndicator with emission_profile(). Could you try that out? We "only" need German companies. If running that works, then this would be the best alternative option.

If this doesn't work, we need to see which of the following options is the third-best:

  • fixing the output in tiltIndicatorAfter
  • fixing sector_profile to preserve NAs, so that the Bundesbank output as created right now shows the NAs.

Thank you!

@maurolepore
Copy link
Contributor

maurolepore commented Mar 20, 2024

@Tilmon the fix in tiltIndicatorAfter seems pretty straightforward.

I"ll follow up there and request your conceptual review and Kalash's technical review -- shortly.

@maurolepore
Copy link
Contributor

maurolepore commented Mar 20, 2024

@Tilmon and @kalashsinghal this PR is now pending your conceptual and technical review, respectively.

What you've seen so far suggests you're happy with the fix so while I wait for the formal review I used the code in that PR to generate new output. The file 06-emissions.profile.zip is now attached to tiltOtput-v0.0.0.9002.

@kalashsinghal
Copy link
Collaborator

@AnneSchoenauer @Tilmon @maurolepore

I have updated the final Bundesbank product and company level data after preserving missing benchmarks in this folder. @AnneSchoenauer If you want to see the missing benchmarks then should I also add the benchmarks NA in the final output? right?

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 22, 2024

@kalashsinghal I cannot see any new data in the folder you shared - did you remove them again? All 3 files I see are from March 18th.

@kalashsinghal
Copy link
Collaborator

@AnneSchoenauer @Tilmon

I have updated the files again by replacing the benchmark "all" with "unit" as asked by Anne here. The files are uploaded here.

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 22, 2024

Hey @kalashsinghal , the product-level results to not seem to preserve unmatched products on product-level. Checking for companies_id "alakus-naturdarme-gmbh_00000004935321-001", I don't see any clustered without match, even though the company has many unmatched products, as you yourself pointed out with the screenshot here in #153 (comment).
Why is that so? Are the unmatched products missing in the inputs you use from @maurolepore or did you perform a wrong join?
To recap: The idea was to use profile_emission() from tiltIndicatorAfter as sceleton and join the Bundesbank output data by = c("companies_id", "ep_product") (+ probably the benchmark column) to preserve the NA rows but also get the Bundesbank output.

@kalashsinghal
Copy link
Collaborator

@Tilmon I am filtering the final results by benchmarks "unit" and "unit_tilt_sector", and the unmatched products have the benchmark "NA". That's the reason why unmatched products get excluded from the final result (as you can see from the image)

image

Should I also add data of "NA" benchmarks to the product level output?

cc @AnneSchoenauer

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 22, 2024

Aaaah. Yes, in that case please include benchmark "NA" to keep unmatched products. Thanks! cc' @AnneSchoenauer

@kalashsinghal
Copy link
Collaborator

@Tilmon I have updated the results again with "NA" benchmarks. Please check again :)

@Tilmon
Copy link
Collaborator

Tilmon commented Mar 22, 2024

@kalashsinghal I don't see unmathced products for the respective company "alakus-naturdarme-gmbh_00000004935321-001". Am I missing anything or what's the issue?

@AnneSchoenauer
Copy link
Author

Hello everyone, Just letting you know that I double checked the Bundesbank data and sent it to the Bundesbank this morning. Well done everyone for finishing this very intensive process! From my side we could close this issue but don’t want to mess up things :)
Have a great week everyone!

@maurolepore
Copy link
Contributor

maurolepore commented Apr 2, 2024

Great, I'll go ahead and close. I see some related follow up tasks (see "TODO AFTER DELIVERY" in the first comment) but each has its own issue to remind us of what needs to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants