-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Bundesbank data (focus on Germany and exclude *_upstream()
)
#153
Comments
This issue is blocked by this issue here |
Wrong. Repo. I'm unsure if this is for tiltIndicatorBefore or tiltToyData. I'll tentatively transfer to tiltToyData but @kalashsinghal please discuss with @AnneSchoenauer to identify the best repo. |
Dear @AnneSchoenauer @Tilmon @maurolepore This script only creates the product level output for Bundesbank data:
Points to note:
I can't create a reprex of the output because I didn't have the dataset for Mauro's notes:
|
Thanks Kalash. This also relates to 2DegreesInvesting/tiltIndicator#657 (comment) Happy vacations. |
@maurolepore Thanks for taking this task over! I would like to mention that company level output is not easy to create as it requires output column |
BTW, following our long conversation about setting the seed, I edited your reprex above to ensure the output is reproducible. |
@maurolepore Do you need any support on this here? With regard to With regard to the Co2_footprint variable. We have this variable in our dataset on the product variable. However, this is licensed data. As I have the license it is fine that we share it with the Bundesbank as no one else has access to these datasets. Therefore, on hte product level it is the "standard" Co2 variable data point that you guys know. On the company level (sheet 2 in the Bundesbank data file) is the average Co2 variable of all products. I think Kalash wrote a function for the profile ranking and sector targets which is called |
I roughly estimate 1-6 work days. |
_upstream()
)
_upstream()
)*_upstream()
)
*_upstream()
)*_upstream()
)
@AnneSchoenauer In the google sheet, you added |
Hi there! I did a thorough review of the newest dataset and all seems in order except for one thing. But I'm not sure if it's really a problem with the data or I'm just missing something. Therefore, could you @kalashsinghal please check out the following? When I select for Can you make sense of the difference @kalashsinghal or is that a bug? You can generate the example yourself with this piece of code:
cc' @AnneSchoenauer |
@AnneSchoenauer , re
Anne, I checked the emission data in the original lcia file from Ecoinvent and it's correct that for the co2 indicator we use, the emissions are 0. For some of the alternatives, they have emissions, but something like 0.00005 or so, so in any case very small. The activity "clinker production" relates to the reference product "waste plastic, mixture", that may help to understand. For the other two, I can't explain, would need to do some more product-specific research. But I think for now it's sufficient to know the data are correct, no? |
Yes Perfect! Thanks a lot Tilman for the explanation! |
@Tilmon Yes it is a bug and it originates from the input files I received from @maurolepore. In the product level input file, you can see that we have "NA" Product level result: (Company_id = "alakus-naturdarme-gmbh_00000004935321-001") Company level result: cc' @AnneSchoenauer |
Hi @maurolepore and @kalashsinghal, @AnneSchoenauer and I discussed what we could do to get the data she needs on product-level relatively fast. For context, we discussed in the meeting today that the fact that NAs are not preserved on product-level is a problem for the Bundesbank Analysis. For some of the charts Anne needs to know the total number of products of a company vs. the number of products with results. Therefore, what about the following quick fix for the product-level results:
Company-level results are perfect as they are! What do you think? Who's the best to do that? |
To be clear - "from output 1" Tilman meant the results that you already gave us on product-level. |
1.I'm best to answer this question I guess you refer to both of the two ways to yield
2. and 3.Kalash may be more familiar with this but I can also help here if you point me exactly to:
(Likely I can track those things if I follow Kalash's comments but there is room for confusion.) |
Thanks @maurolepore ! Re 1.
If it's not passed over, we may can use just the output from emission_profile() instead? Ultimately, we need the emission_profile results on product level with preserved NAs for unmatched products and missing benchmarks. tiltIndicatorAfter would be nice because of all the other columns that are included but I think tiltIndicator would also do the job. And the other columns can probably also be used from the Bundesbank data if we join it anyways (see below for link)? 2. and 3.Re
This would be the output of either emission_profile() from tiltIndicator or profile_emission() from tiltIndicatorAfter.
That's the Bundesbank data on product level. You can find the latest version in this folder "bundesbank_data_product_level.csv.zip" |
I can explore this idea but one big challenge I see is that the way I generate outputs is via tiltWorkflows, which supports a chunking option that allows computing on the large real datasets we have. tiltWorkflows wraps tiltIndicatorAfter -- not tiltIndicator alone. And calling tiltIndicator directly would almost certainly saturate the available memory and crash because the real data is too large to fit in a single pass. |
@Tilmon I will complete these two tasks after I get the "output from table 1" from @maurolepore :) |
Ok, I see. Then maybe first check if NAs (both unmatched and missing benchmark) are preserved on product level in tiltIndicatorAfter? Because if that's the case, you could simply run tiltWorkflows. If not, the alternative solution with only running tiltIndicator comes into play. If you need support, maybe @kalashsinghal can help? We already know a company that has unmatched products: "alakus-naturdarme-gmbh_00000004935321-001". Running tiltIndicatorAfter for that company should give clarity on whether it preserves unmatched products. For a company with missing benchmarks, we'd need to look for another example. Thanks you both |
1.
I don't think so. In the reprex below I see:
reprex# SETUP --------------------------------------------------------------------
library(readr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(tiltToyData)
library(tiltIndicator)
library(tiltIndicatorAfter)
packageVersion("tiltIndicator")
#> [1] '0.0.0.9213'
packageVersion("tiltIndicatorAfter")
#> [1] '0.0.0.9024'
withr::local_options(readr.show_col_types = FALSE, width = 500)
companies <- read_csv(toy_emissions_profile_any_companies())
id <- unique(companies$companies_id)[[1]]
uuid <- unique(companies$activity_uuid_product_uuid)[[1]]
companies <- companies |>
filter(companies_id == id) |>
filter(activity_uuid_product_uuid == uuid)
co2 <- read_csv(toy_emissions_profile_products_ecoinvent()) |>
filter(activity_uuid_product_uuid == uuid)
co2$isic_4digit <- NA
europages_companies <- read_csv(toy_europages_companies()) |>
filter(companies_id == id)
ecoinvent_activities <- read_csv(toy_ecoinvent_activities()) |>
filter(activity_uuid_product_uuid == uuid)
ecoinvent_europages <- read_csv(toy_ecoinvent_europages()) |>
filter(activity_uuid_product_uuid == uuid)
isic_name <- read_csv(toy_isic_name()) |>
filter(isic_4digit == co2$isic_4digit)
# INTERESTING STUFF --------------------------------------------------------
companies |> relocate(matches(c("id")))
#> # A tibble: 1 × 7
#> activity_uuid_product_uuid companies_id clustered country ei_activity_name main_activity unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 antimonarchy_canine tent germany market for shed, large, wood, non-insulated, fire-unprotected distributor m2
co2 |> relocate(matches(c("id", "isic")))
#> # A tibble: 1 × 8
#> activity_uuid_product_uuid isic_4digit co2_footprint ei_activity_name ei_geography tilt_sector tilt_subsector unit
#> <chr> <lgl> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 76269c17-78d6-420b-991a-aa38c51b45b7 NA 303. market for shed, large, wood, non-insulated, fire-unprotected GLO construction construction residential m2
# For reference, remember the behaviour of tiltIndicator
out_tilt_indicator <- tiltIndicator::emissions_profile(companies, co2)
# At product level, preserves missing benchmark
out_tilt_indicator |> unnest_product()
#> # A tibble: 6 × 7
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid co2_footprint
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
#> 1 antimonarchy_canine all high 1 tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 2 antimonarchy_canine isic_4digit <NA> NA tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 3 antimonarchy_canine tilt_sector high 1 tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 4 antimonarchy_canine unit high 1 tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 5 antimonarchy_canine unit_isic_4digit <NA> NA tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 6 antimonarchy_canine unit_tilt_sector high 1 tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
# Zoom into the missing benchmark
out_tilt_indicator |>
unnest_product() |>
filter(grepl("isic", grouped_by))
#> # A tibble: 2 × 7
#> companies_id grouped_by risk_category profile_ranking clustered activity_uuid_product_uuid co2_footprint
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
#> 1 antimonarchy_canine isic_4digit <NA> NA tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
#> 2 antimonarchy_canine unit_isic_4digit <NA> NA tent 76269c17-78d6-420b-991a-aa38c51b45b7 303.
# At company level, reflects missing benchmark
out_tilt_indicator |> unnest_company()
#> # A tibble: 24 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 antimonarchy_canine all high 1
#> 2 antimonarchy_canine all medium 0
#> 3 antimonarchy_canine all low 0
#> 4 antimonarchy_canine all <NA> 0
#> 5 antimonarchy_canine isic_4digit high 0
#> 6 antimonarchy_canine isic_4digit medium 0
#> 7 antimonarchy_canine isic_4digit low 0
#> 8 antimonarchy_canine isic_4digit <NA> 1
#> 9 antimonarchy_canine tilt_sector high 1
#> 10 antimonarchy_canine tilt_sector medium 0
#> # ℹ 14 more rows
# Zoom into the missing benchmark
out_tilt_indicator |>
unnest_company() |>
filter(grepl("isic", grouped_by))
#> # A tibble: 8 × 4
#> companies_id grouped_by risk_category value
#> <chr> <chr> <chr> <dbl>
#> 1 antimonarchy_canine isic_4digit high 0
#> 2 antimonarchy_canine isic_4digit medium 0
#> 3 antimonarchy_canine isic_4digit low 0
#> 4 antimonarchy_canine isic_4digit <NA> 1
#> 5 antimonarchy_canine unit_isic_4digit high 0
#> 6 antimonarchy_canine unit_isic_4digit medium 0
#> 7 antimonarchy_canine unit_isic_4digit low 0
#> 8 antimonarchy_canine unit_isic_4digit <NA> 1
out_tilt_indicator_after <- profile_emissions(
companies,
co2,
europages_companies,
ecoinvent_activities,
ecoinvent_europages,
isic_name
)
#> ℹ Adding 55% and 92% noise to `co2e_lower` and `co2e_upper`, respectively.
# Now examine the the behaviour of tiltIndicator
# ANSWER: At product level, does NOT preserve missing benchmark as you expect???
out_tilt_indicator_after |> unnest_product()
#> # A tibble: 6 × 25
#> companies_id company_name country emission_profile benchmark ep_product matched_activity_name matched_reference_product co2e_lower co2e_upper unit multi_match matching_certainty matching_certainty_company_average tilt_sector tilt_subsector isic_4digit isic_4digit_name company_city postcode address main_activity activity_uuid_product…¹ profile_ranking ei_geography
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 antimonarchy_canine antimonarchy_canine germany high all tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected -16.5 451. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… 1 tilt_world
#> 2 antimonarchy_canine antimonarchy_canine germany <NA> <NA> tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected 52.8 542. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… NA tilt_world
#> 3 antimonarchy_canine antimonarchy_canine germany high tilt_sector tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected 274. 679. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… 1 tilt_world
#> 4 antimonarchy_canine antimonarchy_canine germany high unit tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected -303. 405. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… 1 tilt_world
#> 5 antimonarchy_canine antimonarchy_canine germany <NA> <NA> tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected 88.9 640. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… NA tilt_world
#> 6 antimonarchy_canine antimonarchy_canine germany high unit_tilt_sector tent market for shed, large, wood, non-insulated, fire-unprotected shed, large, wood, non-insulated, fire-unprotected 90.2 780. m2 FALSE low low construction construction residential <NA> <NA> hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 76269c17-78d6-420b-991… 1 tilt_world
#> # ℹ abbreviated name: ¹activity_uuid_product_uuid
# Zoom into the missing benchmark
out_tilt_indicator_after |>
unnest_product() |>
filter(grepl("isic", benchmark))
#> # A tibble: 0 × 25
#> # ℹ 25 variables: companies_id <chr>, company_name <chr>, country <chr>, emission_profile <chr>, benchmark <chr>, ep_product <chr>, matched_activity_name <chr>, matched_reference_product <chr>, co2e_lower <dbl>, co2e_upper <dbl>, unit <chr>, multi_match <lgl>, matching_certainty <chr>, matching_certainty_company_average <chr>, tilt_sector <chr>, tilt_subsector <chr>, isic_4digit <chr>, isic_4digit_name <chr>, company_city <chr>, postcode <chr>, address <chr>, main_activity <chr>,
#> # activity_uuid_product_uuid <chr>, profile_ranking <dbl>, ei_geography <chr>
# ANSWER: At company level, reflects missing benchmark???
out_tilt_indicator_after |> unnest_company()
#> # A tibble: 24 × 14
#> companies_id company_name country emission_profile_share emission_profile benchmark co2e_lower co2e_upper matching_certainty_company_average company_city postcode address main_activity profile_ranking_avg
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 antimonarchy_canine antimonarchy_canine germany 1 high all -16.5 451. low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> 2 antimonarchy_canine antimonarchy_canine germany 0 medium all NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> 3 antimonarchy_canine antimonarchy_canine germany 0 low all NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> 4 antimonarchy_canine antimonarchy_canine germany 0 <NA> all NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> 5 antimonarchy_canine antimonarchy_canine germany 0 high isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 6 antimonarchy_canine antimonarchy_canine germany 0 medium isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 7 antimonarchy_canine antimonarchy_canine germany 0 low isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 8 antimonarchy_canine antimonarchy_canine germany 1 <NA> isic_4digit 52.8 542. low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 9 antimonarchy_canine antimonarchy_canine germany 1 high tilt_sector 274. 679. low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> 10 antimonarchy_canine antimonarchy_canine germany 0 medium tilt_sector NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor 1
#> # ℹ 14 more rows
# Zoom into the missing benchmark
out_tilt_indicator_after |>
unnest_company() |>
filter(grepl("isic", benchmark))
#> # A tibble: 8 × 14
#> companies_id company_name country emission_profile_share emission_profile benchmark co2e_lower co2e_upper matching_certainty_company_average company_city postcode address main_activity profile_ranking_avg
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 antimonarchy_canine antimonarchy_canine germany 0 high isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 2 antimonarchy_canine antimonarchy_canine germany 0 medium isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 3 antimonarchy_canine antimonarchy_canine germany 0 low isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 4 antimonarchy_canine antimonarchy_canine germany 1 <NA> isic_4digit 52.8 542. low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 5 antimonarchy_canine antimonarchy_canine germany 0 high unit_isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 6 antimonarchy_canine antimonarchy_canine germany 0 medium unit_isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 7 antimonarchy_canine antimonarchy_canine germany 0 low unit_isic_4digit NA NA low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN
#> 8 antimonarchy_canine antimonarchy_canine germany 1 <NA> unit_isic_4digit 88.9 640. low hamburg 22525 schnackenburgallee 217-223 | 22525 hamburg distributor NaN As much as possible my exploration is with toy data. This allows me to explore the bug most generally, and to share the reprex with you publicly here. If there is something to fix, then I can easily turn my reprex into a test. If I fail to find a bug, then we may want to confirm with the real data -- but that's harder to do and harder to discuss in a public thread. -- |
Hey @maurolepore thanks for the reprex. Shows clearly that how tiltIndicatorAfter handles missing benchmarks on product-level in emission_profile() right now, is not how it should be. Alternative option was, as discussed, to run tiltIndicator with emission_profile(). Could you try that out? We "only" need German companies. If running that works, then this would be the best alternative option. If this doesn't work, we need to see which of the following options is the third-best:
Thank you! |
@Tilmon the fix in tiltIndicatorAfter seems pretty straightforward. I"ll follow up there and request your conceptual review and Kalash's technical review -- shortly. |
@Tilmon and @kalashsinghal this PR is now pending your conceptual and technical review, respectively. What you've seen so far suggests you're happy with the fix so while I wait for the formal review I used the code in that PR to generate new output. The file 06-emissions.profile.zip is now attached to tiltOtput-v0.0.0.9002. |
@AnneSchoenauer @Tilmon @maurolepore I have updated the final Bundesbank product and company level data after preserving missing benchmarks in this folder. @AnneSchoenauer If you want to see the missing benchmarks then should I also add the benchmarks |
@kalashsinghal I cannot see any new data in the folder you shared - did you remove them again? All 3 files I see are from March 18th. |
Hey @kalashsinghal , the product-level results to not seem to preserve unmatched products on product-level. Checking for companies_id "alakus-naturdarme-gmbh_00000004935321-001", I don't see any clustered without match, even though the company has many unmatched products, as you yourself pointed out with the screenshot here in #153 (comment). |
@Tilmon I am filtering the final results by benchmarks "unit" and "unit_tilt_sector", and the unmatched products have the benchmark "NA". That's the reason why unmatched products get excluded from the final result (as you can see from the image) Should I also add data of "NA" benchmarks to the product level output? |
Aaaah. Yes, in that case please include benchmark "NA" to keep unmatched products. Thanks! cc' @AnneSchoenauer |
@Tilmon I have updated the results again with "NA" benchmarks. Please check again :) |
@kalashsinghal I don't see unmathced products for the respective company "alakus-naturdarme-gmbh_00000004935321-001". Am I missing anything or what's the issue? |
Hello everyone, Just letting you know that I double checked the Bundesbank data and sent it to the Bundesbank this morning. Well done everyone for finishing this very intensive process! From my side we could close this issue but don’t want to mess up things :) |
Great, I'll go ahead and close. I see some related follow up tasks (see "TODO AFTER DELIVERY" in the first comment) but each has its own issue to remind us of what needs to be done. |
Task list:
lower_bound
andhigher_bound
#179profile_emissions()
can optionally addco2_footprint
#183profile_emissios()
can optionally addco2_avg
#185tiltWorkflow::profile*()
are now sensitive totiltIndicatorAfter_options
tiltWorkflows#135profile_emissions()
now preserves unmatched products #193europages_companies
for "germany" yields unexpectedNA
s tiltWorkflows#145TODO AFTER DELIVERY
co2e_lower
andco2e_upper
#194co2e_lower
andco2e_upper
#188Hi @kalashsinghal,
Please find attached a Google Sheet which schows you which data I would need for the Bundesbank. I think it is easiest to start with the product information. Please note that Mauro still works on the lower_bound and higher_bound plus the NAs. However, I think you could already start however with creating the dataset and prepare them and then add the columns and row thereafter.
There is one new feature in the data namely having the averages of the products when aggregating it to the company level. It is the average_profile_rank, the average_reductions and the average_transition_risk_score. For this I will create some reprexes @kalashsinghal so that you know exactly on how to solve it. However, to create the reprex I would need your input @Tilmon. My problem is that if I multipy profile_rank and reduction targets I get the transition risk score for an individual product. If I now do this for each product I can calculate on company level the average profile rank and the average reduction targets of the company. Now the problem is if I multiple the average I get a different average transition risk score than I would get if I average the product based transition risk scores. Would be great if we could discuss this so that I can create a reprex for @kalashsinghal.
Thanks both of you!
The text was updated successfully, but these errors were encountered: