
Overview ticket: Review Output tables (status 22nd of December) #113

Closed
AnneSchoenauer opened this issue Jan 2, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@AnneSchoenauer

AnneSchoenauer commented Jan 2, 2024

This ticket is an overview ticket for the review that Tilman and I are doing for the output tables that @SKruthoff and @ysherstyuk Yana created on the 22nd of December. The output tables can be found here: https://drive.google.com/drive/u/0/folders/1uaYeEIiwAcJkNvG5oFMIVvlqrG5PrR_9.

@Tilmon please add your review here.
@ysherstyuk and @SKruthoff this is only FYI.
@maurolepore and @kalashsinghal there might be tasks coming out of this overview ticket which will then be assigned to you.

I created this ticket in the tiltIndicatorAfter package because I assume that the results produced are correct, but there might be small things that we want to change in the output tables (so more a "view" problem than an actual "code" problem). If we find mistakes that are code related, we will create tickets in the corresponding packages, but I think and hope that this is not the case anyway ;).

@AnneSchoenauer
Author

AnneSchoenauer commented Jan 2, 2024

Anne's review (preliminary):

0. Overall

  • 0.1 Each of the output files has a row_id. What is this row_id for?
  • 0.2 I think we should think about simplifying the output. There is a lot of duplicated information, and it might be great to have only one file, similar to this ING file here: https://docs.google.com/spreadsheets/d/1rRUBpeXfj1w-paAX0Cono7RslZX_kzoujYE1E7XdnaM/edit#gid=1842238901 @Tilmon what do you think?
  • 0.3 For the company view, I noticed that for econometric analysis we need a wide view rather than a long one. Could we create both? I also think that some plots need the wide view, so it may be good to provide the view that is most compatible with further analysis. @Tilmon what do you think?
  • 0.4 Do we want to have the Transition Risk Score somewhere in the data, as a combination of the emission profile and the sector profile?

1. Emission profile product level:

  • 1.1 Adding Co2e_lower and Co2e_upper in the output table. This refers to this ticket here. So I think the variables created with the jitter function are already in there but not in the output file. @Tilmon are the names okay for this variable?
  • 1.2 Renaming PCTR_risk_category in emission_profile. @Tilmon is the name okay?
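
The renames proposed here could be kept in a single lookup and applied to every emission profile table. This is only a sketch with dplyr: the toy table and its `companies_id` and `ep_product` columns are assumptions for illustration, and only the old and new column names come from this review.

```r
library(dplyr)

# Proposed renames for the emission profile outputs, written as new = "old".
emission_profile_renames <- c(
  emission_profile = "PCTR_risk_category",
  emission_profile_share = "PCTR_share"
)

# Toy stand-in for the product-level output (hypothetical rows and columns).
emission_profile_product <- tibble(
  companies_id = c("a", "a", "b"),
  ep_product = c("bolt", "nut", "coffee"),
  PCTR_risk_category = c("high", "low", "medium")
)

# any_of() skips names that are absent, so the same lookup also works for
# tables that do have a PCTR_share column (e.g. the company-level output).
renamed <- emission_profile_product %>%
  rename(any_of(emission_profile_renames))

names(renamed)
```

Keeping the mapping in one place would make it easy to change the names again once the naming discussion is settled.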

2. Emission profile company level:

  • 2.1 @Tilmon We would need to decide if we do want to have a main_tilt_sector and main_tilt_subsector
  • 2.2 Renaming PCTR_risk_category in emission_profile. @Tilmon is the name okay?
  • 2.3 Renaming PCTR_share in emission_profile_share. @Tilmon is the name okay?
  • 2.4 I know we decided that if a company produces a product that cannot be matched, we exclude it from the analysis. However, if you now look at the emission_profile results, they only hold for the products that we were able to match with ecoinvent. For transparency, I think we also need an NA section here. Especially because of this ticket here, we will have an NA in the emission_profile anyway. @Tilmon please let's discuss and then write a ticket here. The same also holds for the sector profiles.
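
To illustrate the NA section suggested in 2.4, here is a minimal sketch of company-level shares that keep unmatched products as their own NA category. The data and the column names (`companies_id`, `ep_product`, `emission_profile`, `emission_profile_share`) are assumptions based on the names discussed in this ticket, not the actual package code.

```r
library(dplyr)

# Hypothetical product-level rows: company "a" has one unmatched product (NA).
products <- tibble(
  companies_id = c("a", "a", "a", "b"),
  ep_product = c("bolt", "nut", "widget", "coffee"),
  emission_profile = c("high", "low", NA, "medium")
)

# count() keeps NA as a separate group, so unmatched products stay in the
# denominator and show up as an explicit NA share.
shares <- products %>%
  count(companies_id, emission_profile, name = "n_products") %>%
  group_by(companies_id) %>%
  mutate(emission_profile_share = n_products / sum(n_products)) %>%
  ungroup()

shares
```

With this approach the shares per company still sum to 1, and a reader can see how much of a company's product portfolio the categories actually cover.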

3. Emission profile upstream product level:

  • 3.1 Empty file
  • 3.2 But for sure: Rename 'ISTR_risk_category' into 'emission_upstream_profile' @Tilmon this is a very long name. Let's discuss the naming please!
  • 3.3 Adding GEO column.

4. Emission profile upstream company level:

  • 4.1 Rename 'ISTR_share' into 'emission_upstream_profile_share' @Tilmon this is a very long name. Let's discuss the naming please!
  • 4.2 Rename 'ISTR_risk_category' into 'emission_upstream_profile' @Tilmon this is a very long name. Let's discuss the naming please!

Sector profile product level:

  • Rename 'PSTR_risk_category' into 'sector_profile' @Tilmon Let's discuss the naming please!
  • Rename 'sector' and 'subsector' to 'sector_scenario' and 'subsector_scenario' @Tilmon this is a very long name. Let's discuss the naming please!
  • Rename 'profile_ranking' into 'SERT' @Tilmon Let's discuss the naming please!

Sector profile company level:

  • Rename 'PSTR_risk_category' into 'sector_profile' @Tilmon Let's discuss the naming please!
  • Rename 'PSTR_share' into 'sector_profile_share' @Tilmon Let's discuss the naming please!

Sector profile upstream product level:

  • Adding GEO column.
  • I noticed that some of the inputs are actually not inputs but outputs, for example biowaste. Shall we exclude them, @Tilmon?
  • I noticed that some ep_products were matched to the same matched_activity_name. The results are therefore counted twice. For example, see company_id 'wamic-gravur-lasertechnik-eu_00000005202642-001'. Is this okay? What do you think @Tilmon
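
One way to see how often several ep_products of a company share the same matched_activity_name is sketched below. The rows are invented for illustration. The column names follow the identifiers used in this ticket, with `companies_id` assumed as the company key.

```r
library(dplyr)

# Hypothetical upstream product-level rows.
upstream <- tibble(
  companies_id = c("x", "x", "x", "y"),
  ep_product = c("laser engraving", "engraving", "signs", "coffee"),
  matched_activity_name = c(
    "engraving service", "engraving service",
    "sign production", "coffee production"
  )
)

# Per company and matched activity, count how many distinct ep_products map to it
# and keep only the activities that are hit more than once.
shared_matches <- upstream %>%
  distinct(companies_id, ep_product, matched_activity_name) %>%
  group_by(companies_id, matched_activity_name) %>%
  summarise(n_ep_products = n_distinct(ep_product), .groups = "drop") %>%
  filter(n_ep_products > 1)

shared_matches
```

A table like this would show how widespread the double counting is before deciding whether it needs to be handled.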

Sector profile upstream company level:

  • Rename 'ISTR_risk_category' into 'sector_profile_upstream' @Tilmon Let's discuss the naming please!
  • Rename 'ISTR_share' into 'sector_profile_upstream_share' @Tilmon Let's discuss the naming please!

@Tilmon
Collaborator

Tilmon commented Jan 3, 2024

Overall

  1. Duplicates: running dplyr::distinct() on the datasets emission_profile_company.csv, emission_profile_product.csv, and emission_profile_upstream_at_company_level.csv shows that all 3 datasets contain duplicates. I only tested these 3. All datasets should be tested for duplicates, and duplicates should be avoided. E.g. the companies_id "adolf-wurth-gmbh-co-kg_00000004971238-001" has all rows twice in emission_profile_product.csv.
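
A check along these lines can be sketched as follows. The toy table stands in for emission_profile_product.csv, and its columns are assumptions for illustration. Only dplyr::distinct() is taken from the test described above.

```r
library(dplyr)

# Toy stand-in for emission_profile_product.csv: company "a" has all rows twice.
emission_profile_product <- tibble(
  companies_id = c("a", "a", "b", "a", "a"),
  ep_product = c("bolt", "nut", "coffee", "bolt", "nut"),
  PCTR_risk_category = c("high", "low", "medium", "high", "low")
)

# Number of surplus rows: total rows minus fully distinct rows.
n_duplicated <- nrow(emission_profile_product) -
  nrow(distinct(emission_profile_product))

# Which rows occur more than once, and how often.
duplicated_rows <- emission_profile_product %>%
  count(across(everything()), name = "n_copies") %>%
  filter(n_copies > 1)

n_duplicated
duplicated_rows
```

The same two steps could be run over every output file, for example in a loop over the files read from the output folder.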

  2. Re simplification & wide vs long format

I think we should think about simplifying it. There are a lot of double information and maybe would be great to have only one file similar to this ING here: https://docs.google.com/spreadsheets/d/1rRUBpeXfj1w-paAX0Cono7RslZX_kzoujYE1E7XdnaM/edit#gid=1842238901 @Tilmon what do you think?

@AnneSchoenauer I think the downside of the link you shared is that it would require making separate columns for each benchmark (i.e. scenarios for sector_profile and the other benchmarks for emission_profile). Right now, we have it all in long format instead of wide format. This already relates to your comment here:

For the company view - I noticed that it is super important when doing econometrics that we need not a long but a wide view. Could we create both? @Tilmon what do you think? I also think that for some plots we need wide view so maybe good to have the view that is best compatible for further analysis? @Tilmon what do you think?

@AnneSchoenauer Providing both to banks might be a bit of an overkill BUT we could provide code to modify datasets to wide format AND join all company-level results together and all product-level results? I assume that's possible and would also solve the "simplifying" question you raised above.
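
As a rough sketch of what such code could look like, the example below pivots two long company-level tables to wide format and joins them on the company key. All rows, the `benchmark` and `scenario` columns, and their values are assumptions for illustration. The real outputs may be structured differently.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format company-level results.
emission_long <- tibble(
  companies_id = c("a", "a", "a", "a"),
  benchmark = c("all", "all", "unit", "unit"),
  emission_profile = c("high", "low", "high", "low"),
  emission_profile_share = c(0.25, 0.75, 0.5, 0.5)
)
sector_long <- tibble(
  companies_id = c("a", "a"),
  scenario = c("ipr_2030", "ipr_2030"),
  sector_profile = c("high", "low"),
  sector_profile_share = c(0.4, 0.6)
)

# One column per benchmark and category, one row per company.
emission_wide <- emission_long %>%
  pivot_wider(
    id_cols = companies_id,
    names_from = c(benchmark, emission_profile),
    values_from = emission_profile_share,
    names_prefix = "emission_profile_share_"
  )

sector_wide <- sector_long %>%
  pivot_wider(
    id_cols = companies_id,
    names_from = c(scenario, sector_profile),
    values_from = sector_profile_share,
    names_prefix = "sector_profile_share_"
  )

# Join all company-level results into a single wide table.
company_wide <- full_join(emission_wide, sector_wide, by = "companies_id")

names(company_wide)
```

The long files would stay the delivered format, and users who need a wide table for econometric work or plotting could run a helper like this themselves.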

3. Transition Risk Score

Do we want to have the Transition Risk Score some where in the data? As a combination from the emission profile and sector profile?

YES

Emission Profile
@AnneSchoenauer both your suggestions are fine for me.

Emission profile company level:
ALL OK FOR ME. Regarding main_sectors: I think our current data don't allow for that, no? Maybe good to discuss, but would think that if we aim to use other data sources than Europages in the long-term that we maybe do not need to invest time in that right now, because eventually, another data source will solve the problem?

Emission profile upstream product level:
OK

Renaming in

  • Emission profile upstream company level: OK
  • Sector profile product level: OK
  • Sector profile company level: OK
  • Sector profile upstream company level: OK

Sector profile upstream product level:

Adding GEO column.

OK

I noticed that some of the inputs are actually not inputs but outputs - for example biowaste. Shall we exclude them @Tilmon

@AnneSchoenauer can you share the specific example? Not entirely clear to me from your description :)

I noticed that some ep_products were matched to the same matched_activity_name. The results are therefore counted twice. For example, see for company_id 'wamic-gravur-lasertechnik-eu_00000005202642-001' Is this okay? What do you think @Tilmon

I would say yes, that's OK. The matching is never perfect, always only a proxy. I think it's consistent if we stick to the number of ep_products, even if they are matched to the same ecoinvent product.

@AnneSchoenauer
Author

With regard to an example of biowaste, please see here. In general, the problem exists because we have a Life CYCLE assessment, i.e. also downstream and not only upstream information. What do you think?

One example is the company ihab-serour_00000005050260-001 (you can see it in sector_profile_upstream_at_product_level), which produces coffee bean, green. One "input" product, as we call it, is biowaste. But biowaste is not an input product but rather an output (downstream).

@AnneSchoenauer
Author

@Tilmon I created a separate ticket for this to discuss it elsewhere and to be able to close this ticket here.

@AnneSchoenauer AnneSchoenauer added the documentation Improvements or additions to documentation label Jan 4, 2024
@AnneSchoenauer
Author

All tickets have been created, so I am closing the issue.


2 participants