Do we need to redesign the top-level interface and/or the input datasets? #541

maurolepore · 2023-09-26T23:14:42Z

Relates to https://github.com/2DegreesInvesting/TiltDevProjectMGMT/milestone/15

@AnneSchoenauer and @kalashsinghal, I would appreciate your opinion on an inconsistency I see in where the "isic" and "sector" columns come from. This is not urgent, but resolving it could eventually make our code more user-friendly and easier to maintain.

--

An important goal of the "Integration" milestone is to add the `*isic_4digit` and `tilt_*sector` columns to the outputs of tiltIndicatorAfter. To do that, we need to identify which inputs to tiltIndicator carry those columns.
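Identifying which inputs carry those columns amounts to a column-name search per dataset. Below is a minimal base-R sketch of that check. `sector_columns()` is a hypothetical helper, not part of tiltIndicator, and the datasets are zero-row stand-ins that only carry column names printed in the reprex further down; `placeholder` is an invented column name.

```r
# Hypothetical helper, not part of tiltIndicator: for each named dataset,
# return the column names matching "isic" or "sector". Like
# `select(matches(c("isic", "sector")))` in the reprex, matching ignores case.
sector_columns <- function(datasets, pattern = "isic|sector") {
  lapply(datasets, function(data) {
    grep(pattern, names(data), ignore.case = TRUE, value = TRUE)
  })
}

# Zero-row stand-ins carrying only column names printed in the reprex;
# the real toy datasets have more columns and actual rows.
sector_profile_companies <- data.frame(
  isic_4digit = numeric(), tilt_sector = character(),
  tilt_subsector = character(), sector = character(), subsector = character()
)
sector_profile_upstream_co2 <- data.frame(
  input_isic_4digit = numeric(), input_tilt_sector = character(),
  input_tilt_subsector = character(), sector = character(),
  subsector = character()
)
emissions_profile_co2 <- data.frame(
  isic_4digit = numeric(), tilt_sector = character(),
  tilt_subsector = character()
)
# Hypothetical placeholder column: the reprex only shows that this dataset
# has no isic/sector columns, not what its other columns are.
emissions_profile_companies <- data.frame(placeholder = character())

found <- sector_columns(list(
  sector_profile_companies = sector_profile_companies,
  sector_profile_upstream_co2 = sector_profile_upstream_co2,
  emissions_profile_co2 = emissions_profile_co2,
  emissions_profile_companies = emissions_profile_companies
))
str(found)
```

Because only column names are inspected, swapping the stand-ins for the datasets read with `read_csv()` in the reprex should yield the same lists of names.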

Here I find that the source of those columns is a little messy. In general, they come from the products or inputs dataset (collectively called `co2`), but the mess is this:

- The `co2` dataset is sometimes the second argument (e.g. `emissions_profile(companies, co2)`) and sometimes the third (e.g. `sector_profile_upstream(companies, scenarios, co2)`).
- `sector_profile()` has no `co2` argument at all, so those columns come from `companies` (e.g. `sector_profile(companies, scenarios)`).

This apparent inconsistency may be conceptually unavoidable, but it might also signal that we could improve the design of the tiltIndicator functions and/or the datasets they take, so that the package is easier to use and maintain.

Here is the code from which I derive these insights.

library(dplyr, warn.conflicts = FALSE)
library(readr, warn.conflicts = FALSE)
library(tiltToyData)
library(tiltIndicator)

options(readr.show_col_types = FALSE)

# Sector profile
companies <- read_csv(toy_sector_profile_companies())
scenarios <- read_csv(toy_sector_profile_any_scenarios())
output <- sector_profile(companies, scenarios)
# `companies`
companies |> select(matches(c("isic", "sector")))
#> # A tibble: 28 × 5
#>    isic_4digit tilt_sector           tilt_subsector         sector     subsector
#>          <dbl> <chr>                 <chr>                  <chr>      <chr>    
#>  1        2410 <NA>                  <NA>                   industry   iron and…
#>  2        2410 <NA>                  <NA>                   total      iron and…
#>  3        2029 <NA>                  <NA>                   industry   chemicals
#>  4        2029 <NA>                  <NA>                   total      chemicals
#>  5          NA energy                bioenergy & waste      total      energy   
#>  6          NA energy                bioenergy & waste      bioenergy… total en…
#>  7          NA transportation        transportation         transport  other tr…
#>  8          NA transportation        transportation         total      transport
#>  9          NA construction industry construction buildings buildings  <NA>     
#> 10          NA construction industry construction buildings total      buildings
#> # ℹ 18 more rows
scenarios |> select(matches(c("isic", "sector")))
#> # A tibble: 388 × 2
#>    sector    subsector            
#>    <chr>     <chr>                
#>  1 power     <NA>                 
#>  2 power     <NA>                 
#>  3 buildings <NA>                 
#>  4 buildings <NA>                 
#>  5 industry  iron and steel       
#>  6 industry  iron and steel       
#>  7 industry  non-metallic minerals
#>  8 industry  non-metallic minerals
#>  9 industry  chemicals            
#> 10 industry  chemicals            
#> # ℹ 378 more rows
output |> unnest_product() |> select(matches(c("isic", "sector")))
#> # A tibble: 196 × 2
#>    tilt_sector tilt_subsector
#>    <chr>       <chr>         
#>  1 <NA>        <NA>          
#>  2 <NA>        <NA>          
#>  3 <NA>        <NA>          
#>  4 <NA>        <NA>          
#>  5 <NA>        <NA>          
#>  6 <NA>        <NA>          
#>  7 <NA>        <NA>          
#>  8 <NA>        <NA>          
#>  9 <NA>        <NA>          
#> 10 <NA>        <NA>          
#> # ℹ 186 more rows
output |> unnest_company() |> select(matches(c("isic", "sector")))
#> # A tibble: 588 × 0

# Upstream
companies <- read_csv(toy_sector_profile_upstream_companies())
scenarios <- read_csv(toy_sector_profile_any_scenarios())
co2 <- read_csv(toy_sector_profile_upstream_products())
output <- sector_profile_upstream(companies, scenarios, co2)
companies |> select(matches(c("isic", "sector")))
#> # A tibble: 8 × 1
#>   tilt_sector
#>   <chr>      
#> 1 energy     
#> 2 energy     
#> 3 energy     
#> 4 <NA>       
#> 5 land use   
#> 6 land use   
#> 7 land use   
#> 8 land use
scenarios |> select(matches(c("isic", "sector")))
#> # A tibble: 388 × 2
#>    sector    subsector            
#>    <chr>     <chr>                
#>  1 power     <NA>                 
#>  2 power     <NA>                 
#>  3 buildings <NA>                 
#>  4 buildings <NA>                 
#>  5 industry  iron and steel       
#>  6 industry  iron and steel       
#>  7 industry  non-metallic minerals
#>  8 industry  non-metallic minerals
#>  9 industry  chemicals            
#> 10 industry  chemicals            
#> # ℹ 378 more rows
# `co2`
co2 |> select(matches(c("isic", "sector")))
#> # A tibble: 74 × 5
#>    input_isic_4digit input_tilt_sector     input_tilt_subsector sector subsector
#>                <dbl> <chr>                 <chr>                <chr>  <chr>    
#>  1              3821 non-metallic minerals raw minerals         no_ma… no_match 
#>  2              3821 non-metallic minerals raw minerals         bioen… total en…
#>  3              2011 non-metallic minerals raw minerals         indus… chemicals
#>  4              2011 non-metallic minerals raw minerals         total  chemicals
#>  5              1201 non-metallic minerals raw minerals         indus… chemicals
#>  6              1201 non-metallic minerals raw minerals         total  chemicals
#>  7              4141 non-metallic minerals raw minerals         land … <NA>     
#>  8              4141 non-metallic minerals raw minerals         no_ma… no_match 
#>  9              1050 non-metallic minerals raw minerals         indus… other in…
#> 10              1050 non-metallic minerals raw minerals         total  industry 
#> # ℹ 64 more rows
output |> unnest_product() |> select(matches(c("isic", "sector")))
#> # A tibble: 704 × 3
#>    tilt_sector input_tilt_sector     input_tilt_subsector
#>    <chr>       <chr>                 <chr>               
#>  1 energy      non-metallic minerals raw minerals        
#>  2 energy      non-metallic minerals raw minerals        
#>  3 energy      non-metallic minerals raw minerals        
#>  4 energy      non-metallic minerals raw minerals        
#>  5 energy      non-metallic minerals raw minerals        
#>  6 energy      non-metallic minerals raw minerals        
#>  7 energy      non-metallic minerals raw minerals        
#>  8 energy      non-metallic minerals raw minerals        
#>  9 energy      non-metallic minerals raw minerals        
#> 10 energy      non-metallic minerals raw minerals        
#> # ℹ 694 more rows
output |> unnest_company() |> select(matches(c("isic", "sector")))
#> # A tibble: 294 × 0

# Emissions profile
companies <- read_csv(toy_emissions_profile_any_companies())
co2 <- read_csv(toy_emissions_profile_products())
output <- emissions_profile(companies, co2)
companies |> select(matches(c("isic", "sector")))
#> # A tibble: 9 × 0
# co2
co2 |> select(matches(c("isic", "sector")))
#> # A tibble: 5 × 3
#>   isic_4digit tilt_sector    tilt_subsector
#>         <dbl> <chr>          <chr>         
#> 1        2560 Industry       Other         
#> 2        2560 Industry       Other         
#> 3        2870 Steel & Metals Steel         
#> 4        1780 Agriculture    Agriculture   
#> 5        2679 Industry       Other
output |> unnest_product() |> select(matches(c("isic", "sector")))
#> # A tibble: 49 × 0
output |> unnest_company() |> select(matches(c("isic", "sector")))
#> # A tibble: 129 × 0

# Upstream
companies <- read_csv(toy_emissions_profile_any_companies())
co2 <- read_csv(toy_emissions_profile_upstream_products())
output <- emissions_profile_upstream(companies, co2)
companies |> select(matches(c("isic", "sector")))
#> # A tibble: 9 × 0
# co2
co2 |> select(matches(c("isic", "sector")))
#> # A tibble: 33 × 3
#>    input_isic_4digit input_tilt_sector input_tilt_subsector
#>                <dbl> <chr>             <chr>               
#>  1              2560 Inudstry          Other               
#>  2              2560 Inudstry          Other               
#>  3              2560 Inudstry          Other               
#>  4              2560 Inudstry          Other               
#>  5              2560 Inudstry          Other               
#>  6              2560 Inudstry          Other               
#>  7              2560 Inudstry          Other               
#>  8              2560 Inudstry          Other               
#>  9              2560 Inudstry          Other               
#> 10              2560 Inudstry          Other               
#> # ℹ 23 more rows
output |> unnest_product() |> select(matches(c("isic", "sector")))
#> # A tibble: 319 × 0
output |> unnest_company() |> select(matches(c("isic", "sector")))
#> # A tibble: 129 × 0

Created on 2023-09-26 with reprex v2.0.2

AnneSchoenauer (Maintainer) · 2023-09-28T05:54:03Z

Dear @maurolepore,

There are two ways to arrive at the `tilt_subsector` column. One is via the mapping between europages products and Ecoinvent products: this mapping gives us the 4-digit ISIC code, and we then map each ISIC code to its tilt_subsector. The other is via sector resolving with GPT: here we take the sector categories from the companies, and because each company can appear in several web-scraped categories, we use GPT to choose the right tilt_subsector.

Long story short: there are two ways to get the tilt_subsector, and this is intended by design.
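The first route (ISIC code to tilt subsector) is essentially a lookup join. Here is a minimal base-R sketch, assuming the europages-Ecoinvent mapping has already attached an ISIC code to each product. It is not the actual tilt pipeline: the product names and the unmatched code 4321 are hypothetical, while the ISIC/sector pairs are the toy values printed for `toy_emissions_profile_products()` in the reprex above. The GPT-based route is not sketched.

```r
# Step 1 (assumed done upstream): the europages-Ecoinvent mapping has given
# each product a 4-digit ISIC code. Product names are hypothetical.
products <- data.frame(
  product = c("product_a", "product_b", "product_c"),
  isic_4digit = c(2560, 2870, 4321),
  stringsAsFactors = FALSE
)

# Step 2: map each ISIC code to a tilt sector and subsector, using the toy
# pairs shown in the reprex output.
isic_to_tilt <- data.frame(
  isic_4digit = c(2560, 2870, 1780),
  tilt_sector = c("Industry", "Steel & Metals", "Agriculture"),
  tilt_subsector = c("Other", "Steel", "Agriculture"),
  stringsAsFactors = FALSE
)

# Left join: keep every product; codes missing from the lookup get NA.
mapped <- merge(products, isic_to_tilt, by = "isic_4digit", all.x = TRUE)
mapped <- mapped[order(mapped$product), ]
mapped

# Count missing sector values per column.
colSums(is.na(mapped[c("tilt_sector", "tilt_subsector")]))
```

The left join is one way NAs can surface in `tilt_sector` and `tilt_subsector`: any code absent from the lookup yields a row with missing sector values rather than being dropped.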

Regarding your note that the sector profiles don't have a `co2` input: because the sector profiles are not matched with Ecoinvent, they have no CO2 data point. So this should also be conceptually correct.

However, the code should not be messy, though I cannot judge the argument positions. Please also note that in the CSV files we got as output there are a lot of missing values (NAs) in the `tilt_sector` and `tilt_subsector` columns of the sector profiles. I am opening a new issue for this today. The problem could lie in the sector-resolving step or somewhere else; I will tag you and Kalash on it. In any case, there must be a bug somewhere.

Hope that already helps a bit.
Best
Anne
