error with tidyee_ob |>group_by(a,b,c) |> summarise(stat=stat) when grouping creates >n groups #29

Open
zackarno opened this issue Aug 19, 2022 · 1 comment
Labels
bug Something isn't working

Comments

zackarno commented Aug 19, 2022

There seems to be an issue when the grouping creates a very large number of groups to summarise the tidyee object over. This won't happen with typical group_by(year) or group_by(year, month) workflows, but it can happen if you include doy in the grouping. I have not figured out the limiting number of groups or the exact source of the problem, but the reprex below shows the issue: it gets past the first error message and on to a second one.

library(tidyrgee)

library(rgee)
ee_Initialize()
#> -- rgee 1.1.2.9000 ---------------------------------- earthengine-api 0.1.295 -- 
#>  v user: not_defined
#>  v Initializing Google Earth Engine: v Initializing Google Earth Engine:  DONE!
#> --------------------------------------------------------------------------------
ic <- ee$ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")
ic_tidy <- as_tidyee(ic)
ic_tidy
#> band names: [ NO2_column_number_density, tropospheric_NO2_column_number_density, stratospheric_NO2_column_number_density, NO2_slant_column_number_density, tropopause_pressure, absorbing_aerosol_index, cloud_fraction, sensor_altitude, sensor_azimuth_angle, sensor_zenith_angle, solar_azimuth_angle, solar_zenith_angle ] 
#> 
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 21,185 x 8
#>    id           time_start          syste~1 date       month  year   doy band_~2
#>    <chr>        <dttm>              <chr>   <date>     <dbl> <dbl> <dbl> <list> 
#>  1 COPERNICUS/~ 2018-06-28 10:45:42 201806~ 2018-06-28     6  2018   179 <chr>  
#>  2 COPERNICUS/~ 2018-06-28 12:27:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  3 COPERNICUS/~ 2018-06-28 14:52:09 201806~ 2018-06-28     6  2018   179 <chr>  
#>  4 COPERNICUS/~ 2018-06-28 15:50:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  5 COPERNICUS/~ 2018-06-28 17:31:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  6 COPERNICUS/~ 2018-06-28 19:13:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  7 COPERNICUS/~ 2018-06-28 20:54:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  8 COPERNICUS/~ 2018-06-28 22:36:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  9 COPERNICUS/~ 2018-06-29 00:17:40 201806~ 2018-06-29     6  2018   180 <chr>  
#> 10 COPERNICUS/~ 2018-06-29 01:59:11 201806~ 2018-06-29     6  2018   180 <chr>  
#> # ... with 21,175 more rows, and abbreviated variable names 1: system_index,
#> #   2: band_names
#> # i Use `print(n = ...)` to see more rows
#> 
#> attr(,"class")
#> [1] "tidyee"


# the L3_NO2 ic has multiple records per day, so I want to summarise by date (i.e. year, month, doy)
# there is a silent failure going on here
ic_summarised_daily <- ic_tidy |>
  group_by(year, month, doy) |>
  summarise(stat = "mean")

# this often happens with `rgee` and thus `tidyrgee`... it seems like the best
# way to check if the object has been created successfully is to try a `$getInfo` call

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): RecursionError: maximum recursion depth exceeded in comparison

# okay, a maximum recursion depth issue, which seems plausible: under the hood we are
# splitting the `vrt` and `ic` into thousands of groups. I can increase the recursion
# limit and see what happens (default is 1000)

sys <-  reticulate::import("sys")
sys$setrecursionlimit(as.integer(5000))

# let's run `$getInfo()` again with the recursion limit increased

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): ee.ee_exception.EEException: Collection.first: merge() is too deeply nested.

# we get a new error, which took a lot longer to appear than the first.

Created on 2022-08-19 by the reprex package (v2.0.1)
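As a rough check on the scale involved, the number of groups can be counted from the grouping columns before summarising. The sketch below uses a stand-in data frame rather than the real `ic_tidy$vrt`, and it assumes at least one image on every day between the first record shown above and the date of this report; `vrt_like`, `n_groups` and `exceeds_default` are illustrative names only. Whether the recursion depth scales one-to-one with the group count is also an assumption, not something confirmed here.

```r
# Stand-in for the grouping columns of `ic_tidy$vrt`. The real table has
# several rows (images) per day, which does not change the group count.
# ASSUMPTION: at least one image exists on every day in this date range.
dates <- seq(as.Date("2018-06-28"), as.Date("2022-08-19"), by = "day")
vrt_like <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  doy   = as.integer(format(dates, "%j"))
)

# One composite is built per distinct (year, month, doy) combination.
n_groups <- nrow(unique(vrt_like))

# Python's default recursion limit, as noted in the reprex.
default_recursion_limit <- 1000L
exceeds_default <- n_groups > default_recursion_limit
```

On the real object, the same count would come from `nrow(unique(ic_tidy$vrt[, c("year", "month", "doy")]))`.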

zackarno commented Aug 19, 2022

Something else I noticed - the issue seems to be occurring in this code (inside summarise_pixels)

tidyee_output <- .data |>
  group_split() |>
  purrr::map(
    ~ ee_composite(
      .x |>
        group_by(!!!rlang::syms(group_vars_chr)),
      stat = stat
    )
  ) |>
  bind_ics()

It seems to occur in the bind_ics() function rather than in ee_composite(): if you remove bind_ics() from the pipeline above, you can query the resulting list of composite ics with getInfo() without issues. For example:

tidyee_output <- .data |>
  group_split() |>
  purrr::map(
    ~ ee_composite(
      .x |>
        group_by(!!!rlang::syms(group_vars_chr)),
      stat = stat
    )
  ) # removed bind_ics

# no problem
 tidyee_output[[1]]$ee_ob$bandNames()$getInfo()

So the issue seems to arise when the ics are merged, which the second error message in the reprex above also suggests.
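One possible direction, sketched here as an untested idea: if bind_ics() combines the collections by chaining merge() calls (an assumption based only on the error message), a flat alternative is the common Earth Engine idiom of wrapping the list in a collection of collections and flattening it. `flat_bind_ics` below is a hypothetical helper, not an existing tidyrgee function, and it only handles the `ee_ob` side; the `vrt` tables would still need to be bound separately.

```r
# Hypothetical helper, not part of tidyrgee: combine many ImageCollections
# in one flat step rather than through a chain of merge() calls.
# `ic_list` is assumed to be an R list of ee$ImageCollection objects.
flat_bind_ics <- function(ic_list) {
  # Drop names so reticulate passes a Python list (not a dict), then wrap
  # the list in a collection of collections.
  nested <- rgee::ee$FeatureCollection(unname(ic_list))
  # Flatten one level and cast the result back to an ImageCollection.
  rgee::ee$ImageCollection(nested$flatten())
}
```

With the unbound list from the snippet above, this could be tried as `flat_bind_ics(lapply(tidyee_output, function(x) x$ee_ob))`, followed by the same `$first()$bandNames()$getInfo()` check. It needs an initialised Earth Engine session, and whether it avoids the nesting limit for thousands of groups has not been verified.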

@zackarno zackarno added the bug Something isn't working label Sep 1, 2022