Skip to content

[WIP] Proposed vector data format and application to LandIQ data #3423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 26 commits into
base: develop
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
b9aec30
add functions to convert from shp to gpkg, and from landiq to a 'stan…
Jan 25, 2025
f8744a4
renamed convert_landiq to landiq2gpkg. Now only takes shp as input. a…
Jan 25, 2025
8b092e5
split sf object into gis and tabular info, save as GPKG + CSV
Jan 25, 2025
d58a20b
separate generic shp2gpkg from landiq2std functions
Jan 26, 2025
5411d2f
update messages to pecan logger functions
Jan 26, 2025
b979b0e
fix escape \ --> \\
dlebauer Jan 27, 2025
bd6fc12
put shp2gpkg as step within landiq2std fn, as previously intended but…
Jan 27, 2025
75fc5a7
fixed a few bugs, changed to using latxlon as ids
Jan 27, 2025
2eb59ca
fixed a few bugs in landiq2std fn; re-added id field
dlebauer Jan 28, 2025
60c69ab
debugging landiq2std function
dlebauer Jan 29, 2025
39bb191
fleshed out get.attributes function and added extract_values from ras…
dlebauer Jan 30, 2025
3d5c62e
Update modules/data.land/R/landiq2std.R
dlebauer Feb 4, 2025
3675a31
stop --> logger.error
dlebauer Feb 4, 2025
893f31a
Merge branch 'shp2gpkg' of github.com:dlebauer/pecan into shp2gpkg
dlebauer Feb 4, 2025
368b2a4
update link to dwr data source
dlebauer Feb 5, 2025
5b37a87
change id --> site_id for clarity
dlebauer Mar 18, 2025
bf47efa
Merge branch 'shp2gpkg' of github.com:dlebauer/pecan into shp2gpkg
dlebauer Mar 18, 2025
883a215
removed importsfrom, use tempfile instead of input_gpkg for intermedi…
dlebauer Mar 18, 2025
ce2eb5e
use st_coordinates once instead of twice
dlebauer Mar 18, 2025
044cdae
Merge branch 'develop' of github.com:pecanproject/pecan into shp2gpkg
dlebauer Mar 21, 2025
c6cc497
implementing changes suggested in PR review
dlebauer Mar 21, 2025
18af482
update documentation
dlebauer Mar 22, 2025
7df7918
Merge branch 'PecanProject:develop' into shp2gpkg
dlebauer Apr 10, 2025
2b213e4
replace landiq pft mapping with table
dlebauer May 2, 2025
536d67b
Merge branch 'shp2gpkg' of github.com:dlebauer/pecan into shp2gpkg
dlebauer May 2, 2025
14c5ebb
updated landiq2std to write out in CA Albers espg:3310; added overwri…
dlebauer May 2, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 128 additions & 0 deletions modules/data.land/R/landiq2std.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
#' Convert a LandIQ Shapefile into a Standardized Format consisting of
#' a GeoPackage file to store geospatial information and an associated
#' CSV file with attributes
#'
#' This function reads a LandIQ crop map shapefile downloaded from
#' https://www.landiq.com/land-use-mapping and processes the data into
#' a standardized GeoPackage and CSV format.
#'
#' @param input_file Character. Path to the input Shapefile
#' (GeoPackage created by shp2gpkg is also valid)
#' @param output_gpkg Character. Path to the output GeoPackage
#' @param output_csv Character. Path to the output CSV.
#'
#' @return Invisibly returns a list with paths to the output files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the output paths ever differ from the ones specified as input? Might be more useful to return a success/failure status instead of information the user already has.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also think returning invisibly is more often confusing than useful, but side-effecting functions like this are a place it can work)

#' @details
#' This function:
#' - Reads a Shapefile using the `sf` package.
#' - Calculates centroids to extract latitude and longitude.
#' - Converts `Acres` column to hectares (`ha`).
#' - Extracts crop information and assigns a PFT (Plant Functional Type).
#' - Outputs a standardized GeoPackage with columns `id`, `geometry`, `lat`, `lon`, and `area_ha`.
#' - Outputs a CSV with columns `year`, `crop`, `pft`, `source`, and `notes`.
#'
#' Note: TODO provide crop-->PFT mapping as an external file using either
#' defaults stored with the PEcAn.data.land package or an optional custom
#' mapping
#'
#' @examples
#' # Specify input and output file paths
#' input_file <- "path/to/landiq.shp"
#' output_gpkg <- "path/to/your_output.gpkg"
#' output_csv <- "path/to/your_output.csv"
#'
#' # Run the conversion function
#' landiq2std(input_file, output_gpkg, output_csv)
#'
#' @importFrom sf st_read st_transform st_centroid st_coordinates st_write
#' @importFrom dplyr mutate select rename filter case_when distinct
#' @export
landiq2std <- function(input_file, output_gpkg, output_csv) {
# Check input file format
if (grepl(pattern = "\\.shp$", input_file)) {
PEcAn.logger::logger.info(
"Converting Shapefile:", basename(input_file),
"to GeoPackage:", basename(output_gpkg)
)
# read in, repair geometries, write out repaired geopackage
shp2gpkg(input_file, output_gpkg, overwrite = TRUE)
input_file <- output_gpkg # now gpkg is input file
}

# Read the Shapefile
landiq_polygons <- sf::st_read(input_file, quiet = FALSE)

# Determine crop year from column name
crop_col <- grep("Crop[0-9]{4}", colnames(landiq_polygons), value = TRUE)
year <- gsub("Crop", "", crop_col)
# Check required columns
required_cols <- c("Acres", crop_col, "Source", "Comments", "County", "geom")
missing_cols <- setdiff(required_cols, colnames(landiq_polygons))
if (length(missing_cols) > 0) {
stop("Input file is missing the following columns: ", paste(missing_cols, collapse = ", "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use logger instead of stop

}

# possible spedup by pre-computing lat and lon
landiq_polygons_updated <- landiq_polygons |>
sf::st_transform(4326) |>
dplyr::mutate(
year = year,
lon = sf::st_coordinates(sf::st_centroid(geom))[, "X"],
lat = sf::st_coordinates(sf::st_centroid(geom))[, "Y"],
area_ha = PEcAn.utils::ud_convert(Acres, "acre", "ha")
) |>
dplyr::rowwise() |>
dplyr::mutate( # generate ids rowwise separately because
# rowwise geospatial operations are very slow
id = digest::digest(geom, algo = "xxhash64")
) |>
dplyr::rename(county = County)

# Process data for GeoPackage
gpkg_data <- landiq_polygons_updated |>
dplyr::select(id, geom, lat, lon, area_ha, county)

# Process data for CSV
csv_data <- landiq_polygons_updated |>
tidyr::as_tibble() |>
dplyr::mutate(
crop = .data[[crop_col]],
pft = case_when(
crop %in% c(
"Cherries", "Almonds", "Plums, Prunes and Apricots",
"Walnuts", "Citrus", "Miscellaneous Deciduous", "Pears", "Olives",
"Apples", "Pistachios", "Bush Berries", "Peaches/Nectarines",
"Miscellaneous Subtropical Fruits", "Pomegranates"
) ~ "woody perennial crop",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another column I don't recognize -- The DWR shapefiles have alphanumeric codes (eg cherry = D3, almond = D12, etc), not cropnames. Am I missing where you merge in the lookup?

(If you have the codes available, I think the woody perennials will likely consist of all of C = "citrus and subtropical", D = "deciduous fruits and nuts", and V = "vineyards", plus T19 = "bush berries", and T28 = "blueberries")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I see your TODO above, but just to make sure: the mapping definitely needs to be made configurable before merging rather than left hardcoded into the function.

TRUE ~ paste0("no PFT for ", crop)
)
) |>
dplyr::rename(
source = Source,
notes = Comments
) |>
dplyr::select(id, lat, lon, year, crop, pft, source, notes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is a place I might do the renaming inside the select like ...select(..., source = Source, notes = Comments), but this works


# Warn about crops without a PFT
unassigned_pft <- csv_data |>
filter(grepl("no PFT for", pft)) |>
distinct(crop, pft)
if (nrow(unassigned_pft) > 0) {
PEcAn.logger::logger.warn(
"The following crops do not have a PFT assigned:",
paste(unassigned_pft$crop, collapse = ", ")
)
}

# Write outputs
file.remove(output_gpkg, output_csv)
sf::st_write(gpkg_data,
output_gpkg,
layer = "sites", # analogous to BETYdb table name
quiet = FALSE
)
readr::write_csv(csv_data, output_csv)

# Return paths to output files
invisible(list(GeoPackage = output_gpkg, CSV = output_csv))
}
84 changes: 84 additions & 0 deletions modules/data.land/R/shp2gpkg.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
#' Convert Shapefile to GeoPackage
#'
#' This function converts a Shapefile to a GeoPackage using the `sf` package.
#'
#' @param input_shp Character. Path to the input Shapefile (e.g., `"data/myfile.shp"`).
#' @param output_gpkg Character. Path to the output GeoPackage (e.g., `"output/myfile.gpkg"`).
#' @param layer_name Character. Optional. Name of the layer in the GeoPackage. Defaults to the base name of the Shapefile.
#' @param overwrite Logical. Whether to overwrite an existing layer in the GeoPackage. Defaults to `TRUE`.
#'
#' @return Invisibly returns the path to the GeoPackage (`output_gpkg`) upon successful conversion.
#' @details
#' This function reads a Shapefile, converts it to an `sf` object, and writes it to a GeoPackage.
#' The `sf` package handles the conversion and ensures spatial data integrity.
Comment on lines +11 to +13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I recommend moving details above the parameter listing and removing the @details tag -- that's the layout Roxygen favors and it treats paragraph 1 as the title, paragraph 2 as the description, and paragraphs 3-and-onward as details. Yes, it differs from the layout of an Rd page, but I find it makes it easier to find the parameters when reading in source form.

#'
#' @examples
#' # Convert 'roads.shp' to 'roads.gpkg' with the default layer name
#' shp2gpkg("data/roads.shp", "output/roads.gpkg")
#'
#' # Convert with a custom layer name
#' shp2gpkg("data/roads.shp", "output/roads.gpkg", layer_name = "custom_layer")
#'
#' # Prevent overwriting existing layers
#' shp2gpkg("data/roads.shp", "output/roads.gpkg", overwrite = FALSE)
#'
#' @importFrom sf st_read st_write st_is_valid st_make_valid
#' @export
shp2gpkg <- function(input_shp, output_gpkg, layer_name = NULL, overwrite = TRUE) {
# Load the Shapefile
shapefile <- sf::st_read(input_shp, quiet = TRUE)

# Check validity of geometries
Copy link
Member Author

@dlebauer dlebauer Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camalmborg this is the logic for reparing geometries. The general approach within pecan is to bring in external data and immediately converted to a common standard. shp2gpkg can be part of that workflow, as it is here.

is_geometry_valid <- sf::st_is_valid(shapefile)
total_geometries <- length(is_geometry_valid)
n_invalid <- sum(!is_geometry_valid)
percent_invalid <- (n_invalid / total_geometries) * 100

# Log invalid geometry statistics

PEcAn.logger::logger.info("Total records in shapefile: ", total_geometries)

if (n_invalid > 0) {
# Report invalid geometry statistics
PEcAn.logger::logger.info(
n_invalid,
"(", round(percent_invalid, 2), "%)",
"of geometries are invalid"
)
PEcAn.logger::logger.info(". Attempting to repair invalid geometries")

# Attempt to repair invalid geometries
repaired_shapefile <- sf::st_make_valid(shapefile)
is_repair_successful <- sf::st_is_valid(repaired_shapefile)
n_repaired <- n_invalid - sum(!is_repair_successful)
n_remaining_invalid <- sum(!is_repair_successful)

# Report repair results
message("Repaired ", n_repaired, " geometries successfully.")
if (n_remaining_invalid > 0) {
PEcAn.logger::logger.warn("Could not repair ", n_remaining_invalid, " geometries. They will be excluded.")
} else {
PEcAn.logger::logger.info("All geometries were repaired successfully.")
}

# Retain only valid geometries after repair
shapefile <- repaired_shapefile[is_repair_successful, ]
}

# Determine layer name from input file if not provided
if (is.null(layer_name)) {
layer_name <- tools::file_path_sans_ext(basename(input_shp))
}

# Write to GeoPackage
sf::st_write(
shapefile,
output_gpkg,
layer = layer_name,
delete_layer = overwrite,
quiet = TRUE
)

# Invisibly return the file path
invisible(output_gpkg)
}
Loading