Commit 5226e80

cboettig committed Feb 9, 2024
1 parent 6630ce0 commit 5226e80
Showing 16 changed files with 55 additions and 24 deletions.
16 changes: 16 additions & 0 deletions _freeze/tutorials/R/1-intro-R/execute-results/html.json
@@ -0,0 +1,16 @@
{
"hash": "7cd21c6b25fc6d4141e84652629729d2",
"result": {
"markdown": "---\ntitle: \"Introduction\"\ndescription: \"Cloud-Native Data in R\"\n---\n\n\n\n## Exploring the Legacy of Redlining\n\nThis executable notebook provides an opening example to illustrate a cloud-native workflow in both R and python. \nPedagogy research emphasizes the importance of \"playing the whole game\" before breaking down every pitch and hit.\nWe intentionally focus on powerful high-level tools (STAC API, COGs, datacubes) to illustrate how a few chunks of\ncode can perform a task that would be far slower and more verbose in a traditional file-based, download-first workflow.\nNote the close parallels between R and Python syntax. This arises because both languages wrap the same underlying \ntools (the STAC API and GDAL warper) and handle many of the nuisances of spatial data -- from re-projections and\nresampling to mosaic tiles -- without us noticing.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rstac)\nlibrary(gdalcubes)\nlibrary(stars)\nlibrary(tmap)\nlibrary(dplyr)\ngdalcubes::gdalcubes_options(parallel = TRUE)\n```\n:::\n\n\n\n## Data discovery\n\n<!--\nLarge geospatial data comes in many different formats and is frequently divided into many individual files or \"assets\" which may represent different points in space, time, sensor bands or variables. Many users are familiar with file-based workflows, where each file type is read into the computational environment by a specific tool and that workflows proceed file-by-file. However, the same data can be represented in many different formats (ncdf or tiff, say) and subdivided in different ways. Importantly, the file-based-divisions often do not reflect the way a user might want to work with the data. For instance, a NASA ncdf product may provide sea-surface-temperature as one file per day, with each file covering the entire global extent, while a user wants to examine trends in the data over time but only in a certain regional area. In such cases, it is inefficient to download data for the whole globe over many files. Just as end-users in high level languages are not expected to manage very low-level concepts like memory block sizes, geospatial data scientists need not worry about these file serialization details when they have good high-level abstractions that can do it for them. \n-->\n\nThe first step in many workflows involves discovering individual spatial data files covering the space, time, and variables of interest. Here we use a [STAC](https://stacspec.org/en) Catalog API to recover a list of candidate data. \nWe dig deeper into how this works and what it returns in later recipes. This example searches for images in a lon-lat bounding box from a collection of Cloud-Optimized-GeoTIFF (COG) images taken by Sentinel2 satellite mission.\nThis function will not download any imagery, it merely gives us a list of metadata about available images, including the access URLs.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbox <- c(xmin=-122.51, ymin=37.71, xmax=-122.36, ymax=37.81) \nstart_date <- \"2022-06-01\"\nend_date <- \"2022-08-01\"\nitems <-\n stac(\"https://earth-search.aws.element84.com/v0/\") |>\n stac_search(collections = \"sentinel-s2-l2a-cogs\",\n bbox = box,\n datetime = paste(start_date, end_date, sep=\"/\"),\n limit = 100) |>\n ext_query(\"eo:cloud_cover\" < 20) |>\n post_request()\n```\n:::\n\n\nWe pass this list of images to a high-level utilty (`gdalcubes` in R, `odc.stac` in python) that will do all of the heavy lifting. 
Using the URLs and metadata provided by STAC, \nthese functions can extract only our data of interest (given by the bounding box) without downloading unnecessary regions or bands. While streaming the data, these functions\nwill also reproject it into the desired coordinate reference system (an often costly operation to perform in R) and can resample or aggregate the data to a desired \nspatial resolution. (The R code will also resample from images in overlapping areas to replace pixels masked by clouds.)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncol <- stac_image_collection(items$features, asset_names = c(\"B08\", \"B04\", \"SCL\"))\n\ncube <- cube_view(srs =\"EPSG:4326\",\n extent = list(t0 = start_date, t1 = end_date,\n left = box[1], right = box[3],\n top = box[4], bottom = box[2]),\n dx = 0.0001, dy = 0.0001, dt = \"P1D\",\n aggregation = \"median\", resampling = \"average\")\n\nmask <- image_mask(\"SCL\", values=c(3, 8, 9)) # mask clouds and cloud shadows\n\ndata <- raster_cube(col, cube, mask = mask)\n```\n:::\n\n\n\nWe can do arbitrary calculations on this data as well. Here we calculate NDVI, a widely used measure of greenness that can serve as a proxy for tree cover. \n(Note that the R example uses lazy evaluation and can thus perform these calculations while streaming.)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nndvi <- data |>\n select_bands(c(\"B04\", \"B08\")) |>\n apply_pixel(\"(B08-B04)/(B08+B04)\", \"NDVI\") |>\n reduce_time(c(\"mean(NDVI)\"))\n\nndvi_stars <- st_as_stars(ndvi)\n```\n:::\n\n\n\nAnd we plot the result. The long rectangle of Golden Gate Park is clearly visible in the northwest.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmako <- tm_scale_continuous(values = viridisLite::mako(30))\nfill <- tm_scale_continuous(values = \"Greens\")\n\ntm_shape(ndvi_stars) + tm_raster(col.scale = mako)\n```\n\n::: {.cell-output-display}\n![](1-intro-R_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n\n## From NDVI to Environmental Justice\n\nWe examine the present-day impact of historic \"red-lining\" of US cities during the Great Depression using data from the [Mapping Inequality](https://dsl.richmond.edu/panorama/redlining) project. Although this racist practice was banned by federal law under the Fair Housing Act of 1968, the systemic scars of that practice are still so deeply etched on our landscape that they remain visible from space -- \"red-lined\" areas (graded \"D\" under the racist HOLC scheme) show systematically lower greenness than predominantly white neighborhoods (graded \"A\"). Trees provide many benefits, from mitigating urban heat to supporting biodiversity, real-estate value, and health.\n\n\n## Zonal statistics \n\nIn addition to large-scale raster data such as satellite imagery, the analysis of vector shapes, such as polygons showing administrative regions, is a central component of spatial analysis, and one particularly important to the spatial social sciences. The red-lined areas of the 1930s are one example of spatial vectors. One common operation is to summarise the values of all pixels falling within a given polygon, e.g. 
computing the average greenness (NDVI) \n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsf <- \"/vsicurl/https://dsl.richmond.edu/panorama/redlining/static/citiesData/CASanFrancisco1937/geojson.json\" |>\n st_read() |>\n st_make_valid() |>\n select(-label_coords)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\npoly <- ndvi |> extract_geom(sf, FUN = mean, reduce_time = TRUE)\nsf$NDVI <- poly$NDVI\n```\n:::\n\n\n\nWe plot the underlying NDVI as well as the average NDVI of each polygon, along with its textual grade, using `tmap`. Note that \"A\" grades tend to be darkest green (high NDVI) while \"D\" grades are frequently the least green. (Regions not zoned for housing at the time of the 1937 housing assessment are not displayed as polygons.)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntm_shape(ndvi_stars) + tm_raster(col.scale = mako) +\n tm_shape(sf) + tm_polygons('NDVI', fill.scale = fill) +\n tm_shape(sf) + tm_text(\"grade\", col=\"darkblue\", size=0.6) +\n tm_legend_hide()\n```\n\n::: {.cell-output-display}\n![](1-intro-R_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\n\nAre historically redlined areas still less green?\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsf |> \n as_tibble() |>\n group_by(grade) |> \n summarise(ndvi = mean(NDVI), \n sd = sd(NDVI)) |>\n knitr::kable()\n```\n\n::: {.cell-output-display}\n|grade | ndvi| sd|\n|:-----|---------:|---------:|\n|A | 0.3201204| 0.0611414|\n|B | 0.2138501| 0.0783221|\n|C | 0.1956334| 0.0564822|\n|D | 0.1949736| 0.0385805|\n|NA | 0.0962092| NA|\n:::\n:::\n",
"supporting": [
"1-intro-R_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
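
(The frozen tutorial above names `odc.stac` as the Python counterpart to `gdalcubes` but shows only the R code. For readers following along in Python, a minimal sketch of the same search, datacube, NDVI, and zonal-summary steps might look like the following. The endpoint, collection, band names, dates, and the redlining GeoJSON URL come from the tutorial text; every other option, and the omission of the SCL cloud mask, is an assumption rather than part of this commit.)

```python
# A minimal Python sketch of the workflow frozen in the JSON above, using
# pystac_client + odc.stac (the counterparts the tutorial itself names).
# Endpoint, collection, bands, dates, and the redlining URL come from the
# tutorial text; all other options are assumptions and may need tuning.
import geopandas as gpd
import odc.stac
import rioxarray  # noqa: F401  (registers the .rio accessor used below)
from pystac_client import Client

box = (-122.51, 37.71, -122.36, 37.81)
items = (
    Client.open("https://earth-search.aws.element84.com/v0/")
    .search(
        collections=["sentinel-s2-l2a-cogs"],
        bbox=box,
        datetime="2022-06-01/2022-08-01",
        query={"eo:cloud_cover": {"lt": 20}},
    )
    .item_collection()
)

# Lazily assemble a datacube: only the requested bands and extent are
# streamed. (Cloud masking via the SCL band is omitted for brevity.)
data = odc.stac.load(
    items, bands=["B04", "B08"], bbox=box,
    crs="EPSG:4326", resolution=0.0001, chunks={},
)
ndvi = ((data.B08 - data.B04) / (data.B08 + data.B04)).mean("time")

# Zonal statistics over the 1937 HOLC polygons.
redlines = gpd.read_file(
    "https://dsl.richmond.edu/panorama/redlining/static/citiesData/"
    "CASanFrancisco1937/geojson.json"
)
redlines["NDVI"] = [
    float(ndvi.rio.clip([geom], crs=redlines.crs).mean())
    for geom in redlines.geometry
]
print(redlines.groupby("grade")["NDVI"].agg(["mean", "std"]))
```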
@@ -1,7 +1,7 @@
{
"hash": "936e34da6da3eac5bbee72becc66c013",
"result": {
"markdown": "---\ntitle: \"NASA EarthData\"\nformat: html\n---\n\n\nThe NASA EarthData program provides access to an extensive collection of spatial data products from each of its 12 Distributed Active Archive Centers ('DAACs') on the high-performance S3 storage system of Amazon Web Services (AWS). We can take advantage of range requests with NASA EarthData URLs, but unlike the previous examples,\nNASA requires an authentication step. NASA offers several different mechanisms, including `netrc` authentication, token-based authentication, and S3 credentials, but only the first of these works equally well from locations both inside and outside of AWS-based compute, so there really is very little reason to learn the other two.\n\nThe [`earthdatalogin` package in R](https://boettiger-lab.github.io/earthdatalogin/) or the `earthaccess` package in Python handle the authentication. The R package sets up authentication behind the scenes using environmental variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nearthdatalogin::edl_netrc()\n```\n:::\n\n\n(A default login is supplied though users are encouraged to [register](https://urs.earthdata.nasa.gov/home) for their own individual accounts.) Once this is in place, EarthData's protected URLs can be used like any other: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nterra::rast(\"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif\",\n vsi=TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nclass : SpatRaster \ndimensions : 3660, 3660, 1 (nrow, ncol, nlyr)\nresolution : 30, 30 (x, y)\nextent : 199980, 309780, 7190200, 7300000 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 56N \nsource : HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif \nname : HLS.L30.T56JKT.2023246T235950.v2.0.SAA \n```\n:::\n:::\n",
"markdown": "---\ntitle: \"NASA EarthData\"\nformat: html\n---\n\n\nThe NASA EarthData program provides access to an extensive collection of spatial data products from each of its 12 Distributed Active Archive Centers ('DAACs') on the high-performance S3 storage system of Amazon Web Services (AWS). We can take advantage of range requests with NASA EarthData URLs, but unlike the previous examples,\nNASA requires an authentication step. NASA offers several different mechanisms, including `netrc` authentication, token-based authentication, and S3 credentials, but only the first of these works equally well from locations both inside and outside of AWS-based compute, so there really is very little reason to learn the other two.\n\nThe [`earthdatalogin` package in R](https://boettiger-lab.github.io/earthdatalogin/) or the `earthaccess` package in Python handle the authentication. The R package sets up authentication behind the scenes using environmental variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nearthdatalogin::edl_netrc()\n```\n:::\n\n\n(A default login is supplied though users are encouraged to [register](https://urs.earthdata.nasa.gov/home) for their own individual accounts.) Once this is in place, EarthData's protected URLs can be used like any other: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nterra::rast(\"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif\",\n vsi=TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nclass : SpatRaster \ndimensions : 3660, 3660, 1 (nrow, ncol, nlyr)\nresolution : 30, 30 (x, y)\nextent : 199980, 309780, 7190200, 7300000 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 56N (EPSG:32656) \nsource : HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif \nname : HLS.L30.T56JKT.2023246T235950.v2.0.SAA \n```\n:::\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
14 changes: 14 additions & 0 deletions _freeze/tutorials/R/earthdata/execute-results/html.json
@@ -0,0 +1,14 @@
{
"hash": "936e34da6da3eac5bbee72becc66c013",
"result": {
"markdown": "---\ntitle: \"NASA EarthData\"\nformat: html\n---\n\n\nThe NASA EarthData program provides access to an extensive collection of spatial data products from each of its 12 Distributed Active Archive Centers ('DAACs') on the high-performance S3 storage system of Amazon Web Services (AWS). We can take advantage of range requests with NASA EarthData URLs, but unlike the previous examples,\nNASA requires an authentication step. NASA offers several different mechanisms, including `netrc` authentication, token-based authentication, and S3 credentials, but only the first of these works equally well from locations both inside and outside of AWS-based compute, so there really is very little reason to learn the other two.\n\nThe [`earthdatalogin` package in R](https://boettiger-lab.github.io/earthdatalogin/) or the `earthaccess` package in Python handle the authentication. The R package sets up authentication behind the scenes using environmental variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nearthdatalogin::edl_netrc()\n```\n:::\n\n\n(A default login is supplied though users are encouraged to [register](https://urs.earthdata.nasa.gov/home) for their own individual accounts.) Once this is in place, EarthData's protected URLs can be used like any other: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nterra::rast(\"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif\",\n vsi=TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nclass : SpatRaster \ndimensions : 3660, 3660, 1 (nrow, ncol, nlyr)\nresolution : 30, 30 (x, y)\nextent : 199980, 309780, 7190200, 7300000 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 56N (EPSG:32656) \nsource : HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif \nname : HLS.L30.T56JKT.2023246T235950.v2.0.SAA \n```\n:::\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
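
(The tutorial names `earthaccess` as the Python counterpart to `earthdatalogin` for `netrc` authentication, again without showing code. A hedged sketch under that assumption, reusing the HLS granule URL from the frozen output above:)

```python
# A hedged sketch of the Python-side authentication the tutorial mentions,
# via the earthaccess package. The granule URL is copied from the frozen
# terra output above; treat the exact calls as assumptions to verify
# against current earthaccess documentation.
import earthaccess
import rasterio

earthaccess.login(strategy="netrc")  # reads credentials from ~/.netrc

url = ("https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/"
       "HLSL30.020/HLS.L30.T56JKT.2023246T235950.v2.0/"
       "HLS.L30.T56JKT.2023246T235950.v2.0.SAA.tif")

# earthaccess.open() returns authenticated file-like objects, so the COG
# can be streamed over HTTPS rather than downloaded first.
f = earthaccess.open([url])[0]
with rasterio.open(f) as src:
    print(src.crs, src.shape)  # expect the same 3660 x 3660 UTM raster
```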
28 changes: 12 additions & 16 deletions _quarto.yml
@@ -42,24 +42,20 @@ website:
sidebar:
style: floating
contents:
- section: R Tutorials
- section: 'Tutorials in R'
icon: mortarboard-fill
contents:
- text: Introduction (R)
icon: play-btn
href: contents/intro-R.qmd
- text: NASA EarthData
icon: rocket
href: contents/earthdata.qmd
- section: Python Tutorials
- auto: "tutorials/R/*"
- section: 'Tutorials in Python'
icon: mortarboard-fill
contents:
- text: Introduction (python)
icon: play-btn
href: contents/intro-python.ipynb
- section: Recipes
- section: Background
contents:
- text: Portable Environments
href: contents/computing-environment.qmd
- text: Introduction
href: tutorials/python/intro-python.html
- section: Platforms
icon: pc-display-horizontal
contents:
- href: tutorials/computing-environment.qmd
- section: Recipes
page-footer:
right:
- icon: github
2 changes: 1 addition & 1 deletion assets/html/footer.html
@@ -1,2 +1,2 @@
<script src="assets/js/material-kit.js"></script>
<script src="/assets/js/material-kit.js"></script>
<script src="https://kit.fontawesome.com/42d5adcbca.js" crossorigin="anonymous"></script>
Binary file modified assets/img/blue-marble.jpg
4 changes: 2 additions & 2 deletions contents/intro-R.qmd → tutorials/R/1-intro-R.qmd
@@ -1,6 +1,6 @@
---
title: Introduction
icon: play-btn
title: "Introduction"
description: "Cloud-Native Data in R"
---


File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,19 +1,24 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dc109c59",
"cell_type": "raw",
"id": "2fa81dda-2b4a-4795-b972-91b13d07363e",
"metadata": {},
"source": [
" # Examining Environmental Justice through Open Source, Cloud-Native Tools: Python\n",
"\n"
"---\n",
"title: \"Introduction to cloud-native data: Python\"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "7a1b19ae-a3df-455c-9128-4d34e30c42e2",
"metadata": {},
"source": [
"# Exploring the Legacy of Redlining\n",
"\n",
"\n",
"\n",
"This executable notebook provides an opening example to illustrate a cloud-native workflow. \n",
"Pedagogy research emphasizes the importance of \"playing the whole game\" before breaking down every pitch and hit.\n",
"We intentionally focus on powerful high-level tools (STAC API, COGs, datacubes) to illustrate how a few chunks of\n",
