Commit

sst
cboettig committed Feb 9, 2024
1 parent 5226e80 commit 0115cb3
Showing 5 changed files with 66 additions and 12 deletions.
3 changes: 1 addition & 2 deletions _quarto.yml
@@ -49,8 +49,7 @@ website:
- section: 'Tutorials in Python'
icon: mortarboard-fill
contents:
- text: Introduction
href: tutorials/python/intro-python.html
- auto: "tutorials/python/*"
- section: Platforms
icon: pc-display-horizontal
contents:
10 changes: 7 additions & 3 deletions index.qmd
@@ -1,5 +1,5 @@
---
title: "NASA TOPS-T: Cloud Native Geospatial in R & Python"
title: "Cloud Native Geospatial in R & Python"
format: html
---

@@ -8,12 +8,16 @@ format: html

[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![Docker Image :whale2:](https://github.com/boettiger-lab/nasa-topst-env-justice/actions/workflows/docker-image.yml/badge.svg)](https://github.com/boettiger-lab/nasa-topst-env-justice/actions/workflows/docker-image.yml)
[![](https://github.com/codespaces/badge.svg)](https://codespaces.new/espm-157/nasa-topst-env-justice?quickstart=1)

<!--
[![](https://github.com/codespaces/badge.svg)](https://codespaces.new/espm-157/nasa-topst-env-justice?quickstart=1)
-->

---

This project seeks to introduce cloud-native approaches to geospatial analysis in R & Python through the lens of environmental justice applications. This is not meant as a complete course in geospatial analysis -- though we encourage interested readers to consider [Geocomputation in R or Python](https://geocompx.org/) as an excellent resource. We present opinionated recipes meant to empower users with the following design goals:


This project seeks to introduce [cloud-native approaches to geospatial analysis](https://cloudnativegeo.org) in R & Python through the lens of environmental justice applications. This is not meant as a complete course in geospatial analysis -- though we encourage interested readers to consider [Geocomputation in R or Python](https://geocompx.org/) as an excellent resource. We present opinionated recipes meant to empower users with the following design goals:

- Open science: open source software, open data, open standards, and reproducibility are emphasized.
- Recipes are presented as reproducible computational notebooks (Quarto and Jupyter) set in narrative analysis.
2 changes: 1 addition & 1 deletion tutorials/R/1-intro-R.qmd
@@ -1,6 +1,6 @@
---
title: "Introduction"
description: "Cloud-Native Data in R"
description: "Cloud-native geospatial data in R"
---


@@ -6,7 +6,8 @@
"metadata": {},
"source": [
"---\n",
"title: \"Introduction to cloud-native data: Python\"\n",
"title: \"Introduction\"\n",
"description: \"Cloud-native geospatial data in Python\"\n",
"---"
]
},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "raw",
"id": "c6d00574-f0e0-4c20-8218-8fb76a41d346",
"metadata": {},
"source": [
"---\n",
"title: xarray without downloads\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "eae02ba3-39e6-4ef6-9a0f-3210fdc5ce68",
@@ -230,7 +240,48 @@
"\n",
"Because the NASA EarthData are behind a security layer, using the URLs directly instead of `earthaccess` with fsspec requires a little extra handling of authentication process to make GDAL aware of the NETRC and cookie files it needs. We'll also set some of the optional but recommended options for GDAL when using the virtual filesystem. Unfortunately this makes our code look a bit verbose -- ideally packages like `rioxarray` would take care of these things.\n",
"\n",
"Note the GDAL is about 3x faster at setting up the virtual filesystem, and a little faster in the xarray/dask dispatch to compute the plot. (When this approach is combined with metadata from a STAC catalog, it does not need to read individual file metadata and the first step can become almost instant).\n"
"Note the GDAL is about 3x faster at setting up the virtual filesystem, and a little faster in the xarray/dask dispatch to compute the plot. (When this approach is combined with metadata from a STAC catalog, it does not need to read individual file metadata and the first step can become almost instant). GDAL performance is constantly improving, especially with regards to cloud native reads, so a recent version can make a huge difference.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aa2d3c02-6617-4d44-87f2-7b0923cf580a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rasterio info:\n",
" rasterio: 1.3.9\n",
" GDAL: 3.6.4\n",
" PROJ: 9.0.1\n",
" GEOS: 3.11.1\n",
" PROJ DATA: /opt/venv/lib/python3.10/site-packages/rasterio/proj_data\n",
" GDAL DATA: /opt/venv/lib/python3.10/site-packages/rasterio/gdal_data\n",
"\n",
"System:\n",
" python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]\n",
"executable: /opt/venv/bin/python\n",
" machine: Linux-6.6.10-76060610-generic-x86_64-with-glibc2.35\n",
"\n",
"Python deps:\n",
" affine: 2.4.0\n",
" attrs: 23.2.0\n",
" certifi: 2024.02.02\n",
" click: 8.1.7\n",
" cligj: 0.7.2\n",
" cython: None\n",
" numpy: 1.26.3\n",
" snuggs: 1.4.7\n",
"click-plugins: None\n",
"setuptools: 59.6.0\n"
]
}
],
"source": [
"rasterio.show_versions()"
]
},
{
@@ -337,16 +388,15 @@
"\n",
"The GDAL VSI is already widely used under the hood by python packages working with cloud-optimized geotiff (COG) files (e.g. via `odc.stac`, which like the above approach also produces dask-backed xarrays), and also widely used by most other languages (e.g. R) for working with any spatial data. To GDAL, netcdf and other so-called \"n-dimensional array\" formats like h5, zarr are just a handful of the [160-odd formats of \"raster\" data](https://gdal.org/drivers/raster/index.html) it supports, along with formats like COG and GeoTIFF files. It can be particularly powerful in more complicated workflows which require spatially-aware operations such as reprojection and aggregation. The GDAL VSI can sometimes be considerably faster than fsspec, expecially when configured for cloud-native access. The nusiance of these environmental variables aside, it can also be considerably easier to use and to generalize patterns across data formats (netcdf, zarr, COG), and across languages (R, C++, javascript, julia etc), since GDAL understands [all these formats] and is used in all of these languages, as well as in platforms such as Google Earth Engine and QGIS. This makes it a natural bridge between languages. This broad use over decades has made GDAL very powerful, and it continues to improve rapidly with frequent releases. \n",
"\n",
"\n",
"For some reason, the `xarray` community seems to prefer to access ncdf without GDAL, whether by relying on downloading complete files, using fsspec, or other dedicated libraries (zarr). There are possibly many reasons for this. One is a divide between the the \"Geospatial Information Systems\" community, that thinks of file serializations as \"rasters\" or \"vectors\", and the \"modeler\" community, which thinks of data as \"n-dimensional arrays\". Both have their weaknesses and the lines are frequently blurred, but one obvious manifestation is in how each one writes their netcdf files (and how much they rely on GDAL). For instance, this NASA product, strongly centered in the modeler community is sometimes sloppy about these metadata conventions, and as a result GDAL (especially older versions), might not detect all the details appropriately. Note that GDAL has failed to recognize the units of lat-long, so we have had to subset the x-y positions manually. \n"
"**GDAL is not just for COGs**. The python ecosystem has a rich set of patterns for range-request reads of Cloud Optimized Geotif (COG) files using packages like `odc.stac`, as illustrated in our [intro to cloud-native python](/tutorials/python/intro-python.html). But possibly for historical/cultural reasons, at present the python geospatial community seems to prefer to access ncdf and similar n-dimensional-array formats without GDAL, whether by relying on downloading complete files, using fsspec, or other dedicated libraries (zarr). There are possibly many reasons for this. One is a divide between the the \"Geospatial Information Systems\" community, that thinks of file serializations as \"rasters\" or \"vectors\", and the \"modeler\" community, which thinks of data as \"n-dimensional arrays\". Both have their weaknesses and the lines are frequently blurred, but one obvious manifestation is in how each one writes their netcdf files (and how much they rely on GDAL). For instance, this NASA product, strongly centered in the modeler community is sometimes sloppy about these metadata conventions, and as a result GDAL (especially older versions), might not detect all the details appropriately. Note that GDAL has failed to recognize the units of lat-long, so we have had to subset the x-y positions manually. \n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "spatial",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "spatial"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
