diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index 1b38a26..a59db9c 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -40,8 +40,7 @@ jobs: ipykernel jupyter_server - name: Build HTML Assets - # FIXME: enable execution once it is scoped to particular notebooks - run: myst build --html # --execute + run: myst build --html --execute shell: micromamba-shell {0} - name: Upload artifact uses: actions/upload-pages-artifact@v3 diff --git a/content/IS2_cloud_data_access.md b/content/IS2_cloud_data_access.md index fba30aa..888d0b6 100644 --- a/content/IS2_cloud_data_access.md +++ b/content/IS2_cloud_data_access.md @@ -5,18 +5,16 @@ jupytext: format_name: myst format_version: 0.13 jupytext_version: 1.16.4 -kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 --- +++ {"user_expressions": []} # ICESat-2 AWS cloud data access + This notebook ({download}`download `) illustrates the use of icepyx for accessing ICESat-2 data currently available through the AWS (Amazon Web Services) us-west-2 hub s3 data bucket. ## Notes + 1. ICESat-2 data became publicly available on the cloud on 29 September 2022. Thus, access methods and example workflows are still being developed by NSIDC, and the underlying code in icepyx will need to be updated now that these data (and the associated metadata) are available. We appreciate your patience and contributions (e.g. reporting bugs, sharing your code, etc.) during this transition! 2. This example and the code it describes are part of ongoing development. Current limitations to using these features are described throughout the example, as appropriate. 3. You **MUST** be working within an AWS instance. Otherwise, you will get a permissions error.
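Note 3 above can be checked programmatically before attempting cloud access. A minimal sketch, assuming the standard EC2 instance metadata endpoint (the helper name is hypothetical, not part of icepyx): it returns the AWS region string when running inside an EC2 instance and `None` otherwise.

```python
import urllib.request


def ec2_region(timeout: float = 0.5):
    """Return the AWS region of the current EC2 instance, or None if not on EC2.

    Queries the EC2 instance metadata service (IMDSv2), which is only
    reachable from inside an AWS instance.
    """
    base = "http://169.254.169.254/latest"
    try:
        # IMDSv2 requires fetching a short-lived session token first.
        token_req = urllib.request.Request(
            f"{base}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
        region_req = urllib.request.Request(
            f"{base}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        return urllib.request.urlopen(region_req, timeout=timeout).read().decode()
    except OSError:  # not on EC2, or metadata service unreachable
        return None
```

If this returns anything other than `us-west-2`, direct s3 access to the NSIDC bucket will fail with a permissions error.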
@@ -104,7 +102,7 @@ We can use the Variables module with an s3 url to explore available data variabl Notice that accessing cloud data requires two layers of authentication: 1) authenticating with your Earthdata Login 2) authenticating for cloud access. These both happen behind the scenes, without the need for users to provide any explicit commands. -Icepyx uses earthaccess to generate your s3 data access token, which will be valid for *one* hour. Icepyx will also renew the token for you after an hour, so if viewing your token over the course of several hours you may notice the values will change. +Icepyx uses earthaccess to generate your s3 data access token, which will be valid for _one_ hour. Icepyx will also renew the token for you after an hour, so if viewing your token over the course of several hours you may notice the values will change. If you do want to see your s3 credentials, you can access them using: @@ -180,4 +178,5 @@ The slow load speed is a demonstration of the many steps involved in making clou +++ {"user_expressions": []} #### Credits -* notebook by: Jessica Scheick and Rachel Wegener + +- notebook by: Jessica Scheick and Rachel Wegener diff --git a/content/IS2_data_access.md b/content/IS2_data_access.md index c8e48c7..6911a40 100644 --- a/content/IS2_data_access.md +++ b/content/IS2_data_access.md @@ -5,13 +5,10 @@ jupytext: format_name: myst format_version: 0.13 jupytext_version: 1.16.4 -kernelspec: - display_name: icepyx - language: python - name: python3 --- # Accessing ICESat-2 Data + This notebook ({download}`download `) illustrates the use of icepyx for programmatic ICESat-2 data query and download from the NSIDC DAAC (NASA National Snow and Ice Data Center Distributed Active Archive Center). A complementary notebook demonstrates in greater detail the [subsetting](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html) options available when ordering data.
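Because the s3 token described above is only valid for one hour, long-running scripts may want to check whether cached credentials have expired before reusing them. A minimal sketch, assuming a credentials dict with an ISO-8601 `expiration` field (that field name follows the AWS temporary-credential convention and is an assumption here; inspect your actual credentials dict to confirm):

```python
from datetime import datetime, timedelta, timezone


def credentials_expired(creds: dict, margin_minutes: int = 5) -> bool:
    """Return True if temporary s3 credentials expire within `margin_minutes`."""
    expiry = datetime.fromisoformat(creds["expiration"])
    if expiry.tzinfo is None:  # treat naive timestamps as UTC
        expiry = expiry.replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) >= expiry - timedelta(minutes=margin_minutes)


# Illustrative values only: one token an hour from expiry, one already expired.
fresh = {"expiration": (datetime.now(timezone.utc) + timedelta(hours=1)).isoformat()}
stale = {"expiration": (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()}
print(credentials_expired(fresh), credentials_expired(stale))  # False True
```

In practice icepyx renews the token for you, so a check like this is only needed if you cache the credentials yourself.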
@@ -28,7 +25,7 @@ import shutil +++ {"user_expressions": []} ---------------------------------- +--- ## Quick-Start Guide @@ -47,11 +44,12 @@ where the function inputs are described in more detail below. ## Key Steps for Programmatic Data Access There are several key steps for accessing data from the NSIDC API: + 1. Define your parameters (spatial, temporal, dataset, etc.) 2. Query the NSIDC API to find out more information about the dataset -4. Define additional parameters (e.g. subsetting/customization options) -5. Order your data -6. Download your data +3. Define additional parameters (e.g. subsetting/customization options) +4. Order your data +5. Download your data icepyx streamlines this process into a minimal number of lines of code. @@ -60,28 +58,30 @@ icepyx streamlines this process into a minimal number of lines of code. ### Create an ICESat-2 data object with the desired search parameters There are three required inputs, depending on how you want to search for data. Two are required in all cases: + - `short_name` = the data product of interest, known as its "short name". -See https://nsidc.org/data/icesat-2/products for a list of the available data products. + See https://nsidc.org/data/icesat-2/products for a list of the available data products. - `spatial extent` = a region of interest to search within. This can be entered as a bounding box, polygon vertex coordinate pairs, or a polygon geospatial file (currently shp, kml, and gpkg are supported). - - bounding box: Given in decimal degrees for the lower left longitude, lower left latitude, upper right longitude, and upper right latitude - - polygon vertices: Given as longitude, latitude coordinate pairs of decimal degrees with the last entry a repeat of the first. - - polygon file: A string containing the full file path and name. - -*NOTE: The input keyword for `short_name` was updated in the code from `dataset` to `product` to match common usage. 
-This should not affect users providing positional inputs as demonstrated in this tutorial.* + - bounding box: Given in decimal degrees for the lower left longitude, lower left latitude, upper right longitude, and upper right latitude + - polygon vertices: Given as longitude, latitude coordinate pairs of decimal degrees with the last entry a repeat of the first. + - polygon file: A string containing the full file path and name. + +_NOTE: The input keyword for `short_name` was updated in the code from `dataset` to `product` to match common usage. +This should not affect users providing positional inputs as demonstrated in this tutorial._ -*NOTE: You can submit at most one bounding box or a list of lonlat polygon coordinates per object instance. -Per NSIDC requirements, geospatial polygon files may only contain one feature (polygon).* +_NOTE: You can submit at most one bounding box or a list of lonlat polygon coordinates per object instance. +Per NSIDC requirements, geospatial polygon files may only contain one feature (polygon)._ Then, for all non-gridded products (ATL<=13), you must include AT LEAST one of the following inputs (temporal or orbital constraints): -- `date_range` = the date range for which you would like to search for results. The following formats are accepted: - - A list of two 'YYYY-MM-DD' strings separated by a comma - - A list of two 'YYYY-DOY' strings separated by a comma - - A list of two datetime.date or datetime.datetime objects - - Dict with the following keys: - - `start_date`: start date, type can be datetime.datetime, datetime.date, or strings (format 'YYYY-MM-DD' or 'YYYY-DOY') - - `end_date`: end date, type can be datetime.datetime, datetime.date, or strings (format 'YYYY-MM-DD' or 'YYYY-DOY') -- `cycles` = Which orbital cycle to use, input as a numerical string or a list of strings. If no input is given, this value defaults to all available cycles within the search parameters. 
An orbital cycle refers to the 91-day repeat period of the ICESat-2 orbit. + +- `date_range` = the date range for which you would like to search for results. The following formats are accepted: + - A list of two 'YYYY-MM-DD' strings separated by a comma + - A list of two 'YYYY-DOY' strings separated by a comma + - A list of two datetime.date or datetime.datetime objects + - Dict with the following keys: + - `start_date`: start date, type can be datetime.datetime, datetime.date, or strings (format 'YYYY-MM-DD' or 'YYYY-DOY') + - `end_date`: end date, type can be datetime.datetime, datetime.date, or strings (format 'YYYY-MM-DD' or 'YYYY-DOY') +- `cycles` = Which orbital cycle to use, input as a numerical string or a list of strings. If no input is given, this value defaults to all available cycles within the search parameters. An orbital cycle refers to the 91-day repeat period of the ICESat-2 orbit. - `tracks` = Which [Reference Ground Track (RGT)](https://icesat-2.gsfc.nasa.gov/science/specs) to use, input as a numerical string or a list of strings. If no input is given, this value defaults to all available RGTs within the spatial and temporal search parameters. Below are examples of each type of spatial extent and temporal input and an example using orbital parameters. Please choose and run only one of the input option cells to set your spatial and temporal parameters. @@ -189,16 +189,17 @@ region_a.visualize_spatial_extent() +++ {"user_expressions": []} There are also several optional inputs to allow the user finer control over their search. Start and end time are only valid inputs on a temporally limited search, and they are ignored if your `date_range` input is a datetime.datetime object. + - `start_time` = start time to search for data on the start date. If no input is given, this defaults to 00:00:00. -- `end_time` = end time for the end date of the temporal search parameter. If no input is given, this defaults to 23:59:59. 
+- `end_time` = end time for the end date of the temporal search parameter. If no input is given, this defaults to 23:59:59. Times must be input as 'HH:mm:ss' strings or datetime.time objects. - `version` = What version of the data product to use, input as a numerical string. If no input is given, this value defaults to the most recent version of the product specified in `short_name`. -*NOTE Version 002 is used as an example in the below cell. However, using it will cause 'no results' errors in granule ordering for some search parameters. These issues have been resolved in later versions of the data products, so it is best to use the most recent version where possible. +_NOTE: Version 002 is used as an example in the below cell. However, using it will cause 'no results' errors in granule ordering for some search parameters. These issues have been resolved in later versions of the data products, so it is best to use the most recent version where possible. Similarly, if you try to order/download too old a version (such that it is no longer hosted by NSIDC), you will get a "no data matched your request" error.
-Thus, you will need to update the version associated with `region_a` and rerun the next cell for the rest of this notebook to run.* +Thus, you will need to update the version associated with `region_a` and rerun the next cell for the rest of this notebook to run._ ```{code-cell} ipython3 region_a = ipx.Query(short_name, spatial_extent, date_range, \ @@ -216,13 +217,14 @@ print(region_a.temporal) Alternatively, you can also just create the query object without creating named variables first: ```{code-cell} ipython3 -# region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-01','2019-02-28'], +# region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-01','2019-02-28'], # start_time='00:00:00', end_time='23:59:59', version='002') ``` +++ {"user_expressions": []} ### More information about your query object + In addition to viewing the stored object information shown above (e.g. product short name, start and end date and time, version, etc.), we can also request summary information about the data product itself or confirm that we have manually specified the latest version. ```{code-cell} ipython3 @@ -241,6 +243,7 @@ region_a.product_all_info() +++ {"user_expressions": []} ### Querying a data product + In order to search the product collection for available data granules, we need to build our search parameters. This is done automatically behind the scenes when you run `region_a.avail_granules()`, but you can also build and view them by calling `region_a.CMRparams`. These are formatted as a dictionary of key:value pairs according to the [CMR documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html). ```{code-cell} ipython3 @@ -274,9 +277,11 @@ region_a.granules.avail +++ {"user_expressions": []} ### Log in to NASA Earthdata + When downloading data from NSIDC, all users must log in using a valid (free) Earthdata account.
icepyx handles the authentication process for you, creating and managing the credentials required to interface with the data at the DAAC (including ordering and download). Authentication is completed as login-protected features are accessed. In order to allow icepyx to log in for us, we still have to make sure that we have made our Earthdata credentials available for icepyx to find. There are multiple ways to provide your Earthdata credentials via icepyx. Behind the scenes, icepyx is using the [earthaccess library](https://nsidc.github.io/earthaccess/). As described in the [earthaccess documentation](https://earthaccess.readthedocs.io/en/latest/tutorials/getting-started/#auth), earthaccess automatically tries three primary mechanisms for logging in, all of which are supported by icepyx: + - with `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables (these are the same as the ones you might have set for icepyx previously) - through an interactive, in-notebook login (used below); passwords are not shown in plain text with this option - with stored credentials in a .netrc file (not recommended for security reasons) @@ -292,6 +297,7 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct ### Additional Parameters and Subsetting Once we have generated our session, we must build the required configuration parameters needed to actually download data. These will tell the system how we want to download the data. As with the CMR search parameters, these will be built automatically when you run `region_a.order_granules()`, but you can also create and view them with `region_a.reqparams`. The default parameters, given below, should work for most users. + - `page_size` = 2000. This is the number of granules we will request per order. - `page_num` = 1. Determine the number of pages based on page size and the number of granules available.
If no page_num is specified, this calculation is done automatically to set page_num, which then provides the number of individual orders we will request given the number of granules. - `request_mode` = 'async' @@ -299,6 +305,7 @@ Once we have generated our session, we must build the required configuration par - `include_meta` = 'Y' #### More details about the configuration parameters + `request_mode` is "asynchronous" by default, which allows concurrent requests to be queued and processed without the need for a continuous connection between you and the API endpoint. In contrast, using a "synchronous" `request_mode` means that the request relies on a direct, continuous connection between you and the API endpoint. Outputs are directly downloaded, or "streamed", to your working directory. @@ -306,7 +313,7 @@ For this tutorial, we will set the request mode to asynchronous. **Use the streaming `request_mode` with caution: While it can be beneficial to stream outputs directly to your local directory, note that timeout errors can result depending on the size of the request, and your request will not be queued in the system if NSIDC is experiencing high request volume. For best performance, NSIDC recommends setting `page_size=1` to download individual outputs, which will eliminate extra time needed to zip outputs and will ensure faster processing times per request.** -Recall that we queried the total number and volume of granules prior to applying customization services. `page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB so we want to choose a granule count that provides us with a reasonable zipped download size. +Recall that we queried the total number and volume of granules prior to applying customization services. 
`page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB, so we want to choose a granule count that provides us with a reasonable zipped download size. ```{code-cell} ipython3 print(region_a.reqparams) @@ -320,10 +327,11 @@ In addition to the required parameters (CMRparams and reqparams) that are submit For a deeper dive into subsetting, please see our [Subsetting Tutorial Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access2-subsetting.html), which covers subsetting in more detail, including how to get a list of subsetting options, how to build your list of subsetting parameters, and how to generate a list of desired variables (most datasets have more than 200 variable fields!), including using pre-built default lists (these lists are still in progress and we welcome contributions!). Subsetting utilizes the NSIDC's built-in subsetter to extract only the data you are interested in (spatially, temporally, variables of interest, etc.).
The advantages of using the NSIDC's subsetter include: -* easily reproducible downloads, particularly when coupled with an icepyx query object -* smaller file size, meaning faster downloads, less storage required, and no need to subset the data on your own -* still easy to go back and order more data/variables with the same or similar search parameters -* no extraneous data means you can move directly to analysis and easily navigate your dataset + +- easily reproducible downloads, particularly when coupled with an icepyx query object +- smaller file size, meaning faster downloads, less storage required, and no need to subset the data on your own +- still easy to go back and order more data/variables with the same or similar search parameters +- no extraneous data means you can move directly to analysis and easily navigate your dataset Certain subset parameters are specified by default unless `subset=False` is included as an input to `order_granules()` or `download_granules()` (which calls `order_granules()` under the hood). A separate, companion notebook tutorial covers subsetting in more detail, including how to get a list of subsetting options, how to build your list of subsetting parameters, and how to generate a list of desired variables (most products have more than 200 variable fields!), including using pre-built default lists (these lists are still in progress and we welcome contributions!). @@ -336,6 +344,7 @@ region_a.subsetparams() ``` ### Place the order + Then, we can send the order to NSIDC using the order_granules function. Information about the granules ordered and their status will be printed automatically. Status information can also be emailed to the address associated with your EarthData account when the `email` kwarg is set to `True`. Additional information on the order, including request URLs, can be viewed by setting the optional keyword input 'verbose' to True. 
```{code-cell} ipython3 @@ -349,6 +358,7 @@ region_a.granules.orderIDs ``` ### Download the order + Finally, we can download our order to a specified directory (which needs to have a full path but doesn't have to point to an existing directory) and the download status will be printed as the program runs. Additional information is again available by using the optional boolean keyword `verbose`. ```{code-cell} ipython3 @@ -357,6 +367,7 @@ region_a.download_granules(path) ``` **Credits** -* original notebook by: Jessica Scheick -* notebook contributors: Amy Steiker and Tyler Sutterley -* source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/master/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin and [2020 Hackweek Data Access Notebook](https://github.com/ICESAT-2HackWeek/2020_ICESat-2_Hackweek_Tutorials/blob/main/06-07.Data_Access/02-Data_Access_rendered.ipynb) by Jessica Scheick and Amy Steiker + +- original notebook by: Jessica Scheick +- notebook contributors: Amy Steiker and Tyler Sutterley +- source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/master/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin and [2020 Hackweek Data Access Notebook](https://github.com/ICESAT-2HackWeek/2020_ICESat-2_Hackweek_Tutorials/blob/main/06-07.Data_Access/02-Data_Access_rendered.ipynb) by Jessica Scheick and Amy Steiker diff --git a/content/IS2_data_access2-subsetting.md b/content/IS2_data_access2-subsetting.md index d4df34d..53eec14 100644 --- a/content/IS2_data_access2-subsetting.md +++ b/content/IS2_data_access2-subsetting.md @@ -5,15 +5,12 @@ jupytext: format_name: myst format_version: 0.13 jupytext_version: 1.16.4 -kernelspec: - display_name: python3 - language: python - name: python3 --- +++ {"user_expressions": []} # Subsetting ICESat-2 Data + This notebook ({download}`download `) illustrates the use of icepyx for subsetting ICESat-2 
data ordered through the NSIDC DAAC. We'll show how to find out what subsetting options are available and how to specify the subsetting options for your order. For more information on using icepyx to find, order, and download data, see our complementary [ICESat-2 Data Access Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html). @@ -66,11 +63,12 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct ## Discover Subsetting Options You can see what subsetting options are available for a given product by calling `show_custom_options()`. The options are presented as a series of headings followed by available values in square brackets. Headings are: -* **Subsetting Options**: whether or not temporal and spatial subsetting are available for the data product -* **Data File Formats (Reformatting Options)**: return the data in a format other than the native hdf5 (submitted as a key=value kwarg to `order_granules(format='NetCDF4-CF')`) -* **Data File (Reformatting) Options Supporting Reprojection**: return the data in a reprojected reference frame. These will be available for gridded ICESat-2 L3B data products. -* **Data File (Reformatting) Options NOT Supporting Reprojection**: data file formats that cannot be delivered with reprojection -* **Data Variables (also Subsettable)**: a dictionary of variable name keys and the paths to those variables available in the product + +- **Subsetting Options**: whether or not temporal and spatial subsetting are available for the data product +- **Data File Formats (Reformatting Options)**: return the data in a format other than the native hdf5 (submitted as a key=value kwarg to `order_granules(format='NetCDF4-CF')`) +- **Data File (Reformatting) Options Supporting Reprojection**: return the data in a reprojected reference frame. These will be available for gridded ICESat-2 L3B data products.
+- **Data File (Reformatting) Options NOT Supporting Reprojection**: data file formats that cannot be delivered with reprojection +- **Data Variables (also Subsettable)**: a dictionary of variable name keys and the paths to those variables available in the product ```{code-cell} ipython3 region_a.show_custom_options(dictview=True) @@ -107,6 +105,7 @@ Thus, this notebook uses a default list of wanted variables to showcase subsetti +++ {"user_expressions": []} ### Determine what variables are available for your data product + There are multiple ways to get a complete list of available variables. To increase readability, some display options (2 and 3, below) show the 200+ variable + path combinations as a dictionary where the keys are variable names and the values are the paths to that variable. @@ -167,6 +166,7 @@ region_a.download_granules('/home/jovyan/icepyx/dev-notebooks/vardata') # <-- yo ``` ### _Why does the subsetter say no matching data was found?_ + _Sometimes, granules ("files") returned in our initial search end up not containing any data in our specified area of interest._ _This is because the initial search is completed using summary metadata for a granule._ _You've likely encountered this before when viewing available imagery online: your spatial search turns up a bunch of images with only a few border or corner pixels, maybe even in no data regions, in your area of interest._ @@ -185,6 +185,7 @@ fn = '' ``` ## Check the downloaded data + Get all `latitude` variables in your downloaded file: ```{code-cell} ipython3 @@ -194,14 +195,14 @@ varlist = [] def IS2h5walk(vname, h5node): if isinstance(h5node, h5py.Dataset): varlist.append(vname) - return + return with h5py.File(fn,'r') as h5pt: h5pt.visititems(IS2h5walk) - + for tvar in varlist: vpath,vn = os.path.split(tvar) - if vn==varname: print(tvar) + if vn==varname: print(tvar) ``` ### Compare to the variable paths available in the original data @@ -211,5 +212,6 @@ 
region_a.order_vars.parse_var_list(region_a.order_vars.avail)[0][varname] ``` #### Credits -* notebook contributors: Zheng Liu, Jessica Scheick, and Amy Steiker -* some source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/main/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin + +- notebook contributors: Zheng Liu, Jessica Scheick, and Amy Steiker +- some source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/main/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin diff --git a/content/IS2_data_read-in.md b/content/IS2_data_read-in.md index 5339df7..b5468d2 100644 --- a/content/IS2_data_read-in.md +++ b/content/IS2_data_read-in.md @@ -5,21 +5,19 @@ jupytext: format_name: myst format_version: 0.13 jupytext_version: 1.16.4 -kernelspec: - display_name: python3 - language: python - name: python3 --- +++ {"user_expressions": []} # Reading ICESat-2 Data in for Analysis + This notebook ({download}`download `) illustrates the use of icepyx for reading ICESat-2 data files, loading them into a data object. Currently the default data object is an Xarray Dataset, with ongoing work to provide support for other data object types. For more information on how to order and download ICESat-2 data, see the [icepyx data access tutorial](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html). ### Motivation + Most often, when you open a data file, you must specify the underlying data structure and how you'd like the information to be read in. A simple example of this, for instance when opening a csv or similarly delimited file, is letting the software know if the data contains a header row, what the data type is (string, double, float, boolean, etc.) for each column, what the delimiter is, and which columns or rows you'd like to be loaded. 
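The csv analogy above can be made concrete. A minimal standard-library sketch (the data and column names are hypothetical) of telling the reader about the header row, the delimiter, which column to keep, and the column's data type:

```python
import csv
import io

# Hypothetical delimited data with a header row and ';' as the delimiter.
raw = io.StringIO("lat;lon;height\n70.1;-49.3;1203.5\n70.2;-49.1;1198.2\n")

reader = csv.DictReader(raw, delimiter=";")         # header row + delimiter
heights = [float(row["height"]) for row in reader]  # column selection + data type
print(heights)  # [1203.5, 1198.2]
```

Each of those choices (header, delimiter, columns, types) is an explicit instruction to the reader; hierarchical hdf5 files require an analogous, but much longer, set of instructions, which is what icepyx automates.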
Many ICESat-2 data readers are quite manual in nature, requiring that you accurately type out a list of string paths to the various data variables. @@ -28,6 +26,7 @@ icepyx simplifies this process by relying on its awareness of ICESat-2 specific Instead of needing to manually iterate through the beam pairs, you can provide a few options to the `Read` object and icepyx will do the heavy lifting for you (as detailed in this notebook). ### Approach + If you're interested in what's happening under the hood: icepyx uses the [xarray](https://docs.xarray.dev/en/stable/) library to read in each of the requested variables of the dataset. icepyx formats each requested variable and then merges the read-in data from each of the variables to create a single data object. The use of xarray is powerful, because the returned data object can be used with relevant xarray processing tools. +++ @@ -40,9 +39,10 @@ import icepyx as ipx +++ {"user_expressions": []} ---------------------------------- +--- ## Quick-Start Guide + For those who might be looking into playing with this (but don't want all the details/explanations) ```{code-cell} ipython3 @@ -65,10 +65,12 @@ ds.plot.scatter(x="longitude", y="latitude", hue="h_li", vmin=-100, vmax=2000) +++ {"user_expressions": []} ---------------------------------------- +--- + ## Key steps for loading (reading) ICESat-2 data Reading in ICESat-2 data with icepyx happens in a few simple steps: + 1. Let icepyx know where to find your data (this might be local files or urls to data in cloud storage) 2. Create an icepyx `Read` object 3. Make a list of the variables you want to read in (does not apply for gridded products) @@ -79,6 +81,7 @@ We go through each of these steps in more detail in this notebook. +++ {"user_expressions": []} ### Step 0: Get some data if you haven't already + Here are a few lines of code to get you set up with a few data files if you don't already have some on your local system. 
```{code-cell} ipython3 @@ -102,10 +105,11 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct Provide a full path to the data to be read in (i.e. opened). Currently accepted inputs are: -* a string path to directory - all files from the directory will be opened -* a string path to single file - one file will be opened -* a list of filepaths - all files in the list will be opened -* a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened + +- a string path to directory - all files from the directory will be opened +- a string path to single file - one file will be opened +- a list of filepaths - all files in the list will be opened +- a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened ```{code-cell} ipython3 path_root = '/full/path/to/your/data/' @@ -116,7 +120,7 @@ path_root = '/full/path/to/your/data/' ``` ```{code-cell} ipython3 -# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5', +# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5', # '/my/other/data/ATL06/processed_ATL06_20191202102922_10160505_006_01.h5'] ``` @@ -128,9 +132,9 @@ path_root = '/full/path/to/your/data/' glob works using `*` and `?` as wildcard characters, where `*` matches any number of characters and `?` matches a single character. For example: -* `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: "/this/path/processed_ATL03_20191130221008_09930503_006_01.h5" or "/this/path/myfavoriteicsat-2file.h5") -* `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. 
(Example matches: "/this/path/ATL07-02_20221012220720_03391701_005_01.h5" or "/this/path/processed_ATL07.h5") -* `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: "/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5", "/this/path/ATL06/myfile.h5") +- `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: "/this/path/processed_ATL03_20191130221008_09930503_006_01.h5" or "/this/path/myfavoriteicsat-2file.h5") +- `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. (Example matches: "/this/path/ATL07-02_20221012220720_03391701_005_01.h5" or "/this/path/processed_ATL07.h5") +- `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: "/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5", "/this/path/ATL06/myfile.h5") See the glob documentation or other online explainer tutorials for more in depth explanation, or advanced glob paths such as character classes and ranges. @@ -143,6 +147,7 @@ See the glob documentation or other online explainer tutorials for more in depth glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so. If you would like to search recursively, you can achieve this by either: + 1. passing the `recursive` argument into `glob_kwargs` and including `\**\` in your filepath 2. using glob directly to create a list of filepaths @@ -272,7 +277,7 @@ ds = reader.load() Within a Jupyter Notebook, you can get a summary view of your data object. -***ATTENTION: icepyx loads your data by creating an Xarray DataSet for each input granule and then merging them. In some cases, the automatic merge fails and needs to be handled manually. 
In these cases, icepyx will return a warning with the error message from the failed Xarray merge and a list of per-granule DataSets***
+**_ATTENTION: icepyx loads your data by creating an Xarray Dataset for each input granule and then merging them. In some cases, the automatic merge fails and needs to be handled manually. In these cases, icepyx will return a warning with the error message from the failed Xarray merge and a list of per-granule Datasets_**

-This can happen if you unintentionally provide the same granule multiple times with different filenames or in segmented products where the rgt+cycle automatically generated `gran_idx` values match. In this latter case, you can simply provide unique `gran_idx` values for each DataSet in `ds` and run `import xarray as xr` and `ds_merged = xr.merge(ds)` to create one merged DataSet.
+This can happen if you unintentionally provide the same granule multiple times with different filenames, or in segmented products where the automatically generated rgt+cycle `gran_idx` values match. In the latter case, you can simply provide unique `gran_idx` values for each Dataset in `ds`, then run `import xarray as xr` and `ds_merged = xr.merge(ds)` to create one merged Dataset.

@@ -302,8 +307,9 @@ Please let us know if you have any ideas or already have functions developed (we

+++ {"user_expressions": []}

#### Credits
-* original notebook by: Jessica Scheick
-* notebook contributors: Wei Ji and Tian
+
+- original notebook by: Jessica Scheick
+- notebook contributors: Wei Ji and Tian

```{code-cell} ipython3

diff --git a/content/IS2_data_variables.md b/content/IS2_data_variables.md
index 567235b..983295d 100644
--- a/content/IS2_data_variables.md
+++ b/content/IS2_data_variables.md
@@ -5,10 +5,6 @@ jupytext:
  format_name: myst
  format_version: 0.13
  jupytext_version: 1.16.4
-kernelspec:
-  display_name: python3
-  language: python
-  name: python3
---

+++ {"user_expressions": []}

@@ -17,6 +13,7 @@ kernelspec:

This notebook ({download}`download `) illustrates the use of icepyx for managing lists of available and wanted ICESat-2 data variables.

The two use cases for variable management within your workflow are:
+
1. During the data access process, whether that's via order and download (e.g. via NSIDC DAAC) or remote (e.g. via the cloud).
2. When reading in data to a Python object (whether from local files or the cloud).
@@ -57,6 +54,7 @@ from pprint import pprint

+++ {"user_expressions": []}

There are three ways to create or access an ICESat-2 Variables object in icepyx:
+
1. Access via the `.order_vars` property of a Query object
2. Access via the `.vars` property of a Read object
3. Create a stand-alone ICESat-2 Variables object using a local file, cloud file, or a product name

@@ -106,11 +104,12 @@ reader.vars

### 3. Create a stand-alone Variables object

You can also generate an independent Variables object. This can be done using either:
+
1. The filepath to a local or cloud file you'd like a variables list for
-2. The product name (and optionally version) of a an ICESat-2 product
+2. The product name (and optionally version) of an ICESat-2 product

-*Note: Cloud data access requires a valid Earthdata login;
-you will be prompted to log in if you are not already authenticated.*
+_Note: Cloud data access requires a valid Earthdata login;
+you will be prompted to log in if you are not already authenticated._

+++ {"user_expressions": []}

@@ -160,9 +159,10 @@ The other is the list of variables you'd like to actually have (in your download

Thus, your `avail` list depends on your data source and whether you are accessing or reading data, while your `wanted` list may change for each analysis you are working on or depending on what variables you want to see.

The variables parameter has methods to:
-* get a list of all available variables, either available from the NSIDC or the file (`avail()` method).
-* append new variables to the wanted list (`append()` method).
-* remove variables from the wanted list (`remove()` method).
+
+- get a list of all available variables, either from NSIDC or from the file (`avail()` method).
+- append new variables to the wanted list (`append()` method).
+- remove variables from the wanted list (`remove()` method).

We'll showcase the use of all of these methods and attributes below using an `icepyx.Query` object.
Usage is identical in the case of an `icepyx.Read` object.
@@ -203,6 +203,7 @@ Much like a directory-file system on a computer, each variable (file) has a uniq

Thus, some variables (e.g. `'latitude'`, `'longitude'`) have multiple paths (one for each of the six beams in most products).

#### Determine what variables are available
+
`region_a.order_vars.avail` will return a list of all valid path+variable strings.

```{code-cell} ipython3
@@ -232,6 +233,7 @@ region_a.order_vars.avail(options=True)

-You can run these same methods no matter how you created or accessed your ICESat-2 Variables. So the methods in this section could be equivalently be accessed using a Read object, or by directly accessing a file on your computer:
+You can run these same methods no matter how you created or accessed your ICESat-2 Variables. So the methods in this section could equivalently be accessed using a Read object, or by directly accessing a file on your computer:

```
+
```python
# Using a Read object
reader.vars.avail()

@@ -253,6 +255,7 @@ Now that you know which variables and path components are available, you need to

There are several options for generating your initial list as well as modifying it, giving the user complete control.

The options for building your initial list are:
+
1. Use a default list for the product (not yet fully implemented across all products. Have a default variable list for your field/product? Submit a pull request or post it as an issue on [GitHub](https://github.com/icesat2py/icepyx)!)
2. Provide a list of variable names
3. Provide a list of profiles/beams or other path keywords, where "keywords" are simply the unique subdirectory names contained in the full variable paths of the product. A full list of available keywords for the product is displayed in the error message upon entering `keyword_list=['']` into the `append` function (see below for an example) or by running `region_a.order_vars.avail(options=True)`, as above.

@@ -281,16 +284,18 @@ region_a.order_vars.append(keyword_list=[''])

### Modifying your wanted variable list

Generating and modifying your variable request list, which is stored in `region_a.order_vars.wanted`, is controlled by the `append` and `remove` functions that operate on `region_a.order_vars.wanted`.
The input options to `append` are as follows (the full documentation for this function can be found by executing `help(region_a.order_vars.append)`).
-* `defaults` (default False) - include the default variable list for your product (not yet fully implemented for all products; please submit your default variable list for inclusion!)
-* `var_list` (default None) - list of variables (entered as strings)
-* `beam_list` (default None) - list of beams/profiles (entered as strings)
-* `keyword_list` (default None) - list of keywords (entered as strings); use `keyword_list=['']` to obtain a list of available keywords
+
+- `defaults` (default False) - include the default variable list for your product (not yet fully implemented for all products; please submit your default variable list for inclusion!)
+- `var_list` (default None) - list of variables (entered as strings)
+- `beam_list` (default None) - list of beams/profiles (entered as strings)
+- `keyword_list` (default None) - list of keywords (entered as strings); use `keyword_list=['']` to obtain a list of available keywords

Similarly, the options for `remove` are:
-* `all` (default False) - reset `region_a.order_vars.wanted` to None
-* `var_list` (as above)
-* `beam_list` (as above)
-* `keyword_list` (as above)
+
+- `all` (default False) - reset `region_a.order_vars.wanted` to None
+- `var_list` (as above)
+- `beam_list` (as above)
+- `keyword_list` (as above)

```{code-cell} ipython3
region_a.order_vars.remove(all=True)
@@ -298,6 +303,7 @@ pprint(region_a.order_vars.wanted)
```

### Examples (Overview)
+
-Below are a series of examples to show how you can use `append` and `remove` to modify your wanted variable list. For clarity, `region_a.order_vars.wanted` is cleared at the start of many examples. However, multiple `append` and `remove` commands can be called in succession to build your wanted variable list (see Examples 3+).
+Below are a series of examples to show how you can use `append` and `remove` to modify your wanted variable list. For clarity, `region_a.order_vars.wanted` is cleared at the start of many examples. However, multiple `append` and `remove` commands can be called in succession to build your wanted variable list (see Examples 1.3+ and 2.3+).
@@ -309,10 +315,12 @@ Both example tracks showcase the same functionality and are provided for users o +++ ------------------- +--- + ### Example Track 1 (Land Ice - run with ATL06 dataset) #### Example 1.1: choose variables + Add all `latitude` and `longitude` variables across all six beam groups. Note that the additional required variables for time and spacecraft orientation are included by default. ```{code-cell} ipython3 @@ -321,6 +329,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.2: specify beams and variable + Add `latitude` for only `gt1l` and `gt2l` ```{code-cell} ipython3 @@ -334,6 +343,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.3: add/remove selected beams+variables + Add `latitude` for `gt3l` and remove it for `gt2l` ```{code-cell} ipython3 @@ -343,6 +353,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.4: `keyword_list` + Add `latitude` and `longitude` for all beams and with keyword `land_ice_segments` ```{code-cell} ipython3 @@ -351,6 +362,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.5: target a specific variable + path + Remove `gt1r/land_ice_segments/longitude` (but keep `gt1r/land_ice_segments/latitude`) ```{code-cell} ipython3 @@ -359,6 +371,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.6: add variables not specific to beams/profiles + Add `rgt` under `orbit_info`. ```{code-cell} ipython3 @@ -367,6 +380,7 @@ pprint(region_a.order_vars.wanted) ``` #### Example 1.7: add all variables+paths of a group + In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under `orbit_info`. Note that paths already in `region_a.order_vars.wanted`, such as `'orbit_info/rgt'`, are not duplicated. 
```{code-cell} ipython3
@@ -375,6 +389,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 1.8: add all possible values for variables+paths
+
-Append all `longitude` paths and all variables/paths with keyword `land_ice_segments`. Similarly to what is shown in Example 4, if you submit only one `append` call as `region_a.order_vars.append(var_list=['longitude'], keyword_list=['land_ice_segments'])` rather than the two `append` calls shown below, you will only add the variable `longitude` and only paths containing `land_ice_segments`, not ALL paths for `longitude` and ANY variables with `land_ice_segments` in their path.
+Append all `longitude` paths and all variables/paths with keyword `land_ice_segments`. Similarly to what is shown in Example 1.4, if you submit only one `append` call as `region_a.order_vars.append(var_list=['longitude'], keyword_list=['land_ice_segments'])` rather than the two `append` calls shown below, you will only add the variable `longitude` and only paths containing `land_ice_segments`, not ALL paths for `longitude` and ANY variables with `land_ice_segments` in their path.

@@ -386,6 +401,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 1.9: remove all variables+paths associated with a beam
+
Remove all paths for `gt1l` and `gt3r`

```{code-cell} ipython3
@@ -394,6 +410,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 1.10: generate a default list for the rest of the tutorial
+
Generate a reasonable variable list prior to download

```{code-cell} ipython3
@@ -402,10 +419,12 @@ region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)
```

-------------------
+---
+
### Example Track 2 (Atmosphere - run with ATL09 dataset commented out at the start of the notebook)

#### Example 2.1: choose variables
+
Add all `latitude` and `longitude` variables

```{code-cell} ipython3
@@ -414,6 +433,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.2: specify beams/profiles and variable
+
Add `latitude` for only `profile_1` and `profile_2`

```{code-cell} ipython3
@@ -427,6 +447,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.3: add/remove selected beams+variables
+
Add `latitude` for `profile_3` and remove it for `profile_2`

```{code-cell} ipython3
@@ -436,6 +457,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.4: `keyword_list`
+
Add `latitude` for all profiles and with keyword `low_rate`

```{code-cell} ipython3
@@ -444,6 +466,7 @@ pprint(region_a.order_vars.wanted)
```
#### Example 2.5: target a specific variable + path
+
Remove `'profile_1/high_rate/latitude'` (but keep `'profile_3/high_rate/latitude'`)

```{code-cell} ipython3
@@ -452,6 +475,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.6: add variables not specific to beams/profiles
+
Add `rgt` under `orbit_info`.

```{code-cell} ipython3
@@ -460,6 +484,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.7: add all variables+paths of a group
+
In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under `orbit_info`. Note that paths already in `region_a.order_vars.wanted`, such as `'orbit_info/rgt'`, are not duplicated.

```{code-cell} ipython3
@@ -468,6 +493,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.8: add all possible values for variables+paths
+
-Append all `longitude` paths and all variables/paths with keyword `high_rate`. Similarly to what is shown in Example 4, if you submit only one `append` call as `region_a.order_vars.append(var_list=['longitude'], keyword_list=['high_rate'])` rather than the two `append` calls shown below, you will only add the variable `longitude` and only paths containing `high_rate`, not ALL paths for `longitude` and ANY variables with `high_rate` in their path.
+Append all `longitude` paths and all variables/paths with keyword `high_rate`. Similarly to what is shown in Example 2.4, if you submit only one `append` call as `region_a.order_vars.append(var_list=['longitude'], keyword_list=['high_rate'])` rather than the two `append` calls shown below, you will only add the variable `longitude` and only paths containing `high_rate`, not ALL paths for `longitude` and ANY variables with `high_rate` in their path.

@@ -478,6 +504,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.9: remove all variables+paths associated with a profile
+
Remove all paths for `profile_1` and `profile_3`

```{code-cell} ipython3
@@ -486,6 +513,7 @@ pprint(region_a.order_vars.wanted)
```

#### Example 2.10: generate a default list for the rest of the tutorial
+
Generate a reasonable variable list prior to download

```{code-cell} ipython3
@@ -496,11 +524,12 @@ pprint(region_a.order_vars.wanted)
```

### Using your wanted variable list

-Now that you have your wanted variables list, you need to use it within your icepyx object (`Query` or `Read`) will automatically use it.
+Now that you have your wanted variables list, you need to use it with your icepyx object; a `Query` object takes it as a keyword argument, while a `Read` object will automatically use it.

+++

#### With a `Query` object
+
-In order to have your wanted variable list included with your order, you must pass it as a keyword argument to the `subsetparams()` attribute or the `order_granules()` or `download_granules()` (which calls `order_granules` under the hood if you have not already placed your order) functions.
+In order to have your wanted variable list included with your order, you must pass it as a keyword argument to `subsetparams()`, `order_granules()`, or `download_granules()` (the latter calls `order_granules` under the hood if you have not already placed your order).

```{code-cell} ipython3
@@ -525,6 +554,7 @@ region_a.download_granules('/home/jovyan/icepyx/dev-notebooks/vardata') # <-- yo

+++ {"user_expressions": []}

#### With a `Read` object
+
Calling the `load()` method on your `Read` object will automatically look for your wanted variable list and use it.
Please see the [read-in example Jupyter Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for a complete example of this usage.

@@ -554,4 +584,5 @@ You'll notice in this workflow you are limited to viewing data only within a par

+++

#### Credits
-* based on the subsetting notebook by: Jessica Scheick and Zheng Liu
+
+- based on the subsetting notebook by: Jessica Scheick and Zheng Liu

diff --git a/content/IS2_data_visualization.md b/content/IS2_data_visualization.md
index 871454d..a845349 100644
--- a/content/IS2_data_visualization.md
+++ b/content/IS2_data_visualization.md
@@ -5,10 +5,6 @@ jupytext:
  format_name: myst
  format_version: 0.13
  jupytext_version: 1.16.4
-kernelspec:
-  display_name: Python 3 (ipykernel)
-  language: python
-  name: python3
---

# Visualizing ICESat-2 Elevations

@@ -41,7 +37,7 @@ For details on minimum required inputs, please refer to [IS2_data_access](https:

#Larsen C Ice Shelf
short_name = 'ATL06'
date_range = ['2020-7-1', '2020-8-1']
-spatial_extent = [-67, -70, -59, -65] 
+spatial_extent = [-67, -70, -59, -65]
cycles = ['03']
tracks = ['0948', '0872', '1184', '0186', '1123', '1009', '0445', '0369']
```

@@ -74,7
print(list(set(region.avail_granules(cycles=True)[0]))) #region.cycles
print(list(set(region.avail_granules(tracks=True)[0]))) #region.tracks
```

-## Visualize spatial extent 
+## Visualize spatial extent
+
-By calling function `visualize_spatial_extent`, it will plot the spatial extent in red outline overlaid on a basemap, try zoom-in/zoom-out to see where is your interested region and what the geographic features look like in this region.
+Calling the `visualize_spatial_extent` function plots the spatial extent as a red outline overlaid on a basemap. Try zooming in and out to see where your region of interest is and what its geographic features look like.

```{code-cell} ipython3
@@ -127,22 +124,28 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct

### Alternative Access Options to Visualize ICESat-2 elevation using OpenAltimetry API

You can also view elevation data by importing the visualization module directly and initializing it with your query object or a list of parameters:
- ```python
- from icepyx.core.visualization import Visualize
- ```
- - passing your query object directly to the visualization module
- ```python
- region2 = ipx.Query(short_name, spatial_extent, date_range)
- vis = Visualize(region2)
- ```
- - creating a visualization object directly without first creating a query object
- ```python
- vis = Visualize(product=short_name, spatial_extent=spatial_extent, date_range=date_range)
- ```
+
+```python
+from icepyx.core.visualization import Visualize
+```
+
+- passing your query object directly to the visualization module
+
+```python
+region2 = ipx.Query(short_name, spatial_extent, date_range)
+vis = Visualize(region2)
+```
+
+- creating a visualization object directly without first creating a query object
+
+```python
+vis = Visualize(product=short_name, spatial_extent=spatial_extent, date_range=date_range)
+```

+++

#### Credits
-* Notebook by: [Tian Li](https://github.com/icetianli), [Jessica Scheick](https://github.com/JessicaS11) and
-[Wei Ji](https://github.com/weiji14)
-* Source material: [READ_ATL06_DEM Notebook](https://github.com/ICESAT-2HackWeek/Assimilation/blob/master/contributors/icetianli/READ_ATL06_DEM.ipynb) by Tian
Li and [Friedrich Knuth](https://github.com/friedrichknuth) + +- Notebook by: [Tian Li](https://github.com/icetianli), [Jessica Scheick](https://github.com/JessicaS11) and + [Wei Ji](https://github.com/weiji14) +- Source material: [READ_ATL06_DEM Notebook](https://github.com/ICESAT-2HackWeek/Assimilation/blob/master/contributors/icetianli/READ_ATL06_DEM.ipynb) by Tian Li and [Friedrich Knuth](https://github.com/friedrichknuth)