
National Wildlife Federation

Accessing, processing, documenting wildlife and environmental datasets for the National Wildlife Federation

Project Description

The National Wildlife Federation (NWF) is a large non-profit dedicated to conservation and wildlife advocacy. NWF is working with another organization to create an interactive mapping tool that shows the intersection of potential carbon management projects with wildlife and environmental considerations in the state of Wyoming. Data Clinic is tasked with finding and processing these wildlife and environmental datasets and providing them to NWF. NWF has given us a spreadsheet outlining the desired datasets, which we have augmented with additional metadata.

The data pipeline we build will contain a few distinct steps, with each step depending on the previous. Roughly, these steps are:

  1. Access data and upload to s3
    • The metadata spreadsheet contains links to APIs and hosted files matching the requested datasets. The code in download.py should iterate through the datasets with links and download each locally before uploading to the nwf-dataclinic s3 bucket (see the first sketch after this list).
  2. Simple data processing
    • The raw data on s3 will have different file formats, projections, and extents. We want to provide NWF with data that has been minimally processed to ensure compatibility. The code in process.py should traverse the raw datasets, apply these basic processing steps, and save the results to s3 (see the second sketch after this list).
  3. Documenting processed data
    • The final step is to create simple documentation for each dataset. These should be pdf files generated for each processed dataset. These documents will contain information from the metadata, such as dataset description, license, years covered, etc., as well as some additional information derived from the data itself, such as column names/types and number of rows. The code in document.py will iterate through the processed datasets and create the documentation for each (see the third sketch after this list).
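
To make the steps concrete, here is a rough sketch of what step 1 could look like. The csv export of the metadata spreadsheet and the `download_url` and `file_name` columns are assumptions for illustration, not the actual interface of download.py:

```python
# A minimal sketch of step 1, not the actual download.py. It assumes the
# metadata spreadsheet has been exported to csv with hypothetical
# `download_url` and `file_name` columns.
from pathlib import Path

import boto3
import pandas as pd
import requests

S3_BUCKET = "nwf-dataclinic"  # bucket name taken from this README


def download_and_upload(metadata_csv: str, local_dir: str = "data/raw") -> None:
    metadata = pd.read_csv(metadata_csv)
    s3 = boto3.client("s3")
    Path(local_dir).mkdir(parents=True, exist_ok=True)

    # Only datasets that have a link can be fetched automatically.
    for _, row in metadata.dropna(subset=["download_url"]).iterrows():
        local_path = Path(local_dir) / row["file_name"]

        # Stream to disk so large files are never held fully in memory.
        with requests.get(row["download_url"], stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(local_path, "wb") as f:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    f.write(chunk)

        # Mirror the local file under a raw/ prefix in the bucket.
        s3.upload_file(str(local_path), S3_BUCKET, f"raw/{local_path.name}")
```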
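
Step 2 might look something like the sketch below, using geopandas. The shared projection (EPSG:4326) and the GeoPackage output format are assumptions, not decisions recorded in this README:

```python
# A minimal sketch of step 2, not the actual process.py. The target CRS
# (EPSG:4326) and GeoPackage output format are assumptions.
from pathlib import Path

import geopandas as gpd

TARGET_CRS = "EPSG:4326"  # assumed common projection


def process_dataset(raw_path: str, out_dir: str = "data/processed") -> Path:
    gdf = gpd.read_file(raw_path)
    if gdf.crs is None:
        raise ValueError(f"{raw_path} has no CRS defined")

    # Reproject so every processed layer shares the same projection.
    gdf = gdf.to_crs(TARGET_CRS)

    out_path = Path(out_dir) / (Path(raw_path).stem + ".gpkg")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    gdf.to_file(out_path, driver="GPKG")
    return out_path
```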
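
And step 3 could be sketched roughly as follows. The choice of reportlab for pdf generation and the metadata keys are assumptions; only the kinds of information listed above come from this README:

```python
# A minimal sketch of step 3, not the actual document.py. reportlab as the
# pdf library and the metadata dict keys are assumptions; the fields shown
# (description, license, years, columns, row count) come from this README.
import geopandas as gpd
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas


def document_dataset(processed_path: str, metadata: dict, out_pdf: str) -> None:
    gdf = gpd.read_file(processed_path)

    lines = [
        f"Dataset: {metadata.get('name', processed_path)}",
        f"Description: {metadata.get('description', 'n/a')}",
        f"License: {metadata.get('license', 'n/a')}",
        f"Years covered: {metadata.get('years', 'n/a')}",
        f"Rows: {len(gdf)}",
        "Columns:",
    ] + [f"  {col}: {dtype}" for col, dtype in gdf.dtypes.items()]

    # Draw one line of text per entry, top to bottom on a single page.
    doc = canvas.Canvas(out_pdf, pagesize=letter)
    y = 750
    for line in lines:
        doc.drawString(72, y, line)
        y -= 14
    doc.showPage()
    doc.save()
```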

These steps are composed in run.py, which also exports the full contents of the repository to a specified local folder. You can run the full pipeline by executing `poetry run python3 src/run.py --s3bucket <your_bucket>` (or by running the script from the environment of your choice). Flags exist to export the data locally, skip the s3 upload, or overwrite data which has already been processed. To see the full list, run `poetry run python3 src/run.py --help`.
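
For illustration, the composition in run.py might look something like this argparse sketch. Only the `--s3bucket` flag is confirmed above; the other flag names and the stub step functions are placeholders:

```python
# A sketch of how run.py might compose the steps. Only --s3bucket appears
# in this README; the other flag names and the stub functions standing in
# for download.py, process.py, and document.py are assumptions.
import argparse
from typing import Optional


def download(bucket: str, skip_upload: bool) -> None: ...  # download.py entry point
def process(bucket: str, overwrite: bool) -> None: ...  # process.py entry point
def document(bucket: str, export_dir: Optional[str]) -> None: ...  # document.py entry point


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the full NWF data pipeline.")
    parser.add_argument("--s3bucket", required=True, help="target s3 bucket")
    parser.add_argument("--export-dir", help="also export the results to a local folder")
    parser.add_argument("--skip-upload", action="store_true", help="skip the s3 upload")
    parser.add_argument("--overwrite", action="store_true",
                        help="overwrite data which has already been processed")
    args = parser.parse_args()

    # Run the steps in order; each consumes the previous step's output.
    download(args.s3bucket, args.skip_upload)
    process(args.s3bucket, args.overwrite)
    document(args.s3bucket, args.export_dir)


if __name__ == "__main__":
    main()
```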

Poetry Environment Set-up

This project uses Poetry to provide an easy way to manage dependencies. You can set it up by following these steps:

  1. Ensure you have a Python 3.9 or higher installation on your machine
  2. Install Poetry following the instructions here
  3. From the root project directory, install the dependencies with `poetry install`
  4. Ensure the environment has been installed by running `poetry shell`. You should see something like `(nwf-process-geodata-py3.9)` in your terminal.

Git stuff

We encourage people to follow the git feature branch workflow, which you can read more about here: How to use git as a Data Scientist

For each feature you are adding to the code:

  1. Switch to the main branch and pull the most recent changes:

```
git checkout main
git pull
```

  2. Make a new branch for your addition:

```
git checkout -b cleaning_script
```

  3. Write your awesome code.
  4. Once it's done, add it to git:

```
git status
git add {files that have changed}
git commit -m {some descriptive commit message}
```

  5. Push the branch to GitHub:

```
git push -u origin cleaning_script
```

  6. Go to GitHub and create a pull request.
  7. Either merge the branch yourself if you're confident it's good, or request that someone else reviews the changes and merges it in.
  8. Repeat
  9. ...
  10. Profit.

Project based on the cookiecutter data science project template. #cookiecutterdatascience
