As an organization interested in Justice40, I'd like an API so that I can automate my consumption of Justice40 data #2199

Open
travis-newby opened this issue Mar 13, 2023 · 4 comments

@travis-newby
Collaborator

travis-newby commented Mar 13, 2023

Description

Justice40 publishes data and methodology on the Climate and Economic Justice Screening Tool (CEJST). The published data is structured and reasonably easy to consume, but there is no long-term strategy around publishing, versioning, or discoverability.

Additionally, federal agencies and NGOs have asked for a more formal API that makes it easier to incorporate Justice40 data into their own systems for display and for comparison between indices.

Solution

The solution is to publish Justice40 data and documentation via a simple, versioned API at an easy-to-find Justice40 subdomain (e.g., **api**.environmentaljustice.gov). To aid discoverability, the API should be defined using the OpenAPI Specification and cataloged on data.gov.

Published data for each version should include:

  • Methodology and input data sources
  • Documentation such as technical guides, communities lists, and codebooks
  • Justice40 result data such as scoring result spreadsheets and geographic results in a standard format (e.g. GeoPackage)
  • Map data (e.g. tiles)
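As a rough illustration of how the four categories above might appear in an OpenAPI description, here is a minimal sketch. The path names (`methodology`, `documentation`, `results`, `tiles`), the `v1` label, and the title are assumptions made for this example; nothing in this issue fixes them.

```python
import json

# Hypothetical resource names; the issue lists the categories but not the paths.
RESOURCES = {
    "methodology": "Methodology and input data sources",
    "documentation": "Technical guides, communities lists, and codebooks",
    "results": "Scoring results and geographic results",
    "tiles": "Map data such as tiles",
}


def build_openapi(version: str) -> dict:
    """Build a minimal OpenAPI 3.0 description for one data version."""
    paths = {
        f"/{version}/{name}": {
            "get": {
                "summary": summary,
                "responses": {"200": {"description": "OK"}},
            }
        }
        for name, summary in RESOURCES.items()
    }
    return {
        "openapi": "3.0.3",
        "info": {"title": "Justice40 data (sketch)", "version": version},
        "paths": paths,
    }


if __name__ == "__main__":
    # Print the description as JSON, one of the two formats OpenAPI accepts.
    print(json.dumps(build_openapi("v1"), indent=2))
```

A real description would also declare response schemas and media types; this only shows the overall shape.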

Additional Considerations

Data Formats

All data should be published in open, easily consumable formats. The specific format for files should be determined by a combination of openness and ease of use. For geographic data, this may be GeoPackage or GeoJSON. Spreadsheets should be published as CSV files, and documents should be PDF or Markdown.

All of this data should be generated as part of the Justice40 data pipeline.

Individual datasets should be published in a single format (i.e. do not publish the same information in multiple formats). While some sites publish their data in as many formats as possible, it's impossible for Justice40 to anticipate the desired data format for every consumer; therefore, Justice40 should pick open, common formats and allow consumers of Justice40 data to perform any translations necessary to make the data consumable by their system.
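To show how small a consumer-side translation can be, here is a sketch that converts a CSV file into JSON using only the Python standard library. The column names and rows are made up for the example; the real Justice40 columns are not specified in this issue.

```python
import csv
import io
import json

# Made-up sample; the real Justice40 column names are not given in this issue.
SAMPLE_CSV = """tract_id,disadvantaged
01001020100,True
01001020200,False
"""


def csv_to_json(text: str) -> str:
    """Convert CSV text into a JSON array with one object per row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(csv_to_json(SAMPLE_CSV))
```

Every value stays a string, so identifiers with leading zeros survive the conversion; a consumer that needs booleans or numbers would cast them explicitly.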

Published Data

The data published in each API version should be comprehensive: enough to describe the API and the available data, to explain how to use both, and to deliver the data itself. Right now, that means the items listed above, but the list could change as the API evolves.

API Versioning

Justice40 disadvantaged communities and scores may change over time. Some agencies may immediately shift to using the new score, but some may not. Because of this – and because agencies sometimes need to review older scores – it's critical that any Justice40 API include versioning. Major revisions of the score should receive their own version number, and minor revisions that have an impact on the score may receive a minor version number.

Some agencies do not care whether score data has changed; for example, an agency may only want to show Justice40 map tiles on its own map. For those cases, Justice40 should maintain a current alias for its API that always maps to the latest version (e.g., api.environmentaljustice.gov/current/ could map to api.environmentaljustice.gov/v1/).
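One way to picture the `current` mapping is a small path rewrite. This is only a sketch: it assumes versioning lives in the URL (which this request deliberately leaves open), and the list of published versions and the `vMAJOR.MINOR` label format are hypothetical.

```python
import re

# Hypothetical set of published versions; the issue only mentions "v1".
PUBLISHED = ["v1", "v1.1", "v2"]


def version_key(label: str) -> tuple[int, ...]:
    """Turn 'v1.1' into (1, 1) so versions sort numerically, not as text."""
    return tuple(int(part) for part in label.lstrip("v").split("."))


def resolve(path: str, published: list[str] = PUBLISHED) -> str:
    """Rewrite a leading /current segment to the latest published version."""
    latest = max(published, key=version_key)
    # The lookahead keeps paths such as /currently/... untouched.
    return re.sub(r"^/current(?=/|$)", f"/{latest}", path)


if __name__ == "__main__":
    print(resolve("/current/tiles"))
    print(resolve("/v1/tiles"))
```

Requests that name an explicit version pass through unchanged, so agencies reviewing older scores are unaffected when `current` moves.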

Determining how to implement API versioning (in a header, in the URL, or in some other form) is beyond the scope of this request.

Change Announcements

It is important to let agencies know of changes to the API. In addition to information in the OpenAPI specification, a process should be developed to announce API changes. This may involve a mailing list, a Google group (including the existing group), or some other way for agencies to subscribe to notifications of API updates.

Next Steps

Once this API is in place, client applications, such as CEJST, should be updated to use the API as their source of data. Existing versions of the data should be deprecated and agencies should be given up to 6 months to update their code to use the new API(s).
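If the deprecated endpoints keep serving data during the migration window, one option is to advertise the cutoff with the HTTP `Sunset` header (RFC 8594). The sketch below computes a date six months after an announcement and formats it as an HTTP date. The announcement date, the choice of this header, and the helper names are assumptions for illustration, not part of this proposal.

```python
import calendar
from datetime import datetime, timezone
from email.utils import format_datetime


def add_months(moment: datetime, months: int) -> datetime:
    """Shift a datetime forward by whole months, clamping the day if needed."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def sunset_header(announced: datetime, months: int = 6) -> dict[str, str]:
    """Build a Sunset header marking the end of the migration window."""
    cutoff = add_months(announced, months)
    # usegmt=True requires a timezone-aware datetime in UTC.
    return {"Sunset": format_datetime(cutoff, usegmt=True)}


if __name__ == "__main__":
    # Hypothetical announcement date.
    print(sunset_header(datetime(2023, 3, 13, tzinfo=timezone.utc)))
```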

@sampowers-usds
Collaborator

I made some small grammatical/punctuation edits to the above.

Outstanding question for consideration: Do we want to include guidance on how to treat the data sources that we pull in? Is the aim to be a re-publisher of other organizations' data or are we only interested in publishing data that results from USDS transformations?

@vim-usds
Collaborator

Awesome write-up!

@travis-newby
Collaborator Author

> Outstanding question for consideration: Do we want to include guidance on how to treat the data sources that we pull in? Is the aim to be a re-publisher of other organizations' data or are we only interested in publishing data that results from USDS transformations?

Probably ultimately a question for Kameron, but my $0.02 is that we should not be a republisher of data (we should stick to publishing results and information about how we got to those results).

@tpcolson

tpcolson commented Jun 6, 2023

My 2 cents: my agency wants a lot of downstream app development using the CEJST output data, and, speaking of republishing data, the only way I can deliver is... to republish the CEJST data. Having a REST API exposed (it currently is not) would alleviate that problem. I don't think the question is about the input data (used to create the CEJST output).
