Open-source options for subnational requirements #858

Open
brockfanning opened this issue Dec 22, 2017 · 7 comments


@brockfanning
Contributor

@Kali2017SDG @philipashlock
This is a write-up after some investigation into open-source options for adding subnational features to this platform. No action needed, just putting this up to help start the conversation, in case you'd like to consider an open-source vendor-less solution.

The requirements

Any subnational solution (as far as I can conceptualize it) needs to minimally satisfy 3 requirements:

  1. Data collection - i.e., how will we get data on the indicators at the subnational level?
  2. Data management - i.e., how will the subnational data be entered, maintained, and queried?
  3. Data visualization - i.e., how will the data be displayed and interacted with?

1. Data collection

Requirement 1 above is the area I have the least insight into, so for the purposes of this write-up, I’m going to naively assume that we have access to all the data we need (hooray!).

2. Data management

Meeting requirement 2, a data management solution, should start with the basic question: Can we continue to use GitHub/Prose for data management, or does the introduction of subnational data require another approach?

This is a tough question, because it touches on administrative concerns like data provider workflows and user access and permissions; and it also touches on technical considerations like client-side performance.

Because “if it ain’t broke, don’t fix it,” I’m going to proceed with the assumption that we will try to continue using GitHub/Prose for data management, with the caveat that future requirements related to workflows, access, permissions, and client-side performance may necessitate a switch to a separate data management system.

As for the nitty-gritty of the data management in GitHub/Prose, I’m assuming that it will be done using a subfolder-based approach. For example, right now this platform uses a “data” folder, with one CSV file for each indicator. With a subfolder approach, there would be a subfolder for each subnational region (U.S. state), e.g. data/state/alabama, data/state/alaska, etc. Each of these subfolders would similarly have one CSV file for each indicator.
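As a minimal sketch of the subfolder layout described above, a small helper could build the path to one indicator's CSV file for one region. The function name, the hyphenated folder slug, and the `indicator_<id>.csv` file name are all assumptions made for illustration, not the platform's actual naming.

```javascript
// Hypothetical helper: builds the CSV path for one indicator in one region.
// The "indicator_<id>.csv" file name is an assumption for illustration only.
function subnationalCsvPath(regionType, regionName, indicatorId) {
  // Turn the region name into a folder slug, e.g. "North Dakota" -> "north-dakota".
  const slug = regionName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `data/${regionType}/${slug}/indicator_${indicatorId}.csv`;
}

console.log(subnationalCsvPath("state", "Alabama", "8-1-1"));
```

Keeping the path logic in one place like this would mean only one function changes if the folder convention does.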

3. Data visualization

That leaves requirement 3, data visualization. For map visualizations, there seem to be 2 main types of open-source solutions out there: “vector-based”, and “tile-based”.

Tile-based

Tile-based solutions have the advantage of more choices of imagery (such as streets, terrain, satellite, etc.), but they carry an additional “moving part” by requiring the use of a tile server. There are free tile servers for light use, like OpenStreetMap, but it might be a complication.
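To make the “moving part” concrete: a tile-based map requests many small images from a server, each addressed by zoom level and x/y tile numbers. The sketch below uses the standard Web Mercator ("slippy map") tile numbering; the function names and the `tile.example.org` URL template are placeholders for illustration, not a real service.

```javascript
// Convert a latitude/longitude to tile numbers at a given zoom level,
// using the common Web Mercator ("slippy map") tiling scheme.
function tileForLatLon(latDeg, lonDeg, zoom) {
  const n = 2 ** zoom; // number of tiles along each axis at this zoom
  const latRad = (latDeg * Math.PI) / 180;
  const x = Math.floor(((lonDeg + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y, zoom };
}

// Fill a {z}/{x}/{y} URL template. The template passed in below is a placeholder.
function tileUrl(template, tile) {
  return template
    .replace("{z}", String(tile.zoom))
    .replace("{x}", String(tile.x))
    .replace("{y}", String(tile.y));
}

const tile = tileForLatLon(0, 0, 1);
console.log(tileUrl("https://tile.example.org/{z}/{x}/{y}.png", tile));
```

Every pan or zoom triggers a batch of requests like this one, which is why the tile server becomes a dependency that a vector-only map avoids.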

Vector-based

Vector-based solutions outline the map and fill the regions with color. Since we will presumably be exclusively displaying “choropleth” maps, this is probably all that we need. So, between the 2, I lean towards using a vector-based solution.
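The core of a choropleth is small: each region's value is mapped to one of a handful of colors, and the region's outline is filled with it. Here is a rough sketch in plain JavaScript, assuming equal-width bins; the palette and function names are illustrative choices, and a real implementation would more likely lean on a library's scale functions.

```javascript
// Illustrative five-step sequential palette, light to dark.
const PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"];

// Map a value to a palette index by splitting [min, max] into equal-width bins.
function colorIndex(value, min, max, steps) {
  if (max === min) return 0; // avoid dividing by zero when all values match
  const t = (value - min) / (max - min);
  const index = Math.floor(t * steps);
  // Clamp so values at or beyond either end still get a valid color.
  return Math.max(0, Math.min(steps - 1, index));
}

// Pick the fill color for one region.
function regionColor(value, min, max) {
  return PALETTE[colorIndex(value, min, max, PALETTE.length)];
}

console.log(regionColor(5, 0, 10));
```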

Open-source mapping libraries

Here are a few well-maintained JavaScript libraries for both approaches:

  1. Leaflet is a tile-based, lightweight mapping library. Here is a choropleth example. Worth noting: the Tanzania and Mexico NRPs both use Leaflet.
  2. D3.js is a vector-based, general-purpose data visualization library. Here is a choropleth example. Worth noting: D3 is a very popular data-vis library with a strong following, and is the subject of many courses/tutorials/books.
  3. OpenLayers is tile-based, with lots of features. However, I could not find a choropleth example. It may be more geared towards other uses.
  4. jQuery Mapael is vector-based, lightweight, and depends on another library, Raphaël. Here’s a choropleth example.

Recommendations and next steps

The next step would probably be to try a proof-of-concept with one or more of these libraries. As for recommendations, I would personally go with either Leaflet or D3, to start. Between those two I lean towards D3, because it doesn’t need a tile server. But ultimately I would recommend whichever one downloads the smallest amount of assets needed to get the job done, with the most easily maintainable integration code. That wouldn’t be clear until trying them out.

As always, any feedback is welcome.

@brockfanning
Contributor Author

@Kali2017SDG @SmithersA @philipashlock I've put up an ongoing proof-of-concept using D3 on my fork, using the test data that Kali provided for 8-1-1. You can see it by clicking on the "Map" tab after going here.

A few notes:

  • So far I have only implemented a map and year-switcher. I didn't implement any disaggregation of the charts or tables, because that has already been implemented in the UK platform. My next step for this proof-of-concept is to look at the feasibility of using the UK platform's filtering/disaggregation code.
  • Along those lines I still need to tweak the code to make it more abstract and less US-specific. I also need to hook the map's text into the translation system.
  • The map is not yet responsive, so I don't recommend testing on a phone, at this point.
  • Is North Dakota 2012 accurate? If so they really skew the coloring system. :)

Let me know if you run into any trouble testing it. As always, feedback is welcome.
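For anyone curious how a year-switcher can work, here is a rough sketch that is independent of the proof-of-concept code. It assumes the data arrives as one row per region and year; the field names (`year`, `region`, `value`) and the numbers are made up for illustration.

```javascript
// Made-up rows for illustration: one observation per region and year.
const rows = [
  { year: 2011, region: "Alabama", value: 1 },
  { year: 2011, region: "Alaska", value: 2 },
  { year: 2012, region: "Alabama", value: 3 },
  { year: 2012, region: "Alaska", value: 4 },
];

// List the distinct years, sorted, to populate the switcher control.
function availableYears(data) {
  return [...new Set(data.map((row) => row.year))].sort((a, b) => a - b);
}

// Collect region -> value for the selected year; the map redraws from this.
function valuesForYear(data, year) {
  const byRegion = new Map();
  for (const row of data) {
    if (row.year === year) byRegion.set(row.region, row.value);
  }
  return byRegion;
}

console.log(availableYears(rows));
console.log(valuesForYear(rows, 2012));
```

Switching years then only means calling `valuesForYear` again and recoloring the regions, with no need to reload the map outlines.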

@AnnCorp

AnnCorp commented Jan 15, 2018

Hi @brockfanning just wondering what did you use to produce the test map for 8.1.1.?

@brockfanning
Contributor Author

@AnnCorp That was done with D3. I have more work to do to make it truly nation-agnostic, but eventually it should be theoretically usable in the UK platform. It relies on the same data format (tidy) that you use. If you'd like a sneak peek at the code, the relevant files would be:

@AnnCorp

AnnCorp commented Jan 31, 2018

Hi @brockfanning just wondering how things are going with this? Hoping the UK NRP developers will be exploring this kind of thing very soon and so planned to point them at this ticket and specifically your proof of concept http://brock.tips/sdg-indicators/8-1-1/. Anything else you think should be flagged up at the moment? Any further advice or information would be very welcome - thank you!

@brockfanning
Contributor Author

Hi @AnnCorp it's going well. I've got the proof-of-concept more abstracted now, so that the country-specific code is decoupled from the general mapping code. At this point I've stopped working on the functionality and have turned to getting more subnational data for the US. Here are the indicators done so far:

One statistical/tech note I wanted to mention about these - on some I noticed that extreme outliers could skew the coloring system in a way that made the choropleth map less useful. One example was North Dakota in 2012 on 8-1-1, and another example was District of Columbia for most of the years on 3-3-1. Because these outliers were so much higher than all the other regions, the map pretty much turned into only 2 colors: one for the outlier, and one for everything else.

The approach I've got in place on these proofs of concept is, for the purposes of the coloring system, to ignore any regions whose values are outside of 3 standard deviations from the mean. Here is the code that does that. The outliers are still displayed accurately on the map, but their values aren't included in the color legend.
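The idea can be sketched in a few lines of plain JavaScript. This is not the linked proof-of-concept code, just an illustration under some assumptions: it uses the population standard deviation, expects a non-empty array of numbers, and the function names are invented here.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (assumption: divide by n, not n - 1).
function standardDeviation(values) {
  const m = mean(values);
  const variance =
    values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// Min and max used for the color legend, ignoring any value that lies
// more than `maxDeviations` standard deviations from the mean.
function colorDomain(values, maxDeviations = 3) {
  const m = mean(values);
  const sd = standardDeviation(values);
  const kept = values.filter((v) => Math.abs(v - m) <= maxDeviations * sd);
  return { min: Math.min(...kept), max: Math.max(...kept) };
}

// Made-up values: nineteen small numbers and one extreme outlier.
const sample = Array.from({ length: 19 }, (_, i) => (i % 3) + 1).concat([100]);
console.log(colorDomain(sample));
```

An outlier region would still be drawn with its true value in the tooltip or table; for coloring, its value would simply be clamped to the darkest (or lightest) step of the scale.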

@Kali2017SDG

Explanation of North Dakota outlier (Survey of Current Business, July 2013. p.116... note: statistics have been revised since this initial release via incorporating available source data):
Mining has increased in importance in North Dakota’s economy as a result of the oil boom due to the recovery of oil from the Bakken region’s shale formation; in 2009, mining accounted for 3.5 percent of North Dakota’s current-dollar GDP, and in 2012, mining’s share had nearly tripled, accounting for 9.6 percent of the state’s current-dollar GDP.

@brockfanning
Contributor Author

@Kali2017SDG Ah, good to know!
