What else needs to be done? What's missing? #3

Open
waldoj opened this issue Dec 8, 2014 · 10 comments

@waldoj
Member

waldoj commented Dec 8, 2014

Did we get everything? Does anything else need to be done in order to accomplish our goal?

@waldoj waldoj added the question label Dec 8, 2014
@jqnatividad
Contributor

Beyond making CKAN more cloud-ready, you may want to take this opportunity to address some items on the CKAN wishlist as well:

https://github.com/ckan/ideas-and-roadmap

A lot of these ideas come from the global CKAN community, and they highlight the pain points and gaps found across various CKAN implementations.

@jqnatividad
Contributor

The multisite project will certainly go a long way toward democratizing Open Data and toward making sure our Open Data future works like the web (data portals linked to one another). You may also want to think about the data layer, and about what ODI can do in this project to encourage federation.

Some other items that come to mind:

  • data.json and Project Open Data compliance (we forked Joshua's HHS implementation, https://github.com/HHS/ckanext-datajson, because it's HHS-specific. Perhaps make it more generic and easier to configure through a web interface?)
  • schema.org integration, especially now that schema.org supports http://schema.org/DataCatalog and http://schema.org/Dataset. People look for data using search engines.
  • Analytics/API management dashboard, with API tracking and quota/throttling capabilities. In opendata.city, we're doing this by extending the existing Google Analytics support to recognize API use, as well as by integrating 3scale API usage. We're even thinking of pulling some metrics from GA and automatically exposing them as Open Data for each instance.
  • You may also want to look into extensions like https://github.com/open-data/ckanext-scheming, which promotes schema sharing and standardization among CKAN instances.
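As a rough illustration of the schema.org point, here is a minimal Python sketch that builds a schema.org `Dataset` description as JSON-LD and wraps it in the `<script type="application/ld+json">` tag that search engines read. The function names and field choices are assumptions for illustration, not part of any CKAN extension; `includedInDataCatalog` is the schema.org property that links a `Dataset` to the `DataCatalog` hosting it.

```python
import json


def dataset_jsonld(title, description, url, keywords, catalog_name, catalog_url):
    """Build a schema.org Dataset description as a JSON-LD dict.

    Field choices are illustrative; a CKAN extension would map them
    from the package's own metadata.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "url": url,
        "keywords": keywords,
        # Link the dataset back to the portal (catalog) that hosts it.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
            "url": catalog_url,
        },
    }


def jsonld_script_tag(doc):
    """Serialize the dict into the <script> tag search engines look for."""
    # Escape "</" so a value containing "</script>" cannot end the tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

Each dataset page would emit one such tag in its HTML, so crawlers can index the dataset and its parent catalog without scraping the page layout.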

@rgradeck

Would be great to see a blog post about what you learned from this process once you're a little further on your way.

@waldoj
Member Author

waldoj commented Dec 10, 2014

You bet!

@jqnatividad
Contributor

You may also want to consider some form of facilitated federation. That is, as each new CKAN multisite instance is spun up, the publisher could optionally be prompted to join a central registry.

This registry can show the publisher's information. The publisher could even ask for their catalog metadata to be made available for harvesting.

Future iterations of the registry can even support federated search across catalogs.
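A toy model of the opt-in registry described above, assuming a hypothetical `Registry` class that keeps one entry per catalog URL and only searches catalogs whose publishers allowed harvesting. Real federated search would query or harvest the remote catalogs rather than hold titles in memory; this only sketches the opt-in and search logic.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One publisher's catalog as recorded in a hypothetical central registry."""
    publisher: str
    catalog_url: str
    allow_harvest: bool = False  # publisher opted in to metadata harvesting
    dataset_titles: list = field(default_factory=list)


class Registry:
    """Hypothetical registry that new multisite instances may opt into."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Keyed by catalog URL, so re-registering an instance updates its entry.
        self._entries[entry.catalog_url] = entry

    def publishers(self):
        """List registered publishers, whether or not they allow harvesting."""
        return sorted(e.publisher for e in self._entries.values())

    def search(self, term):
        """Naive federated search over titles from catalogs that allow harvesting."""
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            if not entry.allow_harvest:
                continue
            for title in entry.dataset_titles:
                if term in title.lower():
                    hits.append((entry.publisher, title))
        return sorted(hits)
```

Separating "listed in the registry" from "allows harvesting" mirrors the two levels of participation suggested above: a publisher can be discoverable without exposing its catalog metadata.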

@waldoj
Member Author

waldoj commented Dec 16, 2014

Yes. Dataset registries are really important, and this is quite likely something that is missing for the intended purpose of this project. After all, the host of each CKAN Multisite server surely wants to keep up with all datasets hosted within the sub-sites. (I know that's not quite what you're describing, but it's the same mechanism.) I intend to bundle the ckanext-datajson extension with this, so of course the host could just poll each site's /data.json file, but a mechanism to (optionally!) ping one or more URLs when a dataset is added or updated seems potentially very useful.
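A minimal sketch of the polling approach, assuming each sub-site serves a Project Open Data `data.json` at `/data.json`. It accepts both the v1.1 catalog object, where datasets sit under the `dataset` key, and the older v1.0 bare array, then compares `identifier`/`modified` pairs between polls to find new or updated datasets. The helper names are hypothetical; a ping-on-update hook would replace `poll_site` with a push from each sub-site.

```python
import json
import urllib.request


def datasets_from_datajson(doc):
    """Return the dataset list from a parsed data.json document.

    Schema v1.1 wraps datasets in a catalog object under "dataset";
    the earlier v1.0 format was a bare JSON array of datasets.
    """
    if isinstance(doc, list):
        return doc
    return doc.get("dataset", [])


def index_by_identifier(datasets):
    """Map each dataset's identifier to its 'modified' value (None if absent)."""
    return {d["identifier"]: d.get("modified") for d in datasets if "identifier" in d}


def poll_site(base_url, timeout=10):
    """Fetch <base_url>/data.json over the network and index its datasets."""
    url = base_url.rstrip("/") + "/data.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        doc = json.load(resp)
    return index_by_identifier(datasets_from_datajson(doc))


def changed(old_index, new_index):
    """Identifiers that are new, or whose 'modified' value differs, since the last poll."""
    return sorted(
        k for k, v in new_index.items() if k not in old_index or old_index[k] != v
    )
```

Polling is simple but wastes requests on unchanged sites; the optional ping mechanism would let the host fetch only when a sub-site reports a change.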

@jqnatividad
Contributor

If this project achieves its main goal, it's foreseeable that there will be thousands, if not millions, of CKAN data repositories, hopefully organically linked and federated, with each repo ideally closest to, and populated by, the data producer.

So registries are really a necessary part of the project as discoverability issues will naturally follow this explosion of data repos.

From the data consumer side, ODI may want to think about how to create facilitated discovery.

Beyond making sure that ckanext-datajson is bundled in, an effort should be made to automate the creation of expressive catalog/dataset metadata and make that the default, rather than just barebones, manually entered metadata.

ODI may even want to go further than the Project Open Data guidance and include additional metadata that can be computed automatically and used for discovery, like bounding boxes for geospatial data and the date range of a dataset.
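Some of this computed metadata is straightforward to derive once the records are in hand. The sketch below assumes the coordinates have already been extracted as `(lon, lat)` pairs and the dates as `datetime.date` values (both hypothetical inputs); it returns a bounding box as `(min_lon, min_lat, max_lon, max_lat)` and the date range as an ISO 8601 `start/end` interval string.

```python
from datetime import date


def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs, or None."""
    if not points:
        return None
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def date_range(dates):
    """Return the earliest and latest dates as an ISO 8601 'start/end' interval, or None."""
    if not dates:
        return None
    return min(dates).isoformat() + "/" + max(dates).isoformat()


print(bounding_box([(-79.99, 40.44), (-80.10, 40.50)]))  # (-80.1, 40.44, -79.99, 40.5)
print(date_range([date(2014, 3, 1), date(2012, 1, 5)]))  # 2012-01-05/2014-03-01
```

Because both values are computed from the data itself, they stay accurate when a dataset is updated, unlike hand-entered coverage fields.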

In our implementation, we try to do this by contextualizing datasets through time and place, along with the usual tags and good metadata publishing practices as espoused by Project Open Data.

@rgradeck

One other thing that comes to mind, though not necessarily for this first round, is the ability for administrators to scan datasets within the repository for PII or other sensitive information (ideally before they go public). We can encourage good records-management practices, but an additional layer of protection would be welcome.
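A deliberately simple sketch of such a pre-publication scan, assuming tabular rows given as dictionaries (for example from `csv.DictReader`). The two regular expressions, for US Social Security numbers and email addresses, are illustrative assumptions; real PII detection needs much broader rules and human review, so treat this only as a first-pass flagging step.

```python
import re

# Illustrative patterns only; they will miss many kinds of PII.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_rows(rows):
    """Yield (row_number, column, kind) for every cell matching a PII pattern."""
    for i, row in enumerate(rows, start=1):
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    yield (i, column, kind)
```

Reporting the row and column rather than the matched value keeps the scan report itself from becoming a new copy of the sensitive data.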

@wardi

wardi commented Mar 20, 2015

@jqnatividad some of this is related to ckan/ideas#48 and your ticket ckan/ideas#59

@jqnatividad
Contributor

Yes. Looking forward to https://github.com/boxkite/ckan-multisite.

Hopefully, it will lay the groundwork for these ideas, along with making admin easier in general (https://github.com/opendata/CKAN-Multisite/issues/8), and the multisite admin can be extended to manage other CKAN ini settings as well.
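To illustrate a multisite admin managing other ini settings, here is a minimal sketch that uses Python's standard `configparser` to set one option in a CKAN ini file's `[app:main]` section (for example `ckan.site_title`). The helper name is hypothetical. Interpolation is disabled so paste-style values such as `%(here)s` pass through unchanged; note that `configparser` does not preserve comments when it writes the file back.

```python
import configparser
import io


def set_ckan_option(ini_text, key, value, section="app:main"):
    """Return a copy of ini_text with `key` set to `value` in `section`."""
    # interpolation=None leaves paste-style values such as %(here)s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names exactly as written (configparser lowercases by default).
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)  # comments in the original file are not preserved
    return out.getvalue()
```

Working on the text rather than the file path keeps the function easy to test; an admin UI would read the instance's ini file, call this, and write the result back.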
