Add export functionality #4

Open
waldoj opened this issue Dec 8, 2014 · 8 comments

Comments

@waldoj
Member

waldoj commented Dec 8, 2014

Something important and missing from this proposal is export functionality. We want it to be simple for somebody to take their data out of such a system to bring it over to another host. No doubt sites are going to outgrow this setup. They need to be able to take all of their data with them to move to a larger host that can accommodate customizations, larger amounts of traffic, etc. That host might be CKAN, but it might be Socrata or Junar or DKAN or whatever else.

@rossjones

CKAN already has support for RDF/CSV/JSON dumps of the datasets, but this is currently a CLI command (db dump, and rdfexport respectively).

It's often something that you don't want to do at runtime, particularly if you have a lot of packages, but as a nightly task, or a one-off outside the HTTP request flow, it can definitely work.

Perhaps just extending the dumps to ensure they contain the organisation/usernames etc might work?

@waldoj
Member Author

waldoj commented Dec 10, 2014

> Perhaps just extending the dumps to ensure they contain the organisation/usernames etc might work?

I suspect strongly that this is all that's going to be required. We'll certainly study that existing functionality closely. I've never had to export data from CKAN, only read about it. I'll log into my CKAN instance in the morning and try it out. :) Thanks for your insights on this!

@jqnatividad
Contributor

We should also consider exporting the installation metadata i.e. CKAN version, config, plugins installed, disk space required, etc. as part of the export.

In that way, a user can move to another CKAN provider with confidence that the import will take.

For export, maybe we also can leverage data.json, especially for migrations going to another non-CKAN system.
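To make the data.json idea a little more concrete, here is a minimal Python sketch of mapping one CKAN dataset dict onto a data.json-style entry. The helper name `ckan_to_datajson` is made up for illustration, and the field mapping is an assumption that covers only a handful of common fields; it is not a complete or validated implementation of the data.json schema.

```python
def ckan_to_datajson(pkg):
    """Map a CKAN package dict to a minimal data.json-style dataset entry.

    Hypothetical helper: only a few common fields are mapped, and missing
    values fall back to empty strings or empty lists.
    """
    return {
        "identifier": pkg["id"],
        "title": pkg.get("title") or pkg.get("name", ""),
        "description": pkg.get("notes", ""),
        "keyword": [tag["name"] for tag in pkg.get("tags", [])],
        "modified": pkg.get("metadata_modified", ""),
        "distribution": [
            {
                "title": res.get("name", ""),
                "downloadURL": res.get("url", ""),
                "format": res.get("format", ""),
            }
            for res in pkg.get("resources", [])
        ],
    }
```

A full export would wrap the resulting entries in a catalog object and fill in the fields that have no direct CKAN equivalent, which is where most of the real work would likely be.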

@waldoj
Member Author

waldoj commented Dec 19, 2014

I've been playing with CKAN's various command-line export functions, and I think it's most of the way there. Exporting datasets is pretty good (with paster db simple-dump-json -c /etc/ckan/default/production.ini my_datasets.json), and exporting users works well enough.

But only dataset metadata is exported. To export the data files themselves, CKAN's docs instruct us to edit Apache's config file with what's basically a hack: disabling the file handler for the directory where datasets are stored, and turning on directory listings. So you don't actually get the files in an export. You get a directory listing where you can right-click on each file and save it. And then you have to correlate it with the exported metadata, which is possible only awkwardly: the filename (e.g., 8f-4995-4709-9d13-a683693dd8ac) is only the tail end of the resource ID (e.g., 1797da8f-4995-4709-9d13-a683693dd8ac).

Proper export functionality necessitates including all of the files and providing a direct correlation between each filename and an identifier within the exported metadata. It should also include the config file that drives the site (e.g., development.ini), in case the exported data is going to be used in another CKAN site.
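As a rough illustration of what that correlation could look like, the Python sketch below rebuilds a resource ID from a stored file's path and pairs each file with its exported metadata. It assumes the files live in a `<first 3 chars>/<next 3 chars>/<rest of ID>` directory layout, which is consistent with the example filename and resource ID above but should be checked against the CKAN version in use. The names `resource_id_from_path` and `build_manifest` are hypothetical.

```python
import os


def resource_id_from_path(path):
    """Rebuild a resource ID by joining the last three components of a path.

    Assumes a .../<3 chars>/<3 chars>/<rest of ID> storage layout.
    """
    head, tail = os.path.split(os.path.normpath(path))
    head, middle = os.path.split(head)
    first = os.path.basename(head)
    return first + middle + tail


def build_manifest(storage_root, resources):
    """Return {resource_id: {"path": ..., "url": ...}} for each stored file
    whose reconstructed ID appears in the exported resource metadata.

    `resources` is an iterable of resource dicts, each with at least an "id".
    """
    by_id = {res["id"]: res for res in resources}
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            res_id = resource_id_from_path(full_path)
            if res_id in by_id:
                manifest[res_id] = {
                    "path": os.path.relpath(full_path, storage_root),
                    "url": by_id[res_id].get("url", ""),
                }
    return manifest
```

Shipping a manifest like this alongside the files would let an importer match each file to its metadata record without knowing anything about the storage layout.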

@waldoj
Member Author

waldoj commented Dec 19, 2014

We've just published an RFP for this small aspect of the overall project. Bidding is open through December 31.

@wardi

wardi commented Dec 19, 2014

https://github.com/ckan/ckanapi might also be a good place to start. It can already export and import dataset, group and org metadata with multiple connections in parallel. It's also MIT licensed.
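For a sense of the shape of such an export, here is a minimal Python sketch that writes every dataset's metadata as one JSON object per line. It relies only on the `package_list` and `package_show` actions of CKAN's action API; the function name `dump_datasets` is hypothetical, the URL in the comment is a placeholder, and this simple loop does none of the parallel work that ckanapi does.

```python
import json


def dump_datasets(action, out):
    """Write each dataset's metadata to `out`, one JSON object per line.

    `action` is any object exposing package_list() and package_show(id=...),
    for example the `action` attribute of a ckanapi RemoteCKAN instance.
    Returns the number of datasets written.
    """
    count = 0
    for name in action.package_list():
        pkg = action.package_show(id=name)
        out.write(json.dumps(pkg, sort_keys=True) + "\n")
        count += 1
    return count


# Example wiring (assumes ckanapi is installed; the URL is a placeholder):
#   from ckanapi import RemoteCKAN
#   with open("datasets.jsonl", "w") as fh:
#       dump_datasets(RemoteCKAN("https://ckan.example.org").action, fh)
```

Passing the action object in, rather than creating the client inside the function, keeps the export logic testable without a live CKAN site.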

@waldoj
Member Author

waldoj commented Dec 22, 2014

I had no idea that ckanapi had that particular functionality! I'll add that to the RFP as a possible path. Thanks so much, @wardi.

@wardi

wardi commented Mar 20, 2015

Work on this is close to complete: ckan/ckanapi#37

4 participants