scripting manual source downloads #689

andrewharvey · 2018-03-06T05:56:45Z

There are quite a few sources where you need to manually download fresh data. For these, OA provides https://results.openaddresses.io/upload-cache, which caches the upstream files on S3.

Doing this by hand is very time-consuming and results in OA always lagging behind the upstream source.

What do people think about trying to automate this? I'm thinking of a Node script using https://github.com/GoogleChrome/puppeteer for each source where this is needed.

I'm happy to work on the puppeteer scripts, but we'd need a machine to actually run them. What do people think about this?

If not, then what do people think about changing https://results.openaddresses.io/upload-cache so that it produces a curl command line you can run, instead of uploading files through the browser?

My workaround for slow upload speeds is to do the upload from a remote server. That means running this script in the browser console while logged in to https://results.openaddresses.io/upload-cache, then running the curl command it returns on the server.

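// Build a curl command equivalent to submitting the upload-cache form.
// `file` is the path of the file to upload on the machine running curl.
function curlCommand(file) {
    var url = 'https://s3.amazonaws.com/data.openaddresses.io';
    var form = new FormData(document.querySelector('form[action="' + url + '"]'));
    var curl = 'curl -v -X POST';
    for (var pair of form.entries()) {
        // The file input stringifies to "[object File]"; it is swapped for @file below.
        curl += " -F '" + pair[0] + '=' + pair[1] + "'";
    }
    curl += ' ' + url;
    curl = curl.replace('[object File]', '@' + file);
    curl = curl.replace('${filename}', file);
    return curl;
}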
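
If the upload-cache page were to generate the command itself, the string-building part could be pulled out into a function that does not touch the DOM. The following is only a sketch of that idea, not how the page works today: `shellQuote` and `buildCurlCommand` are hypothetical names, and it assumes the form's file input is named `file`. It also quotes values for the shell, which the console snippet does not do.

```javascript
// Hypothetical helper: quote a string for a POSIX shell by wrapping it in
// single quotes and escaping any embedded single quote.
function shellQuote(value) {
    return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Build a curl command from [name, value] pairs, e.g. the entries of the
// upload form. `fileField` is an assumed name for the form's file input.
function buildCurlCommand(url, fields, file, fileField = 'file') {
    const parts = ['curl', '-v', '-X', 'POST'];
    for (const [name, value] of fields) {
        if (name === fileField) continue; // the file is appended last, below
        // Replace the literal ${filename} placeholder, as the console script does.
        const text = String(value).split('${filename}').join(file);
        parts.push('-F', shellQuote(name + '=' + text));
    }
    // Keep the file as the final form field, after all policy fields.
    parts.push('-F', shellQuote(fileField + '=@' + file));
    parts.push(shellQuote(url));
    return parts.join(' ');
}
```

In the browser console this could be called as `buildCurlCommand(form.action, Array.from(new FormData(form).entries()), 'data.zip')`, where `form` is the upload form element and `data.zip` is a placeholder file name.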
