Add in selective pushing/pulling from s3 #23

larryfenn · 2019-12-19T19:03:00Z

Right now, datakit data pushes and pulls the entire contents of the data directory, even for files that haven't changed. This causes pushes and pulls to take much more time than they should in some projects.

Instead, datakit data should only push and pull files that differ between disk and S3. Is there an S3 flag that we can pass through (and then make the default)?

zstumgoren · 2019-12-19T19:23:18Z

@larryfenn The push/pull commands delegate to the AWS CLI sync command, which should only send new or updated files to the target destination.

When you run datakit data [push|pull], the exact AWS CLI command being executed should be printed to the shell, along with the list of files that were uploaded/downloaded.

Can you run a test in your shell that demonstrates a case where files that have not been changed are actually being synced and paste the shell session contents here?

zstumgoren · 2019-12-19T19:28:39Z

@larryfenn One additional thought: I wonder if the delay you mentioned is not in fact due to sending unchanged files, but to the diff operation that determines what has actually changed between the source and destination (akin to the long wait you might experience when using rsync). If you can send a shell session where you experienced a long wait time, that could help pinpoint the nature of the issue/bug.
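To make the "diff operation" idea concrete, here is a rough Python sketch of the comparison that aws s3 sync is documented to apply on upload: a local file is sent if it is missing on S3, if the sizes differ, or if the local copy has a newer last-modified time. The FileInfo type and files_to_upload function are hypothetical names made up for illustration; this is not the AWS CLI's actual code, just a model of the rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record of the two attributes the comparison uses."""
    size: int     # size in bytes
    mtime: float  # last-modified time, seconds since the epoch


def files_to_upload(local: Dict[str, FileInfo],
                    remote: Dict[str, FileInfo]) -> List[str]:
    """Return the keys a size/last-modified comparison would mark for upload.

    Assumption: this mirrors the documented default rule for uploads
    (missing remotely, different size, or newer local timestamp).
    """
    changed = []
    for key, src in local.items():
        dst = remote.get(key)
        if dst is None or src.size != dst.size or src.mtime > dst.mtime:
            changed.append(key)
    return sorted(changed)
```

Two things follow from this model, if it holds. Every file on both sides still has to be listed and compared, so a large data directory could produce a noticeable wait even when nothing is transferred. And a file that was regenerated locally with identical contents but a newer timestamp would be uploaded again, which could look like "unchanged" files being synced.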

zstumgoren · 2019-12-19T19:41:36Z

@larryfenn Sorry, one final thought as a stop-gap workflow. The push|pull commands only support pass-through of boolean flags to the underlying AWS utility. For example: datakit data push delete (which would delete any files on S3 that no longer exist locally).

But the AWS sync command also has --include and --exclude flags that use patterns to target/exclude files for the sync operation. As a temporary workaround, you might want to run datakit data push dryrun to obtain the AWS command on the shell. Then update that raw AWS command with an --include or --exclude flag that minimizes the delay. Bit of a kluge, but might speed things up for the time being until we get to the root of the underlying problem.
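As a sketch of that workaround, the snippet below assembles a raw aws s3 sync command with --exclude/--include filters and --dryrun. The bucket name, paths, and the build_sync_command helper are all hypothetical; in practice you would copy the real source and destination from the command that datakit prints. It only builds and prints the command, it does not run it.

```python
import shlex
from typing import List, Sequence


def build_sync_command(source: str,
                       dest: str,
                       include: Sequence[str] = (),
                       exclude: Sequence[str] = (),
                       dryrun: bool = True) -> List[str]:
    """Build an `aws s3 sync` argument list (hypothetical helper).

    Excludes are emitted before includes because, in the AWS CLI, filters
    that appear later on the command line take precedence. So
    `--exclude "*" --include "*.csv"` limits the sync to CSV files.
    """
    cmd = ["aws", "s3", "sync", source, dest]
    for pattern in exclude:
        cmd += ["--exclude", pattern]
    for pattern in include:
        cmd += ["--include", pattern]
    if dryrun:
        # Ask the AWS CLI to list what it would do without transferring.
        cmd.append("--dryrun")
    return cmd


if __name__ == "__main__":
    # Assumed, made-up paths; replace with the ones datakit prints.
    cmd = build_sync_command(
        "data/",
        "s3://example-bucket/my-project/data/",
        include=["processed/*.csv"],
        exclude=["*"],
    )
    print(shlex.join(cmd))  # paste-able shell command (Python 3.8+)
```

Keeping dryrun on by default seems the safer choice here: you can check the file list first, then rerun with dryrun=False (or drop the flag from the pasted command) once the patterns match what you expect.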
