Consider refactoring to use boto3 instead of awscli #28

Open
myersjustinc opened this issue Feb 2, 2021 · 3 comments

Comments

@myersjustinc

I was talking with @ssempervirens earlier about her work on #25, and I got to thinking: Rather than straining against the limits of what we can do through shelling out to the AWS CLI like we currently do (e.g., #6, #7, #20, #22), should we instead use the underlying SDK (boto3) and implement some of these features in a more Pythonic (and ultimately more flexible) way?

It'd be a relatively serious overhaul, but I think it'd just about all be confined to the datakit_data.s3 module. Credential management is likely to be the biggest challenge.
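For illustration only, here is a minimal sketch of what a boto3-based upload might look like. The function names (`iter_uploads`, `push_directory`, `make_client`) are hypothetical and not part of datakit-data. It is a plain one-way upload and does not reproduce the comparison or deletion behavior of the CLI's `sync`. Credentials are assumed to come from a named AWS profile.

```python
from pathlib import Path


def iter_uploads(local_dir, prefix=""):
    """Yield (local_path, s3_key) pairs for every file under local_dir."""
    root = Path(local_dir)
    clean_prefix = prefix.strip("/")
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            key = f"{clean_prefix}/{relative}" if clean_prefix else relative
            yield str(path), key


def push_directory(client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to the bucket, below prefix.

    `client` only needs to behave like a boto3 S3 client, i.e. expose
    upload_file(Filename, Bucket, Key). Returns the list of uploaded keys.
    """
    uploaded = []
    for filename, key in iter_uploads(local_dir, prefix):
        client.upload_file(filename, bucket, key)
        uploaded.append(key)
    return uploaded


def make_client(profile_name=None):
    """Build an S3 client from a named AWS profile (assumes boto3 is installed)."""
    import boto3  # imported lazily so the helpers above work without it

    session = boto3.Session(profile_name=profile_name)
    return session.client("s3")
```

Passing the client in as an argument keeps credential handling in one place (`make_client`) and lets the upload logic be tested with a stand-in object instead of a real AWS account.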

@zstumgoren
Contributor

@myersjustinc The original motivation for shelling out to the AWS CLI was that it let us get up and running with a wide range of features that could be time-consuming to re-implement using a Python-level SDK (most notably the speed and flexibility of the bi-directional synchronization operations). That said, it's always made me itch that we tied ourselves so closely to AWS tooling. A Python-level SDK would be a step in the right direction, but lately I've been thinking an even better solution would be to use a generic cross-platform library such as Apache libcloud to support a variety of cloud storage providers. I haven't done a great deal of research into cross-platform options; I'm just throwing this out there as a possible alternative that might allow us to go beyond support for a single platform.

@myersjustinc
Author

@zstumgoren: Definitely interested to hear about libcloud—I hadn't seen that sort of thing before, but it does seem like a reasonable alternative, and it actually resolves another concern I had. (And it might be the first Apache project I've seen that isn't in Java!)

@zstumgoren
Contributor

A few other alternatives worth considering:

Git LFS is compelling because it effectively eliminates the need to manage an external storage system while providing the benefits of version control and branching for files. But it does restrict file size (2-5GB depending on account type) and ties you to platforms that support it, such as Bitbucket or GitHub.

DVC, which @meghanhoyer alerted me to, provides similar versioning benefits while allowing you to use a variety of cloud providers.

I can see arguments for both, but personally would likely lean towards DVC. Perhaps something we should discuss with your team and @meghanhoyer?
