New Functionality: Input Keys from a File to Download #103

Open
rajivchodisetti opened this issue Aug 18, 2018 · 4 comments
Comments

@rajivchodisetti

Hi,

Could we have additional functionality where the list of keys to download is provided in a file, passed through an additional argument? For simplicity, this input file could itself be another blob.

Use case: we have millions of small files (images) that are needed for training on an as-needed basis, so this would be very handy.

Right now I am relying on Python multiprocessing for this, but I think Go would be much faster, based on my experience using your module.

Thanks

rajivchodisetti changed the title from "New Functionality: Input Keys from a File to Downloads" to "New Functionality: Input Keys from a File to Download" on Aug 18, 2018
@rajivchodisetti
Author

Or just guide me on how to do it, and I will try to hack it.

@giventocode
Contributor

Currently you can download multiple files based on the prefix. If this does not work for your scenario, more details on why not would be helpful.

Nevertheless, there's already an enhancement request to make the inputs of the -n and -f options available via a file. The current thinking on this (feedback is welcome) is to support something like this: -f @myfile. To implement this functionality, a new parsing-validation rule would need to be implemented here. The validation rule would detect the case where a file is provided, read and validate its content, and derive the pipeline parameters from it.

@rajivchodisetti
Author

Sorry for the delay in replying. Regarding your first question on why prefix-based download doesn't work: it does not fit my use case because I maintain an index (a database) of blob keys. Depending on the search criteria run against the database, a set of keys is emitted, and the data for those keys has to be downloaded.
For example, we crawl millions of images, and each image has several other images associated with it, such as one where the entire background is removed and only the apparel is visible, and one that is a thumbnail generated from the original image. These assets are stored in Blob storage, and I maintain an index of their keys in the database. A search query on my database might look like "give me all the blob keys for which thumbnails were generated in the last 2 days", and the output of the query is simply a set of blob keys whose assets have to be downloaded.

@rajivchodisetti
Author

Any chance of this getting picked up?
