file-uploader uploads a large file to S3 from a given download URL.

CameronXie/file-uploader

File Uploader

This project demonstrates an approach that uses S3 multipart upload and Step Functions (Distributed Map) to concurrently download a large file, up to 100TB (10GB * 10,000) in theory, from any given URL (the server must support HTTP range requests) and upload it to an S3 bucket.
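
The size arithmetic above can be sketched as follows. This is an illustrative calculation only, assuming the 10,000 in the figure is the maximum number of tasks per upload; the function names `task_count` and `min_task_size` are hypothetical and not part of this project.

```python
MAX_TASKS = 10_000  # assumption: the 10,000 factor quoted above


def task_count(total_size: int, single_task_size: int) -> int:
    """Number of tasks needed to cover total_size bytes (ceiling division)."""
    return -(-total_size // single_task_size)


def min_task_size(total_size: int, max_tasks: int = MAX_TASKS) -> int:
    """Smallest task size (in bytes) that keeps the task count within max_tasks."""
    return -(-total_size // max_tasks)


if __name__ == "__main__":
    hundred_tb = 100 * 10**12
    print(min_task_size(hundred_tb))        # bytes per task for a 100TB file
    print(task_count(hundred_tb, 10**10))   # tasks needed at 10GB per task
```

A SingleTaskSize smaller than `min_task_size(total_size)` would need more tasks than that limit allows.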

This project also demonstrates a way to host the source code in CodeCommit and deploy via CDK Pipeline.

Solution

Components

  • Partitioner A Python Lambda that takes URL and SingleTaskSize as input and fetches the total size of the file to download from the given URL. Based on the given single task size, it splits the upload into smaller tasks and passes them to the next state.
  • Uploader A Python Lambda, triggered by Step Functions, that uses an HTTP range request to download a portion of the file and uploads it to S3 as part of a multipart upload.
  • Step Functions A state machine that handles task validation, fan-out, retries and error handling, and also creates, completes and aborts the S3 multipart upload.
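
The Partitioner and Uploader steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual Lambda code: the function names, the task dictionary shape, and the use of a HEAD request to read Content-Length are assumptions. `s3_client` is assumed to be a boto3 S3 client, passed in by the caller.

```python
import urllib.request


def fetch_total_size(url: str) -> int:
    """Read the file size from Content-Length (assumes the server answers HEAD)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])


def plan_tasks(total_size: int, single_task_size: int) -> list:
    """Split the file into inclusive byte ranges, one per multipart part."""
    if total_size <= 0 or single_task_size <= 0:
        raise ValueError("sizes must be positive")
    tasks = []
    starts = range(0, total_size, single_task_size)
    for part_number, start in enumerate(starts, start=1):
        # HTTP byte ranges are inclusive, so the last byte is end - 1.
        end = min(start + single_task_size, total_size) - 1
        tasks.append({"PartNumber": part_number, "Range": f"bytes={start}-{end}"})
    return tasks


def upload_task(s3_client, url, bucket, key, upload_id, task) -> dict:
    """Download one byte range and upload it as one part of a multipart upload."""
    request = urllib.request.Request(url, headers={"Range": task["Range"]})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    result = s3_client.upload_part(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        PartNumber=task["PartNumber"],
        Body=body,
    )
    # The ETag of every part is needed later to complete the multipart upload.
    return {"PartNumber": task["PartNumber"], "ETag": result["ETag"]}
```

Because each task carries its own part number and byte range, the tasks are independent of one another and can be fanned out and retried individually.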

Diagram

File Uploader

Test

Run make test to lint and unit test the Partitioner and Uploader.

Deploy

Prerequisites

  • An AWS IAM user with sufficient permissions to deploy:
    • CodeCommit
    • CodeBuild
    • CodePipeline
    • Step Functions
    • Lambda
    • S3
  • Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION and CDK_DEFAULT_ACCOUNT in a .env file.

Deploy with Docker

This project uses AWS CodeCommit to host the source code and CDK Pipeline to deploy it. Run make ci-deploy to lint, build, create a new repository in CodeCommit, push the source code and deploy the project through CDK Pipeline.

Example

The following example Step Functions payload uploads the AWS CLI installer to S3.

{
  "URL": "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip",
  "SingleTaskSize": 6000000
}
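
A payload like this could also be submitted programmatically. The sketch below is hedged: `build_input` and `start_upload` are hypothetical helpers, the state machine ARN must come from your own deployment, and `sfn_client` is assumed to be a boto3 Step Functions client (for example `boto3.client("stepfunctions")`) created by the caller.

```python
import json


def build_input(url: str, single_task_size: int) -> str:
    """Serialize the execution input using the field names from the example."""
    return json.dumps({"URL": url, "SingleTaskSize": single_task_size})


def start_upload(sfn_client, state_machine_arn: str, url: str, single_task_size: int) -> str:
    """Start one execution of the state machine and return its execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_input(url, single_task_size),
    )
    return response["executionArn"]
```

Passing the client in as an argument keeps the helper free of credentials handling and makes it easy to exercise with a stub in unit tests.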
