GitHub - aws-samples/amazon-athena-queries-via-aws-lambda-cdk

Amazon Athena Queries via AWS Lambda CDK

This architecture shows how to execute Amazon Athena SQL queries from AWS Lambda using the Boto3 API. It can be deployed entirely with AWS CDK and is designed to fit into a larger serverless architecture. The CDK stack configures and deploys a Lambda function with the IAM permissions needed to run Athena SQL queries against data in an S3 bucket. Query results are written to an S3 output location specified by the user. This architecture is useful when Athena queries need to run on a regular, scheduled basis.
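As a rough illustration of the Lambda side of this pattern, the sketch below starts an Athena query through Boto3's `start_query_execution` call. It is not the repository's actual handler: the environment variable names (`ATHENA_QUERY`, `ATHENA_DATABASE`, `ATHENA_OUTPUT_LOCATION`, `ATHENA_WORKGROUP`) are assumptions chosen for the example, and the real code in resources/lambda may be organized differently.

```python
import os


def build_query_request(env):
    """Build keyword arguments for Athena's StartQueryExecution API.

    The environment variable names are hypothetical placeholders, not
    necessarily the ones used by the repository's Lambda function.
    """
    return {
        "QueryString": env["ATHENA_QUERY"],
        "QueryExecutionContext": {"Database": env["ATHENA_DATABASE"]},
        "ResultConfiguration": {"OutputLocation": env["ATHENA_OUTPUT_LOCATION"]},
        # "primary" is the name of Athena's default workgroup.
        "WorkGroup": env.get("ATHENA_WORKGROUP", "primary"),
    }


def handler(event, context):
    # boto3 ships with the Lambda Python runtime; it is imported here so the
    # request builder above can be exercised without it.
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**build_query_request(os.environ))
    # The call returns immediately; the query itself runs asynchronously.
    return {"QueryExecutionId": response["QueryExecutionId"]}
```

Keeping the request construction in a separate function makes it possible to check the parameters locally before deploying.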

Prerequisites and limitations

Prerequisites

An active AWS account
An Amazon Simple Storage Service (Amazon S3) bucket with pre-existing data
- Data to be queried by Athena should be available in an S3 bucket.
Amazon S3 data is cataloged via AWS Glue
- This can be done using a Glue crawler. For more information, refer to Using AWS Glue to connect to data sources in Amazon S3 in the Amazon Athena documentation.
Default output S3 bucket for Amazon Athena has been set
- Before running any queries in Athena, an output S3 bucket location in the same Region must be set in the Athena settings. For more information, refer to Specifying a query result location in the Amazon Athena documentation.
Amazon Athena workgroup
- If you do not have an existing Athena workgroup to use for querying, follow Setting up workgroups in the Amazon Athena documentation. We recommend using a workgroup that only has access to the tables used in the query.
Familiarity with deploying AWS resources using AWS CDK.
- For more information regarding this, refer to the AWS CDK Workshop.

Architecture

Target technology stack

S3 bucket (prerequisite): contains the data to be queried
Lambda function: executes Athena SQL queries via the Boto3 API
IAM role for Lambda function: Lambda execution role with the permissions needed to query S3 via Athena and save results to the specified S3 location. This role contains an access policy that follows the principle of least privilege.
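To make the least-privilege idea concrete, here is a sketch of what a scoped policy document for the Lambda execution role could look like, written as a Python dictionary. The function name, its parameters, and the exact action list are assumptions for illustration; the policy the CDK stack actually generates may differ, and Athena can require additional S3 actions on the results bucket (for example, multipart upload actions) depending on your setup.

```python
import json


def build_lambda_policy(region, account_id, workgroup, database,
                        input_bucket, output_bucket):
    """Return a hypothetical least-privilege IAM policy document."""
    glue = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Start and inspect queries in a single workgroup only.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": [
                    f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}"
                ],
            },
            {   # Read table metadata from one Glue database.
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": [
                    f"{glue}:catalog",
                    f"{glue}:database/{database}",
                    f"{glue}:table/{database}/*",
                ],
            },
            {   # Read the source data.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{input_bucket}/*",
                ],
            },
            {   # Write query results.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{output_bucket}",
                    f"arn:aws:s3:::{output_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account details.
    print(json.dumps(
        build_lambda_policy("us-east-1", "123456789012", "my-workgroup",
                            "my_database", "my-bucket", "my-bucket"),
        indent=2))
```

Every statement names specific resources rather than a bare `*`, which is the property the least-privilege principle asks for.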

Target architecture

Note: For simplicity, the input and output buckets are configured to be the same in this pattern. However, the user can optionally specify separate input and output buckets in the CDK code.

Automation and scale

The Lambda function can be run on demand or configured to run on a schedule using Amazon EventBridge (formerly CloudWatch Events).
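The stack described here does not necessarily create a schedule for you. One way to add one after deployment is sketched below, using the Boto3 `events` and `lambda` clients. The helper name `schedule_lambda`, the rule name, and the target ID are made up for the example; the clients are passed in as arguments so the function can be tried against stubs first.

```python
def schedule_lambda(events_client, lambda_client, function_arn,
                    rule_name, schedule_expression):
    """Create a scheduled rule that invokes the given Lambda function.

    schedule_expression uses the rate/cron syntax, e.g. "rate(1 day)".
    """
    # Create (or update) the scheduled rule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Point the rule at the function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "athena-query-lambda", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With real credentials, this could be called as `schedule_lambda(boto3.client("events"), boto3.client("lambda"), function_arn, "daily-athena-query", "rate(1 day)")`. In a CDK-managed project, defining the schedule in the stack itself is likely the cleaner option.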

Getting started

Step 1. Clone the GitHub repository

Clone this repo and configure the //TODO portions of the code in the lib/athena-queries-via-lambda-stack.ts file with the appropriate values from your AWS environment.

Step 2. Build the CDK app

The cdk.json file tells the CDK Toolkit how to execute your app.

Before deploying, ensure the dependencies are installed by executing the following within the root folder of your code files:

npm install -g aws-cdk
npm install
npm run build

Note: The above commands should be run within the root folder containing the cdk.json file.

Step 3. Deploy the CDK app

This stack uses assets, so the toolkit stack must be deployed to the environment. This can be done by running the following command:

cdk bootstrap aws://your-aws-account-id/your-specified-aws-region

You can now synthesize the CloudFormation template for this code by running the following command:

cdk synth

Finally, to deploy the stack to your AWS environment, run the following command:

cdk deploy

Step 4. Verify Lambda function configuration

Navigate to the AWS Lambda console and look for the function created by the CDK stack. It should be named something like CdkStack-queryAthena followed by a series of numbers and letters.

Click on the Lambda function and open the “Configuration” tab. Next, click on “Environment Variables”. The environment variables should match what you filled out in the //TODO sections in the CDK code.
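The same check can be scripted instead of done in the console. The following is a minimal sketch using the Boto3 Lambda client's `get_function_configuration` call; the helper name and the shape of its return value are assumptions for this example.

```python
def mismatched_variables(lambda_client, function_name, expected):
    """Compare a function's environment variables with expected values.

    Returns a dict mapping each differing variable name to a tuple of
    (expected value, actual value or None if the variable is missing).
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # "Environment" is absent when the function has no variables set.
    actual = config.get("Environment", {}).get("Variables", {})
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }
```

An empty result means every expected variable is present with the expected value. Pass `boto3.client("lambda")` as the client and the deployed function's name, which will begin with something like CdkStack-queryAthena.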

Step 5. Test Lambda function and verify Athena query results

If no test event exists for the Lambda function, create a new one (the default, pre-populated JSON event is fine). Click on “Test” and ensure the Lambda function executes successfully.

Next, navigate to the S3 bucket specified as the output location for Athena query results and check that files have been saved to the specified output folder. You can also download the output file to verify the results of the Athena SQL query.
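Because Athena queries run asynchronously, a successful Lambda invocation only confirms that the query was submitted. If you have the query execution ID, a sketch like the one below can poll Athena until the query reaches a terminal state and report where the result file was written. The helper name, polling interval, and attempt limit are arbitrary choices for illustration.

```python
import time

# Athena reports QUEUED and RUNNING before one of these terminal states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena_client, query_execution_id,
                   poll_seconds=2.0, max_attempts=30):
    """Poll a query until it finishes; return (state, output location)."""
    for _ in range(max_attempts):
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            # OutputLocation is the S3 path of the result file.
            location = execution.get("ResultConfiguration", {}).get("OutputLocation")
            return state, location
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_execution_id} did not finish in time")
```

A returned state of `SUCCEEDED` together with an `s3://` location in your output bucket indicates the result file should be present there.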

Step 6. Clean up

Navigate to the root folder of the code files and run the following:

cdk destroy

This will destroy all the cloud infrastructure deployed by the CDK stack.

Tools

Amazon Simple Storage Service (Amazon S3): used for data storage
AWS Lambda: serverless compute service, makes the API call to Athena
AWS Cloud Development Kit (CDK): software development framework used to provision cloud resources
Amazon Athena (indirectly): serverless, interactive analytics service, executes the SQL query on data in S3
AWS Glue (prerequisite): data catalog of available data, contains metadata for tables queried by Athena

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Authors

Siddharth Kumaran -- Assoc. Machine Learning Engineer @ AWS Professional Services
Ritika Raju -- Assoc. Cloud App Developer @ AWS Professional Services
Isabelle Imacseng -- Data & ML Engineer @ AWS Professional Services
Radhika Tallamraju -- Data & ML Engineer @ AWS Professional Services
