How to use SQL to query data in an S3 bucket with Amazon Athena and the AWS SDK for .NET

This project provides a sample implementation that shows how to use Amazon Athena from a .NET Core application with the AWS SDK for .NET to run standard SQL against large amounts of data in Amazon S3. To showcase a more realistic use case, it includes a web UI built with ReactJS. The web app's components fetch COVID-19 data from an API server that uses the AWS SDK for .NET to connect to Amazon Athena and run standard SQL queries over datasets stored as Amazon S3 files in a data lake account. That data lake is the aws-covid19-lake account, made available through the Registry of Open Data on AWS.

The ReactJS components call a .NET Core API that runs an Amazon Athena query, checks the execution status, and lists the results. Each menu option presents a different view.

Menu option Testing By Date: shows a date filter and presents a table with the following columns: Date, State, Positive, Negative, Pending, Hospitalized, Death, Positive Increase.

Menu option Testing By State: shows a state filter and presents a table with the following columns: Date, State, Positive, Negative, Pending, Hospitalized, Death, Positive Increase.

Menu option Hospitals (Run&Go): sends a request to the API server, receives a 200 response with the query execution ID, and polls the execution status; when the execution completes, it presents a table with the following columns: Name, State, Type, Zip Code, Licensed Beds, Staffed Beds, Potential Increase in Beds (this run-and-poll flow is sketched below).

Menu option Hospitals (Run&Wait): sends a request to the API server, waits for the result, and presents a table with the following columns: Name, State, Type, Zip Code, Licensed Beds, Staffed Beds, Potential Increase in Beds.
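
For reference, here is a minimal sketch of the run-and-poll pattern used by the Run&Go option, written with the AWS SDK for .NET (the AWSSDK.Athena package). The class and method names, the sample query, and the outputLocation parameter are illustrative assumptions rather than the project's actual code; outputLocation is the S3 path you create in step 2.

using System;
using System.Threading.Tasks;
using Amazon.Athena;
using Amazon.Athena.Model;

public static class AthenaQuerySketch
{
    // Run-and-poll sketch: start the query, poll until it reaches a terminal state,
    // then read the result rows (the first row returned by Athena is the header row).
    public static async Task RunSampleQueryAsync(IAmazonAthena athena, string outputLocation)
    {
        var start = await athena.StartQueryExecutionAsync(new StartQueryExecutionRequest
        {
            QueryString = "SELECT fips, num_licensed_beds, num_staffed_beds FROM hospital_beds LIMIT 10",
            QueryExecutionContext = new QueryExecutionContext { Database = "covid-19" },
            ResultConfiguration = new ResultConfiguration { OutputLocation = outputLocation }
        });

        // Poll the query status until Athena reports a terminal state.
        QueryExecutionState state;
        do
        {
            await Task.Delay(TimeSpan.FromSeconds(1));
            var execution = await athena.GetQueryExecutionAsync(new GetQueryExecutionRequest
            {
                QueryExecutionId = start.QueryExecutionId
            });
            state = execution.QueryExecution.Status.State;
        } while (state == QueryExecutionState.QUEUED || state == QueryExecutionState.RUNNING);

        if (state != QueryExecutionState.SUCCEEDED)
        {
            throw new InvalidOperationException($"Query ended in state {state}");
        }

        // Print each result row as a comma-separated line.
        var results = await athena.GetQueryResultsAsync(new GetQueryResultsRequest
        {
            QueryExecutionId = start.QueryExecutionId
        });
        foreach (var row in results.ResultSet.Rows)
        {
            Console.WriteLine(string.Join(", ", row.Data.ConvertAll(d => d.VarCharValue)));
        }
    }
}

The Run&Wait variant presumably relies on the same three calls (StartQueryExecution, GetQueryExecution, GetQueryResults), with the API waiting for completion before responding instead of handing the query ID back to the browser.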

Steps

To run this project, follow the instructions below:

1) Deploy Glue Catalog & Athena Database/Tables

#1) Deploy
aws cloudformation create-stack --stack-name covid-lake-stack --template-url https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json --region us-west-2

#2) Check deployment Status
aws cloudformation  describe-stacks --stack-name covid-lake-stack --region us-west-2

Below is the result of the status check; wait for "StackStatus": "CREATE_COMPLETE"

{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/covid-lake-stack/xxxxxxxx-100d-11eb-87ef-xxxxxxxxxxx",
            "StackName": "covid-lake-stack",
            "CreationTime": "2020-10-17T00:12:09.151Z",
            "RollbackConfiguration": {},
            "StackStatus": "CREATE_COMPLETE",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}
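
If you prefer to verify the deployment from .NET instead of the console, a small check like the one below (using the AWSSDK.Glue package) lists the tables the stack registers in the covid-19 Glue database that the queries in step 3 use. This is an optional, illustrative snippet, not part of the sample application.

using System;
using System.Threading.Tasks;
using Amazon.Glue;
using Amazon.Glue.Model;

public static class GlueCatalogCheck
{
    // Optional sanity check: list the tables registered in the "covid-19" Glue database.
    public static async Task ListCovidTablesAsync()
    {
        using var glue = new AmazonGlueClient();
        var response = await glue.GetTablesAsync(new GetTablesRequest { DatabaseName = "covid-19" });
        foreach (var table in response.TableList)
        {
            Console.WriteLine(table.Name);
        }
    }
}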

2) Create S3 Bucket for Athena Results

#1) Deploy S3 Bucket
aws cloudformation deploy --stack-name athena-results-netcore --template-file ./src/cloud-formation-templates/s3-athena-result.template.yaml --region us-west-2

#2) Check deployment Status
aws cloudformation  describe-stacks --stack-name athena-results-netcore --region us-west-2

Below is the result of the status check; wait for "StackStatus": "CREATE_COMPLETE" and copy the bucket output "OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/". You will need this value to run the code.

{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/athena-results-netcore/xxxxxxxx-100c-11eb-889f-xxxxxxxxxxx",
            "StackName": "athena-results-netcore",
            "Description": "Amazon S3 bucket to store Athena query results",
            "CreationTime": "2020-10-17T00:02:44.968Z",
            "LastUpdatedTime": "2020-10-17T00:21:13.692Z",
            "RollbackConfiguration": {
                "RollbackTriggers": []
            },
            "StackStatus": "CREATE_COMPLETE",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Outputs": [
                {
                    "OutputKey": "BucketName",
                    "OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/",
                    "Description": "Name of the Amazon S3 bucket to store Athena query results"
                }
            ],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}
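
The OutputValue above is the location where Athena writes query results. In the sample application it is supplied through the S3_RESULT_BUCKET_NAME variable set in step 4 and ends up, roughly, as the Athena result configuration; the snippet below is an illustrative sketch of that wiring, not the project's exact code.

// The bucket path from the stack output (S3_RESULT_BUCKET_NAME in the .env file of step 4)
// becomes the OutputLocation for every query the API starts.
var resultConfiguration = new Amazon.Athena.Model.ResultConfiguration
{
    OutputLocation = Environment.GetEnvironmentVariable("S3_RESULT_BUCKET_NAME")
};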

3) COVID-19 Analysis (optional)

Here are some SQL queries that you can try on your own in the Amazon Athena console. This step is optional for the demo, but it helps you explore and learn more about Amazon Athena through the console UI.

-- The following query returns the growth of confirmed cases for the past 7 days joined side-by-side with hospital bed availability, broken down by US county:
SELECT 
  cases.fips, 
  admin2 as county, 
  province_state, 
  confirmed,
  growth_count, 
  sum(num_licensed_beds) as num_licensed_beds, 
  sum(num_staffed_beds) as num_staffed_beds, 
  sum(num_icu_beds) as num_icu_beds
FROM 
  "covid-19"."hospital_beds" beds, 
  ( SELECT 
      fips, 
      admin2, 
      province_state, 
      confirmed, 
      last_value(confirmed) over (partition by fips order by last_update) - first_value(confirmed) over (partition by fips order by last_update) as growth_count,
      first_value(last_update) over (partition by fips order by last_update desc) as most_recent,
      last_update
    FROM  
      "covid-19"."enigma_jhu" 
    WHERE 
      from_iso8601_timestamp(last_update) > now() - interval '200' day AND country_region = 'US') cases
WHERE 
  beds.fips = cases.fips AND last_update = most_recent
GROUP BY cases.fips, confirmed, growth_count, admin2, province_state
ORDER BY growth_count desc

--Last 10 records regarding Testing and deaths
SELECT * FROM "covid-19"."world_cases_deaths_testing" order by "date" desc limit 10;

-- Last 10 records regarding Testing and deaths with JOIN on us_state_abbreviations to list State name
SELECT 
   date,
   positive,
   negative,
   pending,
   hospitalized,
   death,
   total,
   deathincrease,
   hospitalizedincrease,
   negativeincrease,
   positiveincrease,
   sta.state AS state_abbreviation,
   abb.state 

FROM "covid-19"."covid_testing_states_daily" sta
JOIN "covid-19"."us_state_abbreviations" abb ON sta.state = abb.abbreviation
limit 500;

4) Build & Run .NET Web Application

  1. Go to the app root directory
cd ./src/app/AthenaNetCore/
  2. Create the AWS credentials file. As a security precaution, the *.env file extension is added to .gitignore to avoid an accidental commit.
code aws-credentials-do-not-commit.env # You can use any text editor, e.g. vi: vi aws-credentials-do-not-commit.env

Below is an example of the .env file content. Replace the XXXX... values with your real AWS credentials, and set S3_RESULT_BUCKET_NAME to the output value you copied in step 2 (the sketch after the clean-up commands shows how the app picks these variables up).

AWS_DEFAULT_REGION=us-west-2
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_SESSION_TOKEN=XXXXX # Optional, only needed for temporary credentials; remove this comment in the .env file
S3_RESULT_BUCKET_NAME=s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/ # Paste the bucket path you copied in step 2; remove this comment in the .env file
  3. Build the .NET app with docker-compose
docker-compose -f ./docker-compose.yml build
  4. Run the .NET app with docker-compose
docker-compose -f ./docker-compose.yml up
  5. Test the .NET app at http://localhost:8089/

  6. Clean up

# 1) Clean local resources
docker-compose down -v

# 2) Clean s3 objects created by Athena to store Results metadata
aws s3 rm --recursive s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/

# 3) Delete S3 bucket
aws cloudformation delete-stack --stack-name athena-results-netcore --region us-west-2

# 4) Delete Athena Tables
aws cloudformation delete-stack --stack-name covid-lake-stack --region us-west-2
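
As noted in the .env example above, the application itself does not need credentials in code: the AWS SDK for .NET's default credential chain reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (optionally) AWS_SESSION_TOKEN once docker-compose injects the .env file. Below is a minimal, illustrative sketch of constructing the Athena client from those variables; the explicit region lookup is an assumption, and the real project may simply rely on the SDK defaults.

using System;
using Amazon;
using Amazon.Athena;

// Credentials come from the environment via the SDK's default credential chain;
// only the region is resolved explicitly here, falling back to us-west-2.
var regionName = Environment.GetEnvironmentVariable("AWS_DEFAULT_REGION") ?? "us-west-2";
var athena = new AmazonAthenaClient(RegionEndpoint.GetBySystemName(regionName));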

 

References

https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/

https://docs.aws.amazon.com/athena/latest/ug/code-samples.html

https://aws.amazon.com/blogs/apn/using-athena-express-to-simplify-sql-queries-on-amazon-athena/

https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-config-creds.html

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

https://docs.aws.amazon.com/sdk-for-net/latest/developer-guide/creds-assign.html

https://github.com/awsdocs/aws-cloud9-user-guide/blob/master/LICENSE-SAMPLECODE
