Feature/epic 1 us 2 #64

Merged
merged 27 commits into from
May 23, 2022
27 commits
b5eaf1a
Merge pull request #62 from HicResearch/main
awskaran May 19, 2022
e0fab2e
Updated Arch documentation
awskaran May 19, 2022
9668f50
Design considerations updated
awskaran May 19, 2022
d45de1c
Costs for TRE
awskaran May 19, 2022
c7fd3ac
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
e26b6cf
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
2721da4
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
508316c
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
5a4daf0
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
b05c68f
Update doc/architecture/Architecture.md
awskaran May 23, 2022
d9ddee1
Update doc/architecture/Architecture.md
awskaran May 23, 2022
a3221ab
Update doc/architecture/Architecture.md
awskaran May 23, 2022
8ab1cd7
Update doc/architecture/Architecture.md
awskaran May 23, 2022
5f07fad
Update doc/architecture/Design-Considerations.md
awskaran May 23, 2022
7781be2
Update doc/architecture/Cost.md
awskaran May 23, 2022
072772d
Update Cost.md
awskaran May 23, 2022
44d06e6
Update Architecture.md
awskaran May 23, 2022
3a70647
Update doc/architecture/Architecture.md
awskaran May 23, 2022
27a4657
Update doc/architecture/Architecture.md
awskaran May 23, 2022
008b286
Update doc/architecture/Architecture.md
awskaran May 23, 2022
d5cfc3a
Update doc/architecture/Architecture.md
awskaran May 23, 2022
90e05ba
Update doc/architecture/Architecture.md
awskaran May 23, 2022
ed0f9d3
Update doc/architecture/Architecture.md
awskaran May 23, 2022
7da1e31
Apply suggestions from code review
awskaran May 23, 2022
67ec5eb
Update Architecture.md
awskaran May 23, 2022
94e56cf
Updated Arch diagram
awskaran May 23, 2022
1fb52b9
update diagram for org structure
awskaran May 23, 2022
119 changes: 77 additions & 42 deletions doc/architecture/Architecture.md
@@ -2,24 +2,18 @@

---

## TREEHOOSE (TRE)
This document explains the high level architecture of
the Trusted Research Environment that would be deployed
on AWS Cloud following the installation
steps in this repository.

---

TREEHOOSE is the Trusted Research Environment (TRE) implementation
that will be deployed for each research project.
Deploying the solution with the **default parameters**
builds the following environment in the AWS Cloud.

![TREEHOOSE Architecture](../../res/images/TREEHOOSE-architecture.png)

### Overview
## Overview

---

The TREEHOOSE solution is formed of
[Service Workbench on AWS](https://aws.amazon.com/government-education/research-and-technical-computing/service-workbench/)
and a data lake that together provides the building blocks
and a data lake that together provide the building blocks
for the Trusted Research Environment (TRE) capability.
[AWS Control Tower](https://aws.amazon.com/controltower/) provides the scalable
multi-account setup for managing TRE implementations at scale in AWS Cloud.
@@ -31,29 +25,69 @@ provides optional add-on components to enable
- Workspace backups
- Budget controls

### Solution Overview
TREEHOOSE is the Trusted Research Environment (TRE) implementation
that will be deployed for each research project.
Deploying the solution with the **default parameters**
builds the following environment in AWS Cloud.

![TREEHOOSE Architecture](../../res/images/TREEHOOSE-architecture.png)

The solution uses Infrastructure as Code for deployment.
Later sections of this document describe each component in more detail. Below is a brief explanation
of the numbered steps in the diagram.

1. TRE Data Managers use the AWS Management Console to upload
data to the TRE Data Lake to be used for research.
1. IT Administrators use the Service Workbench web application
to administer resources in the TRE environment.
1. The budget controls component is used to set budget limits for the TRE
project. IT Administrators can set the budget and any actions
to be taken when the budget thresholds are breached.
1. Backup functionality for research workspaces can also be
enabled. IT Admins can monitor
these through AWS Backup.
1. Data Managers and IT Administrators can work together to provide researchers with access to relevant
data sets from the data lake.
1. Researchers can create and connect to approved workspaces through the Service Workbench web application.
They get secure access to compute resources using
Amazon AppStream 2.0.
1. On research completion the researcher can request egress of
research results.
1. The egress request is processed through a Data Egress App add-on
with a comprehensive review process with multiple approvers
before the data is available for download.
1. Egress requests that are approved can be downloaded by Data Egress Managers
and shared with the Researcher who requested the data egress.
There is a configurable limit to the number of downloads which can be made.
1. Audit & Compliance teams get full visibility into all
user activities resulting in AWS API calls through centralised
CloudTrail logs. Additionally, they get break-glass
access to all TRE projects/accounts in the TRE through
a Lambda function role in the Audit account.
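
Steps 7 to 9 above can be pictured as a small state machine. The sketch below is illustrative only: the class name `EgressRequest`, the approver role labels, and the default download limit of 1 are assumptions made for this example and do not reflect the Data Egress App's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EgressRequest:
    """Hypothetical model of an egress request (not the add-on's real data model)."""

    requester: str
    # Assumed reviewer roles; the real set of approvers is defined by the add-on.
    required_approvers: frozenset = frozenset({"data_manager", "research_it"})
    max_downloads: int = 1  # assumed default for the configurable download limit
    approvals: set = field(default_factory=set)
    downloads: int = 0

    def approve(self, role: str) -> None:
        if role not in self.required_approvers:
            raise PermissionError(f"{role} is not an approver for this request")
        self.approvals.add(role)

    @property
    def approved(self) -> bool:
        # Every required reviewer must sign off before data can leave the TRE.
        return self.approvals >= self.required_approvers

    def download(self) -> None:
        if not self.approved:
            raise PermissionError("request has not been approved by all reviewers")
        if self.downloads >= self.max_downloads:
            raise PermissionError("download limit reached")
        self.downloads += 1
```

The point of the sketch is the ordering: a download is rejected until every required role has approved, and rejected again once the limit is used up.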

## Component Overview

---

#### *AWS Control Tower*
### *AWS Control Tower*

---

Using TREEHOOSE implemenation a user should be able to run multiple
isolated projects and trusted research environment in parallel
and scale according the organisation's research needs.
Using the TREEHOOSE implementation allows a user to run multiple isolated
TRE projects in parallel and to scale according to the organisation's research needs.

To enable TREEHOOSE implementation to support scalable research workloads
, meet the organization’s security and auditing requirements, and evolve with business requirements
it uses AWS Control Tower to set up and govern a secure,
The TREEHOOSE TRE implementation supports scalable research workloads,
aims to meet an organization’s security and auditing requirements,
and can evolve with business demands.
To meet these goals, AWS Control Tower is used to set up and govern a secure,
multi-account AWS environment, called a landing zone.

Below is the high-level Organization Unit and Account Structure
Below is the high-level Organizational Unit and Account Structure
that will be set up by the TREEHOOSE solution.

![Multi-account structure](../../res/images/multi-account-setup.png)

#### *Service Workbench on AWS Solution*
### *Service Workbench on AWS Solution*

---

@@ -73,32 +107,30 @@ Key Components :
(more services as desired; this is customisable by providing Service Catalog templates).
- For the secure access environment: AWS AppStream 2.0

#### *Datalake*
### *Data Lake*

---

TREEHOOSE uses a data lake setup that
uses [AWS Lake Formation](https://aws.amazon.com/lake-formation/)
under the hoods for creating a secure and scalable
data store for storing research data.
TREEHOOSE uses a data lake setup that leverages AWS Lake Formation
to create a secure and scalable data store for storing research data.
A data lake is a centralized, curated, and secured repository that stores all your data,
both in its original form and prepared for analysis.
It creates a pre-configured data lake to be used for TRE data pipelines.
This is a mandatory add-on.

Key Components :

- AWS Lake Formation
- AWS Lake Formation, Amazon S3, AWS KMS, AWS Glue, Amazon Athena

#### *Data Egress Application*
### *Data Egress Application*

---

This add-on provides a data egress approval workflow
for researchers to take out data from TRE with the permission of multiple parties
(data manager, research IT, etc.).
The add-on is hosted as a web application supported by
backend infrastrucutre. Each add-on installation is tied
backend infrastructure. Each add-on installation is tied
to a specific TRE project.

The add-on provides a streamlined
@@ -115,9 +147,9 @@ Key Components :

- For the UI: AWS Amplify
- For the backend: AWS Step Functions, Amazon EFS,
AWS Lambda, Amazon DynamoDB, Amazon SES, Amazon S3, Amazon SNS, Amazon Cognito
AWS Lambda, Amazon DynamoDB, Amazon SES, Amazon S3, AWS KMS, Amazon SNS, Amazon Cognito, AWS AppSync

#### *Workspace backup*
### *Workspace backup*

---

@@ -131,12 +163,15 @@ researchers to select whether they want to enable
periodic workspace backups when creating the workspace.

Only TRE administrators can control the backup frequency
and back retention periods. Also, any restore operations
and retention periods. Also, any restore operations
need to be performed by admins.

This add-on uses [AWS Backup](https://aws.amazon.com/backup/) for backing up block storage attached to
[Amazon EC2](https://aws.amazon.com/ec2/) based compute workspaces while it uses a be-spoke
implementation to backup [Amazon SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html)
This component uses:

- [AWS Backup](https://aws.amazon.com/backup/) for backing up block storage attached to
[Amazon EC2](https://aws.amazon.com/ec2/) based compute workspaces
- a bespoke
implementation to back up [Amazon SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html)

The diagrams below explain how the backup solution works
for
@@ -150,23 +185,23 @@ for
Key Components:

- For the backend: AWS Step Functions,
AWS Lambda, Amazon CloudWatch Events, AWS CloudForamtion, AWS Backup, Amazon S3
AWS Lambda, Amazon CloudWatch Events, AWS CloudFormation, AWS Backup, Amazon S3

#### *Budget controls*
### *Budget controls*

---

Budget controls is an optional
add-on that allows administrators and finance stakeholders
component that allows administrators and finance stakeholders
of the TRE to stay on top of project finances.
This add-on can optionally be deployed for
This component can optionally be deployed for
each TRE project and allows you to:

- **Monitor** : set thresholds for sending budget alerts
- **Report** : send notifications on budget usage
- **Repond** : automate actions to avoid over-spending
- **Respond** : automate actions to avoid over-spending

The add-on uses [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/)
The component uses [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/)
to plan and set expectations around TRE project costs.
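
As a rough illustration of the monitor and respond steps, the sketch below checks spend against alert thresholds. The function name, the threshold percentages, and the action labels are hypothetical; in a real deployment AWS Budgets evaluates the thresholds and triggers whatever actions have been configured.

```python
def breached_thresholds(budget_limit, actual_spend, thresholds):
    """Return the actions whose threshold (percent of budget) has been reached.

    `thresholds` maps a percentage of the budget to an action label.
    All names and values here are illustrative assumptions.
    """
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    used_percent = actual_spend / budget_limit * 100
    # Walk thresholds from lowest to highest so actions come back in escalation order.
    return [action for percent, action in sorted(thresholds.items()) if used_percent >= percent]


# Hypothetical policy: notify at 80% of budget, block new workspaces at 100%.
policy = {80: "send_alert", 100: "block_new_workspaces"}
print(breached_thresholds(1000, 850, policy))
```

With a budget of 1000 and spend of 850, only the 80% threshold has been crossed, so only the alert action is returned.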

Key Components:
93 changes: 92 additions & 1 deletion doc/architecture/Cost.md
@@ -4,9 +4,100 @@

You are responsible for the cost of the AWS services used to run this solution.
As of January 2022, the cost for running this solution with the default settings
in the EU West (Ireland) AWS Region is approximately $X for TRE account with all add-ons.
in the EU West (London) AWS Region is approximately **$30** per month for a TRE account with all add-ons.
Prices are subject to change.
For full details, see the pricing page for each AWS service used in this solution.

> **_NOTE:_** Many AWS Services include a Free Tier – a baseline amount of the service that customers can use at no charge.
> Actual costs may be more or less than the pricing examples provided.

The baseline cost covers only the deployed infrastructure.
Because the solution is based on a serverless architecture, you pay only
for what you use, when you use it.

The following factors contribute to incremental costs for an actively used deployment or TRE account:

- Compute resources used by researchers in the form
of EC2 instances
- Volume of data stored in S3 buckets in the Data Lake account
- AppStream resources used by researchers to interact
with their research workspace
- Volume of backup data stored by AWS Backup

The cost of using and maintaining an AWS Control Tower
environment can be found [here](https://aws.amazon.com/controltower/pricing/).

The best way to estimate the cost of using this solution
is to enter your expected usage into the
[AWS Pricing Calculator](https://calculator.aws/#/).

## Example cost table

---

The following tables provide example cost breakdowns for deploying this
solution with the default settings in the EU West (Ireland) AWS Region.

### Base Installation

An installation of TRE without any workspaces and users.

|AWS Service|Monthly cost|
|----|----|
|Networking services|$11|
|KMS|$6|
|Config|$4|
|CloudTrail|$3.5|
|EC2-other|$1.5|
|DynamoDB|$6|
|Service Catalog|$1|
|Step Functions|$0.09|
|Lambdas|$0.003|
|CloudFront|$0.0002|
|CloudWatch|$0.0003|
|Total|$33.0935|
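
The total can be checked by summing the line items. The snippet below only restates the table; it uses `Decimal` so that the sub-cent figures add up exactly rather than picking up floating-point noise.

```python
from decimal import Decimal

# Monthly cost per service in USD, copied from the table above.
base_monthly_costs = {
    "Networking services": "11",
    "KMS": "6",
    "Config": "4",
    "CloudTrail": "3.5",
    "EC2-other": "1.5",
    "DynamoDB": "6",
    "Service Catalog": "1",
    "Step Functions": "0.09",
    "Lambdas": "0.003",
    "CloudFront": "0.0002",
    "CloudWatch": "0.0003",
}

total = sum(Decimal(cost) for cost in base_monthly_costs.values())
print(total)  # 33.0935
```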

### EC2 Usage

The examples below are based on On-Demand
pricing.
A researcher uses a workspace for 730 hours (one month).

Example - 1
|AWS Service|Monthly cost|
|----|----|
|EC2 - t3.large|$66.58 |
|EBS - 10GB| $1.10|
|Total|$67.68|

Example - 2
|AWS Service|Monthly cost|
|----|----|
|EC2 - m6g.8xlarge|$1,004.48 |
|EBS - 80GB| $8.80|
|Total|$1,013.28|
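
Each total above is the hourly instance rate multiplied by 730 hours, plus the EBS volume size multiplied by a per-GB monthly rate. The rates in the sketch below are not stated in this document: they are back-calculated from the table figures and should be treated as assumptions. Use the AWS Pricing Calculator for current values.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = 730
CENT = Decimal("0.01")

# Assumed rates in USD, inferred from the example tables rather than quoted from AWS.
ASSUMED_HOURLY_RATES = {
    "t3.large": Decimal("0.0912"),
    "m6g.8xlarge": Decimal("1.376"),
}
ASSUMED_EBS_RATE_PER_GB_MONTH = Decimal("0.11")


def monthly_workspace_cost(instance_type, ebs_gb, hours=HOURS_PER_MONTH):
    """Estimate the monthly On-Demand cost of one workspace, rounded to cents."""
    compute = (ASSUMED_HOURLY_RATES[instance_type] * hours).quantize(CENT, rounding=ROUND_HALF_UP)
    storage = (ASSUMED_EBS_RATE_PER_GB_MONTH * ebs_gb).quantize(CENT, rounding=ROUND_HALF_UP)
    return compute + storage


print(monthly_workspace_cost("t3.large", 10))      # 67.68
print(monthly_workspace_cost("m6g.8xlarge", 80))   # 1013.28
```

Lowering `hours` shows how much is saved when a workspace is stopped outside working hours.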

### SageMaker Usage

A researcher uses a SageMaker notebook instance
for 730 hours on a project.

Example
|AWS Service|Monthly cost|
|----|----|
|SageMaker - notebook - ml.c5.large| $82.80|
|Total|$82.80|

### S3 storage

A researcher works on a 1 TB data study
and produces a 10 GB output to download.

Example
|AWS Service|Monthly cost|
|----|----|
|S3 - study data| $23.58 |
|Data Egress| $0.90|
|Total|$24.48|

All cost examples provided above are indicative.
29 changes: 26 additions & 3 deletions doc/architecture/Design-Considerations.md
@@ -8,8 +8,8 @@ Below are the key design considerations for TREEHOOSE

---

- All the core infrastructure is deployed using IaC (Infrastructure as Code).
- Maximised the use of AWS Serverless services for ease of operability and scalability.
- All the core infrastructure is deployed using [IaC (Infrastructure as Code)](https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/infrastructure-as-code.html).
- The solution is based on Serverless Architecture for ease of operability and scalability.

## Audit

@@ -19,6 +19,8 @@ Below are the key design considerations for TREEHOOSE
and the logs centralised for Auditing.
- [AWS Config](https://aws.amazon.com/config/) is enabled in all AWS accounts
and the config records centralised for Auditing.
- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) is used for
log aggregation and metrics for each TRE project/AWS account.

## Security

@@ -27,4 +29,25 @@ Below are the key design considerations for TREEHOOSE
- Use [AWS KMS](https://aws.amazon.com/kms/) for encryption at-rest.
- Encryption in-transit is enabled for all AWS services where applicable
and also enabled for all API calls.
- For all [AWS IAM](https://aws.amazon.com/iam/) policies principle of least privilege has been followed.
- For all [AWS IAM](https://aws.amazon.com/iam/) policies the principle of least privilege has been followed.
- [AWS Accounts](https://aws.amazon.com/account/) provide well-defined billing and security boundaries.
Hence each research project should be hosted in a separate AWS account.
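
To make the least-privilege point concrete, the sketch below builds an IAM policy document that grants read-only access to a single study prefix in a single bucket. The bucket name, the prefix, and the helper name are hypothetical and are not taken from the TREEHOOSE templates.

```python
import json


def read_only_study_policy(bucket, prefix):
    """Build a read-only policy scoped to one prefix (illustrative helper)."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only for keys under the study prefix.
                "Sid": "ListStudyPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Objects can be read but not written or deleted.
                "Sid": "ReadStudyObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix names.
policy = read_only_study_policy("example-tre-datalake", "studies/study-001")
print(json.dumps(policy, indent=2))
```

The policy names specific actions and resources instead of wildcards such as `s3:*`, which is the practical meaning of least privilege here.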

## Considerations for End Users

---

These are some additional decisions that the end user of
TREEHOOSE should make based on their functional and
non-functional requirements.

- Centralise and enable AWS Security services like:
- [AWS Security Hub](https://aws.amazon.com/security-hub/)
- [Amazon GuardDuty](https://aws.amazon.com/guardduty/)
- [Amazon Macie](https://aws.amazon.com/macie/)
- [AWS IAM Access Analyzer](https://docs.aws.amazon.com/IAM/latest/UserGuide/what-is-access-analyzer.html)

- Enable [AWS Web Application Firewall](https://aws.amazon.com/waf/) for Web Applications.
- Enable additional [Control Tower Guardrails](https://docs.aws.amazon.com/controltower/latest/userguide/guardrails.html).
- Use [Amazon EC2 reserved instances](https://aws.amazon.com/ec2/pricing/reserved-instances/).
- [Optimize](https://docs.aws.amazon.com/whitepapers/latest/best-practices-for-deploying-amazon-appstream-2/cost-optimization.html) how you use AppStream.
Binary file modified res/images/TREEHOOSE-architecture.png
Binary file modified res/images/multi-account-setup.png