feat: Add codespell config, pre-commit and dedicated workflow (to catch new typos) and get typos fixed
askulkarni2 authored Dec 13, 2023
2 parents 241a241 + 8d35cdc commit 30ad93b
Showing 54 changed files with 106 additions and 74 deletions.
6 changes: 6 additions & 0 deletions .codespellrc
@@ -0,0 +1,6 @@
[codespell]
skip = .git,*.pdf,*.svg,go.sum,package-lock.json,*.css,.codespellrc,*.sql,website/package-lock.json
check-hidden = true
# some embedded images and known typoed outputs
ignore-regex = ^\s*"image/\S+": ".*|.*loopback adddress.*
# ignore-words-list =
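With this file in place, codespell can be run locally before pushing; it reads `.codespellrc` from the current directory automatically, so no extra flags are needed. A minimal sketch, assuming codespell is not already installed:

```bash
# Install the same codespell version the pre-commit hook below pins.
pip install codespell==2.2.6

# Scan the repo; skip, check-hidden and ignore-regex come from .codespellrc.
codespell

# Apply the suggested fixes in place instead of only reporting them.
codespell --write-changes
```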
22 changes: 22 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,22 @@
---
name: Codespell

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
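This workflow runs the same checker on every push and pull request against `main`, and the action picks up `.codespellrc` as well, so CI stays in agreement with local runs. As a pre-push smoke test you could run the job locally with nektos/act; this is an assumption, not something the commit sets up:

```bash
# Hypothetical local dry run of the codespell job using nektos/act.
act pull_request -j codespell
```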
4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -33,3 +33,7 @@ repos:
          - '--args=--only=terraform_workspace_remote'
      - id: terraform_validate
        exclude: docs
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.2.6
+    hooks:
+      - id: codespell
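With the hook registered, codespell runs against changed files on every commit once the git hooks are installed. A quick sketch of the local workflow:

```bash
# One-time setup: install pre-commit and register the git hooks.
pip install pre-commit
pre-commit install

# Run only the codespell hook across the whole tree, not just staged files.
pre-commit run codespell --all-files
```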
2 changes: 1 addition & 1 deletion ai-ml/emr-spark-rapids/addons.tf
@@ -158,7 +158,7 @@ module "eks_data_addons" {
#---------------------------------------------------------------
# Kubecost Add-on
#---------------------------------------------------------------
-# Note: Kubecost add-on depdends on Kube Prometheus Stack add-on for storing the metrics
+# Note: Kubecost add-on depends on Kube Prometheus Stack add-on for storing the metrics
enable_kubecost = var.enable_kubecost
kubecost_helm_config = {
values = [templatefile("${path.module}/helm-values/kubecost-values.yaml", {})]
2 changes: 1 addition & 1 deletion ai-ml/emr-spark-rapids/eks.tf
@@ -141,7 +141,7 @@ module "eks" {
instance_types = ["m5.xlarge"] # 4 vCPU and 16GB

ebs_optimized = true
-# This bloc device is used only for root volume. Adjust volume according to your size.
+# This block device is used only for root volume. Adjust volume according to your size.
# NOTE: Dont use this volume for Spark workloads
block_device_mappings = {
xvda = {
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

2 changes: 1 addition & 1 deletion ai-ml/jark-stack/terraform/eks.tf
@@ -84,7 +84,7 @@ module "eks" {

eks_managed_node_groups = {
# It's recommended to have a Managed Node group for hosting critical add-ons
-# It's recommeded to use Karpenter to place your workloads instead of using Managed Node groups
+# It's recommended to use Karpenter to place your workloads instead of using Managed Node groups
# You can leverage nodeSelector and Taints/tolerations to distribute workloads across Managed Node group or Karpenter nodes.
core_node_group = {
name = "core-node-group"
2 changes: 1 addition & 1 deletion ai-ml/jupyterhub/helm/aws-for-fluentbit/values.yaml
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

@@ -79,7 +79,7 @@ proxy:
service.beta.kubernetes.io/aws-load-balancer-ip-address-type: ipv4

singleuser:
-startTimeout: 1200 # 20 mins to spin up a notebook server for GPU inlcuding the image pull
+startTimeout: 1200 # 20 mins to spin up a notebook server for GPU including the image pull
profileList:
- display_name: Data Engineering (CPU)
description: "PySpark Notebooks | Karpenter AutoScaling"
2 changes: 1 addition & 1 deletion ai-ml/jupyterhub/main.tf
@@ -88,7 +88,7 @@ module "eks" {

eks_managed_node_groups = {
# It's recommended to have a Managed Node group for hosting critical add-ons
-# It's recommeded to use Karpenter to place your workloads instead of using Managed Node groups
+# It's recommended to use Karpenter to place your workloads instead of using Managed Node groups
# You can leverage nodeSelector and Taints/tolerations to distribute workloads across Managed Node group or Karpenter nodes.
core_node_group = {
name = "jupyterhub-node-group"
4 changes: 2 additions & 2 deletions ai-ml/trainium-inferentia/addons.tf
@@ -155,7 +155,7 @@ module "eks_blueprints_addons" {
#---------------------------------------
enable_aws_fsx_csi_driver = true
aws_fsx_csi_driver = {
-# INFO: fsx node daemonset wont be placed on Karpenter nodes with taints without the following toleration
+# INFO: fsx node daemonset won't be placed on Karpenter nodes with taints without the following toleration
values = [
<<-EOT
node:
@@ -440,7 +440,7 @@ resource "aws_launch_template" "trn1_lt" {
name = module.eks_blueprints_addons.karpenter.node_instance_profile_name
}

-# Commented for visiblity to implement this feature in the future
+# Commented for visibility to implement this feature in the future
# placement {
# tenancy = "default"
# availability_zone = "${local.region}d"
8 changes: 4 additions & 4 deletions ai-ml/trainium-inferentia/eks.tf
@@ -45,7 +45,7 @@ module "eks" {

# security group rule from all ipv4 to nodes for port 22
node_security_group_additional_rules = {
-# Critical Secruity group rule for EFA enabled nodes
+# Critical Security group rule for EFA enabled nodes
ingress_efa_self_enabled = {
description = "EFA-enabled self-referencing security group Ingress"
protocol = "-1"
@@ -55,7 +55,7 @@ module "eks" {
self = true
}

-# Critical Secruity group rule for EFA enabled nodes
+# Critical Security group rule for EFA enabled nodes
egress_efa_self_enabled = {
description = "EFA-enabled self-referencing security group Egress"
protocol = "-1"
@@ -248,7 +248,7 @@ module "eks" {
}
]

-# Commented to investigate further as the node group creation is failing with palcement group
+# Commented to investigate further as the node group creation is failing with placement group
# placement = {
# spread_domain = "cluster"
# groupName = "trn1-32xl-ng1"
@@ -458,7 +458,7 @@ module "eks" {
},
]

-# Commented to investigate further as the node group creation is failing with palcement group
+# Commented to investigate further as the node group creation is failing with placement group
# placement = {
# spread_domain = "cluster"
# groupName = "trn1-32xl-ng1"
@@ -33,14 +33,14 @@ Now, you can access the Ray Dashboard from the URL Below
To test the Llama2 model, you can use the following command with a query added at the end of the URL.
This uses the GET method to get the response:

-http://<NLB_DNS_NAME>/serve/infer?sentence=what is data parallelism and tensor parallelisma and the diffrences
+http://<NLB_DNS_NAME>/serve/infer?sentence=what is data parallelism and tensor parallelisma and the differences


You will see an output like this in your browser:

```text
[
"what is data parallelism and tensor parallelisma and the diffrences between them?
"what is data parallelism and tensor parallelisma and the differences between them?

Data parallelism and tensor parallelism are both techniques used to speed up machine learning training on large datasets using multiple GPUs or other parallel processing units. However, there are some key differences between them:

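Because the sentence is passed as a raw query string, spaces have to be URL-encoded when the endpoint is called from a shell rather than a browser. A sketch using curl, with `<NLB_DNS_NAME>` as the placeholder from the docs above:

```bash
# --get plus --data-urlencode builds the encoded query string for us.
curl --get "http://<NLB_DNS_NAME>/serve/infer" \
  --data-urlencode "sentence=what is data parallelism and tensor parallelism and the differences"
```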
2 changes: 1 addition & 1 deletion ai-ml/trainium-inferentia/fsx-for-lustre.tf
@@ -61,7 +61,7 @@ YAML

#---------------------------------------------------------------
# Static PV for FSx for Lustre
-# Don't change the metdata.name `fsx-claim` as this is referenced in lib/trn1_dist_ddp.py script
+# Don't change the metadata.name `fsx-claim` as this is referenced in lib/trn1_dist_ddp.py script
#---------------------------------------------------------------
resource "kubectl_manifest" "static_pv" {
yaml_body = <<YAML
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token

-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

@@ -10,7 +10,7 @@ proxy:
service:
type: ClusterIP
singleuser:
-startTimeout: 1200 # 20 mins to spin up a notebook server for GPU inlcuding the image pull
+startTimeout: 1200 # 20 mins to spin up a notebook server for GPU including the image pull
profileList:
- display_name: Trainium (trn1)
description: "Trainium | Karpenter AutoScaling"
8 changes: 4 additions & 4 deletions analytics/cdk/stream-emr-on-eks/README.md
@@ -72,7 +72,7 @@ echo -e "\nIn web browser, paste the URL to launch the template: https://console
### CDK Deployment

#### Prerequisites
-Install the folowing tools:
+Install the following tools:
Install the following tools:
1. [Python 3.6 +](https://www.python.org/downloads/).
2. [Node.js 10.3.0 +](https://nodejs.org/en/)
3. [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-macos.html#install-macosos-bundled). Configure the CLI by `aws configure`.
@@ -113,12 +113,12 @@ rm -vf ${HOME}/.aws/credentials
curl https://raw.githubusercontent.com/aws-samples/stream-emr-on-eks/main/deployment/app_code/post-deployment.sh | bash
```
5. Wait for 5 mins, then check the [MSK cluster](https://console.aws.amazon.com/msk/) status. Make sure it is `active` before sending data to the cluster.
-6. Launching a new termnial window in Cloud9, send the sample data to MSK:
+6. Launching a new terminal window in Cloud9, send the sample data to MSK:
```bash
wget https://github.com/xuite627/workshop_flink1015-1/raw/master/dataset/nycTaxiRides.gz
zcat nycTaxiRides.gz | split -l 10000 --filter="kafka_2.12-2.8.1/bin/kafka-console-producer.sh --broker-list ${MSK_SERVER} --topic taxirides ; sleep 0.2" > /dev/null
```
-6. Launching the 3rd termnial window and monitor the source MSK topic:
+6. Launching the 3rd terminal window and monitor the source MSK topic:
```bash
kafka_2.12-2.8.1/bin/kafka-console-consumer.sh \
--bootstrap-server ${MSK_SERVER} \
@@ -172,7 +172,7 @@ aws emr-containers cancel-job-run --virtual-cluster-id $VIRTUAL_CLUSTER_ID --id
### 2. EMR on EKS with Fargate
Run the [same job](deployment/app_code/job/msk_consumer.py) on the same EKS cluster, but with the serverless option - Fargate compute choice.

-To ensure it is picked up by Fargate not by the managed nodegroup on EC2, we will tag the Spark job by a `serverless` label, which has setup in a Fargate profile prevously:
+To ensure it is picked up by Fargate not by the managed nodegroup on EC2, we will tag the Spark job by a `serverless` label, which has setup in a Fargate profile previously:
```yaml
--conf spark.kubernetes.driver.label.type=serverless
--conf spark.kubernetes.executor.label.type=serverless
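For context, the `type=serverless` label only lands pods on Fargate because a matching Fargate profile selector already exists. A hypothetical sketch of creating such a profile with the AWS CLI; the cluster name, namespace, and role ARN are placeholders, not values from this repo:

```bash
# Hypothetical profile: pods in the spark namespace labeled type=serverless
# are scheduled onto Fargate instead of the EC2 managed node group.
aws eks create-fargate-profile \
  --cluster-name my-eks-cluster \
  --fargate-profile-name serverless-spark \
  --pod-execution-role-arn arn:aws:iam::111122223333:role/eks-fargate-pod-role \
  --selectors 'namespace=spark,labels={type=serverless}'
```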
@@ -166,7 +166,7 @@ for d in `find . -mindepth 1 -maxdepth 1 -type d`; do
mv $d $fname
cd $staging_dist_dir/$fname

-# Build the artifcats
+# Build the artifacts
if ls *.py 1> /dev/null 2>&1; then
echo "===================================="
echo "This is Python runtime"
@@ -54,7 +54,7 @@ def __init__(self, scope: Construct, id:str,
)


-# 3. Add Spot managed NodeGroup to EKS (Run Spark exectutor on spot)
+# 3. Add Spot managed NodeGroup to EKS (Run Spark executor on spot)
self._my_cluster.add_nodegroup_capacity('spot-mn',
nodegroup_name = 'etl-spot',
node_role = noderole,
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token

-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token

-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

4 changes: 2 additions & 2 deletions analytics/terraform/emr-eks-fargate/addons.tf
@@ -27,13 +27,13 @@ module "eks_blueprints_addons" {
resources = {
limits = {
cpu = "0.25"
-# We are targetting the smallest Task size of 512Mb, so we subtract 256Mb from the
+# We are targeting the smallest Task size of 512Mb, so we subtract 256Mb from the
# request/limit to ensure we can fit within that task
memory = "256M"
}
requests = {
cpu = "0.25"
-# We are targetting the smallest Task size of 512Mb, so we subtract 256Mb from the
+# We are targeting the smallest Task size of 512Mb, so we subtract 256Mb from the
# request/limit to ensure we can fit within that task
memory = "256M"
}
4 changes: 2 additions & 2 deletions analytics/terraform/emr-eks-karpenter/addons.tf
@@ -114,7 +114,7 @@ module "eks_blueprints_addons" {
#---------------------------------------
enable_aws_fsx_csi_driver = var.enable_fsx_for_lustre
aws_fsx_csi_driver = {
-# INFO: fsx node daemonset wont be placed on Karpenter nodes with taints without the following toleration
+# INFO: fsx node daemonset won't be placed on Karpenter nodes with taints without the following toleration
values = [
<<-EOT
node:
@@ -201,7 +201,7 @@ module "eks_data_addons" {
#---------------------------------------------------------------
# Kubecost Add-on
#---------------------------------------------------------------
-# Note: Kubecost add-on depdends on Kube Prometheus Stack add-on for storing the metrics
+# Note: Kubecost add-on depends on Kube Prometheus Stack add-on for storing the metrics
enable_kubecost = var.enable_kubecost
kubecost_helm_config = {
values = [templatefile("${path.module}/helm-values/kubecost-values.yaml", {})]
@@ -49,7 +49,7 @@ spec:
spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
-# EMRFS commiter
+# EMRFS committer
spark.sql.parquet.output.committer.class: com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
spark.sql.parquet.fs.optimized.committer.optimization-enabled: "true"
spark.sql.emr.internal.extensions: com.amazonaws.emr.spark.EmrSparkSessionExtensions
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

2 changes: 1 addition & 1 deletion analytics/terraform/spark-k8s-operator/addons.tf
@@ -192,7 +192,7 @@ module "eks_data_addons" {
#---------------------------------------------------------------
# Spark History Server Add-on
#---------------------------------------------------------------
-# Spark hsitory server is required only when EMR Spark Operator is enabled
+# Spark history server is required only when EMR Spark Operator is enabled
enable_spark_history_server = true
spark_history_server_helm_config = {
values = [
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token

-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false

2 changes: 1 addition & 1 deletion analytics/terraform/spark-k8s-operator/main.tf
@@ -104,7 +104,7 @@ module "eks" {
EOT

ebs_optimized = true
-# This bloc device is used only for root volume. Adjust volume according to your size.
+# This block device is used only for root volume. Adjust volume according to your size.
# NOTE: Don't use this volume for Spark workloads
block_device_mappings = {
xvda = {
@@ -50,7 +50,7 @@ spec:
name: app-auth
backup:
barmanObjectStore:
-# For backup, we S3 backet to store data.
+# For backup, we S3 bucket to store data.
# On this Blueprint, we create an S3 check the terraform output for it.
destinationPath: s3://<your-s3-barman-bucket> # ie: s3://xxxx-cnpg-barman-bucket
s3Credentials:
@@ -63,7 +63,7 @@ config:
# INHERITED_LABELS: environment, workload, app
# WATCH_NAMESPACE: namespace-a,namespace-b

-# -- Additinal arguments to be added to the operator's args list.
+# -- Additional arguments to be added to the operator's args list.
additionalArgs: []

serviceAccount:
2 changes: 1 addition & 1 deletion schedulers/terraform/argo-workflow/README.md
@@ -1,4 +1,4 @@
-# Argo Worklfows on EKS
+# Argo Workflows on EKS
Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/job-schedulers/argo-workflows-eks) to deploy this pattern and run sample tests.

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
2 changes: 1 addition & 1 deletion schedulers/terraform/argo-workflow/addons.tf
@@ -224,7 +224,7 @@ module "eks_data_addons" {
#---------------------------------------------------------------
# Spark History Server Add-on
#---------------------------------------------------------------
-# Spark hsitory server is required only when EMR Spark Operator is enabled
+# Spark history server is required only when EMR Spark Operator is enabled
enable_spark_history_server = true
spark_history_server_helm_config = {
values = [
2 changes: 1 addition & 1 deletion schedulers/terraform/argo-workflow/eks.tf
@@ -75,7 +75,7 @@ module "eks" {
}

ebs_optimized = true
-# This bloc device is used only for root volume. Adjust volume according to your size.
+# This block device is used only for root volume. Adjust volume according to your size.
# NOTE: Don't use this volume for Spark workloads
block_device_mappings = {
xvda = {
@@ -49,7 +49,7 @@ filter:
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token

-# CATION: Donot use `cloudwatch` plugin. This Golang Plugin is not recommnded by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
+# CATION: Do not use `cloudwatch` plugin. This Golang Plugin is not recommended by AWS anymore instead use C plugin(`cloudWatchLogs`) for better performance.
# cloudWatch:
# enabled: false
