
Conversation

@alexiswl (Member)
@alexiswl alexiswl requested a review from Copilot June 29, 2025 00:34
@alexiswl alexiswl self-assigned this Jun 29, 2025
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR deprecates the FastqUnarchiving and FastqSync manager stacks by migrating them into separate repositories, and updates the StatelessStackCollection to remove their instantiation.

  • Removed FastqUnarchivingManagerStack and FastqSyncManagerStack class imports and usages.
  • Retained only the props imports for configuration types.
  • Commented out all related stack creation code.
Comments suppressed due to low confidence (1)

lib/workload/stateless/statelessStackCollectionClass.ts:340

  • This duplicated commented instantiation block repeats the deprecation cleanup. Remove the redundant comments to keep the codebase clean.
      ...this.createTemplateProps(env, 'BclConvertManagerStack'),

Comment on the import block:

  FastqUnarchivingManagerStackProps,
} from './stacks/fastq-unarchiving/deploy';
import { FastqSyncManagerStack, FastqSyncManagerStackProps } from './stacks/fastq-sync/deploy';
import { FastqUnarchivingManagerStackProps } from './stacks/fastq-unarchiving/deploy';

Copilot AI Jun 29, 2025


The FastqUnarchivingManagerStackProps import is no longer used in active code (only in commented sections). Consider removing it to reduce clutter and avoid unused imports.

Suggested change
import { FastqUnarchivingManagerStackProps } from './stacks/fastq-unarchiving/deploy';

Comment on lines +211 to +224
// this.fastqUnarchivingManagerStack = new FastqUnarchivingManagerStack(
// scope,
// 'FastqUnarchivingManagerStack',
// {
// ...this.createTemplateProps(env, 'FastqUnarchivingManagerStack'),
// ...statelessConfiguration.fastqUnarchivingManagerStackProps,
// }
// );

// this.fastqSyncManagerStack = new FastqSyncManagerStack(scope, 'FastqSyncManagerStack', {
// ...this.createTemplateProps(env, 'FastqSyncManagerStack'),
// ...statelessConfiguration.fastqSyncManagerStackProps,
// });


Copilot AI Jun 29, 2025


There is a large block of commented-out stack initialization for FastqUnarchiving, FastqSync, and Icav2DataCopyManager. Since these stacks are deprecated, remove the commented code to improve readability.

Suggested change
// this.fastqUnarchivingManagerStack = new FastqUnarchivingManagerStack(
// scope,
// 'FastqUnarchivingManagerStack',
// {
// ...this.createTemplateProps(env, 'FastqUnarchivingManagerStack'),
// ...statelessConfiguration.fastqUnarchivingManagerStackProps,
// }
// );
// this.fastqSyncManagerStack = new FastqSyncManagerStack(scope, 'FastqSyncManagerStack', {
// ...this.createTemplateProps(env, 'FastqSyncManagerStack'),
// ...statelessConfiguration.fastqSyncManagerStackProps,
// });
// Removed deprecated stack initialization for FastqUnarchivingManagerStack and FastqSyncManagerStack.

@alexiswl (Member Author)
alexiswl commented Jun 30, 2025

TODO LIST

DEVELOPMENT

  • Destroy orcabus fastq unarchiving stateful stack in development
  • Deploy new fastq unarchiving stateful stack in development
  • Destroy orcabus fastq unarchiving stateless stack in development
  • Deploy new fastq unarchiving stateless stack in development
  • Destroy orcabus fastq sync stateful stack in development
  • Deploy new fastq sync stateful stack in development
  • Destroy orcabus fastq sync stateless stack in development
  • Deploy new fastq sync stateless stack in development

STAGING

  • Destroy orcabus fastq unarchiving stateful stack in staging
  • Deploy new fastq unarchiving stateful stack in staging
  • Destroy orcabus fastq unarchiving stateless stack in staging
  • Deploy new fastq unarchiving stateless stack in staging
  • Destroy orcabus fastq sync stateful stack in staging
  • Deploy new fastq sync stateful stack in staging
  • Destroy orcabus fastq sync stateless stack in staging
  • Deploy new fastq sync stateless stack in staging

PRODUCTION

  • Destroy orcabus fastq unarchiving stateful stack in production
  • Deploy new fastq unarchiving stateful stack in production
  • Destroy orcabus fastq unarchiving stateless stack in production
  • Deploy new fastq unarchiving stateless stack in production
  • Destroy orcabus fastq sync stateful stack in production
  • Deploy new fastq sync stateful stack in production
  • Destroy orcabus fastq sync stateless stack in production
  • Deploy new fastq sync stateless stack in production

Related issues:

@alexiswl (Member Author)

Some additional migration notes:

Data Sharing Manager DynamoDB Migration Notes

Click to expand!

For DynamoDB, the table names differ between the old and the new stacks, so we will instead need to perform a data migration:

  • data-sharing-packaging-api-table -> DataSharingPackagingApiTable
  • data-sharing-push-api-table -> DataSharingPushApiTable
  • data-sharing-packaging-lookup-table -> DataSharingPackagingLookupTable

From our experience with the Fastq Manager DynamoDB deployment, it is easier to download the items, batch them, and then re-upload.

Batch Write Item / DataSharingPackagingApiTable

We can, however, use the batch-write-item command, which accepts up to 25 items per request:

date

TABLE_NAME="DataSharingPackagingApiTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name data-sharing-packaging-api-table \
  --query 'Items' \
  --output json > data.json

db_length="$( \
  jq --raw-output \
    'length' < data.json
)"

for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date

Batch Write Item / DataSharingPushApiTable

date

TABLE_NAME="DataSharingPushApiTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name data-sharing-push-api-table \
  --query 'Items' \
  --output json > data.json

db_length="$( \
  jq --raw-output \
    'length' < data.json
)"

for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date

Batch Write Item / DataSharingPackagingLookupTable

date

TABLE_NAME="DataSharingPackagingLookupTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name data-sharing-packaging-lookup-table \
  --query 'Items' \
  --output json > data.json

db_length="$( \
  jq --raw-output \
    'length' < data.json
)"

for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date

~3K items in 2 minutes!

Prod has around 80K items in this table, so we should be okay.
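The windowing arithmetic in these loops is easy to get wrong at the boundaries: if the item count divides exactly by 25, a naive `seq 0 25 "${db_length}"` produces an empty trailing slice, and batch-write-item rejects empty batches. A sketch of the same min/max logic factored into a function, so it can be sanity-checked locally without touching AWS (`batch_ranges` is a hypothetical helper name, not code from this PR):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the seq/min/max windowing used above.
# Prints "min max" (max exclusive) for each batch; no AWS calls involved.
batch_ranges() {
  local db_length="$1" batch_size="$2"
  local min_iter max_iter
  # Stop at db_length - 1 so an exactly-divisible length
  # does not emit an empty trailing batch
  for min_iter in $(seq 0 "${batch_size}" "$(( db_length - 1 ))"); do
    max_iter="$(( min_iter + batch_size ))"
    if (( max_iter > db_length )); then
      max_iter="${db_length}"
    fi
    echo "${min_iter} ${max_iter}"
  done
}

batch_ranges 60 25   # -> 0 25 / 25 50 / 50 60
```

Running it over a few item counts (60, 50, 0) is a quick way to convince yourself the last batch is never empty.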

@alexiswl (Member Author)

alexiswl commented Jul 22, 2025

Data Manager S3 Migration Notes

Click to expand!

CDK Import Steps

Comment out the following lines in the stateful application stack:

  • buildDataSharingS3Bucket

And then deploy with

bash scratch/rsync-deploy.sh \
  cdk-stateful deploy \
    --require-approval never \
    StatefulDataSharingStackPipeline/StatefulDataSharingStackPipeline/OrcaBusBeta/StatefulDataSharingStack

Uncomment the lines in the stateful application stack and run the import command

pnpm cdk-stateful import StatefulDataSharingStackPipeline/StatefulDataSharingStackPipeline/OrcaBusBeta/StatefulDataSharingStack

Run drift detection; everything should be clear.
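Drift detection can also be driven from the AWS CLI rather than the console; a hedged sketch (the stack name passed in is a placeholder, and the calls are wrapped in a function so nothing runs on sourcing):

```shell
#!/usr/bin/env bash
# Sketch: trigger CloudFormation drift detection for a stack and poll
# until the detection run completes, then print the final status.
detect_drift() {
  local stack_name="$1"
  local detection_id status
  detection_id="$(aws cloudformation detect-stack-drift \
    --stack-name "${stack_name}" \
    --query 'StackDriftDetectionId' \
    --output text)"
  while :; do
    status="$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "${detection_id}" \
      --query 'DetectionStatus' \
      --output text)"
    [[ "${status}" == "DETECTION_IN_PROGRESS" ]] || break
    sleep 5
  done
  echo "${status}"
}
```

Usage would be e.g. `detect_drift StatefulDataSharingStack`; a clean import should report `DETECTION_COMPLETE` with no drifted resources.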

@alexiswl (Member Author)

alexiswl commented Jul 22, 2025

Fastq Manager DynamoDb Migration Notes

Click to expand

Summary

For DynamoDB, the table names differ between the old and the new stacks, so we will instead need to perform a data migration:

  • fastqManagerDynamoDBTable -> FastqDataTable
  • fastqSetDynamoDBTable -> FastqSetDataTable
  • fastqJobDynamoDBTable -> FastqJobsTable

We can either download and re-upload item by item, though this may be quite slow for large databases:

aws dynamodb scan \
  --table-name fastqManagerDynamoDBTable \
  --query 'Items' \
  --output json > fastq_data.json

jq -c '.[]' < fastq_data.json | while read -r item; do
  aws dynamodb put-item \
    --no-cli-pager \
    --table-name FastqDataTable \
    --item "${item}"
done

Or export to S3 and import from there. Note, however, that for the export/import approach the destination table must not yet exist.
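For reference, the S3 export/import path looks roughly like the sketch below. This is an illustrative outline only: it assumes point-in-time recovery is enabled on the source table (required by export-table-to-point-in-time), and the ARN, bucket, prefix, and key schema are placeholders rather than values from this repo:

```shell
#!/usr/bin/env bash
# Sketch: export a table's current state to S3, then import it under a
# new table name (the destination table must not already exist).
export_then_import() {
  local table_arn="$1" bucket="$2" prefix="$3" new_table="$4"

  # Export the source table's data to S3 in DynamoDB JSON format
  aws dynamodb export-table-to-point-in-time \
    --table-arn "${table_arn}" \
    --s3-bucket "${bucket}" \
    --s3-prefix "${prefix}" \
    --export-format DYNAMODB_JSON

  # Import into a brand-new table; key schema here is a placeholder
  aws dynamodb import-table \
    --input-format DYNAMODB_JSON \
    --s3-bucket-source "S3Bucket=${bucket},S3KeyPrefix=${prefix}" \
    --table-creation-parameters \
      "TableName=${new_table},AttributeDefinitions=[{AttributeName=id,AttributeType=S}],KeySchema=[{AttributeName=id,KeyType=HASH}],BillingMode=PAY_PER_REQUEST"
}
```

In practice the export completes asynchronously, so the import step would need to wait for the export to finish before pointing at the S3 prefix.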

Batch Write Item / FastqDataTable

We can, however, use the batch-write-item command, which accepts up to 25 items per request:

date

TABLE_NAME="FastqDataTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name fastqManagerDynamoDBTable \
  --query 'Items' \
  --output json > fastq_data.json

db_length="$( \
  jq --raw-output \
    'length' < fastq_data.json
)"


for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < fastq_data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date

I think this is the way forward!!

Let's do the other two fastq tables

Batch Write Item / FastqSetDataTable

date
TABLE_NAME="FastqSetDataTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name fastqSetDynamoDBTable \
  --query 'Items' \
  --no-cli-pager \
  --output json > fastq_set_data.json


db_length="$( \
  jq --raw-output \
    'length' < fastq_set_data.json
)"


for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < fastq_set_data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date

Batch Write Item / FastqJobsTable

date

TABLE_NAME="FastqJobsTable"
BATCH_ITEM_LIST=25

aws dynamodb scan \
  --table-name fastqJobDynamoDBTable \
  --query 'Items' \
  --no-cli-pager \
  --output json > fastq_job_data.json


db_length="$( \
  jq --raw-output \
    'length' < fastq_job_data.json
)"


for min_iter in $(seq 0 "${BATCH_ITEM_LIST}" "$(( db_length - 1 ))"); do
  # Get the (exclusive) upper bound for this batch
  max_iter="$(( min_iter + BATCH_ITEM_LIST ))"

  if [[ "${max_iter}" -gt "${db_length}" ]]; then
    max_iter="${db_length}"
  fi

  jq --raw-output \
    --arg tableName "${TABLE_NAME}" \
    --argjson min_iter "${min_iter}" \
    --argjson max_iter "${max_iter}" \
    '
      .[$min_iter:$max_iter] |
      {
        "\($tableName)": (
          . | map({"PutRequest": {"Item": .}})
        )
      }
    ' < fastq_job_data.json \
  > "request_items_iter.${min_iter}_${max_iter}.json"

  aws dynamodb batch-write-item \
    --no-cli-pager \
    --request-items "file://request_items_iter.${min_iter}_${max_iter}.json"
done

date
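One caveat worth noting with all of the loops above: batch-write-item can return successfully while still reporting UnprocessedItems (for example under throttling), and those leftovers would be silently dropped. A hedged sketch of a retry wrapper (`submit_batch` is a hypothetical helper, not code from this PR):

```shell
#!/usr/bin/env bash
# Sketch: submit one request-items file and retry anything that
# batch-write-item hands back as UnprocessedItems.
submit_batch() {
  local request_file="$1"
  local attempt=0 response
  while :; do
    response="$(aws dynamodb batch-write-item \
      --no-cli-pager \
      --request-items "file://${request_file}" \
      --query 'UnprocessedItems' \
      --output json)"
    # Stop once nothing was left unprocessed
    if [[ "${response}" == "{}" ]]; then
      echo "batch ${request_file}: done"
      return 0
    fi
    attempt=$(( attempt + 1 ))
    if (( attempt > 5 )); then
      echo "batch ${request_file}: giving up with unprocessed items" >&2
      return 1
    fi
    # UnprocessedItems already has the request-items shape,
    # so write the leftovers back out and retry with simple backoff
    printf '%s' "${response}" > "${request_file}"
    sleep "${attempt}"
  done
}
```

The per-iteration `aws dynamodb batch-write-item` call in the loops above could be swapped for `submit_batch "request_items_iter.${min_iter}_${max_iter}.json"`.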

@alexiswl (Member Author)

Fastq Manager S3 Migration Notes

Click to expand

CDK Import Steps

Comment out the following lines in the stateful application stack:

  • addNtsmBucket
  • addFastqManagerCacheBucket

And the following lines in the S3 configuration:

  • addTemporaryMetadataDataLifeCycleRuleToBucket

And then deploy with

bash scratch/rsync-deploy.sh \
  cdk-stateful deploy \
    --require-approval never \
    StatefulFastqStack/StatefulFastqStackPipeline/OrcaBusBeta/StatefulFastqStack 

Uncomment the lines in the stateful application stack and run the import command

pnpm cdk-stateful import StatefulFastqStack/StatefulFastqStackPipeline/OrcaBusBeta/StatefulFastqStack

Then redeploy.

Run drift detection; everything should be clear. Then add the lifecycle rules back in and redeploy.

Then rerun drift detection.
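To confirm the lifecycle rules actually landed after the final redeploy, the bucket configuration can be read back directly (the bucket name passed in is a placeholder, and `get_lifecycle_rule_ids` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Sketch: list the lifecycle rule IDs currently attached to a bucket,
# to cross-check against what the stateful stack is expected to add.
get_lifecycle_rule_ids() {
  local bucket="$1"
  aws s3api get-bucket-lifecycle-configuration \
    --bucket "${bucket}" \
    --query 'Rules[].ID' \
    --output text
}
```

If the lifecycle rule was not applied, this call returns a NoSuchLifecycleConfiguration error rather than an empty list, which makes it a handy smoke test.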

@alexiswl (Member Author)

Complete - merging

@alexiswl alexiswl added this pull request to the merge queue Jul 24, 2025
Merged via the queue into main with commit e5f0b98 Jul 24, 2025
6 checks passed
@alexiswl alexiswl deleted the deprecation/deprecate-primary-data-stacks branch July 24, 2025 02:57