This project deploys AWS infrastructure for running ComfyUI workflows on SageMaker Processing Jobs. The solution provides a scalable, cost-effective way to run ComfyUI workflows in the cloud with automatic resource management.
Note: This runs ComfyUI in headless batch mode — no web UI is exposed. Workflows are submitted programmatically and results are written to S3.
The project consists of three main CDK stacks:
- SecurityStack - IAM roles and security configurations
- DataStack - S3 bucket for output storage
- ComfyUiStack - SageMaker Processing Job with Lambda trigger
- SageMaker Processing Job: Runs ComfyUI on `ml.g5.xlarge` instances with GPU acceleration
- Lambda Function: Manual trigger for the processing job
- S3 Bucket: Output storage for generated images
- Docker Container: Custom CUDA-enabled container (see the `processing_job/` directory)
- CDK Infrastructure: Automated deployment and resource management
- Python 3.13+
- AWS CLI configured
- Docker (for building container images)
- AWS CDK v2
- CDK bootstrapped in your target AWS account and region
Apple Silicon users: You may see a Docker platform mismatch warning during `cdk deploy` (building linux/amd64 on ARM). This is expected and harmless — the image builds correctly for the target SageMaker instances.
Create your environment file:

```bash
cp .env.example .env
```

Edit `.env` with your AWS account details:

```
AWS_ACCOUNT_ID=your-account-id
REGION=us-east-1
```
This project uses uv for dependency management:
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create environment
uv venv --python 3.13

# Activate environment
source .venv/bin/activate

# Install dependencies
uv sync
```

Alternatively, using pip:

```bash
pip install -r requirements.txt
```

Bootstrap CDK in your target AWS account and region (required for CDK deployments):

```bash
cdk bootstrap aws://YOUR-ACCOUNT-ID/YOUR-REGION
```

For example:

```bash
cdk bootstrap aws://123456789012/us-east-1
```

Request a service quota increase for `ml.g5.xlarge` in SageMaker processing jobs to at least 6 through the AWS console.
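If you want to check your current quota before requesting an increase, you can query the Service Quotas API. This is a hedged sketch, not part of the project: it assumes boto3 and configured credentials, and matches quota names by substring because the exact quota name and code for processing jobs can vary — verify against the Service Quotas console.

```python
def find_processing_quotas(keyword: str = "ml.g5.xlarge"):
    """List SageMaker service quotas whose names mention the given keyword.

    Matches by substring rather than quota code, since the exact quota
    name for processing jobs is an assumption here.
    """
    import boto3  # assumed installed alongside the project dependencies

    client = boto3.client("service-quotas")
    matches = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            name = quota["QuotaName"]
            if keyword in name and "processing job" in name.lower():
                matches.append((name, quota["Value"]))
    return matches
```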
The processing job configuration is defined in config/config.yaml:
- Instance Type: `ml.g5.xlarge` (GPU-enabled)
- Instance Count: 6 (for parallel image generation)
- Volume Size: 125GB
- Container: Custom ComfyUI Docker image
The default configuration uses 6 instances to generate images in parallel. Each instance runs an independent ComfyUI batch — with ContainerArguments: ["50"], each instance generates 50 images, producing 300 total images in ~18 minutes.
If you have a lower GPU quota (e.g., only 1 ml.g5.xlarge instance available), update config/config.yaml:
```yaml
InstanceCount: 1
InstanceType: ml.g5.xlarge
VolumeSizeInGB: 125
ContainerEntrypoint: ["/bin/bash", "./run_job.sh"]
ContainerArguments: ["50"]
```

This will generate 50 images on a single instance in ~9 minutes. Adjust `ContainerArguments` to control how many images each instance produces.
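The arithmetic behind these totals can be sanity-checked in a few lines. This is an illustrative sketch, not part of the project code; the per-instance count mirrors the value passed in `ContainerArguments`.

```python
def total_images(instance_count: int, images_per_instance: int) -> int:
    """Total output count: each instance runs an independent ComfyUI batch."""
    return instance_count * images_per_instance

print(total_images(6, 50))  # default config: 300 images across 6 instances
print(total_images(1, 50))  # reduced-quota config: 50 images on one instance
```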
```bash
cdk deploy --all --require-approval never
```

Or deploy the stacks individually:

```bash
cdk deploy SecurityStack
cdk deploy DataStack
cdk deploy ComfyUiStack
```

Once deployed, the stack outputs will show the Lambda function name and output S3 bucket. You can trigger the processing job by invoking the Lambda function (shown in the `TriggerLambdaFunctionName` output).
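One way to trigger the job from Python is a direct Lambda invocation. This is a minimal sketch assuming boto3 and that you pass the function name from the `TriggerLambdaFunctionName` output; whether the Lambda expects a payload is an assumption — check the handler in `lambdas/trigger_processing_job/`.

```python
def trigger_job(function_name: str) -> int:
    """Invoke the trigger Lambda asynchronously; returns the HTTP status code."""
    import boto3  # assumed configured with credentials for the target account

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # fire-and-forget; the processing job runs for minutes
    )
    return response["StatusCode"]
```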
Once the processing job completes, generated images are stored in the S3 bucket shown in the OutputBucketName stack output.
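To pull the results locally, `aws s3 sync` works, or a boto3 sketch like the following (assumes boto3; the bucket name comes from the `OutputBucketName` stack output):

```python
def download_outputs(bucket_name: str, dest_dir: str = "outputs") -> int:
    """Download every object from the output bucket; returns the object count."""
    import pathlib
    import boto3  # assumed available

    s3 = boto3.client("s3")
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            target = pathlib.Path(dest_dir) / obj["Key"]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket_name, obj["Key"], str(target))
            count += 1
    return count
```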
```
├── app.py                           # Main CDK application
├── assets/                          # README assets
├── config/
│   ├── config.py                    # Configuration loader
│   └── config.yaml                  # Processing job configuration
├── infrastructure/
│   ├── data.py                      # S3 bucket and data resources
│   ├── security.py                  # IAM roles and policies
│   └── comfyui.py                   # Main ComfyUI stack
├── project_constructs/
│   ├── processing_job/
│   │   ├── main.py                  # Processing job construct
│   │   └── model.py                 # Data models for processing job
│   ├── lambda_function.py           # Lambda construct
│   └── s3.py                        # S3 bucket construct
├── processing_job/                  # Container and workflow scripts (see processing_job/README.md)
│   ├── Dockerfile                   # CUDA-enabled container definition
│   ├── run_job.sh                   # Main processing script
│   ├── run_workflow.py              # Individual workflow runner
│   ├── is_queue_empty.py            # Queue monitoring script
│   ├── image_z_image_turbo.json     # Example ComfyUI workflow
│   ├── prompts.txt                  # Example prompt file
│   └── README.md                    # Detailed container and workflow documentation
├── lambdas/
│   └── trigger_processing_job/      # Lambda function code
└── requirements.txt                 # Python dependencies
```
- CloudWatch Logs: Check SageMaker Processing Job logs
- Lambda Logs: Monitor function execution in CloudWatch
- S3 Monitoring: Review bucket contents for output files
- CDK Nag: Security and best practice compliance checking
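For a quick look at job logs without opening the console, a hedged boto3 sketch (SageMaker processing jobs write to the `/aws/sagemaker/ProcessingJobs` log group, with stream names prefixed by the job name; assumes boto3 and credentials):

```python
def tail_processing_logs(job_name: str, limit: int = 20) -> None:
    """Print recent CloudWatch log events for a SageMaker processing job."""
    import boto3  # assumed available

    logs = boto3.client("logs")
    group = "/aws/sagemaker/ProcessingJobs"
    streams = logs.describe_log_streams(
        logGroupName=group,
        logStreamNamePrefix=job_name,
    )["logStreams"]
    for stream in streams:
        events = logs.get_log_events(
            logGroupName=group,
            logStreamName=stream["logStreamName"],
            limit=limit,
        )["events"]
        for event in events:
            print(event["message"])
```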
- Container build failures: Ensure Docker is running and ECR authentication is complete
- Processing job failures: Check CloudWatch logs for detailed error messages
- Permission errors: Verify IAM roles have necessary permissions
- `ml.g5.xlarge` instances are GPU-enabled and cost ~$1.41/hour
- Processing jobs are billed per second with a 1-minute minimum
- Monitor usage through AWS Cost Explorer
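Using the figures above, a rough cost estimate for a run is a one-liner. This sketch assumes the ~$1.41/hour on-demand rate, which is approximate and region-dependent — check current pricing before relying on it.

```python
HOURLY_RATE = 1.41  # approximate ml.g5.xlarge on-demand rate; region-dependent

def estimate_job_cost(instance_count: int, runtime_minutes: float,
                      hourly_rate: float = HOURLY_RATE) -> float:
    """Estimate processing job cost; billing is per second with a 1-minute minimum."""
    billed_minutes = max(runtime_minutes, 1.0)
    return instance_count * (billed_minutes / 60.0) * hourly_rate

# Default configuration: 6 instances for ~18 minutes
print(f"~${estimate_job_cost(6, 18):.2f}")  # roughly $2.54
```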
- IAM Roles: Least privilege access principles
- VPC Configuration: Network isolation for processing jobs
- CDK Nag Integration: Automated security compliance checking
- Encrypted Storage: S3 buckets with encryption at rest
Edit config/config.yaml to change:
- Instance type (ensure GPU compatibility for ComfyUI)
- Volume size
- Instance count for parallel processing
For container customization, workflow development, and detailed usage instructions, see processing_job/README.md.
To avoid unnecessary costs, destroy all resources:
```bash
cdk destroy --all
```

This will remove all AWS resources created by the CDK stacks.
