ComfyUI on SageMaker Processing Job

This project deploys AWS infrastructure for running ComfyUI workflows on SageMaker Processing Jobs. The solution provides a scalable, cost-effective way to run ComfyUI workflows in the cloud with automatic resource management.

Note: This runs ComfyUI in headless batch mode — no web UI is exposed. Workflows are submitted programmatically and results are written to S3.

Architecture Overview

The project consists of three main CDK stacks:

SecurityStack - IAM roles and security configurations
DataStack - S3 bucket for output storage
ComfyUiStack - SageMaker Processing Job with Lambda trigger

Key Components

SageMaker Processing Job: Runs ComfyUI on ml.g5.xlarge instances with GPU acceleration
Lambda Function: Manual trigger for processing job
S3 Bucket: Output storage for generated images
Docker Container: Custom CUDA-enabled container (see processing_job/ directory)
CDK Infrastructure: Automated deployment and resource management

Prerequisites

Python 3.13+
AWS CLI configured
Docker (for building container images)
AWS CDK v2
CDK bootstrapped in your target AWS account and region

Apple Silicon users: You may see a Docker platform mismatch warning during cdk deploy (building linux/amd64 on ARM). This is expected and harmless — the image builds correctly for the target SageMaker instances.

Setup

1. Environment Configuration

Create your environment file:

cp .env.example .env

Edit .env with your AWS account details:

AWS_ACCOUNT_ID=your-account-id
REGION=us-east-1
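
As a sanity check before deploying, the two variables above can be validated with a few lines of Python. This is a minimal sketch, not the project's actual loader: the helper names `parse_env` and `validate_deploy_env` are hypothetical, and the 12-digit check simply reflects the standard AWS account ID format.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate_deploy_env(values: Dict[str, str]) -> Dict[str, str]:
    """Check the two variables documented for .env (hypothetical helper)."""
    account = values.get("AWS_ACCOUNT_ID", "")
    region = values.get("REGION", "")
    if not (account.isdigit() and len(account) == 12):
        raise ValueError("AWS_ACCOUNT_ID must be a 12-digit account number")
    if not region:
        raise ValueError("REGION must be set, e.g. us-east-1")
    return {"account": account, "region": region}
```

Running `validate_deploy_env(parse_env(open(".env").read()))` would then fail early if the placeholder `your-account-id` was left in place.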

2. Install Dependencies

This project uses uv for dependency management:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create environment
uv venv --python 3.13

# Activate environment
source .venv/bin/activate

# Install dependencies
uv sync

Alternatively, using pip:

pip install -r requirements.txt

3. CDK Bootstrap

Bootstrap CDK in your target AWS account and region (required for CDK deployments):

cdk bootstrap aws://YOUR-ACCOUNT-ID/YOUR-REGION

For example:

cdk bootstrap aws://123456789012/us-east-1

4. Service Quota Request

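The default configuration runs 6 instances in parallel, so request a SageMaker service quota increase for ml.g5.xlarge processing job usage to at least 6 through the AWS console. If you cannot get that quota, lower the instance count as described under Parallelization Strategy.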

Configuration

The processing job configuration is defined in config/config.yaml:

Instance Type: ml.g5.xlarge (GPU-enabled)
Instance Count: 6 (for parallel image generation)
Volume Size: 125GB
Container: Custom ComfyUI Docker image

Parallelization Strategy

The default configuration uses 6 instances to generate images in parallel. Each instance runs an independent ComfyUI batch — with ContainerArguments: ["50"], each instance generates 50 images, producing 300 total images in ~18 minutes.

If you have a lower GPU quota (e.g., only 1 ml.g5.xlarge instance available), update config/config.yaml:

InstanceCount: 1
InstanceType: ml.g5.xlarge
VolumeSizeInGB: 125
ContainerEntrypoint: ["/bin/bash", "./run_job.sh"]
ContainerArguments: ["50"]

This will generate 50 images on a single instance in ~9 minutes. Adjust ContainerArguments to control how many images each instance produces.
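
The relationship between instance count, container arguments, and total output can be sketched as a small data model. This is an illustration only: the class `ProcessingJobSettings` is hypothetical and is not taken from the project's model.py, and it assumes the YAML has already been parsed into a plain mapping and that the first container argument is the per-instance image count, as described above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProcessingJobSettings:
    """Hypothetical model of the keys shown in config/config.yaml."""

    instance_count: int
    instance_type: str
    volume_size_gb: int
    container_entrypoint: List[str]
    container_arguments: List[str]

    @classmethod
    def from_mapping(cls, raw: Dict[str, Any]) -> "ProcessingJobSettings":
        settings = cls(
            instance_count=int(raw["InstanceCount"]),
            instance_type=str(raw["InstanceType"]),
            volume_size_gb=int(raw["VolumeSizeInGB"]),
            container_entrypoint=list(raw["ContainerEntrypoint"]),
            container_arguments=list(raw["ContainerArguments"]),
        )
        if settings.instance_count < 1:
            raise ValueError("InstanceCount must be at least 1")
        return settings

    @property
    def images_per_instance(self) -> int:
        # Assumption: the first container argument is the image count.
        return int(self.container_arguments[0])

    @property
    def total_images(self) -> int:
        # Each instance runs an independent batch, so totals simply add up.
        return self.instance_count * self.images_per_instance


single = ProcessingJobSettings.from_mapping({
    "InstanceCount": 1,
    "InstanceType": "ml.g5.xlarge",
    "VolumeSizeInGB": 125,
    "ContainerEntrypoint": ["/bin/bash", "./run_job.sh"],
    "ContainerArguments": ["50"],
})
print(single.total_images)
```

With `InstanceCount: 6` and the same arguments, `total_images` works out to 300, matching the default configuration.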

Deployment

Deploy all stacks:

cdk deploy --all --require-approval never

Deploy individual stacks:

cdk deploy SecurityStack
cdk deploy DataStack
cdk deploy ComfyUiStack

Triggering the Processing Job

After deployment, the stack outputs list the Lambda function name (TriggerLambdaFunctionName) and the output S3 bucket. Invoke that Lambda function to start the processing job.
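
One way to invoke the function from Python is through boto3's Lambda client. The sketch below is an assumption-laden example rather than the project's own tooling: the helper names are hypothetical, and the empty JSON payload is a guess, since the event shape the trigger Lambda expects is not documented here.

```python
import json
from typing import Any, Dict, Optional


def build_invoke_request(function_name: str,
                         payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build keyword arguments for the Lambda client's invoke() call."""
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse",  # wait for the Lambda to return
        # Assumption: the trigger Lambda accepts an empty JSON object.
        "Payload": json.dumps(payload or {}).encode("utf-8"),
    }


def trigger_job(function_name: str, region: str = "us-east-1") -> int:
    """Invoke the trigger Lambda and return the HTTP status code."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(**build_invoke_request(function_name))
    return response["StatusCode"]
```

Pass the value of the TriggerLambdaFunctionName output as `function_name`. A status code of 200 only means the Lambda ran; the processing job itself continues asynchronously in SageMaker.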

Processing Job Outputs

Once the processing job completes, generated images are stored in the S3 bucket shown in the OutputBucketName stack output.
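
To check what a job produced, the bucket can be listed with boto3. This is a minimal sketch: the helper names are hypothetical, and the set of image extensions is an assumption, since the output format depends on the workflow.

```python
from typing import Iterable, List

# Assumption: generated images use one of these extensions.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".webp")


def filter_image_keys(keys: Iterable[str]) -> List[str]:
    """Keep only object keys that look like image files."""
    return [key for key in keys if key.lower().endswith(IMAGE_SUFFIXES)]


def list_generated_images(bucket: str, prefix: str = "") -> List[str]:
    """List image keys in the output bucket, following pagination."""
    import boto3  # imported here so filter_image_keys works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects match.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return filter_image_keys(keys)
```

Pass the value of the OutputBucketName output as `bucket`. With the default configuration, the returned list should contain roughly 300 keys once all instances have finished.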

Solution Architecture

Project Structure

├── app.py                          # Main CDK application
├── assets/                         # README assets
├── config/
│   ├── config.py                   # Configuration loader
│   └── config.yaml                 # Processing job configuration
├── infrastructure/
│   ├── data.py                     # S3 bucket and data resources
│   ├── security.py                 # IAM roles and policies
│   └── comfyui.py                  # Main ComfyUI stack
├── project_constructs/
│   ├── processing_job/
│   │   ├── main.py                 # Processing job construct
│   │   └── model.py                # Data models for processing job
│   ├── lambda_function.py          # Lambda construct
│   └── s3.py                       # S3 bucket construct
├── processing_job/                 # Container and workflow scripts (see processing_job/README.md)
│   ├── Dockerfile                  # CUDA-enabled container definition
│   ├── run_job.sh                  # Main processing script
│   ├── run_workflow.py             # Individual workflow runner
│   ├── is_queue_empty.py           # Queue monitoring script
│   ├── image_z_image_turbo.json    # Example ComfyUI workflow
│   ├── prompts.txt                 # Example prompt file
│   └── README.md                   # Detailed container and workflow documentation
├── lambdas/
│   └── trigger_processing_job/     # Lambda function code
└── requirements.txt                # Python dependencies

Monitoring and Troubleshooting

CloudWatch Logs: Check SageMaker Processing Job logs
Lambda Logs: Monitor function execution in CloudWatch
S3 Monitoring: Review bucket contents for output files
CDK Nag: Security and best practice compliance checking

Common Issues

Container build failures: Ensure Docker is running and ECR authentication is complete
Processing job failures: Check CloudWatch logs for detailed error messages
Permission errors: Verify IAM roles have necessary permissions
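
Before digging through CloudWatch logs, it can help to read the job status and failure reason directly from SageMaker. The sketch below uses boto3's `describe_processing_job` call. The helper names are hypothetical, and the job name has to be looked up first (for example in the SageMaker console), since this README does not say how jobs are named.

```python
from typing import Any, Dict


def summarize_job(description: Dict[str, Any]) -> str:
    """Turn a DescribeProcessingJob response into a one-line summary."""
    name = description.get("ProcessingJobName", "<unknown>")
    status = description.get("ProcessingJobStatus", "<unknown>")
    reason = description.get("FailureReason")  # only present on failure
    line = f"{name}: {status}"
    if reason:
        line += f" ({reason})"
    return line


def describe_job(job_name: str, region: str = "us-east-1") -> str:
    """Fetch and summarize the status of one processing job."""
    import boto3  # imported here so summarize_job works without boto3

    client = boto3.client("sagemaker", region_name=region)
    return summarize_job(client.describe_processing_job(ProcessingJobName=job_name))
```

If the summary shows a failure reason, the matching container output is in the CloudWatch log streams for that job.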

Cost Considerations

ml.g5.xlarge instances are GPU-enabled and cost ~$1.41/hour
Processing jobs are billed per second with a 1-minute minimum
Monitor usage through AWS Cost Explorer
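
The figures above translate into a rough per-run estimate. This is a back-of-the-envelope sketch only: it assumes the approximate $1.41/hour rate and the 1-minute minimum stated above, treats the ~18 and ~9 minute run times from the Parallelization Strategy section as billed time per instance, and ignores S3, Lambda, and ECR charges. The function name `estimate_cost` is hypothetical.

```python
HOURLY_RATE_USD = 1.41        # approximate ml.g5.xlarge rate quoted above
MINIMUM_BILLED_SECONDS = 60   # 1-minute minimum quoted above


def estimate_cost(instance_count: int, runtime_seconds: float,
                  hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimate the instance cost of one processing job in USD."""
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    return instance_count * billed_seconds / 3600 * hourly_rate


# Default configuration: 6 instances for about 18 minutes.
print(round(estimate_cost(6, 18 * 60), 2))
# Single-instance configuration: 1 instance for about 9 minutes.
print(round(estimate_cost(1, 9 * 60), 2))
```

Under these assumptions the default run comes to about $2.54 and the single-instance run to about $0.21. Actual pricing varies by region, so treat these as order-of-magnitude estimates.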

Security Features

IAM Roles: Least privilege access principles
VPC Configuration: Network isolation for processing jobs
CDK Nag Integration: Automated security compliance checking
Encrypted Storage: S3 buckets with encryption at rest

Development

Customizing Infrastructure

Edit config/config.yaml to change:

Instance type (ensure GPU compatibility for ComfyUI)
Volume size
Instance count for parallel processing

Container Development

For container customization, workflow development, and detailed usage instructions, see processing_job/README.md.

Clean Up

To avoid unnecessary costs, destroy all resources:

cdk destroy --all

This will remove all AWS resources created by the CDK stacks.
