tap-redshift

tap-redshift is a Singer tap for Amazon Redshift, built with the Meltano Tap SDK for Singer Taps.

Features

Multiple Authentication Methods: Support for username/password and IAM authentication
Redshift Serverless Support: Full support for Redshift Serverless workgroups
S3 UNLOAD Optimization: Uses Redshift's UNLOAD command to S3 for better performance on large datasets
Comprehensive Type Mapping: Complete mapping from Redshift SQL types to JSON Schema types
SSL Support: Secure connections with configurable SSL modes

Installation

Install from GitHub:

pipx install git+https://github.com/tobiascadee/tap-redshift.git@main

Configuration

Authentication Options

Username/Password Authentication

{
  "host": "your-cluster.abc123.us-west-2.redshift.amazonaws.com",
  "port": "5439",
  "database": "your_database",
  "schema": "public",
  "user": "your_username",
  "password": "your_password",
  "use_iam_authentication": false
}

IAM Authentication (Provisioned Cluster)

{
  "host": "your-cluster.abc123.us-west-2.redshift.amazonaws.com",
  "port": 5439,
  "database": "your_database",
  "schema": "public",
  "use_iam_authentication": true,
  "db_user": "your_db_user",
  "cluster_identifier": "your-cluster",
  "aws_region": "us-west-2",
  "aws_access_key_id": "your_access_key",
  "aws_secret_access_key": "your_secret_key"
}

IAM Authentication (Serverless)

{
  "host": "your-workgroup.123456789.us-west-2.redshift-serverless.amazonaws.com",
  "database": "your_database",
  "schema": "public",
  "use_iam_authentication": true,
  "is_serverless": true,
  "serverless_work_group": "your-workgroup",
  "aws_region": "us-west-2"
}

S3 UNLOAD Configuration (Recommended)

For improved performance on large datasets, configure S3 UNLOAD:

{
  "s3_bucket": "your-unload-bucket",
  "s3_key_prefix": "redshift-unload"
}

When enabled, the tap will:

Use Redshift's UNLOAD command to export data to S3
Download the files locally for processing
Clean up S3 files after processing

Required Settings

Setting	Description
`host`	Redshift cluster endpoint
`database`	Database name

Authentication Settings

Setting	Description	Required
`user`	Username for database auth	If not using IAM
`password`	Password for database auth	If not using IAM
`use_iam_authentication`	Use IAM instead of username/password	No (default: false)
`db_user`	Database user for IAM auth	If using IAM
`cluster_identifier`	Cluster identifier for IAM auth	If using IAM with provisioned cluster
`serverless_work_group`	Workgroup name	If using Serverless

AWS Settings

Setting	Description	Required
`aws_region`	AWS region	If using IAM or S3
`aws_access_key_id`	AWS access key	If not using IAM roles/profiles
`aws_secret_access_key`	AWS secret key	If not using IAM roles/profiles
`aws_session_token`	AWS session token	For temporary credentials
`aws_profile`	AWS profile name	If using AWS profiles

Optional Settings

Setting	Description	Default
`port`	Redshift port	5439
`schema`	Schema to extract from	public
`ssl`	Use SSL connection	true
`sslmode`	SSL mode	require
`is_serverless`	Using Serverless	false
`s3_key_prefix`	S3 key prefix for UNLOAD	redshift-unload
`dates_as_string`	Convert dates to strings vs date formats	true
`super_as_object`	Treat SUPER columns as objects vs strings	false

Supported SQL Types

The tap includes a dedicated RedshiftSQLToJSONSchema converter class that extends the Singer SDK's base converter with Redshift-specific functionality:

Redshift Type	JSON Schema Type	Notes
Standard SQL Types		Handled by base SQLToJSONSchema class
SMALLINT, INTEGER, BIGINT	integer	With appropriate min/max values
DECIMAL, NUMERIC	number	With precision handling via registered method
REAL, FLOAT, DOUBLE PRECISION	number
BOOLEAN	boolean
CHAR, VARCHAR, TEXT	string
DATE, TIMESTAMP, TIME	string	Configurable format via registered methods
Redshift-Specific Types		Custom handled by RedshiftSQLToJSONSchema
SUPER	string or object	Configurable via super_as_object
GEOMETRY, GEOGRAPHY	string	Spatial data
HLLSKETCH	string	HyperLogLog sketches

Architecture

The converter uses the Singer SDK standard pattern:

Class attribute: RedshiftConnector.sql_to_jsonschema_converter = RedshiftSQLToJSONSchema
Base class handles standard SQLAlchemy types (integers, strings, booleans, etc.)
Registered methods override date/time types for configurable format handling
String-based dispatch handles Redshift-specific types from database introspection
Automatic instantiation: The SDK automatically creates the converter instance with tap config

Schema Conversion Options

dates_as_string (default: true): When true, date/time columns are converted to plain strings. When false, they include proper JSON Schema format constraints (date, date-time, time).
super_as_object (default: false): When true, SUPER columns are treated as flexible objects that can contain JSON-like data. When false, they are treated as strings.

Example configuration for structured data handling:

{
  "dates_as_string": false,
  "super_as_object": true
}

Usage

Direct Execution

# Test connection
tap-redshift --config config.json --discover > catalog.json

# Extract data
tap-redshift --config config.json --catalog catalog.json

With Meltano

Add to your meltano.yml:

extractors:
- name: tap-redshift
  namespace: tap_redshift
  pip_url: git+https://github.com/tobiascadee/tap-redshift.git@main
  settings:
  - name: host
  - name: database
  - name: user
  - name: password
    kind: password
    sensitive: true
  # ... other settings

Then run:

meltano install extractor tap-redshift
meltano invoke tap-redshift --discover > catalog.json
meltano run tap-redshift target-jsonl

Performance Recommendations

IAM Authentication: Use IAM authentication for better security
Appropriate Batch Size: Adjust batch_size based on your data size
Schema Selection: Specify the exact schema to avoid unnecessary discovery

Development

Setup Development Environment

git clone https://github.com/tobiascadee/tap-redshift.git
cd tap-redshift
uv sync

Run Tests

uv run pytest

Test CLI

uv run tap-redshift --help

Dependencies

singer-sdk: Meltano Singer SDK
redshift-connector: Official AWS Redshift connector
boto3: AWS SDK for S3 operations
sqlalchemy: SQL toolkit

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
.secrets		.secrets
.vscode		.vscode
output		output
plugins/loaders		plugins/loaders
tap_redshift		tap_redshift
tests		tests
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
meltano.yml		meltano.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

tap-redshift

Features

Installation

Configuration

Authentication Options

Username/Password Authentication

IAM Authentication (Provisioned Cluster)

IAM Authentication (Serverless)

S3 UNLOAD Configuration (Recommended)

Required Settings

Authentication Settings

AWS Settings

Optional Settings

Supported SQL Types

Architecture

Schema Conversion Options

Usage

Direct Execution

With Meltano

Performance Recommendations

Development

Setup Development Environment

Run Tests

Test CLI

Dependencies

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Languages

License

TicketSwap/tap-redshift

Folders and files

Latest commit

History

Repository files navigation

tap-redshift

Features

Installation

Configuration

Authentication Options

Username/Password Authentication

IAM Authentication (Provisioned Cluster)

IAM Authentication (Serverless)

S3 UNLOAD Configuration (Recommended)

Required Settings

Authentication Settings

AWS Settings

Optional Settings

Supported SQL Types

Architecture

Schema Conversion Options

Usage

Direct Execution

With Meltano

Performance Recommendations

Development

Setup Development Environment

Run Tests

Test CLI

Dependencies

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Languages

Packages