
feat: add Hyperliquid S3 archive backfill pipeline (blocked: data not public)#830

Open
miohtama wants to merge 10 commits into master from feat/hyperliquid-s3-backfill

Conversation

@miohtama
Contributor

This pipeline does not work: the account_values/ prefix does not exist in the
public hyperliquid-archive S3 bucket. It lives in a separate private bucket used
internally by Hyperliquid for stats.hyperliquid.xyz.
The public bucket contains only the asset_ctxs/ and market_data/ prefixes, as
confirmed by the official documentation.
Discovered 2026-03-12.

Summary

  • Two-stage pipeline to backfill sparse Hyperliquid vault history from the s3://hyperliquid-archive/account_values/ daily snapshots
  • Stage 1 (extract-s3-vault-data.py): Parse LZ4-compressed CSV files, filter vault rows (is_vault=true), store in staging DuckDB. Resumable — skips already-processed dates
  • Stage 2 (backfill-vault-data.py): Insert missing dates into main daily-metrics.duckdb, recompute share prices. Only fills gaps, never overwrites API data. Rows tagged with data_source='s3_backfill'
  • Adds data_source column to vault_daily_prices table for provenance tracking
  • AWS setup and production migration guide in README-hyperliquid-backfill.md

🤖 Generated with Claude Code

miohtama and others added 10 commits March 11, 2026 22:57
…ta gaps

Two-stage pipeline to backfill sparse vault history from the
hyperliquid-archive S3 bucket (daily account_values snapshots):

Stage 1 (extract): Parse LZ4-compressed CSV files, filter vault rows,
store in staging DuckDB. Resumable — skips already-processed dates.

Stage 2 (apply): Insert missing dates into main daily-metrics DuckDB,
recompute share prices. Only fills gaps, never overwrites API data.
Each row tagged with data_source='s3_backfill' for provenance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI does not install the hyperliquid_backfill extra, so lz4 is missing.
Use pytest.importorskip to gracefully skip the test module.
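The guard described above is a one-liner at the top of the test module. Sketched here with a stdlib module so the snippet runs anywhere; in the actual test module the argument would be `"lz4"`:

```python
import pytest

# Module-level guard: when the optional dependency is missing, every
# test in this file is reported as skipped instead of failing at
# import time. The real module guards "lz4" (the hyperliquid_backfill
# extra); "csv" is used here only so the sketch is self-contained.
mod = pytest.importorskip("csv")


def test_dependency_available():
    assert mod is not None
```

`pytest.importorskip` returns the imported module, so the bound name can be used directly in the tests below it.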

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…xtract script

- Add configure_aws_credentials() helper that maps AWS_API_KEY and
  AWS_SECRET_KEY to the standard AWS env vars for boto3
- Add download_s3_files() for direct S3 download with progress bar
- Extract script now downloads from S3 when AWS_API_KEY is set,
  removing the need for a separate aws s3 sync step
- Falls back to S3_DATA_DIR for pre-downloaded files
- Updated README AWS setup section to use the new env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop custom AWS_API_KEY/AWS_SECRET_KEY aliases in favour of the
standard AWS environment variable names that boto3 and the AWS CLI
already recognise natively.
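The resulting fallback order (direct S3 download when standard credentials are set, else pre-downloaded files from S3_DATA_DIR) can be sketched as below. `resolve_data_source` is a hypothetical helper for illustration; the real extract script wires this decision into boto3, which reads these same environment variables natively.

```python
import os
from pathlib import Path


def resolve_data_source(env=os.environ) -> str:
    """Decide where the extract script reads snapshots from.

    Returns "s3" when the standard AWS credentials are present
    (boto3 picks them up from the environment on its own), otherwise
    the local directory of pre-downloaded files.
    """
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "s3"
    data_dir = env.get("S3_DATA_DIR")
    if data_dir:
        return str(Path(data_dir))
    raise SystemExit("Set AWS credentials or S3_DATA_DIR")
```

Using the standard variable names means the same environment also works for `aws s3 sync` and any other AWS tooling, with no custom mapping step.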

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add `scripts/hyperliquid/refresh-mfa-session.sh` — helper script to obtain AWS STS session credentials for accounts with MFA enforcement, auto-detects MFA device ARN and updates `~/.aws/credentials` automatically
- Update `README-hyperliquid-backfill.md` with MFA authentication section (section 4a) and expand all credential commands to show both without-MFA and with-MFA variants
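At its core, the helper script wraps a single STS call. A sketch of the command it builds, shown as a Python helper for illustration (the ARN and token values are placeholders; the real script auto-detects the device ARN and parses the JSON response into `~/.aws/credentials`):

```python
def build_sts_command(mfa_serial: str, token_code: str,
                      duration: int = 43200) -> list[str]:
    """Build the aws-cli invocation the refresh script wraps.

    mfa_serial is the MFA device ARN; token_code is the current 6-digit
    code from the authenticator. Values used here are illustrative.
    """
    return [
        "aws", "sts", "get-session-token",
        "--serial-number", mfa_serial,
        "--token-code", token_code,
        "--duration-seconds", str(duration),
    ]
```

The returned temporary credentials (access key, secret key, session token) are what get written into `~/.aws/credentials`, so the backfill scripts themselves never need to know about MFA.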
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
