feat: clickhouse schemas v2 #3914
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Removes Goose-based migrations and legacy SQL under go/pkg/clickhouse/schema/databases/*. Introduces v2 ClickHouse schemas/tables, materialized views, and a Dockerized initialization flow. Adds array utilities (Fill, Map, Reduce, Random) with tests. Updates CI/workflows, compose, Makefiles, and the test harness. Adds migration tooling and comprehensive ClickHouse integration tests.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant App as Producer Services
participant CH as ClickHouse
participant Raw as raw_v2 tables
participant MVmin as per_minute_v2 MV
participant MVhr as per_hour_v2 MV
participant MVday as per_day_v2 MV
participant MVmon as per_month_v2 MV
participant Biz as active_workspaces_v2 / billable_*_v2
App->>CH: INSERT events (api_requests_raw_v2 / key_verifications_raw_v2 / ratelimits_raw_v2)
CH-->>Raw: Store raw rows
Note over Raw, MVmin: New v2 MVs
Raw-->>MVmin: Aggregate per minute (states)
MVmin-->>MVhr: Aggregate per hour (merge states)
MVhr-->>MVday: Aggregate per day (merge)
MVday-->>MVmon: Aggregate per month (merge)
MVmon-->>Biz: Populate active_workspaces_v2 and billable_*_v2 via MVs
```

```mermaid
sequenceDiagram
autonumber
participant Src as v1 sources (raw_*_v1)
participant TempMV as temp_sync_*_v1_to_v2 (MVs)
participant CH as ClickHouse
participant Migr as ch_migrate_v2_simple.ts
Src-->>TempMV: New inserts mirrored to v2 raw
Migr->>CH: Loop over 6h windows (backfill)
Migr->>CH: INSERT INTO *_raw_v2 SELECT ... FROM *_raw_v1 WHERE time in window
CH-->>CH: MVs process to per-minute/hour/day/month v2
```

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Thank you for following the naming conventions for pull request titles! 🙏
@perkinsjr @Flo4604 pls :)
Actionable comments posted: 98
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
deployment/init-clickhouse.sh (1)
1-19: Harden init-clickhouse.sh

- Enable strict mode (`set -euo pipefail`) and `shopt -s nullglob` to avoid glob edge cases
- Wait for ClickHouse readiness (until `SELECT 1` succeeds) before applying schemas
- Gather and version-sort `.sql` files deterministically via `find … -sort -zV`
- Use `${CLICKHOUSE_ADMIN_USER:-default}` and `${CLICKHOUSE_ADMIN_PASSWORD:-}` to avoid leaking unset envs

go/pkg/clickhouse/schema/000_oldschema.sql (1)
720-823: Duplication/conflict risk between schema bootstrap and migrations

This file recreates DBs/tables also defined in migrations (e.g., billing/verifications/ratelimits). Ensure only one path runs in CI/infra to avoid DDL conflicts.
Would you like a follow-up PR to remove the legacy bootstrap or gate it behind an environment flag?
Also applies to: 1-2
tools/migrate/package.json (1)
10-21: Runtime dependency should not be a devDependency

The migration script imports the ClickHouse client at runtime via tsx. Move it to dependencies to avoid failures in prod installs with NODE_ENV=production.
"dependencies": { + "@clickhouse/client": "^1.4.0", @@ - "devDependencies": { - "@clickhouse/client-web": "^1.11.1", + "devDependencies": {If you indeed need the web client, put that under dependencies instead and remove the Node client.
Also applies to: 22-29
deployment/docker-compose.yaml (2)
126-142: Healthcheck only proves server liveness, not schema readiness.

Consider validating a known table or schema version to avoid a race with tests depending on v2 objects.
Example:
```diff
-    test:
-      [
-        "CMD",
-        "clickhouse-client",
-        "--host",
-        "localhost",
-        "--user",
-        "default",
-        "--password",
-        "password",
-        "--query",
-        "SELECT 1",
-      ]
+    test: ["CMD-SHELL", "clickhouse-client --user default --password password --query \"SELECT count() FROM system.tables WHERE database = 'analytics' AND name = 'api_requests_v2'\" | grep -q '1'"]
```
146-155: MinIO console port mapping mismatch.

MINIO_CONSOLE_PORT_NUMBER=3903 but ports map 2903. Console likely unreachable.
```diff
-      - 2903:2903
+      - 3903:3903
```

go/pkg/clickhouse/ratelimits_test.go (1)
1-316: Define missing min/max helpers: The test calls min and max but no such functions are defined in the repo; add implementations (e.g. `func min(a, b int) int { … }`, `func max(a, b int) int { … }`) or import them from a utility package.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (3)
- go/go.sum is excluded by `!**/*.sum`
- go/pkg/clickhouse/migrations/atlas.sum is excluded by `!**/*.sum`
- pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (104)
.github/workflows/job_test_api_local.yaml
(0 hunks).github/workflows/job_test_unit.yaml
(0 hunks)Makefile
(0 hunks)deployment/Dockerfile.clickhouse
(1 hunks)deployment/docker-compose.yaml
(2 hunks)deployment/init-clickhouse.sh
(1 hunks)go/GO_DOCUMENTATION_GUIDELINES.md
(11 hunks)go/Makefile
(1 hunks)go/go.mod
(1 hunks)go/pkg/array/doc.go
(1 hunks)go/pkg/array/fill.go
(1 hunks)go/pkg/array/fill_test.go
(1 hunks)go/pkg/array/map.go
(1 hunks)go/pkg/array/map_test.go
(1 hunks)go/pkg/array/random.go
(1 hunks)go/pkg/array/random_test.go
(1 hunks)go/pkg/array/reduce.go
(1 hunks)go/pkg/array/reduce_test.go
(1 hunks)go/pkg/clickhouse/README.md
(1 hunks)go/pkg/clickhouse/key_verifications_test.go
(1 hunks)go/pkg/clickhouse/migrations/20250903085516_init.sql
(1 hunks)go/pkg/clickhouse/migrations/20250905101812.sql
(1 hunks)go/pkg/clickhouse/ratelimits_test.go
(1 hunks)go/pkg/clickhouse/schema/000_oldschema.sql
(1 hunks)go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/002_key_verifications_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/007_ratelimits_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/008_ratelimits_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/011_api_requests_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/012_api_requests_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/013_api_requests_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/015_active_workspaces_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/017_billable_verifications_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/databases/000_README.md
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/embed.go
(0 hunks)go/pkg/clickhouse/schema/types.go
(1 hunks)go/pkg/clickhouse/testutil_test.go
(1 hunks)internal/clickhouse/Dockerfile
(0 hunks)internal/clickhouse/package.json
(1 hunks)internal/clickhouse/src/testutil.ts
(2 hunks)tools/local/src/main.ts
(1 hunks)tools/migrate/ch_migrate_v2_simple.ts
(1 hunks)tools/migrate/package.json
(1 hunks)
💤 Files with no reviewable changes (59)
- .github/workflows/job_test_unit.yaml
- go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql
- go/pkg/clickhouse/schema/embed.go
- go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql
- go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql
- .github/workflows/job_test_api_local.yaml
- internal/clickhouse/Dockerfile
- go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql
- go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql
- go/pkg/clickhouse/schema/databases/000_README.md
- go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql
- go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql
- go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql
- Makefile
- go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql
- go/pkg/clickhouse/schema/databases/005_business/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql
- go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql
- go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/README.md
go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
go/pkg/clickhouse/migrations/20250903085516_init.sql
go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql
go/pkg/clickhouse/schema/000_oldschema.sql
go/pkg/clickhouse/migrations/20250905101812.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/README.md
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
🧬 Code graph analysis (7)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)KeyPrefix
(16-16)KeyAuthPrefix
(21-21)IdentityPrefix
(28-28)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/clickhouse/schema/types.go (1)
KeyVerificationV2
(6-18)go/pkg/array/random.go (1)
Random
(137-142)go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
tools/local/src/main.ts (2)
tools/local/src/db.ts (1)
prepareDatabase
(13-82)tools/local/src/cmd/api.ts (1)
bootstrapApi
(9-52)
go/pkg/array/fill_test.go (2)
go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/array/random.go (1)
Random
(137-142)
go/pkg/array/reduce_test.go (2)
go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
go/pkg/array/map_test.go (1)
go/pkg/array/map.go (1)
Map
(151-157)
go/pkg/array/random_test.go (2)
go/pkg/array/random.go (1)
Random
(137-142)go/pkg/array/fill.go (1)
Fill
(106-116)
go/pkg/clickhouse/ratelimits_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)RatelimitNamespacePrefix
(25-25)IdentityPrefix
(28-28)KeyPrefix
(16-16)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/array/random.go (1)
Random
(137-142)go/pkg/clickhouse/schema/types.go (1)
RatelimitV2
(23-31)go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
🪛 golangci-lint (2.2.2)
go/pkg/clickhouse/key_verifications_test.go
19-19: TestKeyVerifications should use t.Cleanup instead of defer
(tparallel)
go/pkg/array/fill_test.go
216-216: SA4010: this result of append is never used, except maybe in other appends
(staticcheck)
go/pkg/array/reduce_test.go
24-24: variable max has same name as predeclared identifier
(predeclared)
36-36: variable min has same name as predeclared identifier
(predeclared)
347-347: variable max has same name as predeclared identifier
(predeclared)
go/pkg/array/map_test.go
339-339: SA4010: this result of append is never used, except maybe in other appends
(staticcheck)
go/pkg/array/random.go
[high] 141-141: G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand)
(gosec)
go/pkg/clickhouse/ratelimits_test.go
18-18: TestRatelimits_ComprehensiveLoadTest should use t.Cleanup instead of defer
(tparallel)
🪛 LanguageTool
go/pkg/clickhouse/README.md
[grammar] ~1-~1: Use correct spacing
Context: # Changing schemas To change a schema, just edit a table in...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~3-~3: There might be a mistake here.
Context: ...just edit a table in the schema folder. Afterwards generate a migration file using atlas. ...
(QB_NEW_EN_OTHER)
[grammar] ~4-~4: There might be a mistake here.
Context: ...s generate a migration file using atlas. Make sure docker is running, so atlas ca...
(QB_NEW_EN)
[grammar] ~5-~5: Use commas correctly
Context: ...using atlas. Make sure docker is running, so atlas can use it to generate migrati...
(QB_NEW_EN_OTHER_ERROR_IDS_33)
[grammar] ~5-~5: Use correct spacing
Context: ...atlas can use it to generate migrations. bash cd go/pkg/clickhouse atlas migrate diff \ --dir "file://migrations" \ --to "file://schema" \ --dev-url "docker://clickhouse/latest"
(QB_NEW_EN_OTHER_ERROR_IDS_5)
go/GO_DOCUMENTATION_GUIDELINES.md
[grammar] ~393-~393: Use correct spacing
Context: ... - [TypeName]
for structs/interfaces - [ConstantName]
for constants ## Best Practices and Ant...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
🪛 markdownlint-cli2 (0.17.2)
go/pkg/clickhouse/README.md
10-10: Hard tabs
Column: 1
(MD010, no-hard-tabs)
11-11: Hard tabs
Column: 1
(MD010, no-hard-tabs)
12-12: Hard tabs
Column: 1
(MD010, no-hard-tabs)
🪛 Checkov (3.2.334)
deployment/Dockerfile.clickhouse
[LOW] 1-8: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[LOW] 1-8: Ensure that a user for the container has been created
(CKV_DOCKER_3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
🔇 Additional comments (15)
go/pkg/clickhouse/testutil_test.go (1)
1-6: Test-only helpers are scoped correctly

Using package clickhouse_test and keeping helpers unexported is appropriate.
tools/local/src/main.ts (1)
51-55: Verify ClickHouse readiness in startContainers

I didn't find any health-check or wait for ClickHouse startup inside startContainers; please confirm it blocks until CH's HTTP port accepts queries. If not, add an explicit readiness step (e.g. run `clickhouse-client --query "SELECT 1"`) after each `await startContainers([...])`.

go/pkg/array/reduce.go (2)
193-199: LGTM: minimal, correct implementation.

Performs a left fold with O(n)/O(1) as advertised.
137-150: Fix non-compiling example in docs.

The "Maximum" example references an undefined max function and may misuse math.MinInt depending on Go version. Suggest an inlined comparator.
```diff
-// max := array.Reduce(numbers, func(a, b int) int { max(a, b) }, math.MinInt) // Maximum
+// max := array.Reduce(numbers, func(a, b int) int {
+//   if a > b { return a }
+//   return b
+// }, math.MinInt)
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240 Timestamp: 2025-08-19T08:53:43.952Z Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/array/fill.go (1)
111-115: Compile error: cannot range over int.

`for i := range length` doesn't compile. Use a counted loop.

```diff
-	for i := range length {
-		slice[i] = generator()
-	}
+	for i := 0; i < length; i++ {
+		slice[i] = generator()
+	}
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql (1)
12-23: Invalid suggestion: ratelimits_per_month_v2 only defines a time (DateTime) column, not separate year or month fields, so using toYear(time)/toMonth(time) is correct; the passed column also exists.

Likely an incorrect or invalid review comment.
go/pkg/array/map_test.go (1)
11-200: Solid coverage and properties.

Good breadth: types, immutability, per-element invocation, order, large slices.
go/pkg/array/random_test.go (2)
31-48: Flake note acknowledged; OK as-is.

Probability of all-equal draws is negligible; acceptable for unit scope.
119-121: Use the standard b.N loop; testing.B has no Loop().

This won't compile; replace b.Loop() with a for over b.N.
```diff
-	for b.Loop() {
+	for i := 0; i < b.N; i++ {
 		Random(slice)
 	}
```

Also applies to: 131-133
⛔ Skipped due to learnings
Learnt from: imeyer PR: unkeyed/unkey#3733 File: go/pkg/repeat/every_test.go:383-383 Timestamp: 2025-08-05T15:19:50.563Z Learning: Go 1.24 introduced testing.B.Loop method which simplifies benchmark functions by handling iteration internally, ensuring setup/cleanup runs only once per benchmark count, and preventing compiler optimizations that could affect benchmark accuracy.
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/fill_test.go (1)
1-114: Confirm Fill implementation loop.

Relevant snippet shows `for i := range length`, which is invalid over an int. Ensure Fill uses `for i := 0; i < length; i++ { ... }`.

go/pkg/array/reduce_test.go (1)
272-282: No missing intPtr helper

The helper intPtr(int) *int is defined in map_test.go within the same package array, so reduce_test.go compiles as is.

go/pkg/clickhouse/schema/types.go (2)
55-66: Confirm Time type matches table column type.

Aggregated tables typically use DateTime/DateTime64, but here Time is int64. If you select time directly from CH, scanning into int64 will fail unless you cast (toUnixTimestamp) in queries. Either switch to time.Time or standardize on selecting epoch.

Would you like me to change these fields to time.Time and update queries accordingly?
81-89: Same Time type concern for API requests aggregates.

Align with the chosen convention (either time.Time fields or always casting in SQL).

go/pkg/clickhouse/ratelimits_test.go (1)
41-43: Undefined min/max will not compile unless provided elsewhere.

Either import a helper or implement local minInt/maxInt.
```diff
-	numNamespaces := max(1, min(numRecords/1000, 100))   // 100-1000 namespaces
-	numIdentifiers := max(1, min(numRecords/100, 10000)) // 1000-10000 identifiers
+	numNamespaces := maxInt(1, minInt(numRecords/1000, 100))   // 1-100
+	numIdentifiers := maxInt(1, minInt(numRecords/100, 10000)) // 1-10000
```

Add helpers:
```go
func minInt(a, b int) int { if a < b { return a }; return b }
func maxInt(a, b int) int { if a > b { return a }; return b }
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240 Timestamp: 2025-08-19T08:53:43.952Z Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/clickhouse/migrations/20250905101812.sql (1)
281-285: Temp sync MVs depend on v1 sources; creation will fail if sources are absent; also no backfill.

Ensure verifications.raw_key_verifications_v1, metrics.raw_api_requests_v1, ratelimits.raw_ratelimits_v1 exist before creating these MVs, or gate creation. Without POPULATE/backfill, historical data won't migrate.
Proposed approach:
- Create databases/tables if needed, or wrap in deployment steps that guarantee presence.
- Backfill with one-off INSERT INTO ... SELECT ... for historical windows, then enable MVs for tail data.
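A minimal sketch of such a windowed backfill for one of the tables, assuming the v1 and v2 raw tables expose the listed columns and that time is a ms-epoch Int64 (the column list and query-parameter names are illustrative, not taken from the actual DDL):

```sql
-- Hypothetical 6-hour backfill window; re-run with advancing bounds until the
-- MV cutover timestamp is reached, then let the temp_sync_* views cover new inserts.
INSERT INTO key_verifications_raw_v2
SELECT
    request_id,
    time,
    workspace_id,
    key_space_id,
    key_id,
    outcome,
    identity_id,
    tags
FROM verifications.raw_key_verifications_v1
WHERE time >= {window_start:Int64}   -- inclusive lower bound, ms epoch
  AND time <  {window_end:Int64};    -- exclusive upper bound, ms epoch
```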
Review continued from previous batch...
go/pkg/clickhouse/schema/015_active_workspaces_per_month_v2.sql
Review continued from previous batch...
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
ORDER BY (workspace_id, time, key_space_id, outcome)
TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 1 MONTH DELETE
Should this not be INTERVAL 3 MONTH DELETE?
Our retention is 90 days, unless I am dumb. Which is very possible and I don't understand this as much as I thought.
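For reference, a sketch of the 90-day variant, assuming the excerpt above belongs to the key_verifications raw table (the table name is an assumption) and that time stays a ms-epoch Int64:

```sql
-- Hypothetical: align the raw-table retention with the 90-day policy discussed here.
ALTER TABLE key_verifications_raw_v2
    MODIFY TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 3 MONTH DELETE;
```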
I left a single comment about a time frame.
Outside of that it looks good.
too-long-tried-to-read but i trust you know what you're doing here 😄
Mmm @perkinsjr's comment will need addressing.. but I still approve.
avgMergeState(latency_avg) as latency_avg,
quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
toStartOfDay(time) AS time
Wouldn't we be better off using toDate(fromUnixTimestamp64Milli(time)) here, since the minutes are not relevant to us at all? Applies to all the per_day schemas.
time here is a DateTime, not an int64. Why would this be better?
We can switch it to Date then, which uses 2 bytes less. It's a micro-optimization, but still: since it's per day, we don't need to store 00:00:00 for each one.
yeah let me give it a shot, I think we'd need to do toDate(toStartOfDay(time)) as time
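A rough sketch of what that could look like in one of the per_day definitions, assuming the per-hour source keeps a DateTime time column (table and column names follow the v2 naming used in this PR but are illustrative); note that toDate() on a DateTime already truncates to the day, so the inner toStartOfDay() is mostly for clarity:

```sql
-- Hypothetical per-day column type in the aggregate table:
--   time Date,
-- and the matching bucket expression in the per-day materialized view:
SELECT
    toDate(toStartOfDay(time)) AS time,  -- DateTime -> start of day -> 2-byte Date
    workspace_id,
    sum(count) AS count
FROM key_verifications_per_hour_v2
GROUP BY
    toDate(toStartOfDay(time)),
    workspace_id;
```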
The only thing I really noticed is that we are missing ratelimits_last_used_v1 in the new schema; do we not need this MV anymore?
Graphite Automations: "Notify author when CI fails" took an action on this PR • (09/11/25). 1 teammate was notified about this PR based on Andreas Thomas's automation.
Actionable comments posted: 16
♻️ Duplicate comments (18)
go/pkg/array/random.go (1)
3-5: Doc/behavior mismatch + gosec G404: seed global PRNG and justify non-crypto use.

Still unseeded and will trip G404 in many pipelines. Seed once and add scoped nolints.

```diff
 import (
-	"math/rand"
+	"math/rand"
+	"time"
 )
+func init() {
+	// Seed global PRNG once for non-cryptographic randomness.
+	//nolint:gosec // G404: math/rand is intentional for non-crypto selection.
+	rand.Seed(time.Now().UnixNano())
+}
@@
-	return slice[rand.Intn(len(slice))]
+	return slice[rand.Intn(len(slice))] //nolint:gosec // G404: intentional non-crypto randomness.
```

Also applies to: 23-28
go/GO_DOCUMENTATION_GUIDELINES.md (1)
392-395: Indentation consistency in cross-reference bullets.

Normalize spacing to match surrounding bullets.

```diff
 10. **Cross-references** using Go's `[Reference]` format:
-  - `[OtherFunc]` for functions
-  - `[TypeName]` for structs/interfaces
-  - `[ConstantName]` for constants
+  - `[OtherFunc]` for functions
+  - `[TypeName]` for structs/interfaces
+  - `[ConstantName]` for constants
```

go/pkg/clickhouse/key_verifications_test.go (7)
25-27: Prefer t.Cleanup over defer for connection teardown.

Ensures proper ordering with subtests and parallel runs.

```diff
-	defer conn.Close()
+	t.Cleanup(func() { require.NoError(t, conn.Close()) })
```
35-35: Gate 1M-row test under -short/CI.

Prevents slow/flaky CI.

```diff
-	numRecords := 1_000_000
+	if testing.Short() {
+		t.Skip("skipping large load test in -short mode")
+	}
+	numRecords := 1_000_000
```
66-69: Fix misleading latency comment.

Distribution is heavy-tailed; values can exceed 100ms.

```diff
-	latency := rand.ExpFloat64()*50 + 1 // 1-100ms base range
+	latency := rand.ExpFloat64()*50 + 1 // ~Exp(mean=50ms); typically <100ms but heavy-tailed
```
113-123: Add per-query timeouts and avoid shared err in parallel subtests.

Use a fresh err variable and a short context inside Eventually to prevent hangs and data races.
- require.EventuallyWithT(t, func(c *assert.CollectT) { - queried := int64(0) - err = conn.QueryRow(ctx, "...").Scan(&queried) - require.NoError(c, err) + require.EventuallyWithT(t, func(c *assert.CollectT) { + ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second) + defer cancel() + queried := int64(0) + qErr := conn.QueryRow(ctxQ, "...").Scan(&queried) + require.NoError(c, qErr) }, time.Minute, time.Second)Also applies to: 167-179, 198-209, 231-240, 255-270, 281-289, 302-310, 327-335
112-124: Capture loop variables in subtests.

Prevent flakiness under parallelization.
```diff
-	for _, table := range []string{...} {
-		t.Run(table, func(t *testing.T) {
+	for _, table := range []string{...} {
+		table := table
+		t.Run(table, func(t *testing.T) {
```

Apply similarly for keyID and identityID closures:
```diff
-	for _, keyID := range keys[:10] {
+	for _, keyID := range keys[:10] {
+		keyID := keyID
 ...
-	for _, identityID := range identities[:10] {
+	for _, identityID := range identities[:10] {
+		identityID := identityID
```

Also applies to: 166-173, 197-204, 230-236, 254-262, 279-286, 301-306, 326-331
258-263: Scan quantiles into float64.

The driver returns Float64; avoid precision loss/scan errors.
- var ( - queriedAvg float64 - queriedP75 float32 - queriedP99 float32 - ) - err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99) + var ( + queriedAvg float64 + queriedP75 float64 + queriedP99 float64 + ) + err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99)
341-356: Make billing assertion resilient to MV lag.

Poll until MV catches up to avoid flakes.
```diff
-	var queried int64
-	err = conn.QueryRow(ctx, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
-	require.NoError(t, err)
-	require.Equal(t, billableVerifications, queried)
+	require.EventuallyWithT(t, func(c *assert.CollectT) {
+		ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second)
+		defer cancel()
+		var queried int64
+		qErr := conn.QueryRow(ctxQ, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
+		require.NoError(c, qErr)
+		require.Equal(c, billableVerifications, queried)
+	}, time.Minute, time.Second)
```

go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (3)
35-35: Normalize to UTC before day bucketing.

Avoid server-TZ drift.
```diff
-  toDate(toStartOfDay(time)) AS time
+  toDate(toStartOfDay(toTimeZone(time, 'UTC'))) AS time
```
10-10: Counts should be unsigned.

Keep count non-negative and consistent across v2 layers.
```diff
-  count SimpleAggregateFunction(sum, Int64),
+  count SimpleAggregateFunction(sum, UInt64),
```
5-5: Invalid/undesired type for HTTP status; use UInt16.

ClickHouse has no bare Int type; statuses are non-negative.
```diff
-  response_status Int,
+  response_status UInt16,
```

go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (2)
19-36: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Current SELECT order doesn't match table, MergeState uses params, and GROUP BY uses alias.
-CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS -SELECT - workspace_id, - namespace_id, - identifier, - sum(total) as total, - sum(passed) as passed, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfDay(time)) AS time -FROM - ratelimits_per_hour_v2 -GROUP BY - workspace_id, - namespace_id, - time, - identifier; +CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS +SELECT + toDate(toStartOfDay(time)) AS time, + workspace_id, + namespace_id, + identifier, + sum(passed) AS passed, + sum(total) AS total, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM ratelimits_per_hour_v2 +GROUP BY + toDate(toStartOfDay(time)), + workspace_id, + namespace_id, + identifier;
17-17: TTL style consistency.

Align with toIntervalDay.
```diff
-TTL time + INTERVAL 100 DAY DELETE;
+TTL time + toIntervalDay(100) DELETE;
```

go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1)
19-36: Fix MV: column order, MergeState params, and GROUP BY alias.

Bring SELECT into table order, remove TDigest params, and group by expression.
```diff
-CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
-SELECT
-  workspace_id,
-  namespace_id,
-  identifier,
-  sum(total) as total,
-  sum(passed) as passed,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfMonth(time)) AS time
-FROM
-  ratelimits_per_day_v2
-GROUP BY
-  workspace_id,
-  namespace_id,
-  time,
-  identifier;
+CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
+SELECT
+  toDate(toStartOfMonth(time)) AS time,
+  workspace_id,
+  namespace_id,
+  identifier,
+  sum(passed) AS passed,
+  sum(total) AS total,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM ratelimits_per_day_v2
+GROUP BY
+  toDate(toStartOfMonth(time)),
+  workspace_id,
+  namespace_id,
+  identifier;
```

go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (2)
33-57: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Ensure positional match with table schema; remove TDigest params; group by expression.
-CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS -SELECT - workspace_id, - key_space_id, - identity_id, - key_id, - outcome, - tags, - sum(count) as count, - sum(spent_credits) as spent_credits, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfDay(time)) AS time -FROM - key_verifications_per_hour_v2 -GROUP BY - workspace_id, - time, - key_space_id, - identity_id, - key_id, - outcome, - tags -; +CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS +SELECT + toDate(toStartOfDay(time)) AS time, + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags, + sum(count) AS count, + sum(spent_credits) AS spent_credits, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM key_verifications_per_hour_v2 +GROUP BY + toDate(toStartOfDay(time)), + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags;
18-19: Consider monthly partitions for daily grain.

Monthly partitions reduce part churn with 100-day TTL.
```diff
-PARTITION BY
-  toYYYYMMDD (time)
+PARTITION BY
+  toYYYYMM (time)
```

go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1)
33-57: Fix MV: positional order, TDigest MergeState params, and GROUP BY alias.

Reorder to match table, remove params, group by expression.
-CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS -SELECT - workspace_id, - key_space_id, - identity_id, - key_id, - outcome, - tags, - sum(count) as count, - sum(spent_credits) as spent_credits, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfMonth(time)) AS time -FROM - key_verifications_per_day_v2 -GROUP BY - workspace_id, - time, - key_space_id, - identity_id, - key_id, - outcome, - tags -; +CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS +SELECT + toDate(toStartOfMonth(time)) AS time, + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags, + sum(count) AS count, + sum(spent_credits) AS spent_credits, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM key_verifications_per_day_v2 +GROUP BY + toDate(toStartOfMonth(time)), + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (17)
- go/GO_DOCUMENTATION_GUIDELINES.md (12 hunks)
- go/pkg/array/doc.go (1 hunks)
- go/pkg/array/fill.go (1 hunks)
- go/pkg/array/map.go (1 hunks)
- go/pkg/array/random.go (1 hunks)
- go/pkg/array/reduce.go (1 hunks)
- go/pkg/clickhouse/key_verifications_test.go (1 hunks)
- go/pkg/clickhouse/schema/000_legacy.sql (1 hunks)
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/016_api_requests_per_minute_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/schema/000_legacy.sql
📚 Learning: 2025-08-14T16:25:48.167Z
Learnt from: Flo4604
PR: unkeyed/unkey#3785
File: go/apps/api/routes/v2_keys_reroll_key/401_test.go:52-61
Timestamp: 2025-08-14T16:25:48.167Z
Learning: User Flo4604 requested creation of a GitHub issue to track converting all test files to use table-driven test patterns as a broader codebase improvement, following the suggestion made during review of go/apps/api/routes/v2_keys_reroll_key/401_test.go.
Applied to files:
go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-07-16T15:38:53.491Z
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.
Applied to files:
go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/schema/000_legacy.sql
🧬 Code graph analysis (1)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)KeyPrefix
(16-16)KeyAuthPrefix
(21-21)IdentityPrefix
(28-28)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(23-33)go/pkg/clickhouse/schema/types.go (1)
KeyVerificationV2
(6-18)go/pkg/array/random.go (1)
Random
(23-28)go/pkg/array/reduce.go (1)
Reduce
(28-34)go/pkg/array/map.go (1)
Map
(23-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
go/pkg/array/reduce.go (1)
3-27: Docs/readability: Solid and actionable examples.

Clear API intent, parameters, and examples. No changes needed.

go/pkg/array/map.go (1)
24-29: LGTM.

Preallocation + index-only range is idiomatic and efficient.

go/pkg/array/fill.go (1)
29-31: Build-breaking bug: invalid range over int.

`for i := range length` does not compile. Use a counted loop.

```diff
-	for i := range length {
+	for i := 0; i < length; i++ {
 		slice[i] = generator()
 	}
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/doc.go (1)
1-41: Package docs are concise and accurate.

Cross-refs resolve; concurrency notes are appropriate.

go/pkg/clickhouse/schema/000_legacy.sql (1)
49-68: Confirm MV→table column order for legacy v1 objects.

Many MVs (e.g., this block) select columns in a different order than the target table. With TO, ClickHouse inserts by position. Verify this matches production definitions or adjust SELECT to table order to avoid column shifts.
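One way to check this without reading each DDL by hand is to compare the target table's column order in system.columns against the SELECT the MV actually runs; a rough sketch, with database and object names as placeholders:

```sql
-- Hypothetical check: column order of the target table...
SELECT position, name, type
FROM system.columns
WHERE database = 'verifications'
  AND table = 'key_verifications_per_hour_v1'
ORDER BY position;

-- ...and the stored definition of the MV feeding it, to compare column order.
SELECT create_table_query
FROM system.tables
WHERE database = 'verifications'
  AND name = 'key_verifications_per_hour_mv_v1';
```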
go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Retention: confirm TTL policy for billing data.
Monthly billables typically require long retention. If intended, add an explicit TTL or document why this table is indefinite.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql around
lines 7 to 11, the table DDL currently has no TTL/retention specified so billing
data will be kept indefinitely; either add an explicit TTL clause (e.g., TTL
toDate(year, month, 1) + INTERVAL X MONTH or TTL <timestamp_column> + INTERVAL X
MONTH) to enforce the desired retention period, or add a comment above the DDL
documenting that indefinite retention is intentional and why (including expected
data size and compliance requirements). Ensure the chosen TTL uses the
appropriate date column and interval and that the change is approved by the
billing/PD team.
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
…into clickhouse-schemas
Actionable comments posted: 21
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
- go/pkg/clickhouse/migrations/atlas.sum is excluded by `!**/*.sum`
- pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (10)
- go/pkg/clickhouse/migrations/20250911070454.sql (1 hunks)
- go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/migrations/20250911070454.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Packages / Test
🔇 Additional comments (10)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)
23-23: Confirm source time unit alignment (ms vs s).

If ratelimits.raw_ratelimits_v1.time is ms-epoch, keeping Int64 is fine; if it's DateTime/DateTime64, align the type accordingly.

go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)
12-23: Verify source column passed exists and semantics.

Confirm ratelimits_per_month_v2 exposes a numeric passed compatible with sum(passed) and that summing across its dimensions matches billing rules.

go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (2)
13-26: Confirm source fields and filter semantics.

Ensure key_verifications_per_month_v2 has outcome and count with expected types; validate that billing should include only outcome = 'VALID' (case, enum). If outcome is enum, compare by enum value to avoid string drift.
23-26: Group by columns must match SELECT after schema change.

If you adopt time Date, adjust the MV to group by time instead of (year, month).

go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)
27-36: Time bucketing from ms epoch: confirm unit.

fromUnixTimestamp64Milli(time) assumes ms-epoch in api_requests_raw_v2.time. Validate source type.

go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1)
27-35: Idempotent rollup path looks good.

Hourly MV groups correctly by all dims and buckets via toStartOfHour(time).

go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1)
27-44: Daily aggregation is consistent.

Correct bucketing to day and grouping over all dims.
go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1)
27-35: Monthly rollup LGTM.

Buckets to month start and groups over all dims.
go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1)
33-55: Incorrect: no duplicate MV found; only defined in go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql

rg output shows the CREATE TABLE api_requests_raw_v2 and CREATE MATERIALIZED VIEW temp_sync_metrics_v1_to_v2 only in that file; the duplicate alleged in 20250911070454.sql was not found. Use IF NOT EXISTS and fully-qualified names only if you intentionally add a second CREATE.
Likely an incorrect or invalid review comment.
go/pkg/clickhouse/migrations/20250911070454.sql (1)
309-311
: Backfill and decommission plan for temp sync MVs.Temp sync views target only new inserts (no POPULATE). Ensure backfill jobs cover historical data, and add explicit DROP steps in the migration plan once cutover is done to avoid silent double-writes.
Would you like a one-off backfill SQL for ratelimits_last_used_v2 that unions v1/v2 raw tables and computes MAX(time) per (workspace_id, namespace_id, identifier)?
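A minimal sketch of such a backfill, assuming the v1 raw table lives in the ratelimits database as referenced above, the v2 raw table is default.ratelimits_raw_v2, and the target's time column accepts plain values (Int64 or SimpleAggregateFunction(max, Int64)):

-- One-off backfill sketch; table names and the ms-epoch Int64 time column are assumptions from this review.
-- Computes the latest use per (workspace_id, namespace_id, identifier) across v1 and v2 raw data;
-- AggregatingMergeTree merges then keep only the maximum per key.
INSERT INTO default.ratelimits_last_used_v2 (workspace_id, namespace_id, identifier, time)
SELECT
    workspace_id,
    namespace_id,
    identifier,
    max(time) AS time
FROM
(
    SELECT workspace_id, namespace_id, identifier, time FROM ratelimits.raw_ratelimits_v1
    UNION ALL
    SELECT workspace_id, namespace_id, identifier, time FROM default.ratelimits_raw_v2
)
GROUP BY workspace_id, namespace_id, identifier;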
-- Create "api_requests_raw_v2" table | ||
CREATE TABLE `default`.`api_requests_raw_v2` ( | ||
`request_id` String, | ||
`time` Int64 CODEC(Delta(8), LZ4), | ||
`workspace_id` String, | ||
`host` String, | ||
`method` LowCardinality(String), | ||
`path` String, | ||
`request_headers` Array(String), | ||
`request_body` String, | ||
`response_status` Int32, | ||
`response_headers` Array(String), | ||
`response_body` String, | ||
`error` String, | ||
`service_latency` Int64, | ||
`user_agent` String, | ||
`ip_address` String, | ||
`region` LowCardinality(String), | ||
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1 | ||
) ENGINE = MergeTree | ||
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000; |
🛠️ Refactor suggestion
Unify raw-table shapes across domains.
All three raw_v2 tables should align on: PARTITION BY day, time CODEC(Delta(8), LZ4), index_granularity, and dedup settings. Today only api_requests_raw_v2 sets index_granularity; none set PARTITION BY.
I can generate a consolidated patch if you confirm daily partitioning is desired for all raw tables.
Also applies to: 176-194, 259-271
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63-83 (and
similarly for blocks at 176-194 and 259-271), the raw_v2 tables are not using a
unified table shape: add a daily PARTITION BY expression (partition by day
derived from the time column), ensure the time column keeps CODEC(Delta(8),
LZ4), and make the table SETTINGS consistent across all raw_v2 tables by setting
the same index_granularity and non_replicated_deduplication_window values; apply
the same primary key / ORDER BY ordering and TTL pattern to match the other
raw_v2 tables so all three share identical partitioning, codec,
index_granularity, and dedup settings.
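To make the ask concrete, a sketch of the shared shape under the assumption that daily partitioning is wanted; the table name example_raw_v2 and the trimmed column list are placeholders, the real tables keep their full column sets from this PR:

-- Sketch only: the same partitioning, time codec, TTL and settings applied to every *_raw_v2 table.
CREATE TABLE IF NOT EXISTS default.example_raw_v2 (
    request_id   String,
    time         Int64 CODEC(Delta(8), LZ4),
    workspace_id String,
    INDEX idx_request_id (request_id) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
ORDER BY (workspace_id, time, request_id)
TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1)
SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;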
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
Raw table missing PARTITION BY; will create a single huge partition.
api_requests_raw_v2 here has no PARTITION BY, unlike the schema file. Add daily partitioning for manageability and TTL efficiency.
) ENGINE = MergeTree
+PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
Apply the same fix to key_verifications_raw_v2 (Line 193-194 block) and ratelimits_raw_v2 (Line 270-271 block).
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 64-83, the
CREATE TABLE for api_requests_raw_v2 lacks a PARTITION BY clause which will
create one huge partition; add a daily PARTITION BY expression based on the time
column (e.g., partition by the day derived from the Int64 time/timestamp) so
partitions are created per day and TTLs work efficiently; update the CREATE
TABLE definitions for key_verifications_raw_v2 (around lines 193-194) and
ratelimits_raw_v2 (around lines 270-271) the same way to use daily partitioning
derived from their time columns.
-- Create "ratelimits_last_used_v2" table | ||
CREATE TABLE `default`.`ratelimits_last_used_v2` ( | ||
`time` Int64, | ||
`workspace_id` String, | ||
`namespace_id` String, | ||
`identifier` String | ||
) ENGINE = AggregatingMergeTree | ||
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192; |
ratelimits_last_used_v2 is incorrectly modeled; maxSimpleState() is invalid for the declared column type.
With AggregatingMergeTree, either store AggregateFunction(max, Int64) with maxState(), or use SimpleAggregateFunction(max, Int64) and insert raw values. Current DDL (Int64 + maxSimpleState) won’t merge correctly.
Use SimpleAggregateFunction variant:
-CREATE TABLE `default`.`ratelimits_last_used_v2` (
- `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+ `time` SimpleAggregateFunction(max, Int64),
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
@@
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT workspace_id, namespace_id, identifier, time
+FROM ratelimits.raw_ratelimits_v1;
Option B (AggregateFunction) if preferred:
- `time` Int64,
+ `time` AggregateFunction(max, Int64),
@@
-CREATE MATERIALIZED VIEW ... AS SELECT ..., maxSimpleState(time) AS time ...
+CREATE MATERIALIZED VIEW ... AS SELECT ..., maxState(time) AS time ...
Also applies to: 296-297
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195 to 202 (also
applies to lines 296-297), the table is declared with `time Int64` but the
AggregatingMergeTree is using maxSimpleState() which is invalid for a plain
Int64 column; fix by changing the column definition to a matching aggregate
type: either make the column SimpleAggregateFunction(max, Int64) and continue
inserting raw Int64 values, or change it to AggregateFunction(max, Int64) and
ensure you insert maxState() values (and adjust any insert/upsert SQL
accordingly); update the DDL to use one of these two consistent patterns and
verify the corresponding insert logic in the migration is updated to match.
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2;
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2;
🧹 Nitpick (assertive)
ReplacingMergeTree without version may accumulate duplicates until merges.
active_workspaces_* MVs insert potentially duplicate PK rows. ReplacingMergeTree will collapse them eventually, but if write volume is high, consider AggregatingMergeTree with SimpleAggregateFunction(anyLast, Date) or add a version column.
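A sketch of the version-column variant, reusing the table name from this PR only for illustration; the inserted_at column is hypothetical and would also have to be emitted by the feeding MVs:

-- Sketch: ReplacingMergeTree keyed by a version column, so the newest row wins during merges
-- and hot-path queries do not need FINAL once parts are merged.
CREATE TABLE IF NOT EXISTS default.active_workspaces_per_month_v2 (
    time         Date,
    workspace_id String,
    inserted_at  DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(inserted_at)
ORDER BY (time, workspace_id);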
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`billable_ratelimits_per_month_mv_v2` TO `default`.`billable_ratelimits_per_month_v2` AS SELECT workspace_id, sum(passed) AS count, toYear(time) AS year, toMonth(time) AS month FROM default.ratelimits_per_month_v2 GROUP BY workspace_id, year, month;
🧹 Nitpick (assertive)
Make migration idempotent.
Consider IF NOT EXISTS on MVs to survive partial runs/retries, especially while temp sync views exist in both schema and migrations.
-CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` ...
+CREATE MATERIALIZED VIEW IF NOT EXISTS `default`.`api_requests_per_minute_mv_v2` ...
Apply similarly to other CREATE MATERIALIZED VIEW statements.
Also applies to: 287-295, 299-305
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 281 to 285 (and
likewise for the other CREATE MATERIALIZED VIEW ranges at 287-295 and 299-305),
the CREATE MATERIALIZED VIEW statements are not idempotent and will fail on
partial runs/retries; update each CREATE MATERIALIZED VIEW to include IF NOT
EXISTS so the migration can safely be re-run when temp/partial views already
exist, and apply the same change to all other CREATE MATERIALIZED VIEW
statements mentioned in the comment.
workspace_id,
toDate (time) as time
🧹 Nitpick (assertive)
Nit: normalize spacing. toDate (time) → toDate(time).
- toDate (time) as time
+ toDate(time) as time
Also applies to: 18-19
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql around lines
11-12 (and also at lines 18-19), normalize the spacing in the function call by
changing occurrences of "toDate (time)" to "toDate(time)"; update those two
places so the function name and its parentheses have no space between them.
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Align monthly schema with other v2 tables (use time Date).
Other monthly tables (e.g., api_requests_per_month_v2) model month as time Date (month start). Using (year, month) here introduces inconsistency. Prefer time Date for query ergonomics and uniformity.
-CREATE TABLE billable_ratelimits_per_month_v2 (
- year Int16,
- month Int8,
- workspace_id String,
- count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
- (year, month)
-ORDER BY
- (workspace_id, year, month);
+CREATE TABLE billable_ratelimits_per_month_v2 (
+ time Date,
+ workspace_id String,
+ count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 1-10, the table models month as separate year Int16 and month Int8 columns
which is inconsistent with other v2 monthly tables; change the schema to use a
single time Date column representing the month start (drop year and month
columns), keep count Int64, and update ENGINE/PARTITION BY/ORDER BY to partition
by time and ORDER BY (workspace_id, time) to match the other monthly v2 tables
for consistent querying and ergonomics.
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Ordering to match partition for better pruning.
Given PARTITION BY (year, month), ordering as (year, month, workspace_id) aids range scans by month.
-ORDER BY
- (workspace_id, year, month);
+ORDER BY
+ (year, month, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 9 to 10, the ORDER BY currently uses (workspace_id, year, month) which
doesn't match the PARTITION BY (year, month); change the ORDER BY to (year,
month, workspace_id) so the ordering aligns with the partitioning and improves
range pruning for month-based queries.
CREATE MATERIALIZED VIEW billable_ratelimits_per_month_mv_v2 TO billable_ratelimits_per_month_v2 AS
SELECT
workspace_id,
sum(passed) AS count,
toYear (time) AS year,
toMonth (time) AS month
FROM
ratelimits_per_month_v2
GROUP BY
workspace_id,
year,
month;
🧹 Nitpick (assertive)
Minor: trim extra spaces in function calls. toYear (time) → toYear(time), toMonth (time) → toMonth(time).
- toYear (time) AS year,
- toMonth (time) AS month
+ toYear(time) AS year,
+ toMonth(time) AS month
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 12 to 23, there are extra spaces inside the function calls `toYear (time)`
and `toMonth (time)`; remove the space so they read `toYear(time)` and
`toMonth(time)` to follow consistent formatting and avoid style issues.
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Unify month key shape with v2 (prefer time Date).
Mirror 016_api_requests_per_month_v2 for consistency: time Date (month start) instead of (year, month).
-CREATE TABLE billable_verifications_per_month_v2 (
- year Int16,
- month Int8,
- workspace_id String,
- count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
- (year, month)
-ORDER BY
- (workspace_id, year, month);
+CREATE TABLE billable_verifications_per_month_v2 (
+ time Date,
+ workspace_id String,
+ count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, replace the separate year and month columns with a single time
Date column (representing the month start) to match v2 conventions; update the
PARTITION BY to use the time month key (e.g. toYYYYMM(time) or the same
partition expression used in 016_api_requests_per_month_v2) and change ORDER BY
to (workspace_id, time) so the table schema and materialized view shape mirror
016_api_requests_per_month_v2.
Actionable comments posted: 24
♻️ Duplicate comments (3)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (3)
15-23: MV target and state function are now consistent; good, keep it. With the above table change, maxSimpleState(time) matches the column type. No DB-qualification mismatch anymore.
1-11: Column order (dims first) improves locality. Optional: move time after the keys in ORDER BY/table for readability; not functionally required.
- (time SimpleAggregateFunction(max, Int64), workspace_id, namespace_id, identifier)
+ (workspace_id String, namespace_id String, identifier String, time SimpleAggregateFunction(max, Int64))
1-10: AggregatingMergeTree requires aggregate states; fix the time type. The table defines time Int64 but the MV inserts maxSimpleState(time) (an aggregate state). Use SimpleAggregateFunction(max, Int64) (or switch the MV to maxState and use AggregateFunction).
CREATE TABLE IF NOT EXISTS ratelimits_last_used_v2 (
-   time Int64,
+   time SimpleAggregateFunction(max, Int64),
    workspace_id String,
    namespace_id String,
    identifier String,
) ENGINE = AggregatingMergeTree()
ORDER BY (workspace_id, namespace_id, identifier)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
go/pkg/clickhouse/migrations/atlas.sum
is excluded by!**/*.sum
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (10)
go/pkg/clickhouse/migrations/20250911070454.sql
(1 hunks)go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql
(1 hunks)go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test API / API Test Local
🔇 Additional comments (5)
go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (2)
25-31: Dedup setting on non-replicated MergeTree requires tokens. non_replicated_deduplication_window only dedups when inserts carry tokens. Verify your writers (and this MV) set insert_deduplication_token; otherwise this setting has no effect (see the sketch after this list).
33-56: Migration MV: add explicit column order and note the drop plan. Column order matches now; good. Ensure an explicit DROP of this temp MV is included in the migration plan once writers switch to v2.
go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)
12-23: Optional: add a billable filter parity check. If "billable" should only include passed ratelimits, sum(passed) is fine. Verify the source schema so passed is numeric (not a state) and covers all billable outcomes.
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)
15-28: Answer Flo's concern: is the v1 "last used" table still required? You add v2; confirm whether any readers still depend on ratelimits_last_used_v1 and whether a temporary MV should backfill/keep it hot until cutover.
go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)
1-26: TTL and partition look sane for minute granularity. A 7-day TTL with daily partitions is reasonable.
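On the deduplication-token point above, a sketch of a writer-side insert that actually carries a token; the column subset and values are illustrative, and omitted columns fall back to their defaults:

-- Retrying the same batch with the same token inside the dedup window is dropped as a duplicate.
INSERT INTO default.api_requests_raw_v2 (request_id, time, workspace_id, host, method, path, response_status)
SETTINGS insert_deduplication_token = 'batch-2025-09-11-000001'
VALUES ('req_123', 1757574000000, 'ws_123', 'api.example.com', 'POST', '/v1/keys.verifyKey', 200);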
-- Create "api_requests_raw_v2" table | ||
CREATE TABLE `default`.`api_requests_raw_v2` ( | ||
`request_id` String, | ||
`time` Int64 CODEC(Delta(8), LZ4), | ||
`workspace_id` String, | ||
`host` String, | ||
`method` LowCardinality(String), | ||
`path` String, | ||
`request_headers` Array(String), | ||
`request_body` String, | ||
`response_status` Int32, | ||
`response_headers` Array(String), | ||
`response_body` String, | ||
`error` String, | ||
`service_latency` Int64, | ||
`user_agent` String, | ||
`ip_address` String, | ||
`region` LowCardinality(String), | ||
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1 | ||
) ENGINE = MergeTree | ||
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000; |
🧹 Nitpick (assertive)
Consider stronger compression for large blobs.
request_body/response_body can be sizeable; consider ZSTD for better compression trade-offs.
-`time` Int64 CODEC(Delta(8), LZ4),
+`time` Int64 CODEC(Delta(8), LZ4),
...
-) ENGINE = MergeTree
+) ENGINE = MergeTree
SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000
+-- Consider: CODEC(ZSTD(6)) on request_body/response_body columns
;
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63 to 83, the
request_body and response_body columns are currently plain String which can be
large; update their column definitions to use stronger compression (e.g.,
CODEC(ZSTD(3)) or ZSTD(6)) to reduce storage and IO. Concretely, add
CODEC(ZSTD(<level>)) to request_body and response_body (and consider applying
the same to request_headers/response_headers if they can be large), choosing an
appropriate ZSTD level for the desired CPU/size trade-off.
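For reference, a sketch of what the codec change could look like as a follow-up ALTER; ZSTD level 3 is an assumption chosen to balance CPU against size:

-- Recompress the large blob columns with ZSTD; existing parts pick the codec up as they merge.
ALTER TABLE `default`.`api_requests_raw_v2`
    MODIFY COLUMN `request_body` String CODEC(ZSTD(3)),
    MODIFY COLUMN `response_body` String CODEC(ZSTD(3));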
-- Create "ratelimits_last_used_v2" table | ||
CREATE TABLE `default`.`ratelimits_last_used_v2` ( | ||
`time` Int64, | ||
`workspace_id` String, | ||
`namespace_id` String, | ||
`identifier` String | ||
) ENGINE = AggregatingMergeTree | ||
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192; | ||
-- Create "ratelimits_per_day_v2" table |
ratelimits_last_used_v2 type/state mismatch; MV can’t insert AggregateFunction into Int64.
The table defines time Int64 under AggregatingMergeTree, but the MV emits maxSimpleState(time) (AggregateFunction). Use SimpleAggregateFunction(max, Int64) and plain max(time) in the MV.
-CREATE TABLE `default`.`ratelimits_last_used_v2` (
- `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+ `time` SimpleAggregateFunction(max, Int64),
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
-SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT max(time) AS time, workspace_id, namespace_id, identifier FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
Also applies to: 297-297
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195-203 (also
applies at line 297), the `time` column is declared as Int64 but the
materialized view emits AggregateFunction state (maxSimpleState(time)), causing
a type/state mismatch; change the `time` column type to
SimpleAggregateFunction(max, Int64) in the `ratelimits_last_used_v2` table
definition and update the MV to emit a plain aggregate result (use max(time) in
the INSERT/select) instead of the state function so the MV inserts a compatible
value into the SimpleAggregateFunction column.
-- Create "active_workspaces_keys_per_month_mv_v2" view | ||
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2; | ||
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view | ||
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2; | ||
-- Create "api_requests_per_day_mv_v2" view |
🧹 Nitpick (assertive)
ReplacingMergeTree without version: duplicates require FINAL/OPTIMIZE.
active_workspaces_* MVs may insert the same PK multiple times per month. ReplacingMergeTree without a version is fine, but query paths should avoid FINAL in hot paths. If dedup timeliness matters, add a version column.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 272 to 276, the
materialized views active_workspaces_keys_per_month_mv_v2 and
active_workspaces_ratelimits_per_month_mv_v2 can insert duplicate primary keys
into a ReplacingMergeTree without a version, which forces use of FINAL/OPTIMIZE
to dedupe; add a version column to the target table and to these views' SELECTs
(e.g. a bigint or datetime like now64() or toUnixTimestamp(time)) and change the
target table to ReplacingMergeTree(version) so deduplication is deterministic
and can happen during merges without relying on FINAL; alternatively, if adding
a version is not acceptable, explicitly document that any hot-path queries must
use FINAL or run periodic OPTIMIZE FINAL, but the preferred fix is adding and
populating a version column in the MVs and altering the table definition to
ReplacingMergeTree(version).
CREATE MATERIALIZED VIEW `default`.`api_requests_per_day_mv_v2` TO `default`.`api_requests_per_day_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfDay(time)) AS time FROM default.api_requests_per_hour_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time FROM default.api_requests_per_minute_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
All API request MVs have column-order mismatches (time must be first).
Each TO … MV must emit columns in the exact order of the target table: time, workspace_id, path, response_status, host, method, count. Reorder SELECTs accordingly.
-CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
-SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, path, response_status, host, method, sum(count) AS count
FROM default.api_requests_per_minute_v2
-GROUP BY workspace_id, path, response_status, host, method, time;
+GROUP BY time, workspace_id, path, response_status, host, method;
Apply the same reordering to:
- api_requests_per_day_mv_v2 (use toDate(toStartOfDay(time)) AS time first)
- api_requests_per_minute_mv_v2 (use toStartOfMinute(…) AS time first)
- api_requests_per_month_mv_v2 (use toDate(toStartOfMonth(time)) AS time first)
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 277 to 284, the
materialized view SELECTs emit columns in the wrong order relative to their
target tables; every TO target expects columns in the exact order: time,
workspace_id, path, response_status, host, method, count. Fix by reordering each
SELECT to project the time expression first (e.g., toDate(toStartOfDay(time)) AS
time, toStartOfHour(time) AS time, toStartOfMinute(...) AS time,
toDate(toStartOfMonth(time)) AS time), followed by workspace_id, path,
response_status, host, method, and then count so the SELECT column order matches
the target table exactly.
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfDay(time)) AS time FROM default.key_verifications_per_hour_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time FROM default.key_verifications_per_minute_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_minute_mv_v2` TO `default`.`key_verifications_per_minute_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, count(*) AS count, sum(spent_credits) AS spent_credits, avgState(latency) AS latency_avg, quantilesTDigestState(0.75)(latency) AS latency_p75, quantilesTDigestState(0.99)(latency) AS latency_p99, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.key_verifications_raw_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_month_mv_v2` TO `default`.`key_verifications_per_month_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfMonth(time)) AS time FROM default.key_verifications_per_day_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "ratelimits_last_used_mv_v2" view
Key verifications MVs: SELECT order must match table order.
Target tables start with time, then workspace_id, key_space_id, identity_id, key_id, outcome, tags, then aggregates. Reorder the SELECTs.
Example (hourly):
-CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
-SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99
FROM default.key_verifications_per_minute_v2
-GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
+GROUP BY time, workspace_id, key_space_id, identity_id, key_id, outcome, tags;
Mirror this for minute/day/month MVs.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 289-296, the
SELECT column order in the materialized views does not match the target table
column order (which expects time first, then workspace_id, key_space_id,
identity_id, key_id, outcome, tags, then aggregates); reorder each SELECT so
time is the first selected column followed by workspace_id, key_space_id,
identity_id, key_id, outcome, tags, and then the aggregate expressions for
count, spent_credits, latency_avg, latency_p75, latency_p99 for all four MVs
(per_minute, per_hour, per_day, per_month), preserving the existing time
expressions (toStartOfMinute(...), toStartOfHour(...),
toDate(toStartOfDay(...)), toDate(toStartOfMonth(...))) and GROUP BY the same
reordered columns.
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
|
🛠️ Refactor suggestion
Make count type consistent with other v2 aggregates.
Align on SimpleAggregateFunction(sum, Int64) or stick to plain Int64 everywhere; avoid mixing.
- count Int64
+ count SimpleAggregateFunction(sum, Int64)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql lines 1-11,
the count column is currently Int64 but other v2 aggregate tables use
SimpleAggregateFunction(sum, Int64); update the count column type to
SimpleAggregateFunction(sum, Int64) so it matches the v2 aggregate convention
(i.e., change "count Int64" to "count SimpleAggregateFunction(sum, Int64)"),
preserving the rest of the DDL as-is.
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
Use unsigned year/month or cast (type mismatch).
Same as verifications: toYear/toMonth produce unsigned.
- year Int16,
- month Int8,
+ year UInt16,
+ month UInt8,
or cast in MV.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 2 to 6, the year and month columns are Int16/Int8 but ClickHouse
toYear/toMonth return UInt16/UInt8, causing a type mismatch; update the schema
to use unsigned types (UInt16 for year, UInt8 for month) or explicitly cast the
toYear/toMonth expressions to Int16/Int8 in the materialized view definition so
the MV column types and source expressions match.
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Prefer a single time Date column over separate year/month (TTL + uniformity).
Most v2 tables (e.g., api_requests_per_month_v2) bucket on a Date and gain clear TTLs. Consider switching to time Date
with ORDER BY (workspace_id, time) for uniform querying and retention.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, the table currently uses separate year and month columns which
prevents using Date TTLs and is inconsistent with other v2 monthly tables;
change the schema to use a single Date column (e.g., time Date) instead of
year/month, update PARTITION BY/ORDER BY to use time (PARTITION BY to use
month(time) or to keep monthly partitions as appropriate, and ORDER BY
(workspace_id, time)), and update any dependent materialized views/ingest
queries to populate the new time Date column from the original timestamp or from
year/month conversion so retention and querying align with other v2 tables.
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🛠️ Refactor suggestion
Align count type with other v2 tables for consistency.
Other v2 monthly tables use SimpleAggregateFunction(sum, Int64). Consider aligning to avoid mixed patterns.
- count Int64
+ count SimpleAggregateFunction(sum, Int64)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 2 to 11, the count column is currently Int64 but other v2 monthly tables
use SimpleAggregateFunction(sum, Int64); change the count column type to
SimpleAggregateFunction(sum, Int64) so the schema is consistent, and keep the
SummingMergeTree engine and partition/order clauses as-is to allow ClickHouse to
aggregate sums correctly.
year Int16,
month Int8,
workspace_id String,
count Int64
Use unsigned year/month types or cast in MV (type mismatch).
toYear() returns UInt16 and toMonth() returns UInt8, but columns are Int16/Int8. Either switch columns to unsigned or CAST in the MV.
Apply one of:
- year Int16,
- month Int8,
+ year UInt16,
+ month UInt8,
or
- toYear (time) AS year,
- toMonth (time) AS month
+ CAST(toYear(time) AS Int16) AS year,
+ CAST(toMonth(time) AS Int8) AS month
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 3 to 6, the MV defines year as Int16 and month as Int8 but
toYear()/toMonth() produce UInt16/UInt8 causing a type mismatch; fix by either
changing the column types to unsigned (UInt16 for year, UInt8 for month) or by
casting the toYear()/toMonth() expressions in the MV to the signed types
currently declared (i.e., add explicit CASTs), and ensure the chosen approach is
applied consistently in the MV definition and any INSERT/SELECTs that populate
these columns.
This PR reorganises our clickhouse table schemas and adds credit as well as latency tracking.
In the past we had two ways of managing schemas, which is never a good idea.
Going forward the way to make changes is to change the table definition in go/clickhouse/schema and then use atlas to generate a migration diff, which can be applied to clickhouse.
I also changed the materialization: previously we materialized all views from the raw table; now we cascade changes from raw > minute > hour > day > month, which is less CPU intensive because each level processes far less data.
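A trimmed sketch of what one step of that cascade looks like; the dimension list is cut down to workspace_id and path for brevity, and the _sketch view names are placeholders, the real MVs in this PR also carry host, method and response_status:

-- First level: aggregate raw events into per-minute buckets.
CREATE MATERIALIZED VIEW IF NOT EXISTS api_requests_per_minute_mv_sketch
TO api_requests_per_minute_v2 AS
SELECT
    toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time,
    workspace_id,
    path,
    count(*) AS count
FROM api_requests_raw_v2
GROUP BY time, workspace_id, path;

-- The next level reads the per-minute rollup instead of the raw table, so it touches far fewer rows.
CREATE MATERIALIZED VIEW IF NOT EXISTS api_requests_per_hour_mv_sketch
TO api_requests_per_hour_v2 AS
SELECT
    toStartOfHour(time) AS time,
    workspace_id,
    path,
    sum(count) AS count
FROM api_requests_per_minute_v2
GROUP BY time, workspace_id, path;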
I added some tests to ensure data propagates through the materialized views correctly, but please look at the table definitions carefully and try to spot what I missed.
I decided to put all of these tables in the default database; it was just a mess before, and to clickhouse these are just namespaces, nothing that would impact performance.
This does not change inserts or reads at all yet; it just creates new tables.
There are temporary MVs in place to automatically populate the new tables, though, and I am running a migration script to backfill all data.
In follow-up PRs we can switch reads and writes over to these new tables.