
chronark
Collaborator

@chronark chronark commented Sep 4, 2025

This PR reorganises our ClickHouse table schemas and adds credit and latency tracking.

In the past we had two ways of managing schemas, which is never a good idea.
Going forward, the way to make changes is to edit the table definition in go/pkg/clickhouse/schema and then use Atlas to generate a migration diff, which can be applied to ClickHouse.

I also changed the materialization: previously we materialized all views from the raw table, but now changes cascade from raw > minute > hour > day > month. This is less CPU intensive because each level has to process far less data.
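To make the cascade concrete, a minimal sketch of one step is shown below: the hourly view reads from the per-minute table rather than from raw. The table and column names are placeholders for illustration, not the actual v2 definitions (which also carry latency and credit aggregate states).

CREATE MATERIALIZED VIEW example_per_hour_mv TO example_per_hour AS
SELECT
  toStartOfHour(time) AS time,   -- re-bucket minute rows into hours
  workspace_id,
  sum(count) AS count            -- sum the already-aggregated minute counts
FROM example_per_minute          -- the previous level, not the raw table
GROUP BY
  toStartOfHour(time),
  workspace_id;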

I added some tests to ensure data propagates through the materialized views correctly, but please look at the table definitions carefully and try to spot what I missed.

I decided to put all of these tables in the default database. It was just a mess before, and to ClickHouse databases are just namespaces, so there is nothing here that would impact performance.

This does not change inserts or reads at all yet; it just creates the new tables.
There are temporary MVs in place to automatically populate the new tables, though, and I am running a migration script to backfill all data.
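The backfill runs in bounded time windows (the script loops over 6-hour chunks of v1 data). As a rough sketch, one chunk looks like the following; the timestamps are made up for illustration, and SELECT * only works if the v1 and v2 column sets line up, so the new v2 fields (spent credits, latency) would need explicit handling.

-- Copy one 6-hour window from the v1 raw table into the v2 raw table.
-- Sketch only: SELECT * assumes matching column sets between v1 and v2.
INSERT INTO key_verifications_raw_v2
SELECT *
FROM verifications.raw_key_verifications_v1
WHERE time >= 1725408000000   -- window start, epoch ms (illustrative)
  AND time <  1725429600000;  -- window start + 6 hours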

In follow-up PRs we can switch reads and writes over to these new tables.

Summary by CodeRabbit

  • New Features

    • Introduces ClickHouse v2 analytics with per-minute/hour/day/month aggregations, latency percentiles, and spent-credits tracking for verifications, rate limits, and API requests.
    • Local ClickHouse image now auto-initializes schemas on first start; docker-compose builds it locally.
    • Adds a simple migration tool to backfill v1 data into v2.
  • Chores

    • Removed goose-based migration steps from CI and local workflows; migrator service removed.
  • Documentation

    • Expanded ClickHouse and Go package docs; editorial cleanup and guidance updates.


vercel bot commented Sep 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
dashboard | Ready | Preview | Comment | Sep 11, 2025 7:19am
engineering | Ready | Preview | Comment | Sep 11, 2025 7:19am


changeset-bot bot commented Sep 4, 2025

⚠️ No Changeset found

Latest commit: c59ab80

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Contributor

coderabbitai bot commented Sep 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Removes Goose-based migrations and legacy SQL under go/pkg/clickhouse/schema/databases/*. Introduces v2 ClickHouse schema/tables, materialized views, and a Dockerized initialization flow. Adds array utilities (Fill, Map, Reduce, Random) with tests. Updates CI/workflows, compose, Makefiles, and test harness. Adds migration tooling and comprehensive ClickHouse integration tests.

Changes

Cohort / File(s) Summary
CI workflow cleanup
.github/workflows/job_test_api_local.yaml, .github/workflows/job_test_unit.yaml
Removed steps installing/running Goose; retained other steps.
Makefiles
Makefile, go/Makefile
Removed ClickHouse Goose targets and migrator retries from up target.
Deployment: ClickHouse image and init
deployment/Dockerfile.clickhouse, deployment/init-clickhouse.sh, deployment/docker-compose.yaml
New CH Dockerfile copies SQL and init script; compose now builds CH locally; removed clickhouse_migrator service.
Internal ClickHouse docker/migrations removal
internal/clickhouse/Dockerfile, go/pkg/clickhouse/schema/embed.go
Deleted Goose migrator Dockerfile and embedded migrations FS.
ClickHouse schema: v2 foundation (consolidated)
go/pkg/clickhouse/migrations/20250911070454.sql, go/pkg/clickhouse/schema/000_legacy.sql, .../011_ratelimits_last_used_v2.sql, .../012_... to .../019_...
Added v2 raw and aggregated tables, MVs, and temp sync MVs; consolidated legacy v1 schema in 000_legacy.sql.
ClickHouse schema: new verifications v2 chain
go/pkg/clickhouse/schema/001_...v2.sql to 005_...v2.sql
Added key_verifications_raw_v2 and per-minute/hour/day/month v2 tables + MVs.
ClickHouse schema: new ratelimits v2 chain
go/pkg/clickhouse/schema/006_...v2.sql to 010_...v2.sql
Added ratelimits_raw_v2 and per-minute/hour/day/month v2 tables + MVs.
Legacy per-db schema removals
go/pkg/clickhouse/schema/databases/*
Removed prior databases/* DDL and MVs for verifications, ratelimits, metrics, billing, business.
ClickHouse schema types
go/pkg/clickhouse/schema/types.go
Added Go structs for v2 raw and aggregated records (with ch/json tags).
ClickHouse README
go/pkg/clickhouse/README.md
Added Atlas-based migration instructions.
ClickHouse tests (Go)
go/pkg/clickhouse/key_verifications_test.go, go/pkg/clickhouse/ratelimits_test.go, go/pkg/clickhouse/testutil_test.go
Added large-scale integration tests and statistical helpers validating raw → aggregates correctness.
ClickHouse tests (TS)
internal/clickhouse/src/testutil.ts, internal/clickhouse/package.json
Switched test container image, added wait strategy/delay, removed Goose step; tests build CH image before running.
Local tooling updates
tools/local/src/main.ts
Removed clickhouse_migrator from startup lists.
Migration tooling
tools/migrate/ch_migrate_v2_simple.ts, tools/migrate/package.json
Added chunked v1→v2 backfill script using @clickhouse/client-web; added dependency.
Go array utilities
go/pkg/array/doc.go, go/pkg/array/fill.go, go/pkg/array/fill_test.go, go/pkg/array/map.go, go/pkg/array/map_test.go, go/pkg/array/reduce.go, go/pkg/array/reduce_test.go, go/pkg/array/random.go, go/pkg/array/random_test.go
New generic functions Fill, Map, Reduce, Random with extensive tests/benchmarks and docs.
Docs
go/GO_DOCUMENTATION_GUIDELINES.md
Editorial/doc guideline updates and cleanup.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as Producer Services
  participant CH as ClickHouse
  participant Raw as raw_v2 tables
  participant MVmin as per_minute_v2 MV
  participant MVhr as per_hour_v2 MV
  participant MVday as per_day_v2 MV
  participant MVmon as per_month_v2 MV
  participant Biz as active_workspaces_v2 / billable_*_v2

  App->>CH: INSERT events (api_requests_raw_v2 / key_verifications_raw_v2 / ratelimits_raw_v2)
  CH-->>Raw: Store raw rows
  Note over Raw, MVmin: New v2 MVs
  Raw-->>MVmin: Aggregate per minute (states)
  MVmin-->>MVhr: Aggregate per hour (merge states)
  MVhr-->>MVday: Aggregate per day (merge)
  MVday-->>MVmon: Aggregate per month (merge)
  MVmon-->>Biz: Populate active_workspaces_v2 and billable_*_v2 via MVs
sequenceDiagram
  autonumber
  participant Src as v1 sources (raw_*_v1)
  participant TempMV as temp_sync_*_v1_to_v2 (MVs)
  participant CH as ClickHouse
  participant Migr as ch_migrate_v2_simple.ts

  Src-->>TempMV: New inserts mirrored to v2 raw
  Migr->>CH: Loop over 6h windows (backfill)
  Migr->>CH: INSERT INTO *_raw_v2 SELECT ... FROM *_raw_v1 WHERE time in window
  CH-->>CH: MVs process to per-minute/hour/day/month v2

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name: Description Check
Status: ⚠️ Warning
Explanation: The current description provides a useful narrative about reorganizing ClickHouse schemas, migration/backfill intent, and materialization changes, but it does not adhere to the repository's required PR template. It is missing a linked issue, the filled "Type of change" checklist, a concrete "How should this be tested?" section with exact commands and expected results, and the required checklist items (e.g., build, fmt, tests) are not marked. Because the template explicitly requires those details to evaluate and test schema/migration impacts, the description is incomplete for merge readiness. Test and migration notes are mentioned qualitatively but lack the reproducible verification steps and CI/test evidence the template expects.
Resolution: Please update the PR to follow the provided template: add or link the issue number, select the appropriate "Type of change" checkbox, and complete the "How should this be tested?" section with step-by-step commands (e.g., build commands, docker-compose/migration/backfill invocation, env vars, and exact verification queries or expected CI outcomes). Tick the required checklist items or explain why any cannot be completed, and include notes on migration impact and rollback steps (which tables/MVs are new/removed and compatibility concerns). Finally, attach or reference relevant test outputs or CI runs that demonstrate the materialized view tests and the backfill succeeded.
✅ Passed checks (2 passed)
Check name: Title Check
Status: ✅ Passed
Explanation: The title "feat: clickhouse schemas v2" succinctly identifies the primary change — introduction of ClickHouse schema v2 — and follows conventional commit style with the "feat:" prefix. It is concise, avoids noisy file lists or emojis, and accurately reflects the main intent of the changeset. A reviewer scanning PR history will understand the high-level purpose from the title alone.

Check name: Docstring Coverage
Status: ✅ Passed
Explanation: No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clickhouse-schemas

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@chronark chronark marked this pull request as ready for review September 5, 2025 11:59
@chronark chronark changed the title from "clickhouse schemas" to "featL: clickhouse schemas v2" Sep 5, 2025
@chronark chronark changed the title from "featL: clickhouse schemas v2" to "feat: clickhouse schemas v2" Sep 5, 2025
Contributor

github-actions bot commented Sep 5, 2025

Thank you for following the naming conventions for pull request titles! 🙏

@chronark
Collaborator Author

chronark commented Sep 5, 2025

@perkinsjr @Flo4604 pls :)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 98

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
deployment/init-clickhouse.sh (1)

1-19: Harden init-clickhouse.sh

  • Enable strict mode (set -euo pipefail) and shopt -s nullglob to avoid glob edge cases
  • Wait for ClickHouse readiness (until SELECT 1 succeeds) before applying schemas
  • Gather and version-sort .sql files deterministically (e.g. find … -print0 | sort -zV)
  • Use ${CLICKHOUSE_ADMIN_USER:-default} and ${CLICKHOUSE_ADMIN_PASSWORD:-} to avoid leaking unset envs
go/pkg/clickhouse/schema/000_oldschema.sql (1)

720-823: Duplication/conflict risk between schema bootstrap and migrations

This file recreates DBs/tables also defined in migrations (e.g., billing/verifications/ratelimits). Ensure only one path runs in CI/infra to avoid DDL conflicts.

Would you like a follow-up PR to remove the legacy bootstrap or gate it behind an environment flag?

Also applies to: 1-2

tools/migrate/package.json (1)

10-21: Runtime dependency should not be a devDependency

The migration script imports the ClickHouse client at runtime via tsx. Move it to dependencies to avoid failures in prod installs with NODE_ENV=production.

   "dependencies": {
+    "@clickhouse/client": "^1.4.0",
@@
-  "devDependencies": {
-    "@clickhouse/client-web": "^1.11.1",
+  "devDependencies": {

If you indeed need the web client, put that under dependencies instead and remove the Node client.

Also applies to: 22-29

deployment/docker-compose.yaml (2)

126-142: Healthcheck only proves server liveness, not schema readiness.

Consider validating a known table or schema version to avoid race with tests depending on v2 objects.

Example:

-      test:
-        [
-          "CMD",
-          "clickhouse-client",
-          "--host",
-          "localhost",
-          "--user",
-          "default",
-          "--password",
-          "password",
-          "--query",
-          "SELECT 1",
-        ]
+      test: ["CMD-SHELL", "clickhouse-client --user default --password password --query \"SELECT count() FROM system.tables WHERE database = 'analytics' AND name = 'api_requests_v2'\" | grep -q '1'"]

146-155: MinIO console port mapping mismatch.

MINIO_CONSOLE_PORT_NUMBER=3903 but ports map 2903. Console likely unreachable.

-      - 2903:2903
+      - 3903:3903
go/pkg/clickhouse/ratelimits_test.go (1)

1-316: Define missing min/max helpers: The test calls min and max but no such functions are defined in the repo—add implementations (e.g. func min(a, b int) int { … }, func max(a, b int) int { … }) or import them from a utility package.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f4183e2 and a2bcdc8.

⛔ Files ignored due to path filters (3)
  • go/go.sum is excluded by !**/*.sum
  • go/pkg/clickhouse/migrations/atlas.sum is excluded by !**/*.sum
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (104)
  • .github/workflows/job_test_api_local.yaml (0 hunks)
  • .github/workflows/job_test_unit.yaml (0 hunks)
  • Makefile (0 hunks)
  • deployment/Dockerfile.clickhouse (1 hunks)
  • deployment/docker-compose.yaml (2 hunks)
  • deployment/init-clickhouse.sh (1 hunks)
  • go/GO_DOCUMENTATION_GUIDELINES.md (11 hunks)
  • go/Makefile (1 hunks)
  • go/go.mod (1 hunks)
  • go/pkg/array/doc.go (1 hunks)
  • go/pkg/array/fill.go (1 hunks)
  • go/pkg/array/fill_test.go (1 hunks)
  • go/pkg/array/map.go (1 hunks)
  • go/pkg/array/map_test.go (1 hunks)
  • go/pkg/array/random.go (1 hunks)
  • go/pkg/array/random_test.go (1 hunks)
  • go/pkg/array/reduce.go (1 hunks)
  • go/pkg/array/reduce_test.go (1 hunks)
  • go/pkg/clickhouse/README.md (1 hunks)
  • go/pkg/clickhouse/key_verifications_test.go (1 hunks)
  • go/pkg/clickhouse/migrations/20250903085516_init.sql (1 hunks)
  • go/pkg/clickhouse/migrations/20250905101812.sql (1 hunks)
  • go/pkg/clickhouse/ratelimits_test.go (1 hunks)
  • go/pkg/clickhouse/schema/000_oldschema.sql (1 hunks)
  • go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/002_key_verifications_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/007_ratelimits_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/008_ratelimits_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/011_api_requests_raw_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/012_api_requests_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/013_api_requests_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/015_active_workspaces_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/017_billable_verifications_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/000_README.md (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/001_database.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/001_database.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql (0 hunks)
  • go/pkg/clickhouse/schema/embed.go (0 hunks)
  • go/pkg/clickhouse/schema/types.go (1 hunks)
  • go/pkg/clickhouse/testutil_test.go (1 hunks)
  • internal/clickhouse/Dockerfile (0 hunks)
  • internal/clickhouse/package.json (1 hunks)
  • internal/clickhouse/src/testutil.ts (2 hunks)
  • tools/local/src/main.ts (1 hunks)
  • tools/migrate/ch_migrate_v2_simple.ts (1 hunks)
  • tools/migrate/package.json (1 hunks)
💤 Files with no reviewable changes (59)
  • .github/workflows/job_test_unit.yaml
  • go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql
  • go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql
  • go/pkg/clickhouse/schema/embed.go
  • go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql
  • .github/workflows/job_test_api_local.yaml
  • internal/clickhouse/Dockerfile
  • go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/001_database.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/000_README.md
  • go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql
  • go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql
  • Makefile
  • go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql
  • go/pkg/clickhouse/schema/databases/005_business/001_database.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql
  • go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql
  • go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.

Applied to files:

  • go/pkg/clickhouse/README.md
  • go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql
  • go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
  • go/pkg/clickhouse/migrations/20250903085516_init.sql
  • go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql
  • go/pkg/clickhouse/schema/000_oldschema.sql
  • go/pkg/clickhouse/migrations/20250905101812.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.

Applied to files:

  • go/pkg/clickhouse/README.md
  • go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
🧬 Code graph analysis (7)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
  • WorkspacePrefix (20-20)
  • KeyPrefix (16-16)
  • KeyAuthPrefix (21-21)
  • IdentityPrefix (28-28)
  • RequestPrefix (19-19)
go/pkg/array/fill.go (1)
  • Fill (106-116)
go/pkg/clickhouse/schema/types.go (1)
  • KeyVerificationV2 (6-18)
go/pkg/array/random.go (1)
  • Random (137-142)
go/pkg/array/reduce.go (1)
  • Reduce (193-199)
go/pkg/array/map.go (1)
  • Map (151-157)
tools/local/src/main.ts (2)
tools/local/src/db.ts (1)
  • prepareDatabase (13-82)
tools/local/src/cmd/api.ts (1)
  • bootstrapApi (9-52)
go/pkg/array/fill_test.go (2)
go/pkg/array/fill.go (1)
  • Fill (106-116)
go/pkg/array/random.go (1)
  • Random (137-142)
go/pkg/array/reduce_test.go (2)
go/pkg/array/reduce.go (1)
  • Reduce (193-199)
go/pkg/array/map.go (1)
  • Map (151-157)
go/pkg/array/map_test.go (1)
go/pkg/array/map.go (1)
  • Map (151-157)
go/pkg/array/random_test.go (2)
go/pkg/array/random.go (1)
  • Random (137-142)
go/pkg/array/fill.go (1)
  • Fill (106-116)
go/pkg/clickhouse/ratelimits_test.go (6)
go/pkg/uid/uid.go (5)
  • WorkspacePrefix (20-20)
  • RatelimitNamespacePrefix (25-25)
  • IdentityPrefix (28-28)
  • KeyPrefix (16-16)
  • RequestPrefix (19-19)
go/pkg/array/fill.go (1)
  • Fill (106-116)
go/pkg/array/random.go (1)
  • Random (137-142)
go/pkg/clickhouse/schema/types.go (1)
  • RatelimitV2 (23-31)
go/pkg/array/reduce.go (1)
  • Reduce (193-199)
go/pkg/array/map.go (1)
  • Map (151-157)
🪛 golangci-lint (2.2.2)
go/pkg/clickhouse/key_verifications_test.go

19-19: TestKeyVerifications should use t.Cleanup instead of defer

(tparallel)

go/pkg/array/fill_test.go

216-216: SA4010: this result of append is never used, except maybe in other appends

(staticcheck)

go/pkg/array/reduce_test.go

24-24: variable max has same name as predeclared identifier

(predeclared)


36-36: variable min has same name as predeclared identifier

(predeclared)


347-347: variable max has same name as predeclared identifier

(predeclared)

go/pkg/array/map_test.go

339-339: SA4010: this result of append is never used, except maybe in other appends

(staticcheck)

go/pkg/array/random.go

[high] 141-141: G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand)

(gosec)

go/pkg/clickhouse/ratelimits_test.go

18-18: TestRatelimits_ComprehensiveLoadTest should use t.Cleanup instead of defer

(tparallel)

🪛 LanguageTool
go/pkg/clickhouse/README.md

[grammar] ~1-~1: Use correct spacing
Context: # Changing schemas To change a schema, just edit a table in...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~3-~3: There might be a mistake here.
Context: ...just edit a table in the schema folder. Afterwards generate a migration file using atlas. ...

(QB_NEW_EN_OTHER)


[grammar] ~4-~4: There might be a mistake here.
Context: ...s generate a migration file using atlas. Make sure docker is running, so atlas ca...

(QB_NEW_EN)


[grammar] ~5-~5: Use commas correctly
Context: ...using atlas. Make sure docker is running, so atlas can use it to generate migrati...

(QB_NEW_EN_OTHER_ERROR_IDS_33)


[grammar] ~5-~5: Use correct spacing
Context: ...atlas can use it to generate migrations. bash cd go/pkg/clickhouse atlas migrate diff \ --dir "file://migrations" \ --to "file://schema" \ --dev-url "docker://clickhouse/latest"

(QB_NEW_EN_OTHER_ERROR_IDS_5)

go/GO_DOCUMENTATION_GUIDELINES.md

[grammar] ~393-~393: Use correct spacing
Context: ... - [TypeName] for structs/interfaces - [ConstantName] for constants ## Best Practices and Ant...

(QB_NEW_EN_OTHER_ERROR_IDS_5)

🪛 markdownlint-cli2 (0.17.2)
go/pkg/clickhouse/README.md

10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


12-12: Hard tabs
Column: 1

(MD010, no-hard-tabs)

🪛 Checkov (3.2.334)
deployment/Dockerfile.clickhouse

[LOW] 1-8: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[LOW] 1-8: Ensure that a user for the container has been created

(CKV_DOCKER_3)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Build / Build
  • GitHub Check: Test Packages / Test
🔇 Additional comments (15)
go/pkg/clickhouse/testutil_test.go (1)

1-6: Test-only helpers are scoped correctly

Using package clickhouse_test and keeping helpers unexported is appropriate.

tools/local/src/main.ts (1)

51-55: Verify ClickHouse readiness in startContainers
I didn’t find any health-check or wait for ClickHouse startup inside startContainers; please confirm it blocks until CH’s HTTP port accepts queries. If not, add an explicit readiness step (e.g. run clickhouse-client --query "SELECT 1") after each await startContainers([...]).

go/pkg/array/reduce.go (2)

193-199: LGTM — minimal, correct implementation.

Performs a left fold with O(n)/O(1) as advertised.


137-150: Fix non-compiling example in docs.

The “Maximum” example references an undefined max function and may misuse math.MinInt depending on Go version. Suggest an inlined comparator.

-//	max := array.Reduce(numbers, func(a, b int) int { max(a, b) }, math.MinInt) // Maximum
+//	max := array.Reduce(numbers, func(a, b int) int {
+//	    if a > b { return a }
+//	    return b
+//	}, math.MinInt)
⛔ Skipped due to learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3800
File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240
Timestamp: 2025-08-19T08:53:43.952Z
Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/array/fill.go (1)

111-115: Compile error: cannot range over int.

for i := range length doesn’t compile. Use a counted loop.

-	for i := range length {
-		slice[i] = generator()
-	}
+	for i := 0; i < length; i++ {
+		slice[i] = generator()
+	}
⛔ Skipped due to learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3800
File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126
Timestamp: 2025-08-19T08:57:31.793Z
Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql (1)

12-23: Invalid suggestion: ratelimits_per_month_v2 only defines a time (DateTime) column, not separate year or month fields, so using toYear(time)/toMonth(time) is correct; the passed column also exists.

Likely an incorrect or invalid review comment.

go/pkg/array/map_test.go (1)

11-200: Solid coverage and properties.

Good breadth: types, immutability, per-element invocation, order, large slices.

go/pkg/array/random_test.go (2)

31-48: Flake note acknowledged; OK as-is.

Probability of all-equal draws is negligible; acceptable for unit scope.


119-121: Use the standard b.N loop; testing.B has no Loop().

This won’t compile; replace b.Loop() with a for over b.N.

-    for b.Loop() {
+    for i := 0; i < b.N; i++ {
       Random(slice)
     }

Also applies to: 131-133

⛔ Skipped due to learnings
Learnt from: imeyer
PR: unkeyed/unkey#3733
File: go/pkg/repeat/every_test.go:383-383
Timestamp: 2025-08-05T15:19:50.563Z
Learning: Go 1.24 introduced testing.B.Loop method which simplifies benchmark functions by handling iteration internally, ensuring setup/cleanup runs only once per benchmark count, and preventing compiler optimizations that could affect benchmark accuracy.
Learnt from: Flo4604
PR: unkeyed/unkey#3800
File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126
Timestamp: 2025-08-19T08:57:31.793Z
Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/fill_test.go (1)

1-114: Confirm Fill implementation loop.

Relevant snippet shows for i := range length, which is invalid over an int. Ensure Fill uses for i := 0; i < length; i++ { ... }.

go/pkg/array/reduce_test.go (1)

272-282: No missing intPtr helper
The helper intPtr(int) *int is defined in map_test.go within the same package array, so reduce_test.go compiles as is.

go/pkg/clickhouse/schema/types.go (2)

55-66: Confirm Time type matches table column type.

Aggregated tables typically use DateTime/DateTime64, but here Time is int64. If you select time directly from CH, scanning into int64 will fail unless you cast (toUnixTimestamp) in queries. Either switch to time.Time or standardize on selecting epoch.

Would you like me to change these fields to time.Time and update queries accordingly?


81-89: Same Time type concern for API requests aggregates.

Align with the chosen convention (either time.Time fields or always casting in SQL).

go/pkg/clickhouse/ratelimits_test.go (1)

41-43: Undefined min/max will not compile unless provided elsewhere.

Either import a helper or implement local minInt/maxInt.

-	numNamespaces := max(1, min(numRecords/1000, 100))   // 100-1000 namespaces
-	numIdentifiers := max(1, min(numRecords/100, 10000)) // 1000-10000 identifiers
+	numNamespaces := maxInt(1, minInt(numRecords/1000, 100))   // 1-100
+	numIdentifiers := maxInt(1, minInt(numRecords/100, 10000)) // 1-10000

Add helpers:

func minInt(a, b int) int { if a < b { return a }; return b }
func maxInt(a, b int) int { if a > b { return a }; return b }
⛔ Skipped due to learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3800
File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240
Timestamp: 2025-08-19T08:53:43.952Z
Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/clickhouse/migrations/20250905101812.sql (1)

281-285: Temp sync MVs depend on v1 sources; creation will fail if sources are absent; also no backfill.

Ensure verifications.raw_key_verifications_v1, metrics.raw_api_requests_v1, ratelimits.raw_ratelimits_v1 exist before creating these MVs, or gate creation. Without POPULATE/backfill, historical data won’t migrate.

Proposed approach:

  • Create databases/tables if needed, or wrap in deployment steps that guarantee presence.
  • Backfill with one-off INSERT INTO ... SELECT ... for historical windows, then enable MVs for tail data.

Contributor

@coderabbitai coderabbitai bot left a comment


Review continued from previous batch...

Contributor

@coderabbitai coderabbitai bot left a comment


Review continued from previous batch...


graphite-app bot commented Sep 9, 2025

Merge activity

  • Sep 9, 6:06 AM UTC: This pull request can not be added to the Graphite merge queue. Please try rebasing and resubmitting to merge when ready.
  • Sep 9, 6:06 AM UTC: Graphite disabled "merge when ready" on this PR due to a merge conflict with the target branch; resolve the conflict and try again.

ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
ORDER BY (workspace_id, time, key_space_id, outcome)
TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 1 MONTH DELETE
Collaborator


Should this not be INTERVAL 3 MONTH DELETE?

Our retention is 90 days, unless I am dumb, which is very possible and I don't understand this as much as I thought.
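For reference, and assuming the table in question is key_verifications_raw_v2, a 90-day retention would only change the TTL interval, along these lines (illustrative only, not a change made in this PR):

ALTER TABLE key_verifications_raw_v2
  MODIFY TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 90 DAY DELETE;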

Collaborator Author


[Screenshot: CleanShot 2025-09-10 at 08.55.57@2x.png]

According to our pricing page, it's 30.

Collaborator

@perkinsjr perkinsjr left a comment


I left a single comment about a time frame.

Outside of that, it looks good.

Contributor

@imeyer imeyer left a comment


too-long-tried-to-read, but I trust you know what you're doing here 😄

@imeyer
Contributor

imeyer commented Sep 10, 2025

Mmm, @perkinsjr's comment will need addressing, but I still approve.

avgMergeState(latency_avg) as latency_avg,
quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
toStartOfDay(time) AS time
Contributor

@Flo4604 Flo4604 Sep 10, 2025


Wouldn't we be better off using toDate(fromUnixTimestamp64Milli(time)) here, since the minutes are not relevant to us at all? This applies to all the per_day schemas.

Collaborator Author


time here is a DateTime, not an int64

why would this be better?

Contributor


We can switch it to Date then, which uses 2 bytes less. It's a micro-optimization, but since it's per day we don't need to store 00:00:00 for each row.

Collaborator Author


yeah let me give it a shot, I think we'd need to do toDate(toStartOfDay(time)) as time
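A minimal sketch of that shape, using a hypothetical table name rather than the real v2 definition: the per-day table stores the bucket as a Date, and the feeding MV truncates with toDate(toStartOfDay(time)).

CREATE TABLE example_per_day
(
  time         Date,                                 -- day bucket, no time-of-day stored
  workspace_id String,
  count        SimpleAggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (workspace_id, time);

-- in the materialized view feeding it:
--   toDate(toStartOfDay(time)) AS time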

@Flo4604
Contributor

Flo4604 commented Sep 10, 2025

The only thing I really noticed is that we are missing ratelimits_last_used_v1 in the new schema. Do we not need this MV anymore?


graphite-app bot commented Sep 11, 2025

Graphite Automations

"Notify author when CI fails" took an action on this PR • (09/11/25)

1 teammate was notified about this PR based on Andreas Thomas's automation.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

♻️ Duplicate comments (18)
go/pkg/array/random.go (1)

3-5: Doc/behavior mismatch + gosec G404: seed global PRNG and justify non-crypto use.
Still unseeded and will trip G404 in many pipelines. Seed once and add scoped nolints.

 import (
-	"math/rand"
+	"math/rand"
+	"time"
 )
 
+func init() {
+	// Seed global PRNG once for non-cryptographic randomness.
+	//nolint:gosec // G404: math/rand is intentional for non-crypto selection.
+	rand.Seed(time.Now().UnixNano())
+}
@@
-	return slice[rand.Intn(len(slice))]
+	return slice[rand.Intn(len(slice))] //nolint:gosec // G404: intentional non-crypto randomness.

Also applies to: 23-28

go/GO_DOCUMENTATION_GUIDELINES.md (1)

392-395: Indentation consistency in cross-reference bullets.
Normalize spacing to match surrounding bullets.

 10. **Cross-references** using Go's `[Reference]` format:
-    - `[OtherFunc]` for functions
-    - `[TypeName]` for structs/interfaces
-    - `[ConstantName]` for constants
+    - `[OtherFunc]` for functions
+    - `[TypeName]` for structs/interfaces
+    - `[ConstantName]` for constants
go/pkg/clickhouse/key_verifications_test.go (7)

25-27: Prefer t.Cleanup over defer for connection teardown.

Ensures proper ordering with subtests and parallel runs.

- defer conn.Close()
+ t.Cleanup(func() { require.NoError(t, conn.Close()) })

35-35: Gate 1M-row test under -short/CI.

Prevents slow/flaky CI.

- numRecords := 1_000_000
+ if testing.Short() {
+   t.Skip("skipping large load test in -short mode")
+ }
+ numRecords := 1_000_000

66-69: Fix misleading latency comment.

Distribution is heavy-tailed; values can exceed 100ms.

-		latency := rand.ExpFloat64()*50 + 1 // 1-100ms base range
+		latency := rand.ExpFloat64()*50 + 1 // ~Exp(mean=50ms); typically <100ms but heavy-tailed

113-123: Add per-query timeouts and avoid shared err in parallel subtests.

Use a fresh err variable and a short context inside Eventually to prevent hangs and data races.

- require.EventuallyWithT(t, func(c *assert.CollectT) {
-   queried := int64(0)
-   err = conn.QueryRow(ctx, "...").Scan(&queried)
-   require.NoError(c, err)
+ require.EventuallyWithT(t, func(c *assert.CollectT) {
+   ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second)
+   defer cancel()
+   queried := int64(0)
+   qErr := conn.QueryRow(ctxQ, "...").Scan(&queried)
+   require.NoError(c, qErr)
 }, time.Minute, time.Second)

Also applies to: 167-179, 198-209, 231-240, 255-270, 281-289, 302-310, 327-335


112-124: Capture loop variables in subtests.

Prevent flakiness under parallelization.

- for _, table := range []string{...} {
-   t.Run(table, func(t *testing.T) {
+ for _, table := range []string{...} {
+   table := table
+   t.Run(table, func(t *testing.T) {

Apply similarly for keyID and identityID closures:

- for _, keyID := range keys[:10] {
+ for _, keyID := range keys[:10] {
+   keyID := keyID
    ...
- for _, identityID := range identities[:10] {
+ for _, identityID := range identities[:10] {
+   identityID := identityID

Also applies to: 166-173, 197-204, 230-236, 254-262, 279-286, 301-306, 326-331


258-263: Scan quantiles into float64.

The driver returns Float64; avoid precision loss/scan errors.

- var (
-   queriedAvg float64
-   queriedP75 float32
-   queriedP99 float32
- )
- err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99)
+ var (
+   queriedAvg float64
+   queriedP75 float64
+   queriedP99 float64
+ )
+ err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99)

341-356: Make billing assertion resilient to MV lag.

Poll until MV catches up to avoid flakes.

- var queried int64
- err = conn.QueryRow(ctx, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
- require.NoError(t, err)
- require.Equal(t, billableVerifications, queried)
+ require.EventuallyWithT(t, func(c *assert.CollectT) {
+   ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second)
+   defer cancel()
+   var queried int64
+   qErr := conn.QueryRow(ctxQ, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
+   require.NoError(c, qErr)
+   require.Equal(c, billableVerifications, queried)
+ }, time.Minute, time.Second)
go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (3)

35-35: Normalize to UTC before day bucketing.

Avoid server-TZ drift.

-  toDate(toStartOfDay(time)) AS time
+  toDate(toStartOfDay(toTimeZone(time, 'UTC'))) AS time

10-10: Counts should be unsigned.

Keep count non-negative and consistent across v2 layers.

-  count SimpleAggregateFunction(sum, Int64),
+  count SimpleAggregateFunction(sum, UInt64),

5-5: Invalid/undesired type for HTTP status; use UInt16.

ClickHouse has no bare Int type; statuses are non-negative.

-  response_status Int,
+  response_status UInt16,
go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (2)

19-36: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Current SELECT order doesn’t match table, MergeState uses params, and GROUP BY uses alias.

-CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS
-SELECT
-  workspace_id,
-  namespace_id,
-  identifier,
-  sum(total) as total,
-  sum(passed) as passed,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfDay(time)) AS time
-FROM
-  ratelimits_per_hour_v2
-GROUP BY
-  workspace_id,
-  namespace_id,
-  time,
-  identifier;
+CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS
+SELECT
+  toDate(toStartOfDay(time)) AS time,
+  workspace_id,
+  namespace_id,
+  identifier,
+  sum(passed) AS passed,
+  sum(total) AS total,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM ratelimits_per_hour_v2
+GROUP BY
+  toDate(toStartOfDay(time)),
+  workspace_id,
+  namespace_id,
+  identifier;

17-17: TTL style consistency.

Align with toIntervalDay.

-TTL time + INTERVAL 100 DAY DELETE;
+TTL time + toIntervalDay(100) DELETE;
go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1)

19-36: Fix MV: column order, MergeState params, and GROUP BY alias.

Bring SELECT into table order, remove TDigest params, and group by expression.

-CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
-SELECT
-  workspace_id,
-  namespace_id,
-  identifier,
-  sum(total) as total,
-  sum(passed) as passed,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfMonth(time)) AS time
-FROM
-  ratelimits_per_day_v2
-GROUP BY
-  workspace_id,
-  namespace_id,
-  time,
-  identifier;
+CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
+SELECT
+  toDate(toStartOfMonth(time)) AS time,
+  workspace_id,
+  namespace_id,
+  identifier,
+  sum(passed) AS passed,
+  sum(total) AS total,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM ratelimits_per_day_v2
+GROUP BY
+  toDate(toStartOfMonth(time)),
+  workspace_id,
+  namespace_id,
+  identifier;
go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (2)

33-57: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Ensure positional match with table schema; remove TDigest params; group by expression.

-CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS
-SELECT
-  workspace_id,
-  key_space_id,
-  identity_id,
-  key_id,
-  outcome,
-  tags,
-  sum(count) as count,
-  sum(spent_credits) as spent_credits,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfDay(time)) AS time
-FROM
-  key_verifications_per_hour_v2
-GROUP BY
-  workspace_id,
-  time,
-  key_space_id,
-  identity_id,
-  key_id,
-  outcome,
-  tags
-;
+CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS
+SELECT
+  toDate(toStartOfDay(time)) AS time,
+  workspace_id,
+  key_space_id,
+  identity_id,
+  key_id,
+  outcome,
+  tags,
+  sum(count) AS count,
+  sum(spent_credits) AS spent_credits,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM key_verifications_per_hour_v2
+GROUP BY
+  toDate(toStartOfDay(time)),
+  workspace_id,
+  key_space_id,
+  identity_id,
+  key_id,
+  outcome,
+  tags;

18-19: Consider monthly partitions for daily grain.

Monthly partitions reduce part churn with 100‑day TTL.

-PARTITION BY
-  toYYYYMMDD (time)
+PARTITION BY
+  toYYYYMM (time)
go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1)

33-57: Fix MV: positional order, TDigest MergeState params, and GROUP BY alias.

Reorder to match table, remove params, group by expression.

-CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS
-SELECT
-  workspace_id,
-  key_space_id,
-  identity_id,
-  key_id,
-  outcome,
-  tags,
-  sum(count) as count,
-  sum(spent_credits) as spent_credits,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfMonth(time)) AS time
-FROM
-  key_verifications_per_day_v2
-GROUP BY
-  workspace_id,
-  time,
-  key_space_id,
-  identity_id,
-  key_id,
-  outcome,
-  tags
-;
+CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS
+SELECT
+  toDate(toStartOfMonth(time)) AS time,
+  workspace_id,
+  key_space_id,
+  identity_id,
+  key_id,
+  outcome,
+  tags,
+  sum(count) AS count,
+  sum(spent_credits) AS spent_credits,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM key_verifications_per_day_v2
+GROUP BY
+  toDate(toStartOfMonth(time)),
+  workspace_id,
+  key_space_id,
+  identity_id,
+  key_id,
+  outcome,
+  tags;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edd3765 and bf1c68b.

📒 Files selected for processing (17)
  • go/GO_DOCUMENTATION_GUIDELINES.md (12 hunks)
  • go/pkg/array/doc.go (1 hunks)
  • go/pkg/array/fill.go (1 hunks)
  • go/pkg/array/map.go (1 hunks)
  • go/pkg/array/random.go (1 hunks)
  • go/pkg/array/reduce.go (1 hunks)
  • go/pkg/clickhouse/key_verifications_test.go (1 hunks)
  • go/pkg/clickhouse/schema/000_legacy.sql (1 hunks)
  • go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/016_api_requests_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.

Applied to files:

  • go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
  • go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql
  • go/pkg/clickhouse/schema/000_legacy.sql
📚 Learning: 2025-08-14T16:25:48.167Z
Learnt from: Flo4604
PR: unkeyed/unkey#3785
File: go/apps/api/routes/v2_keys_reroll_key/401_test.go:52-61
Timestamp: 2025-08-14T16:25:48.167Z
Learning: User Flo4604 requested creation of a GitHub issue to track converting all test files to use table-driven test patterns as a broader codebase improvement, following the suggestion made during review of go/apps/api/routes/v2_keys_reroll_key/401_test.go.

Applied to files:

  • go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-07-16T15:38:53.491Z
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.

Applied to files:

  • go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.

Applied to files:

  • go/pkg/clickhouse/schema/000_legacy.sql
🧬 Code graph analysis (1)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
  • WorkspacePrefix (20-20)
  • KeyPrefix (16-16)
  • KeyAuthPrefix (21-21)
  • IdentityPrefix (28-28)
  • RequestPrefix (19-19)
go/pkg/array/fill.go (1)
  • Fill (23-33)
go/pkg/clickhouse/schema/types.go (1)
  • KeyVerificationV2 (6-18)
go/pkg/array/random.go (1)
  • Random (23-28)
go/pkg/array/reduce.go (1)
  • Reduce (28-34)
go/pkg/array/map.go (1)
  • Map (23-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build / Build
  • GitHub Check: Test Packages / Test
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
go/pkg/array/reduce.go (1)

3-27: Docs/readability: Solid and actionable examples.
Clear API intent, parameters, and examples. No changes needed.

go/pkg/array/map.go (1)

24-29: LGTM.
Preallocation + index-only range is idiomatic and efficient.

go/pkg/array/fill.go (1)

29-31: Build-breaking bug: invalid range over int.
for i := range length does not compile. Use a counted loop.

-	for i := range length {
+	for i := 0; i < length; i++ {
 		slice[i] = generator()
 	}
⛔ Skipped due to learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3800
File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126
Timestamp: 2025-08-19T08:57:31.793Z
Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/doc.go (1)

1-41: Package docs are concise and accurate.
Cross-refs resolve; concurrency notes are appropriate.

go/pkg/clickhouse/schema/000_legacy.sql (1)

49-68: Confirm MV→table column order for legacy v1 objects.

Many MVs (e.g., this block) select columns in a different order than the target table. With TO, ClickHouse inserts by position. Verify this matches production definitions or adjust SELECT to table order to avoid column shifts.

Comment on lines 7 to 11
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);

🧹 Nitpick (assertive)

Retention: confirm TTL policy for billing data.

Monthly billables typically require long retention. If intended, add an explicit TTL or document why this table is indefinite.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql around
lines 7 to 11, the table DDL currently has no TTL/retention specified so billing
data will be kept indefinitely; either add an explicit TTL clause (e.g., TTL
toDate(year, month, 1) + INTERVAL X MONTH or TTL <timestamp_column> + INTERVAL X
MONTH) to enforce the desired retention period, or add a comment above the DDL
documenting that indefinite retention is intentional and why (including expected
data size and compliance requirements). Ensure the chosen TTL uses the
appropriate date column and interval and that the change is approved by the
billing/PD team.
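If finite retention is chosen, a minimal sketch on the current (year, month) shape could look like the following; makeDate() builds the month-start Date, and the 60-month window is a placeholder assumption, not a confirmed billing requirement:

-- Sketch only: 60 months is a placeholder retention window, not a billing decision.
ALTER TABLE billable_verifications_per_month_v2
  MODIFY TTL makeDate(year, month, 1) + INTERVAL 60 MONTH;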

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 21

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf1c68b and c59ab80.

⛔ Files ignored due to path filters (2)
  • go/pkg/clickhouse/migrations/atlas.sum is excluded by !**/*.sum
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (10)
  • go/pkg/clickhouse/migrations/20250911070454.sql (1 hunks)
  • go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.

Applied to files:

  • go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
  • go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql
  • go/pkg/clickhouse/migrations/20250911070454.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test Packages / Test
🔇 Additional comments (10)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)

23-23: Confirm source time unit alignment (ms vs s).

If ratelimits.raw_ratelimits_v1.time is ms-epoch, keeping Int64 is fine; if it’s DateTime/DateTime64, align the type accordingly.
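One quick way to check the source type, as a sketch against system.columns:

SELECT name, type
FROM system.columns
WHERE database = 'ratelimits' AND table = 'raw_ratelimits_v1' AND name = 'time';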

go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)

12-23: Verify source column passed exists and semantics.

Confirm ratelimits_per_month_v2 exposes a numeric passed compatible with sum(passed) and that summing across its dimensions matches billing rules.

go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (2)

13-26: Confirm source fields and filter semantics.

Ensure key_verifications_per_month_v2 has outcome and count with expected types; validate that billing should include only outcome = 'VALID' (case, enum). If outcome is enum, compare by enum value to avoid string drift.


23-26: Group by columns must match SELECT after schema change.

If you adopt time Date, adjust MV to group by time instead of (year, month).

go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)

27-36: Time bucketing from ms epoch—confirm unit.

fromUnixTimestamp64Milli(time) assumes ms-epoch in api_requests_raw_v2.time. Validate source type.
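A rough sanity check, assuming some rows already exist in the raw table: ms-epoch values should convert to plausible recent timestamps, while second-epoch values would land decades in the past.

SELECT
  min(fromUnixTimestamp64Milli(time)) AS earliest,
  max(fromUnixTimestamp64Milli(time)) AS latest
FROM api_requests_raw_v2;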

go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1)

27-35: Idempotent rollup path looks good.

Hourly MV groups correctly by all dims and buckets via toStartOfHour(time).

go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1)

27-44: Daily aggregation is consistent.

Correct bucketing to day and grouping over all dims.

go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1)

27-35: Monthly rollup LGTM.

Buckets to month start and groups over all dims.

go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1)

33-55: Incorrect — no duplicate MV found; only defined in go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql

rg output shows the CREATE TABLE api_requests_raw_v2 and CREATE MATERIALIZED VIEW temp_sync_metrics_v1_to_v2 only in that file; the duplicate alleged in 20250911070454.sql was not found. Use IF NOT EXISTS and fully-qualified names only if you intentionally add a second CREATE.

Likely an incorrect or invalid review comment.

go/pkg/clickhouse/migrations/20250911070454.sql (1)

309-311: Backfill and decommission plan for temp sync MVs.

Temp sync views target only new inserts (no POPULATE). Ensure backfill jobs cover historical data, and add explicit DROP steps in the migration plan once cutover is done to avoid silent double-writes.

Would you like a one-off backfill SQL for ratelimits_last_used_v2 that unions v1/v2 raw tables and computes MAX(time) per (workspace_id, namespace_id, identifier)?
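For reference, a rough sketch of that backfill, assuming both raw tables expose time, workspace_id, namespace_id, and identifier, and that the v2 table accepts a plain max value for time:

-- One-off backfill sketch; run once before dropping the temp sync MVs.
INSERT INTO default.ratelimits_last_used_v2 (time, workspace_id, namespace_id, identifier)
SELECT
  max(time) AS time,
  workspace_id,
  namespace_id,
  identifier
FROM
(
  SELECT time, workspace_id, namespace_id, identifier FROM ratelimits.raw_ratelimits_v1
  UNION ALL
  SELECT time, workspace_id, namespace_id, identifier FROM default.ratelimits_raw_v2
)
GROUP BY workspace_id, namespace_id, identifier;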

Comment on lines +63 to +83
-- Create "api_requests_raw_v2" table
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;

🛠️ Refactor suggestion

Unify raw-table shapes across domains.

All three raw_v2 tables should align on: PARTITION BY day, time CODEC(Delta(8), LZ4), index_granularity, and dedup settings. Today only api_requests_raw_v2 sets index_granularity; none set PARTITION BY.

I can generate a consolidated patch if you confirm daily partitioning is desired for all raw tables.

Also applies to: 176-194, 259-271

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63-83 (and
similarly for blocks at 176-194 and 259-271), the raw_v2 tables are not using a
unified table shape: add a daily PARTITION BY expression (partition by day
derived from the time column), ensure the time column keeps CODEC(Delta(8),
LZ4), and make the table SETTINGS consistent across all raw_v2 tables by setting
the same index_granularity and non_replicated_deduplication_window values; apply
the same primary key / ORDER BY ordering and TTL pattern to match the other
raw_v2 tables so all three share identical partitioning, codec,
index_granularity, and dedup settings.

Comment on lines +64 to +83
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;

⚠️ Potential issue

Raw table missing PARTITION BY; will create a single huge partition.

api_requests_raw_v2 here has no PARTITION BY, unlike the schema file. Add daily partitioning for manageability and TTL efficiency.

 ) ENGINE = MergeTree
+PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
 PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;

Apply the same fix to key_verifications_raw_v2 (Line 193-194 block) and ratelimits_raw_v2 (Line 270-271 block).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 64-83, the
CREATE TABLE for api_requests_raw_v2 lacks a PARTITION BY clause which will
create one huge partition; add a daily PARTITION BY expression based on the time
column (e.g., partition by the day derived from the Int64 time/timestamp) so
partitions are created per day and TTLs work efficiently; update the CREATE
TABLE definitions for key_verifications_raw_v2 (around lines 193-194) and
ratelimits_raw_v2 (around lines 270-271) the same way to use daily partitioning
derived from their time columns.

Comment on lines +195 to +202
-- Create "ratelimits_last_used_v2" table
CREATE TABLE `default`.`ratelimits_last_used_v2` (
`time` Int64,
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;

⚠️ Potential issue

ratelimits_last_used_v2 is incorrectly modeled; maxSimpleState() is invalid for the declared column type.

With AggregatingMergeTree, either store AggregateFunction(max, Int64) with maxState(), or use SimpleAggregateFunction(max, Int64) and insert raw values. Current DDL (Int64 + maxSimpleState) won’t merge correctly.

Use SimpleAggregateFunction variant:

-CREATE TABLE `default`.`ratelimits_last_used_v2` (
-  `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+  `time` SimpleAggregateFunction(max, Int64),
   `workspace_id` String,
   `namespace_id` String,
   `identifier` String
 ) ENGINE = AggregatingMergeTree
 PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
@@
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT workspace_id, namespace_id, identifier, time
+FROM ratelimits.raw_ratelimits_v1;

Option B (AggregateFunction) if preferred:

-  `time` Int64,
+  `time` AggregateFunction(max, Int64),
@@
-CREATE MATERIALIZED VIEW ... AS SELECT ..., maxSimpleState(time) AS time ...
+CREATE MATERIALIZED VIEW ... AS SELECT ..., maxState(time) AS time ...

Also applies to: 296-297

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195 to 202 (also
applies to lines 296-297), the table is declared with `time Int64` but the
AggregatingMergeTree is using maxSimpleState() which is invalid for a plain
Int64 column; fix by changing the column definition to a matching aggregate
type: either make the column SimpleAggregateFunction(max, Int64) and continue
inserting raw Int64 values, or change it to AggregateFunction(max, Int64) and
ensure you insert maxState() values (and adjust any insert/upsert SQL
accordingly); update the DDL to use one of these two consistent patterns and
verify the corresponding insert logic in the migration is updated to match.

Comment on lines +273 to +275
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2;
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2;

🧹 Nitpick (assertive)

ReplacingMergeTree without version may accumulate duplicates until merges.

active_workspaces_* MVs insert potentially duplicate PK rows. ReplacingMergeTree will collapse them eventually, but if write volume is high, consider AggregatingMergeTree with SimpleAggregateFunction(anyLast, Date) or add a version column.
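As a sketch of the version-column variant, assuming the target table only carries (time, workspace_id); the updated_at column and its name are hypothetical, and the feeding MVs would need to emit it as well:

CREATE TABLE active_workspaces_per_month_v2 (
  time Date,
  workspace_id String,
  -- hypothetical version column used by ReplacingMergeTree to keep the newest row
  updated_at DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY (time, workspace_id);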

Comment on lines +281 to +285
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`billable_ratelimits_per_month_mv_v2` TO `default`.`billable_ratelimits_per_month_v2` AS SELECT workspace_id, sum(passed) AS count, toYear(time) AS year, toMonth(time) AS month FROM default.ratelimits_per_month_v2 GROUP BY workspace_id, year, month;

🧹 Nitpick (assertive)

Make migration idempotent.

Consider IF NOT EXISTS on MVs to survive partial runs/retries, especially while temp sync views exist in both schema and migrations.

-CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` ...
+CREATE MATERIALIZED VIEW IF NOT EXISTS `default`.`api_requests_per_minute_mv_v2` ...

Apply similarly to other CREATE MATERIALIZED VIEW statements.

Also applies to: 287-295, 299-305

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 281 to 285 (and
likewise for the other CREATE MATERIALIZED VIEW ranges at 287-295 and 299-305),
the CREATE MATERIALIZED VIEW statements are not idempotent and will fail on
partial runs/retries; update each CREATE MATERIALIZED VIEW to include IF NOT
EXISTS so the migration can safely be re-run when temp/partial views already
exist, and apply the same change to all other CREATE MATERIALIZED VIEW
statements mentioned in the comment.

Comment on lines +11 to +12
workspace_id,
toDate (time) as time

🧹 Nitpick (assertive)

Nit: normalize spacing.

toDate (time) → toDate(time).

-  toDate (time) as time
+  toDate(time) as time

Also applies to: 18-19

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql around lines
11-12 (and also at lines 18-19), normalize the spacing in the function call by
changing occurrences of "toDate (time)" to "toDate(time)"; update those two
places so the function name and its parentheses have no space between them.

Comment on lines +1 to +10
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);

🧹 Nitpick (assertive)

Align monthly schema with other v2 tables (use time Date).

Other monthly tables (e.g., api_requests_per_month_v2) model month as time Date (month start). Using (year, month) here introduces inconsistency. Prefer time Date for query ergonomics and uniformity.

-CREATE TABLE billable_ratelimits_per_month_v2 (
-  year Int16,
-  month Int8,
-  workspace_id String,
-  count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
-  (year, month)
-ORDER BY
-  (workspace_id, year, month);
+CREATE TABLE billable_ratelimits_per_month_v2 (
+  time Date,
+  workspace_id String,
+  count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
CREATE TABLE billable_ratelimits_per_month_v2 (
time Date,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 1-10, the table models month as separate year Int16 and month Int8 columns
which is inconsistent with other v2 monthly tables; change the schema to use a
single time Date column representing the month start (drop year and month
columns), keep count Int64, and update ENGINE/PARTITION BY/ORDER BY to partition
by time and ORDER BY (workspace_id, time) to match the other monthly v2 tables
for consistent querying and ergonomics.

Comment on lines +9 to +10
ORDER BY
(workspace_id, year, month);

🧹 Nitpick (assertive)

Ordering to match partition for better pruning.

Given PARTITION BY (year, month), ordering as (year, month, workspace_id) aids range scans by month.

-ORDER BY
-  (workspace_id, year, month);
+ORDER BY
+  (year, month, workspace_id);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ORDER BY
(workspace_id, year, month);
ORDER BY
(year, month, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 9 to 10, the ORDER BY currently uses (workspace_id, year, month) which
doesn't match the PARTITION BY (year, month); change the ORDER BY to (year,
month, workspace_id) so the ordering aligns with the partitioning and improves
range pruning for month-based queries.

Comment on lines +12 to +23
CREATE MATERIALIZED VIEW billable_ratelimits_per_month_mv_v2 TO billable_ratelimits_per_month_v2 AS
SELECT
workspace_id,
sum(passed) AS count,
toYear (time) AS year,
toMonth (time) AS month
FROM
ratelimits_per_month_v2
GROUP BY
workspace_id,
year,
month;

🧹 Nitpick (assertive)

Minor: trim extra spaces in function calls.

toYear (time) → toYear(time), toMonth (time) → toMonth(time).

-  toYear (time) AS year,
-  toMonth (time) AS month
+  toYear(time) AS year,
+  toMonth(time) AS month
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE MATERIALIZED VIEW billable_ratelimits_per_month_mv_v2 TO billable_ratelimits_per_month_v2 AS
SELECT
workspace_id,
sum(passed) AS count,
toYear (time) AS year,
toMonth (time) AS month
FROM
ratelimits_per_month_v2
GROUP BY
workspace_id,
year,
month;
CREATE MATERIALIZED VIEW billable_ratelimits_per_month_mv_v2 TO billable_ratelimits_per_month_v2 AS
SELECT
workspace_id,
sum(passed) AS count,
toYear(time) AS year,
toMonth(time) AS month
FROM
ratelimits_per_month_v2
GROUP BY
workspace_id,
year,
month;
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 12 to 23, there are extra spaces inside the function calls `toYear (time)`
and `toMonth (time)`; remove the space so they read `toYear(time)` and
`toMonth(time)` to follow consistent formatting and avoid style issues.

Comment on lines +1 to +11
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);

🧹 Nitpick (assertive)

Unify month key shape with v2 (prefer time Date).

Mirror 016_api_requests_per_month_v2 for consistency: time Date (month start) instead of (year, month).

-CREATE TABLE billable_verifications_per_month_v2 (
-  year Int16,
-  month Int8,
-  workspace_id String,
-  count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
-  (year, month)
-ORDER BY
-  (workspace_id, year, month);
+CREATE TABLE billable_verifications_per_month_v2 (
+  time Date,
+  workspace_id String,
+  count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
time Date,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, replace the separate year and month columns with a single time
Date column (representing the month start) to match v2 conventions; update the
PARTITION BY to use the time month key (e.g. toYYYYMM(time) or the same
partition expression used in 016_api_requests_per_month_v2) and change ORDER BY
to (workspace_id, time) so the table schema and materialized view shape mirror
016_api_requests_per_month_v2.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 24

♻️ Duplicate comments (3)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (3)

15-23: MV target and state function now consistent—good; keep it.

With the above table change, maxSimpleState(time) matches the column type. No DB-qualification mismatch anymore.


1-11: Column order (dims first) improves locality.

Optional: move time after keys in ORDER BY/table for readability; not functionally required.

- (time SimpleAggregateFunction(max, Int64), workspace_id, namespace_id, identifier)
+ (workspace_id String, namespace_id String, identifier String, time SimpleAggregateFunction(max, Int64))

1-10: AggregatingMergeTree requires aggregate states; fix time type.

Table defines time Int64 but MV inserts maxSimpleState(time) (aggregate state). Use SimpleAggregateFunction(max, Int64) (or switch MV to maxState and use AggregateFunction).

 CREATE TABLE IF NOT EXISTS ratelimits_last_used_v2
 (
-  time          Int64,
+  time          SimpleAggregateFunction(max, Int64),
   workspace_id  String,
   namespace_id  String,
   identifier    String,
 )
 ENGINE = AggregatingMergeTree()
 ORDER BY (workspace_id, namespace_id, identifier)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf1c68b and c59ab80.

⛔ Files ignored due to path filters (2)
  • go/pkg/clickhouse/migrations/atlas.sum is excluded by !**/*.sum
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (10)
  • go/pkg/clickhouse/migrations/20250911070454.sql (1 hunks)
  • go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test API / API Test Local
🔇 Additional comments (5)
go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (2)

25-31: Dedup setting on non-replicated MergeTree requires tokens.

non_replicated_deduplication_window only dedups when inserts carry tokens. Verify your writers (and this MV) set insert_deduplication_token; otherwise this setting has no effect.
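For illustration, a writer-side insert carrying a dedup token might look roughly like this; the values are placeholders, and the token would typically be derived from the request id or batch id:

INSERT INTO api_requests_raw_v2 (request_id, time, workspace_id, host, method, path, response_status)
SETTINGS insert_deduplication_token = 'req_example_1'
VALUES ('req_example_1', 1726000000000, 'ws_example', 'api.example.com', 'POST', '/v2/keys.verifyKey', 200);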


33-56: Migration MV: add explicit column order and note drop plan.

Column order matches now—good. Ensure an explicit DROP of this temp MV is included in the migration plan once writers switch to v2.
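A cutover cleanup sketch, using the temp MV name from the schema file above:

DROP VIEW IF EXISTS temp_sync_metrics_v1_to_v2;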

go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)

12-23: Optional: add billable filter parity check.

If “billable” should only include passed ratelimits, sum(passed) is fine. Verify source schema so passed is a numeric (not a state) and covers all billable outcomes.

go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)

15-28: Answer Flo’s concern: is v1 “last used” still required?

You add v2; confirm whether any readers still depend on ratelimits_last_used_v1 and whether a temporary MV should backfill/keep it hot until cutover.

go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)

1-26: TTL and partition look sane for minute granularity.

7-day TTL with daily partitions is reasonable.

Comment on lines +63 to +83
-- Create "api_requests_raw_v2" table
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;

🧹 Nitpick (assertive)

Consider stronger compression for large blobs.

request_body/response_body can be sizeable; consider ZSTD for better compression trade-offs.

-`request_body` String,
+`request_body` String CODEC(ZSTD(3)),
 ...
-`response_body` String,
+`response_body` String CODEC(ZSTD(3)),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63 to 83, the
request_body and response_body columns are currently plain String which can be
large; update their column definitions to use stronger compression (e.g.,
CODEC(ZSTD(3)) or ZSTD(6)) to reduce storage and IO. Concretely, add
CODEC(ZSTD(<level>)) to request_body and response_body (and consider applying
the same to request_headers/response_headers if they can be large), choosing an
appropriate ZSTD level for the desired CPU/size trade-off.

Comment on lines +195 to +203
-- Create "ratelimits_last_used_v2" table
CREATE TABLE `default`.`ratelimits_last_used_v2` (
`time` Int64,
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
-- Create "ratelimits_per_day_v2" table

⚠️ Potential issue

ratelimits_last_used_v2 type/state mismatch; MV can’t insert AggregateFunction into Int64.

The table defines time Int64 under AggregatingMergeTree, but the MV emits maxSimpleState(time) (AggregateFunction). Use SimpleAggregateFunction(max, Int64) and plain max(time) in the MV.

-CREATE TABLE `default`.`ratelimits_last_used_v2` (
-  `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+  `time` SimpleAggregateFunction(max, Int64),
   `workspace_id` String,
   `namespace_id` String,
   `identifier` String
 ) ENGINE = AggregatingMergeTree
 PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
-SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT max(time) AS time, workspace_id, namespace_id, identifier FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;

Also applies to: 297-297

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195-203 (also
applies at line 297), the `time` column is declared as Int64 but the
materialized view emits AggregateFunction state (maxSimpleState(time)), causing
a type/state mismatch; change the `time` column type to
SimpleAggregateFunction(max, Int64) in the `ratelimits_last_used_v2` table
definition and update the MV to emit a plain aggregate result (use max(time) in
the INSERT/select) instead of the state function so the MV inserts a compatible
value into the SimpleAggregateFunction column.

Comment on lines +272 to +276
-- Create "active_workspaces_keys_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2;
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2;
-- Create "api_requests_per_day_mv_v2" view

🧹 Nitpick (assertive)

ReplacingMergeTree without version: duplicates require FINAL/OPTIMIZE.

active_workspaces_* MVs may insert the same PK multiple times per month. ReplacingMergeTree without a version is fine, but query paths should avoid FINAL in hot paths. If dedup timeliness matters, add a version column.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 272 to 276, the
materialized views active_workspaces_keys_per_month_mv_v2 and
active_workspaces_ratelimits_per_month_mv_v2 can insert duplicate primary keys
into a ReplacingMergeTree without a version, which forces use of FINAL/OPTIMIZE
to dedupe; add a version column to the target table and to these views' SELECTs
(e.g. a bigint or datetime like now64() or toUnixTimestamp(time)) and change the
target table to ReplacingMergeTree(version) so deduplication is deterministic
and can happen during merges without relying on FINAL; alternatively, if adding
a version is not acceptable, explicitly document that any hot-path queries must
use FINAL or run periodic OPTIMIZE FINAL, but the preferred fix is adding and
populating a version column in the MVs and altering the table definition to
ReplacingMergeTree(version).

Comment on lines +277 to +284
CREATE MATERIALIZED VIEW `default`.`api_requests_per_day_mv_v2` TO `default`.`api_requests_per_day_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfDay(time)) AS time FROM default.api_requests_per_hour_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time FROM default.api_requests_per_minute_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view

⚠️ Potential issue

All API request MVs have column-order mismatches (time must be first).

Each TO … MV must emit columns in the exact order of the target table: time, workspace_id, path, response_status, host, method, count. Reorder SELECTs accordingly.

-CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
-SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, path, response_status, host, method, sum(count) AS count
 FROM default.api_requests_per_minute_v2
-GROUP BY workspace_id, path, response_status, host, method, time;
+GROUP BY time, workspace_id, path, response_status, host, method;

Apply the same reordering to:

  • api_requests_per_day_mv_v2 (use toDate(toStartOfDay(time)) AS time first)
  • api_requests_per_minute_mv_v2 (use toStartOfMinute(…) AS time first)
  • api_requests_per_month_mv_v2 (use toDate(toStartOfMonth(time)) AS time first)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE MATERIALIZED VIEW `default`.`api_requests_per_day_mv_v2` TO `default`.`api_requests_per_day_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfDay(time)) AS time FROM default.api_requests_per_hour_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time FROM default.api_requests_per_minute_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_day_mv_v2` TO `default`.`api_requests_per_day_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfDay(time)) AS time FROM default.api_requests_per_hour_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS SELECT toStartOfHour(time) AS time, workspace_id, path, response_status, host, method, sum(count) AS count FROM default.api_requests_per_minute_v2 GROUP BY time, workspace_id, path, response_status, host, method;
-- Create "api_requests_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 277 to 284, the
materialized view SELECTs emit columns in the wrong order relative to their
target tables; every TO target expects columns in the exact order: time,
workspace_id, path, response_status, host, method, count. Fix by reordering each
SELECT to project the time expression first (e.g., toDate(toStartOfDay(time)) AS
time, toStartOfHour(time) AS time, toStartOfMinute(...) AS time,
toDate(toStartOfMonth(time)) AS time), followed by workspace_id, path,
response_status, host, method, and then count so the SELECT column order matches
the target table exactly.

Comment on lines +289 to +296
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfDay(time)) AS time FROM default.key_verifications_per_hour_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time FROM default.key_verifications_per_minute_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_minute_mv_v2` TO `default`.`key_verifications_per_minute_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, count(*) AS count, sum(spent_credits) AS spent_credits, avgState(latency) AS latency_avg, quantilesTDigestState(0.75)(latency) AS latency_p75, quantilesTDigestState(0.99)(latency) AS latency_p99, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.key_verifications_raw_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_month_mv_v2` TO `default`.`key_verifications_per_month_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfMonth(time)) AS time FROM default.key_verifications_per_day_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "ratelimits_last_used_mv_v2" view

⚠️ Potential issue

Key verifications MVs: SELECT order must match table order.

Target tables start with time, then workspace_id, key_space_id, identity_id, key_id, outcome, tags, then aggregates. Reorder the SELECTs.

Example (hourly):

-CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
-SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99
 FROM default.key_verifications_per_minute_v2
-GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
+GROUP BY time, workspace_id, key_space_id, identity_id, key_id, outcome, tags;

Mirror this for minute/day/month MVs.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfDay(time)) AS time FROM default.key_verifications_per_hour_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time FROM default.key_verifications_per_minute_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_minute_mv_v2` TO `default`.`key_verifications_per_minute_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, count(*) AS count, sum(spent_credits) AS spent_credits, avgState(latency) AS latency_avg, quantilesTDigestState(0.75)(latency) AS latency_p75, quantilesTDigestState(0.99)(latency) AS latency_p99, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.key_verifications_raw_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_month_mv_v2` TO `default`.`key_verifications_per_month_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfMonth(time)) AS time FROM default.key_verifications_per_day_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "ratelimits_last_used_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfDay(time)) AS time FROM default.key_verifications_per_hour_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS SELECT toStartOfHour(time) AS time, workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99 FROM default.key_verifications_per_minute_v2 GROUP BY time, workspace_id, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_minute_mv_v2` TO `default`.`key_verifications_per_minute_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, count(*) AS count, sum(spent_credits) AS spent_credits, avgState(latency) AS latency_avg, quantilesTDigestState(0.75)(latency) AS latency_p75, quantilesTDigestState(0.99)(latency) AS latency_p99, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.key_verifications_raw_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_month_mv_v2` TO `default`.`key_verifications_per_month_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfMonth(time)) AS time FROM default.key_verifications_per_day_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "ratelimits_last_used_mv_v2" view
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 289-296, the
SELECT column order in the materialized views does not match the target table
column order (which expects time first, then workspace_id, key_space_id,
identity_id, key_id, outcome, tags, then aggregates); reorder each SELECT so
time is the first selected column followed by workspace_id, key_space_id,
identity_id, key_id, outcome, tags, and then the aggregate expressions for
count, spent_credits, latency_avg, latency_p75, latency_p99 for all four MVs
(per_minute, per_hour, per_day, per_month), preserving the existing time
expressions (toStartOfMinute(...), toStartOfHour(...),
toDate(toStartOfDay(...)), toDate(toStartOfMonth(...))) and GROUP BY the same
reordered columns.
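
For reference, a minimal sketch of the reordered per-day MV, assuming the target table declares time as its first column (the minute/hour/month MVs would follow the same pattern):

-- Sketch only: column order assumed from the target table; verify against the actual v2 DDL.
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS
SELECT
  toDate(toStartOfDay(time)) AS time,
  workspace_id,
  key_space_id,
  identity_id,
  key_id,
  outcome,
  tags,
  sum(count) AS count,
  sum(spent_credits) AS spent_credits,
  avgMergeState(latency_avg) AS latency_avg,
  quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75,
  quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99
FROM default.key_verifications_per_hour_v2
GROUP BY time, workspace_id, key_space_id, identity_id, key_id, outcome, tags;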

Comment on lines +1 to +11
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);


🛠️ Refactor suggestion

Make count type consistent with other v2 aggregates.

Align on SimpleAggregateFunction(sum, Int64) or stick to plain Int64 everywhere; avoid mixing.

-  count Int64
+  count SimpleAggregateFunction(sum, Int64)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql lines 1-11,
the count column is currently Int64 but other v2 aggregate tables use
SimpleAggregateFunction(sum, Int64); update the count column type to
SimpleAggregateFunction(sum, Int64) so it matches the v2 aggregate convention
(i.e., change "count Int64" to "count SimpleAggregateFunction(sum, Int64)"),
preserving the rest of the DDL as-is.
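
Whichever type is chosen, SummingMergeTree merges parts asynchronously, so reads should still aggregate explicitly rather than rely on fully merged rows. A small query sketch (the workspace id is made up for illustration):

-- Aggregate at read time; unmerged parts may still hold multiple rows per key.
SELECT workspace_id, year, month, sum(count) AS count
FROM billable_ratelimits_per_month_v2
WHERE workspace_id = 'ws_123' -- hypothetical id
GROUP BY workspace_id, year, month;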

Comment on lines +2 to +6
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()

⚠️ Potential issue

Use unsigned year/month or cast (type mismatch).

Same as verifications: toYear()/toMonth() return unsigned types (UInt16/UInt8).

-  year Int16,
-  month Int8,
+  year UInt16,
+  month UInt8,

or cast in MV.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
year UInt16,
month UInt8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 2 to 6, the year and month columns are Int16/Int8 but ClickHouse
toYear/toMonth return UInt16/UInt8, causing a type mismatch; update the schema
to use unsigned types (UInt16 for year, UInt8 for month) or explicitly cast the
toYear/toMonth expressions to Int16/Int8 in the materialized view definition so
the MV column types and source expressions match.
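
To see the mismatch and the cast fix without touching any tables, a quick sanity check in clickhouse-client:

-- toYear()/toMonth() return unsigned types:
SELECT
  toTypeName(toYear(now()))  AS year_type,  -- UInt16
  toTypeName(toMonth(now())) AS month_type; -- UInt8

-- Explicit casts if the columns stay signed:
SELECT
  CAST(toYear(now())  AS Int16) AS year,
  CAST(toMonth(now()) AS Int8)  AS month;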

Comment on lines +1 to +11
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);

🧹 Nitpick (assertive)

Prefer a single time Date column over separate year/month (TTL + uniformity).

Most v2 tables (e.g., api_requests_per_month_v2) bucket on a Date and gain clear TTLs. Consider switching to time Date with ORDER BY (workspace_id, time) for uniform querying and retention.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, the table currently uses separate year and month columns, which
prevents Date-based TTLs and is inconsistent with other v2 monthly tables;
change the schema to use a single Date column (e.g., time Date) instead of
year/month, update PARTITION BY/ORDER BY accordingly (e.g., PARTITION BY
toYYYYMM(time) to keep monthly partitions, and ORDER BY (workspace_id, time)),
and update any dependent materialized views/ingest queries to populate the new
time Date column from the original timestamp or from the year/month values so
retention and querying align with other v2 tables.
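
A sketch of the Date-bucketed alternative; the TTL interval here is an assumption for illustration, not something this PR specifies:

CREATE TABLE billable_verifications_per_month_v2 (
  time Date,
  workspace_id String,
  count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY toYYYYMM(time)
ORDER BY (workspace_id, time)
TTL time + INTERVAL 5 YEAR; -- assumed retention window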

Comment on lines +2 to +11
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);

🛠️ Refactor suggestion

Align count type with other v2 tables for consistency.

Other v2 monthly tables use SimpleAggregateFunction(sum, Int64). Consider aligning to avoid mixed patterns.

-  count Int64
+  count SimpleAggregateFunction(sum, Int64)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 2 to 11, the count column is currently Int64 but other v2 monthly tables
use SimpleAggregateFunction(sum, Int64); change the count column type to
SimpleAggregateFunction(sum, Int64) so the schema is consistent, and keep the
SummingMergeTree engine and partition/order clauses as-is to allow ClickHouse to
aggregate sums correctly.

Comment on lines +3 to +6
year Int16,
month Int8,
workspace_id String,
count Int64

⚠️ Potential issue

Use unsigned year/month types or cast in MV (type mismatch).

toYear() returns UInt16 and toMonth() returns UInt8, but columns are Int16/Int8. Either switch columns to unsigned or CAST in the MV.

Apply one of:

-  year Int16,
-  month Int8,
+  year UInt16,
+  month UInt8,

or

-  toYear (time) AS year,
-  toMonth (time) AS month
+  CAST(toYear(time) AS Int16)  AS year,
+  CAST(toMonth(time) AS Int8)  AS month
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
year Int16,
month Int8,
workspace_id String,
count Int64
year UInt16,
month UInt8,
workspace_id String,
count Int64
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 3 to 6, the MV defines year as Int16 and month as Int8 but
toYear()/toMonth() produce UInt16/UInt8 causing a type mismatch; fix by either
changing the column types to unsigned (UInt16 for year, UInt8 for month) or by
casting the toYear()/toMonth() expressions in the MV to the signed types
currently declared (i.e., add explicit CASTs), and ensure the chosen approach is
applied consistently in the MV definition and any INSERT/SELECTs that populate
these columns.

@chronark chronark merged commit db5041b into main Sep 11, 2025
24 of 25 checks passed
@chronark chronark deleted the clickhouse-schemas branch September 11, 2025 09:20