feat: clickhouse schemas v2 #3914
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Removes Goose-based migrations and legacy SQL under go/pkg/clickhouse/schema/databases/*. Introduces v2 ClickHouse schemas/tables, materialized views, and a Dockerized initialization flow. Adds array utilities (Fill, Map, Reduce, Random) with tests. Updates CI/workflows, compose, Makefiles, and the test harness. Adds migration tooling and comprehensive ClickHouse integration tests.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant App as Producer Services
participant CH as ClickHouse
participant Raw as raw_v2 tables
participant MVmin as per_minute_v2 MV
participant MVhr as per_hour_v2 MV
participant MVday as per_day_v2 MV
participant MVmon as per_month_v2 MV
participant Biz as active_workspaces_v2 / billable_*_v2
App->>CH: INSERT events (api_requests_raw_v2 / key_verifications_raw_v2 / ratelimits_raw_v2)
CH-->>Raw: Store raw rows
Note over Raw, MVmin: New v2 MVs
Raw-->>MVmin: Aggregate per minute (states)
MVmin-->>MVhr: Aggregate per hour (merge states)
MVhr-->>MVday: Aggregate per day (merge)
MVday-->>MVmon: Aggregate per month (merge)
MVmon-->>Biz: Populate active_workspaces_v2 and billable_*_v2 via MVs
```

```mermaid
sequenceDiagram
autonumber
participant Src as v1 sources (raw_*_v1)
participant TempMV as temp_sync_*_v1_to_v2 (MVs)
participant CH as ClickHouse
participant Migr as ch_migrate_v2_simple.ts
Src-->>TempMV: New inserts mirrored to v2 raw
Migr->>CH: Loop over 6h windows (backfill)
Migr->>CH: INSERT INTO *_raw_v2 SELECT ... FROM *_raw_v1 WHERE time in window
CH-->>CH: MVs process to per-minute/hour/day/month v2
```

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Thank you for following the naming conventions for pull request titles! 🙏
@perkinsjr @Flo4604 pls :)
Actionable comments posted: 98
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
deployment/init-clickhouse.sh (1)
1-19: Harden init-clickhouse.sh

- Enable strict mode (`set -euo pipefail`) and `shopt -s nullglob` to avoid glob edge cases
- Wait for ClickHouse readiness (until `SELECT 1` succeeds) before applying schemas
- Gather and version-sort `.sql` files deterministically via `find … -sort -zV`
- Use `${CLICKHOUSE_ADMIN_USER:-default}` and `${CLICKHOUSE_ADMIN_PASSWORD:-}` to avoid leaking unset envs

go/pkg/clickhouse/schema/000_oldschema.sql (1)
720-823: Duplication/conflict risk between schema bootstrap and migrations

This file recreates DBs/tables also defined in migrations (e.g., billing/verifications/ratelimits). Ensure only one path runs in CI/infra to avoid DDL conflicts.
Would you like a follow-up PR to remove the legacy bootstrap or gate it behind an environment flag?
Also applies to: 1-2
tools/migrate/package.json (1)
10-21: Runtime dependency should not be a devDependency

The migration script imports the ClickHouse client at runtime via tsx. Move it to dependencies to avoid failures in prod installs with NODE_ENV=production.
"dependencies": { + "@clickhouse/client": "^1.4.0", @@ - "devDependencies": { - "@clickhouse/client-web": "^1.11.1", + "devDependencies": {If you indeed need the web client, put that under dependencies instead and remove the Node client.
Also applies to: 22-29
deployment/docker-compose.yaml (2)
126-142: Healthcheck only proves server liveness, not schema readiness.

Consider validating a known table or schema version to avoid a race with tests depending on v2 objects.
Example:
```diff
-    test:
-      [
-        "CMD",
-        "clickhouse-client",
-        "--host",
-        "localhost",
-        "--user",
-        "default",
-        "--password",
-        "password",
-        "--query",
-        "SELECT 1",
-      ]
+    test: ["CMD-SHELL", "clickhouse-client --user default --password password --query \"SELECT count() FROM system.tables WHERE database = 'analytics' AND name = 'api_requests_v2'\" | grep -q '1'"]
```
146-155: MinIO console port mapping mismatch.

MINIO_CONSOLE_PORT_NUMBER=3903 but ports map 2903. Console likely unreachable.
```diff
-      - 2903:2903
+      - 3903:3903
```

go/pkg/clickhouse/ratelimits_test.go (1)
1-316: Define missing min/max helpers: The test calls min and max but no such functions are defined in the repo; add implementations (e.g. `func min(a, b int) int { … }`, `func max(a, b int) int { … }`) or import them from a utility package.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (3)
- go/go.sum is excluded by `!**/*.sum`
- go/pkg/clickhouse/migrations/atlas.sum is excluded by `!**/*.sum`
- pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (104)
.github/workflows/job_test_api_local.yaml
(0 hunks).github/workflows/job_test_unit.yaml
(0 hunks)Makefile
(0 hunks)deployment/Dockerfile.clickhouse
(1 hunks)deployment/docker-compose.yaml
(2 hunks)deployment/init-clickhouse.sh
(1 hunks)go/GO_DOCUMENTATION_GUIDELINES.md
(11 hunks)go/Makefile
(1 hunks)go/go.mod
(1 hunks)go/pkg/array/doc.go
(1 hunks)go/pkg/array/fill.go
(1 hunks)go/pkg/array/fill_test.go
(1 hunks)go/pkg/array/map.go
(1 hunks)go/pkg/array/map_test.go
(1 hunks)go/pkg/array/random.go
(1 hunks)go/pkg/array/random_test.go
(1 hunks)go/pkg/array/reduce.go
(1 hunks)go/pkg/array/reduce_test.go
(1 hunks)go/pkg/clickhouse/README.md
(1 hunks)go/pkg/clickhouse/key_verifications_test.go
(1 hunks)go/pkg/clickhouse/migrations/20250903085516_init.sql
(1 hunks)go/pkg/clickhouse/migrations/20250905101812.sql
(1 hunks)go/pkg/clickhouse/ratelimits_test.go
(1 hunks)go/pkg/clickhouse/schema/000_oldschema.sql
(1 hunks)go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/002_key_verifications_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/007_ratelimits_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/008_ratelimits_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/011_api_requests_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/012_api_requests_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/013_api_requests_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/015_active_workspaces_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/017_billable_verifications_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/databases/000_README.md
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/001_database.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql
(0 hunks)go/pkg/clickhouse/schema/embed.go
(0 hunks)go/pkg/clickhouse/schema/types.go
(1 hunks)go/pkg/clickhouse/testutil_test.go
(1 hunks)internal/clickhouse/Dockerfile
(0 hunks)internal/clickhouse/package.json
(1 hunks)internal/clickhouse/src/testutil.ts
(2 hunks)tools/local/src/main.ts
(1 hunks)tools/migrate/ch_migrate_v2_simple.ts
(1 hunks)tools/migrate/package.json
(1 hunks)
💤 Files with no reviewable changes (59)
- .github/workflows/job_test_unit.yaml
- go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql
- go/pkg/clickhouse/schema/embed.go
- go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql
- go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql
- .github/workflows/job_test_api_local.yaml
- internal/clickhouse/Dockerfile
- go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/022_key_verifications_per_minute_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql
- go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql
- go/pkg/clickhouse/schema/databases/000_README.md
- go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql
- go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql
- go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql
- Makefile
- go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql
- go/pkg/clickhouse/schema/databases/005_business/001_database.sql
- go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql
- go/pkg/clickhouse/schema/databases/001_verifications/021_key_verifications_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql
- go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql
- go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql
- go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql
- go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql
- go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql
- go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/README.md
go/pkg/clickhouse/schema/006_ratelimits_raw_v2.sql
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
go/pkg/clickhouse/migrations/20250903085516_init.sql
go/pkg/clickhouse/schema/003_key_verifications_per_hour_v2.sql
go/pkg/clickhouse/schema/000_oldschema.sql
go/pkg/clickhouse/migrations/20250905101812.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/README.md
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
🧬 Code graph analysis (7)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)KeyPrefix
(16-16)KeyAuthPrefix
(21-21)IdentityPrefix
(28-28)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/clickhouse/schema/types.go (1)
KeyVerificationV2
(6-18)go/pkg/array/random.go (1)
Random
(137-142)go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
tools/local/src/main.ts (2)
tools/local/src/db.ts (1)
prepareDatabase
(13-82)tools/local/src/cmd/api.ts (1)
bootstrapApi
(9-52)
go/pkg/array/fill_test.go (2)
go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/array/random.go (1)
Random
(137-142)
go/pkg/array/reduce_test.go (2)
go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
go/pkg/array/map_test.go (1)
go/pkg/array/map.go (1)
Map
(151-157)
go/pkg/array/random_test.go (2)
go/pkg/array/random.go (1)
Random
(137-142)go/pkg/array/fill.go (1)
Fill
(106-116)
go/pkg/clickhouse/ratelimits_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)RatelimitNamespacePrefix
(25-25)IdentityPrefix
(28-28)KeyPrefix
(16-16)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(106-116)go/pkg/array/random.go (1)
Random
(137-142)go/pkg/clickhouse/schema/types.go (1)
RatelimitV2
(23-31)go/pkg/array/reduce.go (1)
Reduce
(193-199)go/pkg/array/map.go (1)
Map
(151-157)
🪛 golangci-lint (2.2.2)
go/pkg/clickhouse/key_verifications_test.go
19-19: TestKeyVerifications should use t.Cleanup instead of defer
(tparallel)
go/pkg/array/fill_test.go
216-216: SA4010: this result of append is never used, except maybe in other appends
(staticcheck)
go/pkg/array/reduce_test.go
24-24: variable max has same name as predeclared identifier
(predeclared)
36-36: variable min has same name as predeclared identifier
(predeclared)
347-347: variable max has same name as predeclared identifier
(predeclared)
go/pkg/array/map_test.go
339-339: SA4010: this result of append is never used, except maybe in other appends
(staticcheck)
go/pkg/array/random.go
[high] 141-141: G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand)
(gosec)
go/pkg/clickhouse/ratelimits_test.go
18-18: TestRatelimits_ComprehensiveLoadTest should use t.Cleanup instead of defer
(tparallel)
🪛 LanguageTool
go/pkg/clickhouse/README.md
[grammar] ~1-~1: Use correct spacing
Context: # Changing schemas To change a schema, just edit a table in...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~3-~3: There might be a mistake here.
Context: ...just edit a table in the schema folder. Afterwards generate a migration file using atlas. ...
(QB_NEW_EN_OTHER)
[grammar] ~4-~4: There might be a mistake here.
Context: ...s generate a migration file using atlas. Make sure docker is running, so atlas ca...
(QB_NEW_EN)
[grammar] ~5-~5: Use commas correctly
Context: ...using atlas. Make sure docker is running, so atlas can use it to generate migrati...
(QB_NEW_EN_OTHER_ERROR_IDS_33)
[grammar] ~5-~5: Use correct spacing
Context: ...atlas can use it to generate migrations. bash cd go/pkg/clickhouse atlas migrate diff \ --dir "file://migrations" \ --to "file://schema" \ --dev-url "docker://clickhouse/latest"
(QB_NEW_EN_OTHER_ERROR_IDS_5)
go/GO_DOCUMENTATION_GUIDELINES.md
[grammar] ~393-~393: Use correct spacing
Context: ... - [TypeName]
for structs/interfaces - [ConstantName]
for constants ## Best Practices and Ant...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
🪛 markdownlint-cli2 (0.17.2)
go/pkg/clickhouse/README.md
10-10: Hard tabs
Column: 1
(MD010, no-hard-tabs)
11-11: Hard tabs
Column: 1
(MD010, no-hard-tabs)
12-12: Hard tabs
Column: 1
(MD010, no-hard-tabs)
🪛 Checkov (3.2.334)
deployment/Dockerfile.clickhouse
[LOW] 1-8: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[LOW] 1-8: Ensure that a user for the container has been created
(CKV_DOCKER_3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
🔇 Additional comments (15)
go/pkg/clickhouse/testutil_test.go (1)
1-6: Test-only helpers are scoped correctly

Using package clickhouse_test and keeping helpers unexported is appropriate.
tools/local/src/main.ts (1)
51-55: Verify ClickHouse readiness in startContainers

I didn't find any health-check or wait for ClickHouse startup inside startContainers; please confirm it blocks until CH's HTTP port accepts queries. If not, add an explicit readiness step (e.g. run `clickhouse-client --query "SELECT 1"`) after each `await startContainers([...])`.

go/pkg/array/reduce.go (2)
193-199: LGTM: minimal, correct implementation.

Performs a left fold with O(n)/O(1) as advertised.
137-150: Fix non-compiling example in docs.

The "Maximum" example references an undefined max function and may misuse math.MinInt depending on Go version. Suggest an inlined comparator.
```diff
-// max := array.Reduce(numbers, func(a, b int) int { max(a, b) }, math.MinInt) // Maximum
+// max := array.Reduce(numbers, func(a, b int) int {
+//   if a > b { return a }
+//   return b
+// }, math.MinInt)
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240 Timestamp: 2025-08-19T08:53:43.952Z Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/array/fill.go (1)
111-115: Compile error: cannot range over int.

`for i := range length` doesn't compile. Use a counted loop.

```diff
-	for i := range length {
-		slice[i] = generator()
-	}
+	for i := 0; i < length; i++ {
+		slice[i] = generator()
+	}
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/clickhouse/schema/016_billable_ratelimits_per_month_v2.sql (1)
12-23: Invalid suggestion: ratelimits_per_month_v2 only defines a time (DateTime) column, not separate year or month fields, so using toYear(time)/toMonth(time) is correct; the passed column also exists.

Likely an incorrect or invalid review comment.
go/pkg/array/map_test.go (1)
11-200: Solid coverage and properties.

Good breadth: types, immutability, per-element invocation, order, large slices.
go/pkg/array/random_test.go (2)
31-48: Flake note acknowledged; OK as-is.

Probability of all-equal draws is negligible; acceptable for unit scope.
119-121: Use the standard b.N loop; testing.B has no Loop().

This won't compile; replace b.Loop() with a for over b.N.
```diff
-	for b.Loop() {
+	for i := 0; i < b.N; i++ {
 		Random(slice)
 	}
```

Also applies to: 131-133
⛔ Skipped due to learnings
Learnt from: imeyer PR: unkeyed/unkey#3733 File: go/pkg/repeat/every_test.go:383-383 Timestamp: 2025-08-05T15:19:50.563Z Learning: Go 1.24 introduced testing.B.Loop method which simplifies benchmark functions by handling iteration internally, ensuring setup/cleanup runs only once per benchmark count, and preventing compiler optimizations that could affect benchmark accuracy.
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/fill_test.go (1)
1-114: Confirm Fill implementation loop.

Relevant snippet shows `for i := range length`, which is invalid over an int. Ensure Fill uses `for i := 0; i < length; i++ { ... }`.

go/pkg/array/reduce_test.go (1)
272-282: No missing intPtr helper

The helper intPtr(int) *int is defined in map_test.go within the same package array, so reduce_test.go compiles as is.

go/pkg/clickhouse/schema/types.go (2)
55-66: Confirm Time type matches table column type.

Aggregated tables typically use DateTime/DateTime64, but here Time is int64. If you select time directly from CH, scanning into int64 will fail unless you cast (toUnixTimestamp) in queries. Either switch to time.Time or standardize on selecting epoch.

Would you like me to change these fields to time.Time and update queries accordingly?
81-89: Same Time type concern for API requests aggregates.

Align with the chosen convention (either time.Time fields or always casting in SQL).

go/pkg/clickhouse/ratelimits_test.go (1)
41-43: Undefined min/max will not compile unless provided elsewhere.

Either import a helper or implement local minInt/maxInt.
```diff
-	numNamespaces := max(1, min(numRecords/1000, 100))   // 100-1000 namespaces
-	numIdentifiers := max(1, min(numRecords/100, 10000)) // 1000-10000 identifiers
+	numNamespaces := maxInt(1, minInt(numRecords/1000, 100))   // 1-100
+	numIdentifiers := maxInt(1, minInt(numRecords/100, 10000)) // 1-10000
```

Add helpers:
```go
func minInt(a, b int) int { if a < b { return a }; return b }
func maxInt(a, b int) int { if a > b { return a }; return b }
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/accuracy/accuracy_test.go:231-240 Timestamp: 2025-08-19T08:53:43.952Z Learning: Go 1.21+ includes builtin min() and max() functions for ordered types including integers, so min/max calls in Go code should not be flagged as undefined when the project uses Go 1.21 or later.
go/pkg/clickhouse/migrations/20250905101812.sql (1)
281-285: Temp sync MVs depend on v1 sources; creation will fail if sources are absent; also no backfill.

Ensure verifications.raw_key_verifications_v1, metrics.raw_api_requests_v1, ratelimits.raw_ratelimits_v1 exist before creating these MVs, or gate creation. Without POPULATE/backfill, historical data won't migrate.
Proposed approach:
- Create databases/tables if needed, or wrap in deployment steps that guarantee presence.
- Backfill with one-off INSERT INTO ... SELECT ... for historical windows, then enable MVs for tail data.
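A minimal sketch of such a windowed backfill for one of the tables, assuming the v1 and v2 raw tables expose the listed columns and that time is a ms-epoch Int64 (the column list and query-parameter names are illustrative, not taken from the actual DDL):

```sql
-- Hypothetical 6-hour backfill window; re-run with advancing bounds until the
-- MV cutover timestamp is reached, then let the temp_sync_* views cover new inserts.
INSERT INTO key_verifications_raw_v2
SELECT
    request_id,
    time,
    workspace_id,
    key_space_id,
    key_id,
    outcome,
    identity_id,
    tags
FROM verifications.raw_key_verifications_v1
WHERE time >= {window_start:Int64}   -- inclusive lower bound, ms epoch
  AND time <  {window_end:Int64};    -- exclusive upper bound, ms epoch
```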
Review continued from previous batch...
go/pkg/clickhouse/schema/015_active_workspaces_per_month_v2.sql
Review continued from previous batch...
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
ORDER BY (workspace_id, time, key_space_id, outcome)
TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 1 MONTH DELETE
Should this not be INTERVAL 3 MONTH DELETE?
Our retention is 90 days, unless I am dumb. Which is very possible and I don't understand this as much as I thought.
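For reference, a sketch of the 90-day variant, assuming the excerpt above belongs to the key_verifications raw table (the table name is an assumption) and that time stays a ms-epoch Int64:

```sql
-- Hypothetical: align the raw-table retention with the 90-day policy discussed here.
ALTER TABLE key_verifications_raw_v2
    MODIFY TTL toDateTime(fromUnixTimestamp64Milli(time)) + INTERVAL 3 MONTH DELETE;
```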
I left a single comment about a time frame.
Outside of that it looks good.
too-long-tried-to-read but i trust you know what you're doing here 😄
Mmm @perkinsjr's comment will need addressing.. but I still approve.
avgMergeState(latency_avg) as latency_avg,
quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
toStartOfDay(time) AS time
Wouldn't we be better off using toDate(fromUnixTimestamp64Milli(time)) here, since the minutes are not relevant to us at all? Applies to all the per_day schemas.
time here is a DateTime, not an int64. Why would this be better?
We can switch it to Date then, which uses 2 bytes less. It's a micro-optimization, but still: since it's per day, we don't need to store 00:00:00 for each one.
yeah let me give it a shot, I think we'd need to do toDate(toStartOfDay(time)) as time
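A rough sketch of what that could look like in one of the per_day definitions, assuming the per-hour source keeps a DateTime time column (table and column names follow the v2 naming used in this PR but are illustrative); note that toDate() on a DateTime already truncates to the day, so the inner toStartOfDay() is mostly for clarity:

```sql
-- Hypothetical per-day column type in the aggregate table:
--   time Date,
-- and the matching bucket expression in the per-day materialized view:
SELECT
    toDate(toStartOfDay(time)) AS time,  -- DateTime -> start of day -> 2-byte Date
    workspace_id,
    sum(count) AS count
FROM key_verifications_per_hour_v2
GROUP BY
    toDate(toStartOfDay(time)),
    workspace_id;
```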
The only thing I really noticed is that we are missing ratelimits_last_used_v1 in the new schema; do we not need this MV anymore?
Graphite Automations: "Notify author when CI fails" took an action on this PR • (09/11/25). 1 teammate was notified about this PR based on Andreas Thomas's automation.
Actionable comments posted: 16
♻️ Duplicate comments (18)
go/pkg/array/random.go (1)
3-5: Doc/behavior mismatch + gosec G404: seed global PRNG and justify non-crypto use.

Still unseeded and will trip G404 in many pipelines. Seed once and add scoped nolints.

```diff
 import (
-	"math/rand"
+	"math/rand"
+	"time"
 )
+func init() {
+	// Seed global PRNG once for non-cryptographic randomness.
+	//nolint:gosec // G404: math/rand is intentional for non-crypto selection.
+	rand.Seed(time.Now().UnixNano())
+}
@@
-	return slice[rand.Intn(len(slice))]
+	return slice[rand.Intn(len(slice))] //nolint:gosec // G404: intentional non-crypto randomness.
```

Also applies to: 23-28
go/GO_DOCUMENTATION_GUIDELINES.md (1)
392-395: Indentation consistency in cross-reference bullets.

Normalize spacing to match surrounding bullets.

```diff
 10. **Cross-references** using Go's `[Reference]` format:
-  - `[OtherFunc]` for functions
-  - `[TypeName]` for structs/interfaces
-  - `[ConstantName]` for constants
+  - `[OtherFunc]` for functions
+  - `[TypeName]` for structs/interfaces
+  - `[ConstantName]` for constants
```

go/pkg/clickhouse/key_verifications_test.go (7)
25-27: Prefer t.Cleanup over defer for connection teardown.

Ensures proper ordering with subtests and parallel runs.

```diff
-	defer conn.Close()
+	t.Cleanup(func() { require.NoError(t, conn.Close()) })
```
35-35: Gate 1M-row test under -short/CI.

Prevents slow/flaky CI.

```diff
-	numRecords := 1_000_000
+	if testing.Short() {
+		t.Skip("skipping large load test in -short mode")
+	}
+	numRecords := 1_000_000
```
66-69: Fix misleading latency comment.

Distribution is heavy-tailed; values can exceed 100ms.

```diff
-	latency := rand.ExpFloat64()*50 + 1 // 1-100ms base range
+	latency := rand.ExpFloat64()*50 + 1 // ~Exp(mean=50ms); typically <100ms but heavy-tailed
```
113-123: Add per-query timeouts and avoid shared err in parallel subtests.

Use a fresh err variable and a short context inside Eventually to prevent hangs and data races.
- require.EventuallyWithT(t, func(c *assert.CollectT) { - queried := int64(0) - err = conn.QueryRow(ctx, "...").Scan(&queried) - require.NoError(c, err) + require.EventuallyWithT(t, func(c *assert.CollectT) { + ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second) + defer cancel() + queried := int64(0) + qErr := conn.QueryRow(ctxQ, "...").Scan(&queried) + require.NoError(c, qErr) }, time.Minute, time.Second)Also applies to: 167-179, 198-209, 231-240, 255-270, 281-289, 302-310, 327-335
112-124: Capture loop variables in subtests.

Prevent flakiness under parallelization.
```diff
-	for _, table := range []string{...} {
-		t.Run(table, func(t *testing.T) {
+	for _, table := range []string{...} {
+		table := table
+		t.Run(table, func(t *testing.T) {
```

Apply similarly for keyID and identityID closures:
```diff
-	for _, keyID := range keys[:10] {
+	for _, keyID := range keys[:10] {
+		keyID := keyID
 ...
-	for _, identityID := range identities[:10] {
+	for _, identityID := range identities[:10] {
+		identityID := identityID
```

Also applies to: 166-173, 197-204, 230-236, 254-262, 279-286, 301-306, 326-331
258-263: Scan quantiles into float64.

The driver returns Float64; avoid precision loss/scan errors.
- var ( - queriedAvg float64 - queriedP75 float32 - queriedP99 float32 - ) - err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99) + var ( + queriedAvg float64 + queriedP75 float64 + queriedP99 float64 + ) + err = conn.QueryRow(ctx, "...").Scan(&queriedAvg, &queriedP75, &queriedP99)
341-356: Make billing assertion resilient to MV lag.

Poll until MV catches up to avoid flakes.
```diff
-	var queried int64
-	err = conn.QueryRow(ctx, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
-	require.NoError(t, err)
-	require.Equal(t, billableVerifications, queried)
+	require.EventuallyWithT(t, func(c *assert.CollectT) {
+		ctxQ, cancel := context.WithTimeout(ctx, 30*time.Second)
+		defer cancel()
+		var queried int64
+		qErr := conn.QueryRow(ctxQ, "SELECT sum(count) FROM billable_verifications_per_month_v2 WHERE workspace_id = ?;", workspaceID).Scan(&queried)
+		require.NoError(c, qErr)
+		require.Equal(c, billableVerifications, queried)
+	}, time.Minute, time.Second)
```

go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (3)
35-35: Normalize to UTC before day bucketing.

Avoid server-TZ drift.
```diff
-  toDate(toStartOfDay(time)) AS time
+  toDate(toStartOfDay(toTimeZone(time, 'UTC'))) AS time
```
10-10: Counts should be unsigned.

Keep count non-negative and consistent across v2 layers.
```diff
-  count SimpleAggregateFunction(sum, Int64),
+  count SimpleAggregateFunction(sum, UInt64),
```
5-5: Invalid/undesired type for HTTP status; use UInt16.

ClickHouse has no bare Int type; statuses are non-negative.
```diff
-  response_status Int,
+  response_status UInt16,
```

go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (2)
19-36: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Current SELECT order doesn't match table, MergeState uses params, and GROUP BY uses alias.
-CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS -SELECT - workspace_id, - namespace_id, - identifier, - sum(total) as total, - sum(passed) as passed, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfDay(time)) AS time -FROM - ratelimits_per_hour_v2 -GROUP BY - workspace_id, - namespace_id, - time, - identifier; +CREATE MATERIALIZED VIEW ratelimits_per_day_mv_v2 TO ratelimits_per_day_v2 AS +SELECT + toDate(toStartOfDay(time)) AS time, + workspace_id, + namespace_id, + identifier, + sum(passed) AS passed, + sum(total) AS total, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM ratelimits_per_hour_v2 +GROUP BY + toDate(toStartOfDay(time)), + workspace_id, + namespace_id, + identifier;
17-17: TTL style consistency.

Align with toIntervalDay.
```diff
-TTL time + INTERVAL 100 DAY DELETE;
+TTL time + toIntervalDay(100) DELETE;
```

go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1)
19-36: Fix MV: column order, MergeState params, and GROUP BY alias.

Bring SELECT into table order, remove TDigest params, and group by expression.
```diff
-CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
-SELECT
-  workspace_id,
-  namespace_id,
-  identifier,
-  sum(total) as total,
-  sum(passed) as passed,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfMonth(time)) AS time
-FROM
-  ratelimits_per_day_v2
-GROUP BY
-  workspace_id,
-  namespace_id,
-  time,
-  identifier;
+CREATE MATERIALIZED VIEW ratelimits_per_month_mv_v2 TO ratelimits_per_month_v2 AS
+SELECT
+  toDate(toStartOfMonth(time)) AS time,
+  workspace_id,
+  namespace_id,
+  identifier,
+  sum(passed) AS passed,
+  sum(total) AS total,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(latency_p99) AS latency_p99
+FROM ratelimits_per_day_v2
+GROUP BY
+  toDate(toStartOfMonth(time)),
+  workspace_id,
+  namespace_id,
+  identifier;
```

go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (2)
33-57: Fix MV: column order, quantile MergeState params, and GROUP BY alias.

Ensure positional match with table schema; remove TDigest params; group by expression.
-CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS -SELECT - workspace_id, - key_space_id, - identity_id, - key_id, - outcome, - tags, - sum(count) as count, - sum(spent_credits) as spent_credits, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfDay(time)) AS time -FROM - key_verifications_per_hour_v2 -GROUP BY - workspace_id, - time, - key_space_id, - identity_id, - key_id, - outcome, - tags -; +CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v2 TO key_verifications_per_day_v2 AS +SELECT + toDate(toStartOfDay(time)) AS time, + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags, + sum(count) AS count, + sum(spent_credits) AS spent_credits, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM key_verifications_per_hour_v2 +GROUP BY + toDate(toStartOfDay(time)), + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags;
18-19: Consider monthly partitions for daily grain.

Monthly partitions reduce part churn with 100-day TTL.
```diff
-PARTITION BY
-  toYYYYMMDD (time)
+PARTITION BY
+  toYYYYMM (time)
```

go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1)
33-57: Fix MV: positional order, TDigest MergeState params, and GROUP BY alias.

Reorder to match table, remove params, group by expression.
-CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS -SELECT - workspace_id, - key_space_id, - identity_id, - key_id, - outcome, - tags, - sum(count) as count, - sum(spent_credits) as spent_credits, - avgMergeState(latency_avg) as latency_avg, - quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75, - quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99, - toDate(toStartOfMonth(time)) AS time -FROM - key_verifications_per_day_v2 -GROUP BY - workspace_id, - time, - key_space_id, - identity_id, - key_id, - outcome, - tags -; +CREATE MATERIALIZED VIEW key_verifications_per_month_mv_v2 TO key_verifications_per_month_v2 AS +SELECT + toDate(toStartOfMonth(time)) AS time, + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags, + sum(count) AS count, + sum(spent_credits) AS spent_credits, + avgMergeState(latency_avg) AS latency_avg, + quantilesTDigestMergeState(latency_p75) AS latency_p75, + quantilesTDigestMergeState(latency_p99) AS latency_p99 +FROM key_verifications_per_day_v2 +GROUP BY + toDate(toStartOfMonth(time)), + workspace_id, + key_space_id, + identity_id, + key_id, + outcome, + tags;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (17)
- go/GO_DOCUMENTATION_GUIDELINES.md (12 hunks)
- go/pkg/array/doc.go (1 hunks)
- go/pkg/array/fill.go (1 hunks)
- go/pkg/array/map.go (1 hunks)
- go/pkg/array/random.go (1 hunks)
- go/pkg/array/reduce.go (1 hunks)
- go/pkg/clickhouse/key_verifications_test.go (1 hunks)
- go/pkg/clickhouse/schema/000_legacy.sql (1 hunks)
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/005_key_verifications_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/009_ratelimits_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/010_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/014_api_requests_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/016_api_requests_per_minute_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/schema/000_legacy.sql
📚 Learning: 2025-08-14T16:25:48.167Z
Learnt from: Flo4604
PR: unkeyed/unkey#3785
File: go/apps/api/routes/v2_keys_reroll_key/401_test.go:52-61
Timestamp: 2025-08-14T16:25:48.167Z
Learning: User Flo4604 requested creation of a GitHub issue to track converting all test files to use table-driven test patterns as a broader codebase improvement, following the suggestion made during review of go/apps/api/routes/v2_keys_reroll_key/401_test.go.
Applied to files:
go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-07-16T15:38:53.491Z
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.
Applied to files:
go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/schema/000_legacy.sql
🧬 Code graph analysis (1)
go/pkg/clickhouse/key_verifications_test.go (6)
go/pkg/uid/uid.go (5)
WorkspacePrefix
(20-20)KeyPrefix
(16-16)KeyAuthPrefix
(21-21)IdentityPrefix
(28-28)RequestPrefix
(19-19)go/pkg/array/fill.go (1)
Fill
(23-33)go/pkg/clickhouse/schema/types.go (1)
KeyVerificationV2
(6-18)go/pkg/array/random.go (1)
Random
(23-28)go/pkg/array/reduce.go (1)
Reduce
(28-34)go/pkg/array/map.go (1)
Map
(23-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
go/pkg/array/reduce.go (1)
3-27: Docs/readability: Solid and actionable examples.

Clear API intent, parameters, and examples. No changes needed.

go/pkg/array/map.go (1)
24-29: LGTM.

Preallocation + index-only range is idiomatic and efficient.

go/pkg/array/fill.go (1)
29-31: Build-breaking bug: invalid range over int.

`for i := range length` does not compile. Use a counted loop.

```diff
-	for i := range length {
+	for i := 0; i < length; i++ {
 		slice[i] = generator()
 	}
```

⛔ Skipped due to learnings
Learnt from: Flo4604 PR: unkeyed/unkey#3800 File: go/apps/api/integration/multi_node_usagelimiting/run.go:107-126 Timestamp: 2025-08-19T08:57:31.793Z Learning: Go 1.22+ supports "range over integers" syntax: `for range N` iterates N times, and `for i := range N` iterates with i from 0 to N-1. This is valid Go syntax and should not be flagged as a compilation error.
go/pkg/array/doc.go (1)
1-41: Package docs are concise and accurate.

Cross-refs resolve; concurrency notes are appropriate.

go/pkg/clickhouse/schema/000_legacy.sql (1)
49-68: Confirm MV→table column order for legacy v1 objects.

Many MVs (e.g., this block) select columns in a different order than the target table. With TO, ClickHouse inserts by position. Verify this matches production definitions or adjust SELECT to table order to avoid column shifts.
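One way to check this without reading each DDL by hand is to compare the target table's column order in system.columns against the SELECT the MV actually runs; a rough sketch, with database and object names as placeholders:

```sql
-- Hypothetical check: column order of the target table...
SELECT position, name, type
FROM system.columns
WHERE database = 'verifications'
  AND table = 'key_verifications_per_hour_v1'
ORDER BY position;

-- ...and the stored definition of the MV feeding it, to compare column order.
SELECT create_table_query
FROM system.tables
WHERE database = 'verifications'
  AND name = 'key_verifications_per_hour_mv_v1';
```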
go/pkg/clickhouse/schema/017_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Retention: confirm TTL policy for billing data.
Monthly billables typically require long retention. If intended, add an explicit TTL or document why this table is indefinite.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql around
lines 7 to 11, the table DDL currently has no TTL/retention specified so billing
data will be kept indefinitely; either add an explicit TTL clause (e.g., TTL
toDate(year, month, 1) + INTERVAL X MONTH or TTL <timestamp_column> + INTERVAL X
MONTH) to enforce the desired retention period, or add a comment above the DDL
documenting that indefinite retention is intentional and why (including expected
data size and compliance requirements). Ensure the chosen TTL uses the
appropriate date column and interval and that the change is approved by the
billing/PD team.
go/pkg/clickhouse/schema/018_billable_verifications_per_month_v2.sql
…into clickhouse-schemas
Actionable comments posted: 21
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
- go/pkg/clickhouse/migrations/atlas.sum is excluded by `!**/*.sum`
- pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (10)
- go/pkg/clickhouse/migrations/20250911070454.sql (1 hunks)
- go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql
go/pkg/clickhouse/migrations/20250911070454.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Packages / Test
🔇 Additional comments (10)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)
23-23: Confirm source time unit alignment (ms vs s).

If ratelimits.raw_ratelimits_v1.time is ms-epoch, keeping Int64 is fine; if it's DateTime/DateTime64, align the type accordingly.

go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)
12-23: Verify source column passed exists and semantics.

Confirm ratelimits_per_month_v2 exposes a numeric passed compatible with sum(passed) and that summing across its dimensions matches billing rules.

go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (2)
13-26: Confirm source fields and filter semantics.

Ensure key_verifications_per_month_v2 has outcome and count with expected types; validate that billing should include only outcome = 'VALID' (case, enum). If outcome is enum, compare by enum value to avoid string drift.
23-26: Group by columns must match SELECT after schema change.

If you adopt time Date, adjust the MV to group by time instead of (year, month).

go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)
27-36: Time bucketing from ms epoch: confirm unit.

fromUnixTimestamp64Milli(time) assumes ms-epoch in api_requests_raw_v2.time. Validate source type.

go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql (1)
27-35: Idempotent rollup path looks good.

Hourly MV groups correctly by all dims and buckets via toStartOfHour(time).

go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql (1)
27-44: Daily aggregation is consistent.

Correct bucketing to day and grouping over all dims.
go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql (1)
27-35: Monthly rollup LGTM.

Buckets to month start and groups over all dims.
go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (1)
33-55: Incorrect: no duplicate MV found; only defined in go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql

rg output shows the CREATE TABLE api_requests_raw_v2 and CREATE MATERIALIZED VIEW temp_sync_metrics_v1_to_v2 only in that file; the duplicate alleged in 20250911070454.sql was not found. Use IF NOT EXISTS and fully-qualified names only if you intentionally add a second CREATE.
Likely an incorrect or invalid review comment.
go/pkg/clickhouse/migrations/20250911070454.sql (1)
309-311
: Backfill and decommission plan for temp sync MVs.Temp sync views target only new inserts (no POPULATE). Ensure backfill jobs cover historical data, and add explicit DROP steps in the migration plan once cutover is done to avoid silent double-writes.
Would you like a one-off backfill SQL for ratelimits_last_used_v2 that unions v1/v2 raw tables and computes MAX(time) per (workspace_id, namespace_id, identifier)?
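A minimal sketch of such a backfill, assuming the v1 raw table lives in the ratelimits database as referenced above, the v2 raw table is default.ratelimits_raw_v2, and the target's time column accepts plain values (Int64 or SimpleAggregateFunction(max, Int64)):

-- One-off backfill sketch; table names and the ms-epoch Int64 time column are assumptions from this review.
-- Computes the latest use per (workspace_id, namespace_id, identifier) across v1 and v2 raw data;
-- AggregatingMergeTree merges then keep only the maximum per key.
INSERT INTO default.ratelimits_last_used_v2 (workspace_id, namespace_id, identifier, time)
SELECT
    workspace_id,
    namespace_id,
    identifier,
    max(time) AS time
FROM
(
    SELECT workspace_id, namespace_id, identifier, time FROM ratelimits.raw_ratelimits_v1
    UNION ALL
    SELECT workspace_id, namespace_id, identifier, time FROM default.ratelimits_raw_v2
)
GROUP BY workspace_id, namespace_id, identifier;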
-- Create "api_requests_raw_v2" table | ||
CREATE TABLE `default`.`api_requests_raw_v2` ( | ||
`request_id` String, | ||
`time` Int64 CODEC(Delta(8), LZ4), | ||
`workspace_id` String, | ||
`host` String, | ||
`method` LowCardinality(String), | ||
`path` String, | ||
`request_headers` Array(String), | ||
`request_body` String, | ||
`response_status` Int32, | ||
`response_headers` Array(String), | ||
`response_body` String, | ||
`error` String, | ||
`service_latency` Int64, | ||
`user_agent` String, | ||
`ip_address` String, | ||
`region` LowCardinality(String), | ||
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1 | ||
) ENGINE = MergeTree | ||
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000; |
🛠️ Refactor suggestion
Unify raw-table shapes across domains.
All three raw_v2 tables should align on: PARTITION BY day, time CODEC(Delta(8), LZ4), index_granularity, and dedup settings. Today only api_requests_raw_v2 sets index_granularity; none set PARTITION BY.
I can generate a consolidated patch if you confirm daily partitioning is desired for all raw tables.
Also applies to: 176-194, 259-271
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63-83 (and
similarly for blocks at 176-194 and 259-271), the raw_v2 tables are not using a
unified table shape: add a daily PARTITION BY expression (partition by day
derived from the time column), ensure the time column keeps CODEC(Delta(8),
LZ4), and make the table SETTINGS consistent across all raw_v2 tables by setting
the same index_granularity and non_replicated_deduplication_window values; apply
the same primary key / ORDER BY ordering and TTL pattern to match the other
raw_v2 tables so all three share identical partitioning, codec,
index_granularity, and dedup settings.
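To make the ask concrete, a sketch of the shared shape under the assumption that daily partitioning is wanted; the table name example_raw_v2 and the trimmed column list are placeholders, the real tables keep their full column sets from this PR:

-- Sketch only: the same partitioning, time codec, TTL and settings applied to every *_raw_v2 table.
CREATE TABLE IF NOT EXISTS default.example_raw_v2 (
    request_id   String,
    time         Int64 CODEC(Delta(8), LZ4),
    workspace_id String,
    INDEX idx_request_id (request_id) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
ORDER BY (workspace_id, time, request_id)
TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1)
SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;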
CREATE TABLE `default`.`api_requests_raw_v2` (
`request_id` String,
`time` Int64 CODEC(Delta(8), LZ4),
`workspace_id` String,
`host` String,
`method` LowCardinality(String),
`path` String,
`request_headers` Array(String),
`request_body` String,
`response_status` Int32,
`response_headers` Array(String),
`response_body` String,
`error` String,
`service_latency` Int64,
`user_agent` String,
`ip_address` String,
`region` LowCardinality(String),
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
Raw table missing PARTITION BY; will create a single huge partition.
api_requests_raw_v2 here has no PARTITION BY, unlike the schema file. Add daily partitioning for manageability and TTL efficiency.
) ENGINE = MergeTree
+PARTITION BY toYYYYMMDD(fromUnixTimestamp64Milli(time))
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000;
Apply the same fix to key_verifications_raw_v2 (Line 193-194 block) and ratelimits_raw_v2 (Line 270-271 block).
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 64-83, the
CREATE TABLE for api_requests_raw_v2 lacks a PARTITION BY clause which will
create one huge partition; add a daily PARTITION BY expression based on the time
column (e.g., partition by the day derived from the Int64 time/timestamp) so
partitions are created per day and TTLs work efficiently; update the CREATE
TABLE definitions for key_verifications_raw_v2 (around lines 193-194) and
ratelimits_raw_v2 (around lines 270-271) the same way to use daily partitioning
derived from their time columns.
-- Create "ratelimits_last_used_v2" table | ||
CREATE TABLE `default`.`ratelimits_last_used_v2` ( | ||
`time` Int64, | ||
`workspace_id` String, | ||
`namespace_id` String, | ||
`identifier` String | ||
) ENGINE = AggregatingMergeTree | ||
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192; |
ratelimits_last_used_v2 is incorrectly modeled; maxSimpleState() is invalid for the declared column type.
With AggregatingMergeTree, either store AggregateFunction(max, Int64) with maxState(), or use SimpleAggregateFunction(max, Int64) and insert raw values. Current DDL (Int64 + maxSimpleState) won’t merge correctly.
Use SimpleAggregateFunction variant:
-CREATE TABLE `default`.`ratelimits_last_used_v2` (
- `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+ `time` SimpleAggregateFunction(max, Int64),
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
@@
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT workspace_id, namespace_id, identifier, time
+FROM ratelimits.raw_ratelimits_v1;
Option B (AggregateFunction) if preferred:
- `time` Int64,
+ `time` AggregateFunction(max, Int64),
@@
-CREATE MATERIALIZED VIEW ... AS SELECT ..., maxSimpleState(time) AS time ...
+CREATE MATERIALIZED VIEW ... AS SELECT ..., maxState(time) AS time ...
Also applies to: 296-297
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195 to 202 (also
applies to lines 296-297), the table is declared with `time Int64` but the
AggregatingMergeTree is using maxSimpleState() which is invalid for a plain
Int64 column; fix by changing the column definition to a matching aggregate
type: either make the column SimpleAggregateFunction(max, Int64) and continue
inserting raw Int64 values, or change it to AggregateFunction(max, Int64) and
ensure you insert maxState() values (and adjust any insert/upsert SQL
accordingly); update the DDL to use one of these two consistent patterns and
verify the corresponding insert logic in the migration is updated to match.
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2;
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2;
🧹 Nitpick (assertive)
ReplacingMergeTree without version may accumulate duplicates until merges.
active_workspaces_* MVs insert potentially duplicate PK rows. ReplacingMergeTree will collapse them eventually, but if write volume is high, consider AggregatingMergeTree with SimpleAggregateFunction(anyLast, Date) or add a version column.
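A sketch of the version-column variant, reusing the table name from this PR only for illustration; the inserted_at column is hypothetical and would also have to be emitted by the feeding MVs:

-- Sketch: ReplacingMergeTree keyed by a version column, so the newest row wins during merges
-- and hot-path queries do not need FINAL once parts are merged.
CREATE TABLE IF NOT EXISTS default.active_workspaces_per_month_v2 (
    time         Date,
    workspace_id String,
    inserted_at  DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(inserted_at)
ORDER BY (time, workspace_id);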
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`billable_ratelimits_per_month_mv_v2` TO `default`.`billable_ratelimits_per_month_v2` AS SELECT workspace_id, sum(passed) AS count, toYear(time) AS year, toMonth(time) AS month FROM default.ratelimits_per_month_v2 GROUP BY workspace_id, year, month;
🧹 Nitpick (assertive)
Make migration idempotent.
Consider IF NOT EXISTS on MVs to survive partial runs/retries, especially while temp sync views exist in both schema and migrations.
-CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` ...
+CREATE MATERIALIZED VIEW IF NOT EXISTS `default`.`api_requests_per_minute_mv_v2` ...
Apply similarly to other CREATE MATERIALIZED VIEW statements.
Also applies to: 287-295, 299-305
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 281 to 285 (and
likewise for the other CREATE MATERIALIZED VIEW ranges at 287-295 and 299-305),
the CREATE MATERIALIZED VIEW statements are not idempotent and will fail on
partial runs/retries; update each CREATE MATERIALIZED VIEW to include IF NOT
EXISTS so the migration can safely be re-run when temp/partial views already
exist, and apply the same change to all other CREATE MATERIALIZED VIEW
statements mentioned in the comment.
workspace_id,
toDate (time) as time
🧹 Nitpick (assertive)
Nit: normalize spacing. toDate (time) → toDate(time).
- toDate (time) as time
+ toDate(time) as time
Also applies to: 18-19
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql around lines
11-12 (and also at lines 18-19), normalize the spacing in the function call by
changing occurrences of "toDate (time)" to "toDate(time)"; update those two
places so the function name and its parentheses have no space between them.
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Align monthly schema with other v2 tables (use time Date).
Other monthly tables (e.g., api_requests_per_month_v2) model month as time Date (month start). Using (year, month) here introduces inconsistency. Prefer time Date for query ergonomics and uniformity.
-CREATE TABLE billable_ratelimits_per_month_v2 (
- year Int16,
- month Int8,
- workspace_id String,
- count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
- (year, month)
-ORDER BY
- (workspace_id, year, month);
+CREATE TABLE billable_ratelimits_per_month_v2 (
+ time Date,
+ workspace_id String,
+ count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 1-10, the table models month as separate year Int16 and month Int8 columns
which is inconsistent with other v2 monthly tables; change the schema to use a
single time Date column representing the month start (drop year and month
columns), keep count Int64, and update ENGINE/PARTITION BY/ORDER BY to partition
by time and ORDER BY (workspace_id, time) to match the other monthly v2 tables
for consistent querying and ergonomics.
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Ordering to match partition for better pruning.
Given PARTITION BY (year, month), ordering as (year, month, workspace_id) aids range scans by month.
-ORDER BY
- (workspace_id, year, month);
+ORDER BY
+ (year, month, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 9 to 10, the ORDER BY currently uses (workspace_id, year, month) which
doesn't match the PARTITION BY (year, month); change the ORDER BY to (year,
month, workspace_id) so the ordering aligns with the partitioning and improves
range pruning for month-based queries.
CREATE MATERIALIZED VIEW billable_ratelimits_per_month_mv_v2 TO billable_ratelimits_per_month_v2 AS
SELECT
workspace_id,
sum(passed) AS count,
toYear (time) AS year,
toMonth (time) AS month
FROM
ratelimits_per_month_v2
GROUP BY
workspace_id,
year,
month;
🧹 Nitpick (assertive)
Minor: trim extra spaces in function calls. toYear (time) → toYear(time), toMonth (time) → toMonth(time).
- toYear (time) AS year,
- toMonth (time) AS month
+ toYear(time) AS year,
+ toMonth(time) AS month
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 12 to 23, there are extra spaces inside the function calls `toYear (time)`
and `toMonth (time)`; remove the space so they read `toYear(time)` and
`toMonth(time)` to follow consistent formatting and avoid style issues.
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Unify month key shape with v2 (prefer time Date).
Mirror 016_api_requests_per_month_v2 for consistency: time Date (month start) instead of (year, month).
-CREATE TABLE billable_verifications_per_month_v2 (
- year Int16,
- month Int8,
- workspace_id String,
- count Int64
-) ENGINE = SummingMergeTree ()
-PARTITION BY
- (year, month)
-ORDER BY
- (workspace_id, year, month);
+CREATE TABLE billable_verifications_per_month_v2 (
+ time Date,
+ workspace_id String,
+ count Int64
+) ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (time, workspace_id);
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, replace the separate year and month columns with a single time
Date column (representing the month start) to match v2 conventions; update the
PARTITION BY to use the time month key (e.g. toYYYYMM(time) or the same
partition expression used in 016_api_requests_per_month_v2) and change ORDER BY
to (workspace_id, time) so the table schema and materialized view shape mirror
016_api_requests_per_month_v2.
Actionable comments posted: 24
♻️ Duplicate comments (3)
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (3)
15-23: MV target and state function are now consistent; good, keep it. With the above table change, maxSimpleState(time) matches the column type. No DB-qualification mismatch anymore.
1-11: Column order (dims first) improves locality. Optional: move time after the keys in ORDER BY/table for readability; not functionally required.
- (time SimpleAggregateFunction(max, Int64), workspace_id, namespace_id, identifier)
+ (workspace_id String, namespace_id String, identifier String, time SimpleAggregateFunction(max, Int64))
1-10: AggregatingMergeTree requires aggregate states; fix the time type. The table defines time Int64 but the MV inserts maxSimpleState(time) (an aggregate state). Use SimpleAggregateFunction(max, Int64) (or switch the MV to maxState and use AggregateFunction).
CREATE TABLE IF NOT EXISTS ratelimits_last_used_v2 (
-   time Int64,
+   time SimpleAggregateFunction(max, Int64),
    workspace_id String,
    namespace_id String,
    identifier String,
) ENGINE = AggregatingMergeTree()
ORDER BY (workspace_id, namespace_id, identifier)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
go/pkg/clickhouse/migrations/atlas.sum
is excluded by!**/*.sum
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (10)
go/pkg/clickhouse/migrations/20250911070454.sql
(1 hunks)go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql
(1 hunks)go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql
(1 hunks)go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql
(1 hunks)go/pkg/clickhouse/schema/014_api_requests_per_hour_v2.sql
(1 hunks)go/pkg/clickhouse/schema/015_api_requests_per_day_v2.sql
(1 hunks)go/pkg/clickhouse/schema/016_api_requests_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql
(1 hunks)go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test API / API Test Local
🔇 Additional comments (5)
go/pkg/clickhouse/schema/012_api_requests_raw_v2.sql (2)
25-31: Dedup setting on non-replicated MergeTree requires tokens. non_replicated_deduplication_window only dedups when inserts carry tokens. Verify your writers (and this MV) set insert_deduplication_token; otherwise this setting has no effect (see the sketch after this list).
33-56: Migration MV: add explicit column order and note the drop plan. Column order matches now; good. Ensure an explicit DROP of this temp MV is included in the migration plan once writers switch to v2.
go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql (1)
12-23: Optional: add a billable filter parity check. If "billable" should only include passed ratelimits, sum(passed) is fine. Verify the source schema so passed is numeric (not a state) and covers all billable outcomes.
go/pkg/clickhouse/schema/011_ratelimits_last_used_v2.sql (1)
15-28: Answer Flo's concern: is the v1 "last used" table still required? You add v2; confirm whether any readers still depend on ratelimits_last_used_v1 and whether a temporary MV should backfill/keep it hot until cutover.
go/pkg/clickhouse/schema/013_api_requests_per_minute_v2.sql (1)
1-26: TTL and partition look sane for minute granularity. A 7-day TTL with daily partitions is reasonable.
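On the deduplication-token point above, a sketch of a writer-side insert that actually carries a token; the column subset and values are illustrative, and omitted columns fall back to their defaults:

-- Retrying the same batch with the same token inside the dedup window is dropped as a duplicate.
INSERT INTO default.api_requests_raw_v2 (request_id, time, workspace_id, host, method, path, response_status)
SETTINGS insert_deduplication_token = 'batch-2025-09-11-000001'
VALUES ('req_123', 1757574000000, 'ws_123', 'api.example.com', 'POST', '/v1/keys.verifyKey', 200);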
-- Create "api_requests_raw_v2" table | ||
CREATE TABLE `default`.`api_requests_raw_v2` ( | ||
`request_id` String, | ||
`time` Int64 CODEC(Delta(8), LZ4), | ||
`workspace_id` String, | ||
`host` String, | ||
`method` LowCardinality(String), | ||
`path` String, | ||
`request_headers` Array(String), | ||
`request_body` String, | ||
`response_status` Int32, | ||
`response_headers` Array(String), | ||
`response_body` String, | ||
`error` String, | ||
`service_latency` Int64, | ||
`user_agent` String, | ||
`ip_address` String, | ||
`region` LowCardinality(String), | ||
INDEX `idx_request_id` ((request_id)) TYPE bloom_filter GRANULARITY 1 | ||
) ENGINE = MergeTree | ||
PRIMARY KEY (`workspace_id`, `time`, `request_id`) ORDER BY (`workspace_id`, `time`, `request_id`) TTL toDateTime(fromUnixTimestamp64Milli(time)) + toIntervalMonth(1) SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000; |
🧹 Nitpick (assertive)
Consider stronger compression for large blobs.
request_body/response_body can be sizeable; consider ZSTD for better compression trade-offs.
-`time` Int64 CODEC(Delta(8), LZ4),
+`time` Int64 CODEC(Delta(8), LZ4),
...
-) ENGINE = MergeTree
+) ENGINE = MergeTree
SETTINGS index_granularity = 8192, non_replicated_deduplication_window = 10000
+-- Consider: CODEC(ZSTD(6)) on request_body/response_body columns
;
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 63 to 83, the
request_body and response_body columns are currently plain String which can be
large; update their column definitions to use stronger compression (e.g.,
CODEC(ZSTD(3)) or ZSTD(6)) to reduce storage and IO. Concretely, add
CODEC(ZSTD(<level>)) to request_body and response_body (and consider applying
the same to request_headers/response_headers if they can be large), choosing an
appropriate ZSTD level for the desired CPU/size trade-off.
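For reference, a sketch of what the codec change could look like as a follow-up ALTER; ZSTD level 3 is an assumption chosen to balance CPU against size:

-- Recompress the large blob columns with ZSTD; existing parts pick the codec up as they merge.
ALTER TABLE `default`.`api_requests_raw_v2`
    MODIFY COLUMN `request_body` String CODEC(ZSTD(3)),
    MODIFY COLUMN `response_body` String CODEC(ZSTD(3));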
-- Create "ratelimits_last_used_v2" table | ||
CREATE TABLE `default`.`ratelimits_last_used_v2` ( | ||
`time` Int64, | ||
`workspace_id` String, | ||
`namespace_id` String, | ||
`identifier` String | ||
) ENGINE = AggregatingMergeTree | ||
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192; | ||
-- Create "ratelimits_per_day_v2" table |
ratelimits_last_used_v2 type/state mismatch; MV can’t insert AggregateFunction into Int64.
The table defines time Int64 under AggregatingMergeTree, but the MV emits maxSimpleState(time) (AggregateFunction). Use SimpleAggregateFunction(max, Int64) and plain max(time) in the MV.
-CREATE TABLE `default`.`ratelimits_last_used_v2` (
- `time` Int64,
+CREATE TABLE `default`.`ratelimits_last_used_v2` (
+ `time` SimpleAggregateFunction(max, Int64),
`workspace_id` String,
`namespace_id` String,
`identifier` String
) ENGINE = AggregatingMergeTree
PRIMARY KEY (`workspace_id`, `namespace_id`, `identifier`) ORDER BY (`workspace_id`, `namespace_id`, `identifier`) SETTINGS index_granularity = 8192;
-CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
-SELECT workspace_id, namespace_id, identifier, maxSimpleState(time) AS time FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
+CREATE MATERIALIZED VIEW `default`.`ratelimits_last_used_mv_v2` TO `default`.`ratelimits_last_used_v2` AS
+SELECT max(time) AS time, workspace_id, namespace_id, identifier FROM ratelimits.raw_ratelimits_v1 GROUP BY workspace_id, namespace_id, identifier;
Also applies to: 297-297
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 195-203 (also
applies at line 297), the `time` column is declared as Int64 but the
materialized view emits AggregateFunction state (maxSimpleState(time)), causing
a type/state mismatch; change the `time` column type to
SimpleAggregateFunction(max, Int64) in the `ratelimits_last_used_v2` table
definition and update the MV to emit a plain aggregate result (use max(time) in
the INSERT/select) instead of the state function so the MV inserts a compatible
value into the SimpleAggregateFunction column.
-- Create "active_workspaces_keys_per_month_mv_v2" view | ||
CREATE MATERIALIZED VIEW `default`.`active_workspaces_keys_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.key_verifications_per_month_v2; | ||
-- Create "active_workspaces_ratelimits_per_month_mv_v2" view | ||
CREATE MATERIALIZED VIEW `default`.`active_workspaces_ratelimits_per_month_mv_v2` TO `default`.`active_workspaces_per_month_v2` AS SELECT workspace_id, toDate(time) AS time FROM default.ratelimits_per_month_v2; | ||
-- Create "api_requests_per_day_mv_v2" view |
🧹 Nitpick (assertive)
ReplacingMergeTree without version: duplicates require FINAL/OPTIMIZE.
active_workspaces_* MVs may insert the same PK multiple times per month. ReplacingMergeTree without a version is fine, but query paths should avoid FINAL in hot paths. If dedup timeliness matters, add a version column.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 272 to 276, the
materialized views active_workspaces_keys_per_month_mv_v2 and
active_workspaces_ratelimits_per_month_mv_v2 can insert duplicate primary keys
into a ReplacingMergeTree without a version, which forces use of FINAL/OPTIMIZE
to dedupe; add a version column to the target table and to these views' SELECTs
(e.g. a bigint or datetime like now64() or toUnixTimestamp(time)) and change the
target table to ReplacingMergeTree(version) so deduplication is deterministic
and can happen during merges without relying on FINAL; alternatively, if adding
a version is not acceptable, explicitly document that any hot-path queries must
use FINAL or run periodic OPTIMIZE FINAL, but the preferred fix is adding and
populating a version column in the MVs and altering the table definition to
ReplacingMergeTree(version).
CREATE MATERIALIZED VIEW `default`.`api_requests_per_day_mv_v2` TO `default`.`api_requests_per_day_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfDay(time)) AS time FROM default.api_requests_per_hour_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time FROM default.api_requests_per_minute_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_minute_mv_v2` TO `default`.`api_requests_per_minute_v2` AS SELECT workspace_id, path, response_status, host, method, count(*) AS count, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.api_requests_raw_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "api_requests_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`api_requests_per_month_mv_v2` TO `default`.`api_requests_per_month_v2` AS SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toDate(toStartOfMonth(time)) AS time FROM default.api_requests_per_day_v2 GROUP BY workspace_id, path, response_status, host, method, time;
-- Create "billable_ratelimits_per_month_mv_v2" view
All API request MVs have column-order mismatches (time must be first).
Each TO … MV must emit columns in the exact order of the target table: time, workspace_id, path, response_status, host, method, count. Reorder SELECTs accordingly.
-CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
-SELECT workspace_id, path, response_status, host, method, sum(count) AS count, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`api_requests_per_hour_mv_v2` TO `default`.`api_requests_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, path, response_status, host, method, sum(count) AS count
FROM default.api_requests_per_minute_v2
-GROUP BY workspace_id, path, response_status, host, method, time;
+GROUP BY time, workspace_id, path, response_status, host, method;
Apply the same reordering to:
- api_requests_per_day_mv_v2 (use toDate(toStartOfDay(time)) AS time first)
- api_requests_per_minute_mv_v2 (use toStartOfMinute(…) AS time first)
- api_requests_per_month_mv_v2 (use toDate(toStartOfMonth(time)) AS time first)
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 277 to 284, the
materialized view SELECTs emit columns in the wrong order relative to their
target tables; every TO target expects columns in the exact order: time,
workspace_id, path, response_status, host, method, count. Fix by reordering each
SELECT to project the time expression first (e.g., toDate(toStartOfDay(time)) AS
time, toStartOfHour(time) AS time, toStartOfMinute(...) AS time,
toDate(toStartOfMonth(time)) AS time), followed by workspace_id, path,
response_status, host, method, and then count so the SELECT column order matches
the target table exactly.
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_day_mv_v2` TO `default`.`key_verifications_per_day_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfDay(time)) AS time FROM default.key_verifications_per_hour_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_hour_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time FROM default.key_verifications_per_minute_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_minute_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_minute_mv_v2` TO `default`.`key_verifications_per_minute_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, count(*) AS count, sum(spent_credits) AS spent_credits, avgState(latency) AS latency_avg, quantilesTDigestState(0.75)(latency) AS latency_p75, quantilesTDigestState(0.99)(latency) AS latency_p99, toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time FROM default.key_verifications_raw_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "key_verifications_per_month_mv_v2" view
CREATE MATERIALIZED VIEW `default`.`key_verifications_per_month_mv_v2` TO `default`.`key_verifications_per_month_v2` AS SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toDate(toStartOfMonth(time)) AS time FROM default.key_verifications_per_day_v2 GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
-- Create "ratelimits_last_used_mv_v2" view
Key verifications MVs: SELECT order must match table order.
Target tables start with time, then workspace_id, key_space_id, identity_id, key_id, outcome, tags, then aggregates. Reorder the SELECTs.
Example (hourly):
-CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
-SELECT workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99, toStartOfHour(time) AS time
+CREATE MATERIALIZED VIEW `default`.`key_verifications_per_hour_mv_v2` TO `default`.`key_verifications_per_hour_v2` AS
+SELECT toStartOfHour(time) AS time, workspace_id, key_space_id, identity_id, key_id, outcome, tags, sum(count) AS count, sum(spent_credits) AS spent_credits, avgMergeState(latency_avg) AS latency_avg, quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75, quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99
FROM default.key_verifications_per_minute_v2
-GROUP BY workspace_id, time, key_space_id, identity_id, key_id, outcome, tags;
+GROUP BY time, workspace_id, key_space_id, identity_id, key_id, outcome, tags;
Mirror this for minute/day/month MVs.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/migrations/20250911070454.sql around lines 289-296, the
SELECT column order in the materialized views does not match the target table
column order (which expects time first, then workspace_id, key_space_id,
identity_id, key_id, outcome, tags, then aggregates); reorder each SELECT so
time is the first selected column followed by workspace_id, key_space_id,
identity_id, key_id, outcome, tags, and then the aggregate expressions for
count, spent_credits, latency_avg, latency_p75, latency_p99 for all four MVs
(per_minute, per_hour, per_day, per_month), preserving the existing time
expressions (toStartOfMinute(...), toStartOfHour(...),
toDate(toStartOfDay(...)), toDate(toStartOfMonth(...))) and GROUP BY the same
reordered columns.
CREATE TABLE billable_ratelimits_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
|
🛠️ Refactor suggestion
Make count type consistent with other v2 aggregates.
Align on SimpleAggregateFunction(sum, Int64) or stick to plain Int64 everywhere; avoid mixing.
- count Int64
+ count SimpleAggregateFunction(sum, Int64)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql lines 1-11,
the count column is currently Int64 but other v2 aggregate tables use
SimpleAggregateFunction(sum, Int64); update the count column type to
SimpleAggregateFunction(sum, Int64) so it matches the v2 aggregate convention
(i.e., change "count Int64" to "count SimpleAggregateFunction(sum, Int64)"),
preserving the rest of the DDL as-is.
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
Use unsigned year/month or cast (type mismatch).
Same as verifications: toYear/toMonth produce unsigned.
- year Int16,
- month Int8,
+ year UInt16,
+ month UInt8,
or cast in MV.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/018_billable_ratelimits_per_month_v2.sql around
lines 2 to 6, the year and month columns are Int16/Int8 but ClickHouse
toYear/toMonth return UInt16/UInt8, causing a type mismatch; update the schema
to use unsigned types (UInt16 for year, UInt8 for month) or explicitly cast the
toYear/toMonth expressions to Int16/Int8 in the materialized view definition so
the MV column types and source expressions match.
-- Billing Billable Verifications Per Month Table and Materialized View
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🧹 Nitpick (assertive)
Prefer a single time Date column over separate year/month (TTL + uniformity).
Most v2 tables (e.g., api_requests_per_month_v2) bucket on a Date and gain clear TTLs. Consider switching to time Date
with ORDER BY (workspace_id, time) for uniform querying and retention.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 1 to 11, the table currently uses separate year and month columns which
prevents using Date TTLs and is inconsistent with other v2 monthly tables;
change the schema to use a single Date column (e.g., time Date) instead of
year/month, update PARTITION BY/ORDER BY to use time (PARTITION BY to use
month(time) or to keep monthly partitions as appropriate, and ORDER BY
(workspace_id, time)), and update any dependent materialized views/ingest
queries to populate the new time Date column from the original timestamp or from
year/month conversion so retention and querying align with other v2 tables.
CREATE TABLE billable_verifications_per_month_v2 (
year Int16,
month Int8,
workspace_id String,
count Int64
) ENGINE = SummingMergeTree ()
PARTITION BY
(year, month)
ORDER BY
(workspace_id, year, month);
🛠️ Refactor suggestion
Align count type with other v2 tables for consistency.
Other v2 monthly tables use SimpleAggregateFunction(sum, Int64). Consider aligning to avoid mixed patterns.
- count Int64
+ count SimpleAggregateFunction(sum, Int64)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 2 to 11, the count column is currently Int64 but other v2 monthly tables
use SimpleAggregateFunction(sum, Int64); change the count column type to
SimpleAggregateFunction(sum, Int64) so the schema is consistent, and keep the
SummingMergeTree engine and partition/order clauses as-is to allow ClickHouse to
aggregate sums correctly.
year Int16,
month Int8,
workspace_id String,
count Int64
Use unsigned year/month types or cast in MV (type mismatch).
toYear() returns UInt16 and toMonth() returns UInt8, but columns are Int16/Int8. Either switch columns to unsigned or CAST in the MV.
Apply one of:
- year Int16,
- month Int8,
+ year UInt16,
+ month UInt8,
or
- toYear (time) AS year,
- toMonth (time) AS month
+ CAST(toYear(time) AS Int16) AS year,
+ CAST(toMonth(time) AS Int8) AS month
🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql around
lines 3 to 6, the MV defines year as Int16 and month as Int8 but
toYear()/toMonth() produce UInt16/UInt8 causing a type mismatch; fix by either
changing the column types to unsigned (UInt16 for year, UInt8 for month) or by
casting the toYear()/toMonth() expressions in the MV to the signed types
currently declared (i.e., add explicit CASTs), and ensure the chosen approach is
applied consistently in the MV definition and any INSERT/SELECTs that populate
these columns.
This PR reorganises our clickhouse table schemas and adds credit as well as latency tracking.
In the past we had two ways of managing schemas, which is never a good idea.
Going forward the way to make changes is to change the table definition in go/clickhouse/schema and then use atlas to generate a migration diff, which can be applied to clickhouse.
I also changed the materialization: previously we materialized all views from the raw table; now we cascade changes from raw > minute > hour > day > month, which is less CPU intensive because each level processes far less data.
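A trimmed sketch of what one step of that cascade looks like; the dimension list is cut down to workspace_id and path for brevity, and the _sketch view names are placeholders, the real MVs in this PR also carry host, method and response_status:

-- First level: aggregate raw events into per-minute buckets.
CREATE MATERIALIZED VIEW IF NOT EXISTS api_requests_per_minute_mv_sketch
TO api_requests_per_minute_v2 AS
SELECT
    toStartOfMinute(fromUnixTimestamp64Milli(time)) AS time,
    workspace_id,
    path,
    count(*) AS count
FROM api_requests_raw_v2
GROUP BY time, workspace_id, path;

-- The next level reads the per-minute rollup instead of the raw table, so it touches far fewer rows.
CREATE MATERIALIZED VIEW IF NOT EXISTS api_requests_per_hour_mv_sketch
TO api_requests_per_hour_v2 AS
SELECT
    toStartOfHour(time) AS time,
    workspace_id,
    path,
    sum(count) AS count
FROM api_requests_per_minute_v2
GROUP BY time, workspace_id, path;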
I added some tests to ensure data propagates through the materialized views correctly, but please look at the table definitions carefully and try to spot what I missed.
I decided to put all of these tables in the default database; it was just a mess before, and to clickhouse these are just namespaces, nothing that would impact performance.
This does not change inserts or reads at all yet; it just creates new tables.
There are temporary MVs in place to automatically populate the new tables, though, and I am running a migration script to backfill all data.
In follow-up PRs we can switch reads and writes over to these new tables.