Skip to content

Conversation

samejr
Copy link
Member

@samejr samejr commented Sep 18, 2025

Improves the evals for time specific queries in the AI search runs field.

It specifically fixes issues like this:

image

Copy link

changeset-bot bot commented Sep 18, 2025

⚠️ No Changeset found

Latest commit: becf3e4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

Adds three new test cases to apps/webapp/evals/aiRunFilter.eval.ts covering queries with hour-specific and relative-day references ("tasks that started around 8am today", "runs that started at 2pm yesterday", "any runs started this morning") and updates apps/webapp/app/v3/services/aiRunFilterService.server.ts by enhancing the AI system prompt with time-specific patterns, explicit phrase mappings (e.g., "around X" => ±1 hour, "at X" => exact hour range), temporal semantics for today/yesterday, ISO from/to conversion, and stricter JSON output requirements. No exported/public API signature changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description does not follow the repository's required template: it lacks the "Closes #" line, the ✅ Checklist, a Testing section describing steps taken to verify the change, and a Changelog entry; only a brief summary and an attached screenshot are present. Because several required sections are missing, reviewers cannot confirm issue linkage or the testing performed, so the description is incomplete relative to the template. Please update the PR description to follow the template: add "Closes #" if applicable, complete the checklist, include a "Testing" section with concrete reproduction and verification steps, and add a short "Changelog" summary; also ensure any relevant logs or failing/passing test outputs are attached so reviewers can validate the fix.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Fix for errors returned from searching time specific queries" directly describes the primary change in this PR — addressing errors when searching with time-specific queries — and matches the modifications to aiRunFilter evals and time-handling logic in the service. It is concise and focused, though the phrase "time specific" would read more cleanly as "time-specific." Overall the title accurately reflects the main purpose of the changeset and is appropriate for reviewers scanning history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore(webapp)-better-ai-search-evals

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/webapp/evals/aiRunFilter.eval.ts (1)

241-249: Define "this morning" with numeric constructors; avoid locale‑sensitive parsing

Same parsing concern as above. Build times with the Date numeric constructor for reliability.

-            from: new Date(new Date().toDateString() + " 06:00:00").getTime(),
-            to: new Date(new Date().toDateString() + " 12:00:00").getTime(),
+            from: atLocal(baseNow, 6, 0, 0),
+            to: atLocal(baseNow, 12, 0, 0),

Please confirm that "morning" = 06:00–12:00 local matches product intent.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db87295 and c195e2c.

📒 Files selected for processing (1)
  • apps/webapp/evals/aiRunFilter.eval.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • apps/webapp/evals/aiRunFilter.eval.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/evals/aiRunFilter.eval.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json

Files:

  • apps/webapp/evals/aiRunFilter.eval.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
apps/webapp/evals/aiRunFilter.eval.ts (2)

117-125: Timezone consistency check with earlier "June 16" case

new Date("2025-06-16") is parsed as UTC midnight, while the new hour‑based tests are local time. Ensure filters consistently use either UTC or local across all tests to avoid subtle mismatches.


219-229: Stabilize time-window parsing — replace toDateString() with numeric constructors (local time)

String parsing via toDateString() is implementation-dependent and can flake across Node/locales; build Date values numerically and prefer half‑open intervals.

File: apps/webapp/evals/aiRunFilter.eval.ts (around lines 219–229 and the nearby "yesterday" case)

Apply these replacements:

-            from: new Date(new Date().toDateString() + " 07:00:00").getTime(),
-            to: new Date(new Date().toDateString() + " 09:00:00").getTime(),
+            from: atLocal(baseNow, 7, 0, 0),
+            to: atLocal(baseNow, 9, 0, 0),
-            from: new Date(new Date(Date.now() - 24*60*60*1000).toDateString() + " 14:00:00").getTime(),
-            to: new Date(new Date(Date.now() - 24*60*60*1000).toDateString() + " 14:59:59").getTime(),
+            from: atLocal(plusDays(baseNow, -1), 14, 0, 0),
+            to: atLocal(plusDays(baseNow, -1), 15, 0, 0), // half‑open [14:00,15:00)

Add these helpers near the top of the file (above the evalite(...) call):

const baseNow = new Date(); // capture once to avoid midnight rollovers
const atLocal = (d: Date, h: number, m = 0, s = 0) =>
  new Date(d.getFullYear(), d.getMonth(), d.getDate(), h, m, s).getTime();
const plusDays = (d: Date, days: number) => {
  const r = new Date(d);
  r.setDate(r.getDate() + days);
  return r;
};

Confirm whether the service expects these windows in local time and whether you want half‑open intervals; if UTC is required, switch to Date.UTC(...) instead.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
apps/webapp/app/v3/services/aiRunFilterService.server.ts (3)

13-15: Validate from/to as real ISO datetimes with offsets

Unvalidated strings can parse to NaN and propagate bad filters. Require ISO 8601 with offsets via zod.

Apply:

-  from: z.string().optional().describe("The ISO datetime to filter from"),
-  to: z.string().optional().describe("The ISO datetime to filter to"),
+  from: z.string().datetime({ offset: true }).optional().describe("ISO datetime with offset (e.g., 2025-09-18T08:00:00-07:00)"),
+  to: z.string().datetime({ offset: true }).optional().describe("ISO datetime with offset (e.g., 2025-09-18T09:00:00-07:00)"),

17-28: Schema/prompt mismatch: success case lacks 'explanation'

The system prompt requires an explanation on success, but the schema doesn’t allow it. Align them to reduce model confusion.

Apply:

   .discriminatedUnion("success", [
     z.object({
       success: z.literal(true),
-      filters: AIFilters,
+      filters: AIFilters,
+      explanation: z
+        .string()
+        .describe("Brief explanation of how filters were applied"),
     }),

145-217: Add evals for midday boundaries, DST, and timezone offsets

Existing evals in apps/webapp/evals/aiRunFilter.eval.ts cover "around 8am", "2pm yesterday", and "this morning". Add focused evals for:

  • "around 12pm" (include 11:30am inputs to verify the ±1 hour buffer crossing noon)
  • explicit "12pm" vs "12am" disambiguation
  • DST transition days (both start and end) to validate hour-skips/duplicates
  • explicit timezone offsets / named zones (e.g., UTC± offsets, EST/EDT) to verify correct ISO conversion

apps/webapp/evals/aiRunFilter.eval.ts

🧹 Nitpick comments (7)
apps/webapp/app/v3/services/aiRunFilterService.server.ts (7)

201-217: Unify unclear-query behavior

Prompt simultaneously asks for success:true with empty filters+explanation and also success:false with error. Pick one to avoid indecision. If you adopt the schema change above, keep success:false for truly unparseable queries; otherwise remove the “empty filters + explanation” path.

-  IMPORTANT: Return ONLY the filters that are explicitly mentioned or can be reasonably inferred. If the description is unclear or doesn't match any known patterns, return an empty filters object {} and explain why in the explanation field.
+  IMPORTANT: Return ONLY the filters that are explicitly mentioned or can be reasonably inferred. If the description is unclear or doesn't match any known patterns, return:
+  {
+    "success": false,
+    "error": "Query unclear"
+  }

251-256: Guard against NaN and inverted ranges

Parsing can still fail or produce inverted ranges. Prevent NaN, and swap if from > to.

Apply:

-        filters: {
-          ...validationResult.data,
-          from: validationResult.data.from
-            ? new Date(validationResult.data.from).getTime()
-            : undefined,
-          to: validationResult.data.to ? new Date(validationResult.data.to).getTime() : undefined,
-        },
+        filters: (() => {
+          const fromStr = validationResult.data.from;
+          const toStr = validationResult.data.to;
+          const fromMs = fromStr ? Date.parse(fromStr) : undefined;
+          const toMs = toStr ? Date.parse(toStr) : undefined;
+          if ((fromStr && Number.isNaN(fromMs)) || (toStr && Number.isNaN(toMs))) {
+            return { ...validationResult.data, from: undefined, to: undefined };
+          }
+          if (fromMs !== undefined && toMs !== undefined && fromMs > toMs) {
+            return { ...validationResult.data, from: toMs, to: fromMs };
+          }
+          return { ...validationResult.data, from: fromMs, to: toMs };
+        })(),

169-172: “This month” ≠ “30d”

“30d” doesn’t align with calendar month boundaries. Prefer explicit month start/end.

-  - "this month" → period: "30d"
+  - "this month" → from/to: first day 00:00 to first day of next month 00:00 (in the user's timezone)

86-96: Pass user timezone through the API (optional but recommended)

Consider adding a userTimeZone param and embedding it in the prompt to make “today/yesterday/8am” correct for the viewer.

-  async call(text: string, environmentId: string): Promise<AIFilterResult> {
+  async call(text: string, environmentId: string, userTimeZone: string): Promise<AIFilterResult> {

Then reference ${userTimeZone} in the system prompt as shown above. This will require updating call sites.


178-186: Coverage ask: add “last night/early morning/yesterday morning”

These are common user phrases. Consider mapping:

  • last night → 6pm–12am yesterday
  • early morning → 4am–6am today
  • yesterday morning → 6am–12pm yesterday

I can draft prompt bullets if helpful.


1-84: Nit: prefer type aliases over interfaces (house style)

House style suggests “types over interfaces” for TS. Not blocking.


266-277: Error path: schema-specific errors vs generic

Clear separation is good. If you decide to error on invalid datetime parsing, reuse the short error pattern (“Invalid ISO datetime”).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c195e2c and becf3e4.

📒 Files selected for processing (1)
  • apps/webapp/app/v3/services/aiRunFilterService.server.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • apps/webapp/app/v3/services/aiRunFilterService.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/app/v3/services/aiRunFilterService.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json

Files:

  • apps/webapp/app/v3/services/aiRunFilterService.server.ts
{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly

Files:

  • apps/webapp/app/v3/services/aiRunFilterService.server.ts
apps/webapp/app/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead

Files:

  • apps/webapp/app/v3/services/aiRunFilterService.server.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
apps/webapp/app/v3/services/aiRunFilterService.server.ts (3)

227-246: Good: strict schema validation and logging

Validation + structured error logging looks solid.

If you adopt the .datetime() change, confirm the logger still surfaces parse failures clearly.


145-156: Nice: tool guidance and concrete examples improve grounding

The expanded instructions plus tool usage should reduce hallucinations and improve filter accuracy.


218-225: Telemetry metadata: keep environmentId only, avoid PII

Current metadata is minimal and safe. LGTM.

@ericallam ericallam merged commit 28f8cee into main Sep 19, 2025
31 checks passed
@ericallam ericallam deleted the chore(webapp)-better-ai-search-evals branch September 19, 2025 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants