Skip to content

feat: attachments get cited in reponse#1257

Open
Himanshvarma wants to merge 7 commits intomainfrom
feature/attachmentCitations
Open

feat: attachments get cited in reponse#1257
Himanshvarma wants to merge 7 commits intomainfrom
feature/attachmentCitations

Conversation

@Himanshvarma
Copy link
Contributor

@Himanshvarma Himanshvarma commented Feb 24, 2026

Description

Testing

Additional Notes

Summary by CodeRabbit

  • New Features

    • Gallery and preview now use direct attachment URLs for thumbnails and full-size images.
    • Citations can reference chunk-level indices for more precise source linking.
  • Bug Fixes

    • Downloads abort safely when an attachment URL is missing; single-image previews render only when a URL exists.
    • Thumbnail fallback to the main image URL to avoid broken images.
    • Attachment upload now records and surfaces thumbnail/file URLs reliably.
  • Refactor

    • Unified attachment processing and tighter file-serving security with path validation and safer storage handling.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Frontend attachment UI now uses direct attachment.url / attachment.thumbnailUrl and validates URL presence. Backend unifies attachment handling with disk storage, opaque attachment URLs, updated metadata (filePath, thumbnailUrl), secure serve endpoints, and threads a new allowChunkCitations flag (and renamed citation regex) through chat/citation/RAG flows.

Changes

Cohort / File(s) Summary
Frontend Attachment UI
frontend/src/components/AttachmentGallery.tsx, frontend/src/components/AttachmentPreview.tsx
Switch image/thumbnail sources to attachment.url / attachment.thumbnailUrl; guard rendering on URL presence; update download/fetch to use attachment.url.
Frontend Citation & Chat
frontend/src/components/CitationLink.tsx, frontend/src/components/CitationPreview.tsx, frontend/src/routes/_authenticated/chat.tsx, frontend/src/utils/chatUtils.tsx, frontend/src/components/GroupFilter.tsx
Add app to Citation, derive attachment links, pass chunkIndex on citation clicks, prefer docId fallback for document selection, rename textToKbItemCitationIndextextToChunkCitationIndex, extend chat params schema, add AttachmentEntity handling in group filter.
Server: Attachment upload, storage & serving
server/api/files.ts, server/api/knowledgeBase.ts, server/shared/types.ts, server/integrations/ribbie/index.ts, server/api/workflow.ts
Persist attachments to disk with generated storageKey/storagePath, produce opaque attachment URLs, store filePath and thumbnailUrl in Vespa metadata, sanitize filenames, secure serve endpoints with path validation, export KB_STORAGE_ROOT.
Server: Chat/citation & context pipeline
server/api/chat/chat.ts, server/api/chat/utils.ts, server/api/chat/agents.ts, server/api/chat/message-agents.ts, server/ai/context.ts, server/ai/provider/index.ts
Replace isMsgWithKbItems/isMsgWithSources with allowChunkCitations across signatures and flows, rename exported regex to textToChunkCitationIndex, thread allowChunkCitations through context builders and provider RAG prompt selection, drop attachmentFileIds from WorkingSet and unify attachment handling.
Server: Attachment orchestration & streaming
server/api/chat/agents.ts, server/api/chat/utils.ts, server/api/chat/message-agents.ts, server/api/agent/workflowAgentUtils.ts, server/api/chat/agents.ts
Unify attachmentFileIds into single attachmentFileIds flow, introduce isMsgWithAttachments, store attachment metadata during message handling, emit AttachmentUpdate events and propagate storage errors to stream, adjust RAG/chat callsites to use allowChunkCitations.

Sequence Diagram

sequenceDiagram
    actor User
    participant Frontend
    participant ChatAPI
    participant FileAPI
    participant Storage
    participant Vespa

    User->>Frontend: send message + attachment
    Frontend->>ChatAPI: POST message + attachment metadata
    ChatAPI->>FileAPI: request upload/store
    FileAPI->>Storage: generate storageKey, write file, create thumbnail
    Storage-->>FileAPI: filePath, thumbnailPath
    FileAPI->>Vespa: index attachment doc with url, thumbnailUrl, filePath
    Vespa-->>FileAPI: ack
    FileAPI-->>ChatAPI: return attachment metadata
    ChatAPI-->>Frontend: stream AttachmentUpdate event
    User->>Frontend: click attachment/citation
    Frontend->>ChatAPI: request content / citation handling (allowChunkCitations)
    ChatAPI->>FileAPI: handleAttachmentServe(fileId) -> resolve via Vespa
    FileAPI->>Storage: validate path, stream file
    Storage-->>FileAPI: file stream
    FileAPI-->>ChatAPI: file stream
    ChatAPI-->>Frontend: deliver file/content
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • junaid-shirur
  • kalpadhwaryu
  • devesh-juspay

Poem

🐰 I hopped through chunks and files galore,
URLs now hidden, thumbnails not poor,
Citations by chunk, neat numbers in play,
Files stored safe where bunnies can't stray,
I twitch my nose — the refactor's hooray! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: attachments get cited in response' is partially related to the changeset. It captures a main aspect of the work (attachments being cited), but contains a typo ('reponse' instead of 'response') and doesn't fully convey the scope—the changes also involve refactoring citation handling across multiple components, renaming constants, and restructuring attachment metadata patterns beyond just 'attachments getting cited.'

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/attachmentCitations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates attachments more deeply into the AI response and citation system. It streamlines how attachments are referenced and displayed in the user interface, ensures they are properly included in the AI's contextual understanding, and enhances the backend's ability to serve these files securely and efficiently. The changes provide a more robust and feature-rich experience for users interacting with attached documents and images.

Highlights

  • Attachment Citation Feature: Introduced the capability for attachments to be cited directly within AI responses, enhancing the traceability and context of generated content.
  • Refactored Attachment URL Handling: Updated frontend components to use direct url and thumbnailUrl properties from attachment metadata, moving away from fileId-based API paths for serving attachments.
  • Unified Attachment File ID Management: Consolidated the handling of image and non-image attachment file IDs into a single attachmentFileIds array and introduced an isMsgWithAttachments flag for clearer context in backend RAG processes.
  • Enhanced AI Context Generation for Attachments: Modified AI context construction to include image chunks and document content from attachments, allowing the AI to leverage these files more effectively in its responses.
  • Improved Attachment Serving and Security: Changed attachment serving endpoints to use URL-encoded storagePath parameters, implemented structured storage for non-image attachments, and added security checks to prevent unauthorized file access.
  • Citation Interface and Logic Updates: Extended the Citation interface with an app field and updated citation link creation and preview logic to specifically support and display attachment-related citations.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • frontend/src/components/AttachmentGallery.tsx
    • Updated image and thumbnail source URLs to use direct "attachment.url" and "attachment.thumbnailUrl" properties.
  • frontend/src/components/AttachmentPreview.tsx
    • Removed local "thumbnailUrl" variable and updated image sources to directly use "attachment.thumbnailUrl" and "attachment.url".
  • frontend/src/components/CitationLink.tsx
    • Added an "app" property to the "Citation" interface, updated citation link creation logic to handle attachment app types, and modified "onCitationClick" to pass "chunkIndex".
  • frontend/src/components/CitationPreview.tsx
    • Updated the "documentId" used for scoped finding to prioritize "itemId" then "docId", and adjusted "useEffect" dependencies accordingly.
  • frontend/src/components/GroupFilter.tsx
    • Imported "AttachmentEntity" and expanded the "getName" function to provide descriptive names for various attachment types.
  • frontend/src/routes/_authenticated/chat.tsx
    • Modified citation handling logic to correctly identify and process attachment citations, and updated "handleChunkIndexChange" to use "docId" for attachments.
  • frontend/src/utils/chatUtils.tsx
    • Renamed "textToKbItemCitationIndex" to "textToChunkCitationIndex" and updated its usage for cleaning and processing citations.
  • server/ai/context.ts
    • Added "allowChunkCitations" parameter to context construction functions and included image chunks in the generated AI context.
  • server/ai/provider/index.ts
    • Replaced the "isMsgWithKbItems" parameter with "allowChunkCitations" in the "baselineRAGJsonStream" function.
  • server/api/chat/agents.ts
    • Refactored attachment handling by consolidating file IDs, introducing "isMsgWithAttachments", and updating parameters for RAG functions.
  • server/api/chat/chat.ts
    • Updated citation parsing logic to use "textToChunkCitationIndex", modified "checkAndYieldCitations" to handle "allowChunkCitations" and skip datasource files, and adjusted RAG function parameters.
  • server/api/chat/utils.ts
    • Removed "attachmentFileIds" from the "WorkingSet" type and "collectFollowupContext" function, integrating them into the general "fileIds" array.
  • server/api/files.ts
    • Modified attachment upload to store direct API URLs for attachments and thumbnails, implemented structured storage paths for non-image files, and added security checks for serving attachments.
  • server/api/knowledgeBase.ts
    • Exported "getStoragePath" and updated "GetChunkContentApi" to dynamically select the Vespa schema based on whether the document is an attachment.
  • server/server.ts
    • Updated API routes for serving attachments and thumbnails to use a "storagePath" parameter instead of "fileId".
  • server/shared/types.ts
    • Renamed the "thumbnailPath" property to "thumbnailUrl" within the "AttachmentMetadata" schema.
Activity
  • The pull request introduces a new feature to allow attachments to be cited in AI responses.
  • Frontend components for displaying attachments (AttachmentGallery, AttachmentPreview) were updated to use new URL structures for images and thumbnails.
  • The CitationLink and CitationPreview components were modified to properly handle and display attachment-specific citations.
  • Backend logic for AI context generation (server/ai/context.ts, server/ai/provider/index.ts) was adjusted to incorporate attachment data, including image chunks.
  • Attachment handling in chat APIs (server/api/chat/agents.ts, server/api/chat/chat.ts, server/api/chat/utils.ts) was refactored to unify file ID management and pass attachment context more effectively.
  • File serving mechanisms (server/api/files.ts, server/server.ts) were updated to use secure, structured storage paths for attachments, replacing older fileId-based routes.
  • The isMsgWithKbItems parameter was consistently renamed to allowChunkCitations across relevant backend functions to better reflect its purpose.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors how attachments are handled, enabling them to be cited in responses. The changes involve using direct URLs for attachments instead of constructing them from file IDs, which simplifies the frontend code and centralizes URL generation on the backend. The backend is updated to serve attachments via these new URL structures, including security checks to prevent path traversal. Overall, this is a solid refactoring. I've added a few comments to improve code clarity and fix a couple of minor bugs.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
frontend/src/utils/chatUtils.tsx (1)

47-68: ⚠️ Potential issue | 🟠 Major

Fix chunk citation indexing for DB-loaded messages.

K[...] chunk citations use 0-based document indices; subtracting 1 shifts links to the wrong URL.

🐛 Proposed fix
-      finalIndex = originalIndex - 1 // Convert [1] to array index 0
+      finalIndex = originalIndex
Based on learnings: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/utils/chatUtils.tsx` around lines 47 - 68, The DB-loaded branch
in the text replacement handler incorrectly subtracts 1 from the parsed
originalIndex, shifting K[n_m] citations off-by-one; in the replacement inside
text.replace for textToChunkCitationIndex (where citationMap, citationUrls,
finalIndex and chunkIndex are used), change the DB-loaded mapping so finalIndex
= originalIndex (use the raw 0-based parsed index), keep the streaming branch
using citationMap[originalIndex], and retain the existing checks before forming
the returned string; also ensure finalIndex is validated against citationUrls
bounds before using it.
frontend/src/components/AttachmentPreview.tsx (1)

102-116: ⚠️ Potential issue | 🟡 Minor

Guard modal image when attachment.url is absent.

attachment.url is optional; rendering the modal without a fallback leads to a broken image for older records.

🐛 Proposed fix
-      {isImage && (
+      {isImage && (attachment.url || attachment.thumbnailUrl) && (
         <Dialog open={showImageModal} onOpenChange={setShowImageModal}>
@@
             <div className="px-6 pb-6">
               <img
-                src={attachment.url}
+                src={attachment.url ?? attachment.thumbnailUrl}
                 alt={attachment.fileName}
                 className="w-full h-auto max-h-[70vh] object-contain rounded-lg"
                 onError={handleImageError}
               />
             </div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/components/AttachmentPreview.tsx` around lines 102 - 116, The
modal currently renders an <img> with attachment.url even when that optional
field is missing; update the AttachmentPreview component so the Dialog and its
<img> are only rendered when attachment.url is truthy (or replace the <img> with
a safe fallback UI/message if url is absent), e.g., wrap the Dialog block (the
code using isImage, showImageModal, DialogContent/Title and the <img> with
onError={handleImageError}) in a conditional that checks attachment.url (and
keep the isImage check), and ensure any state that opens the modal
(showImageModal) is not set to true when url is undefined to avoid showing a
broken image.
server/api/knowledgeBase.ts (1)

1930-1944: ⚠️ Potential issue | 🟠 Major

Attachment documents are rejected by the sddocname guard.

When isAttachment is true, GetDocument uses fileSchema but the guard enforces "kb_items", causing attachment documents to return 404 errors even when they exist.

🐛 Proposed fix
-    if (resp.fields.sddocname && resp.fields.sddocname !== "kb_items") {
-      throw new HTTPException(404, { message: "Invalid document type" })
-    }
+    const expectedDoc = isAttachment ? "file" : "kb_items"
+    if (resp.fields.sddocname && resp.fields.sddocname !== expectedDoc) {
+      throw new HTTPException(404, { message: "Invalid document type" })
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/knowledgeBase.ts` around lines 1930 - 1944, The sddocname guard
rejects attachment documents because when isAttachment is true you call
GetDocument with fileSchema but still enforce resp.fields.sddocname ===
"kb_items"; update the check around resp.fields.sddocname in the handler to
allow the file schema name when isAttachment is true (i.e., if isAttachment
accept the file schema sddocname, otherwise require "kb_items"); locate the
logic around isAttachment, GetDocument(fileSchema|KbItemsSchema) and the
resp.fields.sddocname conditional and adjust it so attachments are not forced to
match "kb_items".
frontend/src/components/AttachmentGallery.tsx (1)

49-58: ⚠️ Potential issue | 🟡 Minor

Guard against missing attachment URLs before download.

attachment.url ?? "" can trigger a fetch to the current origin (and download HTML) if the URL is absent. An early return prevents confusing downloads.

✅ Suggested fix
   const downloadCurrentImage = async () => {
     if (images[currentImageIndex]) {
       const attachment = images[currentImageIndex]
+      if (!attachment.url) {
+        console.error("Download failed: missing attachment URL")
+        return
+      }
       try {
         const response = await authFetch(
-          attachment.url ?? "",
+          attachment.url,
           {
             credentials: "include",
           },
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/components/AttachmentGallery.tsx` around lines 49 - 58, In
downloadCurrentImage, guard against missing or empty attachment URLs before
calling authFetch: check images[currentImageIndex] and that attachment.url is a
non-empty string (not null/undefined/empty) and return early (or surface an
error) if it's missing; update the code around the attachment variable and
authFetch call so you never pass attachment.url ?? "" into authFetch.
server/ai/provider/index.ts (1)

1270-1350: ⚠️ Potential issue | 🟡 Minor

Apply allowChunkCitations check in non-specificFiles branches to prevent incorrect citation remapping.

The function currently applies indexToCitation() unconditionally in the defaultReasoning and default branches, regardless of allowChunkCitations value. When allowChunkCitations=true (e.g., with file IDs starting with "clf-" or "attf_"), this remaps chunk citations using logic designed for regular citations.

The allowChunkCitations flag should guard the indexToCitation() calls in the non-specificFiles branches to prevent mishandling of chunk citations, similar to how it controls prompt selection in the specificFiles branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/ai/provider/index.ts` around lines 1270 - 1350, In
baselineRAGJsonStream, the non-specificFiles branches call
indexToCitation(retrievedCtx) unconditionally which breaks chunk citations when
allowChunkCitations is true; update the defaultReasoning and default
(baselinePromptJson / agentBaselineReasoningPromptJson / agentBaselinePromptJson
/ baselineReasoningPromptJson / baselinePromptJson) branches to conditionally
use indexToCitation(retrievedCtx) only when allowChunkCitations is false, and
pass the raw retrievedCtx when allowChunkCitations is true before assigning
params.systemPrompt.
server/api/chat/chat.ts (1)

492-528: ⚠️ Potential issue | 🟠 Major

Fix stale chunkMatch so image citations aren’t skipped.

Because the while condition short‑circuits, chunkMatch can retain a previous value when imgMatch is found, causing the (match || chunkMatch) branch to run and the image citation to be ignored. Reset chunkMatch when an image match is detected (or restructure the loop).

🔧 Minimal fix
   while (
     (match = textToCitationIndex.exec(text)) !== null ||
     (imgMatch = textToImageCitationIndex.exec(text)) !== null ||
     (allowChunkCitations && 
       (chunkMatch = textToChunkCitationIndex.exec(text)) !== null)
     ) {
+    if (imgMatch) {
+      chunkMatch = null
+    }
       if (match || chunkMatch) {
         let citationIndex = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/chat.ts` around lines 492 - 528, The loop's short‑circuiting
lets chunkMatch retain a previous match and cause the (match || chunkMatch)
branch to run when imgMatch is actually present; update the while body so
chunkMatch is cleared whenever an image match is detected (or re-evaluate
matches freshly each iteration) — e.g., after setting imgMatch from
textToImageCitationIndex ensure chunkMatch is reset (or compute matches into
local vars and prefer imgMatch first) so that image citations handled by
imgMatch aren't skipped; look at the while loop, variables
chunkMatch/imgMatch/match, and the handling that yields citations and
yieldedCitations/results to make this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/src/components/CitationLink.tsx`:
- Line 147: The ternary in CitationLink.tsx is mis-parsed because of operator
precedence; change the JSX so the condition explicitly groups the filename
expression with isAttachmentLink (i.e., evaluate
citation.title.replace(/[^/]*$/, "") || isAttachmentLink first) and then apply
the ? "attachment" : "No file name" ternary so non-attachment citations with a
path don't incorrectly show "attachment".

In `@server/ai/context.ts`:
- Around line 483-519: The code indexes into fields.image_chunks_pos_summary
without guarding against it being undefined which yields "docId_undefined";
update the logic around imageChunks construction (the variables imageChunksPos,
imageChunks and the map that builds chunk names) to defensively handle a missing
image_chunks_pos_summary — e.g., default imageChunksPos to an empty array or
bail to a different mapping branch when fields.image_chunks_pos_summary is falsy
so imageChunks.map/ indexing never accesses undefined; mirror the defensive
pattern used in constructDataSourceFileContext so chunk names are built from
valid positions or a safe fallback.

In `@server/api/chat/agents.ts`:
- Around line 1082-1084: The current isMsgWithAttachments (computed from
parseAttachmentMetadata and expandSheetIds) only flags sheet-based attachments
and misses KB file IDs (prefix "clf-"), causing chunk-citation to be disabled
for KB selections; compute a single reusable flag (e.g., hasFileOrKbIds) by
deriving attachmentFileIds from parseAttachmentMetadata/expandSheetIds and
OR-ing a check for KB file id prefixes (fileId startsWith "clf-") or any
explicit KB id source, then replace all uses of isMsgWithAttachments in flows
(dual RAG, given-context, non-streaming, follow-up paths referenced near
parseAttachmentMetadata/expandSheetIds and lines you noted) with this unified
flag so chunk citations are consistently enabled for both sheet attachments and
KB file IDs.

In `@server/api/chat/utils.ts`:
- Around line 1416-1428: The loop suffers from stale regex state because
textToCitationIndex, textToImageCitationIndex, and textToChunkCitationIndex use
/g and the while condition short-circuits, leaving previous
match/imgMatch/chunkMatch values set; to fix, at the start of each iteration in
the while that uses those regexes (the block using match, imgMatch, chunkMatch
and citationsProcessed), explicitly reset match = imgMatch = chunkMatch = null
before calling textToCitationIndex.exec(text),
textToImageCitationIndex.exec(text), and textToChunkCitationIndex.exec(text) so
each exec() runs against a clean state and only the truly current match is
truthy.

In `@server/api/files.ts`:
- Around line 654-684: The thumbnail route handleThumbnailServe currently only
checks directory bounds but not file ownership; add the same ownership
verification used in handleAttachmentServe: after resolving
storagePath/resolvedPath and confirming isAllowed, fetch the file/thumbnail
metadata (or call the existing ownership helper used by handleAttachmentServe)
and verify the JwtPayloadKey sub/email matches the owner; if the user does not
own the thumbnail, log with loggerWithChild({ email }) including resolvedPath
and throw HTTPException(403, { message: "Access denied" }); ensure you use the
same unique helpers or DB lookup used by handleAttachmentServe so behavior is
consistent.
- Around line 287-305: The code writes non-image attachments to disk using
filePath (via writeFile) but the existing error handler only cleans image
directories; modify the error handling so if any subsequent step
(FileProcessorService.processFile or the DB insert) throws, the non-image file
at filePath is deleted (unlink) and its parent directories are removed if empty;
ensure deletion is attempted inside the same catch/finally path that currently
removes image dirs, use the same scope variables (filePath, collectionId,
storageKey) and perform cleanup without swallowing the original error (rethrow
or preserve the original error after cleanup).
- Around line 259-260: The current code constructs attachmentApiUrl and
attachmentThumbnailUrl by embedding absolute file system paths (filePath,
thumbnailPath) into URLs which leaks server internals; change these to use an
opaque identifier (e.g., fileId or storageKey) instead of filePath/thumbnailPath
when building attachmentApiUrl and attachmentThumbnailUrl (e.g.,
`/api/v1/attachments/${encodeURIComponent(fileId)}` and
`/api/v1/attachments/${encodeURIComponent(fileId)}/thumbnail`), and update the
corresponding server handler that serves attachments to resolve that
fileId/storageKey to the physical path server-side (lookup in DB or storage
mapping) so no absolute paths are exposed to clients. Ensure you reference the
variables attachmentApiUrl, attachmentThumbnailUrl, filePath, thumbnailPath and
the server attachment endpoint when making these changes.
- Around line 583-613: handleAttachmentServe currently authorizes only by
filesystem path and must enforce per-file permissions like
handleAttachmentDeleteApi; update handleAttachmentServe to map the incoming
storagePath to the corresponding fileId (or change the route to accept fileId
instead of storagePath), fetch the Vespa document (the same lookup used by
handleAttachmentDeleteApi), and verify the document's permissions array includes
the requesting user (use the JwtPayloadKey sub/email) before serving the file;
if the permission check fails, log with loggerWithChild({ email }) and throw
HTTPException(403).

---

Outside diff comments:
In `@frontend/src/components/AttachmentGallery.tsx`:
- Around line 49-58: In downloadCurrentImage, guard against missing or empty
attachment URLs before calling authFetch: check images[currentImageIndex] and
that attachment.url is a non-empty string (not null/undefined/empty) and return
early (or surface an error) if it's missing; update the code around the
attachment variable and authFetch call so you never pass attachment.url ?? ""
into authFetch.

In `@frontend/src/components/AttachmentPreview.tsx`:
- Around line 102-116: The modal currently renders an <img> with attachment.url
even when that optional field is missing; update the AttachmentPreview component
so the Dialog and its <img> are only rendered when attachment.url is truthy (or
replace the <img> with a safe fallback UI/message if url is absent), e.g., wrap
the Dialog block (the code using isImage, showImageModal, DialogContent/Title
and the <img> with onError={handleImageError}) in a conditional that checks
attachment.url (and keep the isImage check), and ensure any state that opens the
modal (showImageModal) is not set to true when url is undefined to avoid showing
a broken image.

In `@frontend/src/utils/chatUtils.tsx`:
- Around line 47-68: The DB-loaded branch in the text replacement handler
incorrectly subtracts 1 from the parsed originalIndex, shifting K[n_m] citations
off-by-one; in the replacement inside text.replace for textToChunkCitationIndex
(where citationMap, citationUrls, finalIndex and chunkIndex are used), change
the DB-loaded mapping so finalIndex = originalIndex (use the raw 0-based parsed
index), keep the streaming branch using citationMap[originalIndex], and retain
the existing checks before forming the returned string; also ensure finalIndex
is validated against citationUrls bounds before using it.

In `@server/ai/provider/index.ts`:
- Around line 1270-1350: In baselineRAGJsonStream, the non-specificFiles
branches call indexToCitation(retrievedCtx) unconditionally which breaks chunk
citations when allowChunkCitations is true; update the defaultReasoning and
default (baselinePromptJson / agentBaselineReasoningPromptJson /
agentBaselinePromptJson / baselineReasoningPromptJson / baselinePromptJson)
branches to conditionally use indexToCitation(retrievedCtx) only when
allowChunkCitations is false, and pass the raw retrievedCtx when
allowChunkCitations is true before assigning params.systemPrompt.

In `@server/api/chat/chat.ts`:
- Around line 492-528: The loop's short‑circuiting lets chunkMatch retain a
previous match and cause the (match || chunkMatch) branch to run when imgMatch
is actually present; update the while body so chunkMatch is cleared whenever an
image match is detected (or re-evaluate matches freshly each iteration) — e.g.,
after setting imgMatch from textToImageCitationIndex ensure chunkMatch is reset
(or compute matches into local vars and prefer imgMatch first) so that image
citations handled by imgMatch aren't skipped; look at the while loop, variables
chunkMatch/imgMatch/match, and the handling that yields citations and
yieldedCitations/results to make this change.

In `@server/api/knowledgeBase.ts`:
- Around line 1930-1944: The sddocname guard rejects attachment documents
because when isAttachment is true you call GetDocument with fileSchema but still
enforce resp.fields.sddocname === "kb_items"; update the check around
resp.fields.sddocname in the handler to allow the file schema name when
isAttachment is true (i.e., if isAttachment accept the file schema sddocname,
otherwise require "kb_items"); locate the logic around isAttachment,
GetDocument(fileSchema|KbItemsSchema) and the resp.fields.sddocname conditional
and adjust it so attachments are not forced to match "kb_items".

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71531e4 and 84a19b4.

📒 Files selected for processing (16)
  • frontend/src/components/AttachmentGallery.tsx
  • frontend/src/components/AttachmentPreview.tsx
  • frontend/src/components/CitationLink.tsx
  • frontend/src/components/CitationPreview.tsx
  • frontend/src/components/GroupFilter.tsx
  • frontend/src/routes/_authenticated/chat.tsx
  • frontend/src/utils/chatUtils.tsx
  • server/ai/context.ts
  • server/ai/provider/index.ts
  • server/api/chat/agents.ts
  • server/api/chat/chat.ts
  • server/api/chat/utils.ts
  • server/api/files.ts
  • server/api/knowledgeBase.ts
  • server/server.ts
  • server/shared/types.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/chat/message-agents.ts (1)

1335-1393: ⚠️ Potential issue | 🟡 Minor

Respect explicit allowChunkCitations = false when collection IDs exist.
Right now the flag is overwritten to true whenever collectionFileIds are present, so callers can’t intentionally disable chunk citations. If false is meant to be a hard off-switch, derive a separate flag that only defaults based on collection IDs when the param is undefined.

💡 Suggested adjustment
-        if (collectionFileIds && collectionFileIds.length > 0) {
-          allowChunkCitations = true // for the case where kb files are in @
+        const enableChunkCitations =
+          allowChunkCitations ?? (collectionFileIds.length > 0)
+        if (collectionFileIds && collectionFileIds.length > 0) {
           results = await searchCollectionRAG(
             queryText,
             collectionFileIds,
             undefined,
           )
         }
@@
-          allowChunkCitations,
+          enableChunkCitations,
         )
       )
     )

Also applies to: 1466-1474

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/message-agents.ts` around lines 1335 - 1393, Summary: The
function currently overwrites the caller-provided allowChunkCitations by setting
allowChunkCitations = true when collectionFileIds exist; instead derive a
separate flag so explicit false is respected. Fix: don’t mutate the parameter
allowChunkCitations; create a new local flag (e.g., enableChunkCitations)
computed as allowChunkCitations === undefined ? (collectionFileIds.length > 0) :
allowChunkCitations, replace uses of the mutated variable (the assignment at the
collectionFileIds block and any later checks around searchCollectionRAG and the
same pattern at the 1466-1474 area) to use the new local flag, and ensure
searchCollectionRAG/attachment handling uses enableChunkCitations so callers can
explicitly pass false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/api/chat/chat.ts`:
- Around line 5139-5141: The merge of attachmentFileIds into fileIds can produce
duplicates (especially on retries); update both merge sites where fileIds =
fileIds.concat(attachmentFileIds) to perform a dedupe merge instead: when
attachmentFileIds exists, add only IDs not already present in fileIds (e.g.,
filter attachmentFileIds by existence in fileIds before concatenation or build a
Set from fileIds and union in), ensuring you apply the same change at the second
merge site (the other occurrence around the later block) so fileIds contains
unique attachment IDs only.

---

Outside diff comments:
In `@server/api/chat/message-agents.ts`:
- Around line 1335-1393: Summary: The function currently overwrites the
caller-provided allowChunkCitations by setting allowChunkCitations = true when
collectionFileIds exist; instead derive a separate flag so explicit false is
respected. Fix: don’t mutate the parameter allowChunkCitations; create a new
local flag (e.g., enableChunkCitations) computed as allowChunkCitations ===
undefined ? (collectionFileIds.length > 0) : allowChunkCitations, replace uses
of the mutated variable (the assignment at the collectionFileIds block and any
later checks around searchCollectionRAG and the same pattern at the 1466-1474
area) to use the new local flag, and ensure searchCollectionRAG/attachment
handling uses enableChunkCitations so callers can explicitly pass false.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84a19b4 and 2cc6083.

📒 Files selected for processing (2)
  • server/api/chat/chat.ts
  • server/api/chat/message-agents.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/chat/utils.ts (1)

642-649: ⚠️ Potential issue | 🟠 Major

Chunk citation indexing is off by one in both remap and lookup paths.

Line 648 adds + 1 to chunk doc index, and Line 1442 subtracts - 1 during lookup. For K[n_m], this should stay 0-based end-to-end, otherwise chunk citations can map to the wrong source.

Suggested patch
- return typeof finalIndex === "number" ? `K[${finalIndex + 1}_${chunkIndex}]` : ""
+ return typeof finalIndex === "number" ? `K[${finalIndex}_${chunkIndex}]` : ""
- const item = results[citationIndex - 1]
+ const item = results[citationIndex]

Based on learnings: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx; the raw parsed index should be used directly.

Also applies to: 1438-1443

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/utils.ts` around lines 642 - 649, The chunk citation indexing
is being shifted by +1 when building K[n_m] in the mapping callback (the
anonymous function using textToChunkCitationIndex and citationMap) while the
lookup path subtracts 1 later, causing off-by-one errors; update the mapping to
use the raw parsed docIndex (remove the +1 when creating `K[...]`) and also
remove the -1 adjustment in the corresponding lookup code (the reverse remap
logic that currently does subtract 1) so that both the emitter
(textToChunkCitationIndex path) and the resolver use 0-based document indices
consistently for K[n_m].
♻️ Duplicate comments (2)
server/ai/context.ts (1)

496-500: ⚠️ Potential issue | 🟡 Minor

Guard image_chunks_pos_summary before passing to image chunk scorer.

Both matchfeatures branches pass fields.image_chunks_pos_summary as number[] without checking presence. If summary exists but positions are missing, this can mis-map or fail in scoring.

Suggested patch
- if (fields.matchfeatures) {
+ if (fields.matchfeatures && fields.image_chunks_pos_summary) {
    const summaryStrings =
      fields.image_chunks_summary?.map((c) =>
        typeof c === "string" ? c : c.chunk,
      ) || []

    imageChunks = getSortedScoredImageChunks(
      fields.matchfeatures,
      fields.image_chunks_pos_summary as number[],
      summaryStrings as string[],
      fields.docId,
    )
  } else {

Also applies to: 1020-1024

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/ai/context.ts` around lines 496 - 500, The code passes
fields.image_chunks_pos_summary as number[] into getSortedScoredImageChunks
without checking it exists or matches summaryStrings length; update both places
where getSortedScoredImageChunks is called (the matchfeatures branches) to guard
the value: compute a safeImagePosSummary =
Array.isArray(fields.image_chunks_pos_summary) &&
fields.image_chunks_pos_summary.length === summaryStrings.length ?
fields.image_chunks_pos_summary : undefined (or an empty array if the scorer
expects that), and pass safeImagePosSummary instead of casting
fields.image_chunks_pos_summary directly; ensure getSortedScoredImageChunks and
any downstream logic handle undefined/empty input safely.
server/api/chat/chat.ts (1)

5149-5151: ⚠️ Potential issue | 🟡 Minor

Deduplicate attachmentFileIds when merging into fileIds (still unresolved).

Line [5150] and Line [6782] still append directly, which can duplicate IDs and produce repeated retrieval/citation entries.

🛠️ Suggested fix (apply at both merge sites)
-    if (attachmentFileIds && attachmentFileIds.length > 0) {
-      fileIds = fileIds.concat(attachmentFileIds)
-    }
+    if (attachmentFileIds && attachmentFileIds.length > 0) {
+      fileIds = Array.from(new Set([...fileIds, ...attachmentFileIds]))
+    }

Also applies to: 6781-6783

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/chat.ts` around lines 5149 - 5151, When merging
attachmentFileIds into fileIds (the code paths that currently do fileIds =
fileIds.concat(attachmentFileIds)), ensure you deduplicate so duplicate IDs are
not appended: replace the direct concat with logic that filters
attachmentFileIds against existing fileIds (or create a Set of fileIds and add
only new IDs) so only unique IDs are kept; apply the same change at both merge
sites that use fileIds and attachmentFileIds (the concat calls) so downstream
retrieval/citation routines don't see repeated IDs.
🧹 Nitpick comments (2)
server/api/chat/chat.ts (2)

5057-5058: Extract shared attachment-file-id expansion logic.

attachmentMetadata.flatMap((m) => expandSheetIds(m.fileId)) is repeated across Message and Retry paths. A small helper would reduce drift and keep behavior consistent.

Also applies to: 6659-6660, 6714-6715

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/chat.ts` around lines 5057 - 5058, Create a small helper
function (e.g., expandAttachmentFileIds or getAttachmentFileIds) that takes
attachmentMetadata and returns attachmentMetadata.flatMap(m =>
expandSheetIds(m.fileId)); replace the inline calls in the Message and Retry
paths (references: attachmentFileIds, isMsgWithAttachments, and usages at the
other locations indicated) to call this helper so the expansion logic is
centralized and behavior stays consistent across expandSheetIds usages.

2100-2102: Avoid mutating allowChunkCitations; compute an explicit effective flag instead.

At Line [2101] and Line [2491], the input parameter is overwritten, which makes call-site intent harder to reason about. Prefer a derived effectiveAllowChunkCitations and pass that through.

♻️ Suggested refactor
-  if (fileIds.length > 0 || (folderIds && folderIds.length > 0)) {
-    allowChunkCitations = true
+  const effectiveAllowChunkCitations =
+    allowChunkCitations ?? (fileIds.length > 0 || (folderIds?.length ?? 0) > 0)
+
+  if (fileIds.length > 0 || (folderIds && folderIds.length > 0)) {
     const fileSearchSpan = generateAnswerSpan?.startSpan("file_search")
@@
-      if (collectionFileIds && collectionFileIds.length > 0) {
-        allowChunkCitations = true
+      if (collectionFileIds && collectionFileIds.length > 0) {
         results = await searchCollectionRAG(
@@
-      allowChunkCitations,
+      effectiveAllowChunkCitations,

Also applies to: 2140-2141, 2490-2492, 2554-2556

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/chat.ts` around lines 2100 - 2102, The PR mutates the input
parameter allowChunkCitations; instead compute an explicit derived flag (e.g.,
effectiveAllowChunkCitations) based on the same conditions (fileIds.length > 0
|| (folderIds && folderIds.length > 0)) and use that new variable wherever the
mutated value is currently relied on (including the block that calls
generateAnswerSpan?.startSpan("file_search") and subsequent uses). Update all
other places mentioned (around the checks at the other occurrences) to stop
assigning to allowChunkCitations and instead pass/use
effectiveAllowChunkCitations, keeping the original parameter immutable and
preserving intent at call sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/src/components/AttachmentGallery.tsx`:
- Around line 89-90: AttachmentGallery renders images assuming images[*].url
exists but metadata can be partial; update all render sites (the main image
render, thumbnail renders, and modal image render used alongside
downloadCurrentImage) to use a safe fallback for src and alt — e.g., use
images[idx].thumbnailUrl || images[idx].url || a local placeholder image, and
alt fallback to images[idx].fileName || 'attachment'; also guard rendering when
the images array item is undefined (conditional render or skip) so the component
won't try to access .url on missing metadata. Locate these changes in the
AttachmentGallery component where images[0].url/images[0].fileName and the
similar usages (thumbnail and modal image render functions) are referenced and
replace them with the fallback logic and conditional checks.

In `@frontend/src/routes/_authenticated/chat.tsx`:
- Around line 1248-1250: The race occurs because selectedCitation is read before
the await and can change during the async fetch; capture the identity to compare
after the await instead: store a local key (e.g., const expectedId =
selectedCitation?.itemId ?? selectedCitation?.docId ?? null) before the await,
then after the await re-check against the current selectedCitation
(selectedCitation?.itemId and selectedCitation?.docId) or compare to documentId
as appropriate and bail out if they differ; update the conditional in the block
that currently references selectedCitation, itemId, docId, and documentId so it
uses the captured expectedId and the current selectedCitation to avoid
highlighting stale responses.

In `@server/api/files.ts`:
- Around line 349-351: The response `url` is being built from the per-sheet
`docId` while `fileId` is returned as `vespaId`, causing mismatches; update the
code that computes attachmentApiUrl (and the similar block around the
vespa/expand logic) to derive the sheet id from the returned `fileId`/`vespaId`
(use expandSheetIds(fileId)[0] or expandSheetIds(vespaId)[0]) instead of
expandSheetIds(docId)[0], and ensure attachmentApiUrl =
`/api/v1/attachments/${fileId}` (or `${vespaId}`) so the `url` and `fileId` are
always aligned for sheet attachments.
- Around line 292-309: Deletion currently only removes Vespa docs/images but
leaves non-image files persisted at paths generated by getStoragePath (using
collectionId and storageKey) and recorded in metadata.filePath; update the
attachment deletion flow (the code that removes Vespa entries/images) to also
remove the physical file at metadata.filePath: check that metadata.filePath
exists, perform a safe unlink/rm (handling ENOENT and errors), and optionally
remove empty parent directories after deleting the file. Ensure you reference
the same storage path logic (collectionId, storageKey, getStoragePath) so you
delete the exact file written earlier and avoid race conditions by using async
file APIs with try/catch and logging on failure.
- Around line 298-303: Sanitize the raw multipart filename before passing it to
getStoragePath to prevent path traversal or unexpected separators: replace or
remove path separators and control characters, normalize unicode, limit length,
and optionally enforce a whitelist of allowed characters; then use the sanitized
filename instead of file.name in the filePath construction (the call that
combines user.workspaceExternalId, collectionId, storageKey, file.name into
getStoragePath). Ensure the sanitization is applied consistently wherever
filenames are accepted and consider adding a test for malicious names.

---

Outside diff comments:
In `@server/api/chat/utils.ts`:
- Around line 642-649: The chunk citation indexing is being shifted by +1 when
building K[n_m] in the mapping callback (the anonymous function using
textToChunkCitationIndex and citationMap) while the lookup path subtracts 1
later, causing off-by-one errors; update the mapping to use the raw parsed
docIndex (remove the +1 when creating `K[...]`) and also remove the -1
adjustment in the corresponding lookup code (the reverse remap logic that
currently does subtract 1) so that both the emitter (textToChunkCitationIndex
path) and the resolver use 0-based document indices consistently for K[n_m].

---

Duplicate comments:
In `@server/ai/context.ts`:
- Around line 496-500: The code passes fields.image_chunks_pos_summary as
number[] into getSortedScoredImageChunks without checking it exists or matches
summaryStrings length; update both places where getSortedScoredImageChunks is
called (the matchfeatures branches) to guard the value: compute a
safeImagePosSummary = Array.isArray(fields.image_chunks_pos_summary) &&
fields.image_chunks_pos_summary.length === summaryStrings.length ?
fields.image_chunks_pos_summary : undefined (or an empty array if the scorer
expects that), and pass safeImagePosSummary instead of casting
fields.image_chunks_pos_summary directly; ensure getSortedScoredImageChunks and
any downstream logic handle undefined/empty input safely.

In `@server/api/chat/chat.ts`:
- Around line 5149-5151: When merging attachmentFileIds into fileIds (the code
paths that currently do fileIds = fileIds.concat(attachmentFileIds)), ensure you
deduplicate so duplicate IDs are not appended: replace the direct concat with
logic that filters attachmentFileIds against existing fileIds (or create a Set
of fileIds and add only new IDs) so only unique IDs are kept; apply the same
change at both merge sites that use fileIds and attachmentFileIds (the concat
calls) so downstream retrieval/citation routines don't see repeated IDs.

---

Nitpick comments:
In `@server/api/chat/chat.ts`:
- Around line 5057-5058: Create a small helper function (e.g.,
expandAttachmentFileIds or getAttachmentFileIds) that takes attachmentMetadata
and returns attachmentMetadata.flatMap(m => expandSheetIds(m.fileId)); replace
the inline calls in the Message and Retry paths (references: attachmentFileIds,
isMsgWithAttachments, and usages at the other locations indicated) to call this
helper so the expansion logic is centralized and behavior stays consistent
across expandSheetIds usages.
- Around line 2100-2102: The PR mutates the input parameter allowChunkCitations;
instead compute an explicit derived flag (e.g., effectiveAllowChunkCitations)
based on the same conditions (fileIds.length > 0 || (folderIds &&
folderIds.length > 0)) and use that new variable wherever the mutated value is
currently relied on (including the block that calls
generateAnswerSpan?.startSpan("file_search") and subsequent uses). Update all
other places mentioned (around the checks at the other occurrences) to stop
assigning to allowChunkCitations and instead pass/use
effectiveAllowChunkCitations, keeping the original parameter immutable and
preserving intent at call sites.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc6083 and feec210.

📒 Files selected for processing (9)
  • frontend/src/components/AttachmentGallery.tsx
  • frontend/src/components/CitationLink.tsx
  • frontend/src/routes/_authenticated/chat.tsx
  • server/ai/context.ts
  • server/api/agent/workflowAgentUtils.ts
  • server/api/chat/chat.ts
  • server/api/chat/utils.ts
  • server/api/files.ts
  • server/api/workflow.ts
💤 Files with no reviewable changes (1)
  • server/api/agent/workflowAgentUtils.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/components/CitationLink.tsx

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
server/api/files.ts (2)

349-351: ⚠️ Potential issue | 🟠 Major

attachmentApiUrl vs vespaId mismatch for multi-sheet files is unresolved.

attachmentApiUrl is overwritten on each sheet iteration and ends as /api/v1/attachments/${lastSheetDocId} (e.g. _sheet_2), while vespaId returned as metadata.fileId is _sheet_${totalSheets} (e.g. _sheet_3). Any consumer using metadata.url and metadata.fileId will observe different IDs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/files.ts` around lines 349 - 351, The loop that builds per-sheet
metadata overwrites attachmentApiUrl each iteration so it ends up pointing to
the last sheet (attachmentApiUrl = `/api/v1/attachments/${docId}`) while
vespaId/metadata.fileId is set to a different sheet id; fix by computing the
attachment URL from the same identifier used for vespaId/metadata.fileId inside
the sheet loop (e.g., derive attachmentApiUrl from vespaDoc.id or
metadata.fileId for that sheet) instead of mutating a single outer
attachmentApiUrl, or store per-sheet attachmentApiUrl alongside
vespaDoc/metadata so metadata.url and metadata.fileId always reference the same
sheet id.

298-303: ⚠️ Potential issue | 🔴 Critical

file.name is still passed unsanitized to getStoragePath.

A crafted filename with .. or path separators can break the storage path. The fix was suggested in a prior review but has not been applied yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/files.ts` around lines 298 - 303, file.name is passed unsanitized
into getStoragePath which allows crafted names like "../foo" to escape intended
storage; before calling getStoragePath (in the block that constructs filePath)
sanitize the filename (e.g., const safeName = path.basename(file.name) or a
dedicated sanitizeFileName(file.name) that strips path separators, "..",
null/control chars and enforces a max length and non-empty value) and then pass
safeName to getStoragePath instead of file.name; update any validation error
handling to reject or normalize invalid names.
🧹 Nitpick comments (1)
server/api/files.ts (1)

766-821: handleThumbnailServe includes kb_files in allowedBaseDirs unnecessarily.

Thumbnails are only generated for image attachments and stored under IMAGE_DIR. Non-image attachments never have a thumbnailPath in metadata (they'd 404 at line 801). The kb_files entry in allowedBaseDirs is therefore dead code here—harmless today, but could silently permit unexpected paths if a bug later writes a thumbnailPath under kb_files.

♻️ Suggested simplification
     const allowedBaseDirs = [
       path.resolve(process.env.IMAGE_DIR || "downloads/xyne_images_db"),
-      path.resolve(join(process.cwd(), "storage", "kb_files")),
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/files.ts` around lines 766 - 821, The allowedBaseDirs array in
handleThumbnailServe includes the kb_files directory unnecessarily, which could
permit unexpected thumbnail paths; remove the second entry that resolves
join(process.cwd(), "storage", "kb_files") from allowedBaseDirs so only the
IMAGE_DIR (process.env.IMAGE_DIR or "downloads/xyne_images_db") is allowed, and
update any inline comment to state thumbnails are only served from IMAGE_DIR;
ensure the isAllowed check continues to use resolvedPath and path.sep as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/ai/context.ts`:
- Around line 1248-1249: The fragment header currently uses a 1-based index
(citationIndex = index + 1) and capitalized "Index", which misaligns K[n_m]
citations that expect 0-based raw parsed doc indices; update the header
generation in the function that builds fragment strings to use the raw 0-based
index (remove the +1) and use a consistent token format (e.g., "index"
lowercase) when returning `Index ${...} \n ${fragment.content}` so it emits
`index ${index} \n ${fragment.content}` (reference symbols: citationIndex,
index, fragment.content).

In `@server/api/chat/message-agents.ts`:
- Line 3570: The logic that sets chunk-citation mode only checks uploaded
attachment metadata (attachmentMetadata.length > 0) so
referenced/referencedFileIds are ignored; update the condition used to derive
isMstWithAttachments / allowChunkCitations to consider both uploaded attachments
and referenced files (e.g., treat isMstWithAttachments as
attachmentMetadata.length > 0 || (referencedFileIds?.length ?? 0) > 0) and apply
the same change to the other occurrence around the allowChunkCitations usage
(the similar block at lines ~4051-4058) so chunk-citation formatting is enabled
whenever either source of attachments exists.
- Around line 3186-3231: The system prompt currently injects full attachment
text via fragmentsContent (built by answerContextMapFromFragments) into the
returned template, which risks prompt injection and bloats tokens; change
message construction so the system prompt contains only rules/policy and
placeholders (e.g., a reference like "See ATTACHMENT TOOL INPUT" or chunk
indices), move the actual fragmentsContent output out of the system-level string
and pass it via a tool/user message or a separate non-system input (e.g., the
tool that calls toDoWrite or the assistant message payload), update the return
template where the string contains "# ATTACHMENT CONTEXT
FRAGMENTS\n${fragmentsContent}" to instead reference the placeholder and ensure
answerContextMapFromFragments is called only when preparing the
tool/user-context payload, not embedded inside the system prompt.

In `@server/api/files.ts`:
- Around line 430-441: The cleanup loop uses a relative storageBaseDir computed
with join(...) which can mismatch an absolute currentDir from dirname(filePath);
change the storageBaseDir to an absolute path using path.resolve(process.cwd(),
"storage", "kb_files") (or resolve storageBaseDir and currentDir with
path.resolve/ path.normalize) so the while check (currentDir !== storageBaseDir
&& currentDir.length > storageBaseDir.length) behaves consistently with the
delete flow; update the storageBaseDir variable and/or resolve currentDir at
loop start to ensure both are absolute and comparable.
- Around line 432-433: Replace the three hardcoded join(process.cwd(),
"storage", "kb_files") usages in files.ts with a single canonical constant by
exporting KB_STORAGE_ROOT from knowledgeBase.ts and importing it into files.ts;
specifically, export KB_STORAGE_ROOT (the existing storage root used by
getStoragePath) from knowledgeBase.ts, add an import for KB_STORAGE_ROOT in
files.ts, and replace occurrences where storageBaseDir is set (the variable
currently initialized from join(process.cwd(), "storage", "kb_files")) with
path.resolve(KB_STORAGE_ROOT) so files.ts uses the same storage root as
getStoragePath and avoids divergence.

---

Duplicate comments:
In `@server/api/files.ts`:
- Around line 349-351: The loop that builds per-sheet metadata overwrites
attachmentApiUrl each iteration so it ends up pointing to the last sheet
(attachmentApiUrl = `/api/v1/attachments/${docId}`) while
vespaId/metadata.fileId is set to a different sheet id; fix by computing the
attachment URL from the same identifier used for vespaId/metadata.fileId inside
the sheet loop (e.g., derive attachmentApiUrl from vespaDoc.id or
metadata.fileId for that sheet) instead of mutating a single outer
attachmentApiUrl, or store per-sheet attachmentApiUrl alongside
vespaDoc/metadata so metadata.url and metadata.fileId always reference the same
sheet id.
- Around line 298-303: file.name is passed unsanitized into getStoragePath which
allows crafted names like "../foo" to escape intended storage; before calling
getStoragePath (in the block that constructs filePath) sanitize the filename
(e.g., const safeName = path.basename(file.name) or a dedicated
sanitizeFileName(file.name) that strips path separators, "..", null/control
chars and enforces a max length and non-empty value) and then pass safeName to
getStoragePath instead of file.name; update any validation error handling to
reject or normalize invalid names.

---

Nitpick comments:
In `@server/api/files.ts`:
- Around line 766-821: The allowedBaseDirs array in handleThumbnailServe
includes the kb_files directory unnecessarily, which could permit unexpected
thumbnail paths; remove the second entry that resolves join(process.cwd(),
"storage", "kb_files") from allowedBaseDirs so only the IMAGE_DIR
(process.env.IMAGE_DIR or "downloads/xyne_images_db") is allowed, and update any
inline comment to state thumbnails are only served from IMAGE_DIR; ensure the
isAllowed check continues to use resolvedPath and path.sep as before.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feec210 and 0e54693.

📒 Files selected for processing (3)
  • server/ai/context.ts
  • server/api/chat/message-agents.ts
  • server/api/files.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/knowledgeBase.ts (1)

1984-1994: ⚠️ Potential issue | 🔴 Critical

Attachment chunk retrieval can be rejected by the current type guard.

At Line 1984 you correctly switch to fileSchema for attachments, but Line 1992 still only allows sddocname === "kb_items". That can 404 valid attachment docs during citation chunk fetch.

💡 Proposed fix
-    if (resp.fields.sddocname && resp.fields.sddocname !== "kb_items") {
+    if (
+      !isAttachment &&
+      resp.fields.sddocname &&
+      resp.fields.sddocname !== "kb_items"
+    ) {
       throw new HTTPException(404, { message: "Invalid document type" })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/knowledgeBase.ts` around lines 1984 - 1994, The type guard rejects
attachment documents because after calling GetDocument with fileSchema for
attachments you still only accept resp.fields.sddocname === "kb_items"; update
the check in the handler that calls GetDocument (referencing GetDocument,
fileSchema, KbItemsSchema and resp.fields.sddocname) to allow the attachment
document name as well (e.g., accept either "kb_items" or the file/attachment
sddocname used by fileSchema) so attachments don't 404 during citation chunk
fetch; ensure the conditional branches use isAttachment to decide which
sddocname value(s) are valid.
♻️ Duplicate comments (1)
frontend/src/routes/_authenticated/chat.tsx (1)

1209-1211: ⚠️ Potential issue | 🟠 Major

Post-await citation guard still uses a stale closure snapshot.

Line 1252 re-reads selectedCitation from the same callback closure as Line 1211, so this check won’t detect citation changes that happened during await. The stale-highlight race is still possible.

Suggested fix
+  const selectedCitationIdRef = useRef<string | null>(null)
+  useEffect(() => {
+    selectedCitationIdRef.current =
+      selectedCitation?.itemId ?? selectedCitation?.docId ?? null
+  }, [selectedCitation])
+
   const handleChunkIndexChange = useCallback(
     async (newChunkIndex: number | null, documentId: string, docId: string) => {
-      const expectedCitationId = selectedCitation?.itemId ?? selectedCitation?.docId ?? null;
+      const expectedCitationId = selectedCitationIdRef.current
 
       if (expectedCitationId && !documentId) {
         console.error("handleChunkIndexChange called without documentId");
         return;
       }
@@
-      const currentCitationId = selectedCitation?.itemId ?? selectedCitation?.docId ?? null;
+      const currentCitationId = selectedCitationIdRef.current
       if (currentCitationId !== expectedCitationId) {
         return
       }
@@
-    [selectedCitation, toast, documentOperationsRef],
+    [toast, documentOperationsRef],
   )
#!/bin/bash
# Verify whether identity checks inside handleChunkIndexChange rely on closed-over selectedCitation.
rg -n -C4 'handleChunkIndexChange|expectedCitationId|currentCitationId|selectedCitationIdRef|selectedCitation\?\.(itemId|docId)' frontend/src/routes/_authenticated/chat.tsx

Also applies to: 1250-1254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/routes/_authenticated/chat.tsx` around lines 1209 - 1211, The
post-await citation guard in handleChunkIndexChange uses the closed-over
selectedCitation (captured into expectedCitationId) so re-reading
selectedCitation later in the same closure (lines ~1250) can still be stale;
change the logic to capture and compare against a stable, up-to-date reference
instead: create or use an existing selectedCitationRef (or
selectedCitationIdRef) that is updated whenever selectedCitation changes,
compute expectedCitationId from selectedCitation once before the async work,
then after await compare the current identity via selectedCitationRef.current
(or selectedCitationIdRef.current) instead of reading selectedCitation from the
closure; update or add the ref wiring where selection changes to keep it current
and use it in handleChunkIndexChange for the post-await guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/api/files.ts`:
- Line 393: The inline comment referencing encoding the absolute storagePath
into a URL is now outdated due to the opaque attachment endpoint model; update
or remove that comment near the code handling storagePath/attachment URL
construction (e.g., references to storagePath or generation of attachment
endpoints) to reflect that we no longer expose raw paths and instead use opaque
attachment endpoints, or simply delete the stale note to prevent confusion.

---

Outside diff comments:
In `@server/api/knowledgeBase.ts`:
- Around line 1984-1994: The type guard rejects attachment documents because
after calling GetDocument with fileSchema for attachments you still only accept
resp.fields.sddocname === "kb_items"; update the check in the handler that calls
GetDocument (referencing GetDocument, fileSchema, KbItemsSchema and
resp.fields.sddocname) to allow the attachment document name as well (e.g.,
accept either "kb_items" or the file/attachment sddocname used by fileSchema) so
attachments don't 404 during citation chunk fetch; ensure the conditional
branches use isAttachment to decide which sddocname value(s) are valid.

---

Duplicate comments:
In `@frontend/src/routes/_authenticated/chat.tsx`:
- Around line 1209-1211: The post-await citation guard in handleChunkIndexChange
uses the closed-over selectedCitation (captured into expectedCitationId) so
re-reading selectedCitation later in the same closure (lines ~1250) can still be
stale; change the logic to capture and compare against a stable, up-to-date
reference instead: create or use an existing selectedCitationRef (or
selectedCitationIdRef) that is updated whenever selectedCitation changes,
compute expectedCitationId from selectedCitation once before the async work,
then after await compare the current identity via selectedCitationRef.current
(or selectedCitationIdRef.current) instead of reading selectedCitation from the
closure; update or add the ref wiring where selection changes to keep it current
and use it in handleChunkIndexChange for the post-await guard.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e54693 and 6b2ad55.

📒 Files selected for processing (5)
  • frontend/src/components/AttachmentGallery.tsx
  • frontend/src/routes/_authenticated/chat.tsx
  • server/api/files.ts
  • server/api/knowledgeBase.ts
  • server/integrations/ribbie/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/components/AttachmentGallery.tsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant