feat(chat-attachment): Upload PDF file as Chat Attachment #149

Open
wants to merge 43 commits into base: main
43 commits
dd59f6e
Upload files from frontend to server
kalpadhwaryu Nov 18, 2024
20dc9c0
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Nov 18, 2024
cc5d49f
Fix selected files UI
kalpadhwaryu Nov 18, 2024
c83dcd6
Fix Staged Files UI
kalpadhwaryu Nov 19, 2024
d8fe300
Add File Type in Staged UI
kalpadhwaryu Nov 19, 2024
25490e0
Ingest uploaded PDF in vespa
kalpadhwaryu Nov 19, 2024
4253d54
Add uploaded file to downloads folder and delete them when done
kalpadhwaryu Nov 20, 2024
3967d39
Add uploaded file metadata in postgres
kalpadhwaryu Nov 20, 2024
817802b
Fix attachments metadata & send it also as sources
kalpadhwaryu Nov 21, 2024
48c9542
Use Toasts for errors
kalpadhwaryu Nov 21, 2024
5686951
Add UI for file upload in messages
kalpadhwaryu Nov 21, 2024
eea9cbe
Add new chatAttachment schema & insert accordingly
kalpadhwaryu Nov 25, 2024
ce11bb1
Add tanstack router context & use state from it
kalpadhwaryu Nov 26, 2024
530edc1
Use for..of instead of forEach
kalpadhwaryu Nov 26, 2024
066e368
Add chatAttachment context
kalpadhwaryu Nov 26, 2024
1e02727
Small Fix and Comments
kalpadhwaryu Nov 26, 2024
5adbd85
Add attachments also to message as attachments
kalpadhwaryu Nov 26, 2024
3d53b21
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Nov 26, 2024
56ab70d
Updated chatAttachment schema
kalpadhwaryu Nov 26, 2024
2e5540e
New chatAttachment schema
kalpadhwaryu Nov 26, 2024
744ba98
Remove permissions from schema and add chatId, messageId to chat Atta…
kalpadhwaryu Nov 27, 2024
f65c5f5
Add attachments metadata to only message, fix docId and title
kalpadhwaryu Nov 27, 2024
853347f
Add a loader to indicate file upload
kalpadhwaryu Nov 28, 2024
8359f7c
Add query, setQuery as global state
kalpadhwaryu Nov 28, 2024
5865789
Diable sending if streaming is on
kalpadhwaryu Nov 28, 2024
e1870aa
Add searchVespaWithChatAttach fn
kalpadhwaryu Nov 29, 2024
77c6cd0
Fix getName & getIcon for showing attachments
kalpadhwaryu Nov 29, 2024
debea4e
Add hasAttachments flag to decide search function to use
kalpadhwaryu Nov 29, 2024
a8feffc
Improve chat update logic
kalpadhwaryu Nov 29, 2024
858e669
Fix types and remove unncessary code
kalpadhwaryu Nov 29, 2024
f708394
Fix types for searchToCitation fn
kalpadhwaryu Nov 29, 2024
308c5a1
Fix zValidator for Upload api & add todo
kalpadhwaryu Nov 29, 2024
8217a93
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Nov 29, 2024
e94012e
Change Models & Remove unnecessary code
kalpadhwaryu Dec 2, 2024
f9bd178
Use MessageApiV2
kalpadhwaryu Dec 3, 2024
2387083
Restore MessageApi
kalpadhwaryu Dec 3, 2024
2e5a687
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Dec 3, 2024
51b8a9f
Merge main branch
kalpadhwaryu Jan 19, 2025
9a299f9
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Jan 19, 2025
ecfc704
Remove unnecessary code
kalpadhwaryu Jan 19, 2025
a040cff
Fix types
kalpadhwaryu Jan 20, 2025
fc1978b
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Jan 20, 2025
2db5863
Merge branch 'main' of https://github.com/xynehq/xyne into pdf-upload
kalpadhwaryu Jan 21, 2025
Add uploaded file to downloads folder and delete them when done
kalpadhwaryu committed Nov 20, 2024
commit 4253d5490c4cade2c572249fd8ef9bf70d096c2a
51 changes: 48 additions & 3 deletions server/api/chat.ts
@@ -81,6 +81,9 @@ import { getConnInfo } from "hono/bun"
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf"
import type { Document } from "@langchain/core/documents"
import { chunkDocument } from "@/chunks"
import { deleteDocument, downloadDir } from "@/integrations/google"
import fs from "node:fs"
import path from "node:path"

const { JwtPayloadKey, maxTokenBeforeMetadataCleanup } = config
const Logger = getLogger(Subsystem.Chat)
@@ -141,22 +144,57 @@ export const GetChatApi = async (c: Context) => {
}
}

const blobToBuffer = async (blob: Blob) => {
  const arrayBuffer = await blob.arrayBuffer() // Convert Blob to ArrayBuffer
  return Buffer.from(arrayBuffer) // Convert ArrayBuffer to Buffer
}

const saveToDownloads = async (file: Blob) => {
  if (!fs.existsSync(downloadDir)) {
    fs.mkdirSync(downloadDir, { recursive: true })
  }

  // Define the file path
  const filePath = path.join(downloadDir, file?.name)

  // Convert the Blob to a Buffer
  const fileBuffer = await blobToBuffer(file)

  // Save the file
  try {
    await fs.promises.writeFile(filePath, fileBuffer)
    Logger.info(`File saved successfully to ${filePath}`)
  } catch (err) {
    console.error("Error saving file:", err)
    await deleteDocument(filePath)
  }
}

const handlePDFFile = async (file: Blob) => {
  let wasDownloaded = false
  try {
    // saving the uploaded file in downloads folder
    await saveToDownloads(file)
    wasDownloaded = true

    let docs: Document[] = []
-   const loader = new PDFLoader(file)
+   const filePath = `${downloadDir}/${file?.name}`
+   const loader = new PDFLoader(filePath)
    docs = await loader.load()

    if (!docs || docs.length === 0) {
      Logger.error(`Could not get content for file: ${file.name}. Skipping it`)
      await deleteDocument(filePath)
      return
    }

    const chunks = docs.flatMap((doc) => chunkDocument(doc.pageContent))

    const dateTime = new Date().getTime()

    const pdfToIngest = {
      title: file.name!,
-     url: "",
+     url: "https://google.com",
      app: Apps.GoogleDrive, // todo what here
      docId: `${file.name}-upload-PDF`, // create id here, maybe??
      owner: "kalp.a@xynehq.com", // todo how to get userEmail
@@ -166,13 +204,20 @@ const handlePDFFile = async (file: Blob) => {
      chunks: chunks.map((v) => v.chunk),
      permissions: ["kalp.a@xynehq.com"],
      mimeType: "application/pdf",
-     metadata: "",
+     metadata: "Metadata",
      createdAt: dateTime,
      updatedAt: dateTime,
    }

    await insertDocument(pdfToIngest)

    // Delete the file here
    await deleteDocument(filePath)
  } catch (err) {
    if (wasDownloaded) {
      const filePath = `${downloadDir}/${file?.name}`
      await deleteDocument(filePath)
    }
    Logger.error(
      `Error handling PDF ${file.name}: ${err} ${(err as Error).stack}`,
      err,
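
Note on the save-then-delete flow in this commit: `saveToDownloads` joins the client-supplied `file.name` straight onto `downloadDir`, and cleanup is repeated on each exit path of `handlePDFFile`. The sketch below is one possible way to tighten both. It is not code from this PR. The names `UploadedFile`, `uploadDir`, `safeUploadPath` and `withSavedUpload` are hypothetical, and the temp-directory location is an assumption standing in for `downloadDir`.

```typescript
import { randomUUID } from "node:crypto"
import fs from "node:fs"
import os from "node:os"
import path from "node:path"

// Hypothetical minimal shape of an uploaded file (a browser/Bun `File` satisfies it).
interface UploadedFile {
  name: string
  arrayBuffer(): Promise<ArrayBuffer>
}

// Assumption: a stand-in for the `downloadDir` imported in the diff.
const uploadDir = path.join(os.tmpdir(), "chat-attachment-uploads")

// Build a destination path that stays inside `dir`, whatever name the client sends.
const safeUploadPath = (dir: string, clientName: string): string => {
  // Treat "\" as a separator too, so "..\\..\\x.pdf" is reduced to "x.pdf".
  const base = path.posix.basename(clientName.replace(/\\/g, "/"))
  // Fall back to a fixed name when nothing usable is left ("", "." or "..").
  const name =
    base === "" || base === "." || base === ".." ? "upload.pdf" : base
  // A random prefix keeps two uploads with the same name from colliding.
  return path.join(dir, `${randomUUID()}-${name}`)
}

// Save the upload, run `work` on the saved path, and always remove the file afterwards.
const withSavedUpload = async <T>(
  file: UploadedFile,
  work: (filePath: string) => Promise<T>,
  dir: string = uploadDir,
): Promise<T> => {
  await fs.promises.mkdir(dir, { recursive: true })
  const filePath = safeUploadPath(dir, file.name)
  try {
    await fs.promises.writeFile(filePath, Buffer.from(await file.arrayBuffer()))
    return await work(filePath)
  } finally {
    // `force: true` makes this a no-op if the file was never created.
    await fs.promises.rm(filePath, { force: true })
  }
}
```

With a helper along these lines, the body of `handlePDFFile` could run inside the `work` callback, so the early return, the success path and the `catch` path would all share a single cleanup in `finally` instead of separate `deleteDocument` calls.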