onnxruntime-node uncompressed too large for NextJS 15 API routes #1164

raymondhechen opened this issue Jan 23, 2025 · 1 comment
raymondhechen commented Jan 23, 2025

Question

Hello! I'm trying to deploy xenova/bge-small-en-v1.5 locally to embed text in a Next.js 15 API route, but I'm hitting an error: the route's unzipped size exceeds the 250 MB limit. I wanted to check whether there's some error on my side. It doesn't seem like onnxruntime-node by itself should be ~720 MB uncompressed? Thanks!

[Image: deployment error showing the route's unzipped size exceeding the 250 MB limit]

generateEmbeddingV2() below is called within the API route.

import {
  FeatureExtractionPipeline,
  layer_norm,
  pipeline,
  PreTrainedTokenizer,
  env,
} from '@huggingface/transformers'

const MAX_TOKENS = 512
const MATRYOSHKA_DIM = 768

let cachedExtractor: FeatureExtractionPipeline | null = null
const getExtractor = async () => {
  if (!cachedExtractor) {
    cachedExtractor = await pipeline(
      'feature-extraction',
      'xenova/bge-small-en-v1.5',
      { dtype: 'fp16' }
    )
  }
  return cachedExtractor
}

const chunkText = (text: string, tokenizer: PreTrainedTokenizer) => {
  const tokens = tokenizer.encode(text)

  const chunks = []
  for (let i = 0; i < tokens.length; i += MAX_TOKENS) {
    const chunk = tokens.slice(i, i + MAX_TOKENS)
    chunks.push(chunk)
  }

  return chunks.map((chunk) => tokenizer.decode(chunk))
}

export const generateEmbeddingV2 = async (value: string) => {
  const extractor = await getExtractor()

  const chunks = chunkText(value, extractor.tokenizer)

  let embedding = await extractor(chunks[0], { pooling: 'mean' })
  embedding = layer_norm(embedding, [embedding.dims[1]])
    .slice(null, [0, MATRYOSHKA_DIM])
    .normalize(2, -1)

  return embedding.tolist()[0]
}

I also tried downloading the model file locally, but that didn't work in deployment either.
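One direction I haven't been able to verify is keeping onnxruntime-node out of the serverless bundle via Next.js config, so its prebuilt platform binaries aren't traced into the route's output. A rough sketch (the glob paths are assumptions and may need adjusting for your layout):

```typescript
// next.config.ts (sketch, untested)
import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  // Leave onnxruntime-node unbundled so it is resolved from node_modules at runtime
  serverExternalPackages: ['onnxruntime-node'],
  // Drop the prebuilt native binaries from the traced output for API routes
  // (path glob is an assumption; inspect node_modules/onnxruntime-node to confirm)
  outputFileTracingExcludes: {
    '/api/**': ['./node_modules/onnxruntime-node/bin/**'],
  },
}

export default nextConfig
```

Whether this gets the route under the 250 MB limit likely depends on which platform binaries the deploy target actually needs.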

@raymondhechen raymondhechen added the question Further information is requested label Jan 23, 2025

wassgha commented Jan 29, 2025

I have the same issue.
