Correct way to implement RAG with vllm #103

Open
Hel1zor opened this issue Aug 22, 2024 · 0 comments

Hel1zor commented Aug 22, 2024

Is this the correct way to implement RAG with the vLLM API on RunPod, or is there a better or easier way to do it?

```python
import os

from langchain.chains import RetrievalQA
from langchain_huggingface import HuggingFaceEmbeddings  # open-source embedding model
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeVectorStore

# Environment variables (replace with your keys)
os.environ['RUNPOD_API_KEY'] = '<YOUR_RUNPOD_API_KEY>'
os.environ['PINECONE_API_KEY'] = '<YOUR_PINECONE_API_KEY>'

# Set up the language model. The RunPod vLLM worker exposes an
# OpenAI-compatible endpoint, so ChatOpenAI can talk to it directly
# once it is pointed at the RunPod base URL.
llm = ChatOpenAI(
    openai_api_key=os.environ['RUNPOD_API_KEY'],
    openai_api_base="https://api.runpod.ai/v2/vllm-cf5z42rtrzdc2o/openai/v1",
    model_name='openchat/openchat-3.5-1210',  # use your model's name
    temperature=0.0,
)

# Set up SentenceTransformer embeddings through the LangChain wrapper,
# so the vector store receives an Embeddings object rather than a bare
# encode function
embedding_model_name = 'sentence-transformers/all-MiniLM-L6-v2'  # example model
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name)

# Connect to the existing Pinecone index (the API key is read from the env)
index_name = "chatbot-v1"
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)

# Set up the Retrieval QA chain: retrieve matching chunks from Pinecone
# and "stuff" them into the prompt for the RunPod-hosted model
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# Example query
query = "Who was Benito Mussolini?"
response = qa.invoke({"query": query})

# Print the response
print(response["result"])
```
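
For comparison, here is the simpler variant I was considering: skip the RetrievalQA chain, query Pinecone for the top matches myself, and call the RunPod endpoint directly with the plain `openai` client. This is only a rough sketch under the same assumptions as above (the `chatbot-v1` index already exists, the endpoint is OpenAI-compatible, and the model and index names are mine):

```python
import os

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from openai import OpenAI

# Same embedding model and index as above
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vectorstore = PineconeVectorStore(index_name="chatbot-v1", embedding=embeddings)

# Plain OpenAI client pointed at the RunPod vLLM endpoint
client = OpenAI(
    api_key=os.environ['RUNPOD_API_KEY'],
    base_url="https://api.runpod.ai/v2/vllm-cf5z42rtrzdc2o/openai/v1",
)

query = "Who was Benito Mussolini?"

# Retrieve the top-matching chunks and stuff them into the prompt by hand
docs = vectorstore.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in docs)

response = client.chat.completions.create(
    model='openchat/openchat-3.5-1210',
    temperature=0.0,
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(response.choices[0].message.content)
```

This drops the chain abstraction (no prompt templating or source tracking), but it makes it obvious exactly what gets sent to the model, which is handy for debugging the endpoint.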