
# Chat with Your PDFs 📚

This project allows you to interact with your PDF documents using a conversational AI model. Users can upload multiple PDFs, and the app will extract text from them, process it into manageable chunks, and allow users to ask questions about the content of the documents. The system leverages Langchain, HuggingFace, and FAISS for efficient document processing and retrieval.

## Features

- Upload multiple PDF documents.
- Extract text from each uploaded PDF.
- Split large documents into smaller, manageable chunks.
- Answer questions using HuggingFace's FLAN-T5 model.
- Store embeddings of text chunks in a FAISS vector store for fast retrieval.
- Chat about the uploaded PDFs through a conversational Q&A interface.

## Requirements

- Python 3.11 or later
- Streamlit
- langchain
- langchain-community
- PyPDF2
- HuggingFace Hub
- FAISS
- python-dotenv
- other dependencies listed in requirements.txt

## Install Dependencies

To set up the project environment, create a virtual environment and install the required dependencies:

1. Create a virtual environment:

   ```shell
   py -3.11 -m venv myenv
   ```

2. Activate it:

   ```shell
   myenv\Scripts\activate       # Windows
   source myenv/bin/activate    # macOS/Linux
   ```

3. Install the dependencies:

   ```shell
   pip install -r requirements.txt
   ```

## How It Works

1. **Upload PDF Files:** Upload one or more PDF files via the sidebar.
2. **Process PDFs:** Once the PDFs are uploaded, click the "Process" button. The app extracts text from the PDFs and splits it into chunks for efficient searching.
3. **Ask Questions:** Ask questions about the content of the documents using the text input field. The AI retrieves relevant information from the uploaded PDFs and responds to your query.

## Code Walkthrough

### 1. Text Extraction from PDFs (`get_pdf_text`)

This function uses the PyPDF2 library to extract text from the uploaded PDF files.

### 2. Text Chunking (`get_text_chunks`)

The extracted text is split into smaller chunks using `CharacterTextSplitter`. This helps with efficient processing and retrieval of relevant information.
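To show the mechanism without pulling in Langchain, here is a hand-rolled splitter that mirrors the effect of a character-level splitter with overlap (the `chunk_size`/`chunk_overlap` defaults are illustrative assumptions, not values from the repo):

```python
def get_text_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping fixed-size windows.

    Each chunk is at most chunk_size characters and overlaps the
    previous one by chunk_overlap characters, so sentences that
    straddle a boundary still appear whole in at least one chunk.
    """
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap is what preserves context across chunk boundaries; without it, a fact split across two chunks might never be retrieved intact.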

### 3. Vector Store Creation (`get_vectorstore`)

The text chunks are converted into embeddings using `HuggingFaceInstructEmbeddings` with the `hkunlp/instructor-xl` model. These embeddings are then stored in a FAISS vector store for fast retrieval.
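A toy stand-in makes the idea concrete: the bag-of-words "embedding" below is a deliberately crude substitute for instructor-xl, and the store does a brute-force cosine-similarity scan where FAISS would use an optimized index, but the role is the same (`ToyVectorStore` and `embed` are invented names for this sketch):

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word-frequency vector (stand-in for instructor-xl)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Stores one vector per chunk and returns the chunks most similar
    to a query -- the role FAISS plays, minus the speed."""
    def __init__(self, chunks):
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]

    def similarity_search(self, query, k=2):
        q = embed(query)
        scored = sorted(zip(self.chunks, self.vectors),
                        key=lambda cv: cosine(q, cv[1]), reverse=True)
        return [c for c, _ in scored[:k]]
```

At question time, only the top-k most similar chunks are handed to the language model, which is what keeps long documents tractable.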

### 4. Conversational Chain (`get_conversation_chain`)

The conversational chain is set up with Langchain: `HuggingFaceHub` (with the `google/flan-t5-xxl` model) generates the responses, and `ConversationBufferMemory` maintains the context of the conversation.
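The actual chain needs a HuggingFace API token to run, so here is a pure-Python sketch of what `ConversationBufferMemory` contributes: every past turn is replayed into the next prompt so the model sees the whole conversation. The class name and the `llm` callable are hypothetical; in the app the model is flan-t5-xxl via `HuggingFaceHub`:

```python
class BufferMemoryChat:
    """Minimal illustration of buffer-style conversation memory."""

    def __init__(self, llm):
        self.llm = llm          # any callable: prompt -> answer
        self.history = []       # list of (question, answer) pairs

    def ask(self, question, context):
        # Replay the full history ahead of the new question,
        # which is exactly what a buffer memory does.
        past = "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.history)
        prompt = f"{past}\nContext: {context}\nHuman: {question}\nAI:"
        answer = self.llm(prompt)
        self.history.append((question, answer))
        return answer
```

Because the entire history is re-sent each turn, buffer memory is simple but grows linearly with conversation length; that trade-off is why Langchain also offers windowed and summarizing memories.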

### 5. User Input Handling (`handle_userinput`)

User input is processed by calling the conversation chain, and the conversation history is updated with each new interaction.

### 6. Streamlit UI (`main`)

The Streamlit UI allows the user to upload PDFs, ask questions, and view the responses from the AI model. The conversation is displayed interactively on the page.

## Running the Application

To run the application, execute the following command in your terminal:

```shell
streamlit run app.py
```

## Future Improvements

- **Support for more file types** (e.g., DOCX, TXT).
- **Enhance text chunking strategy** for better context understanding.
- **Add additional AI models** for improved question answering.
