This project allows you to interact with your PDF documents using a conversational AI model. Users can upload multiple PDFs, and the app will extract text from them, split it into manageable chunks, and answer questions about the content of the documents. The system leverages LangChain, HuggingFace, and FAISS for efficient document processing and retrieval.
- Upload multiple PDF documents.
- Text extraction from PDF documents.
- Split large documents into smaller, manageable chunks.
- Use of HuggingFace's FLAN-T5 for question answering.
- Store embeddings of text chunks in a FAISS vector store for fast retrieval.
- Conversational UI for interactive Q&A about the uploaded PDFs.
- Python 3.11 or later
- Streamlit
- langchain
- langchain-community
- PyPDF2
- HuggingFace Hub
- FAISS
- python-dotenv
- Other dependencies listed in `requirements.txt`
To set up the project environment, create a virtual environment and install the required dependencies:
- Create a virtual environment:

  ```shell
  py -3.11 -m venv myenv
  ```
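After creating the environment, activate it and install the dependencies. End to end, the setup looks like this (with `python3` standing in for the Windows `py -3.11` launcher on macOS/Linux):

```shell
# Create the virtual environment (Windows: py -3.11 -m venv myenv)
python3 -m venv myenv
# Activate it (Windows: myenv\Scripts\activate)
. myenv/bin/activate
# Install the project's dependencies from the repository root
pip install -r requirements.txt
```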
1. Upload one or more PDF files via the sidebar.
2. Click the "Process" button. The app will extract text from the PDFs and split it into chunks for efficient searching.
3. Ask questions about the content of the documents using the text input field. The AI will retrieve relevant passages from the uploaded PDFs and respond to your query.
The text-extraction step uses the PyPDF2 library to pull the text out of each uploaded PDF file.
The extracted text is split into smaller chunks using `CharacterTextSplitter`. Keeping chunks small, with some overlap between neighbours, makes retrieval of the relevant passages both faster and more accurate.
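In essence, the splitter slides a fixed-size window over the text so that a sentence straddling a chunk boundary still appears whole in at least one chunk. A dependency-free sketch of that idea (the app itself uses LangChain's `CharacterTextSplitter`; the sizes and the name `split_into_chunks` are illustrative):

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into overlapping windows of at most chunk_size characters."""
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```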
The text chunks are converted into embeddings using `HuggingFaceInstructEmbeddings` with the `hkunlp/instructor-xl` model. These embeddings are then stored in a FAISS vector store for fast retrieval.
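FAISS stores one embedding vector per chunk and, at query time, returns the chunks whose vectors are most similar to the query's vector. A dependency-free sketch of that similarity search (the app itself delegates this to FAISS; the toy two-dimensional "embeddings" in the usage below are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, k=2):
    """index: list of (chunk_text, embedding) pairs; returns top-k chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```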
The conversational chain is set up using LangChain, where `HuggingFaceHub` (with the `google/flan-t5-xxl` model) is used to generate responses. `ConversationBufferMemory` is used to maintain the context of the conversation.
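`ConversationBufferMemory` simply keeps the running transcript and feeds it back with every new question, so the model can resolve follow-ups like "what about the second chapter?". A minimal sketch of that idea (the `answer` callable standing in for the FLAN-T5 call is hypothetical):

```python
class BufferMemory:
    """Keeps the full chat transcript, like ConversationBufferMemory."""

    def __init__(self):
        self.history = []  # list of (role, message) pairs

    def add(self, role, message):
        self.history.append((role, message))

    def as_prompt(self):
        # Serialize the whole transcript for the model
        return "\n".join(f"{role}: {msg}" for role, msg in self.history)

def ask(memory, question, answer):
    """Record the question, query the model with full history, record the reply."""
    memory.add("user", question)
    reply = answer(memory.as_prompt())  # the model sees the entire transcript
    memory.add("assistant", reply)
    return reply
```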
User input is processed by calling the conversation chain. The conversation history is updated with each new interaction.
The Streamlit UI allows the user to upload PDFs, ask questions, and view the responses from the AI model. The conversation is displayed interactively on the page.
To run the application, execute the following command in your terminal:
```shell
streamlit run app.py
```
## Future Improvements
- **Support for more file types** (e.g., DOCX, TXT).
- **Improved text chunking strategy** for better context understanding.
- **Additional AI models** for improved question answering.