A Python application that extracts text from PDF documents and processes it using OpenAI's GPT models to answer questions about the document content.
This application provides a simple interface to:
- 📄 Extract text from PDF documents
- 🧠 Process the text using LangChain and OpenAI's GPT models
- ❓ Ask questions about the document contents and receive AI-generated answers
This code was originally created by Professor Daniel Cavalieri and adapted by Paulo Sergio dos Santos Júnior.
- 🐍 Python 3.6+
- 🔑 OpenAI API key
- Clone the repository:
git clone https://github.com/paulossjunior/OpenAIandPDF.git
cd OpenAIandPDF
- Install required dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory with your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
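A quick way to confirm the key is picked up before running the program is sketched below. This is a minimal, standard-library-only illustration of the KEY=VALUE format. It does not claim to be how the project loads the file (a helper library may be used there), and the function names load_env_file and require_api_key are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (illustrative helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values already present in the real environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    return True


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key, or raise a clear error if it is missing or still the placeholder."""
    value = os.environ.get(name, "")
    if not value or value == "your_openai_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling load_env_file() followed by require_api_key() at startup turns a missing or placeholder key into an immediate, readable error rather than a failed API request later.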
- Place your PDF file in the project directory or specify the path in the code.
- Run the main program:
python program_gpt.py
- The program will:
- 🔐 Load your OpenAI API key from the .env file
- 📝 Process the PDF file specified in the code (default: "edital.pdf")
- 🧩 Ask a predefined question about the document ("Qual o objetivo do Edital", Portuguese for "What is the purpose of the notice?")
- 📊 Print the answer from GPT
To ask a different question, change the argument passed to send_question in program_gpt.py:
answer = gpt.send_question("Your question here")
To use a different PDF file, change the pdf_path variable:
pdf_path = "your_document.pdf"
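Editing the source for every new question or file gets tedious, so one option is to read both values from the command line. The sketch below is an assumption about how that could be done, not part of the current program: the --pdf and --question flags and the parse_args helper are hypothetical, and the defaults mirror the values described above.

```python
import argparse


def parse_args(argv=None):
    """Parse the PDF path and the question from the command line."""
    parser = argparse.ArgumentParser(
        description="Ask a question about a PDF document."
    )
    parser.add_argument(
        "--pdf", default="edital.pdf", help="path to the PDF file"
    )
    parser.add_argument(
        "--question",
        default="Qual o objetivo do Edital",
        help="question to ask about the document",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Inside program_gpt.py, the parsed values could then replace the hard-coded ones, for example pdf_path = args.pdf and gpt.send_question(args.question).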
- program_gpt.py: Main entry point for the application
- fapes_gpt.py: Contains the GPT class with methods for processing PDFs and interacting with the OpenAI API
- 📄 PDF text extraction
- 🔌 Integration with OpenAI's GPT models (default: gpt-4o-mini)
- 🧩 Simple API for asking questions about document content
- 🔒 Environment variable support for secure API key storage
- The code has an error in the __create_chain method, where the chain assignment is missing.
- The private methods __create_prompt, __chunkify_txt, and __get_vector are defined but not used in the current workflow.
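To illustrate the kind of bug a missing assignment causes, and why double-underscore methods are awkward to call from outside the class, here is a toy sketch. It is not the code from fapes_gpt.py: the Pipeline class, its __build method, and the "chain-object" placeholder are all hypothetical stand-ins.

```python
class Pipeline:
    """Toy stand-in used only for illustration; not the project's GPT class."""

    def __init__(self):
        self.chain = None
        self.__create_chain()

    def __build(self):
        # Placeholder for whatever object the real method constructs.
        return "chain-object"

    def __create_chain(self):
        # Bug pattern: calling self.__build() without storing the result
        # would leave self.chain as None. Assigning the result fixes it.
        self.chain = self.__build()


pipeline = Pipeline()
print(pipeline.chain)                                # chain-object
# Names starting with two underscores are mangled to _ClassName__name.
print(hasattr(pipeline, "__create_chain"))           # False
print(hasattr(pipeline, "_Pipeline__create_chain"))  # True
```

The same name mangling applies to __create_prompt, __chunkify_txt, and __get_vector: wiring them into the workflow is easiest from inside the class itself.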