A Python application that extracts text from PDF documents and processes it using OpenAI's GPT models to answer questions about the document content.
This application provides a simple interface to:
- Extract text from PDF documents
- Process the text using LangChain and OpenAI's GPT models
- Ask questions about the document contents and receive AI-generated answers
This code was originally created by Professor Daniel Cavalieri and adapted by Paulo Sergio dos Santos Júnior.
- Python 3.6+
- OpenAI API key
- Clone the repository:
git clone https://github.com/paulossjunior/OpenAIandPDF.git
cd OpenAIandPDF
- Install required dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory with your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
- Place your PDF file in the project directory or specify the path in the code.
- Run the main program:
python program_gpt.py
- The program will:
  - Load your OpenAI API key from the .env file
  - Process the PDF file specified in the code (default: "edital.pdf")
  - Ask a predefined question about the document ("Qual o objetivo do Edital", Portuguese for "What is the objective of the call for proposals")
  - Print the answer from GPT
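The first step in the list above, loading the key from the .env file, can be sketched with the standard library alone. This is an illustration rather than the project's code: the repository may well rely on a package such as python-dotenv (an assumption; check requirements.txt), and the helper names `load_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear message when the key is missing is usually easier to debug than an authentication error from the API later on.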
To ask a different question, change the argument passed to send_question in program_gpt.py:
answer = gpt.send_question("Your question here")
To use a different PDF file, change the pdf_path variable:
pdf_path = "your_document.pdf"
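Editing the source for every new question or file can get tedious. One option, which is not part of the repository, is to read both values from the command line. The sketch below uses only argparse; the `--pdf` and `--question` flags and the `parse_args` helper are hypothetical, and the defaults mirror the values this README mentions.

```python
import argparse


def parse_args(argv=None):
    """Parse the PDF path and the question from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Ask a question about a PDF document."
    )
    parser.add_argument(
        "--pdf",
        default="edital.pdf",  # default file named in this README
        help="Path to the PDF file to process.",
    )
    parser.add_argument(
        "--question",
        default="Qual o objetivo do Edital",  # default question named in this README
        help="Question to ask about the document.",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of sys.argv.
args = parse_args(["--pdf", "your_document.pdf", "--question", "Your question here"])
print(args.pdf)
print(args.question)
```

The parsed values could then be assigned to pdf_path and passed to gpt.send_question(args.question) in program_gpt.py.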
- program_gpt.py: Main entry point for the application
- fapes_gpt.py: Contains the GPT class with methods for processing PDFs and interacting with the OpenAI API
- PDF text extraction
- Integration with OpenAI's GPT models (default: gpt-4o-mini)
- Simple API for asking questions about document content
- Environment variable support for secure API key storage
- The code has an error in the __create_chain method, where the chain assignment is missing.
- The private methods __create_prompt, __chunkify_txt, and __get_vector are defined but not used in the current workflow.
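For context on what a helper like __chunkify_txt typically does in this kind of pipeline: long documents are often split into overlapping chunks so that each piece fits within a model's context window. The following is a generic, standard-library sketch of that idea, not the repository's implementation; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Character-based splitting is the simplest variant; splitting on tokens or sentence boundaries is a common refinement.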