This project is a Jupyter Notebook for parsing text from PDF files using Python. It utilizes the PyMuPDF
library for extracting text and the openai
library for any subsequent processing.
To use this project, you need to have Python installed on your system. You can install the required libraries using pip
.
pip install pymupdf openai python-dotenv
-
Clone the repository:
git clone <repository-url> cd <repository-directory>
-
Set up your environment: Create a
.env
file in the root directory of the project and add your OpenAI API key:OPENAI_KEY=your_openai_api_key
-
Run the Jupyter Notebook:
jupyter notebook pdf_parser.ipynb
-
Extract text from a PDF: Follow the instructions in the notebook to load a PDF file and extract text from it.
- Extract text from PDF files.
- Simple and easy-to-use interface.
- Integration with OpenAI API for additional text processing (e.g., summarization, translation).
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b my-feature-branch
. - Make your changes and commit them:
git commit -m 'Add some feature'
. - Push to the branch:
git push origin my-feature-branch
. - Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.