Document AI PDF Annotator Sample

This project uses the Document AI API to annotate PDF documents.

Quick start

  1. Install Python
  2. Install the Google Cloud SDK
  3. Run gcloud init and create a new project
  4. Enable the Document AI API: gcloud services enable documentai.googleapis.com
  5. Set up application default credentials: gcloud auth application-default login
  6. Clone this repo and install the prerequisites from its root: pip install -r requirements.txt
  7. Run the sample: python main.py -i invoice.pdf. The annotated document is written to the current directory as invoice_annotated.pdf.
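Before working through the quick start, it can help to confirm that the command-line tools it relies on are reachable. The following is a minimal sketch, not part of this repo: the helper name `missing_tools` and the tool list are assumptions, and on some systems the interpreter is installed as `python3` rather than `python`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the quick start (assumed command names; adjust as needed).
REQUIRED_TOOLS = ("python", "pip", "gcloud")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All quick-start tools are on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools()` with no arguments.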

Setup

Install dependencies

  1. Install pyenv: https://github.com/pyenv/pyenv#installation
  2. Use pyenv to install the latest version of Python 3. For example, to install Python 3.10.1, run: pyenv install 3.10.1
  3. Create a Python virtual environment with the installed version of Python 3. For example, to create a Python 3.10.1 virtual environment called docai-annotator, run: pyenv virtualenv 3.10.1 docai-annotator
  4. Clone this repo and cd to the root of the repo
  5. Configure pyenv to use the virtual environment created in step 3 whenever you are in this repo: pyenv local docai-annotator
  6. Install the prerequisites: pip install -r requirements.txt
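To confirm that the interpreter you are about to use really is the one from the virtual environment, a quick check along these lines may help. This is a sketch under assumptions: the function names are hypothetical, and it relies on `sys.prefix` differing from `sys.base_prefix` inside an environment, which holds for environments created with the standard `venv` module and with recent versions of `virtualenv`.

```python
import sys


def in_virtualenv(
    prefix: str = sys.prefix, base_prefix: str = sys.base_prefix
) -> bool:
    """True when the interpreter prefix differs from the base installation."""
    return prefix != base_prefix


def describe_interpreter() -> str:
    """Summarize the running Python version and whether it is in an environment."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    state = "inside" if in_virtualenv() else "outside"
    return f"Python {version}, {state} a virtual environment ({sys.prefix})"


if __name__ == "__main__":
    print(describe_interpreter())
```

Run it from the root of the repo after `pyenv local docai-annotator`; if it reports that it is outside a virtual environment, the pyenv configuration is probably not being picked up by your shell.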

Set up Google Cloud

  1. Install the Cloud SDK: https://cloud.google.com/sdk/docs/install
  2. Run gcloud init to create a new project, then link a billing account to the project
  3. Enable the Document AI API: gcloud services enable documentai.googleapis.com
  4. Set up application default credentials: gcloud auth application-default login
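After the login step, you can check whether a local credentials file is in place. The sketch below is an illustration rather than part of this repo: the function names are hypothetical, and it only looks at the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and the default location where gcloud stores `application_default_credentials.json`. Credentials can also come from other sources (for example, a metadata server on Google Cloud), so a `None` result does not prove that authentication will fail.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

ADC_FILENAME = "application_default_credentials.json"


def adc_candidates(
    env: Mapping[str, str], home: Path, windows: bool = False
) -> List[Path]:
    """List credential file locations, highest priority first."""
    candidates = []
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    if windows and env.get("APPDATA"):
        candidates.append(Path(env["APPDATA"]) / "gcloud" / ADC_FILENAME)
    else:
        candidates.append(home / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


def find_adc_file(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
    windows: Optional[bool] = None,
) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    windows = (os.name == "nt") if windows is None else windows
    for candidate in adc_candidates(env, home, windows):
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_adc_file()
    print(f"Credentials file: {found}" if found else "No local credentials file found.")
```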

Testing

Manual

  1. Run the sample: python main.py -i invoice.pdf
  2. Check that the annotated PDF was created in the current directory as invoice_annotated.pdf.
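The manual check above can be partly automated. The following sketch is not part of this repo: the helper names are hypothetical, and the naming rule (input stem plus `_annotated.pdf` in the current directory) is an assumption generalized from the single `invoice.pdf` example. It only confirms that the output file exists and starts with the PDF header, not that the annotations are correct.

```python
from pathlib import Path

# A PDF file begins with this marker, followed by the version number.
PDF_MAGIC = b"%PDF-"


def annotated_path(input_pdf: str) -> Path:
    """Map invoice.pdf to invoice_annotated.pdf (assumed naming rule)."""
    return Path(f"{Path(input_pdf).stem}_annotated.pdf")


def looks_like_pdf(path: Path) -> bool:
    """True when the file exists and starts with the PDF header."""
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    output = annotated_path("invoice.pdf")
    status = "looks like a PDF" if looks_like_pdf(output) else "is missing or not a PDF"
    print(f"{output} {status}")
```

Run it from the same directory right after `python main.py -i invoice.pdf`. Opening the file in a PDF viewer is still the only way to confirm the annotations themselves.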