Document Truncation for PDFs with 16 pages with Vertex AI Gemini API #1773

@NSanjayRelyance

Description

Hi, we're seeing a peculiar issue with the Document Understanding feature of the google-genai Library (specifically with the Vertex AI Gemini API).

  • When the client is initialized with vertexai=True, PDFs with exactly 16 pages appear to be truncated to 15 pages: the model reports 15 pages and describes page 15 as the last page.
    • The issue does not occur:
      • when the client is NOT initialized with vertexai=True (Gemini Developer API).
      • with PDFs that have a different number of pages.
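
To rule out a problem with the files themselves, it can help to confirm the page count locally before sending a PDF to either backend. The following is only a rough sketch, not part of the original repro: `rough_page_count` is a hypothetical helper that counts `/Type /Page` objects in the raw bytes. It assumes the page objects are stored uncompressed; PDFs that pack them into object streams will be undercounted, so a real PDF parsing library (for example pypdf) is the more reliable check.

```python
import re

# Matches page objects ("/Type /Page") but not the page-tree node ("/Type /Pages").
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def rough_page_count(pdf_bytes: bytes) -> int:
    """Heuristic page count: number of uncompressed page objects in the file.

    Hypothetical helper for illustration only. It undercounts when page
    objects live inside compressed object streams.
    """
    return len(_PAGE_OBJECT.findall(pdf_bytes))


if __name__ == "__main__":
    # File name taken from the repro; adjust the path as needed.
    with open("appsflyer_dpa.pdf", "rb") as f:
        print(rough_page_count(f.read()))
```

If this (or a proper parser) reports 16 pages while the Vertex AI response reports 15, the discrepancy is on the service side rather than in the file.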

Environment details

  • Programming language: Python
  • OS: macOS Sequoia 15.6.1
  • Language runtime version: 3.10.16
  • Package version: 1.52.0, 1.47.0, 1.24.0

Steps to reproduce

  1. Consider the 3 attached PDFs having 16 pages exactly:
    a. Initialize the client by setting vertexai=True and ask the model how many pages are present/contents of the last page.
    Eg: appsflyer_dpa.pdf
     # Imports #
     from google import genai
     from google.genai import types
     from google.genai.types import Content, Part

     # Init Client #
     client = genai.Client(vertexai=True, project=GCP_PROJECT, location=GCP_PROJECT_REGION)
     
     # Read File #
     with open("appsflyer_dpa.pdf", "rb") as f:
         appsflyer_doc_data = f.read()
     
     # The question is passed as the system instruction; thinking is disabled #
     config = types.GenerateContentConfig(
         system_instruction="How many pages are in this document, and what's in the last page?",
         temperature=0.0,
         thinking_config=types.ThinkingConfig(thinking_budget=0),
     )
    
     parts = [Part.from_bytes(data=appsflyer_doc_data, mime_type='application/pdf')]
     contents = [Content(parts=parts, role="user")]
     
     # Run inside an async context; the issue also occurs with the synchronous client #
     page_count_response = await client.aio.models.generate_content(
          model="gemini-2.5-flash",
          contents=contents,
          config=config)
     page_count_response.candidates[0].content.parts[0].text
    
     """
     Output: This document contains 15 pages.\n\nThe last page (page 15) is titled "APPENDIX 1 TO THE STANDARD CONTRACTUAL CLAUSES" and contains information about the data exporter, data importer, data subjects, categories of data, special categories of data, and processing operations related to the transfer of personal data. It also includes signature blocks for both the data exporter and data importer.
     """
    b. Initialize the client WITHOUT setting vertexai=True and repeat.
    client = genai.Client()
    # Rest of the snippet is the same
    
    """
    Output: 'This document contains **16 pages**.\n\nThe last page (page 16) is titled "APPENDIX 2 TO THE STANDARD CONTRACTUAL CLAUSES" and describes the **technical and organizational security measures** implemented by the data importer (AppsFlyer) in accordance with Clauses 4(d) and 5(c) of the Standard Contractual Clauses. It details measures related to:\n*   **Access Controls**\n*   **Physical and Environmental Security**\n*   **Application Security**\n*   **Vulnerability monitoring through penetration testing**\n*   **Data transfer security**\n*   **Networks security**'
    """
  2. The issue was replicated with two other 16-page PDFs (see attached broadcom_dpa.pdf).
  3. It's not present with PDFs having a different number of pages. Happy to provide additional PDFs if needed.
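
To make the comparison between the two backends repeatable across several PDFs, the page count the model reports can be pulled out of the response text and checked against the expected value. This is a minimal sketch under stated assumptions: `reported_page_count` and `is_truncated` are hypothetical helpers, not part of google-genai, and they assume the model phrases its answer as "<N> pages", as it does in both outputs above.

```python
import re
from typing import Optional

# First occurrence of "<number> pages" in the response, e.g. "15 pages" or "**16 pages**".
_PAGES = re.compile(r"(\d+)\s+pages?\b", re.IGNORECASE)


def reported_page_count(response_text: str) -> Optional[int]:
    """Return the page count stated in the model response, or None if absent."""
    match = _PAGES.search(response_text)
    return int(match.group(1)) if match else None


def is_truncated(response_text: str, expected_pages: int) -> bool:
    """True when the model reports fewer pages than the document really has."""
    reported = reported_page_count(response_text)
    return reported is not None and reported < expected_pages


if __name__ == "__main__":
    # Opening sentences of the two responses quoted in the steps above.
    vertex_text = "This document contains 15 pages.\n\nThe last page (page 15) is titled ..."
    developer_text = "This document contains **16 pages**.\n\nThe last page (page 16) is titled ..."
    print("Vertex AI truncated:", is_truncated(vertex_text, expected_pages=16))
    print("Developer API truncated:", is_truncated(developer_text, expected_pages=16))
```

Feeding in `page_count_response.candidates[0].content.parts[0].text` from each client gives a simple pass/fail signal per file and backend.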

appsflyer_dpa.pdf
broadcom_dpa.pdf

Thanks!

Labels

  • api: vertex-ai (Issues related to the Vertex AI API.)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
