Open
Labels
- api: vertex-ai: Issues related to the Vertex AI API.
- priority: p2: Moderately-important priority. Fix may not be included in next release.
- type: bug: Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Description
Hi, we're seeing a peculiar issue with the Document Understanding feature of the google-genai library (specifically with the Vertex AI Gemini API).
- When initializing the client object with vertexai=True, we're observing that PDFs (specifically with 16 pages) are being truncated to 15 pages.
- This issue isn't present:
  - when the client is NOT initialized with vertexai=True (Gemini Developer API).
  - with PDFs that have a different number of pages.
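One way to sanity-check a file's page count locally, before comparing it with what the model reports, is sketched below. It is a rough, standard-library-only heuristic and not part of google-genai: the name `rough_page_count` is hypothetical, and the byte-pattern match assumes the PDF stores its page objects uncompressed. PDFs that pack objects into compressed object streams will be undercounted, so a dedicated PDF library is the safer choice when one is available.

```python
import re

# Matches page objects ("/Type /Page") but not the page-tree node ("/Type /Pages").
_PAGE_OBJ_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def rough_page_count(pdf_bytes: bytes) -> int:
    """Heuristically count page objects in raw PDF bytes.

    Assumption: page dictionaries are stored uncompressed. Files that use
    compressed object streams will report fewer pages than they contain.
    """
    return len(_PAGE_OBJ_RE.findall(pdf_bytes))


def rough_page_count_from_file(path: str) -> int:
    """Read a PDF from disk and apply the same heuristic."""
    with open(path, "rb") as f:
        return rough_page_count(f.read())
```

Calling `rough_page_count_from_file("appsflyer_dpa.pdf")` would give a local number to compare against the model's answer, subject to the caveat above.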
Environment details
- Programming language: python
- OS: MacOS Sequoia 15.6.1
- Language runtime version: 3.10.16
- Package version: 1.52.0, 1.47.0, 1.24.0
Steps to reproduce
- Consider the 3 attached PDFs having exactly 16 pages:
  a. Initialize the client by setting vertexai=True and ask the model how many pages are present and what the last page contains.
     E.g., with appsflyer_dpa.pdf:

         from google import genai
         from google.genai import types
         from google.genai.types import Content, Part

         # Init Client #
         client = genai.Client(vertexai=True, project=GCP_PROJECT, location=GCP_PROJECT_REGION)

         # Read File #
         with open("appsflyer_dpa.pdf", "rb") as f:
             appsflyer_doc_data = f.read()

         config = types.GenerateContentConfig(
             system_instruction="How many pages are in this document, and what's in the last page?",
             temperature=0.0,
             thinking_config=types.ThinkingConfig(thinking_budget=0),
         )
         parts = [Part.from_bytes(data=appsflyer_doc_data, mime_type='application/pdf')]
         contents = [Content(parts=parts, role="user")]

         # Top-level await needs an async context (e.g. a notebook).
         # The issue also occurs with the synchronous client.
         page_count_response = await client.aio.models.generate_content(
             model="gemini-2.5-flash", contents=contents, config=config
         )
         page_count_response.candidates[0].content.parts[0].text
         """
         Output:
         This document contains 15 pages.\n\nThe last page (page 15) is titled "APPENDIX 1 TO THE STANDARD CONTRACTUAL CLAUSES" and contains information about the data exporter, data importer, data subjects, categories of data, special categories of data, and processing operations related to the transfer of personal data. It also includes signature blocks for both the data exporter and data importer.
         """

  b. Initialize the client WITHOUT setting vertexai=True and repeat.

         client = genai.Client()
         # Rest of the snippet is the same
         """
         Output:
         'This document contains **16 pages**.\n\nThe last page (page 16) is titled "APPENDIX 2 TO THE STANDARD CONTRACTUAL CLAUSES" and describes the **technical and organizational security measures** implemented by the data importer (AppsFlyer) in accordance with Clauses 4(d) and 5(c) of the Standard Contractual Clauses. It details measures related to:\n* **Access Controls**\n* **Physical and Environmental Security**\n* **Application Security**\n* **Vulnerability monitoring through penetration testing**\n* **Data transfer security**\n* **Networks security**'
         """

- The issue was replicated with two other 16-page PDFs (see attached broadcom_dpa.pdf).
- It's not present with PDFs having a different number of pages. Happy to provide additional PDFs if needed.
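When repeating the comparison across several PDFs, it may help to pull the reported page count out of each response text automatically instead of reading it by eye. The snippet below is a minimal sketch using only the standard library. The helper name `reported_page_count` is hypothetical, and the regular expression assumes the model keeps the "contains N pages" phrasing seen in the two outputs above, which is not guaranteed.

```python
import re
from typing import Optional

# Matches "contains 15 pages" as well as the bolded form "contains **16 pages**".
_PAGE_COUNT_RE = re.compile(r"contains\s+\**(\d+)\s+pages?\**", re.IGNORECASE)


def reported_page_count(response_text: str) -> Optional[int]:
    """Return the page count stated in a model response, or None if absent."""
    match = _PAGE_COUNT_RE.search(response_text)
    return int(match.group(1)) if match else None


# Opening sentences of the two outputs quoted in the steps above.
vertex_text = "This document contains 15 pages."
developer_text = "This document contains **16 pages**."

vertex_pages = reported_page_count(vertex_text)
developer_pages = reported_page_count(developer_text)

# True when the two backends disagree about the same file.
backends_disagree = vertex_pages != developer_pages
```

Responses that do not mention a page count return `None`, so a missing answer is not mistaken for a match.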
appsflyer_dpa.pdf
broadcom_dpa.pdf
Thanks!