Multiple Images in Single API Call #2

Open
Philomath88 opened this issue Feb 16, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Yes. I need to upload images (specifically, pages extracted from a PDF document) to a server or API and extract the text from them for further analysis or storage. The tool I'm currently using does not support processing multiple images in a single request, which is inefficient and increases processing time.

Describe the solution you'd like
I would like a feature for uploading and processing multiple images in a single API request. It should let me send a list of images (converted pages from a PDF document) and receive one consolidated response containing the extracted text for each image. Ideally, it would handle varying image formats and sizes while still extracting text accurately. It would also be very helpful to be able to specify text extraction parameters, such as the language or the extraction mode (e.g., OCR, structured text extraction).
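
As a rough illustration of the request and response shape described above, here is a minimal sketch. Everything in it is an assumption made for the example, not an existing API: the `PageImage` type, the helpers `build_batch_request` and `merge_batch_response`, and the field names `language`, `mode`, `images`, `results`, `page`, and `text` are all hypothetical.

```python
import base64
from dataclasses import dataclass


@dataclass
class PageImage:
    """One PDF page rendered as an image (hypothetical type)."""
    page_number: int
    data: bytes
    media_type: str = "image/png"


def build_batch_request(pages, language="en", mode="ocr"):
    """Bundle all page images into one JSON-serialisable payload.

    The field names are placeholders for whatever the API would define.
    """
    return {
        "language": language,
        "mode": mode,
        "images": [
            {
                "page": p.page_number,
                "media_type": p.media_type,
                # Raw bytes are base64-encoded so they can travel inside JSON.
                "data": base64.b64encode(p.data).decode("ascii"),
            }
            for p in pages
        ],
    }


def merge_batch_response(response):
    """Join the per-page text of a consolidated response in page order."""
    results = sorted(response["results"], key=lambda r: r["page"])
    return "\n".join(r["text"] for r in results)
```

Tagging each image with its page number means the consolidated response can be reassembled in the right order even if the results come back unordered.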

Describe alternatives you've considered
One alternative I've considered is manually splitting the PDF into individual pages and sending a separate request for each page. However, this approach does not scale, and it makes handling the responses and reassembling the text in the correct order more complex. Another alternative is a third-party service that supports batch processing, but this often comes with higher costs and potential data privacy concerns.
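
For reference, the per-page workaround looks roughly like the sketch below. It assumes a hypothetical `extract_text` callable that sends one page image to the API and returns its text; that callable is a stand-in for illustration, not a real function of this project.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_pages_individually(pages, extract_text, max_workers=4):
    """Send one request per page and return the texts in page order.

    `extract_text` is a hypothetical single-image call supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the pages stay
        # in sequence even when the requests finish out of order.
        return list(pool.map(extract_text, pages))
```

Even with the ordering handled this way, the caller still pays for one round trip per page and has to deal with retries and partial failures for each request separately, which is the overhead a single batch call would avoid.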

Additional context
In my use case, processing documents and extracting their text efficiently is crucial for data analysis and data entry. The documents I work with are often scanned pages of text, which requires robust OCR. Supporting multiple images in a single request would significantly improve our workflow, reduce processing times, and potentially increase accuracy by allowing context to be maintained across pages.
