Support LLava or other local vision models instead of using OpenAI GPT4-vision #674

Open
ai-agents-challenge opened this issue May 15, 2024 · 2 comments · May be fixed by R-ohit-B-isht/OpenAdapt#3
Labels: $ bounty $ (Please suggest a price range 🙏), enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@ai-agents-challenge

Feature request

Instead of relying solely on OpenAI's GPT4-vision for image processing, provide a locally hosted alternative, such as LLaVA.

Motivation

OpenAI often gives this error when parsing images: "Your input image may contain content that is not allowed by our safety system."
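To make the request concrete, here is a minimal sketch of what a call to a locally hosted LLaVA model might look like. It assumes the model is served by Ollama on its default port (`http://localhost:11434/api/generate`) under the model name `llava`; neither is specified in this issue, and the helper names `build_llava_request` and `describe_image` are hypothetical, not part of OpenAdapt.

```python
import base64
import json
import urllib.request

# Assumption: an Ollama server on its default port with a LLaVA model pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON payload for a single-image, non-streaming request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_bytes: bytes, prompt: str, url: str = OLLAMA_URL) -> str:
    """POST the payload to the local server and return the generated text."""
    payload = json.dumps(build_llava_request(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Because the image never leaves the machine, a hosted provider's safety filter is not involved in this path; the trade-off is that output quality and speed depend on the local model and hardware.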

@ai-agents-challenge added the enhancement (New feature or request) label May 15, 2024
@abrichr
Member

abrichr commented May 15, 2024

@abrichr
Member

abrichr commented Jun 11, 2024

Related: https://community.openai.com/t/your-input-image-may-contain-content-that-is-not-allowed-by-our-safety-system-vision-api-response/653372/17

I expect that the AI is denying your request because it doesn’t know if you are trying to solve a CAPTCHA or attempting to use the AI for other purposes it has been trained to prohibit, such as driving cars or tasks beyond the capabilities of computer vision.

https://community.openai.com/t/vision-api-image-not-allowed-by-our-safety-system/679147

One thing which can help is to modify the image slightly to make it look less like a CAPTCHA.
I discovered this as a side-effect of using “set-of-marks” prompting with the vision model.

Mostly it’s “business related” information that OpenAI will refuse to OCR, like people’s names, addresses, emails, phone numbers, company names, etc. So as long as your use case doesn’t involve business info you’ll be fine, …unless/until OpenAI changes their mind and censors your use case as well.
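Since the refusals described in these threads can change without notice, one possible design is to try the hosted model first and fall back to a local one only when the safety error appears. The sketch below assumes the refusal surfaces as an exception whose message contains the text quoted in this issue; the function names and the `primary`/`fallback` callables are hypothetical placeholders, not OpenAdapt APIs.

```python
from typing import Callable

# Substring of the error text quoted in this issue.
# Assumption: the refusal is raised as an exception carrying this message.
SAFETY_MARKER = "not allowed by our safety system"

# A describer takes (image_bytes, prompt) and returns the model's text.
Describer = Callable[[bytes, str], str]


def is_safety_refusal(error: Exception) -> bool:
    """Return True if the error message looks like the safety-system refusal."""
    return SAFETY_MARKER in str(error)


def describe_with_fallback(
    image_bytes: bytes, prompt: str, primary: Describer, fallback: Describer
) -> str:
    """Try the primary describer; use the fallback only on a safety refusal."""
    try:
        return primary(image_bytes, prompt)
    except Exception as error:
        if not is_safety_refusal(error):
            raise  # Unrelated failures should not be masked.
        return fallback(image_bytes, prompt)
```

Here `primary` would wrap the hosted vision call and `fallback` a local model call; any other error is re-raised so genuine failures stay visible.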

@abrichr abrichr self-assigned this Jun 11, 2024
@abrichr abrichr removed their assignment Jun 13, 2024
@abrichr added the good first issue (Good for newcomers), help wanted (Extra attention is needed), and $ bounty $ (Please suggest a price range 🙏) labels Jun 13, 2024
@R-ohit-B-isht R-ohit-B-isht linked a pull request Jun 18, 2024 that will close this issue