https://app.readytensor.ai/publications/UUYpS17iTiK7
Vision Bot is an AI-powered tool that combines computer vision and natural language processing to answer questions about images captured from your webcam. It uses the BLIP model from Salesforce to perform visual question answering (VQA).
- Captures an image from the webcam.
- Answers questions about the captured image using AI.
- Combines computer vision (OpenCV) with natural language processing (transformers).
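The webcam capture step can be sketched roughly as follows. This is a minimal illustration, not the code in `vision_bot.py`: the function name `capture_image` and the default camera index `0` are assumptions, and only standard OpenCV and Pillow calls are used.

```python
def capture_image(camera_index=0):
    """Grab one frame from the webcam and return it as an RGB PIL image.

    Hypothetical helper: the name and the default index 0 are assumptions.
    """
    # Imported inside the function so this file can be loaded even on a
    # machine where OpenCV or Pillow is not installed yet.
    import cv2
    from PIL import Image

    cap = cv2.VideoCapture(camera_index)
    try:
        if not cap.isOpened():
            raise RuntimeError(f"Could not open webcam at index {camera_index}")
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("Webcam opened, but no frame could be read")
    finally:
        # Always release the device, even when an error is raised.
        cap.release()

    # OpenCV delivers frames in BGR order; Pillow expects RGB.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    return Image.fromarray(rgb)
```

Calling `capture_image()` returns a Pillow image that can be saved with `.save("capture.jpg")` or passed to the model. If the system has more than one camera, a different index may be needed.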
- Clone this repository:
git clone https://github.com/your_username/vision-bot.git
cd vision-bot
- Install the dependencies:
pip install opencv-python transformers requests pillow torch
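To confirm the installation, a short check along these lines can report which packages are still missing. It is a sketch and not part of the repository: the names `REQUIRED`, `missing_packages`, and `python_is_supported` are hypothetical, and the mapping simply pairs each pip package from the install command with the module name it is imported under.

```python
import importlib.util
import sys

# pip package name -> module name used in an `import` statement
REQUIRED = {
    "opencv-python": "cv2",
    "transformers": "transformers",
    "requests": "requests",
    "pillow": "PIL",
    "torch": "torch",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def python_is_supported(version_info=sys.version_info):
    """Vision Bot lists Python 3.8 or higher as a requirement."""
    return tuple(version_info[:2]) >= (3, 8)
```

Running `print(missing_packages())` in the same environment prints the pip names that still need to be installed; an empty list means every import resolves.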
Run the script:
python vision_bot.py
Follow the prompts to:
- Capture an image using your webcam.
- Type a question about the captured image.
- Receive an AI-generated answer.
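The question-answering step can be sketched with the BLIP classes in Hugging Face Transformers. This is an illustration under assumptions, not the script's actual code: the checkpoint name `Salesforce/blip-vqa-base` is assumed (the project only says it uses BLIP), and `load_blip` and `answer_question` are hypothetical helper names.

```python
def load_blip(model_name="Salesforce/blip-vqa-base"):
    """Load a BLIP processor and VQA model.

    The checkpoint name is an assumption; the project may use another one.
    """
    # Imported lazily because loading transformers is slow and optional
    # until a question is actually asked.
    from transformers import BlipForQuestionAnswering, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForQuestionAnswering.from_pretrained(model_name)
    return processor, model


def answer_question(processor, model, image, question):
    """Return the model's text answer to `question` about `image`."""
    # The processor resizes/normalizes the image and tokenizes the question.
    inputs = processor(image, question, return_tensors="pt")
    # generate() produces token ids for the answer text.
    output_ids = model.generate(**inputs)
    # Decode the first (and only) sequence back into a plain string.
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `processor, model = load_blip()` followed by `answer_question(processor, model, image, "What object is in the image?")`, where `image` is a Pillow image of the captured frame. The first call to `load_blip()` downloads the model weights from the Hugging Face Hub, so it needs a network connection; loading once and reusing the pair avoids repeating that cost for every question.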
Example:
- Question: "What object is in the image?"
- Answer: "A red car."
Requirements:
- Python 3.8 or higher
- Webcam-enabled system
- Libraries: OpenCV, Transformers, Torch, Pillow, Requests
- Salesforce BLIP Model
- Hugging Face Transformers