GitHub - octalpixel/ai-in-the-browser-devfest-sl-2024

Purpose

This is a web application that combines OCR (Optical Character Recognition) and AI to extract structured data from images of invoices/receipts.

Running the Project

1. Clone Repository
2. Install Dependencies
```
npm install
```
3. Start Development Server
```
npm run dev
```
4. Access Application
- Open your browser and navigate to http://localhost:3000
- Ensure WebGPU is enabled (see WebGPU setup instructions below)

Enabling WebGPU in Chrome

WebGPU is disabled when "Use graphics acceleration when available" is turned off in chrome://settings/system. Check that setting first and turn it back on if needed. If Chrome still reports that WebGPU is not supported on your platform yet, enable the chrome://flags/#enable-unsafe-webgpu flag and restart Chrome, as described in the steps below.

1. Open Browser
- Launch Google Chrome
2. Access Flags Page
- Type chrome://flags in the address bar
- Press Enter
3. Enable WebGPU Flag
- Search for 'enable-unsafe-webgpu'
- Find the #enable-unsafe-webgpu flag
- Set the dropdown menu to "Enabled"
4. Restart Browser
- Click the Relaunch button to apply changes
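
Before loading a model, it can help to confirm from code that WebGPU is actually reachable. The sketch below is an assumption about how such a check could look, not code taken from this repository; `checkWebGPU` is a hypothetical helper name. It relies only on the standard `navigator.gpu.requestAdapter()` call and takes the navigator as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper: reports whether WebGPU looks usable.
// Pass the browser's `navigator`; any object with the same shape also works.
async function checkWebGPU(nav) {
  // `navigator.gpu` is undefined when WebGPU is disabled or unsupported.
  if (!nav || !nav.gpu) {
    return "unavailable";
  }
  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ready" : "no-adapter";
}
```

In the page, any result other than `"ready"` from `await checkWebGPU(navigator)` suggests revisiting the graphics acceleration setting and the flag described above.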

Key Components

1. Technologies Used

@mlc-ai/web-llm: For running AI models in the browser
Tesseract.js: For OCR (converting images to text)
Ace Editor: For JSON schema editing
Highlight.js: For syntax highlighting

2. Core Features

Image upload and preview
OCR text extraction
AI-powered JSON data extraction based on a schema
Model selection (SmolLM2 variants)
Real-time performance statistics

Setup Instructions

Dependencies Installation

npm install @mlc-ai/web-llm tesseract.js ace-builds highlight.js

HTML Requirements

Your HTML needs these elements:

<select id="model-selection"></select>
<input type="file" id="image-upload">
<img id="image-preview">
<div id="image-preview-container"></div>
<textarea id="extracted-text"></textarea>
<div id="output"></div>
<p id="stats"></p>
<div id="schema"></div>
<button id="generate">Generate</button>

How It Works

Initialization Flow
- Loads when DOM is ready
- Initializes Tesseract worker for OCR
- Sets up model selection dropdown
- Configures JSON schema editor
- Sets up event listeners

Processing Pipeline

graph LR
A[Image Upload] --> B[OCR Processing]
B --> C[Text Extraction]
C --> D[AI Processing]
D --> E[JSON Output]
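
The flow in the graph can be sketched as a single async function. This is a minimal illustration under stated assumptions, not the repository's implementation: `extractStructuredData`, `recognize`, and `generate` are hypothetical names, and the OCR and model calls are injected so the orchestration can be read (and run) without Tesseract.js or web-llm being loaded.

```javascript
// Hypothetical orchestrator for the upload -> OCR -> AI -> JSON flow.
// `recognize(image)` should resolve to OCR text (for example, a wrapper
// around a Tesseract.js worker).
// `generate(text, schema)` should resolve to the model's raw string output.
async function extractStructuredData(image, schema, { recognize, generate }) {
  // Steps A -> C: run OCR and normalise the extracted text.
  const text = (await recognize(image)).trim();
  if (!text) {
    throw new Error("OCR produced no text");
  }

  // Step D: ask the model to turn the text into JSON for the given schema.
  const raw = await generate(text, schema);

  // Step E: parse the output so callers receive data rather than a string.
  try {
    return { text, data: JSON.parse(raw) };
  } catch (err) {
    throw new Error(`Model output was not valid JSON: ${err.message}`);
  }
}
```

Keeping the two heavy steps behind plain functions also makes it straightforward to surface OCR failures and malformed model output as separate errors.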

Key Features
- Model Selection: Filters and displays available SmolLM2 models
- Image Processing: Handles image upload, preview, and OCR
- Schema Definition: Configurable JSON schema for structured data extraction
- AI Processing: Uses the selected model to convert OCR text into structured JSON
- Performance Monitoring: Tracks and displays processing speeds
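
As an illustration of the model selection feature, the sketch below filters a model list down to one family. It assumes the list has the shape of web-llm's `prebuiltAppConfig.model_list`, that is, records with a `model_id` string; `filterModels` is a hypothetical helper and the sample ids are illustrative only.

```javascript
// Hypothetical helper: keep only models whose id mentions the given family.
// `modelList` is assumed to be an array of records with a `model_id` string.
function filterModels(modelList, family = "SmolLM2") {
  const needle = family.toLowerCase();
  return modelList
    .map((record) => record.model_id)
    .filter((id) => typeof id === "string" && id.toLowerCase().includes(needle))
    .sort();
}

// Illustrative ids only; the real list comes from the library at runtime.
const sampleList = [
  { model_id: "SmolLM2-360M-Instruct-q4f16_1-MLC" },
  { model_id: "Llama-3.2-1B-Instruct-q4f16_1-MLC" },
  { model_id: "SmolLM2-135M-Instruct-q0f16-MLC" },
];
console.log(filterModels(sampleList));
```

The returned ids can then be turned into `<option>` entries for the `model-selection` dropdown.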

Important Concepts

OCR Integration
- Uses Tesseract.js worker for image-to-text conversion
- Runs asynchronously to prevent UI blocking
AI Model Management
- Lazy loading of AI models (only when needed)
- Model switching capability
- Streaming response support
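
Streaming can be pictured as draining an async iterable of partial responses. The sketch below assumes OpenAI-style chunks, where incremental text arrives at `choices[0].delta.content`, which is the format web-llm's chat completions API follows when called with `stream: true`. `collectStream` and `onUpdate` are hypothetical names introduced for this example.

```javascript
// Hypothetical helper: drains a stream of OpenAI-style chat chunks.
// Each chunk is assumed to carry incremental text at choices[0].delta.content.
// `onUpdate` receives the text accumulated so far, for example to refresh
// the output element while the model is still generating.
async function collectStream(chunks, onUpdate = () => {}) {
  let full = "";
  for await (const chunk of chunks) {
    // Some chunks (role headers, usage summaries) carry no text at all.
    const piece = chunk.choices?.[0]?.delta?.content ?? "";
    if (piece) {
      full += piece;
      onUpdate(full);
    }
  }
  return full;
}
```

Because the helper accepts any async iterable, it can be tried with a small async generator before wiring it to a real engine.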
Data Extraction
- Schema-based extraction ensures consistent output format
- Uses system prompts to guide AI behavior
- Supports structured JSON output
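
One way to combine a schema with a system prompt is sketched below. The prompt wording, the `buildMessages` name, and the sample schema are assumptions made for illustration; the repository's actual prompt may differ.

```javascript
// Hypothetical prompt builder; the system prompt wording is an assumption.
// `schema` may be a JSON string (e.g. the editor contents) or a plain object.
function buildMessages(schema, ocrText) {
  const schemaText =
    typeof schema === "string" ? schema : JSON.stringify(schema, null, 2);
  return [
    {
      role: "system",
      content:
        "You extract data from OCR text of invoices and receipts. " +
        "Reply with a single JSON object that conforms to this JSON schema:\n" +
        schemaText,
    },
    // The raw OCR text is passed as the user turn.
    { role: "user", content: ocrText },
  ];
}

// Illustrative schema only.
const sampleSchema = {
  type: "object",
  properties: { total: { type: "number" } },
  required: ["total"],
};
const sampleMessages = buildMessages(sampleSchema, "Total: 12.50");
```

The resulting array uses the standard `role`/`content` message shape, so it can be passed as the `messages` field of a chat completion request.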
Resource Management
- Proper cleanup of Tesseract worker on page unload
- Engine reset on model change
- Error handling throughout the pipeline
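
The lazy loading and reset-on-change behaviour described above could be organised along these lines. This is a sketch under assumptions, not the repository's code: `createEngineManager`, `loadEngine`, and `dispose` are hypothetical names, with the real engine creation and cleanup calls injected by the caller.

```javascript
// Hypothetical manager: loads an engine lazily and swaps it on model change.
// `loadEngine(modelId)` and `dispose(engine)` are injected by the caller.
function createEngineManager(loadEngine, dispose = async () => {}) {
  let currentId = null;
  let enginePromise = null;

  // Disposes the engine behind `promise`, swallowing load or cleanup failures.
  const releaseQuietly = (promise) =>
    promise ? promise.then(dispose).catch(() => {}) : Promise.resolve();

  return {
    // Returns a promise for the engine of `modelId`, loading it on first use.
    get(modelId) {
      if (modelId !== currentId) {
        const previous = enginePromise;
        currentId = modelId;
        // Free the old engine first, then load the newly selected model.
        const next = releaseQuietly(previous).then(() => loadEngine(modelId));
        enginePromise = next;
        // After a failed load, forget the id so a later call retries.
        next.catch(() => {
          if (enginePromise === next) currentId = null;
        });
      }
      return enginePromise;
    },
    // Drops the current engine, for example on page unload.
    release() {
      const previous = enginePromise;
      currentId = null;
      enginePromise = null;
      return releaseQuietly(previous);
    },
  };
}
```

The same `release` pattern fits the OCR side, where a Tesseract.js worker is shut down with `worker.terminate()` when the page unloads.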

Usage Tips

1. Start by selecting an appropriate model from the dropdown
2. Upload an invoice/receipt image
3. Wait for OCR processing to complete
4. Modify the JSON schema if needed
5. Click "Generate" to extract structured data
6. Monitor performance metrics in the stats section
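
To make the performance figure concrete, a throughput line for the stats section can be derived from a token count and an elapsed time. The sketch below is an assumption about how such a line could be formatted; `formatStats` and its field names are hypothetical.

```javascript
// Hypothetical formatter for the stats line; field names are assumptions.
// `tokens` is the number of generated tokens, `elapsedMs` the wall-clock time.
function formatStats({ tokens, elapsedMs }) {
  // Guard against missing, negative, or zero measurements.
  if (!(elapsedMs > 0) || !(tokens >= 0)) {
    return "n/a";
  }
  const seconds = elapsedMs / 1000;
  const tokensPerSecond = tokens / seconds;
  return `${tokens} tokens in ${seconds.toFixed(2)} s (${tokensPerSecond.toFixed(1)} tok/s)`;
}
```

For example, 120 tokens generated in 4000 ms corresponds to 30 tokens per second.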

This setup provides a complete pipeline for converting document images into structured data using browser-based AI and OCR technologies.

Credits

This project was adapted from github-issue-generator-webgpu by Vaibhavs10. Thank you for the inspiration and foundation!
