Terrier AI - Your Web Scraping Companion 🐕

Terrier AI helps you extract structured data from webpages.

Key Features ✨

📄 HTML-JSON Extraction

Parses browser-rendered HTML to find structured content
Uses Gemini's long context window under the hood to process the HTML
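
The HTML-to-JSON idea above can be sketched roughly as follows. This is a minimal illustration, not Terrier AI's actual implementation: the names html_to_text, extract_json, and ask_model are hypothetical, the model call is passed in as a plain callable so that no particular Gemini client API is assumed, and stripping the page down to visible text is a simplification chosen for the example.

```python
import json
from html.parser import HTMLParser
from typing import Callable


class _TextOnly(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce a page to its visible text, one chunk per line."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def extract_json(html: str, ask_model: Callable[[str], str]) -> dict:
    """Build a prompt from the page and parse the model's JSON reply.

    ask_model is a hypothetical hook: any function that takes a prompt
    string and returns the model's text response.
    """
    prompt = (
        "Extract the structured content from this page as a JSON object.\n"
        "Reply with JSON only.\n\n" + html_to_text(html)
    )
    return json.loads(ask_model(prompt))
```

In practice ask_model would wrap whatever Gemini client the server uses. A real reply may also need cleanup (for example, removing surrounding code fences) before json.loads accepts it.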

(New!) Chrome Extension - Terrier Pup

Coming Soon

Getting Started 🚀

Prerequisites

Node.js v16+
Python 3.13+ (important: earlier versions cause package-version and OS-related errors)
Chrome/Firefox browsers
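
To confirm the Python requirement before installing anything, a small check like the one below may help. It is only a sketch: the helper name python_is_supported and the MIN_PYTHON constant are made up for this example, and the 3.13 minimum is taken from the prerequisites above.

```python
import sys

# Minimum version taken from the prerequisites list.
MIN_PYTHON = (3, 13)


def python_is_supported(version=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print("Python 3.13 or newer is required")
```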

Installation

Client Setup:

cd client
npm install

API/Server Setup:

cd server
python3 -m venv venv

On macOS or Linux, run:

source venv/bin/activate

On Windows, run:

venv\Scripts\activate

Install the Python dependencies:

pip install -r requirements.txt

Environment Variables

To run this project, add the following environment variables to your .env file(s), depending on which parts you use:

GEMINI_API_KEY=your_gemini_api_key

VITE_BACKEND_URL=your_backend_url_or_localhost
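
As a quick sanity check that a .env file defines what you need, something like the following could be used. It is a minimal sketch with hypothetical helper names (parse_env_file, missing_keys) and handles only simple KEY=value lines; a dedicated library such as python-dotenv covers more of the format.

```python
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, missing_keys(parse_env_file(".env"), ["GEMINI_API_KEY"]) returns an empty list when the key is set.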

Running the Application

Start both services at the same time, each in its own terminal. We recommend running the server with Docker to avoid versioning issues:

Frontend:

cd client && npm run dev

Backend:

docker build -t <image_name> .
docker run -p 5000:5000 <image_name>
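
Once the container is up, you can check that something is accepting connections on the published port. The snippet below is a rough sketch: backend_is_listening is a hypothetical helper, and the default port 5000 is taken from the docker run command above. It only tests the TCP connection, not whether the API itself responds correctly.

```python
import socket


def backend_is_listening(host: str = "localhost", port: int = 5000,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_listening())
```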

Tech Stack

Client: React, Redux, TailwindCSS

API: FastAPI, Playwright, Selenium Wire, langchain, browser-use

AI Components: Custom agent implementations, smolagents class, pocketflow framework

Contributing

Contributions are welcome!

See CONTRIBUTING.md for ways to get started.

Please adhere to this project's code of conduct (CODE_OF_CONDUCT.md).
