Setup Guide - Getting Started

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.9 or higher (Python 3.13 recommended)
Git (for cloning the repository)
8GB+ RAM (16GB recommended for local LLM models)
macOS, Linux, or Windows

Step 1: Clone the Repository

# Clone the repository
git clone <repository-url>
cd folder  # use the directory name created by git clone

# Verify you're in the correct directory
ls -la
# You should see: main.py, requirements.txt, setup.sh, src/, etc.
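`ls -la` is not available in Windows Command Prompt. For a cross-platform check, the short Python sketch below looks for the entries this step lists. It is a hypothetical helper, not part of the repository; `missing_entries` and `EXPECTED` are illustrative names.

```python
from pathlib import Path

# Entries this guide says should exist in the project root after cloning.
EXPECTED: tuple[str, ...] = ("main.py", "requirements.txt", "setup.sh", "src")


def missing_entries(root: str = ".", expected: tuple[str, ...] = EXPECTED) -> list[str]:
    """Return the expected files/folders that do not exist under root."""
    base = Path(root)
    return [name for name in expected if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Not in the project root? Missing:", ", ".join(missing))
    else:
        print("Project root looks correct.")
```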

Step 2: Set Up Python Virtual Environment

# Create a virtual environment
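python3 -m venv venv
# On Windows, python3 may not exist; use: python -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate

# On Windows (Command Prompt):
# venv\Scripts\activate
# On Windows (PowerShell):
# venv\Scripts\Activate.ps1

# Verify activation (you should see (venv) in your prompt)
which python  # Should point to venv/bin/python (Windows: where python, which should list venv\Scripts\python.exe first)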
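Checking the prompt or the interpreter path differs by shell. A platform-independent alternative is to ask Python itself: inside a virtual environment created with `python -m venv`, `sys.prefix` points at the venv while `sys.base_prefix` still points at the base installation. The function name below is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment.

    In a venv, sys.prefix is the venv directory, while sys.base_prefix
    remains the Python installation the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), "| interpreter:", sys.executable)
```

The same check fits on one line: `python -c "import sys; print(sys.prefix != sys.base_prefix)"` prints `True` when the venv is active.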

Step 3: Install Dependencies

Option A: Using the Setup Script (Recommended)

# Make setup script executable
chmod +x setup.sh

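# Run the setup script (needs a Bash shell: macOS, Linux, or WSL/Git Bash on Windows; otherwise use Option B)
./setup.sh

# The script runs in its own shell, so activate the environment it created afterwards
source venv/bin/activate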

The setup script will:

Check Python version
Create virtual environment
Install uv (fast Python package installer) if needed
Install all dependencies from requirements.txt
Install Playwright browsers (Chromium)
Create .env file from template
Create necessary directories

Option B: Manual Installation

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Create .env file from template (Windows Command Prompt: copy env.example .env)
cp env.example .env
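For a platform-independent version of this step, the sketch below performs the copy from Python and refuses to overwrite an existing `.env`, so rerunning it cannot wipe out keys you have already filled in. `create_env_file` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path


def create_env_file(template: str = "env.example", target: str = ".env") -> bool:
    """Copy the template to target unless target already exists.

    Returns True if a new file was created, False if an existing file was kept.
    """
    target_path = Path(target)
    if target_path.exists():
        return False  # never clobber settings you have already edited
    shutil.copyfile(template, target_path)
    return True


if __name__ == "__main__":
    print("created .env" if create_env_file() else ".env already exists, left untouched")
```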

Step 4: Configure Environment Variables

Edit the .env file with your preferred settings:

# Open .env in your editor
nano .env  # or use your preferred editor

Minimum Configuration (Free Option - Ollama)

# LLM Configuration
LLM_PROVIDER=ollama
LLM_MODEL=llava  # or qwen3-vl:235b-cloud to use Ollama's cloud

# Ollama Configuration
OLLAMA_HOST=http://localhost:11434

# Browser Configuration
HEADLESS=false  # Set to true to run the browser without a visible window
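The inline `# ...` comments above only work if the loader strips them. Common loaders such as python-dotenv accept a comment after an unquoted value, but this guide does not say which loader the project uses, so the sketch below is an assumption that shows the usual rule: whitespace followed by `#` starts a comment, and `HEADLESS` is read as a boolean. It does not handle quoted values. If a setting misbehaves, moving its comment onto a separate line is the safe fix.

```python
import re
from typing import Optional


def parse_env_line(line: str) -> Optional[tuple[str, str]]:
    """Parse one KEY=value line; return None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    # Treat whitespace followed by '#' as the start of an inline comment.
    value = re.split(r"\s+#", value, maxsplit=1)[0]
    return key.strip(), value.strip()


def load_env_text(text: str) -> dict[str, str]:
    """Collect every KEY=value pair from the text of a .env file."""
    pairs = (parse_env_line(raw) for raw in text.splitlines())
    return dict(p for p in pairs if p is not None)


def env_flag(value: str) -> bool:
    """Interpret common truthy spellings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}
```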

Alternative: Google Gemini (Free Tier)

# LLM Configuration
LLM_PROVIDER=gemini
LLM_MODEL=gemini-1.5-pro  # or gemini-2.0-flash

# Get a free API key from: https://makersuite.google.com/app/apikey
GOOGLE_API_KEY=your_google_api_key_here

Optional: Authentication (for automatic login)

# Notion credentials (optional)
NOTION_EMAIL=your_email@example.com
NOTION_PASSWORD=your_password_here

# Asana credentials (optional)
ASANA_EMAIL=your_email@example.com
ASANA_PASSWORD=your_password_here

Note: If you don't provide credentials, you can log in manually when the browser opens (requires HEADLESS=false). The session is saved in .auth/ and reused on future runs.

Step 5: Set Up LLM Provider

Option 1: Ollama (Free, Local) - Recommended for Beginners

# Install Ollama
# macOS:
brew install ollama

# Linux:
curl -fsSL https://ollama.com/install.sh | sh

# Windows: Download from https://ollama.com/download

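# Start the Ollama service (keep this terminal open).
# Skip this step if Ollama is already running, e.g. via the desktop app or the systemd service set up by the Linux installer.
ollama serve

# In another terminal, download a vision model
ollama pull llava  # ~4.7GB, runs locally
# OR
ollama pull qwen3-vl:235b-cloud  # 235B-parameter model hosted on Ollama's cloud; requires an Ollama account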
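Ollama lists installed models over its local HTTP API at `/api/tags` (the same endpoint the troubleshooting section queries with `curl`). The sketch below reads that list and checks that the model named in `LLM_MODEL` is present. An untagged name such as `llava` is treated as `llava:latest`, which is how Ollama names default tags. The function names are hypothetical.

```python
import json
import os
from urllib.request import urlopen


def fetch_tags(host: str = "http://localhost:11434") -> str:
    """Return the raw JSON body of GET /api/tags from a running Ollama server."""
    with urlopen(f"{host.rstrip('/')}/api/tags", timeout=5) as resp:
        return resp.read().decode("utf-8")


def installed_models(tags_json: str) -> list[str]:
    """Extract model names such as 'llava:latest' from the /api/tags response."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    """Match 'llava' against 'llava:latest'; an explicit tag must match exactly."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names


if __name__ == "__main__":
    model = os.environ.get("LLM_MODEL", "llava")
    names = installed_models(fetch_tags(os.environ.get("OLLAMA_HOST", "http://localhost:11434")))
    print(f"{model} installed:", has_model(names, model))
```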

Option 2: Google Gemini (Free Tier)

Visit https://makersuite.google.com/app/apikey (now part of Google AI Studio)
Click "Create API Key"
Copy your API key

Add to .env:

LLM_PROVIDER=gemini
GOOGLE_API_KEY=your_api_key_here

Option 3: OpenAI or Anthropic (Paid)

Add your API keys to .env:

LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_key_here

# OR

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_key_here
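Before the first run it can help to confirm that the key for the chosen provider is actually set. The sketch below encodes the provider-to-key pairs exactly as this guide lists them; whether the project also requires other keys (for example `LLM_MODEL` for OpenAI or Anthropic) is not documented here, so treat the table as an assumption. It also flags values still ending in `_here`, the placeholder pattern used in the examples above.

```python
from typing import Mapping

# Keys each provider needs, as listed in Steps 4 and 5 of this guide (assumed complete).
REQUIRED_KEYS: dict[str, list[str]] = {
    "ollama": ["OLLAMA_HOST"],
    "gemini": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}

PLACEHOLDER_SUFFIX = "_here"  # e.g. your_openai_key_here


def config_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the LLM settings (empty list if none)."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEYS:
        return [f"LLM_PROVIDER must be one of {sorted(REQUIRED_KEYS)}, got {provider!r}"]
    problems = []
    for key in REQUIRED_KEYS[provider]:
        value = env.get(key, "").strip()
        if not value:
            problems.append(f"{key} is missing")
        elif value.endswith(PLACEHOLDER_SUFFIX):
            problems.append(f"{key} still holds the template placeholder")
    return problems
```

Pass `os.environ` (after your loader has read `.env`) or any dictionary of settings.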

Step 6: Verify Installation

# Make sure you're in the project directory and venv is activated
cd /path/to/folder
source venv/bin/activate  # if not already activated

# Verify Python version
python --version  # Should be 3.9+

# Verify dependencies
python -c "import playwright; print('Playwright OK')"
python -c "import pydantic; print('Pydantic OK')"

# Verify the Playwright CLI is installed (this reports its version, not whether Chromium was downloaded)
playwright --version
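The separate `python -c` import checks can be folded into one script that reports every missing package at once. This sketch uses `importlib.util.find_spec`, which locates a module without importing it. The module list mirrors the checks above and is not the full contents of requirements.txt; `missing_modules` is an illustrative name.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["playwright", "pydantic"])
    print("all dependencies found" if not missing else f"missing: {missing}")
```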

Step 7: Run Your First Task

# Make sure virtual environment is activated
source venv/bin/activate

# Run a simple task
python main.py --app notion --task "Create a new database"

# OR for Asana
python main.py --app asana --task "Create a new project"

Command Reference

You can drive the agent entirely from the CLI:

# Generic command structure
python main.py --app <app_name> --task "<human-readable instruction>"

# Examples
python main.py --app notion --task "Create a kanban board called Sprint Plan"
python main.py --app asana --task "Add a new task named 'Follow up with design team'"

Command-line flags:

--app notion|asana – selects which app configuration to load (URL, selectors, documentation context).
--task "…" – free-form text that becomes the mission statement for the LLM/browser agent.
--all – run every sample task defined inside main.py (handy for sanity checks).

Interactive mode:

If you omit both --app and --task, the script will prompt you for inputs at runtime:

python main.py
# Prompts:
# Enter app name:
# Enter task description:

Tips:

Keep tasks descriptive (e.g., “Create a new project named Alpha and add two sections”).
Authentication state is saved in .auth/, so later runs can reuse the logged-in session instead of asking you to log in again.
Add quotes around your task string so the shell doesn’t split it on spaces.
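Python's `shlex.split` follows POSIX shell splitting rules, so it shows what main.py would receive with and without quotes. (Windows Command Prompt splits differently, but double quotes group words there as well.) The commands below reuse the examples from this section.

```python
import shlex

unquoted = shlex.split("python main.py --app asana --task Create a new project")
quoted = shlex.split('python main.py --app asana --task "Create a new project"')

# Without quotes, the task arrives as four separate arguments ...
print(unquoted[5:])
# ... with quotes, it arrives as a single argument.
print(quoted[5:])

# Single quotes inside a double-quoted string are kept literally.
nested = shlex.split("--task \"Add a new task named 'Follow up with design team'\"")
print(nested[1])
```

Depending on how the parser is written, the extra words in the unquoted form are either rejected as unrecognized arguments or dropped, so only `Create` would reach the agent as the task.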

Troubleshooting

Issue: "Python not found" or "python3 not found"

Solution:

Install Python 3.9+ from https://www.python.org/downloads/
On macOS: brew install python3
On Linux: sudo apt-get install python3 python3-pip python3-venv

Issue: "playwright: command not found"

Solution:

# Make sure venv is activated
source venv/bin/activate

# Install Playwright
pip install playwright

# Install browsers (if the playwright command is still missing, use: python -m playwright install chromium)
playwright install chromium

Issue: "Ollama connection refused"

Solution:

# Check whether Ollama is running
curl http://localhost:11434/api/tags

# If the request fails, start Ollama in a separate terminal and keep it open
ollama serve

# If it still fails, make sure OLLAMA_HOST in .env matches the address Ollama listens on
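A Python equivalent of the `curl` check, useful where curl is not installed. It reports whether anything answers `GET /api/tags` at the given host; pass in the value of `OLLAMA_HOST` to test the exact address your configuration uses. `ollama_reachable` is a hypothetical helper.

```python
import os
from urllib.request import urlopen


def ollama_reachable(host: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if an Ollama server at host answers GET /api/tags with HTTP 200."""
    try:
        with urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and URLError/HTTPError;
        # ValueError covers a malformed host such as one missing "http://".
        return False


if __name__ == "__main__":
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    print(f"Ollama at {host} reachable:", ollama_reachable(host))
```

A `False` result while `ollama serve` is running usually means `OLLAMA_HOST` points at the wrong address or port.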

Issue: "Model not found" (Ollama)

Solution:

# Pull the model
ollama pull llava

# List installed models (LLM_MODEL in .env must match one of these names)
ollama list

# Verify model is available
ollama show llava

Issue: "Import errors" or "Module not found"

Solution:

# Make sure venv is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

Issue: "Browser fails to start"

Solution:

# Reinstall Playwright browsers
playwright install chromium --force

# Check system dependencies (Linux)
playwright install-deps

Issue: "Authentication fails"

Solution:

Check your credentials in .env
Try manual login: set HEADLESS=false and log in when the browser opens
If the app requires 2FA, complete it manually in the browser window (this also needs HEADLESS=false)

Project Structure

folder/
├── main.py                # Entry point
├── requirements.txt       # Python dependencies
├── setup.sh               # Setup script
├── env.example            # Environment variables template
├── .env                   # Your environment variables (create this)
├── src/                   # Source code
│   ├── agent/             # AI navigation logic
│   ├── browser/           # Browser automation
│   ├── capture/           # Screenshot and state capture
│   ├── apps/              # App configurations (Notion, Asana)
│   └── utils/             # Utilities (LLM client, logger)
├── datasets/              # Output datasets (created automatically)
├── screenshots/           # Temporary screenshots
├── .auth/                 # Saved authentication states
└── venv/                  # Virtual environment (don't commit)

Next Steps

Try a simple task:

python main.py --app notion --task "Create a new page"

Check the output:

ls datasets/notion/create_a_new_page/
# You should see: screenshots/, metadata.json, workflow.json
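In this example the output folder appears to be the task text lowercased with spaces replaced by underscores. The exact rule main.py uses for punctuation is not documented, so `task_slug` below is a guess that reproduces this one example; treat it as a convenience for locating output, not as the project's naming logic.

```python
import re
from pathlib import Path


def task_slug(task: str) -> str:
    """Hypothetical rule: lowercase, collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", task.lower()).strip("_")


def output_dir(app: str, task: str, root: str = "datasets") -> Path:
    """Guess the dataset folder for an app/task pair."""
    return Path(root) / app / task_slug(task)


if __name__ == "__main__":
    out = output_dir("notion", "Create a new page")
    if out.is_dir():
        for entry in sorted(out.iterdir()):
            print(entry.name)
    else:
        print(f"{out} not found; the slug rule here is only a guess")
```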

Run all test tasks:
```
python main.py --all
```
Explore configuration options:
- See env.example for all available settings
- Adjust performance settings (FAST_MODE, SKIP_AI_VALIDATION, etc.)
- Try different LLM providers

Getting Help

Check logs in the console output
Review captured screenshots in datasets/ to see what happened
Enable debug logging: Set LOG_LEVEL=DEBUG in .env
Check the other README files for detailed documentation
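How the project reads `LOG_LEVEL` is not documented here. A common pattern, sketched below as an assumption, maps the name case-insensitively onto Python's standard `logging` levels and falls back to INFO for missing or unknown values. `level_from_env` is a hypothetical helper.

```python
import logging
import os
from typing import Mapping, Optional

# Standard logging levels accepted by name.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(env: Optional[Mapping[str, str]] = None, default: int = logging.INFO) -> int:
    """Map LOG_LEVEL (e.g. 'DEBUG' or 'debug') to a logging level."""
    env = os.environ if env is None else env
    return LEVELS.get(env.get("LOG_LEVEL", "").strip().upper(), default)


if __name__ == "__main__":
    logging.basicConfig(level=level_from_env())
    logging.getLogger(__name__).debug("debug logging is enabled")
```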

Common Commands

# Activate virtual environment
source venv/bin/activate

# Run a task
python main.py --app notion --task "Your task here"

# Run all tasks
python main.py --all

# Check Python packages
pip list

# Update dependencies
pip install -r requirements.txt --upgrade

# Deactivate virtual environment
deactivate

System Requirements

Minimum:

RAM: 8GB
Disk: 5GB free space
CPU: Modern multi-core processor

Recommended:

RAM: 16GB (for local LLM models)

Platform-Specific Notes

macOS

Works well on Apple Silicon (M1/M2/M3): Ollama runs natively and uses the GPU for fast local inference
Use Homebrew for package management

Linux

May need to install system dependencies: playwright install-deps
GPU acceleration available for Ollama with NVIDIA GPUs
Works well on Ubuntu, Debian, Fedora

Windows

Use PowerShell or Command Prompt
May need to install Visual C++ Redistributable
Ollama works on Windows 10/11

You're all set! Start with a simple task and explore the system. For detailed workflow information, see WORKFLOW_FLOWCHART.md. For information about models and tools, see TECHNICAL_STACK.md.
