
dakshaladia/multiagent-browser-automation


Setup Guide - Getting Started

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.9 or higher (Python 3.13 recommended)
  • Git (for cloning the repository)
  • 8GB+ RAM (16GB recommended for local LLM models)
  • macOS, Linux, or Windows
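If you are unsure which interpreter your shell picks up, a quick standalone check like the following can confirm it meets the 3.9 minimum. It is an illustrative sketch, not a script shipped with the repository; `meets_minimum` is a hypothetical helper name.

```python
# Hypothetical helper, not part of the repository.
import sys

# Minimum version from the prerequisites above.
MINIMUM = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if `version_info` (major, minor, ...) is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if meets_minimum() else "too old, install 3.9 or newer"
    # %-formatting (no f-strings) so an outdated interpreter can still report.
    print("Python %s: %s" % (current, verdict))
```

Save it as, for example, check_python.py and run `python3 check_python.py`. The script deliberately avoids f-strings and type annotations so that even an interpreter that is too old can run it and print a useful message.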

Step 1: Clone the Repository

# Clone the repository
git clone <repository-url>
cd folder  # replace "folder" with the directory created by git clone

# Verify you're in the correct directory
ls -la
# You should see: main.py, requirements.txt, setup.sh, src/, etc.

Step 2: Set Up Python Virtual Environment

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate

# On Windows:
# venv\Scripts\activate

# Verify activation (you should see (venv) in your prompt)
which python  # Should point to venv/bin/python (Windows Command Prompt: where python)
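Another way to confirm activation, on any platform, is to ask Python itself. In an environment created by `python -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The snippet below is a standalone sketch, not part of the repository.

```python
# Standalone check, not part of the repository.
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created by `python -m venv`.

    Inside a venv, sys.prefix is the environment directory, while
    sys.base_prefix still refers to the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "venv NOT active")
    print("interpreter:", sys.executable)
```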

Step 3: Install Dependencies

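Option A: Using the Setup Script (Recommended)

setup.sh is a shell script, so it runs directly on macOS and Linux. On Windows, use Option B (or run the script under WSL).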

# Make setup script executable
chmod +x setup.sh

# Run the setup script
./setup.sh

The setup script will:

  • Check Python version
  • Create virtual environment
  • Install uv (fast Python package installer) if needed
  • Install all dependencies from requirements.txt
  • Install Playwright browsers (Chromium)
  • Create .env file from template
  • Create necessary directories

Option B: Manual Installation

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Create .env file from template
cp env.example .env
# On Windows (Command Prompt): copy env.example .env
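If you want one step that behaves the same on every platform and never overwrites an existing .env that may already hold your API keys, a small Python helper can do the copy. This is an illustrative sketch; `create_env_file` is a hypothetical name, not a function provided by the project.

```python
# Hypothetical helper, not part of the repository.
import shutil
from pathlib import Path


def create_env_file(project_dir: str = ".") -> bool:
    """Copy env.example to .env unless .env already exists.

    Returns True if a new .env was created, False if one was already present.
    """
    root = Path(project_dir)
    template = root / "env.example"
    target = root / ".env"
    if target.exists():
        return False  # keep the user's existing settings untouched
    if not template.exists():
        raise FileNotFoundError(f"template not found: {template}")
    shutil.copyfile(template, target)
    return True


if __name__ == "__main__":
    created = create_env_file()
    print("Created .env" if created else ".env already exists, left unchanged")
```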

Step 4: Configure Environment Variables

Edit the .env file with your preferred settings:

# Open .env in your editor
nano .env  # or use your preferred editor

Minimum Configuration (Free Option - Ollama)

# LLM Configuration
LLM_PROVIDER=ollama
LLM_MODEL=llava  # or qwen3-vl:235b-cloud for cloud-based

# Ollama Configuration
OLLAMA_HOST=http://localhost:11434

# Browser Configuration
HEADLESS=false  # Set to true for headless mode

Alternative: Google Gemini (Free Tier)

# LLM Configuration
LLM_PROVIDER=gemini
LLM_MODEL=gemini-1.5-pro  # or gemini-2.0-flash

# Get free API key from: https://makersuite.google.com/app/apikey
GOOGLE_API_KEY=your_google_api_key_here

Optional: Authentication (for automatic login)

# Notion credentials (optional)
NOTION_EMAIL=your_email@example.com
NOTION_PASSWORD=your_password_here

# Asana credentials (optional)
ASANA_EMAIL=your_email@example.com
ASANA_PASSWORD=your_password_here

Note: If you don't provide credentials, you can manually log in when the browser opens (if HEADLESS=false). The session will be saved for future runs.
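Whether an inline comment such as `# or qwen3-vl:235b-cloud for cloud-based` is stripped from a value depends on the .env loader the project uses; if a setting seems to pick up the comment text, move the comment onto its own line. The parser below is a simplified sketch of the common rules (full-line comments, inline comments after whitespace on unquoted values, optional quotes). It is not the project's loader and ignores details such as escape sequences and multi-line values.

```python
# Simplified illustrative parser, not the project's .env loader.
import re


def parse_env(text):
    """Parse KEY=VALUE lines into a dict (simplified, illustrative rules only)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value[:1] in ("'", '"'):
            # Quoted value: keep everything up to the matching closing quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: whitespace followed by '#' starts an inline comment.
            match = re.search(r"\s#", value)
            if match:
                value = value[: match.start()].rstrip()
        values[key] = value
    return values


if __name__ == "__main__":
    from pathlib import Path
    # Print only key names so secrets such as API keys are not echoed.
    print(sorted(parse_env(Path(".env").read_text())))
```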

Step 5: Set Up LLM Provider

Option 1: Ollama (Free, Local) - Recommended for Beginners

# Install Ollama
# macOS:
brew install ollama

# Linux:
curl -fsSL https://ollama.com/install.sh | sh

# Windows: Download from https://ollama.com/download

# Start Ollama service (keep this terminal open)
ollama serve

# In another terminal, download a vision model
ollama pull llava  # ~4.7GB, local model
# OR
ollama pull qwen3-vl:235b-cloud  # Cloud-based, best quality (runs on Ollama's servers; sign in first with: ollama signin)

Option 2: Google Gemini (Free Tier)

  1. Visit https://makersuite.google.com/app/apikey
  2. Click "Create API Key"
  3. Copy your API key
  4. Add to .env:
    LLM_PROVIDER=gemini
    GOOGLE_API_KEY=your_api_key_here

Option 3: OpenAI or Anthropic (Paid)

Add your API keys to .env:

LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_key_here

# OR

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_key_here
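Each provider above needs a different variable in .env. The mapping below mirrors the variables named in this guide and flags values that still look like the `..._here` placeholders. `missing_settings` is a hypothetical pre-flight helper, not code from the repository; the project itself may validate its configuration differently or apply defaults (for example for OLLAMA_HOST).

```python
# Hypothetical pre-flight check, not part of the repository.
import os

# Settings each LLM_PROVIDER value needs, as listed in this guide.
REQUIRED_BY_PROVIDER = {
    "ollama": ["OLLAMA_HOST"],
    "gemini": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}

# Example values in this guide end with "_here", e.g. "your_api_key_here".
PLACEHOLDER_SUFFIX = "_here"


def missing_settings(env: dict) -> list:
    """Return required setting names that are absent or still placeholders."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_BY_PROVIDER:
        return ["LLM_PROVIDER"]
    missing = []
    for name in REQUIRED_BY_PROVIDER[provider]:
        value = env.get(name, "").strip()
        if not value or value.endswith(PLACEHOLDER_SUFFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Reads only variables exported in the current shell; this sketch does
    # not load .env by itself.
    problems = missing_settings(dict(os.environ))
    print("Config looks complete" if not problems else f"Missing or placeholder: {problems}")
```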

Step 6: Verify Installation

# Make sure you're in the project directory and venv is activated
cd /path/to/folder
source venv/bin/activate  # if not already activated

# Verify Python version
python --version  # Should be 3.9+

# Verify dependencies
python -c "import playwright; print('Playwright OK')"
python -c "import pydantic; print('Pydantic OK')"

# Verify the Playwright CLI (prints its version; this does not confirm the Chromium download)
playwright --version
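To check several packages at once, a short script can report every missing module instead of stopping at the first ImportError. `importlib.util.find_spec` locates a module without importing it. This is a standalone sketch; the module list covers only the two packages checked above and is not derived from requirements.txt.

```python
# Standalone sketch, not part of the repository.
import importlib.util

# Top-level modules checked individually in the commands above.
REQUIRED_MODULES = ["playwright", "pydantic"]


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "(run: pip install -r requirements.txt)")
    else:
        print("All checked modules are importable")
```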

Step 7: Run Your First Task

# Make sure virtual environment is activated
source venv/bin/activate

# Run a simple task
python main.py --app notion --task "Create a new database"

# OR for Asana
python main.py --app asana --task "Create a new project"

Command Reference

You can drive the agent entirely from the CLI:

# Generic command structure
python main.py --app <app_name> --task "<human-readable instruction>"

# Examples
python main.py --app notion --task "Create a kanban board called Sprint Plan"
python main.py --app asana --task "Add a new task named 'Follow up with design team'"

Command-line flags:

  • --app notion|asana – selects which app configuration to load (URL, selectors, documentation context).
  • --task "…" – free-form text that becomes the mission statement for the LLM/browser agent.
  • --all – run every sample task defined inside main.py (handy for sanity checks).

Interactive mode:

If you omit both --app and --task, the script will prompt you for inputs at runtime:

python main.py
# Prompts:
# Enter app name:
# Enter task description:
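The flags and the interactive fallback fit together roughly as in the argparse sketch below: `--app` restricted to notion or asana, a free-form `--task`, a boolean `--all`, and prompts for missing values. It is a hypothetical reconstruction for illustration, not the actual main.py. Prompting for whichever value is missing (rather than only when both are omitted) is an assumption of this sketch.

```python
# Hypothetical reconstruction of the CLI, not the project's main.py.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a browser-automation task")
    parser.add_argument("--app", choices=["notion", "asana"],
                        help="which app configuration to load")
    parser.add_argument("--task", help="free-form instruction for the agent")
    parser.add_argument("--all", action="store_true",
                        help="run every sample task")
    return parser


def resolve_inputs(argv=None, ask=input):
    """Parse CLI flags; prompt for any missing app/task unless --all is set.

    `ask` is injectable so the prompts can be replaced in tests. App names
    typed at the prompt are not validated against the choices in this sketch.
    """
    args = build_parser().parse_args(argv)
    if not args.all:
        if args.app is None:
            args.app = ask("Enter app name: ").strip().lower()
        if args.task is None:
            args.task = ask("Enter task description: ").strip()
    return args


if __name__ == "__main__":
    args = resolve_inputs()
    print(f"app={args.app!r} task={args.task!r} all={args.all}")
```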

Tips:

  • Keep tasks descriptive (e.g., “Create a new project named Alpha and add two sections”).
  • Authentication state is saved in .auth/, so after you log in once, later runs reuse the saved session instead of asking you to log in again.
  • Add quotes around your task string so the shell doesn’t split it on spaces.
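The effect of the quoting tip is easy to see with Python's standard `shlex` module, which splits a command line using POSIX shell rules.

```python
import shlex

unquoted = "python main.py --app notion --task Create a new page"
quoted = 'python main.py --app notion --task "Create a new page"'

# Without quotes, --task receives only "Create"; the remaining words
# become stray arguments that the CLI would not recognize.
print(shlex.split(unquoted))
# ['python', 'main.py', '--app', 'notion', '--task', 'Create', 'a', 'new', 'page']

# With quotes, the whole sentence arrives as a single argument.
print(shlex.split(quoted))
# ['python', 'main.py', '--app', 'notion', '--task', 'Create a new page']
```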

Troubleshooting

Issue: "Python not found" or "python3 not found"

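Solution:

  • Install Python 3.9 or higher from https://www.python.org/downloads/ or your system package manager (e.g. brew install python on macOS).
  • On macOS/Linux the command is usually python3; on Windows, try py or python.
  • Once the virtual environment is activated, python refers to the venv interpreter on every platform.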

Issue: "playwright: command not found"

Solution:

# Make sure venv is activated
source venv/bin/activate

# Install Playwright
pip install playwright

# Install browsers
playwright install chromium

Issue: "Ollama connection refused"

Solution:

# Check whether the Ollama server is responding
curl http://localhost:11434/api/tags

# If the connection is refused, start the server (keep this terminal open)
ollama serve

# Then repeat the curl check from another terminal
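The same check can be scripted, for example as a pre-flight step before running main.py. This standard-library sketch calls the /api/tags endpoint used by the curl command above; `ollama_is_running` is a hypothetical helper, not part of the project.

```python
# Hypothetical helper, not part of the repository.
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # same default as OLLAMA_HOST in .env


def ollama_is_running(host: str = OLLAMA_HOST, timeout: float = 3.0) -> bool:
    """Return True if the server answers GET /api/tags with a JSON body."""
    try:
        with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # non-2xx statuses already raised HTTPError
            return True
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, timeouts, and refused connections;
        # ValueError covers a response body that is not valid JSON.
        return False


if __name__ == "__main__":
    print("Ollama is running" if ollama_is_running() else "Ollama is not reachable")
```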

Issue: "Model not found" (Ollama)

Solution:

# Pull the model
ollama pull llava

# List installed models
ollama list

# Verify model is available
ollama show llava
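`ollama pull llava` stores the model under the tag llava:latest, which is why a bare name and a tagged name can both refer to it. The sketch below checks an /api/tags response for a model. The response layout it assumes (a `models` list whose entries carry a `name` such as `llava:latest`) matches what current Ollama versions return, but treat it as an assumption and compare with `curl http://localhost:11434/api/tags` on your machine. `fetch_tags` and `has_model` are hypothetical helpers.

```python
# Hypothetical helpers, not part of the repository.
import json
import urllib.request


def fetch_tags(host: str = "http://localhost:11434") -> dict:
    """Return the JSON body of GET /api/tags (the locally available models)."""
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=5) as resp:
        return json.load(resp)


def has_model(tags: dict, wanted: str) -> bool:
    """Check a tags response for a model; a bare name is treated as ':latest'."""
    if ":" not in wanted:
        wanted += ":latest"
    names = {entry.get("name", "") for entry in tags.get("models", [])}
    return wanted in names


if __name__ == "__main__":
    model = "llava"
    found = has_model(fetch_tags(), model)
    print(f"{model}: " + ("installed" if found else f"missing, run: ollama pull {model}"))
```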

Issue: "Import errors" or "Module not found"

Solution:

# Make sure venv is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

Issue: "Browser fails to start"

Solution:

# Reinstall Playwright browsers
playwright install chromium --force

# Check system dependencies (Linux)
playwright install-deps

Issue: "Authentication fails"

Solution:

  • Check your credentials in .env
  • Try manual login: Set HEADLESS=false and log in manually when browser opens
  • Check if the app requires 2FA (may need manual intervention)

Project Structure

folder/
├── main.py               # Entry point
├── requirements.txt      # Python dependencies
├── setup.sh              # Setup script
├── env.example           # Environment variables template
├── .env                  # Your environment variables (create this)
├── src/                  # Source code
│   ├── agent/            # AI navigation logic
│   ├── browser/          # Browser automation
│   ├── capture/          # Screenshot and state capture
│   ├── apps/             # App configurations (Notion, Asana)
│   └── utils/            # Utilities (LLM client, logger)
├── datasets/             # Output datasets (created automatically)
├── screenshots/          # Temporary screenshots
├── .auth/                # Saved authentication states
└── venv/                 # Virtual environment (don't commit)

Next Steps

  1. Try a simple task:

    python main.py --app notion --task "Create a new page"
  2. Check the output:

    ls datasets/notion/create_a_new_page/
    # You should see: screenshots/, metadata.json, workflow.json
  3. Run all test tasks:

    python main.py --all
  4. Explore configuration options:

    • See env.example for all available settings
    • Adjust performance settings (FAST_MODE, SKIP_AI_VALIDATION, etc.)
    • Try different LLM providers
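The example output path suggests that the task text becomes a lowercase, underscore-separated folder name ("Create a new page" becomes create_a_new_page). The functions below are a guess at that rule for locating a task's output folder; the project's actual naming logic is not documented here and may handle punctuation, length limits, or name collisions differently. `task_slug` and `expected_output_dir` are hypothetical names.

```python
# Hypothetical helpers; the naming rule is inferred from one example path.
import re
from pathlib import Path


def task_slug(task: str) -> str:
    """Lowercase the task and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", task.lower()))


def expected_output_dir(app: str, task: str, root: str = "datasets") -> Path:
    """Guessed location of a task's output, e.g. datasets/notion/create_a_new_page."""
    return Path(root) / app / task_slug(task)


if __name__ == "__main__":
    out = expected_output_dir("notion", "Create a new page")
    print(out, "exists" if out.exists() else "not found")
```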

Getting Help

  • Check logs in the console output
  • Review captured screenshots in datasets/ to see what happened
  • Enable debug logging: Set LOG_LEVEL=DEBUG in .env
  • Check the other README files for detailed documentation
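A LOG_LEVEL setting like this usually maps onto Python's standard `logging` levels. The sketch below shows one common way such a value is applied; it is an assumption about the mechanism, not the project's logger in src/utils. `level_from_env` is a hypothetical helper.

```python
# Hypothetical sketch of applying LOG_LEVEL; not the project's logger.
import logging
import os

LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def level_from_env(env=os.environ, default: str = "INFO") -> int:
    """Translate LOG_LEVEL (case-insensitive) into a logging level; unknown names fall back to INFO."""
    name = env.get("LOG_LEVEL", default).strip().upper()
    return LEVELS.get(name, logging.INFO)


if __name__ == "__main__":
    logging.basicConfig(level=level_from_env(),
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logging.getLogger("demo").debug("visible only when LOG_LEVEL=DEBUG")
```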

Common Commands

# Activate virtual environment
source venv/bin/activate

# Run a task
python main.py --app notion --task "Your task here"

# Run all tasks
python main.py --all

# Check Python packages
pip list

# Update dependencies
pip install -r requirements.txt --upgrade

# Deactivate virtual environment
deactivate

System Requirements

Minimum:

  • RAM: 8GB
  • Disk: 5GB free space
  • CPU: Modern multi-core processor

Recommended:

  • RAM: 16GB+ (for local LLM models)
  • Disk: 10GB+ free space
  • CPU/GPU: Apple Silicon (M1/M2) Mac or a modern dedicated GPU (for faster LLM inference)
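Of these requirements, free disk space is the easiest to check from a script; RAM and GPU detection need platform-specific tools or third-party packages, so they are left out of this standard-library sketch. The thresholds are the minimum and recommended figures above, in decimal gigabytes. `classify_free_space` is a hypothetical helper.

```python
# Hypothetical helper, not part of the repository.
import shutil

GB = 10 ** 9  # decimal gigabytes
MINIMUM_FREE_GB = 5
RECOMMENDED_FREE_GB = 10


def classify_free_space(free_bytes: int) -> str:
    """Compare free bytes against the minimum/recommended disk figures above."""
    if free_bytes >= RECOMMENDED_FREE_GB * GB:
        return "recommended"
    if free_bytes >= MINIMUM_FREE_GB * GB:
        return "minimum"
    return "insufficient"


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    print(f"{free / GB:.1f} GB free: {classify_free_space(free)}")
```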

Platform-Specific Notes

macOS

  • Works great with M1/M2/M3 chips (fast LLM inference)
  • Use Homebrew for package management
  • Ollama runs natively on Apple Silicon

Linux

  • May need to install system dependencies: playwright install-deps
  • GPU acceleration available for Ollama with NVIDIA GPUs
  • Works well on Ubuntu, Debian, Fedora

Windows

  • Use PowerShell or Command Prompt
  • May need to install Visual C++ Redistributable
  • Ollama works on Windows 10/11

You're all set! Start with a simple task and explore the system. For detailed workflow information, see WORKFLOW_FLOWCHART.md. For information about models and tools, see TECHNICAL_STACK.md.

About

Multi-agent browser automation task for Notion and Asana.
