Before you begin, make sure your system meets the following requirements:
- Python 3.9 or higher (Python 3.13 recommended)
- Git (for cloning the repository)
- 8GB+ RAM (16GB recommended for local LLM models)
- macOS, Linux, or Windows
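The software prerequisites above can be checked with a short script before you continue. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only covers the Python version and Git, not RAM or operating system.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this guide


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    version = tuple(version_info or sys.version_info)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required"
        )
    # Git is only needed for cloning the repository.
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"MISSING: {issue}")
    if not issues:
        print("Prerequisites look OK")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.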
# Clone the repository
git clone <repository-url>
cd folder
# Verify you're in the correct directory
ls -la
# You should see: main.py, requirements.txt, setup.sh, src/, etc.
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
# Verify activation (you should see (venv) in your prompt)
which python # Should point to venv/bin/python
# Make setup script executable
chmod +x setup.sh
# Run the setup script
./setup.sh
The setup script will:
- Check Python version
- Create virtual environment
- Install uv (fast Python package installer) if needed
- Install all dependencies from requirements.txt
- Install Playwright browsers (Chromium)
- Create .env file from template
- Create necessary directories
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
# Create .env file from template
cp env.example .env
Edit the .env file with your preferred settings:
# Open .env in your editor
nano .env # or use your preferred editor
# LLM Configuration
LLM_PROVIDER=ollama
LLM_MODEL=llava # or qwen3-vl:235b-cloud for cloud-based
# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
# Browser Configuration
HEADLESS=false # Set to true for headless mode
# LLM Configuration
LLM_PROVIDER=gemini
LLM_MODEL=gemini-1.5-pro # or gemini-2.0-flash
# Get free API key from: https://makersuite.google.com/app/apikey
GOOGLE_API_KEY=your_google_api_key_here
# Notion credentials (optional)
NOTION_EMAIL=your_email@example.com
NOTION_PASSWORD=your_password_here
# Asana credentials (optional)
ASANA_EMAIL=your_email@example.com
ASANA_PASSWORD=your_password_here
Note: If you don't provide credentials, you can log in manually when the browser opens (if HEADLESS=false). The session will be saved for future runs.
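Several of the sample values above carry a trailing `# ...` comment on the same line, so whatever reads the file has to strip that part before using the value. The sketch below shows one way such lines could be parsed. It is an illustration only: the project may well load `.env` with a library instead, and `parse_env` and `as_bool` are hypothetical names.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: " #" starts an inline comment, as in the samples above.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def as_bool(value):
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


sample = """
# LLM Configuration
LLM_PROVIDER=ollama
LLM_MODEL=llava # or qwen3-vl:235b-cloud for cloud-based
OLLAMA_HOST=http://localhost:11434
HEADLESS=false # Set to true for headless mode
"""

config = parse_env(sample)
print(config["LLM_MODEL"])          # llava
print(as_bool(config["HEADLESS"]))  # False
```

Note that the URL in `OLLAMA_HOST` is left intact because it contains no space followed by `#`.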
# Install Ollama
# macOS:
brew install ollama
# Linux:
curl -fsSL https://ollama.com/install.sh | sh
# Windows: Download from https://ollama.com/download
# Start Ollama service (keep this terminal open)
ollama serve
# In another terminal, download a vision model
ollama pull llava # ~4.7GB, local model
# OR
ollama pull qwen3-vl:235b-cloud # Cloud-based, best quality
- Visit https://makersuite.google.com/app/apikey
- Click "Create API Key"
- Copy your API key
- Add to .env:
LLM_PROVIDER=gemini
GOOGLE_API_KEY=your_api_key_here
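Whichever provider you pick, a forgotten or placeholder API key is an easy mistake to make. The following sketch checks a settings dictionary for the key that matches `LLM_PROVIDER`. The variable names are the ones used in this guide; which settings the project actually enforces is an assumption, and `missing_settings` is a hypothetical helper.

```python
# Assumption: each cloud provider needs exactly the key named in this guide,
# and a local Ollama setup needs no API key.
REQUIRED_KEYS = {
    "ollama": [],
    "gemini": ["GOOGLE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_settings(env):
    """Return the names of settings that are absent or still placeholders."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEYS:
        return [f"LLM_PROVIDER must be one of {sorted(REQUIRED_KEYS)}"]
    return [
        key
        for key in REQUIRED_KEYS[provider]
        # Template values such as "your_api_key_here" count as missing.
        if not env.get(key) or env[key].startswith("your_")
    ]


example = {"LLM_PROVIDER": "gemini", "GOOGLE_API_KEY": "your_api_key_here"}
print(missing_settings(example))  # ['GOOGLE_API_KEY']
```

Passing `os.environ` instead of a literal dictionary would run the same check against the current process environment.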
Add your API keys to .env:
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_key_here
# OR
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_key_here
# Make sure you're in the project directory and venv is activated
cd /path/to/folder
source venv/bin/activate # if not already activated
# Verify Python version
python --version # Should be 3.9+
# Verify dependencies
python -c "import playwright; print('Playwright OK')"
python -c "import pydantic; print('Pydantic OK')"
# Verify Playwright browser
playwright --version
# Make sure virtual environment is activated
source venv/bin/activate
# Run a simple task
python main.py --app notion --task "Create a new database"
# OR for Asana
python main.py --app asana --task "Create a new project"
You can drive the agent entirely from the CLI:
# Generic command structure
python main.py --app <app_name> --task "<human-readable instruction>"
# Examples
python main.py --app notion --task "Create a kanban board called Sprint Plan"
python main.py --app asana --task "Add a new task named 'Follow up with design team'"
Command-line flags:
- --app notion|asana: selects which app configuration to load (URL, selectors, documentation context).
- --task "…": free-form text that becomes the mission statement for the LLM/browser agent.
- --all: runs every sample task defined inside main.py (handy for sanity checks).
Interactive mode:
If you omit both --app and --task, the script will prompt you for inputs at runtime:
python main.py
# Prompts:
# Enter app name:
# Enter task description:
Tips:
- Keep tasks descriptive (e.g., “Create a new project named Alpha and add two sections”).
- To reuse the same browser session, leave the terminal running; authentication state persists in .auth/.
- Add quotes around your task string so the shell doesn’t split it on spaces.
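The flags and the interactive fallback described above can be modeled with `argparse`. This is a sketch of the documented behavior, not the actual contents of main.py: `build_parser` and `resolve` are hypothetical names, and prompting for whichever single value is missing is an assumption (the guide only describes the case where both are omitted).

```python
import argparse


def build_parser():
    """Mirror the three flags documented in this guide."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--app", choices=["notion", "asana"])
    parser.add_argument("--task")
    parser.add_argument("--all", action="store_true", dest="run_all")
    return parser


def resolve(argv, ask=input):
    """Turn CLI arguments into a run plan, prompting for missing values."""
    args = build_parser().parse_args(argv)
    if args.run_all:
        return {"mode": "all"}
    app = args.app or ask("Enter app name: ").strip()
    task = args.task or ask("Enter task description: ").strip()
    return {"mode": "single", "app": app, "task": task}


# A quoted task reaches the program as one argument.
print(resolve(["--app", "notion", "--task", "Create a new page"]))
```

With a parser shaped like this one, an unquoted task would arrive as several separate arguments and the extra words would be rejected as unrecognized, which is why the quoting tip matters.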
Solution:
- Install Python 3.9+ from https://www.python.org/downloads/
- On macOS: brew install python3
- On Linux: sudo apt-get install python3 python3-pip python3-venv
Solution:
# Make sure venv is activated
source venv/bin/activate
# Install Playwright
pip install playwright
# Install browsers
playwright install chromium
Solution:
# Start Ollama if it is not already running (keep this terminal open)
ollama serve
# In another terminal, verify that the server responds
curl http://localhost:11434/api/tags
Solution:
# Pull the model
ollama pull llava
# List installed models
ollama list
# Verify model is available
ollama show llava
Solution:
# Make sure venv is activated
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
Solution:
# Reinstall Playwright browsers
playwright install chromium --force
# Check system dependencies (Linux)
playwright install-deps
Solution:
- Check your credentials in .env
- Try manual login: set HEADLESS=false and log in manually when the browser opens
- Check if the app requires 2FA (may need manual intervention)
folder/
├── main.py # Entry point
├── requirements.txt # Python dependencies
├── setup.sh # Setup script
├── env.example # Environment variables template
├── .env # Your environment variables (create this)
├── src/ # Source code
│ ├── agent/ # AI navigation logic
│ ├── browser/ # Browser automation
│ ├── capture/ # Screenshot and state capture
│ ├── apps/ # App configurations (Notion, Asana)
│ └── utils/ # Utilities (LLM client, logger)
├── datasets/ # Output datasets (created automatically)
├── screenshots/ # Temporary screenshots
├── .auth/ # Saved authentication states
└── venv/ # Virtual environment (don't commit)
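To confirm that you are in the right directory, you can compare it against the layout above. The sketch below checks only the entries that ship with the repository; generated directories such as datasets/ and .auth/ are deliberately left out. `missing_entries` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Entries from the tree above that should exist right after cloning.
EXPECTED = ["main.py", "requirements.txt", "setup.sh", "env.example", "src"]


def missing_entries(root="."):
    """Return the expected files or directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing from current directory:", ", ".join(missing))
    else:
        print("Project layout looks complete")
```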
- Try a simple task: python main.py --app notion --task "Create a new page"
- Check the output: ls datasets/notion/create_a_new_page/ (you should see screenshots/, metadata.json, and workflow.json)
- Run all test tasks: python main.py --all
- Explore configuration options:
  - See env.example for all available settings
  - Adjust performance settings (FAST_MODE, SKIP_AI_VALIDATION, etc.)
  - Try different LLM providers
- Check logs in the console output
- Review captured screenshots in datasets/ to see what happened
- Enable debug logging: set LOG_LEVEL=DEBUG in .env
- Check the other README files for detailed documentation
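When reviewing output in datasets/, it helps to know which directory belongs to which task. The only example in this guide maps the task "Create a new page" for the notion app to datasets/notion/create_a_new_page/. The sketch below reproduces that one example by lowercasing the task and replacing non-alphanumeric runs with underscores. The rule is inferred from a single data point, so the project's real naming may differ; `task_slug` and `dataset_dir` are hypothetical names.

```python
import re
from pathlib import Path


def task_slug(task):
    """Lowercase the task and collapse non-alphanumeric runs to underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", task.lower())
    return slug.strip("_")


def dataset_dir(app, task, root="datasets"):
    """Guess the output directory for a task (assumed naming scheme)."""
    return Path(root) / app / task_slug(task)


print(dataset_dir("notion", "Create a new page").as_posix())
# datasets/notion/create_a_new_page
```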
# Activate virtual environment
source venv/bin/activate
# Run a task
python main.py --app notion --task "Your task here"
# Run all tasks
python main.py --all
# Check Python packages
pip list
# Update dependencies
pip install -r requirements.txt --upgrade
# Deactivate virtual environment
deactivate
Minimum:
- RAM: 8GB
- Disk: 5GB free space
- CPU: Modern multi-core processor
Recommended:
- RAM: 16GB+ (for local LLM models)
- Disk: 10GB+ free space
- CPU: M1/M2 Mac or modern GPU (for faster LLM inference)
macOS:
- Works great with M1/M2/M3 chips (fast LLM inference)
- Use Homebrew for package management
- Ollama runs natively on Apple Silicon
Linux:
- May need to install system dependencies: playwright install-deps
- GPU acceleration available for Ollama with NVIDIA GPUs
- Works well on Ubuntu, Debian, Fedora
Windows:
- Use PowerShell or Command Prompt
- May need to install Visual C++ Redistributable
- Ollama works on Windows 10/11
You're all set! Start with a simple task and explore the system. For detailed workflow information, see WORKFLOW_FLOWCHART.md. For information about models and tools, see TECHNICAL_STACK.md.