# SouperScraper

A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.

## Setup

1. Install with pip:

   ```bash
   pip install souperscraper
   ```

2. Download the appropriate ChromeDriver for your Chrome version using `getchromedriver.py` (command below) or manually from the ChromeDriver website.

   To find your Chrome version, go to chrome://settings/help in your browser.

   ```bash
   getchromedriver
   ```

3. Create a new SouperScraper object using the path to your ChromeDriver:

   ```python
   from souperscraper import SouperScraper

   scraper = SouperScraper('/path/to/your/chromedriver')
   ```

4. Start scraping using BeautifulSoup and/or Selenium methods (additional sketches follow after these steps):

   ```python
   scraper.goto('https://github.com/LucasFaudman')

   # Use BeautifulSoup to search for and extract content
   # by accessing the scraper's 'soup' attribute
   # or with the 'soup_find' / 'soup_find_all' methods
   repos = scraper.soup.find_all('span', class_='repo')
   for repo in repos:
       repo_name = repo.text
       print(repo_name)

   # Use Selenium to interact with the page, such as clicking buttons
   # or filling out forms, by accessing the scraper's
   # find_element_by_* / find_elements_by_* / wait_for_* methods
   repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
   repos_tab.click()

   search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
   search_input.send_keys('souper-scraper')
   search_input.submit()
   ```
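
The comments in step 4 mention `soup_find` / `soup_find_all` as an alternative to going through `scraper.soup` directly. A minimal sketch, assuming these methods accept the same arguments as BeautifulSoup's `find` / `find_all` and return the same `Tag` objects:

```python
# Sketch only: assumes soup_find / soup_find_all mirror BeautifulSoup's
# find / find_all (same arguments, same return types). Continues from the
# 'scraper' object created in step 3.
repos = scraper.soup_find_all('span', class_='repo')   # all matching tags
first_repo = scraper.soup_find('span', class_='repo')  # first match, or None
if first_repo is not None:
    print(first_repo.text)
```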
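
Because SouperScraper is aimed at dynamic pages, a typical pattern is: interact with Selenium, wait for the new content, then re-read the soup. This is a sketch under two assumptions: that the `wait_for_*` name below follows the same `expected_conditions` naming pattern as `wait_for_visibility_of_element_located_by_id` in step 4, and that `scraper.soup` reflects the current page source when accessed:

```python
# Sketch only: continues from the search submitted in step 4.
# Assumes wait_for_presence_of_element_located_by_css_selector exists by the
# same naming convention, and that scraper.soup re-parses the page after the wait.
scraper.wait_for_presence_of_element_located_by_css_selector('span.repo')

filtered_repos = scraper.soup_find_all('span', class_='repo')
for repo in filtered_repos:
    print(repo.text)
```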

## BeautifulSoup Reference

## Selenium Reference