
Scraper

A Google Chrome extension for getting data out of web pages and into spreadsheets.

Usage

Highlight a part of the page that is similar to what you want to scrape, right-click, and select the "Scrape selected..." item. The scraper window will appear, showing the initial results. You can export the table to Google Docs by pressing the "Export to Google Docs..." button, or use the left-hand pane to refine or customize your scraping further.

The "Selector" section lets you change which page elements are scraped. You can specify the query either as a jQuery selector or as an XPath expression.

You can also customize the table's columns in the "Columns" section. Columns must be specified as XPath expressions, and you can optionally give each column a name.
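To illustrate how a selector and a set of column expressions combine into a table, here is a minimal Python sketch. It is not the extension's own code (Scraper runs as JavaScript in the browser). It uses the standard library's `xml.etree.ElementTree`, which parses XML rather than HTML and supports only a limited subset of XPath. The sample markup, the hypothetical `scrape` and `column_value` helpers, and the assumption that each column expression is evaluated relative to every matched element are all illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for a web page. It must be
# well-formed XML because ElementTree is an XML parser, not an HTML parser.
PAGE = """
<html><body>
  <ul>
    <li><a href="/a">Alpha</a> <span class="price">1.00</span></li>
    <li><a href="/b">Beta</a> <span class="price">2.50</span></li>
    <li><a href="/c">Gamma</a></li>
  </ul>
</body></html>
"""


def column_value(match, path):
    """Return the text of the first node at `path` relative to `match`, or ''."""
    node = match.find(path)
    if node is None:
        return ""
    return "".join(node.itertext()).strip()


def scrape(root, selector, columns):
    """Build one row (a dict of column name -> text) per element matched by `selector`.

    Assumption: `selector` and every column path use ElementTree's XPath
    subset, and column paths are evaluated relative to each matched element.
    """
    rows = []
    for match in root.iterfind(selector):
        rows.append({name: column_value(match, path) for name, path in columns.items()})
    return rows


root = ET.fromstring(PAGE.strip())
table = scrape(root, ".//li", {"Name": "./a", "Price": "./span[@class='price']"})
for row in table:
    print(row)
```

Each matched `li` becomes one row, and a column whose expression matches nothing yields an empty string (the "Gamma" row has no price).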

Selecting the "Exclude empty results" filter will prevent any matches that contain no column values from appearing in the table.
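The filter's behavior can be sketched in Python as follows, assuming each result is a mapping from column name to string value. This is an illustration, not the extension's JavaScript: the `exclude_empty` name is hypothetical, and treating whitespace-only values as empty is an assumption rather than documented behavior.

```python
def exclude_empty(rows):
    """Keep only rows that have at least one non-empty column value.

    Assumption: a value consisting only of whitespace counts as empty.
    """
    return [row for row in rows if any(str(value).strip() for value in row.values())]


rows = [
    {"Name": "Alpha", "Link": "/a"},
    {"Name": "", "Link": ""},       # no column values: dropped
    {"Name": "Gamma", "Link": ""},  # one non-empty value: kept
]
print(exclude_empty(rows))
```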

After making any customizations, you must press the "Scrape" button to update the table of results.

Download

Download the extension from http://chrome.google.com/extensions/detail/mbigbapnjcgaffohmbkdlecaccepngjd.

Get the sources from https://github.com/mnmldave/scraper.

Building

You don't need to build this extension per se. To try it out, open chrome://extensions in Google Chrome and expand "Developer Mode". Then click the "Load unpacked extension..." button and point it at the src directory.

Learn more about extension development from the Google Chrome Extensions page.

A Rakefile is included for packaging the extension into a zip file; it also minifies the JavaScript and CSS.

License

Scraper is open source under a BSD license, which you can find in LICENSE.txt.

Credits

Many of the icons used in this extension are from the generous Yusuke Kamiyamane.


Copyright (c) 2010 David Heaton