ip_proxy_pool

A dynamically configurable crawler, built on Scrapy, that collects and checks free proxy IPs from across the web. It makes it easy to gather hundreds of thousands of proxy IPs in a short time: by maintaining a single spider and a few sets of per-site extraction rules, you can pull proxy IPs from many sites without writing new spider code. See the blog posts for more detail.
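The description above boils down to one generic spider driven by per-site extraction rules. A minimal sketch of how that could look, assuming a rule dictionary of start URLs and XPath expressions (the rule keys, field names and example site are hypothetical, not the project's actual format):

# Sketch only: build a Scrapy spider class from a per-site rule dictionary.
import scrapy

EXAMPLE_RULES = {
    "name": "example_proxy_site",                       # hypothetical site
    "start_urls": ["http://example.com/free-proxy-list"],
    "row_xpath": "//table//tr[td]",                     # one row per proxy
    "ip_xpath": "./td[1]/text()",
    "port_xpath": "./td[2]/text()",
}

def make_spider(rules):
    """Create a Spider subclass whose behaviour comes entirely from `rules`."""

    class ConfiguredSpider(scrapy.Spider):
        name = rules["name"]
        start_urls = rules["start_urls"]

        def parse(self, response):
            # Apply the site-specific XPaths to every proxy row on the page.
            for row in response.xpath(rules["row_xpath"]):
                yield {
                    "ip": row.xpath(rules["ip_xpath"]).extract_first(),
                    "port": row.xpath(rules["port_xpath"]).extract_first(),
                }

    return ConfiguredSpider

Each class returned by make_spider can be registered with a CrawlerProcess and run like a hand-written spider, so supporting a new proxy site only means adding another rule dictionary.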

Main Requirements

For more details, see requirements.txt. A sketch of the storage layer these libraries suggest follows the list below.

  • Scrapy 1.2.1
  • MySQL-python 1.2.5
  • Redis 2.10.5
  • SQLAlchemy 1.1.4
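The dependencies above hint at the storage layer: SQLAlchemy over MySQL (via MySQL-python) for persisting proxies, and redis for queueing and deduplication. A minimal sketch of how crawled proxies might be persisted with SQLAlchemy follows; the table and column names are assumptions, not the project's actual schema.

# Sketch only: a possible SQLAlchemy model for storing crawled proxies.
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Proxy(Base):
    __tablename__ = "proxies"           # hypothetical table name

    id = Column(Integer, primary_key=True)
    ip = Column(String(64))
    port = Column(Integer)
    protocol = Column(String(16))       # e.g. "http" or "https"
    latency = Column(Float)             # filled in when the proxy is checked

# MySQL-python supplies the driver behind the default "mysql" dialect.
engine = create_engine("mysql://user:password@localhost/proxy_pool")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)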

Install for development

CentOS

$ sudo yum install python-devel
$ sudo yum install gcc libffi-devel openssl-devel
$ pip install scrapy
$ pip install SQLAlchemy
$ pip install redis
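Since the project checks the proxies it collects as well as crawling them, one way to validate a proxy from inside Scrapy is through the built-in HttpProxyMiddleware, which reads the proxy address from request.meta. The sketch below is an assumption about how such a check could look; the check URL, timeout and example address are not taken from the project.

# Sketch only: probe a single proxy by routing a request through it.
import scrapy

class ProxyCheckSpider(scrapy.Spider):
    name = "proxy_check"
    proxy = "http://1.2.3.4:8080"       # hypothetical proxy pulled from storage

    def start_requests(self):
        yield scrapy.Request(
            "http://httpbin.org/ip",                     # simple echo service
            meta={"proxy": self.proxy, "download_timeout": 10},
            callback=self.ok,
            errback=self.failed,
            dont_filter=True,
        )

    def ok(self, response):
        self.logger.info("proxy %s works: %s", self.proxy, response.text)

    def failed(self, failure):
        self.logger.info("proxy %s failed: %r", self.proxy, failure.value)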
