Web spiders for Reddit and Experience Project.
There are two versions of the Reddit spider: one uses Scrapy and the other uses the Reddit API.
Before running the spiders, create the folders they write their output to and update the path names in the scripts to match your setup.
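The folder setup can be scripted instead of done by hand. The following is a minimal sketch, not code from this repository: the helper name ensure_output_dirs, the root "data/reddit", and the subfolder names "posts" and "comments" are all assumptions, so substitute whatever paths the scripts actually expect.

```python
from pathlib import Path


def ensure_output_dirs(root, names=("posts", "comments")):
    """Create root/<name> for each name and return the resulting paths.

    The folder names are placeholders (an assumption), not the names
    used by the spiders in this repository.
    """
    root = Path(root)
    created = []
    for name in names:
        target = root / name
        # parents=True builds missing parent folders; exist_ok=True makes
        # the call safe to repeat when the folder is already there.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


if __name__ == "__main__":
    # "data/reddit" is a hypothetical output root.
    for path in ensure_output_dirs("data/reddit"):
        print(path)
```

Because the call is safe to repeat, it can be run before every crawl without checking whether the folders already exist.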
The redditapi folder contains three scripts:
(1) Crawl the posts (titles and text bodies) of a subreddit.
(2) Crawl the posts and their comments.
(3) Get the sequences of comments.
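The data handling behind these three steps can be sketched as follows. This is an illustration rather than the repository's code. It assumes the input is Reddit's listing JSON already parsed into a Python dict (posts have kind "t3", comments have kind "t1", and replies are nested listings), and it assumes "sequences of comments" means each reply chain from a top-level comment down to a leaf. The function names and the sample data are hypothetical.

```python
def extract_posts(listing):
    """Return (title, body) pairs for the posts in a listing dict."""
    posts = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t3":  # t3 marks a post
            continue
        data = child.get("data", {})
        posts.append((data.get("title", ""), data.get("selftext", "")))
    return posts


def comment_sequences(listing, prefix=()):
    """Return every top-level-to-leaf chain of comment bodies as tuples."""
    sequences = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # t1 marks a comment; skip "more" stubs
            continue
        data = child.get("data", {})
        chain = prefix + (data.get("body", ""),)
        replies = data.get("replies")
        # Reddit sends an empty string when a comment has no replies.
        nested = comment_sequences(replies, chain) if isinstance(replies, dict) else []
        # A comment without usable replies ends its own chain.
        sequences.extend(nested if nested else [chain])
    return sequences


def _comment(body, replies=None):
    """Build a sample comment node (test data only)."""
    nested = {"kind": "Listing", "data": {"children": replies}} if replies else ""
    return {"kind": "t1", "data": {"body": body, "replies": nested}}


SAMPLE_POSTS = {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"title": "First post", "selftext": "Hello"}},
]}}

SAMPLE_COMMENTS = {"kind": "Listing", "data": {"children": [
    _comment("a", [_comment("b", [_comment("c")]), _comment("d")]),
    _comment("e"),
]}}

if __name__ == "__main__":
    print(extract_posts(SAMPLE_POSTS))
    for chain in comment_sequences(SAMPLE_COMMENTS):
        print(" -> ".join(chain))
```

Keeping the parsing separate from the network request means the same functions work on saved JSON files, which helps when re-running an analysis without crawling again.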
The Scrapy version crawls Reddit posts in one script and comments in another. XPath and CSS selectors are used to extract the specific fields. To run it:
python scrapy_reddit.py
Scrapy is also used to crawl posts from Experience Project. XPath and CSS selectors, implemented in this script, extract the specific fields. To run it:
python scrapy_ep.py
python 3.6
scrapy 1.3.3
requests 2.18.4