Pretty straightforward, this one. It scrapes the stories from the first and
second pages of Hacker News. It has one useful function, get_stories(), which
returns a list of dictionaries representing the stories parsed from Hacker
News' notoriously old-school HTML. The values are:
- comments: The number of comments on the article; 0 for a job posting
- id: The Hacker News id of the article; 0 for a job posting
- points: The number of points for the article; None for a job posting
- rank: Story rank
- time: Plain-text, human-readable time the story was posted; None for a job posting
- title: Posting title
- url: Posting URL
- user: User who posted the article; None for a job posting
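
A minimal usage sketch follows. It assumes get_stories() is importable
directly from the hnscrape module and takes no arguments; check the source if
your install exposes it differently.

```python
from hnscrape import get_stories

# One dict per story from pages 1 and 2 of Hacker News.
stories = get_stories()

for story in stories:
    # Job postings have 0 for comments/id and None for points/time/user.
    if story['points'] is None:
        print(f"{story['rank']:>3}. [job] {story['title']}")
    else:
        print(f"{story['rank']:>3}. {story['title']} "
              f"({story['points']} points, {story['comments']} comments) "
              f"by {story['user']}")
```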
To install:

pip install git+https://github.com/njl/hnscrape.git#egg=hnscrape

This should install the requirements: lxml, requests, and cssselect.
You don't need to run this very often; cache results, and don't hit the servers more than once every couple of minutes.
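
A rough caching sketch (names here are hypothetical, not part of the library)
that reuses the last result for a couple of minutes instead of re-hitting
Hacker News on every call:

```python
import time

from hnscrape import get_stories

_CACHE_TTL = 120  # seconds between fetches
_cache = {'stories': None, 'fetched_at': 0.0}

def cached_stories():
    """Return cached stories, refetching only after the TTL expires."""
    now = time.time()
    if _cache['stories'] is None or now - _cache['fetched_at'] > _CACHE_TTL:
        _cache['stories'] = get_stories()
        _cache['fetched_at'] = now
    return _cache['stories']
```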
See the LICENSE file; this project is under the MIT license.