
Scraper with CSS Selectors and XPath. #18


Open
sjehuda opened this issue Apr 15, 2025 · 0 comments

sjehuda commented Apr 15, 2025

A dialog that would let the user fetch the title, date, content, and link of each entry.

It would be possible to create custom rules and also import and export sets of rules.
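One way the import and export of rule sets could look is a plain JSON round trip. This is only a minimal sketch: the `Rule` class, its field names, and the `export_rules`/`import_rules` helpers are hypothetical and are not taken from html2atom or any existing code. The sample expressions are copied from the Slackware command in this issue.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Rule:
    # Hypothetical layout: each field holds a CSS selector or an XPath string.
    name: str
    root: str
    title: str
    date: str
    content: str
    link: str
    syntax: str = "xpath"  # assumed values: "xpath" or "css"


def export_rules(rules):
    """Serialize a list of Rule objects to a JSON string."""
    return json.dumps([asdict(rule) for rule in rules], indent=2)


def import_rules(text):
    """Rebuild Rule objects from JSON produced by export_rules."""
    return [Rule(**item) for item in json.loads(text)]


rules = [
    Rule(
        name="slackware",
        root="//center[not(parent::body)]/table",
        title="./tr[1]//b/text()",
        date="normalize-space(./tr[2]/td[2]//b/text())",
        content="./tr[2]/td[1]//text()",
        link="@href",
    )
]

# Round trip: what is exported can be imported again unchanged.
restored = import_rules(export_rules(rules))
```

A text format such as JSON would keep rule sets easy to share and to edit by hand; any other serialization would work equally well.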

Rules would be written as CSS selectors or XPath expressions.

Reference: https://github.com/sjehuda/html2atom

I utilize this script html2atom.py.gz (not published).

Example usage with Liferea and Snownews (i.e. stdout):

python .local/share/liferea/scripts/html2atom.py \
  --url 'http://www.slackware.com/' \
  --title 'The Slackware Linux Project' \
  --description 'News' \
  --subtitle '' \
  --language 'en' \
  --root '//center[not(parent::body)]/table' \
  --entry-title './tr[1]//b/text()' \
  --entry-link '@href' \
  --entry-description './tr[2]/td[1]//text()' \
  --entry-date 'normalize-space(./tr[2]/td[2]//b/text())' \
  --date-format '%Y-%m-%d'
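The command above pairs a root expression, which selects one node per entry, with expressions evaluated relative to each of those nodes. The sketch below illustrates that pattern only; it is not the html2atom script. It uses Python's standard `xml.etree.ElementTree`, which supports just a limited XPath subset, so constructs from the command such as `not(parent::body)`, `text()`, and `normalize-space()` are replaced by simpler paths, and link extraction is left out. The sample markup and the names `text_of` and `scrape` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; real pages usually need an HTML parser.
PAGE = """
<html><body><center>
  <table>
    <tr><td><b>First item</b></td></tr>
    <tr><td>First summary</td><td><b>2025-04-15</b></td></tr>
  </table>
  <table>
    <tr><td><b>Second item</b></td></tr>
    <tr><td>Second summary</td><td><b>2025-04-16</b></td></tr>
  </table>
</center></body></html>
"""


def text_of(node, path):
    """Return the whitespace-normalized text under the first match of path, or ''."""
    found = node.find(path)
    if found is None:
        return ""
    return " ".join("".join(found.itertext()).split())


def scrape(page, root, title, description, date):
    """Select entry nodes with `root`, then evaluate the other paths relative to each."""
    tree = ET.fromstring(page.strip())
    return [
        {
            "title": text_of(node, title),
            "description": text_of(node, description),
            "date": text_of(node, date),
        }
        for node in tree.findall(root)
    ]


entries = scrape(
    PAGE,
    root=".//center/table",
    title="./tr[1]//b",
    description="./tr[2]/td[1]",
    date="./tr[2]/td[2]//b",
)
```

Each dictionary in `entries` corresponds to one matched root node, which is the unit a rule dialog would need to preview.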
Node    : _______________________
Title   : _______________________
Date    : _______________________
Summary : _______________________
Content : _______________________
Language: _______________________
Type    : [                     ] # News, Updates, Catalogue