
A simple python script to collate a series of webpages into a single page with internal linking.

bootsa/collate


collate

A simple Python script that takes a list of webpages and combines them into a single page with internal linking.

This is a learning project, so it is unlikely to be beautiful or the best way of doing it...

Originally using lxml with principles from:

Moved over to BeautifulSoup with:

May implement Scrapy (http://doc.scrapy.org/en/latest/intro/tutorial.html) in the end...

May be packaged as a Django add-on.

Basic breakdown:

  • produce URL and selector list
    • embedded dictionary list (href, selectortype, selectorname)
    • editable
    • jQuery highlighting for selector selection?
  • generate unique tokens from URLs for internal anchors/links, append to list
  • retrieve each content
    • append token to selector ID (remove styling?)
    • add title and anchor link (within selector)
    • append to cumulative store
  • process cumulative store
    • replace links with anchor names
  • generate table of contents
  • build collated page
    • HTML header
    • table of contents
    • cumulative content
    • HTML close
  • turn into ebook?
    • process with calibre?
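
The breakdown above can be sketched roughly as follows. This is a minimal illustration and not the script in this repository: it uses only the standard library, it skips the download and selector steps by assuming each page's content has already been extracted, and it rewrites links with a regular expression where the real script would more likely use BeautifulSoup. The names `make_token` and `collate`, the token scheme, and the `href`/`title`/`content` keys are all hypothetical.

```python
import html
import re
from urllib.parse import urlparse


def make_token(url, seen):
    """Derive a unique anchor token from a URL (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).strip("/")
    base = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-") or "page"
    token, n = base, 2
    while token in seen:  # disambiguate URLs that reduce to the same token
        token = f"{base}-{n}"
        n += 1
    seen.add(token)
    return token


def collate(pages):
    """Combine pages (dicts with 'href', 'title', 'content') into one HTML page.

    'content' is assumed to be the already-extracted HTML of the chosen selector.
    """
    seen = set()
    tokens = {p["href"]: make_token(p["href"], seen) for p in pages}

    # Retrieve each content: wrap it in a container whose id is the token,
    # add a title, and append it to the cumulative store.
    store = []
    for p in pages:
        token = tokens[p["href"]]
        title = html.escape(p["title"])
        store.append(f'<div id="{token}">\n<h2>{title}</h2>\n{p["content"]}\n</div>')
    cumulative = "\n".join(store)

    # Process cumulative store: links to collated pages become internal anchors;
    # links to anything else are left untouched.
    def to_anchor(match):
        href = match.group(2)
        if href in tokens:
            return f"{match.group(1)}#{tokens[href]}{match.group(3)}"
        return match.group(0)

    cumulative = re.sub(r'(href=")([^"]*)(")', to_anchor, cumulative)

    # Generate the table of contents from the same tokens.
    items = "\n".join(
        f'<li><a href="#{tokens[p["href"]]}">{html.escape(p["title"])}</a></li>'
        for p in pages
    )
    toc = f"<ul>\n{items}\n</ul>"

    # Build the collated page: HTML header, table of contents, content, HTML close.
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>Collated pages</title></head>\n<body>\n"
        f"{toc}\n{cumulative}\n</body>\n</html>"
    )
```

Calling `collate` with two entries whose contents link to each other should yield a single page in which those links point at `#...` anchors, while links to pages outside the list keep their original URLs. The exact-match lookup on `href` is a simplification: relative links or URLs with fragments would need normalising first.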
