baberparweez/scrape-press

WordPress Content Scraper

Collects posts and pages from a CSV list of WordPress URLs, spins them, then compiles them into a JSON file.

Requirements

This set of scripts is specifically designed to run on:

Setup

  1. Install Python for Windows
  2. From the project root, run python setup.py
  3. Add appropriate values to the .env file
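The keys the scripts expect in .env are not listed here, so the following is only a minimal sketch of how a file of KEY=VALUE lines could be read with the standard library. The helper name load_env and the key EXAMPLE_KEY are hypothetical and are not taken from this project.

```python
def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, ignoring blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip empty lines, comments, and anything without an "=" sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Trim whitespace and optional surrounding quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    # EXAMPLE_KEY is a placeholder; use whatever keys your .env actually defines.
    import os

    if os.path.exists(".env"):
        print(load_env().get("EXAMPLE_KEY"))
```

The project's own scripts may well load the file differently (for example through a dotenv package), so treat this only as a way to check that your .env parses as expected.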

Running the "application"

This is done in three parts:

1. Download the articles
  1. Compile a list of the URLs of all articles or pages you want to pull content from
  2. Add a CSV file listing those URLs to the ./sources folder
2. Spin and compile the articles
  1. Using terminal, bash, PowerShell or similar, navigate to ./scrapers
  2. Run python scrape-press.py
  3. Wait for the script to finish compiling the JSON file to the ./data folder
3. Import to your blog
  1. Install a processor / importer on your blogging platform (if you're using WordPress, WP All Import is brilliant)
  2. Upload the ./data/____.json file to the importer
  3. Map the appropriate fields
  4. Run your importer
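To illustrate the data flow in the steps above (URLs in a CSV file under ./sources, a JSON file out to ./data), here is a rough sketch using only the standard library. It is not the code in scrape-press.py: the function names, the one-URL-per-row CSV layout with no header row, and the "url"/"content" JSON fields are all assumptions made for the example, and the fetching and spinning are left to a caller-supplied function.

```python
import csv
import json
from pathlib import Path


def read_urls(csv_path):
    """Return the non-empty values from the first column of a CSV file."""
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank rows and rows whose first cell is empty.
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls


def build_articles(urls, fetch):
    """Call fetch(url) for each URL and collect the results as dicts.

    fetch is a hypothetical callable standing in for the download and
    spin steps; it takes a URL and returns the article text.
    """
    return [{"url": url, "content": fetch(url)} for url in urls]


def write_articles(articles, json_path):
    """Write a list of article dicts to a JSON file, creating the folder if needed."""
    json_path = Path(json_path)
    json_path.parent.mkdir(parents=True, exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(articles, handle, ensure_ascii=False, indent=2)
```

Keeping the fetch step as a parameter means the CSV and JSON handling can be tried out offline with a stub before pointing it at real URLs. The field names your importer needs to map may differ from the ones assumed here.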

About

Python script for scraping WordPress data
