Downloader for some Wikipedia articles and related information
- Download and install node.js from http://nodejs.org/download/
- Clone this repository:
git clone https://github.com/geoont/wikie-pooh.git
cd wikie-pooh
- Install dependencies (run this command from the top-level project directory, the one that contains package.json):
npm install
- Optional: run tests for some of the dependencies
cd node_modules/nodemw/0.3.14/package
npm install vows
npm test
If any dependencies are missing, install them with npm install.
- Initialize new experiment:
node ../experiment_init.js 0.cat npp.sqlite3
where 0.cat is a list of initial categories, one category per line, and npp.sqlite3 is a new database.
- Update the database to the current version:
node ../experiment_fix.js en npp.sqlite3
(this may not be needed, but it won't break the database).
- Launch the server:
node ../experiment_srv.js en npp.sqlite3
- Open in the browser: http://localhost:8282
- To see the content of a Wikipedia page run:
node retrieve-page.njs zh 山
(set the language and page name accordingly).
- To retrieve a list of categories and relevant pages run:
node retrieve-cats.njs en 0.cats
where en is the language and 0.cats is a file with the initial list of pages and categories. This produces a new file 1.cats (or a higher number) with the pages and categories retrieved from the original list. All files are tab-delimited and can be opened in a spreadsheet.
- The output file can be edited to remove irrelevant entries, which can either be commented out with the # symbol or placed on the ignore list by entering a dash (-) in the first column.
- The list of ignored entries will be added to the end of the output file.
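The editing conventions above (tab-delimited columns, # comments, a dash in the first column for ignored entries) can be illustrated with a minimal sketch. This is not the parser used by retrieve-cats.njs: the function name parseCats is hypothetical, the column layout of the sample rows is made up for illustration, and the sketch assumes that # appears at the start of a commented line.

```javascript
// Sketch: parse the text of a tab-delimited .cats file.
// Assumptions (not confirmed by the project's scripts): "#" at the start of a
// line comments it out, and "-" alone in the first column marks an ignored entry.
function parseCats(text) {
  const active = [];
  const ignored = [];
  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines and commented-out entries.
    if (line.trim() === '' || line.startsWith('#')) continue;
    const cols = line.split('\t');
    if (cols[0].trim() === '-') {
      // Ignored entry: keep the remaining columns for the ignore list.
      ignored.push(cols.slice(1));
    } else {
      active.push(cols);
    }
  }
  return { active, ignored };
}

// Hypothetical sample content; the real column layout may differ.
const sample = [
  'Category:Mountains\t0',
  '# Category:Hills\t1',
  '-\tCategory:Fictional mountains\t1',
].join('\n');

const parsed = parseCats(sample);
console.log('active:', parsed.active);
console.log('ignored:', parsed.ignored);
```

Keeping ignored rows in a separate array mirrors the description above, where ignored entries are collected and written at the end of the output file.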
- To get edit stats run:
node retrieve-edit-stats.njs en 0.cats
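This README does not say which statistics retrieve-edit-stats.njs computes, so the following is only a sketch of what an edit summary could look like. It assumes revision records with user and timestamp fields, as the MediaWiki API returns when revisions are requested with rvprop=user|timestamp. The function name summarizeRevisions and the sample data are hypothetical.

```javascript
// Hypothetical helper: summarize the revision records of one page.
// Each record is assumed to look like { user: 'Name', timestamp: 'ISO 8601 string' }.
function summarizeRevisions(revisions) {
  const editsPerUser = new Map();
  let first = null;
  let last = null;
  for (const rev of revisions) {
    // Count edits per user so distinct editors can be reported.
    editsPerUser.set(rev.user, (editsPerUser.get(rev.user) || 0) + 1);
    const t = Date.parse(rev.timestamp);
    if (first === null || t < first) first = t;
    if (last === null || t > last) last = t;
  }
  return {
    edits: revisions.length,
    editors: editsPerUser.size,
    firstEdit: first === null ? null : new Date(first).toISOString(),
    lastEdit: last === null ? null : new Date(last).toISOString(),
  };
}

// Made-up sample data for illustration only.
const stats = summarizeRevisions([
  { user: 'Alice', timestamp: '2013-01-05T10:00:00Z' },
  { user: 'Bob', timestamp: '2013-02-01T08:30:00Z' },
  { user: 'Alice', timestamp: '2013-03-12T17:45:00Z' },
]);
console.log(stats);
```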
- nodemw docs are here: https://github.com/macbre/nodemw
- Wikipedia categories:
- the category lines at the end of a page are parsed out
- general information: http://en.wikipedia.org/wiki/Help:Category
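To show what parsing out category lines can involve, here is a minimal sketch that collects category names from page wikitext. It is not the project's own code: the function name extractCategories is hypothetical, and the pattern assumes the English Category: namespace prefix (other language editions use localized prefixes).

```javascript
// Sketch: collect category names from [[Category:Name]] and
// [[Category:Name|sort key]] links in wikitext.
// Assumption: English namespace prefix; links written as [[:Category:Name]]
// are ordinary links, not category assignments, and are deliberately skipped.
function extractCategories(wikitext) {
  const pattern = /\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]/gi;
  const names = [];
  for (const match of wikitext.matchAll(pattern)) {
    names.push(match[1]);
  }
  return names;
}

// Made-up sample page for illustration only.
const page = [
  "A '''mountain''' is a large landform. See also [[:Category:Hills]].",
  '',
  '[[Category:Mountains]]',
  '[[Category:Landforms|Mountain]]',
].join('\n');

const categories = extractCategories(page);
console.log(categories);
```

A regular expression is enough for a sketch like this, but it will also match category links inside comments or nowiki sections, which a real wikitext parser would exclude.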
Created with Nodeclipse.
Nodeclipse is a free open-source project that grows with your contributions.