Note: the code was developed with Python 3.6.2.
- In the ./implementation-indexing directory, install the dependencies:
pip install -r requirements.txt
- In the same directory, run the data processing script:
python run-data-process.py
The search string is hardcoded. To change it, comment out the current one and uncomment the one you want to use.
- For BASIC search, run in the ./implementation-indexing directory:
python run-basic-search.py
- For SQLITE search, run in the ./implementation-indexing directory:
python run-sqlite-search.py
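The comment/uncomment convention for the search string might look like the minimal sketch below. The variable name `query`, the example strings, and the `tokenize` helper are hypothetical and are not taken from the search scripts.

```python
# Hypothetical illustration of a hardcoded search string.
# Exactly one assignment is left uncommented; the others stay commented out.

# query = "example query one"
query = "example query two"
# query = "example query three"


def tokenize(text):
    """Lowercase the query and split it on whitespace (assumed preprocessing)."""
    return text.lower().split()


print(tokenize(query))
```

To switch queries under this convention, move the comment marker from one assignment to another and rerun the search script.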
Libraries:
- from bs4 import Comment
- from htmldom import htmldom
- from lingpy import *
- import re
- from pathlib import Path
- import sys
- import json
- from lxml import html
- import regex
Run one of the following commands (from the implementation-extraction directory), depending on the method you want to use:
- Regular expression: python .\run-extraction.py A
- XPath: python .\run-extraction.py B
- RoadRunner: python .\run-extraction.py C
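The letter passed on the command line selects the extraction method. A minimal sketch of such a dispatch, assuming the letter arrives as the first argument, is shown below. The names `METHODS` and `select_method` are hypothetical, and run-extraction.py may handle its argument differently.

```python
# Assumed mapping from the command-line letter to the extraction method,
# following the list above.
METHODS = {
    "A": "regular expression",
    "B": "XPath",
    "C": "RoadRunner",
}


def select_method(argv):
    """Return the method name for the letter in argv[1], or raise ValueError."""
    if len(argv) < 2 or argv[1].upper() not in METHODS:
        raise ValueError("expected one of: " + ", ".join(sorted(METHODS)))
    return METHODS[argv[1].upper()]


# Example with an explicit argument list instead of sys.argv.
print(select_method(["run-extraction.py", "B"]))
```

Rejecting unknown letters up front gives a clear error message instead of a failure later in the run.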
- Add geckodriver.exe to your PATH
- Run PostgreSQL (host="localhost", port="5433", database="postgres", user="postgres", password="password")
- Import the database with the script (https://szitnik.github.io/wier-labs/data/pa1/crawldb.sql)
- In cmd, run the command: python .\ieps.py
- Select how many threads you want to use
- If you have run it before, you can choose not to delete the database and continue from where you left off.
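Before starting the crawler, it can help to check the two prerequisites listed above. The sketch below is an optional check that uses only the standard library. The helper names are hypothetical, and it only tests that something is listening on the port, not that the credentials or the imported schema are correct.

```python
import shutil
import socket

# Connection settings copied from the setup steps above.
DB_HOST = "localhost"
DB_PORT = 5433


def geckodriver_on_path():
    """Return True if a geckodriver executable can be found on PATH."""
    return shutil.which("geckodriver") is not None


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("geckodriver on PATH:", geckodriver_on_path())
print("PostgreSQL port reachable:", port_is_open(DB_HOST, DB_PORT))
```

If either line prints False, revisit the corresponding setup step before running ieps.py.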