Search engine built on a 75 GB Wikipedia dump
This project generates a sorted index for the specified dump, optimized with compression techniques. Given a dump, it creates the inverted index file in the Index/ folder, a tree of indexers in the Split/ folder for the inverted index, and a tree in the Title/ folder for the title-docID mappings file. The inverted index and the title mapping file can both be found in the Index/ folder.
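For reference, here is a minimal sketch of what an inverted index with compressed postings can look like. The project's exact compression scheme is not documented here; delta-encoding the gaps between sorted docIDs is shown as one common technique.

```python
# Minimal sketch of an inverted index with delta-encoded postings.
# The compression actually used by this project is not specified here;
# gap (delta) encoding of sorted docIDs is one common choice.
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict mapping docID (int) -> list of tokens."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            index[term].append(doc_id)  # postings stay sorted by docID
    return index

def delta_encode(postings):
    """Store gaps between consecutive docIDs instead of raw IDs."""
    gaps, prev = [], 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps

if __name__ == "__main__":
    docs = {1: ["wiki", "search"], 3: ["search", "engine"], 7: ["wiki"]}
    index = build_inverted_index(docs)
    print({t: delta_encode(p) for t, p in sorted(index.items())})
    # {'engine': [3], 'search': [1, 2], 'wiki': [1, 6]}
```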
run start.sh to execute the full pipeline, or run the steps individually:
run indexer.py : build the index
time python indexer.py ./wiki-search-small.xml
(pass the XML file that is the dump to index)
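indexer.py's internals are not shown here; as a hedged illustration, a dump this large is typically parsed as a stream, e.g. with a SAX handler over the MediaWiki export tags (page, title, text). The handler below is only a sketch and may not match the real script.

```python
# Hedged sketch of streaming a MediaWiki XML dump with SAX, since a
# 75 GB file cannot be parsed into memory at once. Tag names follow
# the standard MediaWiki export schema; the real indexer.py may differ.
import sys
import xml.sax

class PageHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.tag, self.title, self.text, self.count = "", [], [], 0

    def startElement(self, name, attrs):
        self.tag = name
        if name == "page":
            self.title, self.text = [], []

    def characters(self, content):
        if self.tag == "title":
            self.title.append(content)
        elif self.tag == "text":
            self.text.append(content)

    def endElement(self, name):
        if name == "page":
            self.count += 1
            # Tokenize "".join(self.text) and emit (term, docID) pairs here.
            print(self.count, "".join(self.title))
        self.tag = ""

if __name__ == "__main__":
    xml.sax.parse(sys.argv[1], PageHandler())
```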
run Kwaymerge.py : merge the small intermediate files
time python Kwaymerge.py
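As an illustration of this step, the sketch below streams sorted run files together with heapq.merge. The file pattern and record format (term, tab, postings, one record per line, sorted by term) are assumptions, not necessarily what Kwaymerge.py uses.

```python
# Hedged sketch of a k-way merge over sorted run files. heapq.merge
# streams the inputs, keeping only one line per run in memory at a time.
import glob
import heapq

def k_way_merge(pattern="output_files/run_*.txt", out_path="merged.txt"):
    files = [open(p, encoding="utf-8") for p in sorted(glob.glob(pattern))]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # Lines compare by their term prefix, so output stays sorted.
            # Records for the same term arriving from different runs would
            # still need their postings concatenated; omitted for brevity.
            for line in heapq.merge(*files):
                out.write(line)
    finally:
        for f in files:
            f.close()

if __name__ == "__main__":
    k_way_merge()
```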
copy the merged file from output_files to the index folder and create a split folder inside the index folder.
run create_index.py : create the multilevel index
python create_index.py
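A multilevel index usually means a small sparse top level over the large merged file. The sketch below records the byte offset of every Nth term so a query can seek close to the right place; the paths and the sampling interval are illustrative, not taken from create_index.py.

```python
# Hedged sketch of a two-level sparse index: sample every Nth term of
# the merged index together with its byte offset. A query binary-searches
# this small file, then seeks into the big one.
def create_sparse_index(index_path="index/merged.txt",
                        split_path="index/split/level1.txt", every=1000):
    with open(index_path, "rb") as f, \
         open(split_path, "w", encoding="utf-8") as out:
        n = 0
        while True:
            offset = f.tell()          # byte offset of the next record
            line = f.readline()
            if not line:
                break
            if n % every == 0:
                term = line.split(b"\t", 1)[0].decode("utf-8")
                out.write(f"{term}\t{offset}\n")
            n += 1

if __name__ == "__main__":
    create_sparse_index()
```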
finally, run query.py : answer queries
python query.py
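To show how the pieces fit together, this hedged sketch answers a single-term query against the sparse index from the previous step: binary-search the top level, seek into the merged file, and scan forward. The real query.py likely also handles ranking and multi-term queries, which are omitted here.

```python
# Hedged sketch of single-term lookup over the two-level index above.
import bisect

def load_level1(path="index/split/level1.txt"):
    terms, offsets = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            term, off = line.rstrip("\n").split("\t")
            terms.append(term)
            offsets.append(int(off))
    return terms, offsets

def lookup(query, terms, offsets, index_path="index/merged.txt"):
    # Find the last sampled term <= query, then scan from its offset.
    i = bisect.bisect_right(terms, query) - 1
    if i < 0:
        return None
    with open(index_path, "rb") as f:
        f.seek(offsets[i])
        for raw in f:
            term, _, postings = raw.decode("utf-8").rstrip("\n").partition("\t")
            if term == query:
                return postings
            if term > query:   # passed where the term would be
                return None
    return None

if __name__ == "__main__":
    terms, offsets = load_level1()
    print(lookup("wiki", terms, offsets))
```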