h-sinha/Wikipedia-Search-Engine

Wikipedia-Search-Engine

A mini Wikipedia search engine. It builds an inverted index from a given Wikipedia dump, runs queries against that index, and returns the top 10 results, ranked by relevance using tf-idf scoring.
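
To illustrate the idea, here is a minimal sketch of an in-memory inverted index with tf-idf ranking. It is not the code in this repository: the function names (build_index, search), the whitespace tokenizer, and the log-scaled tf weighting are assumptions chosen for brevity, and the real indexer works on a Wikipedia dump rather than a small dict.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency}. `docs` is {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, k=10):
    """Return up to k doc ids, ordered by summed tf-idf score (highest first)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            # One common tf-idf variant: log-scaled tf times idf.
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Ties are broken by doc id so the ordering is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


docs = {1: "apple banana apple", 2: "banana cherry", 3: "cherry cherry date"}
index = build_index(docs)
results = search(index, len(docs), "banana cherry")
```

Here document 2 ranks first because it contains both query terms. A term that appears in every document gets an idf of zero under this weighting and so contributes nothing to the score.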

Requirements

  • Python 3

Setting up conda environment

  • Install conda
  • Run conda env create -f environment.yml to create the environment from environment.yml
  • Run conda activate wiki-search

Instructions for running

  • Index creation
bash index.sh <path_to_dump> <index folder>
  • Searching
bash search.sh <path_to_index>

Query Format

  • Normal query - enter the search words
  • Field query - prefix each part with its field name: title:TITLE category:CATEGORY infobox:INFO ref:REFERENCE body:BODY
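
As a rough illustration of the two query formats, the sketch below splits a query string into per-field terms. The helper name parse_query, the "all" key used for normal queries, and the choice to ignore any text before the first field prefix are assumptions made for this example; the parser in this repository may behave differently.

```python
import re

# Field names taken from the query format above.
FIELDS = ("title", "category", "infobox", "ref", "body")
_FIELD_RE = re.compile(r"\b(" + "|".join(FIELDS) + r"):")


def parse_query(query):
    """Return {field: [terms]}; a query without field prefixes maps to "all"."""
    matches = list(_FIELD_RE.finditer(query))
    if not matches:
        return {"all": query.split()}
    parsed = {}
    for i, match in enumerate(matches):
        # A field's text runs until the next field prefix or the end of the query.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(query)
        terms = query[match.end():end].split()
        parsed.setdefault(match.group(1), []).extend(terms)
    return parsed


field_query = parse_query("title:Albert Einstein category:physicists")
normal_query = parse_query("world cup")
```

With this sketch, field_query holds the terms "Albert" and "Einstein" under title and "physicists" under category, while normal_query holds both words under the "all" key.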
