
A simple command-line tool for searching text files using tf-idf indexing and cosine similarity.


aman-nidhi/IR-Text-Document-Retrieval


## A Simple Text File Retrieval System

Documents and queries are represented as vectors. Retrieved text files are ranked by the cosine similarity between each document vector and the query vector. The vector for a document is an array of the tf-idf scores of the terms present in that document.
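The ranking idea described above can be sketched in a few lines of Python. This is only an illustration, not the code in `createIndex.py` or `queryDoc.py`: the function names (`tf_idf_vectors`, `cosine`, `rank`), the log-scaled term frequency, and the `log(N / df)` idf formula are assumptions, since the README does not say which tf-idf variant the project uses.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs maps a file name to its list of tokens (hypothetical input shape).

    Returns (vectors, idf), where each vector is a sparse dict term -> weight.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed idf variant: log(N / df). A term in every document gets weight 0.
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        # Assumed tf variant: 1 + log(raw count).
        vectors[name] = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Return (file name, score) pairs sorted by descending cosine similarity."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the corpus are ignored.
    q_counts = Counter(t for t in query_tokens if t in idf)
    query_vec = {t: (1 + math.log(c)) * idf[t] for t, c in q_counts.items()}
    scores = {name: cosine(query_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a.txt": ["cat", "sat", "mat"],
    "b.txt": ["dog", "sat", "log"],
    "c.txt": ["cat", "cat", "dog"],
}
ranked = rank(["cat"], docs)
```

Here `c.txt` ranks first because `cat` makes up most of its vector, while `b.txt` scores zero because it does not contain the query term.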

First run the create index program:

    python createIndex.py

Then run the query index program:

    python queryDoc.py pq 

To run the query program, specify the type of query:

- pq: phrase query
- ftq: free text query
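The README does not define the two query types further. Under the usual information-retrieval reading, a free text query matches any document containing at least one query term, while a phrase query matches only documents where the terms occur consecutively and in order. The sketch below illustrates that distinction with a positional index; `build_positional_index`, `free_text_query`, and `phrase_query` are hypothetical names, and the actual behaviour of `queryDoc.py` may differ.

```python
def build_positional_index(docs):
    """Map each term to {file name: [positions]} (assumed index layout)."""
    index = {}
    for name, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index.setdefault(term, {}).setdefault(name, []).append(pos)
    return index


def free_text_query(index, terms):
    """Documents containing at least one of the query terms."""
    result = set()
    for term in terms:
        result.update(index.get(term, {}))
    return result


def phrase_query(index, terms):
    """Documents containing the terms consecutively, in the given order."""
    if not terms or any(term not in index for term in terms):
        return set()
    # Only documents that contain every term can contain the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    matches = set()
    for doc in candidates:
        for start in index[terms[0]][doc]:
            # The i-th term must appear exactly i positions after the first.
            if all(start + i in index[t][doc] for i, t in enumerate(terms)):
                matches.add(doc)
                break
    return matches


docs = {
    "a.txt": "new york city".split(),
    "b.txt": "york is new".split(),
}
index = build_positional_index(docs)
phrase_hits = phrase_query(index, ["new", "york"])
free_hits = free_text_query(index, ["new", "york"])
```

Both files contain the words `new` and `york`, so the free text query returns both, but only `a.txt` contains them as an adjacent phrase.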

- english_stopwords.txt: the stopwords file
- Index_db.json: the inverted index of the corpus; stores each term and its posting list
- index_score_db.json: the tf-idf database for each word
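To make the role of the stopwords file and the inverted index concrete, here is a minimal sketch of building a term-to-posting-list mapping and serializing it as JSON. The JSON layout shown (term mapped to a list of file names) is an assumption for illustration; the actual schema of `Index_db.json` is not documented here, and `build_inverted_index` is a hypothetical helper.

```python
import json


def build_inverted_index(docs, stopwords):
    """Map each non-stopword term to the list of files that contain it."""
    index = {}
    for name, text in docs.items():
        for term in text.lower().split():
            if term in stopwords:
                continue  # stopwords are not indexed
            postings = index.setdefault(term, [])
            if name not in postings:
                postings.append(name)
    return index


# Toy corpus and stopword set standing in for the real files.
docs = {"a.txt": "The cat sat", "b.txt": "the dog sat"}
index = build_inverted_index(docs, stopwords={"the"})

# The serialized string is what would be written to a JSON index file.
serialized = json.dumps(index, sort_keys=True)
```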

Index Creation

Index Read and Query
