This repository has been archived by the owner on Jul 25, 2020. It is now read-only.

Indexes delimited text files (csv,tsv) with Lucene

grahamcrowell/DelimitedLucene

Lucene Indexer for Delimited Files

(Lucene 7.0.1 docs)

  • index delimited files without predefining schema
    • schema is inferred by splitting header
  • full text search across multiple folders, files, and columns
  • each column in source data is a Field in Lucene
  • each line in source data is a Document in Lucene
  • each header in source data is an Index
  • each tenant's Lucene index is persisted in a separate FSDirectory
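
The header-splitting idea above can be sketched in plain Java. This is only an illustration, not the repository's code: the class `HeaderSchema` and its methods are hypothetical names, and the naive split assumes no quoted or escaped delimiters inside values. In an actual Lucene indexer, each entry of the returned map would presumably become a Field on the Document for that line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: infers column names from a header line, then maps data lines onto them. */
final class HeaderSchema {
    private final List<String> columns;
    private final String delimiterRegex;

    HeaderSchema(String headerLine, char delimiter) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        this.delimiterRegex = Pattern.quote(String.valueOf(delimiter));
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        this.columns = List.of(headerLine.split(delimiterRegex, -1));
    }

    List<String> columns() {
        return columns;
    }

    /** Returns column name -> value for one data line; missing trailing values become "". */
    Map<String, String> toFields(String dataLine) {
        String[] values = dataLine.split(delimiterRegex, -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            fields.put(columns.get(i), i < values.length ? values[i] : "");
        }
        return fields;
    }
}
```

For example, `new HeaderSchema("id,name,city", ',').toFields("7,alice,")` maps `id` to `7`, `name` to `alice`, and `city` to an empty string.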

High level logic diagram

[Figure: high-level logic diagram]

Search results include

  • name of matched file
  • name (date stamp) of parent folder
  • matched column name
  • matched line number
  • names and values of other columns on matched line
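
As a rough sketch of the result shape listed above, the following plain-Java example carries those five pieces of information in one value type. The names `SearchHit` and `NaiveSearch` are hypothetical, and the linear scan is only a stand-in for a real Lucene query. It also assumes the header occupies line 1, so the first data row is reported as line 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical value type mirroring the result fields listed above. */
final class SearchHit {
    final String fileName;                  // name of matched file
    final String folderName;                // name (date stamp) of parent folder
    final String matchedColumn;             // matched column name
    final int lineNumber;                   // matched line number in the source file
    final Map<String, String> otherColumns; // remaining columns on the matched line

    SearchHit(String fileName, String folderName, String matchedColumn,
              int lineNumber, Map<String, String> otherColumns) {
        this.fileName = fileName;
        this.folderName = folderName;
        this.matchedColumn = matchedColumn;
        this.lineNumber = lineNumber;
        this.otherColumns = otherColumns;
    }
}

final class NaiveSearch {
    /** Case-insensitive substring scan over rows already split into column -> value. */
    static List<SearchHit> find(String term, String folderName, String fileName,
                                List<Map<String, String>> rows) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<SearchHit> hits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Map<String, String> row = rows.get(i);
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (cell.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                    // Copy the row and drop the matched column to get the "other columns".
                    Map<String, String> others = new LinkedHashMap<>(row);
                    others.remove(cell.getKey());
                    // Assumption: header is line 1, so row index 0 is line 2.
                    hits.add(new SearchHit(fileName, folderName, cell.getKey(), i + 2, others));
                }
            }
        }
        return hits;
    }
}
```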

Overview of Per Tenant Indexing

Data is physically organized (i.e. under /esldata/) in the following hierarchy:

  • Tenant has 1 or more:
    • Dated folder has 1 or more:
      • Delimited data file has 1:
        • Header, delimiter
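
A minimal sketch of how a file path could be broken down along this hierarchy follows. It assumes a layout of `<root>/<tenant>/<dated folder>/<file>`, that the delimiter can be guessed from the file extension (tab for `.tsv`, comma otherwise), and that each tenant's index lives in a subdirectory named after the tenant. None of these details are confirmed by the README, and `TenantLayout` is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical breakdown of a data file path into tenant, dated folder, and file name. */
final class TenantLayout {
    final String tenant;
    final String datedFolder;
    final String fileName;

    private TenantLayout(String tenant, String datedFolder, String fileName) {
        this.tenant = tenant;
        this.datedFolder = datedFolder;
        this.fileName = fileName;
    }

    /** Assumes the file sits exactly three levels below root: tenant/datedFolder/file. */
    static TenantLayout parse(Path root, Path file) {
        Path relative = root.relativize(file);
        if (relative.getNameCount() != 3) {
            throw new IllegalArgumentException("expected <tenant>/<dated folder>/<file>: " + relative);
        }
        return new TenantLayout(
                relative.getName(0).toString(),
                relative.getName(1).toString(),
                relative.getName(2).toString());
    }

    /** Assumption: the delimiter is inferred from the extension (tab for .tsv, comma otherwise). */
    char delimiter() {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".tsv") ? '\t' : ',';
    }

    /** Assumption: one index directory per tenant, named after the tenant. */
    Path indexDirectory(Path indexRoot) {
        return indexRoot.resolve(tenant);
    }
}
```

The path returned by `indexDirectory` is the kind of value that could be passed to Lucene's `FSDirectory.open(Path)` to keep each tenant's index in its own directory.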
