This repository has been archived by the owner on Aug 27, 2019. It is now read-only.

valpackett/mail2elasticsearch


mail2elasticsearch

A MIME email indexer for ElasticSearch, written in Go.

  • preserves mail structure (nested parts)
  • tells ElasticSearch to index dates correctly
  • deduplicates attachments by storing them in a plain filesystem folder, using a hash of the contents as the filename (this leaves room for indexing attachment contents with other tools)
  • decodes a ton of character sets, with autodetection when needed
  • is observable: uses structured logging, optionally exposes profiling and stats over HTTP
  • is fast: indexes multiple files in parallel and uses ElasticSearch's bulk index endpoint, static JSON encoding, and SIMD-accelerated BLAKE2b hashing
  • is (mostly) robust: tested on a large real-world mail archive, where it did not crash and parsed most mail correctly, though some messages were skipped (unexpected EOFs, quoted-printable errors)
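
The attachment deduplication described in the list above can be sketched in a few lines of Go. This is a minimal illustration, not the tool's actual code: `attachmentName` and `storeAttachment` are hypothetical names, and SHA-256 from the standard library stands in for BLAKE2b so that the example has no external dependencies.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// attachmentName derives a filename from the attachment bytes alone, so
// identical attachments always map to the same name.
// Assumption: SHA-256 stands in here for the BLAKE2b hash the README mentions.
func attachmentName(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// storeAttachment writes content into dir under its content-derived name.
// The boolean result is false when the file already existed (a duplicate).
func storeAttachment(dir string, content []byte) (string, bool, error) {
	path := filepath.Join(dir, attachmentName(content))
	// O_EXCL makes creation fail if the file exists, which detects duplicates.
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if errors.Is(err, os.ErrExist) {
		return path, false, nil
	}
	if err != nil {
		return "", false, err
	}
	if _, err := f.Write(content); err != nil {
		f.Close()
		return "", false, err
	}
	if err := f.Close(); err != nil {
		return "", false, err
	}
	return path, true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "attachments")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	// Storing the same bytes twice writes only one file.
	for i := 0; i < 2; i++ {
		path, created, err := storeAttachment(dir, []byte("same attachment bytes"))
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(filepath.Base(path), "newly written:", created)
	}
}
```

Because the name depends only on the bytes, two messages carrying the same attachment end up pointing at one file on disk.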

Usage

$ mail2elasticsearch -h # check available flags
$ mail2elasticsearch -init # setup the index

$ mail2elasticsearch < /mail/cur/some.letter # stdin
$ mail2elasticsearch /mail/cur/some.letter /mail/cur/other.letter # paths
$ mail2elasticsearch /mail/cur # recursive walk (e.g. initial bulk indexing)
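
A recursive walk combined with parallel processing could look roughly like the following. This is a sketch under assumptions: `collectFiles`, `processAll`, and `errSkipped` are illustrative names, and the `process` callback stands in for the real parse-and-index step, which is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// errSkipped is a hypothetical sentinel for messages that could not be handled.
var errSkipped = errors.New("message skipped")

// collectFiles returns every regular file under root, walking recursively.
func collectFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// processAll runs process on every path using a fixed pool of workers and
// returns the number of calls that returned an error.
func processAll(paths []string, workers int, process func(string) error) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	failed := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := process(p); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return failed
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := collectFiles(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Placeholder work: treat empty files as skipped instead of parsing mail.
	failed := processAll(files, 4, func(p string) error {
		info, err := os.Stat(p)
		if err != nil {
			return err
		}
		if info.Size() == 0 {
			return errSkipped
		}
		return nil
	})
	fmt.Printf("%d files, %d skipped\n", len(files), failed)
}
```

Counting failures instead of aborting mirrors the behaviour described in the feature list, where unparseable messages are skipped and the run continues.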

Development

$ dep ensure
$ mail2elasticsearch -srvaddr 127.0.0.1:42069 -attachdir /tmp/files ~/testmail/cur 2>&1 | humanlog
$ go-torch -u http://127.0.0.1:42069/ -t 120 --binaryname=mail2elasticsearch
$ expvarmon -ports="http://127.0.0.1:42069"
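
For context on what `go-torch` and `expvarmon` talk to: Go programs commonly expose profiling and stats through the standard `net/http/pprof` and `expvar` packages. The sketch below shows one way to wire those up. It is an assumption about the general approach, not the project's actual code; `newDebugMux`, `get`, and the `messages_indexed` counter are hypothetical names.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// indexed is a hypothetical counter published under /debug/vars.
var indexed = expvar.NewInt("messages_indexed")

// newDebugMux returns a mux serving expvar stats and pprof profiles on the
// conventional /debug/ paths.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/debug/vars", expvar.Handler())
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// get issues an in-memory GET request against h and returns status and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	indexed.Add(1)
	mux := newDebugMux()
	// A long-running tool would instead call something like
	// http.ListenAndServe("127.0.0.1:42069", mux).
	code, body := get(mux, "/debug/vars")
	fmt.Println("status:", code)
	fmt.Println("body bytes:", len(body))
}
```

The example queries the handler in memory so that it terminates on its own; binding the same mux to the address passed via `-srvaddr` would make it reachable by the tools above.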


Contributing

Please feel free to submit pull requests!

By participating in this project you agree to follow the Contributor Code of Conduct.

The list of contributors is available on GitHub.

License

This is free and unencumbered software released into the public domain.
For more information, please refer to the UNLICENSE file or unlicense.org.