AndreiSasu/logfile-processor

Batch Log Processing Demo application

The general approach:

  1. loop through all files in the folder
  2. parse and load the content of each file in a separate table, using a thread pool
  3. process all loaded events and load them in a second table, using a thread pool
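The three steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the class name `BatchPipelineSketch`, the `processEvent` helper, and the use of in-memory lists in place of the two tables are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchPipelineSketch {

    // Hypothetical processing step: the real event logic is not described in this README.
    static String processEvent(String rawLine) {
        return rawLine.trim();
    }

    static List<String> run(Path folder, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 1: loop through all regular files in the folder.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder)) {
                for (Path p : stream) {
                    if (Files.isRegularFile(p)) {
                        files.add(p);
                    }
                }
            }

            // Step 2: load each file on the pool; a list stands in for the first table.
            List<Callable<List<String>>> loadTasks = new ArrayList<>();
            for (Path file : files) {
                loadTasks.add(() -> Files.readAllLines(file));
            }
            List<String> rawTable = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(loadTasks)) {
                rawTable.addAll(f.get());
            }

            // Step 3: process each loaded event on the pool; a list stands in for the second table.
            List<Callable<String>> processTasks = new ArrayList<>();
            for (String line : rawTable) {
                processTasks.add(() -> processEvent(line));
            }
            List<String> processedTable = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(processTasks)) {
                processedTable.add(f.get());
            }
            return processedTable;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: BatchPipelineSketch <folder>");
            return;
        }
        System.out.println(run(Paths.get(args[0]), 4).size() + " events processed");
    }
}
```

Submitting one task per line keeps the sketch short; a real implementation would more likely batch events per task to reduce scheduling overhead.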

Large file support

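Split a large file into chunks before processing, either by line count (-l) or by size (-b). Note that -b splits at byte boundaries and can cut a log line in two; with GNU split, -C limits the chunk size while keeping lines whole.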

split -l 200 large_file.log

split -b 500MB large_file.log
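If the chunking has to happen inside the JVM rather than with the split command, a line-based equivalent might look like the sketch below. It is an assumption-laden illustration: `LineChunkerSketch`, `splitByLines`, and the `.partN` naming scheme are hypothetical and do not match the output names that split itself produces.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineChunkerSketch {

    // Writes consecutive groups of at most linesPerChunk lines into separate files in outDir.
    static List<Path> splitByLines(Path source, Path outDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source)) {
            BufferedWriter writer = null;
            try {
                String line;
                int linesInChunk = 0;
                while ((line = reader.readLine()) != null) {
                    if (writer == null) {
                        // Start a new chunk file only when there is a line to put in it.
                        Path chunk = outDir.resolve(source.getFileName() + ".part" + chunks.size());
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk);
                    }
                    writer.write(line);
                    writer.newLine();
                    if (++linesInChunk == linesPerChunk) {
                        writer.close();
                        writer = null;
                        linesInChunk = 0;
                    }
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }
}
```

Reading line by line keeps memory use independent of the file size, and every chunk ends on a line boundary, so no log event is cut in half.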

How to run the application:

  1. In the IDE, make sure annotation processing is enabled, because the project uses Lombok. See https://www.jetbrains.com/help/idea/configuring-annotation-processing.html

  2. Run DemoApplication.java, passing as its parameter the path of a folder that contains log files. Ex: /home/andrei/Projects/logfile-processor

Production support

This is only a demo application. For real-world production use, a streaming solution is preferable to a batch processing one, since streaming is better suited to logs.

Possible options:

  1. AWS Kinesis + ElasticSearch:

    Ex: https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/

  2. Stream logs to Kafka and process with Apache Spark

  3. Load logs into a distributed cache (such as Redis) and process them in batches with Hadoop MapReduce

  4. Spring Batch in parallel mode with file chunking.

Todo

The application can be extended to become a lightweight log streaming agent.
