pg-streaming-feature-engineering

Feature engineering of performance traces

This folder contains a Gradle multi-project build that performs feature engineering on software and system traces, producing the input for our machine learning on streaming data.

The multi-project build contains the following subprojects, each with its own README:

  • pg-streaming-schema: a Java library that contains our data models using Apache Avro. It is used by all other apps.
  • pg-streaming-feature-extraction: a Java / Kafka Streams app to calculate features in hopping windows from a stream of system traces.
  • pg-streaming-target-extraction: a Java / Kafka Streams app to calculate targets in hopping windows from a stream of software traces (i.e., Zipkin).
  • pg-streaming-labeling: a Java / Kafka Streams app to join the feature and target stream so we have a stream of labeled data in hopping windows.
  • integration-test: a Java project that runs an integration test of all projects above. It automatically builds the Docker images and starts and stops Docker Compose.
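
To illustrate what "hopping windows" means for the feature, target, and labeling apps above, here is a minimal plain-Java sketch of the window semantics. It is not the code of the subprojects, and it deliberately avoids the Kafka Streams API so it runs without dependencies. The class name, the method names, the 10 s window size, the 5 s advance, and the sample values are all hypothetical. It assumes epoch-aligned windows of the form [start, start + size) and an inner join on the window start for the labeling step; the actual apps may use different keys and join semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of any subproject.
class HoppingWindowSketch {

    /** Start times (ms) of all epoch-aligned windows [start, start + sizeMs) containing timestampMs. */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        if (sizeMs <= 0 || advanceMs <= 0 || advanceMs > sizeMs) {
            throw new IllegalArgumentException("require 0 < advanceMs <= sizeMs");
        }
        List<Long> starts = new ArrayList<>();
        // Earliest window start that can still cover the timestamp, aligned to the advance.
        long start = Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    /** Mean of a metric per window, keyed by window start (a stand-in for a real feature). */
    static Map<Long, Double> meanPerWindow(long[] timestampsMs, double[] values,
                                           long sizeMs, long advanceMs) {
        Map<Long, double[]> sumAndCount = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            // A single record contributes to every window that overlaps its timestamp.
            for (long start : windowStartsFor(timestampsMs[i], sizeMs, advanceMs)) {
                double[] acc = sumAndCount.computeIfAbsent(start, k -> new double[2]);
                acc[0] += values[i];
                acc[1] += 1;
            }
        }
        Map<Long, Double> means = new TreeMap<>();
        sumAndCount.forEach((start, acc) -> means.put(start, acc[0] / acc[1]));
        return means;
    }

    /** Inner join of features and targets on window start (assumed labeling semantics). */
    static Map<Long, double[]> joinByWindow(Map<Long, Double> features, Map<Long, Double> targets) {
        Map<Long, double[]> labeled = new TreeMap<>();
        features.forEach((start, feature) -> {
            Double target = targets.get(start);
            if (target != null) {
                labeled.put(start, new double[] {feature, target});
            }
        });
        return labeled;
    }

    public static void main(String[] args) {
        long sizeMs = 10_000L;    // hypothetical window size
        long advanceMs = 5_000L;  // hypothetical hop
        Map<Long, Double> features = meanPerWindow(
                new long[] {3_000L, 7_000L, 12_000L}, new double[] {2.0, 4.0, 6.0}, sizeMs, advanceMs);
        Map<Long, Double> targets = meanPerWindow(
                new long[] {6_000L}, new double[] {100.0}, sizeMs, advanceMs);
        joinByWindow(features, targets).forEach((start, pair) ->
                System.out.println("window " + start + ": feature=" + pair[0] + ", target=" + pair[1]));
    }
}
```

Because the advance is smaller than the window size, consecutive windows overlap and one trace record is counted in several windows. In this sketch, windows without a matching target are dropped by the join.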

The feature engineering procedure is as follows:

[Figure: feature engineering procedure]

Run feature engineering

The easiest way to run the project is to use Gradle and Docker Compose.

Please make sure to have the following technologies installed:

We use a Docker Gradle plugin and a Docker Compose Gradle plugin to start ZooKeeper, Kafka, Schema Registry, and all our Kafka Streams applications. All images, including the images for our applications, are built automatically as part of the integration test.

Build the multi-project and execute the integration test by running:

gradle build

To start all default containers, run the following command:

docker-compose up

Alternatively, use profiles with Compose to start additional containers:

  • docker-compose --profile debug up starts Kafdrop

To stop and remove everything, we recommend using the following command to prevent future errors with Apache Kafka:

docker-compose rm -sfv

To generate test data, have a look at the integration test or at our other project, pg-streaming-data-collection, in the root of this repository.