Machine learning for streaming performance traces

This folder provides a simple prototype that demonstrates online machine learning for streaming data: it trains on and predicts performance bottleneck events based on feature-engineered performance traces. For this purpose, we use scikit-multiflow and implement an Adaptive Random Forest classifier that is updated for each event in the data stream.
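The core of such an online learner is a prequential ("test-then-train") loop: for each incoming event, the model first predicts and is then updated with the true label. Below is a minimal sketch of that loop. It uses scikit-learn's incremental SGDClassifier as a stand-in (scikit-multiflow's AdaptiveRandomForestClassifier exposes the same partial_fit interface); the synthetic stream and all feature values are illustrative, not the project's real traces:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical feature-engineered trace events: (features, bottleneck label).
rng = np.random.default_rng(42)
stream = [(rng.normal(loc=float(y), size=4), int(y))
          for y in rng.integers(0, 2, 200)]

clf = SGDClassifier(random_state=42)
classes = np.array([0, 1])  # incremental learners need all labels up front

correct, seen = 0, 0
for features, label in stream:
    x = features.reshape(1, -1)
    if seen > 0:                      # test-then-train: predict first ...
        correct += int(clf.predict(x)[0] == label)
    clf.partial_fit(x, [label], classes=classes)  # ... then update online
    seen += 1

print(f"prequential accuracy: {correct / (seen - 1):.2f}")
```

Swapping in scikit-multiflow's AdaptiveRandomForestClassifier keeps the loop unchanged, since it also implements partial_fit and predict.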

The incoming data stream is serialized in Apache Avro, with schemas registered at a Schema Registry. As Kafka Streams is not available in Python, we use Faust to consume hopping windows of those performance traces. Data is deserialized using the Dataclasses Avro Schema Generator and the Python Schema Registry Client.
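Hopping windows are overlapping fixed-size windows that advance by a step smaller than their size, so each event falls into size/step windows. A small, library-free sketch of that assignment logic (the window parameters are hypothetical, not taken from this project's settings):

```python
def hopping_windows(timestamp, size, step):
    """Return the [start, end) ranges of all hopping windows containing timestamp."""
    # The latest window containing the event starts at the largest
    # multiple of `step` that is <= timestamp.
    start = (int(timestamp) // step) * step
    windows = []
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= step
    return sorted(windows)

# An event at t=12 with 10 s windows hopping every 5 s falls into two windows:
print(hopping_windows(12, size=10, step=5))  # [(5, 15), (10, 20)]
```

Faust maintains these overlapping windows for you via windowed tables; the function above only illustrates which windows an event belongs to.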

Run machine learning

The easiest way to run the project is with Python 3 or Docker Compose.

To generate test data, have a look at the integration test.

Run using Docker Compose

Please make sure to have the following technologies installed:

  • Docker and Docker Compose (>= 1.28 to support profiles)

To build our app, run:

docker-compose build

To start all default containers (ZooKeeper, Kafka, the Schema Registry, and this project), run the following command:

docker-compose --profile ml --profile infrastructure up

To stop and remove everything, we recommend using the following command to prevent future errors with Apache Kafka:

docker-compose rm -sfv

Run using Python

Please make sure to have the following technologies installed:

  • Python 3 (with pip)

Start dependent instances of ZooKeeper, Kafka, and the Schema Registry, either natively or using Docker Compose:

docker-compose --profile infrastructure up

Install dependencies:

pip3 install -e .

Export settings:

export SIMPLE_SETTINGS=settings
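simple-settings loads the module named in SIMPLE_SETTINGS (here, a settings.py on the Python path). A hypothetical example of what such a module might contain for this setup; all names and values below are illustrative placeholders, not the project's actual configuration:

```python
# settings.py -- loaded via `export SIMPLE_SETTINGS=settings`.
# All values are illustrative placeholders.
KAFKA_BOOTSTRAP_SERVER = "kafka://localhost:9092"
SCHEMA_REGISTRY_URL = "http://localhost:8081"
INPUT_TOPIC = "performance-traces"   # hypothetical topic name
WINDOW_SIZE_SECONDS = 10             # hopping window size
WINDOW_STEP_SECONDS = 5              # hopping window advance
```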

Run using Faust:

faust -A pg_streaming_machine_learning.app worker -l INFO