Define Scope for Apache Kafka Support #387

Open
kamir opened this issue Dec 27, 2023 · 3 comments

Comments

@kamir
Contributor

kamir commented Dec 27, 2023

This issue is related to #386.

In this issue, we want to define the scope of Apache Kafka support in Apache Wayang.

Apache Kafka itself does not offer a processing engine. Kafka Streams (KStreams) and ksqlDB were developed for "Kafka-internal stream processing", and ksqlDB is itself built on the Kafka Streams framework. However, the decision has been made to discontinue ksqlDB; instead, Apache Flink has been selected as the new event stream processing system on top of Apache Kafka. Besides Apache Flink, the Kafka Streams library remains available.

Why not use ksqlDB? (from a Google Search result dated 27.07.2023)
"ksqlDB is inefficient with long-running or high-cardinality aggregations. Routing, filtering, and running basic transformations over streaming data are the strengths of ksqlDB, and while it can perform some aggregations, it will suffer under more complex scenarios requiring large amounts of state."

With this in mind, I suggest defining two scopes for Apache Kafka support in Apache Wayang:

Scope 1: Support Apache Kafka source and sink components in Apache Wayang plans.
Scope 2: Support Kafka Streams (KStreams) for "native Kafka stream processing", coordinated by Apache Wayang.

@kamir
Contributor Author

kamir commented Dec 27, 2023

I suggest starting by implementing a Kafka source and a Kafka sink component, so that existing Apache Wayang applications can read input directly from Kafka topics and write results directly to Kafka topics.
This will not yet give us full "Apache Kafka support as a processing platform for Apache Wayang", but it is a starting point for working with the data that lives in Apache Kafka topics.
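A minimal sketch of the Scope 1 data flow, assuming nothing about Wayang's actual operator API: `InMemoryTopic`, `KafkaSourceSinkSketch`, `source`, `sink`, `run`, and `wordCount` are all hypothetical names, and the in-memory topic only stands in for a real Kafka topic so the example runs without a broker. It shows the intended shape: a source reads messages from an input topic, the existing application logic transforms them unchanged, and a sink writes the results to an output topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical in-memory stand-in for a Kafka topic: an append-only log of string messages.
final class InMemoryTopic {
    private final String name;
    private final List<String> log = new ArrayList<>();

    InMemoryTopic(String name) { this.name = name; }

    String name() { return name; }

    void append(String message) { log.add(message); }

    // Returns a copy so callers cannot mutate the log.
    List<String> readAll() { return new ArrayList<>(log); }
}

// Hypothetical sketch of Scope 1: Kafka source -> existing plan logic -> Kafka sink.
final class KafkaSourceSinkSketch {

    // Hypothetical "Kafka source": reads every message currently in the input topic.
    static List<String> source(InMemoryTopic input) {
        return input.readAll();
    }

    // Hypothetical "Kafka sink": appends every result to the output topic.
    static void sink(List<String> results, InMemoryTopic output) {
        results.forEach(output::append);
    }

    // `plan` stands in for the unchanged application logic of an existing Wayang job.
    static void run(InMemoryTopic input,
                    Function<List<String>, List<String>> plan,
                    InMemoryTopic output) {
        sink(plan.apply(source(input)), output);
    }

    // Example plan: case-insensitive word count, emitted as "word=count" in alphabetical order.
    static List<String> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        counts.forEach((word, count) -> out.add(word + "=" + count));
        return out;
    }
}
```

For example, appending "to be or not to be" and "Kafka to Wayang" to an input topic and running the word-count plan leaves `be=2`, `kafka=1`, `not=1`, `or=1`, `to=3`, `wayang=1` in the output topic. In a real implementation, the source and sink would presumably be Wayang operators backed by a Kafka consumer and producer rather than an in-memory list.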

@zkaoudi
Contributor

zkaoudi commented Dec 27, 2023

+1

A Kafka source + sink would be awesome.

@kamir
Contributor Author

kamir commented Feb 5, 2024

I am back on this task, working on a simple KafkaSource component that reads plain-text messages from a Kafka cluster. It is comparable to the JavaFileSource, which can now read files line by line from HTTP URLs.
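The KafkaSource idea above can be sketched as a lazy stream of message values, mirroring how a file source hands out one line at a time. This is a sketch under assumptions: `BatchPoller` and `PlainTextKafkaSourceSketch` are hypothetical names, and stopping at the first empty batch is an assumed policy for a bounded, batch-style read. With the real kafka-clients library, the poller would presumably wrap `KafkaConsumer.poll(Duration)`, which returns a batch of `ConsumerRecords`, and map each record to its `value()`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical abstraction over a Kafka consumer: each call returns the next
// batch of plain-text message values; an empty list means "nothing new".
interface BatchPoller {
    List<String> poll();
}

// Hypothetical sketch of a plain-text KafkaSource that exposes messages one at a
// time, like a file source exposes lines. Assumed policy: stop at the first empty batch.
final class PlainTextKafkaSourceSketch {

    static Stream<String> messages(BatchPoller poller) {
        Iterator<String> it = new Iterator<>() {
            private Iterator<String> current = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Pull batches until one has data or the poller reports nothing new.
                while (!current.hasNext() && !exhausted) {
                    List<String> batch = poller.poll();
                    if (batch.isEmpty()) {
                        exhausted = true;
                    } else {
                        current = batch.iterator();
                    }
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
        // Wrap the iterator as an ordered, sequential, lazily evaluated stream.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }
}
```

Flattening batches lazily means the source itself only holds the current batch, and downstream code sees one message per element, just as it would see one line per element from a file. Whether a real KafkaSource should stop on an empty poll, after a fixed offset range, or never (unbounded streaming) is an open design choice, not something this sketch settles.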
