Apache Kafka

  • Stream processing for unbounded datasets.
  • Similar to PubSub
  • Kafka Connect
    • A tool for scalably and reliably streaming data between Apache Kafka and other systems.
    • There are connectors to PubSub/Dataflow/BigQuery

Compared to PubSub

  • Can provide exactly-once delivery with the Spark direct connector, in addition to at-least-once.
    • Only at-least-once with PubSub
  • Guaranteed ordering within a partition.
    • No ordering guaranteed with PubSub
  • No max persistence period
    • 7 days or until acknowledged by all subscribers for PubSub
  • Partitioning under user control
    • Partitioning control abstracted away with PubSub
  • Cluster Mirroring for disaster recovery
    • Automated disaster recovery for PubSub
  • 1 MB max size for data blobs by default (configurable)
    • 10 MB max size for PubSub
  • Can increase the number of partitions after setup (existing data is not repartitioned)
    • Not under user control with PubSub
  • Pseudo push model supported using Spark.
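
The partitioning points above (ordering within a partition, user-controlled partitioning, and no repartitioning of existing data) can be illustrated with a small in-memory sketch. This is a toy model, not Kafka client code: `Topic`, `partition_for`, and `add_partitions` are hypothetical names, and CRC32 stands in for the murmur2 hash that the Java client's default partitioner applies to keyed records.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """Toy in-memory topic: one append-only list of records per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def send(self, key: str, value: str) -> int:
        # Records with the same key always land in the same partition,
        # so their relative order is preserved there.
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p

    def add_partitions(self, new_total: int) -> None:
        # The partition count can only grow; records already written
        # stay where they are (no repartitioning of existing data).
        if new_total <= self.num_partitions:
            raise ValueError("partition count can only be increased")
        self.num_partitions = new_total


topic = Topic(num_partitions=3)
for i in range(5):
    # Interleave two keys to show that ordering holds per key/partition.
    topic.send("user-a", f"a{i}")
    topic.send("user-b", f"b{i}")

p_a = partition_for("user-a", 3)
values_a = [v for k, v in topic.partitions[p_a] if k == "user-a"]
print(values_a)  # ['a0', 'a1', 'a2', 'a3', 'a4']
```

After `topic.add_partitions(4)`, new records for a key may hash to a different partition than its older records, which is why ordering per key is only dependable while the partition count stays fixed.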
