Skip to content

Latest commit

 

History

History
20 lines (11 loc) · 906 Bytes

12-Appendix-Apache-Arrow.md

File metadata and controls

20 lines (11 loc) · 906 Bytes

Apache Arrow

  • Apache Arrow is a cross-language development platform for in-memory data.
  • Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Arrow itself is not a storage or execution engine. It is designed to serve as a shared foundation for the following types of systems:

  • SQL execution engines (e.g., Drill and Impala)
  • Data analysis systems (e.g., Pandas and Spark)
  • Streaming and queueing systems (e.g., Kafka and Storm)
  • Storage systems (e.g., Parquet, Kudu, Cassandra and HBase)

References