- Apache Arrow is a cross-language development platform for in-memory data.
- Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
Arrow itself is not a storage or execution engine. It is designed to serve as a shared foundation for the following types of systems:
- SQL execution engines (e.g., Drill and Impala)
- Data analysis systems (e.g., Pandas and Spark)
- Streaming and queueing systems (e.g., Kafka and Storm)
- Storage systems (e.g., Parquet, Kudu, Cassandra and HBase)