the portable Python dataframe library
Updated May 27, 2024 - Python
The Pokémon TCG Data Pipeline project aims to build a data pipeline solution to collect, transform, and analyze information about Pokémon TCG (Trading Card Game) cards.
Seamlessly switch Pandas DataFrame backend to PyArrow.
Data Engineering Zoomcamp 2024
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
Reading both XLSX and XLSB files, fast and memory-safe, with Python, into PyArrow
Python scripts to process and analyze log files using PySpark.
Python scripts to download, process, and analyze NYC TLC trip data
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.