Introducing PolarsExcelDataset: High-Performance Multi-Sheet Excel Integration for Kedro #1000
This PR introduces PolarsExcelDataset, a new dataset for Kedro that processes multi-sheet Excel files using Polars. With this addition, data engineers and data scientists can use Polars' performance when working with Excel files in Kedro pipelines.
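To make the multi-sheet idea concrete, here is a minimal standalone sketch of the load/save shape such a dataset could have. It is not the code from this PR: the class name `MultiSheetExcelSketch` and its methods are hypothetical, it deliberately does not subclass Kedro's dataset base class so that it stays self-contained, and it assumes `polars`, an Excel reader engine supported by Polars, and `xlsxwriter` are installed when `load` or `save` is actually called.

```python
from pathlib import Path
from typing import Any, Dict, Optional


class MultiSheetExcelSketch:
    """Hypothetical illustration of a multi-sheet Excel dataset (not the PR's code)."""

    def __init__(self, filepath: str, load_args: Optional[Dict[str, Any]] = None):
        self._filepath = Path(filepath)
        # Extra keyword arguments forwarded to polars.read_excel
        self._load_args = dict(load_args or {})

    def load(self) -> Dict[str, Any]:
        """Return every sheet in the workbook as {sheet_name: polars.DataFrame}."""
        if not self._filepath.exists():
            raise FileNotFoundError(f"No Excel file at {self._filepath}")
        import polars as pl  # imported lazily so the class can be defined without Polars

        # sheet_id=0 asks Polars to load all sheets and return a dict of DataFrames
        return pl.read_excel(self._filepath, sheet_id=0, **self._load_args)

    def save(self, data: Dict[str, Any]) -> None:
        """Write each DataFrame in `data` to its own worksheet."""
        import xlsxwriter  # backend used by polars.DataFrame.write_excel

        with xlsxwriter.Workbook(str(self._filepath)) as workbook:
            for sheet_name, frame in data.items():
                frame.write_excel(workbook=workbook, worksheet=sheet_name)

    def describe(self) -> Dict[str, Any]:
        """Summarise the configuration, similar in spirit to a dataset description."""
        return {"filepath": str(self._filepath), "load_args": self._load_args}
```

In a pipeline, a node receiving the loaded value would get a plain dictionary, so it could pick a sheet with something like `sheets["Sheet1"]` (the sheet name here is only an example).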
Why This Matters
Performance: Polars can process large Excel files faster than pandas, which helps teams working with very large datasets.
Multi-Sheet Support: Parses multiple sheets from a single workbook, making data ingestion more efficient for complex machine learning pipelines.
Modern Data Stack: Expands Kedro's compatibility with Polars, one of the fastest-growing DataFrame libraries.
Future-Ready: Prepares Kedro for next-generation data processing with scalable, high-performance tools.
This contribution helps data teams use Kedro for efficient, scalable data engineering.
Looking forward to feedback and discussions!