Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: replace sort order #1500

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

JasperHG90
Copy link

This PR adds functionality to replace a table's sort order. Closes #1245

Some basic tests are implemented but need to be expanded.

Currently, a new sort order ID is assigned. This adds a sort order to the list of available sort orders rather than replacing it.

@Fokko do you mind taking a look at this and telling me what you think? Thanks in advance!

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
Generally lgtm, added a few nit comments

@@ -404,6 +405,20 @@ def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive
name_mapping=self.table_metadata.name_mapping(),
)

def replace_sort_order(self, case_sensitive: bool = True) -> ReplaceSortOrder:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wdyt about update_sort_order instead?

from the spec,

A data or delete file is associated with a sort order by the sort order's id within [a manifest](https://iceberg.apache.org/spec/#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.

All sort orders are kept in the manifest in the sort-orders field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated class & method names

SortField(source_id=1, transform=IdentityTransform(), direction=SortDirection.ASC, null_order=NullOrder.NULLS_FIRST),
SortField(source_id=2, transform=IdentityTransform(), direction=SortDirection.DESC, null_order=NullOrder.NULLS_LAST),
order_id=1,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for modifying sort order of a table with existing sort order?

Copy link
Author

@JasperHG90 JasperHG90 Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With modifying do you mean updating it so that there's a new sort order or do you mean to modify an existing one without creating a new sort order. I've added a test doing the former.

from pyiceberg.table import Transaction


class SortOrderBuilder:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need the builder here? Or just follow UpdateSchema's pattern

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it but personally I like it because it's easier to read. I'll leave it up to you to decide if you want this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this.

@JasperHG90
Copy link
Author

Thanks for the comments 🙌 @kevinjqliu . Will look at your comments this weekend.

@JasperHG90 JasperHG90 requested a review from kevinjqliu January 24, 2025 10:08
@JasperHG90 JasperHG90 changed the title [WIP] Feat: replace sort order Feat: replace sort order Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feat] Support update table's sort order
2 participants