[feature] UpdateSchema.add_column supports both parent and child in the same transaction #1493
Comments
Example test
|
I'm happy to pick this up if it's not assigned yet. |
sure @jiakai-li assigned to you! let me know if you have any questions |
Hey @kevinjqliu, after some investigation, I realized this feature might require a bigger change than I originally expected (which could be a good thing, since we can refactor the code a little bit as well). Specifically, I think the logic in ... In the meantime, I would also like to note that the code below achieves the same purpose (and I feel it's more natural as well): ...
table = catalog.create_table("default.test", schema)
with table.update_schema() as update:
    update.add_column("parent", StructType(
        NestedField(-1, "child", StructType(), required=False)
    ))

Please let me know if you have any thoughts, thanks! |
Thanks! That's a good workaround. I think a more generic use case is to be able to modify pending updates.
From an API perspective, I think this is a good use case to support. WDYT? |
Sure @kevinjqliu, I think this is a very interesting API to support. I checked the Java side as well to get some idea of how they tackle the issue. It seems it's not supported there either (based on what I see here). Do you think we need to raise an issue there as well?
Could I also get a bit more clarification on what you mean by "modify pending updates" in this example? Thanks! |
Interesting that it's not available in the Java implementation.
We can. I guess it hasn't been an issue over there. Most likely people are using the workaround you described above.
yes, so
iceberg-python/pyiceberg/table/update/schema.py Lines 223 to 228 in 50c33aa
There's some logic to look up the parent, but it's based on the current schema. This ignores the columns already added to ... So the next ... |
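To make the lookup problem concrete, here is a minimal toy sketch in plain Python. It is not pyiceberg's implementation: `ToySchemaUpdate`, its `lookup_pending` flag, the dotted-path dictionary, and the error messages are all hypothetical and exist only for illustration. The sketch assumes the updater keeps pending additions separate from the committed schema and, by default, resolves a parent only against the committed schema.

```python
class ToySchemaUpdate:
    """Toy stand-in for a schema-update builder (hypothetical, not pyiceberg's UpdateSchema)."""

    def __init__(self, current, lookup_pending=False):
        # Committed schema: dotted column path -> type name, e.g. {"id": "long"}.
        self.current = dict(current)
        # Columns added in this update but not yet committed.
        self.pending = {}
        # When True, parent lookup also consults columns added earlier in this update.
        self.lookup_pending = lookup_pending

    def _find_type(self, name):
        # Resolve a column against the committed schema first.
        if name in self.current:
            return self.current[name]
        # Optionally fall back to columns that are still pending.
        if self.lookup_pending:
            return self.pending.get(name)
        return None

    def add_column(self, path, field_type):
        parent_parts = path[:-1]
        if parent_parts:
            parent = ".".join(parent_parts)
            parent_type = self._find_type(parent)
            if parent_type is None:
                raise ValueError(f"Cannot find parent struct: {parent}")
            if parent_type != "struct":
                raise ValueError(f"Cannot add column to non-struct type: {parent}")
        self.pending[".".join(path)] = field_type
        return self

    def commit(self):
        # Merge pending additions into the committed schema.
        return {**self.current, **self.pending}


# Lookup against the committed schema only: the child cannot see its pending parent.
failure = None
current_only = ToySchemaUpdate({"id": "long"})
current_only.add_column(("parent",), "struct")
try:
    current_only.add_column(("parent", "child"), "string")
except ValueError as exc:
    failure = str(exc)

# Lookup that also consults pending additions: parent and child go in together.
pending_aware = ToySchemaUpdate({"id": "long"}, lookup_pending=True)
pending_aware.add_column(("parent",), "struct").add_column(("parent", "child"), "string")
result = pending_aware.commit()
```

In this toy model the only difference between the two runs is whether the parent lookup falls back to the pending additions, which is roughly the behavior change being discussed here. A real fix would also have to deal with field ID assignment and nested type construction, which the sketch leaves out.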
Thank you @kevinjqliu , I think I have a better understanding now :-) I believe I have some idea, and will push an update once it gets closer. I'm currently out for a trip so it will take some more time to work on it, but I think I'm getting there. Thanks again! |
Apache Iceberg version
None
Please describe the bug 🐞
Currently we cannot add a parent field together with its nested child field in the same transaction.
For example,
We should update the API docs as well
To reproduce:
Error:
Willingness to contribute