Apache Iceberg version
1.6.1
Query engine
Spark
Please describe the bug 🐞
We have observed the following situation a few times now when using the lock-free Hive catalog commits introduced in #6570:
We run an ALTER TABLE table SET TBLPROPERTIES ('key' = 'value') or any other operation that results in an Iceberg commit, from Spark or any other engine. For whatever reason, the connection to the Hive metastore breaks and the HMS operation fails on the first attempt:
WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. alter_table_with_environmentContext
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
<...>
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1693)
<...>
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
<...>
at org.apache.iceberg.hive.MetastoreUtil.alterTable(MetastoreUtil.java:78)
at org.apache.iceberg.hive.HiveOperationsBase.lambda$persistTable$0(HiveOperationsBase.java:112)
<...>
at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:239)
at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
<...>
at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:345)
<...>
However, the operation actually succeeds on the metastore side and updates the metadata location, which means that when the RetryingMetaStoreClient resubmits the operation, the retry fails with:
MetaException(message:The table has been modified. The parameter value for key 'metadata_location' is '<new>'. The expected was value was '<previous>')
The Iceberg commit is then considered failed, and the new metadata file is cleaned up in the finally block here before the commit is retried. The problem is that the Hive table already has the new metadata location set, so when Iceberg tries to refresh the table, the refresh fails because the new metadata file no longer exists, leaving the table in a corrupted state.
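To illustrate the race, here is a simplified sketch of the problematic flow; this is not the actual Iceberg code, and writeNewMetadata, alterTableInHms, and deleteFile are hypothetical stand-ins for the real code paths:

```java
// Simplified, hypothetical sketch of the problematic flow described above.
String newMetadataLocation = writeNewMetadata(metadata);

boolean committed = false;
try {
  // The first attempt succeeds server-side, but the response is lost;
  // RetryingMetaStoreClient resubmits and gets the MetaException above.
  alterTableInHms(tableName, newMetadataLocation);
  committed = true;
} finally {
  if (!committed) {
    // Cleanup assumes the commit failed, but HMS already points at
    // newMetadataLocation, so this deletion corrupts the table.
    deleteFile(newMetadataLocation);
  }
}
```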
I suppose a fix could be to check the exception and ignore the failure when the location already set in HMS equals the new metadata location we just tried to commit, but parsing the error message sounds very hacky.
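One way to avoid parsing the message could be to re-read the table from HMS after catching the MetaException and compare its metadata_location against the location we tried to set. The following is only a rough sketch of that idea, not the actual Iceberg code; commitActuallySucceeded is a hypothetical helper:

```java
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.thrift.TException;

// Hypothetical helper, not the actual Iceberg fix: after alter_table fails
// with the "table has been modified" MetaException, re-read the table and
// check whether HMS already holds the location we were trying to commit.
static boolean commitActuallySucceeded(
    IMetaStoreClient client, String db, String table,
    String newMetadataLocation) throws TException {
  Table current = client.getTable(db, table);
  String currentLocation = current.getParameters().get("metadata_location");
  // If the first (lost) attempt already persisted our update, the retry's
  // MetaException is spurious and the new metadata file must not be deleted.
  return newMetadataLocation.equals(currentLocation);
}
```

If the check returned true, the commit could be treated as a success and the metadata cleanup skipped.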
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time