Iceberg table .crc files are not deleted when using Hadoop file system driver #24522

Open
ademille opened this issue Dec 18, 2024 · 0 comments
Labels
iceberg Iceberg connector

ademille commented Dec 18, 2024

When an Iceberg table is dropped, the metadata and data files associated with that table are deleted. However, the .crc files created by the Hadoop file system driver are left in place. The .crc files are also left behind whenever an Iceberg data file is deleted (for example, through a merge or a partition delete). This may not be noticeable on an HDFS filesystem, but it is obvious when the Hadoop driver is used to access local files.

The problem is that when a temporary table is repeatedly created for importing data and then dropped, the directory quickly fills up with .crc files.

Steps to reproduce (Using Trino 465)

  1. Create an Iceberg catalog and enable the Hadoop file system:
     fs.hadoop.enabled=true
  2. Create a table, insert a row, and drop the table.
  3. Look at the directory contents and notice that the .crc files are still there, but the original files are gone.
trino:sa> CREATE TABLE iceberg.sa.test (id BIGINT) WITH (format = 'ORC', format_version = 2, location = '/tmp/trino/test');
CREATE TABLE
trino:sa> insert into iceberg.sa.test values (100);
INSERT: 1 row

Query 20241212_211046_04630_ffbh4, FINISHED, 1 node
Splits: 130 total, 130 done (100.00%)
0.21 [0 rows, 0B] [0 rows/s, 0B/s]

trino:sa> drop table iceberg.sa.test;
DROP TABLE
[root@DSDDF8 trino]# find /tmp/trino/test/
/tmp/trino/test/
/tmp/trino/test/metadata
/tmp/trino/test/metadata/.20241212_204959_02122_ffbh4-ffb96f8f-b160-4e43-ba83-9125f8b3c0be.stats.crc
/tmp/trino/test/metadata/.snap-4082420308949166332-1-6d590caa-05d5-4580-aca6-c0b9f6484512.avro.crc
/tmp/trino/test/metadata/.6d590caa-05d5-4580-aca6-c0b9f6484512-m0.avro.crc
/tmp/trino/test/metadata/.00001-6eb9896d-e69f-49e3-aae9-b4bba84853fd.metadata.json.crc
/tmp/trino/test/metadata/.snap-3493362677341884085-1-59873be5-9f63-4028-ad1d-4531e9605ec4.avro.crc
/tmp/trino/test/metadata/.snap-4850668042992078922-1-8d0989d4-c49e-4f34-8ee6-13358de87d47.avro.crc
/tmp/trino/test/metadata/.00000-cccfb8c5-d323-4502-848d-6600f37a17c2.metadata.json.crc
/tmp/trino/test/metadata/.20241212_211046_04630_ffbh4-8298051c-b4ba-49ad-a4e2-fd188f481237.stats.crc
/tmp/trino/test/metadata/.3a0e7301-11f8-4548-a870-5d1a95e0a65f-m0.avro.crc
/tmp/trino/test/metadata/.snap-805553046563604304-1-3a0e7301-11f8-4548-a870-5d1a95e0a65f.avro.crc
/tmp/trino/test/metadata/.00001-68dab464-3500-4426-b226-0d165134e10e.metadata.json.crc
/tmp/trino/test/metadata/.00000-e7cc15c2-ffd4-4be4-bef2-7248f0693442.metadata.json.crc
/tmp/trino/test/metadata/.snap-7636965954837989284-1-4d46c19d-65d5-4608-be8a-baf554e3232f.avro.crc
/tmp/trino/test/data
/tmp/trino/test/data/.20241212_211046_04630_ffbh4-8c075593-72c0-43d8-97e7-187e963e87c2.orc.crc
/tmp/trino/test/data/.20241212_204959_02122_ffbh4-39dd734a-96fe-4d9b-8d80-d6bca57c678d.orc.crc
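
Until this is fixed, a script along these lines could be used to find the leftover files. This is only a sketch, not part of Trino: it assumes the naming visible in the listing above, where a file NAME gets a sidecar called .NAME.crc in the same directory, and the function name `find_orphaned_crc_files` is made up for illustration. It only reports the files; whether to delete them is left to the operator.

```python
from pathlib import Path


def find_orphaned_crc_files(root):
    """Return .crc sidecar files under root whose companion file is gone.

    Assumption: a file named NAME has its sidecar at .NAME.crc in the
    same directory, as seen in the directory listing above.
    """
    orphans = []
    for crc in Path(root).rglob(".*.crc"):
        if not crc.is_file():
            continue
        # Drop the leading "." and the trailing ".crc" to get the companion name.
        companion_name = crc.name[1:-len(".crc")]
        if not companion_name:
            continue  # Not a sidecar in the assumed naming scheme.
        if not crc.with_name(companion_name).exists():
            orphans.append(crc)
    return sorted(orphans)


if __name__ == "__main__":
    # Table location taken from the reproduction above; adjust as needed.
    for path in find_orphaned_crc_files("/tmp/trino/test"):
        print(path)
```

Run against the table location from the reproduction, this would be expected to list every .crc file shown above, since none of the original files remain.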
@ebyhr ebyhr added the iceberg Iceberg connector label Dec 18, 2024