Convert Hugo versioned docs to mkdocs format (#9591)
bitsondatadev authored Feb 1, 2024
1 parent 6bbf70a commit ed28898
Showing 45 changed files with 432 additions and 638 deletions.
22 changes: 7 additions & 15 deletions docs/java-api.md → docs/docs/api.md
@@ -1,13 +1,5 @@
---
title: "Java API"
url: api
aliases:
- "java/api"
menu:
main:
parent: "API"
identifier: java_api
weight: 200
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
@@ -36,11 +28,11 @@ Table metadata and operations are accessed through the `Table` interface. This i

### Table metadata

The [`Table` interface](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:
The [`Table` interface](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:

* `schema` returns the current table [schema](../schemas)
* `schema` returns the current table [schema](schemas.md)
* `spec` returns the current table partition spec
* `properties` returns a map of key-value [properties](../configuration)
* `properties` returns a map of key-value [properties](configuration.md)
* `currentSnapshot` returns the current table snapshot
* `snapshots` returns all valid snapshots for the table
* `snapshot(id)` returns a specific snapshot by ID
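
As an illustration, a minimal sketch of reading this metadata through the Java API could look like the following; the catalog type, warehouse path, and table name are assumptions for the example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

// Assumed setup: a HadoopCatalog over an example warehouse path.
HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "hdfs://nn:8020/warehouse/path");
Table table = catalog.loadTable(TableIdentifier.of("logging", "logs"));

table.schema();           // current table schema
table.spec();             // current partition spec
table.properties();       // map of key-value properties
table.currentSnapshot();  // current snapshot (null for an empty table)
for (Snapshot snapshot : table.snapshots()) {
  System.out.println(snapshot.snapshotId() + " committed at " + snapshot.timestampMillis());
}
```
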
@@ -108,7 +100,7 @@ where `Record` is Iceberg record for iceberg-data module `org.apache.iceberg.dat

### Update operations

`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.

For example, updating the table schema is done by calling `updateSchema`, adding updates to the builder, and finally calling `commit` to commit the pending changes to the table:
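
A hedged sketch of such an update (the column name and rename are illustrative, and `table` is a previously loaded `Table`):

```java
import org.apache.iceberg.types.Types;

// Adds an optional column and renames an existing one; commit() applies both atomically.
table.updateSchema()
    .addColumn("count", Types.LongType.get())
    .renameColumn("data", "payload")
    .commit();
```
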

@@ -150,7 +142,7 @@ t.commitTransaction();

## Types

Iceberg data types are located in the [`org.apache.iceberg.types` package](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/types/package-summary.html).
Iceberg data types are located in the [`org.apache.iceberg.types` package](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/types/package-summary.html).

### Primitives

@@ -166,7 +158,7 @@ Types.DecimalType.of(9, 2) // decimal(9, 2)

Structs, maps, and lists are created using factory methods in type classes.

Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](../evolution#correctness) and nullability.
Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](evolution.md#correctness) and nullability.

Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.
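
To make the factory methods concrete, here is a small sketch (the field IDs and names are arbitrary):

```java
import org.apache.iceberg.types.Types;
import org.apache.iceberg.types.Types.ListType;
import org.apache.iceberg.types.Types.MapType;
import org.apache.iceberg.types.Types.NestedField;
import org.apache.iceberg.types.Types.StructType;

// A struct with a required id field and an optional name field.
StructType struct = StructType.of(
    NestedField.required(1, "id", Types.LongType.get()),
    NestedField.optional(2, "name", Types.StringType.get()));

// Map keys are always required; value nullability is set by the factory method.
MapType map = MapType.ofOptional(3, 4, Types.StringType.get(), Types.DoubleType.get());

// List element nullability is likewise set by the factory method.
ListType list = ListType.ofRequired(5, Types.IntegerType.get());
```
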

@@ -193,7 +185,7 @@ ListType list = ListType.ofRequired(1, IntegerType.get());

## Expressions

Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/expressions/Expressions.html).
Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/expressions/Expressions.html).
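
As a brief sketch of how these factory methods compose (the column names are illustrative, and `table` is a previously loaded `Table`):

```java
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

// Rows where level = 'error' and event_time >= the given epoch-millis timestamp.
Expression filter = Expressions.and(
    Expressions.equal("level", "error"),
    Expressions.greaterThanOrEqual("event_time", 1700000000000L));

// Expressions are typically passed to a table scan.
TableScan scan = table.newScan().filter(filter);
```
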

Supported predicate expressions are:

Binary file added docs/docs/assets/images/audit-branch.png
(Previews of the added binary image files are not available.)
39 changes: 18 additions & 21 deletions docs/aws.md → docs/docs/aws.md
@@ -1,11 +1,5 @@
---
title: "AWS"
url: aws
menu:
main:
parent: Integrations
identifier: aws_integration
weight: 0
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
@@ -53,7 +47,7 @@ For example, to use AWS features with Spark 3.4 (with scala 2.12) and AWS client

```sh
# start Spark SQL client shell
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:{{ icebergVersion }} \
--conf spark.sql.defaultCatalog=my_catalog \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
@@ -69,10 +63,12 @@ To use AWS module with Flink, you can download the necessary dependencies and sp

```sh
# download Iceberg dependency
ICEBERG_VERSION={{% icebergVersion %}}
ICEBERG_VERSION={{ icebergVersion }}
MAVEN_URL=https://repo1.maven.org/maven2
ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg

wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar

wget $ICEBERG_MAVEN_URL/iceberg-aws-bundle/$ICEBERG_VERSION/iceberg-aws-bundle-$ICEBERG_VERSION.jar

# start Flink SQL client shell
@@ -142,7 +138,7 @@ an Iceberg table is stored as a [Glue Table](https://docs.aws.amazon.com/glue/la
and every Iceberg table version is stored as a [Glue TableVersion](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion).
You can start using the Glue catalog by specifying the `catalog-impl` as `org.apache.iceberg.aws.glue.GlueCatalog`,
as shown in the [enabling AWS integration](#enabling-aws-integration) section above.
More details about loading the catalog can be found in individual engine pages, such as [Spark](../spark-configuration/#loading-a-custom-catalog) and [Flink](../flink/#creating-catalogs-and-using-catalogs).
More details about loading the catalog can be found in individual engine pages, such as [Spark](spark-configuration.md#loading-a-custom-catalog) and [Flink](flink.md#creating-catalogs-and-using-catalogs).
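
Outside of an engine, the same catalog can be constructed directly from Java. The sketch below is illustrative only; the catalog name, bucket, and key prefix are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.Table;
import org.apache.iceberg.aws.glue.GlueCatalog;
import org.apache.iceberg.catalog.TableIdentifier;

// Placeholder properties; credentials come from the default AWS credential chain.
Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3://my-bucket/my/key/prefix");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");

GlueCatalog catalog = new GlueCatalog();
catalog.initialize("my_catalog", properties);

Table table = catalog.loadTable(TableIdentifier.of("my_db", "my_table"));
```
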

#### Glue Catalog ID

@@ -181,17 +177,17 @@ If there is no commit conflict, the operation will be retried.
Optimistic locking guarantees atomic transactions of Iceberg tables in Glue.
It also prevents others from accidentally overwriting your changes.

{{< hint info >}}
Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).
{{< /hint >}}
!!! info
Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).


#### Warehouse Location

Similar to all other catalog implementations, `warehouse` is a required catalog property to determine the root path of the data warehouse in storage.
By default, Glue only allows a warehouse location in S3 because of the use of `S3FileIO`.
To store data in a different local or cloud store, Glue catalog can switch to use `HadoopFileIO` or any custom FileIO by setting the `io-impl` catalog property.
Details about this feature can be found in the [custom FileIO](../custom-catalog/#custom-file-io-implementation) section.
Details about this feature can be found in the [custom FileIO](custom-catalog.md#custom-file-io-implementation) section.

#### Table Location

@@ -267,7 +263,7 @@ This design has the following benefits:

Iceberg also supports the JDBC catalog which uses a table in a relational database to manage Iceberg tables.
You can configure the JDBC catalog to use a relational database service such as [AWS RDS](https://aws.amazon.com/rds).
Read [the JDBC integration page](../jdbc/#jdbc-catalog) for guides and examples about using the JDBC catalog.
Read [the JDBC integration page](jdbc.md#jdbc-catalog) for guides and examples about using the JDBC catalog.
Read [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.Java.html) for more details about configuring the JDBC catalog with IAM authentication.
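
As a non-authoritative sketch, configuring the JDBC catalog against an RDS instance from Java could look like this; the endpoint, credentials, and warehouse path are placeholders, and the matching JDBC driver must be on the classpath:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.jdbc.JdbcCatalog;

// Placeholder connection details for an RDS PostgreSQL instance.
Map<String, String> properties = new HashMap<>();
properties.put("uri", "jdbc:postgresql://my-rds-endpoint:5432/iceberg_catalog");
properties.put("jdbc.user", "iceberg");
properties.put("jdbc.password", "password");
properties.put("warehouse", "s3://my-bucket/my/key/prefix");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");

JdbcCatalog catalog = new JdbcCatalog();
catalog.initialize("my_jdbc_catalog", properties);
```
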

### Which catalog to choose?
@@ -293,7 +289,7 @@ This feature requires the following lock related catalog properties:
2. Set `lock.table` as the DynamoDB table name you would like to use. If the lock table with the given name does not exist in DynamoDB, a new table is created with billing mode set as [pay-per-request](https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing).

Other lock related catalog properties can also be used to adjust locking behaviors such as heartbeat interval.
For more details, please refer to [Lock catalog properties](../configuration/#lock-catalog-properties).
For more details, please refer to [Lock catalog properties](configuration.md#lock-catalog-properties).
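
A hedged sketch of wiring these lock properties into a catalog from Java; the lock table name and timeout are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.aws.glue.GlueCatalog;

// Placeholder values; tune the intervals and timeouts for your workload.
Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3://my-bucket/my/key/prefix");
properties.put("lock-impl", "org.apache.iceberg.aws.dynamodb.DynamoDbLockManager");
properties.put("lock.table", "myGlueLockTable");      // created on demand with pay-per-request billing
properties.put("lock.acquire-timeout-ms", "180000");  // stop trying to acquire a lock after 3 minutes

GlueCatalog catalog = new GlueCatalog();
catalog.initialize("my_catalog", properties);
```
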


## S3 FileIO
@@ -347,7 +343,7 @@ Iceberg by default uses the Hive storage layout but can be switched to use the `
With `ObjectStoreLocationProvider`, a deterministic hash is generated for each stored file, with the hash appended
directly after the `write.data.path`. This ensures files written to S3 are equally distributed across multiple [prefixes](https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern/) in the S3 bucket, resulting in minimized throttling and maximized throughput for S3-related IO operations. When using `ObjectStoreLocationProvider`, having a shared and short `write.data.path` across your Iceberg tables will improve performance.

For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier]( https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.
For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier](https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.

To use the `ObjectStoreLocationProvider`, add `'write.object-storage.enabled'=true` to the table's properties.
Below is an example Spark SQL command to create a table using the `ObjectStoreLocationProvider`:
@@ -378,7 +374,7 @@ However, for the older versions up to 0.12.0, the logic is as follows:
- before 0.12.0, `write.object-storage.path` must be set.
- at 0.12.0, `write.object-storage.path` then `write.folder-storage.path` then `<tableLocation>/data`.

For more details, please refer to the [LocationProvider Configuration](../custom-catalog/#custom-location-provider-implementation) section.
For more details, please refer to the [LocationProvider Configuration](custom-catalog.md#custom-location-provider-implementation) section.

### S3 Strong Consistency

@@ -539,7 +535,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
Here is an example to start Spark shell with this client factory:

```shell
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:{{ icebergVersion }} \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
--conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
@@ -618,13 +614,14 @@ For versions before 6.5.0, you can use a [bootstrap action](https://docs.aws.ama
```sh
#!/bin/bash

ICEBERG_VERSION={{% icebergVersion %}}
ICEBERG_VERSION={{ icebergVersion }}
MAVEN_URL=https://repo1.maven.org/maven2
ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
# NOTE: this is just an example shared class path between Spark and Flink,
# please choose a proper class path for production.
LIB_PATH=/usr/share/aws/aws-java-sdk/


ICEBERG_PACKAGES=(
"iceberg-spark-runtime-3.3_2.12"
"iceberg-flink-runtime"
@@ -655,7 +652,7 @@ More details could be found [here](https://docs.aws.amazon.com/glue/latest/dg/aw
### AWS EKS

[AWS Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) can be used to start any Spark, Flink, Hive, Presto or Trino clusters to work with Iceberg.
Search the [Iceberg blogs](../../../blogs) page for tutorials around running Iceberg with Docker and Kubernetes.
Search the [Iceberg blogs](../../blogs.md) page for tutorials around running Iceberg with Docker and Kubernetes.

### Amazon Kinesis

28 changes: 10 additions & 18 deletions docs/branching-and-tagging.md → docs/docs/branching.md
@@ -1,13 +1,5 @@
---
title: "Branching and Tagging"
url: branching
aliases:
- "tables/branching"
menu:
main:
parent: Tables
identifier: tables_branching
weight: 0
---

<!--
@@ -33,14 +25,14 @@ menu:

Iceberg table metadata maintains a snapshot log, which represents the changes applied to a table.
Snapshots are fundamental in Iceberg as they are the basis for reader isolation and time travel queries.
For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](../spark-procedures/#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.
For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](spark-procedures.md#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.

**For more sophisticated snapshot lifecycle management, Iceberg supports branches and tags which are named references to snapshots with their own independent lifecycles. This lifecycle is controlled by branch and tag level retention policies.**
Branches are independent lineages of snapshots and point to the head of the lineage.
Branches and tags have a maximum reference age property which controls when the reference to the snapshot itself should be expired.
Branches have retention properties which define the minimum number of snapshots to retain on a branch as well as the maximum age of individual snapshots to retain on the branch.
These properties are used when the expireSnapshots procedure is run.
For details on the algorithm for expireSnapshots, refer to the [spec](../../../spec#snapshot-retention-policy).
For details on the algorithm for expireSnapshots, refer to the [spec](../../spec.md#snapshot-retention-policy).
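
For a concrete, hedged sense of these retention knobs, the Java API exposes them through `ManageSnapshots`; the reference names, snapshot ID, and retention values below are illustrative:

```java
import java.util.concurrent.TimeUnit;

// "table" is a previously loaded org.apache.iceberg.Table.
long snapshotId = table.currentSnapshot().snapshotId();

table.manageSnapshots()
    .createTag("EOY-2023", snapshotId)                        // a named reference to a single snapshot
    .setMaxRefAgeMs("EOY-2023", TimeUnit.DAYS.toMillis(365))  // expire the tag itself after a year
    .createBranch("audit-branch", snapshotId)                 // an independent lineage of snapshots
    .setMinSnapshotsToKeep("audit-branch", 2)                 // branch-level snapshot retention
    .setMaxSnapshotAgeMs("audit-branch", TimeUnit.DAYS.toMillis(7))
    .commit();
```
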

## Use Cases

@@ -52,7 +44,7 @@ See below for some examples of how branching and tagging can facilitate these us

Tags can be used for retaining important historical snapshots for auditing purposes.

![Historical Tags](../img/historical-snapshot-tag.png)
![Historical Tags](assets/images/historical-snapshot-tag.png)

The above diagram demonstrates retaining important historical snapshots with the following retention policy, defined
via Spark SQL.
@@ -84,7 +76,7 @@ ALTER TABLE prod.db.table CREATE BRANCH `test-branch` RETAIN 7 DAYS WITH SNAPSHO

### Audit Branch

![Audit Branch](../img/audit-branch.png)
![Audit Branch](assets/images/audit-branch.png)

The above diagram shows an example of using an audit branch for validating a write workflow.

@@ -115,9 +107,9 @@ CALL catalog_name.system.fast_forward('prod.db.table', 'main', 'audit-branch');

Creating, querying and writing to branches and tags are supported in the Iceberg Java library, and in Spark and Flink engine integrations.

- [Iceberg Java Library](../java-api-quickstart/#branching-and-tagging)
- [Spark DDLs](../spark-ddl/#branching-and-tagging-ddl)
- [Spark Reads](../spark-queries/#time-travel)
- [Spark Branch Writes](../spark-writes/#writing-to-branches)
- [Flink Reads](../flink-queries/#reading-branches-and-tags-with-SQL)
- [Flink Branch Writes](../flink-writes/#branch-writes)
- [Iceberg Java Library](java-api-quickstart.md#branching-and-tagging)
- [Spark DDLs](spark-ddl.md#branching-and-tagging-ddl)
- [Spark Reads](spark-queries.md#time-travel)
- [Spark Branch Writes](spark-writes.md#writing-to-branches)
- [Flink Reads](flink-queries.md#reading-branches-and-tags-with-SQL)
- [Flink Branch Writes](flink-writes.md#branch-writes)
14 changes: 3 additions & 11 deletions docs/configuration.md → docs/docs/configuration.md
@@ -1,13 +1,5 @@
---
title: "Configuration"
url: configuration
aliases:
- "tables/configuration"
menu:
main:
parent: Tables
identifier: tables_configuration
weight: 0
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
@@ -144,8 +136,8 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
`HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
The properties can be manually constructed or passed in from a compute engine like Spark or Flink.
Spark uses its session properties as catalog properties, see more details in the [Spark configuration](../spark-configuration#catalog-configuration) section.
Flink passes in catalog properties through `CREATE CATALOG` statement, see more details in the [Flink](../flink/#creating-catalogs-and-using-catalogs) section.
Spark uses its session properties as catalog properties; see more details in the [Spark configuration](spark-configuration.md#catalog-configuration) section.
Flink passes in catalog properties through the `CREATE CATALOG` statement; see more details in the [Flink](flink.md#adding-catalogs) section.
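
As a non-authoritative sketch, a custom catalog supplied through `catalog-impl` could be loaded like this; the implementation class name here is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.catalog.Catalog;

// "com.example.MyCustomCatalog" is a hypothetical implementation with a no-arg constructor;
// CatalogUtil instantiates it and then calls initialize(catalogName, catalogProperties).
Map<String, String> catalogProperties = new HashMap<>();
catalogProperties.put("warehouse", "s3://my-bucket/warehouse");

Catalog catalog = CatalogUtil.loadCatalog(
    "com.example.MyCustomCatalog",  // catalog-impl
    "my_catalog",                   // catalog name
    catalogProperties,
    null);                          // Hadoop Configuration, if any
```
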

### Lock catalog properties

Expand All @@ -154,7 +146,7 @@ Here are the catalog properties related to locking. They are used by some catalo
| Property | Default | Description |
| --------------------------------- | ------------------ | ------------------------------------------------------ |
| lock-impl | null | a custom implementation of the lock manager, the actual interface depends on the catalog used |
| lock.table | null | an auxiliary table for locking, such as in [AWS DynamoDB lock manager](../aws/#dynamodb-for-commit-locking) |
| lock.table | null | an auxiliary table for locking, such as in [AWS DynamoDB lock manager](aws.md#dynamodb-lock-manager) |
| lock.acquire-interval-ms | 5000 (5 s) | the interval to wait between each attempt to acquire a lock |
| lock.acquire-timeout-ms | 180000 (3 min) | the maximum time to try acquiring a lock |
| lock.heartbeat-interval-ms | 3000 (3 s) | the interval to wait between each heartbeat after acquiring a lock |