[docs] Add documentation for file-based metastore #24511

Merged

Conversation

steveburnett
Contributor

@steveburnett steveburnett commented Feb 6, 2025

Description

Add instructions for configuring Presto to use a file-based Hive metastore (HMS) to installation/deployment.rst.

Motivation and Context

@ethanyzhang suggested this addition to the Presto documentation during an internal discussion in which @nmahadevuni contributed the configuration. I discussed with @tdcmeehan where this information would fit best in the Presto documentation.

Impact

Documentation. Readers wanting to try out Presto quickly can bypass the need for the steps in Configure Hive MetaStore.

Test Plan

Local doc build. Screenshot with existing text above and below included for context.
[Image: Screenshot 2025-02-06 at 4 41 07 PM]

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add documentation for file-based Hive metastore to :doc:`/connector/file-based-metastore`.

@steveburnett steveburnett requested review from elharo and a team as code owners February 6, 2025 21:52
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 6, 2025
@prestodb-ci prestodb-ci requested review from a team, psnv03 and NivinCS and removed request for a team February 6, 2025 21:52
@steveburnett steveburnett self-assigned this Feb 6, 2025
@github-actions github-actions bot added the docs label Feb 6, 2025
@majetideepak
Collaborator

@steveburnett I have an issue here with some more details
#19112
We need to list the restrictions as well. For example, the file-based metastore does not support partitioning. @nmahadevuni should confirm.

@imjalpreet
Member

file-based meta store does not support partitioning

@majetideepak I haven't used the file-based Hive metastore much myself, but did you encounter any issues when trying to create partitioned tables?

Ideally, it should be possible and I can see that partition-specific metadata calls are implemented even for the FileHiveMetastore.

https://github.com/prestodb/presto/blob/master/presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/file/FileHiveMetastore.java

@ethanyzhang
Contributor

@imjalpreet should this be in the hive.properties or config.properties?

@imjalpreet
Member

should this be in the hive.properties or config.properties?

@ethanyzhang These should be in the catalog properties file. I also added a review comment above.
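
For illustration, a minimal sketch of such a catalog properties file (for example, etc/catalog/hive.properties) might look like the following. The connector name and the directory path are assumptions chosen for this example, not values taken from this PR; adjust them for your deployment.

```properties
# Hypothetical catalog file: etc/catalog/hive.properties
# Connector name is an assumption; use the connector you are configuring.
connector.name=hive-hadoop2

# Use the file-based metastore instead of a Thrift or Glue metastore.
hive.metastore=file

# Placeholder path: directory where the metastore files are written.
hive.metastore.catalog.dir=file:///tmp/presto-file-metastore
```

These settings go in the per-catalog file rather than in config.properties because they configure one connector instance, not the Presto server as a whole.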

@steveburnett
Contributor Author

steveburnett commented Feb 6, 2025

@steveburnett I have an issue here with some more details #19112 We need to list the restrictions as well. file-based metastore does not support partitioning for example. @nmahadevuni should confirm

Thanks for the additional information @majetideepak!

With the new information in #19112, I am going to move this topic from where I initially put it in this PR as a small topic in Deploying Presto.

I thought about moving it into the Hive Connector doc, but because it is relevant to "Hive and Lakehouse Connectors (Iceberg, Delta, and Hudi)", I think I will move it to its own page in /installation and include Deepak's instructions on how to use it, which will be a big help to readers.

Member

@imjalpreet imjalpreet left a comment

Would it make more sense to include this in the connectors section?

Right now, metastore properties are spread across different parts of the Hive Connector documentation, mainly in Hive Configuration Properties, Metastore Configuration Properties, and Glue Configuration Properties.

Perhaps we could extract the metastore-related documentation into a separate subsection, similar to Hive Security. This would make sense since metastore properties are relevant not just to Hive, but also to other connectors like Iceberg, Delta, and Hudi.

@majetideepak
Collaborator

Ideally, it should be possible and I can see that partition-specific metadata calls are implemented even for the FileHiveMetastore.

@imjalpreet I remember seeing some issue with partitions and the file metastore. It could be due to my setup. I think it's good to test this once before documenting.

@hantangwangd
Member

I use a file-based HMS for my development environment, and I just confirmed again that it supports Hive partitioned tables. Or maybe you encountered some specific problems when using partitioned tables with a file-based HMS, @majetideepak?

@majetideepak
Collaborator

@hantangwangd If you use it, then it's good. My issue was likely related to my setup.

@nmahadevuni
Member

@majetideepak I have also used partitioned tables recently with a file-based HMS. I didn't have any issues.

@steveburnett steveburnett force-pushed the steveburnett-file-based-hms branch from 03c832f to 35be659 Compare February 7, 2025 16:52
@steveburnett steveburnett changed the title [docs] Add file-based metastore config to deployment.rst [docs] Add documentation for file-based metastore Feb 7, 2025
@steveburnett
Contributor Author

Hi everyone, thanks for your feedback! I took @imjalpreet's suggestion and moved this to a separate page in /connector. I also corrected the directory path that @nmahadevuni noted, added text describing the supported connectors, changed the instructions to refer to the connector properties file, and added usage examples from @majetideepak's #19112.

I welcome everyone to review again and comment with new corrections and additions.

@majetideepak
Collaborator

@steveburnett, @hantangwangd, @nmahadevuni
The file-based metastore approach is also useful for reading existing files.
I tested all the content in #19112.
It does not cover partitioning. We could cover partitioning in a separate PR.

@nmahadevuni
Member

Since we mention that one can provide .prestoSchema and .prestoPermissions files for existing data, is there a documented format or set of steps for creating them? @majetideepak

@steveburnett steveburnett force-pushed the steveburnett-file-based-hms branch from 35be659 to a0a7ea5 Compare February 7, 2025 18:13
@steveburnett
Contributor Author

Hi everyone! I revised based on the feedback from @nmahadevuni, @majetideepak, and @hantangwangd. PTAL when you can.

Specific open questions I haven't addressed in this update:

  • what I should say or remove from the current draft about partitioning. Should I add a sentence in the Overview "Partitioning with file-based metastores is not supported."?

  • what I should add about "provide .prestoSchema and .prestoPermissions files", or remove that line entirely?

@majetideepak
Collaborator

majetideepak commented Feb 7, 2025

@steveburnett Let's remove Reading Existing Data Files with a File-based Metastore from this PR since both the open questions are related to that.
I believe @imjalpreet mentioned that partitioning works when you create the table with a file-based metastore.

@steveburnett steveburnett force-pushed the steveburnett-file-based-hms branch from a0a7ea5 to a915d6f Compare February 7, 2025 21:38
@steveburnett
Contributor Author

@steveburnett Let's remove Reading Existing Data Files with a File-based Metastore from this PR since both the open questions are related to that. I believe @imjalpreet mentioned that partitioning works when you create the table with a file-based metastore.

Done and done, thanks! PTAL.

NivinCS
NivinCS previously approved these changes Feb 8, 2025
hantangwangd
hantangwangd previously approved these changes Feb 8, 2025
Member

@hantangwangd hantangwangd left a comment

Thanks for the work @steveburnett.

@nmahadevuni
Member

Thank you @steveburnett.

@steveburnett steveburnett dismissed stale reviews from hantangwangd and NivinCS via 1624b09 February 10, 2025 15:05
@steveburnett steveburnett force-pushed the steveburnett-file-based-hms branch from a915d6f to 1624b09 Compare February 10, 2025 15:05
Member

@imjalpreet imjalpreet left a comment

@steveburnett thanks for the PR, LGTM!

@ethanyzhang
Contributor

@steveburnett is this ready to go? We can ping @yingsu00 to merge

@steveburnett
Contributor Author

@steveburnett is this ready to go? We can ping @yingsu00 to merge

@ethanyzhang yes, this is ready to merge.

@steveburnett steveburnett merged commit 18cef11 into prestodb:master Feb 12, 2025
55 checks passed
@steveburnett steveburnett deleted the steveburnett-file-based-hms branch February 13, 2025 14:42
Labels
docs from:IBM PR from IBM
Projects
Status: Done

9 participants