Document using statistics in the Faker connector #24874
1a0f029 to f9ba5d6
Faker can also automatically set the `default_limit` table property,
and the `min`, `max`, `null_probability` column properties, based on statistics
collected by scanning existing data, like in the following example:
Well, we should at least link to the docs about statistics:
https://trino.io/docs/current/sql/analyze.html
Also, let me send a PR to update the related docs.
That page focuses on statistics tracked by Trino, and it's not really relevant in this context, because here we use statistics collected on the fly, when scanning the data. It's the Faker connector that requests particular statistics from the engine, and this doesn't depend on what statistics are provided by other connectors.
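For illustration, the on-the-fly collection described here could look like the sketch below. It assumes a Faker catalog named `generator` and an existing table `production.public.customer` with a `created_at` column; all of these names are hypothetical. The statistics come from the rows returned by the query itself, not from a prior `ANALYZE`.

```sql
-- Hypothetical names: `generator` is assumed to be a Faker catalog,
-- `production.public.customer` an existing table in another catalog.
-- Creating the table from a query lets the connector derive
-- default_limit, min, max, and null_probability from the scanned rows.
CREATE TABLE generator.default.customer AS
SELECT *
FROM production.public.customer
WHERE created_at > CURRENT_DATE - INTERVAL '1' YEAR;

-- Inspect the table and column properties that were set.
SHOW CREATE TABLE generator.default.customer;
```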
``` | ||
Instead of using range, or other predicates, tables can be sampled,
see {ref}`TABLESAMPLE<tablesample>`.
Whoa, I didn't even know about that.
Change the link to Markdown syntax.
Yup, it's not very well known. Using it to get accurate stats without scanning the whole data set is a good example; maybe we can add it somewhere:
`show stats for (select * from tpch.sf1.orders tablesample system(1));`
I actually noticed this produces invalid stats in tpch: they're scaled down, since they're computed without scanning or generating any data.
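Applied to the Faker connector, the sampling approach from the documented text might look like the following sketch. The catalog, schema, and table names are hypothetical, and the 1 percent sample rate is only an example value.

```sql
-- Hypothetical names: `generator` is assumed to be a Faker catalog,
-- `production.public.orders` an existing table in another catalog.
-- BERNOULLI keeps each row with the given probability (in percent);
-- SYSTEM samples at a coarser, connector-dependent granularity.
CREATE TABLE generator.default.orders AS
SELECT *
FROM production.public.orders TABLESAMPLE BERNOULLI (1);
```

With `BERNOULLI` the source table is still read in full and rows are filtered afterwards, so `SYSTEM` is the variant to try when the goal is to avoid scanning all of the data.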
f9ba5d6 to 9c9a921
Description
Follow up to #24585. I ran the Vale linter and it spotted usages of `will`, so I also fixed that in the config and schema/table/column property descriptions.
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: