Document using statistics in the Faker connector #24874

Open: wants to merge 1 commit into base: master

nineinchnick (Member)

Description

Follow-up to #24585. I ran the Vale linter, and it flagged uses of "will", so I also fixed those in the config and schema/table/column property descriptions.

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@nineinchnick nineinchnick requested a review from mosabua February 1, 2025 09:19
@cla-bot cla-bot bot added the cla-signed label Feb 1, 2025
@github-actions github-actions bot added docs faker Faker connector labels Feb 1, 2025
docs/src/main/sphinx/connector/faker.md (resolved)
docs/src/main/sphinx/connector/faker.md (outdated, resolved)

Faker can also automatically set the `default_limit` table property,
and the `min`, `max`, and `null_probability` column properties, based on statistics
collected by scanning existing data, as in the following example:

Member

Well, we should at least link to the docs about statistics:

https://trino.io/docs/current/sql/analyze.html

Also, let me send a PR to update the related docs.

Member Author

That page focuses on statistics tracked by Trino and isn't really relevant here, because we use statistics collected on the fly while scanning the data. The Faker connector requests particular statistics from the engine, so this doesn't depend on which statistics other connectors provide.

```

Instead of using range or other predicates, tables can be sampled;
see {ref}`TABLESAMPLE<tablesample>`.

Member

Whoa, I didn't even know about that.

Change the link to Markdown syntax.

Member Author

Yup, it's not very well known. Using it to get accurate stats without scanning all the data is a good example; maybe we can add it somewhere:

`show stats for (select * from tpch.sf1.orders tablesample system(1));`

I actually noticed that this produces invalid stats in tpch: they're scaled down, since they're computed without scanning or generating any data.

docs/src/main/sphinx/connector/faker.md (outdated, resolved)
docs/src/main/sphinx/connector/faker.md (outdated, resolved)