add stats error faq #2372

Merged
merged 3 commits into from
Dec 10, 2024
29 changes: 27 additions & 2 deletions packages/dolt/content/other/faq.md
@@ -87,8 +87,6 @@ us know with an issue](https://github.com/dolthub/dolt/issues) or in
[our Discord](https://discord.gg/s8uVgc3) and we'll [fix it in 24 hours](https://www.dolthub.com/blog/2024-05-15-24-hour-bug-fixes/).
Our goal is to be a 100% drop-in replacement for MySQL.

## Why is my Dolt database so big on disk?

Dolt generates a lot of garbage during some writes, especially during initial import. It's not
unusual to get a local storage size of 20x the actual data size after an import. Running `dolt gc`
will remove the garbage and reclaim local storage. See the [docs on `dolt
@@ -113,3 +111,30 @@ personally identifiable information is collected. You can disable this behavior
```bash
dolt config --global --add metrics.disabled true
```

## How to silence `stats failure` logs?

Statistics warnings can appear occasionally after upgrading between Dolt
versions and do not affect the correctness of database operations.
If the error is reproducible, please [report the
issue](https://github.com/dolthub/dolt/issues/new).

To silence the warnings, remove the statistics cache from the filesystem
with `dolt_stats_purge()`:

```sql
call dolt_stats_purge();
```

Version incompatibilities should not hinder the purge command, but if
manual intervention is needed, a specific database's
stats cache can be removed from the filesystem directly:

```bash
rm -rf .dolt/stats
```

Statistics can be recollected at any time to improve join and indexing
execution performance. See
[the stats docs](../reference/sql/sql-support/miscellaneous#stats-controller-functions)
for more details.
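
As a rough sketch of the purge-then-recollect flow described above, the following session clears the cache, rebuilds statistics for one table, and checks the result. The table name `my_table` is a placeholder for illustration only; `ANALYZE TABLE` and `dolt_stats_status()` are described in the stats docs linked above.

```sql
-- Remove the old statistics cache for the current database.
call dolt_stats_purge();

-- Recollect statistics for a table (my_table is a hypothetical name).
analyze table my_table;

-- Check the latest statistics update for the current database.
call dolt_stats_status();
```

Recollection can take a while on large tables, so it may be worth running it outside of peak load.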
@@ -72,7 +72,7 @@ Additional notes:

Dolt currently supports table statistics for index and join costing.

Statistics are collected by running `ANALYZE TABLE <table, ...>`.
Statistics are auto-collected by default for servers, but can be manually collected by running `ANALYZE TABLE <table, ...>`.

Here is an example of how to initialize and observe statistics:

@@ -163,6 +163,14 @@ Dolt exposes a set of helper functions for managing statistics collection and us

- `dolt_stats_status()`: Returns the latest update to statistics for the current database.

- `dolt_stats_prune()`: Garbage collects the statistics cache storage, retaining only
the most recent statistic updates.

- `dolt_stats_purge()`: Deletes the old statistics cache from the
filesystem. This can be used to silence warnings from
backwards-incompatible upgrades. Statistics will need
to be recollected, which can be time-consuming.

### Performance

Lowering check intervals and update thresholds increases the refresh read and write load. Refreshing statistics uses shortcuts to avoid reading from disk when possible, but in most cases at least needs to read the target fanout level of the tree from disk to compare previous and current chunk sets. Exceeding the refresh threshold reads all data from disk associated with the new chunk ranges, which will be the most expensive impact of auto-refresh. Dolt uses ordinal offsets to avoid reading unnecessary data, but the tree growing or shrinking by a level forces a full tablescan.
@@ -1624,6 +1624,19 @@ Returns the latest update to statistics for the current database.
Deletes the stats ref on disk and wipes the database stats held in memory for the current database.
Stops update thread if active.

## `dolt_stats_prune()`

Garbage collect the statistics cache storage, retaining only
the most recent statistic updates. Background threads need to be
restarted after this operation.

## `dolt_stats_purge()`

Delete the old statistics cache from the
filesystem. This can be used to silence warnings from
backwards-incompatible upgrades. Statistics will need
to be recollected, which can be time-consuming.

# Access Control
Dolt stored procedures are access controlled using the GRANT permissions system. MySQL database permissions trickle down to tables and procedures: someone who has Execute permission on a database also has Execute permission on all procedures related to that database. Dolt deviates moderately from this behavior for sensitive operations. See [Administrative Procedures](#administrative-procedures) below.
