Storage Usage: Add benchmark results from v5.10.2 #214
Walkthrough
The documentation for storage performance has been updated to reflect improvements in CrateDB version 5.10.2, including updated storage size comparisons, detailed shard and replication calculations, a new section on manual table optimization, expanded tables showing storage effects, and various clarifications, formatting corrections, and structural enhancements.
Actionable comments posted: 5
🧹 Nitpick comments (13)
docs/performance/storage.md (13)
32-34: Rephrase INFO block wording
The INFO block is a bit wordy and contains repetitive phrasing. Consider tightening “In version 5.10 storage usage was improved…” to avoid redundancy.
36-36: Convert bare URL to markdown link
The plain URL violates MD034. Wrap the CrateDB blog link in `[Reducing storage costs in CrateDB v5.10](https://cratedb.com/blog/reducing-storage-costs-in-cratedb-v5.10)` to satisfy the linter and improve readability.
65-67: Merge redundant sentences and remove extra blank line
Lines 65–67 repeat “when a table is created…” and introduce an unnecessary blank line. Consider combining the two sentences and deleting the blank line to improve flow.
70-76: Use concise transition and remove blank line
Replace “On top of that,” with “Additionally,” and remove the blank line at 71 to tighten the replication explanation.
122-126: Replace hard tabs with spaces
Lines 122–126 use hard tabs (MD010). Convert them to spaces for consistent formatting.
142-142: Format SQL command in bullet list
Wrap `OPTIMIZE table` in backticks in the bullet at line 142 for consistency with other code examples.
146-146: Optional: Simplify phrasing
Consider shortening “Disk size improvements can vary depending on the data types, schema and even disk manufacturer.” to something like “Disk size improvements vary by data type, schema, and storage hardware.”
154-154: Optional: Add IF NOT EXISTS
The `CREATE TABLE taxi_noindex` snippet could include `IF NOT EXISTS` to prevent errors if the table is already present.
163-164: Clarify index deletion note
“It can’t be deleted” may confuse readers; recommend rephrasing to “the existing index cannot be removed without recreating the table.”
239-246: Fix version notation and extra period
In the disable-columnar-store section, remove the extra period in “v.5.10” and ensure you’re using “v5.10.2” consistently.
307-308: Unify SQL column aliases
In the “All results” query you alias the total size as `total_size_mib` on line 307 but use `total_size_mb` in other sections. Choose one alias and apply consistently. Also applies to: 316-316
383-383: Remove trailing comma
Line 383 ends with a comma (“...data evolves,”). Change it to a period to properly finish the sentence.
393-393: Convert bare Wikipedia URL to markdown link
The bare link should be formatted as `[Database Normalization](https://en.wikipedia.org/wiki/Database_normalization)` to satisfy MD034 and improve readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/performance/storage.md
(14 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/performance/storage.md
36-36: Bare URL used (MD034, no-bare-urls)
122-122: Hard tabs, column 5 (MD010, no-hard-tabs)
123-123: Hard tabs, column 4 (MD010, no-hard-tabs)
124-124: Hard tabs, column 4 (MD010, no-hard-tabs)
125-125: Hard tabs, column 4 (MD010, no-hard-tabs)
126-126: Hard tabs, column 4 (MD010, no-hard-tabs)
393-393: Bare URL used (MD034, no-bare-urls)
🪛 LanguageTool
docs/performance/storage.md
[style] ~65-~65: ‘taken into account’ might be wordy. Consider a shorter alternative.
Context: ...ed, sharding and replication have to be taken into account. When a table is created with default v...
(EN_WORDINESS_PREMIUM_TAKEN_INTO_ACCOUNT)
[style] ~73-~73: ‘On top of that’ might be wordy. Consider a shorter alternative.
Context: ...odes = 3 max(4, 3 * 2) = 6 shards On top of that, the default replication is the `0-1` r...
(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)
[uncategorized] ~356-~356: The verb “are” doesn’t seem to fit in this context, “is” is probably more formally correct.
Context: ...eplica shards) = 768 MiB > The result are slightly off `768 ~= 775` because in th...
(AI_HYDRA_LEO_CPT_ARE_IS)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build docs
🔇 Additional comments (10)
docs/performance/storage.md (10)
9-9: Dataset updated: link and row count
The introduction correctly links to the January 2024 Yellow taxi trip dataset and updates the row count to 2 964 624.
30-30: Section transition text looks good
The “We will dive deeper to really understand what is going on.” segue is clear and helpful.
58-59: Clarify measurement breakdown intro
This introduction to the shard‐by‐shard breakdown is concise and clear. No change needed.
99-99: Code block closure
The closing backticks properly terminate the SQL snippet; no action required.
191-191: Indexes INFO is concise
The INFO note about indexes not being re-addable after table creation is accurate and clear.
195-202: Disable indexing storage results validated
The storage comparison table for disable-indexing in v5.9.3 vs v5.10.2 is correct and clearly formatted.
248-248: Column store recommendation is fine
The note advising against disabling column storage in v5.10 is clear and appropriate.
261-261: OPTIMIZE command snippet
The example `OPTIMIZE table <table_name> WITH (max_num_segments=1)` is correct and clear.
282-289: Compression results table reviewed
The tables comparing default vs. `best_compression` in v5.9.3 and v5.10.2 are accurate and well formatted.
294-296: Percentages formatting acceptable
The tilde-marked approximate percentages (`~27.86%`, `~1%`) effectively convey that these are estimates.
a609bc4 to dab3146
Actionable comments posted: 7
🧹 Nitpick comments (4)
docs/performance/storage.md (4)
9-9: Prevent unintended italic formatting for numeric separators
The plain text `2_964_624` will trigger markdown italic around `964`. Wrap this number in backticks or format it without underscores to avoid unintended styling.
65-66: Consider tightening phrasing
“have to be taken into account” could be shortened to “consider sharding and replication.”
356-357: Fix grammatical agreement
The verb “are” should be “is” in this context.
- > The result are slightly off `768 ~= 775` because in this example, decimals are being ignored.
+ > The result is slightly off `768 ~= 775` because in this example, decimals are being ignored.
393-393: Convert bare URL to linked text
Avoid raw URLs for readability. Link the text “Database normalization” instead.
- Read more about it at https://en.wikipedia.org/wiki/Database_normalization
+ Read more about it at [Database normalization](https://en.wikipedia.org/wiki/Database_normalization)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/performance/storage.md
(13 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/performance/storage.md
[uncategorized] ~357-~357: The preposition “of” seems more likely in this position.
Context: ...ored. > The goal is to give you an idea on how tweaking some CrateDB aspect can af...
(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build docs
🔇 Additional comments (19)
docs/performance/storage.md (19)
22-28: v5.10.2 storage result added
The new bullet for CrateDB 5.10.2 is correctly formatted and consistent with the approximate storage sizes.
30-30: Heading for deeper dive is clear
The introductory line for the detailed breakdown fits well.
70-72: Shard calculation example is clear
The formula and example values are correct and easy to follow.
95-99: Shard size SQL snippet looks good
The query correctly computes average and total MiB per shard.
142-142: Optimize table bullet is accurate
The note on calling `OPTIMIZE table` for v5.10+ is valid and concise.
146-147: Retention of disk improvement note
The informational callout about variability in disk improvements is appropriate.
154-154: Code example is correct
The `CREATE TABLE` snippet for disabling indexing is well-formatted and accurate.
195-195: Baseline v5.9.3 results unchanged
The CrateDB v5.9.3 reference remains correct.
201-201: v5.10.2 effect on indexing is correct
The CrateDB v5.10.2 storage numbers for disabled indexing are accurate.
233-233: Column store v5.9.3 results are fine
The table header and values for v5.9.3 are accurate.
239-239: Column store v5.10.2 results are fine
The v5.10.2 storage numbers for disabling columnar store look correct.
242-243: Table formatting is correct
The separator and data rows align with markdown best practices.
245-245: Baseline reduction note is accurate
The reduction percentage for v5.9 (no index + no column store) is correct.
307-307: Final SQL alias for total_size_mib
The new alias `total_size_mib` matches the table references.
316-316: Ordering by avg_size_per_shard_in_mib is correct
The `ORDER BY` clause aligns with the computed metric.
323-330: v5.9.3 table rows validated
The data entries under the results for v5.9.3 are accurate.
332-336: Ensure consistent column header in v5.10.2 table
Verify that the header matches the finalized naming convention (`avg_size_in_bytes_per_record`). Please confirm that the header on the v5.10.2 results table matches the naming you’ve aligned for the v5.9.3 table.
336-343: v5.10.2 table rows validated
The data entries for CrateDB v5.10.2 benchmarking are correct.
382-382: Future-proofing note is clear
The remark about future changes is appropriate and well-placed.
| 39 | 471 |

Data was reduced `~13.11%` in v5.9 similar to `no_index`.
Data was increased `~10%` in v.5.10
Update version in column store increase note
The note references “v.5.10”; it should be v5.10.2 and remove the extraneous period.
- Data was increased `~10%` in v.5.10
+ Data was increased `~10%` in v5.10.2
🤖 Prompt for AI Agents
In docs/performance/storage.md at line 246, update the version reference from
"v.5.10" to "v5.10.2" by removing the extra period and adding the patch version
to accurately reflect the correct version number.
>
>Read more at https://cratedb.com/blog/reducing-storage-costs-in-cratedb-v5.10
>
> This guide has been updated to reflect the v5.10 improvements.
🛠️ Refactor suggestion
Update improvement note to v5.10.2
The note says “reflect the v5.10 improvements” but it should be v5.10.2.
Apply this diff:
- > This guide has been updated to reflect the v5.10 improvements.
+ > This guide has been updated to reflect the v5.10.2 improvements.
🤖 Prompt for AI Agents
In docs/performance/storage.md at line 38, update the version number in the
improvement note from "v5.10" to "v5.10.2" to accurately reflect the correct
version. Simply replace "v5.10" with "v5.10.2" in the existing text.
Data was reduced `~13.11%` in v5.9
Data was reduced `~33.87%` in v5.10
Align percentage context to v5.10.2
The second reduction metric still labels the version as “v5.10.” Update to v5.10.2.
- Data was reduced `~33.87%` in v5.10
+ Data was reduced `~33.87%` in v5.10.2
🤖 Prompt for AI Agents
In docs/performance/storage.md around lines 207 to 208, update the version label
for the second data reduction metric from "v5.10" to "v5.10.2" to accurately
reflect the correct version context.
CrateDB v5.9.3

| table_name | records | total_size_mib | avg_size_per_shard_in_mib | avg_bytes_per_record |
|----------------------------------------------|---------|----------------|---------------------------|----------------------|
Unify average bytes column header
The first results table uses `avg_bytes_per_record` while the second uses `avg_size_in_bytes_per_record`. Align both names.
-| table_name | records | total_size_mib | avg_size_per_shard_in_mib | avg_bytes_per_record |
+| table_name | records | total_size_mib | avg_size_per_shard_in_mib | avg_size_in_bytes_per_record |
🤖 Prompt for AI Agents
In docs/performance/storage.md around lines 319 to 322, the column header for
average bytes per record is inconsistent between tables, using
`avg_bytes_per_record` in the first and `avg_size_in_bytes_per_record` in the
second. Update the second table's column header to `avg_bytes_per_record` to
unify the naming across both tables.
> INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
> more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
> depending on use cases.
🛠️ Refactor suggestion
Align version reference to v5.10.2
The guidance still mentions “version 5.10” but should reference v5.10.2 for consistency with the rest of the document.
Apply this diff:
- > INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
+ > INFO: In version 5.10.2 storage usage was improved, some users report up to 70% of storage reduction,
🤖 Prompt for AI Agents
In docs/performance/storage.md around lines 32 to 34, update the version
reference from "version 5.10" to "version v5.10.2" to maintain consistency with
the rest of the document. Replace all occurrences of "5.10" in this context with
"v5.10.2".
5. [Optimize table](#upgrade-to-v510-and-optimize-table)
6. [Changing the compression algorithm](#changing-the-compression-algorithm)
7. [All results](#all-results-and-what-to-do)
8. [Data normalization](#extra-data-normalization)
Correct TOC anchor for v5.10.2 and ensure link consistency
The TOC entry still references the slug `upgrade-to-v510-and-optimize-table`. Update it to match v5.10.2 and the updated heading slug.
-5. [Optimize table](#upgrade-to-v510-and-optimize-table)
+5. [Optimize table](#upgrade-to-v5-10-2-and-optimize-table)
🤖 Prompt for AI Agents
In docs/performance/storage.md around lines 46 to 49, the table of contents
entry for the upgrade section uses the outdated anchor slug
"upgrade-to-v510-and-optimize-table". Update this anchor to reflect the correct
version "v5.10.2" and ensure it matches the corresponding heading slug exactly
for link consistency.
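For context on the anchor format: the existing slug `upgrade-to-v510-and-optimize-table` suggests that the renderer drops dots instead of turning them into hyphens. The Python sketch below approximates that GitHub-style rule. `heading_slug` is a hypothetical helper written for illustration, not the slugger the docs build actually uses, and a MyST/Sphinx build may generate anchors differently.

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (an assumption, not the real slugger)."""
    slug = heading.strip().lower()
    # Drop punctuation such as dots and backticks; keep word characters, hyphens, spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes a hyphen.
    return slug.replace(" ", "-")


if __name__ == "__main__":
    print(heading_slug("Upgrade to v5.10 and Optimize table"))
    print(heading_slug("Upgrade to v5.10.2 and Optimize table"))
```

Under this assumption a heading containing `v5.10.2` would slug to `v5102`, so the suggested `v5-10-2` anchor is worth checking against the rendered page before committing.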
@@ -47,30 +55,29 @@ CrateDB is a distributed database; nodes, shards, partitions and replicas are ti
When a table is created, data is sharded and distributed among nodes. This
means that the memory footprint depends on our replication and sharding strategy.

- Let's break down how the `775MiB` in CrateDB and the `510MiB` in PostgreSQL were
+ Let's break down how the `431MiB` in CrateDB and the `510MiB` in PostgreSQL were
Specify CrateDB version in breakdown
Reference the optimized storage as CrateDB v5.10.2 when breaking down the `431MiB`.
- Let's break down how the `431MiB` in CrateDB and the `510MiB` in PostgreSQL were obtained.
+ Let's break down how the `431MiB` in CrateDB v5.10.2 and the `510MiB` in PostgreSQL were obtained.
🤖 Prompt for AI Agents
In docs/performance/storage.md at line 58, specify the CrateDB version as
v5.10.2 when mentioning the optimized storage size of 431MiB. Update the text to
clearly indicate that the 431MiB figure corresponds to CrateDB version 5.10.2
for clarity and accuracy.
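As a quick sanity check on the two CrateDB figures in this hunk, the relative reduction can be computed directly. This is a minimal sketch: `percent_reduction` is a hypothetical helper, and it assumes `775MiB` is the pre-5.10 total that the `431MiB` line replaces.

```python
def percent_reduction(before_mib: float, after_mib: float) -> float:
    """Return the relative size reduction as a percentage of the original size."""
    if before_mib <= 0:
        raise ValueError("before_mib must be positive")
    return (before_mib - after_mib) / before_mib * 100


if __name__ == "__main__":
    # Figures taken from the diff hunk above (MiB); which versions they map to is an assumption.
    overall = percent_reduction(775, 431)
    print(f"~{overall:.2f}% smaller")
```

If that assumption holds, the result lands inside the 30–50% range quoted in the INFO block.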
Thanks a stack for the update, this outcome looks very promising.
I've added a few suggestions about MyST Markdown syntax and friends, and would like to give @hlcianfagna the pleasure of final approval, as he knows the topic matter and the English language better than me.
> INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
> more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
> depending on use cases.
FYI: In MyST Markdown, you may want to phrase the admonition like this:
- > INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
- > more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
- > depending on use cases.
+ :::{note}
+ In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
+ more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
+ depending on use cases.
+ :::
In CrateDB, when a table is created, sharding and replication have to be taken into account.
When a table is created with default values, it gets partitioned using this formula: `max(4, num_data_nodes * 2)`.
Maybe elegantly refer to the sharding and scaling guideline docs at this spot?
You can check the actual size of a shard locally; querying `select table_name, path from sys.shards`
shows the file path of the shards.
- You can check the actual size of a shard locally; querying `select table_name, path from sys.shards`
- shows the file path of the shards.
+ You can check the actual size of a shard by inspecting the filesystem.
+ Use `select table_name, path from sys.shards` to show the file path of the individual shards.
- The index can only be disabled when the table is created, if the table already exists and it cannot
- be deleted it will have to be re-created.
+ The index can only be disabled when the table is created, if the table already exists, and it can’t
+ be deleted, it will have to be re-created.
Is it possible to delete an index afterwards? Doesn't the table always need to be re-created?
@@ -171,15 +188,24 @@ DROP TABLE "taxi_deleteme"

> WARNING: Dropping the table deletes the data, make sure that the copy was done correctly.

- > INFO: Indexes cannot be re-added after table creation.
+ > INFO: Indexes can’t be re-added after table creation.
- > INFO: Indexes can’t be re-added after table creation.
+ :::{attention}
+ Indexes can’t be re-added after table creation.
+ :::
- > INFO: Indexes can’t be re-added after table creation.
+ > INFO: Disabled indexes cannot be re-added without recreating the table.
Data was reduced `~13.11%` in v5.9 similar to `no_index`.
Data was increased `~10%` in v.5.10

> NOTE: In v5.10, we don’t recommend that you turn off column storage.
- > NOTE: In v5.10, we don’t recommend that you turn off column storage.
+ :::{note}
+ In v5.10, we don’t recommend that you turn off column storage.
+ :::
In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison
to previous versions.

The easiest way to gain massive storage reduction is to update to the latest version available.
I would suggest to remove the additional line break here.
- In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison
- to previous versions.
-
- The easiest way to gain massive storage reduction is to update to the latest version available.
+ In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison
+ to previous versions. The easiest way to gain massive storage reduction is to update to the latest version available.
Just upgrading alone would not be enough, the new format only applies to new tables and partitions I believe, and only after segments are merged.
you would typically see storage usage being automatically reduced as more data gets added to the
table. Nonetheless, you can trigger these optimizations manually by running:

`OPTIMIZE table <table_name> WITH (max_num_segments=1)`
Just a little more uppercasing.
- `OPTIMIZE table <table_name> WITH (max_num_segments=1)`
+ `OPTIMIZE TABLE <table_name> WITH (max_num_segments=1)`
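If the statement ends up being generated from a script, a small builder keeps the uppercase spelling consistent. This is only a sketch: `optimize_table_statement` is a hypothetical helper, and the double-quoted identifier is an assumption about how a given table name should be quoted, not something taken from the guide.

```python
def optimize_table_statement(table_name: str, max_num_segments: int = 1) -> str:
    """Build the OPTIMIZE TABLE statement discussed above as a plain string."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    # Double any embedded double quotes so the identifier stays well-formed.
    quoted = '"' + table_name.replace('"', '""') + '"'
    return f"OPTIMIZE TABLE {quoted} WITH (max_num_segments = {max_num_segments})"


if __name__ == "__main__":
    # "taxi" is the table name used in the guide's examples.
    print(optimize_table_statement("taxi"))
```

The function only formats text; running the statement against a cluster is left to whatever client the reader already uses.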
> NOTE: In v5.10, we don’t recommend that you turn off column storage.

## Upgrade to v5.10 and Optimize table
Maybe?
- ## Upgrade to v5.10 and Optimize table
+ ## Upgrade to v5.10 and `OPTIMIZE TABLE`
## Upgrade to v5.10 and Optimize table

In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison
- In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison
+ CrateDB v5.10 introduced a new table storage format that can reduce storage up to 50% in comparison
Looks good, but please look at the comments in lines 66 and 253
At first sight, it might look that CrateDB storage takes more than PostgreSQL,
but we need to dive deeper to really understand what is going on.
> INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
> more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
- > more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
+ > more reasonably a 30–50% can be expected for most data, the reduction percentage will vary
but we need to dive deeper to really understand what is going on.
> INFO: In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
> more reasonably a 30–50% can be expected for most data, the reduction % of reduction will vary
> depending on use cases.
- > depending on use cases.
+ > depending on the use case.
- For CrateDB when a table is created, sharding and replication has to be taken into account.
- When a table is created with default values, it gets partitioned in `max(4, num_data_nodes * 2)` shards.
+ In CrateDB, when a table is created, sharding and replication have to be taken into account.
+ When a table is created with default values, it gets partitioned using this formula: `max(4, num_data_nodes * 2)`.
When a table is created with default values, it gets partitioned using this formula: `max(4, num_data_nodes * 2)`. | |
When a table is created with default values, it gets a number of shards derived using this formula: `max(4, num_data_nodes * 2)`. |
|
||
The average size of a shard is ~35MiB, `35 * 12 = ~420`. | ||
|
||
You can check the actual size of a shard locally; querying `select table_name, path from sys.shards` |
This should not differ significantly from what `sys.shards` reports, right?
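For reference, the shard arithmetic quoted in this hunk (`max(4, num_data_nodes * 2)` and `35 * 12 = ~420`) can be sketched as below. This is only an illustration: the function names are hypothetical, and the three-node, one-replica setup is an assumption chosen because it yields 12 shard copies; the hunk itself does not state the cluster size or replica count.

```python
def default_shard_count(num_data_nodes: int) -> int:
    """Default number of primary shards, using the formula quoted in the diff."""
    return max(4, num_data_nodes * 2)


def estimated_table_size_mib(num_data_nodes: int, replicas: int, avg_shard_mib: float) -> float:
    """Rough table size: every primary shard plus its replica copies, times the average shard size."""
    primaries = default_shard_count(num_data_nodes)
    total_copies = primaries * (1 + replicas)
    return total_copies * avg_shard_mib


# Assumed setup (not stated in the hunk): 3 data nodes, 1 replica, ~35 MiB per shard.
print(default_shard_count(3))               # 6 primary shards
print(estimated_table_size_mib(3, 1, 35))   # 12 shard copies * 35 MiB = 420
```

Under these assumptions the estimate matches the `~420` figure, which could then be compared against the sizes that `sys.shards` reports.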
@@ -143,8 +160,8 @@ CREATE TABLE taxi | |||
) | |||
``` | |||
|
|||
The index can only be disabled when the table is created, if the table already exists and it cannot | |||
be deleted it will have to be re-created. | |||
The index can only be disabled when the table is created, if the table already exists, and it can’t |
The index can only be disabled when the table is created, if the table already exists, and it can’t | |
The index can only be disabled when the table is created, if the table already exists |
The index can only be disabled when the table is created, if the table already exists and it cannot | ||
be deleted it will have to be re-created. | ||
The index can only be disabled when the table is created, if the table already exists, and it can’t | ||
be deleted, it will have to be re-created. |
be deleted, it will have to be re-created. | |
it will have to be re-created. |
@@ -171,15 +188,24 @@ DROP TABLE "taxi_deleteme" | |||
|
|||
> WARNING: Dropping the table deletes the data, make sure that the copy was done correctly. | |||
|
|||
> INFO: Indexes cannot be re-added after table creation. | |||
> INFO: Indexes can’t be re-added after table creation. |
> INFO: Indexes can’t be re-added after table creation. | |
> INFO: Disabled indexes cannot be re-added without recreating the table. |
In v5.10 we introduced a new table storage format that can reduce storage up to 50% in comparison | ||
to previous versions. | ||
|
||
The easiest way to gain massive storage reduction is to update to the latest version available. |
Upgrading alone would not be enough: I believe the new format only applies to new tables and partitions, and only after segments are merged.
@@ -300,7 +379,7 @@ CREATE TABLE IF NOT EXISTS "doc"."taxi_nocolumnstore_noindex_bestcompresion" ( | |||
CrateDB's default settings are optimized for performance. | |||
|
|||
If some columns will never be used for aggregations or groupings, there will be no performance penalty. | |||
That might change in the future as your use case and data needs evolve, | |||
That might change in the future as your use case and data evolves, |
That might change in the future as your use case and data evolves, | |
That might change in the future as your use case and data evolve, |
About
Update the page about "Performance » Storage Usage" regarding storage space improvements with CrateDB 5.10.
Preview
https://cratedb-guide--214.org.readthedocs.build/performance/storage.html
Checklist