Lightning: Make sure we are using default block size of 16KB if user does not specify one. #60097

OliverS929 · 2025-03-16T02:28:53Z

What problem does this PR solve?

Issue Number: close #59947

Problem Summary:
Make sure we are using a sufficient default block size. Ref #49514

What changed and how does it work?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

To test the specific issue addressed in this PR, I used a ~1.4TB dataset consisting mostly of duplicate data. Before the fix, memory usage spiked during the ingest phase due to the large index metadata loaded by Pebble, causing OOM kills on a 16c64g VM. With the fix, memory consumption remained stable, staying below 17GB and leading to no disastrous memory spikes.
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

tiprow · 2025-03-16T02:29:14Z

Hi @OliverS929. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

OliverS929 · 2025-03-16T02:29:17Z

/ok-to-test

Benjamin2037 · 2025-03-16T03:00:16Z

Please make sure to add enough test cases to avoid regression later.

codecov · 2025-03-16T03:02:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.9514%. Comparing base (77f118f) to head (b519cb8).
Report is 20 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #60097        +/-   ##
================================================
+ Coverage   73.1493%   73.9514%   +0.8020%     
================================================
  Files          1706       1738        +32     
  Lines        471415     483520     +12105     
================================================
+ Hits         344837     357570     +12733     
+ Misses       105415     104036      -1379     
- Partials      21163      21914       +751

Flag	Coverage Δ
integration	`45.8637% <57.1428%> (?)`
unit	`72.5181% <100.0000%> (-0.0793%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.6910% <ø> (∅)`
parser	`∅ <ø> (∅)`
br	`46.9612% <ø> (-1.1339%)`	⬇️

OliverS929 · 2025-03-17T11:46:57Z

/retest

Benjamin2037 · 2025-03-18T01:15:57Z

pkg/lightning/backend/local/engine.go

@@ -59,6 +59,9 @@ var (
 	normalIterStartKey = []byte{1}
 )

+// DefaultBlockSize ensures we are using a block size larger than 16KB, whereas 4KB is the default block size of Pebble.


Can we also explain why Pebble's 4KB default is not a good choice, and what problems it causes?
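
As a rough illustration of the question above: an SSTable index holds roughly one entry per data block, so for a fixed amount of data a smaller block size means proportionally more index entries to load. The sketch below only does that arithmetic. The ~1.4TB total comes from the manual test described in this PR; the helper name `indexEntries` is hypothetical, and compression, key sizes, and two-level indexes are all ignored.

```go
package main

import "fmt"

// indexEntries estimates how many index entries are needed for totalBytes of
// data, assuming one index entry per data block.
// Hypothetical helper for illustration; not part of Lightning or Pebble.
func indexEntries(totalBytes, blockSize int64) int64 {
	// Round up so a trailing partial block still counts as one entry.
	return (totalBytes + blockSize - 1) / blockSize
}

func main() {
	// ~1.4TB, approximated here as 1400 GiB (an assumption for the estimate).
	const totalBytes int64 = 1400 << 30

	for _, bs := range []int64{4 << 10, 16 << 10} {
		fmt.Printf("block size %2d KiB -> ~%d index entries\n",
			bs>>10, indexEntries(totalBytes, bs))
	}
}
```

Under these assumptions, moving from 4KB to 16KB blocks cuts the estimated entry count by a factor of four.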

D3Hunter

can you write the detail of manual test?

Benjamin2037 · 2025-03-18T03:55:21Z

pkg/lightning/backend/local/engine.go

@@ -1423,6 +1426,12 @@ func newSSTWriter(path string, blockSize int) (*sstable.Writer, error) {
 	if err != nil {
 		return nil, errors.Trace(err)
 	}
+
+	// Logic to check the block size we are using is 16KB by default.
+	if blockSize <= 0 {


Should we also check whether there is still a risk of OOM even when this block size is set?

OliverS929 · 2025-03-18T14:16:28Z

/retest

OliverS929 · 2025-03-19T06:49:22Z

/retest

OliverS929 · 2025-03-19T07:10:00Z

/retest

Benjamin2037 · 2025-03-19T07:40:04Z

pkg/lightning/backend/local/engine_test.go

+	require.True(t, blockSizeField.IsValid(), "blockSize field should be valid")
+	require.Equal(t, config.DefaultBlockSize, int(blockSizeField.Int()))
+
+	// Clean up


Why does the comment above start with a lowercase letter while this one starts with an uppercase letter? Please make them consistent.
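
For context, the assertion in this hunk reads a `blockSize` field through reflection. A minimal standalone sketch of that pattern follows, assuming a hypothetical stand-in struct `fakeWriter` (the real type under test is not shown in this thread) and a hypothetical helper `readIntField`:

```go
package main

import (
	"fmt"
	"reflect"
)

// fakeWriter is a hypothetical stand-in for a type with a private
// blockSize field; it is not the real writer used in engine_test.go.
type fakeWriter struct {
	blockSize int
}

// readIntField reads an integer field by name from the struct that ptr
// points to. It reports false when no such field exists. ptr must be a
// pointer to a struct and the field must have an integer kind, otherwise
// the reflect calls panic.
func readIntField(ptr any, name string) (int, bool) {
	f := reflect.ValueOf(ptr).Elem().FieldByName(name)
	if !f.IsValid() {
		return 0, false
	}
	// Value.Int is permitted on unexported fields; Value.Interface is not.
	return int(f.Int()), true
}

func main() {
	w := &fakeWriter{blockSize: 16 * 1024}
	size, ok := readIntField(w, "blockSize")
	fmt.Println(size, ok)
}
```

Reading a private field this way couples the test to the field name, so a rename in the production type would make the lookup return an invalid value rather than fail to compile.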

D3Hunter · 2025-03-19T08:05:08Z

can you write the detail of manual test?

Sure, I can provide a brief overview of the dataset size and test structure. However, I’m concerned that sharing further details in this PR might not be appropriate, as they could involve confidential information related to specific customer use cases.

you can ignore the customer part, just describe the steps and results from tech point of view

D3Hunter

rest lgtm

D3Hunter · 2025-03-19T08:08:00Z

pkg/lightning/backend/local/engine.go

+	// potentially causing a memory spike and leading to an Out of Memory (OOM) scenario.
+	// If the user specifies a smaller block size, respect their choice.
+	if blockSize <= 0 {
+		blockSize = config.DefaultBlockSize


Can you also replace the literal inside NewConfig (BlockSize: 16 * 1024) with this constant?
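
The fallback in this hunk can be sketched in isolation as follows. This is a minimal standalone version, not the code from engine.go: `resolveBlockSize` is a hypothetical name, and the constant is redeclared locally with the 16 * 1024 value mentioned in the comment above.

```go
package main

import "fmt"

// DefaultBlockSize is redeclared here for the sketch; the PR refers to it as
// config.DefaultBlockSize with the value 16 * 1024.
const DefaultBlockSize = 16 * 1024

// resolveBlockSize is a hypothetical helper: it falls back to the default
// only when the caller did not specify a positive block size. Any positive
// value, including one smaller than the default, is kept as-is.
func resolveBlockSize(blockSize int) int {
	if blockSize <= 0 {
		return DefaultBlockSize
	}
	return blockSize
}

func main() {
	fmt.Println(resolveBlockSize(0))        // unset: falls back to the default
	fmt.Println(resolveBlockSize(8 * 1024)) // user-specified: respected
}
```

Keeping the literal in a single named constant, as the comment requests, means the config default and this fallback cannot drift apart.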

OliverS929 · 2025-03-19T09:13:48Z

can you write the detail of manual test?

Sure, I can provide a brief overview of the dataset size and test structure. However, I’m concerned that sharing further details in this PR might not be appropriate, as they could involve confidential information related to specific customer use cases.

you can ignore the customer part, just describe the steps and results from tech point of view

Sure. To test the specific issue addressed in this PR, I used a ~1.4TB dataset consisting mostly of duplicate data. Before the fix, memory usage spiked during the ingest phase due to the large index metadata loaded by Pebble, causing OOM kills on a 16c64g VM. With the fix, memory consumption remained stable, staying below 17GB and leading to no disastrous memory spikes.

ti-chi-bot · 2025-03-19T09:16:06Z

[LGTM Timeline notifier]

Timeline:

2025-03-17 10:19:43.397593011 +0000 UTC m=+264477.081829106: ☑️ agreed by lance6716.
2025-03-19 09:16:05.011375656 +0000 UTC m=+433458.695611751: ☑️ agreed by wjhuang2016.

Benjamin2037

LGTM

ti-chi-bot · 2025-03-19T09:24:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Benjamin2037, lance6716, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~lightning/OWNERS~~ [Benjamin2037,lance6716]
~~pkg/lightning/OWNERS~~ [Benjamin2037,lance6716]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

D3Hunter · 2025-03-19T09:27:03Z

can you write the detail of manual test?

Sure, I can provide a brief overview of the dataset size and test structure. However, I’m concerned that sharing further details in this PR might not be appropriate, as they could involve confidential information related to specific customer use cases.

you can ignore the customer part, just describe the steps and results from tech point of view

Sure. To test the specific issue addressed in this PR, I used a ~1.4TB dataset consisting mostly of duplicate data. Before the fix, memory usage spiked during the ingest phase due to the large index metadata loaded by Pebble, causing OOM kills on a 16c64g VM. With the fix, memory consumption remained stable, staying below 17GB and leading to no disastrous memory spikes.

please add it to the PR description, under [ ] manual test section

Benjamin2037 · 2025-03-19T09:29:23Z

Please remember to add an integration test.

Benjamin2037 · 2025-03-19T09:35:40Z

/retest

OliverS929 · 2025-03-19T10:31:35Z

can you write the detail of manual test?

Sure, I can provide a brief overview of the dataset size and test structure. However, I’m concerned that sharing further details in this PR might not be appropriate, as they could involve confidential information related to specific customer use cases.

you can ignore the customer part, just describe the steps and results from tech point of view

Sure. To test the specific issue addressed in this PR, I used a ~1.4TB dataset consisting mostly of duplicate data. Before the fix, memory usage spiked during the ingest phase due to the large index metadata loaded by Pebble, causing OOM kills on a 16c64g VM. With the fix, memory consumption remained stable, staying below 17GB and leading to no disastrous memory spikes.

please add it to the PR description, under [ ] manual test section

Done.

OliverS929 · 2025-03-19T10:31:45Z

/retest

OliverS929 · 2025-03-19T14:51:09Z

/retest

OliverS929 · 2025-03-19T22:19:44Z

/retest

OliverS929 · 2025-03-20T05:04:15Z

/retest

OliverS929 · 2025-03-20T07:26:21Z

/cherry-pick release-8.5

OliverS929 · 2025-03-20T07:26:27Z

/cherry-pick release-8.1

ti-chi-bot · 2025-03-20T07:27:19Z

@OliverS929: new pull request created to branch release-8.5: #60184.

In response to this:

/cherry-pick release-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2025-03-20T07:27:23Z

@OliverS929: new pull request created to branch release-8.1: #60185.

In response to this:

/cherry-pick release-8.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Make sure we are using a block size that is larger than 16KB.

79e0dce

ti-chi-bot bot added the release-note-none, do-not-merge/needs-triage-completed, and size/S labels Mar 16, 2025

ti-chi-bot bot added the ok-to-test label Mar 16, 2025

OliverS929 changed the title ~~Lightning: Make sure we are using a block size that is larger than 16KB.~~ Lightning: Make sure we are using a block size that is larger than 16KB by default. Mar 16, 2025

Change comments to align with proper Go documentation comment format.

850cf7c

ti-chi-bot bot removed the do-not-merge/needs-triage-completed label Mar 17, 2025

lance6716 reviewed Mar 17, 2025

pkg/lightning/backend/local/engine.go (outdated, resolved)

Add UT to test out default Block Size behavior.

950aaf6

ti-chi-bot bot added size/M and removed size/S labels Mar 17, 2025

OliverS929 changed the title ~~Lightning: Make sure we are using a block size that is larger than 16KB by default.~~ Lightning: Make sure we are using default block size of 16KB if user does not specify one. Mar 17, 2025

OliverS929 added 2 commits March 17, 2025 17:29

Ensure the default on one level lower.

a97fce3

Make sure that data engine use the configured value as well.

b6930c5

lance6716 approved these changes Mar 17, 2025

ti-chi-bot bot added the approved and needs-1-more-lgtm labels Mar 17, 2025

Use a tmp Pebble for UT.

4cca7bc

Benjamin2037 reviewed Mar 18, 2025

pkg/lightning/backend/local/engine_test.go (resolved)

Benjamin2037 reviewed Mar 18, 2025

D3Hunter reviewed Mar 18, 2025

lightning/pkg/importer/table_import.go (outdated, resolved)

pkg/lightning/backend/local/engine.go (outdated, resolved)

Benjamin2037 reviewed Mar 18, 2025

Fix PR comments.

d9816fa

Fix UT build.

28f6be2

Benjamin2037 reviewed Mar 19, 2025

D3Hunter reviewed Mar 19, 2025

OliverS929 added 2 commits March 19, 2025 16:50

Fix comment.

2bba8db

Remove unnecessary imports.

b519cb8

wjhuang2016 approved these changes Mar 19, 2025

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm labels Mar 19, 2025

Benjamin2037 approved these changes Mar 19, 2025

ti-chi-bot bot merged commit 514204e into pingcap:master Mar 20, 2025
25 checks passed

ti-chi-bot mentioned this pull request Mar 20, 2025

Lightning: Make sure we are using default block size of 16KB if user does not specify one. (#60097) #60184

Open

13 tasks

ti-chi-bot mentioned this pull request Mar 20, 2025

Lightning: Make sure we are using default block size of 16KB if user does not specify one. (#60097) #60185

Open

13 tasks
