planner: support cluster-level binding reload command #65509

qw4990 · 2026-01-09T09:38:11Z

What problem does this PR solve?

Issue Number: close #65378

Problem Summary: planner: support cluster-level binding reload command

What changed and how does it work?

Currently, the ADMIN RELOAD BINDINGS command only reloads SQL bindings on the TiDB node where the command is executed. In a cluster environment with multiple TiDB nodes, users need to manually execute the command on each node to reload bindings across the entire cluster. This is inconvenient and error-prone, especially when bindings are modified and need to be synchronized across all nodes.

This PR introduces a new ADMIN RELOAD CLUSTER BINDINGS command that reloads SQL bindings across all TiDB nodes in the cluster with a single command execution.

Key changes:

Parser support: Added an AdminReloadClusterBindings AST node type and parsing support for the new ADMIN RELOAD CLUSTER BINDINGS command in pkg/parser/ast/ and pkg/parser/.
Plan builder: Added an OpReloadClusterBindings operation type in pkg/planner/core/common_plans.go and handled the new command in pkg/planner/core/planbuilder.go.
Executor implementation: Implemented a reloadClusterBindings() method in pkg/executor/bind.go that uses the existing broadcast() function to send the reload command to all TiDB nodes in the cluster, including the current node.
Logging enhancement: Added logging to bindingCacheUpdater.LoadFromStorageToCache() in pkg/bindinfo/binding_cache.go to track the number of loaded bindings, the cache size, and the duration of full load operations.

How it works:

When a user executes ADMIN RELOAD CLUSTER BINDINGS, the command is broadcast to all TiDB nodes in the cluster using the coprocessor framework's broadcast mechanism (similar to how REFRESH STATS CLUSTER works). Each TiDB node receives the broadcast and executes ADMIN RELOAD BINDINGS locally, which triggers LoadFromStorageToCache(true) to reload all bindings from storage into the local cache.

The broadcast mechanism ensures that:

The command reaches all TiDB nodes in the cluster
Each node reloads its local binding cache from storage
The operation is triggered cluster-wide by a single statement (all nodes reload at roughly the same time)
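The broadcast flow described above can be sketched in Go as follows. This is a minimal, self-contained illustration, not TiDB's actual broadcast() function or its coprocessor-based mechanism: `Node`, `reloadLocal`, and `broadcastReload` are hypothetical names, each node is modeled as an in-process value rather than a remote TiDB instance, and binding storage is a plain map. In this sketch, nodes reload concurrently and per-node errors are joined and returned.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Node is a hypothetical stand-in for one TiDB instance with a local binding cache.
type Node struct {
	Addr  string
	cache map[string]string // hypothetical: SQL digest -> binding text
}

// reloadLocal models running ADMIN RELOAD BINDINGS on a single node:
// it replaces the node's cache with a full copy of the bindings in storage.
func (n *Node) reloadLocal(storage map[string]string) error {
	if n.Addr == "" {
		return errors.New("node has no address")
	}
	fresh := make(map[string]string, len(storage))
	for k, v := range storage {
		fresh[k] = v
	}
	n.cache = fresh
	return nil
}

// broadcastReload sends the reload to every node (including the local one)
// concurrently and joins all per-node errors. Nodes reload independently,
// so a failure on one node does not undo the reload on the others.
func broadcastReload(nodes []*Node, storage map[string]string) error {
	var wg sync.WaitGroup
	errs := make([]error, len(nodes))
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n *Node) {
			defer wg.Done()
			if err := n.reloadLocal(storage); err != nil {
				errs[i] = fmt.Errorf("reload on %q: %w", n.Addr, err)
			}
		}(i, n)
	}
	wg.Wait()
	return errors.Join(errs...) // nil when every node succeeded
}
```

Because each node reloads independently in this model, a failure on one node leaves the others reloaded; the joined error tells the caller which nodes failed.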

Test:

Because testing this PR requires a whole cluster and interaction between different TiDB nodes, it's hard to write a unit test for it. Instead, I tested it manually; please see the picture below:

I created a table and a binding.
On the first "admin reload bindings", only the node executing the command loaded all bindings;
Then, on "admin reload cluster bindings", all nodes loaded all bindings;
Then I created a new binding and ran "admin reload cluster bindings" again; all nodes loaded both bindings.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

tiprow · 2026-01-09T09:38:40Z

Hi @qw4990. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2026-01-09T10:01:28Z

Codecov Report

❌ Patch coverage is 53.12500% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.5705%. Comparing base (28cce6a) to head (59ceb6b).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #65509        +/-   ##
================================================
+ Coverage   77.8398%   79.5705%   +1.7307%     
================================================
  Files          1971       1895        -76     
  Lines        540836     528806     -12030     
================================================
- Hits         420986     420774       -212     
+ Misses       118190     106599     -11591     
+ Partials       1660       1433       -227

Flag	Coverage Δ
integration	`47.5533% <40.6779%> (-0.6462%)`	⬇️
unit	`76.7587% <53.1250%> (+0.3018%)`	⬆️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`65.7217% <ø> (+4.6046%)`	⬆️


qw4990 · 2026-01-09T10:17:15Z

/test unit-test

tiprow · 2026-01-09T10:17:38Z

@qw4990: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.


In response to this:

/test unit-test


qw4990 · 2026-01-12T02:40:08Z

/test check-dev2

tiprow · 2026-01-12T02:40:30Z

@qw4990: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.


In response to this:

/test check-dev2


AilinKid

rest LGTM

AilinKid · 2026-01-12T04:33:12Z

pkg/planner/core/common_plans.go

 	}

-	sum = p.BasePhysicalPlan.MemoryUsage() + p.Inner.MemoryUsage()
+	sum = p.BasePhysicalPlan.MemoryUsage()


base.Plan doesn't have the MemoryUsage interface; should we care?

The problem is that base.Plan doesn't have a MemoryUsage function. If we added MemoryUsage to base.Plan, we would have to implement it for every LogicalPlan, which is very complex. So, to keep the code simple, I think the best option is to just ignore it here.

I think we should add a comment here to notify that this may cause the memory usage to be inaccurate. And explain why it is acceptable.

On the other hand, I believe we can still try to reuse Simple here. We just need to add SimpleExec.executeAdmin to accomplish this. Then we won't need to hard-code the handling of this specific admin command.

Offline discussed, thanks!
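One possible middle ground between ignoring the inner plan and adding MemoryUsage to every plan is an optional-interface check: count the inner plan's memory only when it happens to implement a MemoryUsage method, and document the inaccuracy as the reviewer asked. The sketch below is an assumption, not what this PR does; `Plan`, `memoryUser`, `PlanWrapper`, `sizedPlan`, and `opaquePlan` are hypothetical stand-ins rather than TiDB types.

```go
package main

// Plan is a hypothetical stand-in for base.Plan, which has no MemoryUsage method.
type Plan interface {
	ID() int
}

// memoryUser is an optional interface that some (but not all) plans implement.
type memoryUser interface {
	MemoryUsage() int64
}

// PlanWrapper is a hypothetical wrapper holding an arbitrary inner plan.
type PlanWrapper struct {
	baseUsage int64 // stand-in for BasePhysicalPlan.MemoryUsage()
	Inner     Plan
}

// MemoryUsage returns the wrapper's own usage plus the inner plan's usage when
// the inner plan reports it. NOTE: inner plans that do not implement memoryUser
// count as zero, so the result may underestimate real memory usage (assumed
// acceptable because wrapped plans carry little state; not a measured fact).
func (w *PlanWrapper) MemoryUsage() int64 {
	if w == nil {
		return 0
	}
	sum := w.baseUsage
	if m, ok := w.Inner.(memoryUser); ok {
		sum += m.MemoryUsage()
	}
	return sum
}

// sizedPlan reports its memory usage; opaquePlan does not.
type sizedPlan struct{ size int64 }

func (p sizedPlan) ID() int            { return 1 }
func (p sizedPlan) MemoryUsage() int64 { return p.size }

type opaquePlan struct{}

func (opaquePlan) ID() int { return 2 }
```

The type assertion keeps base.Plan unchanged while still counting memory for plans that already track it.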

AilinKid · 2026-01-12T04:38:12Z

pkg/bindinfo/binding_cache.go

+			bindingLogger().Info("load bindings", zap.Bool("fullLoad", fullLoad),
+				zap.Bool("cacheSizeChange", cacheSizeChange), zap.Bool("hasNewBinding", hasNewBinding),
+				zap.Int64("cacheCapacity", u.GetMemCapacity()), zap.Int64("cacheUsage", u.GetMemUsage()),
+				zap.Int64("cachedBindingNum", int64(u.Size())), zap.Duration("duration", time.Since(begin)), zap.Error(err))


Since we can't really unit test this, we should do one more test with two instances: one with a small memory quota and the other with a large one. Then trigger ADMIN RELOAD CLUSTER BINDINGS. If one instance cannot load all bindings, it would be better to show some log or warning, because that would cause inconsistent behavior even after this strong synchronization command.

Currently tidb_mem_quota_binding_cache is a global variable, so all instances should have the same memory quota. But your comment reminds me that we need to log when some bindings are evicted from the cache. I added some new logs for this.
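The idea of logging when a full reload cannot keep every binding can be illustrated with a simplified model. This is a hypothetical sketch, not the logging added in this PR: `binding`, `bindingCache`, and `fullLoad` are illustrative names, the standard library `log` package stands in for zap, and the cache simply skips bindings that do not fit, whereas a real cache may evict other entries instead.

```go
package main

import (
	"log"
	"time"
)

// binding is a hypothetical cached binding with an approximate memory size.
type binding struct {
	digest string
	size   int64
}

// bindingCache is a hypothetical size-bounded cache; it is NOT TiDB's real cache.
type bindingCache struct {
	capacity int64
	usage    int64
	items    map[string]binding
}

// fullLoad replaces the cache contents with the given bindings, skipping any
// binding that would exceed the memory capacity, and logs how many bindings
// did not fit so that an incomplete reload is visible in the logs.
func (c *bindingCache) fullLoad(all []binding) (loaded, skipped int) {
	begin := time.Now()
	c.items = make(map[string]binding, len(all))
	c.usage = 0
	for _, b := range all {
		if c.usage+b.size > c.capacity {
			skipped++
			continue
		}
		c.items[b.digest] = b
		c.usage += b.size
		loaded++
	}
	if skipped > 0 {
		log.Printf("binding cache is full: loaded=%d skipped=%d capacity=%d usage=%d duration=%s",
			loaded, skipped, c.capacity, c.usage, time.Since(begin))
	}
	return loaded, skipped
}
```

Emitting the log only when something was skipped keeps normal reloads quiet while still surfacing the inconsistency the reviewer was concerned about.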

pkg/planner/core/pb_to_plan.go

fixdb · 2026-01-12T08:47:06Z

pkg/bindinfo/binding_cache.go

+	if u.GetMemCapacity() != vardef.MemQuotaBindingCache.Load() {
+		cacheSizeChange = true
+		u.SetMemCapacity(vardef.MemQuotaBindingCache.Load())
+	}


Between the comparison and SetMemCapacity(), another goroutine could modify the global variable. Should we cache the loaded value in a local variable first?

You are right, I've updated it.
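The fix the reviewer suggests (read the global once into a local variable) can be sketched as follows. This assumes vardef.MemQuotaBindingCache behaves like an atomic integer, which its `.Load()` call suggests; `memQuotaBindingCache`, `cacheUpdater`, and `syncCapacity` are hypothetical names, and `sync/atomic`'s `Int64` type stands in for the real variable.

```go
package main

import "sync/atomic"

// memQuotaBindingCache is a hypothetical stand-in for the global
// vardef.MemQuotaBindingCache variable, which other goroutines may update.
var memQuotaBindingCache atomic.Int64

type cacheUpdater struct {
	memCapacity int64
}

// syncCapacity reads the global quota exactly once, then uses that single
// snapshot for both the comparison and the update. Calling Load() twice could
// observe two different values if another goroutine stores in between.
func (u *cacheUpdater) syncCapacity() (cacheSizeChange bool) {
	quota := memQuotaBindingCache.Load()
	if u.memCapacity != quota {
		u.memCapacity = quota
		cacheSizeChange = true
	}
	return cacheSizeChange
}
```

With a single snapshot, the value compared and the value applied are guaranteed to be the same, even if the global changes concurrently.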

fixdb · 2026-01-12T08:49:41Z

pkg/planner/core/common_plans.go

 type PhysicalSimpleWrapper struct {
 	physicalop.BasePhysicalPlan
-	Inner Simple
+	Inner base.Plan


Why this change?

The purpose of PhysicalSimpleWrapper is to wrap Simple (which is actually a LogicalPlan) into a PhysicalPlan. Now I also want to use it to wrap Admin into a PhysicalPlan, so I changed its Inner type from Simple to base.Plan.
I also renamed this structure from PhysicalSimpleWrapper to PhysicalPlanWrapper, meaning it can wrap any kind of plan into a PhysicalPlan.
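Changing the field from a concrete type to an interface is what lets one wrapper carry both kinds of plan. The sketch below illustrates that idea under assumptions: `Plan`, `Simple`, `Admin`, `PhysicalPlanWrapper`, `ExplainInfo`, and `describe` are hypothetical, simplified stand-ins, not TiDB's real definitions.

```go
package main

import "fmt"

// Plan is a hypothetical stand-in for base.Plan.
type Plan interface {
	TP() string
}

// Simple and Admin are hypothetical stand-ins for the logical plans being wrapped.
type Simple struct{ Statement string }
type Admin struct{ Tp string }

func (Simple) TP() string { return "Simple" }
func (Admin) TP() string  { return "Admin" }

// PhysicalPlanWrapper wraps any Plan so it can flow through code that expects
// a physical plan. Because Inner is an interface rather than a concrete
// Simple, the same wrapper works for both Simple and Admin.
type PhysicalPlanWrapper struct {
	Inner Plan
}

func (w PhysicalPlanWrapper) ExplainInfo() string {
	return fmt.Sprintf("wrapper(%s)", w.Inner.TP())
}

// describe shows how an executor could recover the concrete plan with a type switch.
func describe(w PhysicalPlanWrapper) string {
	switch p := w.Inner.(type) {
	case Simple:
		return "simple: " + p.Statement
	case Admin:
		return "admin: " + p.Tp
	default:
		return "unknown"
	}
}
```

The trade-off is that consumers must type-switch on Inner to reach plan-specific fields, which the `describe` helper illustrates.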

Co-authored-by: fixdb <[email protected]>

0xPoe

Thanks!

I am a bit confused. How do you distinguish between an admin SQL statement sent from a remote node and one issued by the user? I think you should rely on IsFromRemote to make that distinction. Also, please include a manual test in your PR to ensure it works well with multiple instances.
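If a node could receive the cluster-level statement itself through a broadcast, it would need to reload locally rather than broadcast again. A minimal sketch of the branching the reviewer suggests, assuming a boolean flag like IsFromRemote is available on the execution context; `execContext`, `handleReloadClusterBindings`, `reloadLocal`, and `broadcast` are hypothetical names, not TiDB APIs.

```go
package main

// execContext is a hypothetical stand-in for session state; IsFromRemote marks
// statements that arrived via a cluster broadcast rather than from a client.
type execContext struct {
	IsFromRemote bool
}

// handleReloadClusterBindings decides what to do with ADMIN RELOAD CLUSTER BINDINGS.
// A user-issued statement is broadcast to every node; a statement received from
// a remote broadcast only reloads locally, so it is never re-broadcast (which
// could otherwise bounce between nodes).
func handleReloadClusterBindings(ctx execContext, reloadLocal func() error, broadcast func() error) error {
	if ctx.IsFromRemote {
		return reloadLocal()
	}
	return broadcast()
}
```

Passing the two actions as functions keeps the decision logic testable without a real cluster.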

qw4990 · 2026-01-12T14:23:13Z

pkg/planner/core/common_plans.go

-		return
-	}
-
-	sum = s.SimpleSchemaProducer.MemoryUsage() + size.SizeOfInterface + size.SizeOfBool + size.SizeOfUint64


Actually the prior implementation is not complete, and it's not used any longer, so we can just remove it.

qw4990 · 2026-01-12T14:27:05Z

Thanks!

I am a bit confused. How do you distinguish between an admin SQL statement sent from a remote node and one issued by the user? I think you should rely on IsFromRemote to make that distinction. Also, please include a manual test in your PR to ensure it works well with multiple instances.

offline discussed

0xPoe

Thanks!

0xPoe · 2026-01-12T14:36:22Z

pkg/executor/show.go

 		return cmpResult > 0
 	})
 	for _, hint := range bindings {
+		if hint == nil {


Better to add some comments explaining when this will happen.

0xPoe · 2026-01-12T14:38:31Z

pkg/planner/core/pb_to_plan.go

+	if innerPlan == nil {
+		errors.Errorf("unexpected statement %s in broadcast query", *e.BroadcastQuery.Query)
+	}


Maybe this check can go in the default branch. Also, the error created by errors.Errorf is not used here.
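A sketch of the reviewer's suggestion: handle unsupported statements in the `default` branch of a type switch and actually return the error rather than constructing and discarding it. The statement types and `planForBroadcastQuery` are hypothetical, and the standard library `fmt.Errorf` stands in for the `errors.Errorf` used in the PR.

```go
package main

import "fmt"

// Hypothetical statement types standing in for parsed broadcast queries.
type stmt interface{}
type adminReloadBindings struct{}
type refreshStats struct{}

type plan struct{ name string }

// planForBroadcastQuery maps a broadcast statement to a plan. Unsupported
// statements are rejected in the default branch, and the error is returned
// to the caller instead of being created and dropped.
func planForBroadcastQuery(s stmt, query string) (*plan, error) {
	switch s.(type) {
	case adminReloadBindings:
		return &plan{name: "AdminReloadBindings"}, nil
	case refreshStats:
		return &plan{name: "RefreshStats"}, nil
	default:
		return nil, fmt.Errorf("unexpected statement %s in broadcast query", query)
	}
}
```

Putting the check in `default` also removes the separate nil test on the resulting plan, since every path either returns a plan or an error.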


fixdb

+1

ti-chi-bot · 2026-01-13T04:50:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AilinKid, fixdb
Once this PR has been reviewed and has the lgtm label, please assign d3hunter, yudongusa for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

~~OWNERS~~ [AilinKid,fixdb]
~~pkg/bindinfo/OWNERS~~ [AilinKid,fixdb]
pkg/domain/OWNERS
pkg/parser/OWNERS
~~pkg/planner/OWNERS~~ [AilinKid,fixdb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-01-13T04:51:00Z

[LGTM Timeline notifier]

Timeline:

2026-01-13 03:52:29.769886591 +0000 UTC m=+329593.831751504: ☑️ agreed by AilinKid.
2026-01-13 04:50:59.455647466 +0000 UTC m=+333103.517512375: ☑️ agreed by fixdb.

qw4990 added 2 commits January 9, 2026 17:20

fixup

df31452

fixup

c455a03

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 9, 2026

qw4990 added 4 commits January 10, 2026 10:08

fixup

471a68f

fixup

e75ee00

fixup

da4f49f

fixup

25e3e5e

AilinKid reviewed Jan 12, 2026


fixdb reviewed Jan 12, 2026


qw4990 and others added 6 commits January 12, 2026 17:46

Update pkg/planner/core/pb_to_plan.go

cd8aae0

Co-authored-by: fixdb <[email protected]>

fixup

e7d2d55

fixup

750d3b1

fixup

7bafcdd

fixup

24b51d6

fixup

6b7fe92

0xPoe reviewed Jan 12, 2026


qw4990 added 4 commits January 12, 2026 20:41

fixup

6e3fc78

fixup

549c08c

fixup

069c65f

fixup

9b4c693

qw4990 commented Jan 12, 2026


0xPoe reviewed Jan 12, 2026


qw4990 added 4 commits January 13, 2026 10:54

fixup

867623c

Merge remote-tracking branch 'upstream/master' into cluster-level-binding-reload

4f97053

fixup

bca366d

fixup

c9844ee

AilinKid approved these changes Jan 13, 2026


ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 13, 2026

fixup

59ceb6b

fixdb approved these changes Jan 13, 2026


ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 13, 2026
