Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[client] Add retry when get one available tablet serverNode fails #426

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zhaozijun109
Copy link

Purpose

Linked issue: close #xxx

Tests

API and Format

Documentation

AdminReadOnlyGateway gateway =
GatewayClientProxy.createGatewayProxy(
() ->
getOneAvailableTabletServerNode(
metadataUpdater.getCluster()),
rpcClient,
AdminReadOnlyGateway.class);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change. Pls revert this line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will revert this line.

@@ -52,6 +55,9 @@

/** Utils for metadata for client. */
public class MetadataUtils {
private static final Logger LOG = LoggerFactory.getLogger(MetadataUtils.class);

private static final int MAX_RETRY_TIMES = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this retry time be configurable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will make it as system configuration

@polyzos
Copy link
Collaborator

polyzos commented Feb 19, 2025

@gkatzioura also added a retry mechanism here
to fix some tests. I assume retry logic will be needed in other places as well, so it maybe makes sense to introduce a class to handle that or use a library. WDYT? cc @wuchong

UPDATE: There is a retry method, but its only available in the test utils

@zhaozijun109
Copy link
Author

@polyzos Good idea, I will try to make a common util method of retry logic. WDYT? @SteNicholas please left some comments, thank you

List<ServerNode> aliveTabletServers = null;
for (int retryTimes = 0; retryTimes <= MAX_RETRY_TIMES; retryTimes++) {
aliveTabletServers = cluster.getAliveTabletServerList();
if (aliveTabletServers.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When aliveTabletServers is empty in cluster, the retry doesn't work since the cluster wil stil be empty for aliveTabletServers, we need send metadata request to update cluster`.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,I will adjust it,thank you.

@SteNicholas
Copy link
Contributor

SteNicholas commented Feb 21, 2025

@zhaozijun109, @polyzos, IMO, the retry mechanism maybe different from different behavior, therefore it may not need to introduce a common util method of retry logic.

@zhaozijun109
Copy link
Author

@SteNicholas Thank for your advice, I will re-adjust based on all the comments above

@polyzos
Copy link
Collaborator

polyzos commented Feb 21, 2025

@SteNicholas Indeed, but there might be re-use of some implementations.. my thoughts were that it might be best to have them in a centralized place, to avoid introducing multiple retry implementations across the codebase, because I already saw 3 different cases popping up in different places. Whatever you think works best

@zhaozijun109
Copy link
Author

@luoyuxia @SteNicholas Could you please help me review again when you have free time? Thank you.

@@ -876,6 +876,13 @@ public class ConfigOptions {
"Enable metrics for client. When metrics is enabled, the client "
+ "will collect metrics and report by the JMX metrics reporter.");

public static final ConfigOption<Integer> CLIENT_GET_TABLET_SERVER_NODE_MAX_RETRY_TIMES =
Copy link
Contributor

@SteNicholas SteNicholas Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this option in configuration.md.

@SteNicholas
Copy link
Contributor

@zhaozijun109, could you create new issue for this pull request? BTW, could you also add some description for this pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants