Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algorithm implementation problem for START_RETRY_LIMIT #213

Open
lizhihongTest opened this issue Apr 14, 2023 · 1 comment
Open

Algorithm implementation problem for START_RETRY_LIMIT #213

lizhihongTest opened this issue Apr 14, 2023 · 1 comment
Assignees
Labels
Any Idea Any problem/ideas/suggestions App-Manager The App Manager basic service bug Something isn't working

Comments

@lizhihongTest
Copy link
Collaborator

In the implementation of app-manager, the constant START_RETRY_LIMIT is set to 3, but four retries are made, and it takes five cycles to return an error

const START_RETRY_LIMIT: u8 = 3;

if *running_counter > START_RETRY_LIMIT {
let target_status_code = AppLocalStatusCode::RunException;
info!("[RUNNING CHECK] app failed count is out of limit! app:{}, change app status from [{}] to [{}]",
app_id, cur_status_code, target_status_code);
status.set_status(target_status_code);
status_clone = Some(status.clone());
} else {
*running_counter = *running_counter + 1;
info!("[RUNNING CHECK] app status is running, but not actually. will restart it, app:{}, retry count:{}", app_id, *running_counter);
try_start = true;
}
}
if try_start {
let _ = self.restart_app(app_id, status.clone()).await;
} else if let Some(new_status) = status_clone {
let _ = self.non_helper.put_local_status(&new_status).await;
}

Optimization suggestions:

  • *running_counter > START_RETRY_LIMIT should be changed to *running_counter >= START_RETRY_LIMIT
  • let _ = self.non_helper.put_local_status(&new_status).await; It should be judged after restart_app, instead of modifying the status in the next cycle. The current implementation will delay the reporting of APP status errors by one minute
@lizhihongTest lizhihongTest added bug Something isn't working App-Manager The App Manager basic service labels Apr 14, 2023
@lurenpluto
Copy link
Member

Some suggestions for improvement maybe:

  • The first is a common error in the detection of the edge of the condition, although it is not too big impact to do more than once, but it will cause the actual implementation of the logic and literally some definition does not match, bringing confusion to people

  • The second may involve the state switch is synchronous or asynchronous problem, if the detection itself is done inside a large lock, to ensure that no re-entry, then here if the use of asynchronous operation, is it possible that the state management inconsistency problem? For example, the state is not yet submitted, and then quickly move to the next successful state caused by the new state is overwritten?

If we want to do async-based optimization, we should at least add a local_status management locally to avoid the above problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Any Idea Any problem/ideas/suggestions App-Manager The App Manager basic service bug Something isn't working
Projects
Status: 💡Any Idea
Development

No branches or pull requests

3 participants