More refined App Manager status recovery #170

Closed
weiqiushi opened this issue Apr 3, 2023 · 14 comments
Assignees
Labels
App-Manager The App Manager basic service task This is a task

Comments

@weiqiushi
Member

Currently, AppManager checks the status of all DecApps when it starts. If a status indicates an operation in progress, such as Installing or Uninstalling, it is changed to the corresponding failure status, i.e. InstallFailed or UninstallFailed.

It should be possible to do a finer-grained state recovery here: instead of simply falling back to a Failed state, restore the app to a more "normal" state.

@weiqiushi weiqiushi added the task This is a task label Apr 3, 2023
@weiqiushi weiqiushi self-assigned this Apr 3, 2023
@lurenpluto lurenpluto added the App-Manager The App Manager basic service label Apr 3, 2023
@weiqiushi
Member Author

weiqiushi commented Apr 10, 2023

Currently, most AppManager state issues stem from this state recovery logic. The following two changes can improve the situation:

  1. Ensure that AppManager's Install, Uninstall, and similar operations do not return errors when called multiple times
    • Install: If the app is already installed but not yet running, calling Install again does not return an error. This allows the download-and-install process to be performed again.
    • Uninstall: If the app is already uninstalled, calling Uninstall again does not return an error.
    • Start: Returns success if the app is already running.
    • Stop: Returns success if the app is not running. If the app has been uninstalled, it also returns success.
  2. In AppManager's state recovery logic, call the same operation again and write its result as the new state
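
The idempotency rules in item 1 can be sketched roughly as follows. This is only an illustration: `Status`, `Cmd`, and `apply` are hypothetical names, the status set is reduced to four values, and the error returned for Install or Uninstall on a running app is an assumption, since the list above does not cover that case.

```rust
/// Simplified status set for illustration; the real AppManager has more states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Init,
    Stop, // installed but not running
    Running,
    Uninstalled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cmd {
    Install,
    Uninstall,
    Start,
    Stop,
}

/// Hypothetical helper: returns the status after `cmd` is applied to `current`.
/// Repeating a command whose effect is already in place succeeds.
pub fn apply(current: Status, cmd: Cmd) -> Result<Status, String> {
    match (cmd, current) {
        // Assumption: the thread does not say what happens for a running app.
        (Cmd::Install, Status::Running) => Err("assumed: stop the app first".to_string()),
        // Installing an installed-but-stopped app re-runs download and install.
        (Cmd::Install, _) => Ok(Status::Stop),
        // Assumption, same as above.
        (Cmd::Uninstall, Status::Running) => Err("assumed: stop the app first".to_string()),
        // Uninstalling an already uninstalled app is a no-op that succeeds.
        (Cmd::Uninstall, _) => Ok(Status::Uninstalled),
        // Starting an already running app succeeds.
        (Cmd::Start, Status::Stop | Status::Running) => Ok(Status::Running),
        (Cmd::Start, _) => Err("app is not installed".to_string()),
        (Cmd::Stop, Status::Running) => Ok(Status::Stop),
        // Stopping an app that is not running (or is uninstalled) succeeds.
        (Cmd::Stop, other) => Ok(other),
    }
}
```

With operations shaped like this, the recovery logic in item 2 can re-issue the interrupted command without first checking how far the previous attempt got.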

@lurenpluto
Member

I think the issue here is that there is no clear target state for each App. From your description, I think the states of an app should be divided into the following three levels:

- Final target state

This is controlled by external users or components. The series of operations performed within the app-manager are aimed at bringing the app to this state. To achieve this state, there may be a series of corresponding operations.

- Current state

The current state of the app, which is generally a specific state within a state sequence.

- Intermediate target state

Some intermediate target states set to achieve the final target state.

Take the restart of an app as an example. The app-manager currently completes it internally through a stop->start operation sequence. The corresponding three states are as follows:

  • Final target state: Running
  • Intermediate target states: Stopped -> Running
  • Current state: may be further subdivided into sub-states that follow the operation sequence, such as Running->Stopping->Stopped->Starting->Running

After the app-manager restarts or encounters an error, the goal of managing each app's state is to return the app to its final target state. Starting from the current state, the app-manager forms a series of intermediate target states, breaks them into operations, and carries them out. In case of an error, a retry mechanism can be provided.
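
One way to picture this three-level model is the sketch below. It is not taken from the app-manager code: `Stable`, `Current`, `Op`, `resume_point`, and `plan` are hypothetical names, and the rule that an interrupted operation is retried from the stable state it started in is an assumption that relies on the operations being safe to repeat.

```rust
/// States an app can rest in; these are the possible target states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stable {
    Uninstalled,
    Stopped,
    Running,
}

/// Current state, including the in-progress sub-states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Current {
    At(Stable),
    Installing,
    Starting,
    Stopping,
    Uninstalling,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Install,
    Start,
    Stop,
    Uninstall,
}

/// Assumption: an interrupted operation is retried from the stable state it began in.
fn resume_point(current: Current) -> Stable {
    match current {
        Current::At(s) => s,
        Current::Installing => Stable::Uninstalled,
        Current::Starting => Stable::Stopped,
        Current::Stopping => Stable::Running,
        Current::Uninstalling => Stable::Stopped,
    }
}

/// Builds the operations that walk through `targets` in order.
/// The last entry of `targets` is the final target state; earlier
/// entries are the intermediate target states.
pub fn plan(current: Current, targets: &[Stable]) -> Vec<Op> {
    let mut at = resume_point(current);
    let mut ops = Vec::new();
    for &target in targets {
        while at != target {
            let (op, next) = match at {
                Stable::Uninstalled => (Op::Install, Stable::Stopped),
                Stable::Running => (Op::Stop, Stable::Stopped),
                Stable::Stopped if target == Stable::Running => (Op::Start, Stable::Running),
                Stable::Stopped => (Op::Uninstall, Stable::Uninstalled),
            };
            ops.push(op);
            at = next;
        }
    }
    ops
}
```

For the restart example, `plan(Current::At(Stable::Running), &[Stable::Stopped, Stable::Running])` yields a Stop followed by a Start. If the app-manager was interrupted while the app was Starting, the same target list yields only the Start.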

weiqiushi added a commit that referenced this issue Apr 10, 2023
…rted, stop return success when decapp stopped or uninstalled
weiqiushi added a commit that referenced this issue Apr 12, 2023
…hen restoring the state of the app at startup. Optimize the status detection log
@lizhihongTest
Collaborator

@weiqiushi I found that the DEC APP has the following states, which do not include Downloading, although downloading a DEC APP is a long-running network operation. If the app-manager is restarted during a download, for example because the ood-daemon is being updated, there is no Downloading state to record it. Does that mean the download cannot be resumed?

pub enum AppLocalStatusCode {
    Init = 0,
    Installing = 1, 
    InstallFailed = 3,
    NoService = 4,
    Stopping = 5,
    Stop = 6,
    StopFailed = 7,
    Starting = 8,
    Running = 9,
    StartFailed = 10,
    Uninstalling = 11,
    UninstallFailed = 12,
    Uninstalled = 13,
    RunException = 15, 
    ErrStatus = 255,
}

@weiqiushi
Member Author

Currently, downloading is part of the installation logic. If the app-manager is interrupted during installation, AppManager retries the entire installation logic the next time it starts, and the target version of the app is downloaded again. The download itself does not support resuming; it can only be restarted from the beginning.
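
A rough sketch of this startup recovery, using the status codes listed above: each in-progress status maps to the command to retry, and the corresponding failure status is written only if the retry itself fails. `RetryCmd`, `retry_cmd_for`, `failed_status`, and `recover` are hypothetical names, not the actual app-manager API, and the status written on success is left to the caller because the thread does not specify it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppLocalStatusCode {
    Init = 0,
    Installing = 1,
    InstallFailed = 3,
    NoService = 4,
    Stopping = 5,
    Stop = 6,
    StopFailed = 7,
    Starting = 8,
    Running = 9,
    StartFailed = 10,
    Uninstalling = 11,
    UninstallFailed = 12,
    Uninstalled = 13,
    RunException = 15,
    ErrStatus = 255,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryCmd {
    Install,
    Uninstall,
    Start,
    Stop,
}

/// Command to re-run for a status found at startup, if it marks an unfinished operation.
pub fn retry_cmd_for(status: AppLocalStatusCode) -> Option<RetryCmd> {
    match status {
        // Download is part of Install, so retrying Install downloads again.
        AppLocalStatusCode::Installing => Some(RetryCmd::Install),
        AppLocalStatusCode::Uninstalling => Some(RetryCmd::Uninstall),
        AppLocalStatusCode::Starting => Some(RetryCmd::Start),
        AppLocalStatusCode::Stopping => Some(RetryCmd::Stop),
        _ => None,
    }
}

/// Failure status recorded when the retried command fails.
pub fn failed_status(cmd: RetryCmd) -> AppLocalStatusCode {
    match cmd {
        RetryCmd::Install => AppLocalStatusCode::InstallFailed,
        RetryCmd::Uninstall => AppLocalStatusCode::UninstallFailed,
        RetryCmd::Start => AppLocalStatusCode::StartFailed,
        RetryCmd::Stop => AppLocalStatusCode::StopFailed,
    }
}

/// Re-runs the interrupted command through `run` and returns the status to store.
/// `run` stands in for the real executor and reports the resulting status on success.
pub fn recover<F>(status: AppLocalStatusCode, run: F) -> AppLocalStatusCode
where
    F: FnOnce(RetryCmd) -> Result<AppLocalStatusCode, String>,
{
    match retry_cmd_for(status) {
        None => status, // nothing was in progress
        Some(cmd) => run(cmd).unwrap_or_else(|_| failed_status(cmd)),
    }
}
```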

@lizhihongTest
Collaborator

@weiqiushi While a DEC APP was Installing, the file download was interrupted. During the subsequent recovery process, app-manager panicked. Current version: 1.1.0.748-nightly (23-04-13)

[2023-04-13 18:39:18.521233 +08:00] INFO [ThreadId(35)] [service\app-manager\src\app_manager_ex.rs:190] start install sys app!
[2023-04-13 18:39:18.521239 +08:00] INFO [ThreadId(33)] [service\app-manager\src\app_manager_ex.rs:367] find app 9tGpLNndpfRjUF59SsZidVaPuPd8QNFJusQKH8genY3Q status Installing on startup, try install again
[2023-04-13 18:39:18.521289 +08:00] INFO [ThreadId(31)] [component\cyfs-lib\src\ws\session.rs:64] new ws session: sid=72632183, source=ws://127.0.0.1:1319/
[2023-04-13 18:39:18.521423 +08:00] INFO [ThreadId(35)] [service\app-manager\src\app_manager_ex.rs:1107] try get sys app list 9tGpLNnPYrQBpwV6LAksdUptxBNFzRFtts1Acrh9DBij
[2023-04-13 18:39:18.521528 +08:00] INFO [ThreadId(31)] [component\cyfs-lib\src\router_handler\ws\handler.rs:409] ws handler session begin: sid=72632183
[2023-04-13 18:39:18.521577 +08:00] INFO [ThreadId(35)] [component\cyfs-lib\src\router_handler\ws\handler.rs:46] will add ws router handler: chain=handler, category=post_object, id=app_manager_cmd_handler, sid=72632183, routine=true
[2023-04-13 18:39:18.521580 +08:00] INFO [ThreadId(30)] [component\cyfs-lib\src\requestor\tcp.rs:46] tcp connect to 127.0.0.1:1318 success, during=0ms
[2023-04-13 18:39:18.521972 +08:00] INFO [ThreadId(30)] [component\cyfs-lib\src\requestor\tcp.rs:63] http-tcp request to 127.0.0.1:1318 success! during=0ms
[2023-04-13 18:39:18.522002 +08:00] INFO [ThreadId(34)] [component\cyfs-lib\src\router_handler\ws\handler.rs:96] add ws router handler success: chain=handler, category=post_object, id=app_manager_cmd_handler, dec=Some(9tGpLNncauC9kGhZ7GsztFvVegaKwBXoSDjkxGDHqrn6)
[2023-04-13 18:39:18.522026 +08:00] INFO [ThreadId(30)] [component\cyfs-lib\src\non\requestor.rs:294] get object from non service success: 9tGpLNnPYrQBpwV6LAksdUptxBNFzRFtts1Acrh9DBij, object=9tGpLNnPYrQBpwV6LAksdUptxBNFzRFtts1Acrh9DBij
[2023-04-13 18:39:18.522059 +08:00] INFO [ThreadId(30)] [service\app-manager\src\app_manager_ex.rs:521] ###### will install sys apps!
[2023-04-13 18:39:18.524891 +08:00] INFO [ThreadId(33)] [component\cyfs-debug\src\panic\panic.rs:114] stack_hash=
1594ec1846f2797972d9e36ec080400e
[2023-04-13 18:39:18.524907 +08:00] WARN [ThreadId(33)] [component\cyfs-debug\src\panic\panic.rs:31] thread 'async-std/runtime' panicked at 'called `Option::unwrap()` on a `None` value': service\app-manager\src\app_manager_ex.rs:371
0: 0x0000000000a827b8 0x00007ff750b30000
1: 0x0000000000a81d82 0x00007ff750b30000
2: 0x0000000000a07feb 0x00007ff750b30000
3: 0x0000000000e8bc00 0x00007ff750b30000
4: 0x0000000000e8b91b 0x00007ff750b30000
5: 0x0000000000e894cf 0x00007ff750b30000
6: 0x0000000000e8b610 0x00007ff750b30000
7: 0x0000000001093f35 0x00007ff750b30000
8: 0x000000000109402c 0x00007ff750b30000
9: 0x00000000000bc0ad 0x00007ff750b30000
10: 0x0000000000098d1a 0x00007ff750b30000
11: 0x00000000001b021b 0x00007ff750b30000
12: 0x0000000000e09ca2 0x00007ff750b30000
13: 0x0000000000e12013 0x00007ff750b30000
14: 0x0000000000e05b3f 0x00007ff750b30000
15: 0x0000000000e08d95 0x00007ff750b30000
16: 0x0000000000e05f18 0x00007ff750b30000
17: 0x0000000000e04449 0x00007ff750b30000
18: 0x0000000000e148e1 0x00007ff750b30000
19: 0x0000000000e9941c 0x00007ff750b30000
20: 0x0000000000017bd4 0x00007ff9654e0000
21: 0x000000000006ced1 0x00007ff9655e0000

@weiqiushi
Member Author

This is caused by a piece of logic that was previously overlooked:

When AppManager receives an AppCmd from a user, it sets the App status in advance, before the command is executed, so that the status is synchronized as early as possible:

    // Set the status here first, in case a later command reads it
    let next_cmd = &cmd_group[0].0;
    let next_status_code =
        AppCmdExecutor::get_next_status_with_cmd(next_cmd.cmd()).unwrap();
    info!(
        "cmd accept [{:?}], will change status from [{}] to [{}], app:{}, cmd groups: {:?}",
        cmd_code, status_code, next_status_code, app_id, cmd_group_code
    );
    status.set_status(next_status_code);
    status_clone = status.clone();
}
let _ = self.non_helper.put_local_status(&status_clone).await;

When it receives the Install command and sets the Installing status, it does not also set the version number to be installed, so the stored status is Installing with no version number.
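
A minimal sketch of the idea behind the fix, assuming the status record carries an optional version: when an Install command is accepted, the status and the target version are written in the same update. `LocalStatus`, `StatusCode`, `Cmd`, and `accept` are hypothetical names for illustration and do not mirror the real AppManager types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusCode {
    Init,
    Installing,
    Uninstalling,
}

/// Hypothetical command type; Install carries the version to install.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Cmd {
    Install { version: String },
    Uninstall,
}

/// Hypothetical status record stored for each app.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocalStatus {
    pub status: StatusCode,
    pub version: Option<String>,
}

impl LocalStatus {
    /// Records the in-progress status for an accepted command.
    /// For Install, the target version is stored together with the status,
    /// so a later recovery pass never sees Installing without a version.
    pub fn accept(&mut self, cmd: &Cmd) {
        match cmd {
            Cmd::Install { version } => {
                self.status = StatusCode::Installing;
                self.version = Some(version.clone());
            }
            Cmd::Uninstall => self.status = StatusCode::Uninstalling,
        }
    }
}
```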

@lizhihongTest
Collaborator

If installing the DEC APP fails and the status is InstallFailed, and I then perform CmdCode::Install again, app-manager uninstalls and then reinstalls the app. There are two problems in this process:

  • app-manager panics:
CYFS service panic report: 
product:cyfs-service
service:app-manager
bin:app-manager.exe
channel:nightly
target:x86_64-pc-windows-msvc
version:1.1.0.0-nightly (23-04-13)
msg:{"msg":"thread 'async-std/runtime' panicked at 'called `Option::unwrap()` on a `None` value': service\\app-manager\\src\\app_cmd_executor.rs:440\n0: 0x0000000000a834e8 0x00007ff75e860000\n1: 0x0000000000a82ab2 0x00007ff75e860000\n2: 0x0000000000a09b4b 0x00007ff75e860000\n3: 0x0000000000e8cea0 0x00007ff75e860000\n4: 0x0000000000e8cbbb 0x00007ff75e860000\n5: 0x0000000000e8a76f 0x00007ff75e860000\n6: 0x0000000000e8c8b0 0x00007ff75e860000\n7: 0x00000000010929e5 0x00007ff75e860000\n8: 0x0000000001092adc 0x00007ff75e860000\n9: 0x000000000010ea81 0x00007ff75e860000\n10: 0x00000000000bfe30 0x00007ff75e860000\n11: 0x0000000000099259 0x00007ff75e860000\n12: 0x00000000001adafb 0x00007ff75e860000\n13: 0x0000000000e0b052 0x00007ff75e860000\n14: 0x0000000000e133c3 0x00007ff75e860000\n15: 0x0000000000e0707f 0x00007ff75e860000\n16: 0x0000000000e0a035 0x00007ff75e860000\n17: 0x0000000000e072c8 0x00007ff75e860000\n18: 0x0000000000e05899 0x00007ff75e860000\n19: 0x0000000000e15801 0x00007ff75e860000\n20: 0x0000000000e9a6bc 0x00007ff75e860000\n21: 0x0000000000017bd4 0x00007ff9654e0000\n22: 0x000000000006ced1 0x00007ff9655e0000","msg_with_symbol":"thread 'async-std/runtime' panicked at 'called `Option::unwrap()` on a `None` value': service\\app-manager\\src\\app_cmd_executor.rs:440\n   0: <unknown>\n   1: <unknown>\n   2: <unknown>\n   3: <unknown>\n   4: <unknown>\n   5: <unknown>\n   6: <unknown>\n   7: <unknown>\n   8: <unknown>\n   9: <unknown>\n  10: <unknown>\n  11: <unknown>\n  12: <unknown>\n  13: <unknown>\n  14: <unknown>\n  15: <unknown>\n  16: <unknown>\n  17: <unknown>\n  18: <unknown>\n  19: <unknown>\n  20: <unknown>\n  21: BaseThreadInitThunk\n  22: RtlUserThreadStart\n","hash":"b112d1992146f99f821183717090d7c5"}
  • If app-manager is interrupted during the Uninstalling step of this reinstall, then after it restarts it only uninstalls the app and does not install it again.

weiqiushi added a commit that referenced this issue Apr 13, 2023
…d from user, Compatibility with previous incorrect state settings when uninstall App
@lizhihongTest
Collaborator

@weiqiushi It seems that this modification cannot fully solve the problem. To improve the state machine of app-manager, it needs to be restructured along the lines of the three-level state model @lurenpluto suggested (final target state, current state, intermediate target states).

@weiqiushi
Member Author

This is caused by the bug described above. In that case, AppManager is left with an incorrect status: Installing, but with no version number.

In this state:

  1. when Uninstall is executed, it panics because the version number is not found
  2. because the uninstall process panics, the subsequent install operation cannot be performed

The above problem has been fixed in commit 314d358.
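
The thread does not show how commit 314d358 handles the legacy state, so the following is only a sketch of one way to tolerate it: instead of calling `unwrap()` on the stored version, the uninstall path branches on whether a version was recorded. `UninstallPlan` and `plan_uninstall` are hypothetical names.

```rust
/// What the uninstall step should do, depending on the stored version.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UninstallPlan {
    /// Remove the files that belong to this recorded version.
    RemoveVersion(String),
    /// No version was recorded (the incorrect legacy state): skip
    /// version-specific cleanup instead of panicking.
    NothingRecorded,
}

/// Hypothetical helper that replaces `version.unwrap()` with an explicit branch.
pub fn plan_uninstall(version: Option<&str>) -> UninstallPlan {
    match version {
        Some(v) => UninstallPlan::RemoveVersion(v.to_string()),
        None => UninstallPlan::NothingRecorded,
    }
}
```

Handling the `None` case explicitly lets Uninstall finish, so the install step that follows it in a reinstall can still run.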

@weiqiushi
Member Author

Implementing the suggested target-state model requires refactoring AppManager's entire state-handling logic, which may take 1-2 major releases to complete. This refactoring will not be scheduled in the near future.

@lizhihongTest
Collaborator

Problem #207 occurs when "npm i" is interrupted while a DEC APP is Installing.

@lizhihongTest
Collaborator

Associated issue #212

@lizhihongTest
Collaborator

lizhihongTest commented Apr 15, 2023

@weiqiushi You can create a new task for the refactoring. This task only adds recovery from interrupted operations. Some of the other problems may require redesigning app-manager to solve.

@lizhihongTest
Collaborator

Associated issue #213

streetycat pushed a commit to streetycat/CYFS that referenced this issue May 12, 2023
…ady started, stop return success when decapp stopped or uninstalled
streetycat pushed a commit to streetycat/CYFS that referenced this issue May 12, 2023
…ding command when restoring the state of the app at startup. Optimize the status detection log
streetycat pushed a commit to streetycat/CYFS that referenced this issue May 12, 2023
streetycat pushed a commit to streetycat/CYFS that referenced this issue May 12, 2023
… command from user, Compatibility with previous incorrect state settings when uninstall App
Projects
Status: Done

3 participants