Description
Meg proposed this, and I want to make sure the metald side has the facilities in place to support it.
During the rollback operation, the following failure points need to be addressed to ensure robust error handling:
- Step 1 (DB Check): If checking the partition DB for VMs fails due to connectivity issues, the RPC should return a failure response immediately, and no changes should be made to the system state.
- Step 3 (VM Provisioning): If VM provisioning fails (e.g., insufficient capacity), the RPC should return a failure response immediately, preventing hostname switching and maintaining the current active deployment.
- Step 4 (Hostname Switching): If the database transaction fails during hostname switching, it should trigger an automatic rollback. Successfully booted VMs should remain running for future requests, the RPC should return a failure response, and no traffic routing changes should occur.
We need to ensure that each of these failure points logs enough context (the failing step, the deployment involved, and the underlying error) to make debugging easier.