forked from cloudfoundry/docs-cf-admin
-
Notifications
You must be signed in to change notification settings - Fork 2
/
diego-cell-upgrade.html.md.erb
37 lines (21 loc) · 2.49 KB
/
diego-cell-upgrade.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
---
title: Managing Diego Cell Limits During Upgrade
owner: Diego
---
Cloud Foundry supports rolling upgrades in high availability environments. A rolling upgrade means that you can continue to operate an existing Cloud Foundry deployment and its running app instances while upgrading the platform.
<p class="note"><strong>Note</strong>: Rolling upgrade is not available in your deployment if you have not configured your deployment to be highly available. See <a href="../concepts/high-availability.html">High Availability in Cloud Foundry</a>.
</p>
## <a id="process"></a> Diego Cell Upgrade Process
To upgrade Cloud Foundry, BOSH must drain all Diego cell VMs that host app instances. BOSH manages this process by upgrading a batch of cells at a time.
The number of cells that undergo upgrade simultaneously (either in a state of shutting down or coming back online) is controlled by the `max_in_flight` value of the Diego cell job. For example, if `max_in_flight` is set to `10%` and your deployment has 20 Diego cell job instances, then the maximum number of cells that BOSH can upgrade at a single time is `2`.
When BOSH triggers an upgrade, each Diego cell undergoing upgrade enters "evacuation" mode. Evacuation mode means that the cell stops accepting new work and signals the rest of the Diego system to schedule replacements for its app instances. This scheduling is managed by the [Diego auctioneer process](../concepts/diego/diego-auction.html).
The evacuating cells continue to interact with the Diego system as replacements come online. The cell undergoing upgrade waits until either its app instance replacements run successfully before shutting down the original local instances, or for the evacuation process to time out. This "evacuation timeout" defaults to 10 minutes.
If cell evacuation exceeds this timeout, then the cell stops its app instances and shuts down. The Diego system continues to re-emit start requests for the app replacements.
## <a id="limitations-considerations"></a> Preventing Overload
A potential issue arises if too many app instance replacements are slow to start or do not start successfully at all.
If too many app instances are started at once, then the load of these starts on the rest of the system can cause other applications that are already running to crash and be rescheduled. These events can result in a cascading failure.
<%= partial 'diego-upgrade-limits' %>
<% if vars.product_name == 'CF' %>
<% else %>
<%= partial '../opsguide/diego-upgrade-limits-table' %>
<% end %>