Ok, so this is just a quick write-up explaining, at a high level, the process of enabling the High Availability feature of Edge Gateways in vCloud Director, and some things you should know if you deploy them. vCloud Edges are fundamentally the same as NSX Edges; however, they are controlled by vCloud and are nowhere near as feature-rich as full NSX Edges (although they are catching up). They do, however, expose the High Availability flag, which allows for device redundancy that is pretty essential for these devices. When it is enabled and a fault occurs (the Edge crashes or becomes unavailable), a redundant device will take over after 15 seconds.
How do I enable it?
To enable High Availability on an Edge Gateway, open the Properties of the Edge Gateway (Administration > Org VDC > Edge Gateways) and select Enable High Availability.
How does it work?
Edge Gateway peers exchange heartbeat messages with each other over one of the internal interfaces. This is important because it means at least one internal interface/network must be configured (discussed later). vCloud does not expose the HA configuration parameters, so the NSX defaults apply: in NSX 6.3.0 the default dead time is 15 seconds, which means that in the event of a failure it takes 15 seconds for the secondary to kick in.
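To make the dead-time idea concrete, here is a minimal sketch in Python of how a standby peer could decide to take over once heartbeats stop arriving. It is only an illustration and not how NSX implements it: the class and method names are hypothetical, and the only value taken from the text above is the 15-second default.

```python
from dataclasses import dataclass

DEAD_TIME_SECONDS = 15.0  # default dead time quoted above for NSX 6.3.0


@dataclass
class StandbyPeer:
    """Hypothetical model of a standby node watching its peer's heartbeats."""

    last_heartbeat: float                 # time (in seconds) the last heartbeat arrived
    dead_time: float = DEAD_TIME_SECONDS

    def record_heartbeat(self, now: float) -> None:
        # Every heartbeat received on the internal interface resets the timer.
        self.last_heartbeat = now

    def should_take_over(self, now: float) -> bool:
        # The peer is only declared dead after a full dead time of silence.
        return (now - self.last_heartbeat) >= self.dead_time


peer = StandbyPeer(last_heartbeat=0.0)
peer.record_heartbeat(now=10.0)
print(peer.should_take_over(now=20.0))  # False: only 10 s of silence
print(peer.should_take_over(now=25.0))  # True: 15 s of silence
```

The point of the sketch is simply that the standby does nothing until a whole dead-time window has elapsed without a heartbeat, which is where the 15-second failover delay comes from.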
When HA is enabled, the following happens behind the scenes:
- NSX deploys a new Edge, vse-EdgeGatewayName-1, under the System vDC Resource Pool; initially it is named based on the Edge Id in NSX
- Finally, the Edge is renamed in vSphere, and there will be two Edges in the System vDC in vCenter, labelled “-0” and “-1”
Once HA has been enabled, it is important to note that this does not mean it is actually doing anything. There may be two VMs deployed, but that alone does not mean HA is operating. The HA Status of the Edge Gateway has three states:
- Not Active – The High Availability checkbox has been checked but HA is disabled (discussed later)
- Up – HA is actually configured correctly
Until an Org VDC Network is added, the secondary node just sits there unconfigured, consuming CPU cycles.
Once an Org VDC Network has been attached, HA will be operating.
How do I verify that it is running?
Log on to the Edge Gateway console and, from the CLI, execute show service highavailability. This shows the status of the node (Standby for the non-active node and Active for the current master), as well as the status of the cluster and the configuration.
When no Org VDC Networks are present, show service highavailability will show Disabled. While it is disabled, if the Edge dies, the surviving Edge does not update its configuration and just sits there doing nothing.
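If you want to check this across a number of Edges, a small helper can pick the state out of text pasted from the console. The sketch below is only a keyword scan in Python for the three words mentioned above (Active, Standby, Disabled). The function name is hypothetical, and it assumes nothing about the real layout of the command's output, so treat it as a starting point to adapt against what your Edge actually prints.

```python
import re

# Keywords taken from the description above. "Disabled" is checked first so
# that it wins if it appears anywhere in the pasted text.
_STATES = ("Disabled", "Standby", "Active")


def ha_state(cli_text: str) -> str:
    """Return the first HA keyword found in pasted CLI text, else 'Unknown'."""
    for state in _STATES:
        # Whole-word, case-insensitive match.
        if re.search(rf"\b{state}\b", cli_text, flags=re.IGNORECASE):
            return state
    return "Unknown"


# The inputs here are bare keywords, not real command output.
print(ha_state("Disabled"))  # Disabled
print(ha_state("standby"))   # Standby
print(ha_state(""))          # Unknown
```

Anything that comes back as Disabled or Unknown is worth looking at by hand, since that is the state in which a surviving Edge will not take over.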
High Availability does nothing unless an Org VDC Network is connected; why does this matter?
Ok, so this seems fairly logical, right? If you just have External Networks attached to the Edge and no Org VDC Networks, then you don’t need High Availability... but there is a use case for an Edge Gateway with only External Networks, and the way it’s displayed, an admin might think that HA is working even though it’s not! The reason it doesn’t operate is that the heart-beating is done via the internal NICs, and if there isn’t one then obviously it can’t operate.
In 99% of use cases you will have an Org VDC Network connected to an Edge Gateway; however, Edge Gateways have a bunch of awesome network features that can be leveraged without a connection to an Org VDC Network.
One such use case (which is how this post came about) is a customer consuming IaaS on vCloud who needs some physical servers, installed in VLAN-backed physical networks, to be plumbed into vCloud behind a firewall. An Edge is a great fit for this, as the customer can manage the firewall rules for this service themselves, and two Org Networks can be bound together with the Edge acting as a firewall. There are other ways to do this, but an Edge is a cheap and easy way to achieve it.
So if you use Edges in this manner and require HA, create a dummy Org VDC Network (e.g. one labelled HA-Heartbeat) and attach it to the Edge.
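The rule behind this workaround can be written down as a tiny check. The following Python sketch is a simplified model, not anything exposed by vCloud or NSX: the EdgeInterface type, the "external"/"internal" labels and the "HA not enabled" string are all made up for illustration, while "Not Active" and "Up" are the status names described earlier. It also assumes both Edge VMs are otherwise healthy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeInterface:
    """Hypothetical description of one network attached to an Edge."""

    name: str
    kind: str  # "external", or "internal" for an Org VDC Network


def expected_ha_status(ha_checked: bool, interfaces: List[EdgeInterface]) -> str:
    """Predict the HA status from the rule that heartbeats need an internal NIC."""
    if not ha_checked:
        return "HA not enabled"  # label used by this sketch only
    has_internal = any(i.kind == "internal" for i in interfaces)
    return "Up" if has_internal else "Not Active"


# A firewall-only Edge with two External Networks and the HA box checked.
edge = [EdgeInterface("Physical-VLAN", "external"),
        EdgeInterface("Internet", "external")]
print(expected_ha_status(True, edge))  # Not Active

# The same Edge after attaching the dummy heartbeat network.
edge.append(EdgeInterface("HA-Heartbeat", "internal"))
print(expected_ha_status(True, edge))  # Up
```

The network names in the example are placeholders; the only thing that changes the outcome is whether at least one internal network is attached.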
- Edge Gateway HA only operates if an Org VDC Network is attached to the Edge
- Dead time/failover in the event of a failure is 15 seconds by default in NSX 6.2.4/NSX 6.3.0
- If you do need it, it’s pretty low maintenance: set and forget
- Don’t enable it if you don’t need it; it consumes CPU and memory