So I would call this a bug/design issue; however, VMware have just noted it in a KB. BEWARE of any use of the VXLAN Tunnel End Point (VTEP) port (UDP 8472 or 4789 by default) by ANY virtual machine hosted on an NSX cluster, regardless of whether it is on a VXLAN segment or a standard Port Group, as the traffic will be dropped by the host with a VLAN mismatch. This affects all outbound traffic, i.e. connections from machines inside ESXi with a destination port that matches the VTEP port (e.g. UDP 4789).
VMware today updated KB2079386 to state “VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.” This was the result of a very long-running support call in which a VM on a VLAN-backed Port Group was having traffic on UDP 8472 silently dropped without explanation. The KB is not quite accurate; it should read “VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.” This is because the hypervisor will drop outbound packets whose destination is the VTEP port, regardless of whether that port is 8472, 9871, etc.
Why is this an issue?
The VXLAN standard (described in RFC 7348) has been implemented by a number of vendors for things other than NSX. One such example is physical Sophos wireless access points, which use the VXLAN standard for “Wireless Zones” and communicate with the Sophos UTM (which can be a virtual machine) on UDP port 8472. If the UTM is running on a host that has NSX deployed, it simply won’t work, even if it is running on a Port Group that has nothing to do with NSX.
There are surely other products using this port, which raises the question: as a cloud provider, or even as an IT group, how do you explain to customers that they can’t host that product on ESXi if NSX is deployed, even when the traffic is not on the VLAN carrying the VXLAN encapsulation?! The feedback from VMware support regarding this issue has been that these are reserved ports and should not be used…
What’s going on?
- Host is ESXi 6.5a running NSX 6.3
- My VTEP Port is set to 4789 (UDP)
- NSX is enabled on cluster “NSX”
- I have a VM “VLANTST-INSIDE” running on host labesx1 (which is part of the NSX cluster) on dvSwitch Port Group “dv-VLAN-101”, which is a VLAN-backed (non-VXLAN) Port Group
- I have a VM “VLANTEST” running outside of ESXi on the same VLAN
With a UDP server listening on UDP 4789 on the test machine inside ESXi, the external machine can connect without dramas.
When the roles are reversed, the behaviour changes: with a UDP server listening on UDP 4789 on the machine external to ESXi, the initial connection attempt can be seen but no traffic is observed.
When attempting the same on any other port, there are no issues.
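The test above can be reproduced with a few lines of Python. This is only a minimal sketch and not the tool I used for the tests; the helper names `bind_udp`, `serve_udp_echo` and `probe_udp` are my own, and `VTEP_PORT` assumes the VTEP port is set to 4789 as in this lab.

```python
import socket

VTEP_PORT = 4789  # assumption: the VTEP port configured in this lab


def bind_udp(host, port):
    """Return a UDP socket bound to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve_udp_echo(sock, max_packets=1):
    """Echo max_packets datagrams back to whoever sent them, then return."""
    for _ in range(max_packets):
        data, sender = sock.recvfrom(2048)
        sock.sendto(data, sender)


def probe_udp(host, port, payload=b"vtep-port-test", timeout=2.0):
    """Send one datagram; return True only if the same bytes are echoed back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            reply, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts and ICMP-triggered socket errors
            return False
        return reply == payload
```

On the server side you would run something like `serve_udp_echo(bind_udp("0.0.0.0", VTEP_PORT), max_packets=10)`, and on the client side `probe_udp("<server address>", VTEP_PORT)`. Going by the behaviour described above, I would expect the probe to return `False` when the client is the VM inside ESXi and the destination port is the VTEP port, and `True` on any other port.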
So if we run pktcap-uw --capture Drop on the ESXi host labesx1.pigeonnuggets.com, we can see that the packets are being dropped by the hypervisor with the drop reason ‘VlanTag Mismatch’.
It appears that the network stack is inspecting packets for the VTEP UDP port and dropping them if they are not on the VLAN that carries VXLAN, regardless of whether the payload is actually VXLAN; if the destination port is the VTEP port and the packet is on any other VLAN, it will be dropped.
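To make that concrete, here is a rough model of the rule as I understand it from the captures. It is a guess at the observable behaviour only, not how ESXi actually implements it; the class and function names are hypothetical, and the transport VLAN of 100 is a made-up value for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundUdpPacket:
    """A deliberately simplified view of an outbound UDP packet from a VM."""
    dst_port: int
    vlan_id: int


def appears_dropped(pkt, vtep_port=4789, vxlan_transport_vlan=100):
    """Model of the observed behaviour (an assumption, not ESXi internals).

    A packet addressed to the VTEP port is dropped unless it is on the VLAN
    that carries VXLAN; the payload is never looked at.
    vxlan_transport_vlan=100 is a hypothetical value.
    """
    return pkt.dst_port == vtep_port and pkt.vlan_id != vxlan_transport_vlan
```

With these assumed values, a packet from the VM on VLAN 101 to UDP 4789 is modelled as dropped, while the same packet to UDP 4790 is not, which matches what the tests showed.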
What are the options?
So the only option I have found to resolve this is to change your VTEP port, which is not ideal, but there are not really many options at this time. If a product is conflicting, log on to vCenter and select Networking & Security > Installation > Logical Network Preparation > VXLAN Transport > Change.
This is a non-disruptive change and won’t affect your VXLAN payloads. Hopefully this will be fixed at some point…