vSAN Health PRTG Sensor

I just wanted to create a quick post on a PRTG Network Monitor sensor that I created to provide some monitoring for vSAN deployed in my lab. The script is not extensively tested as I am just using it in the lab, however hopefully you will get some value from it. The sensor invokes a vSAN health check against a vSAN cluster and reports any failures or warnings for components.

The sensor is available from my GitHub and uses the VSAN.psm1 module from Roman Gelman (@rgelman75), which is available from here.

Sensor Setup

  1. Download PRTG-vSphere-vSANHealth.ps1 from here and place it in C:\Program Files (x86)\PRTG Network Monitor\Custom Sensors\EXEXML\ on your Probe
  2. Download the prtg.customlookups.vsan.status.ovl from here (Custom Lookup for PRTG) and place this file in C:\Program Files (x86)\PRTG Network Monitor\lookups\custom\ on your Central Probe
  3. Download VSAN.psm1 from here and place this in C:\Program Files (x86)\PRTG Network Monitor\Custom Sensors\EXEXML\ on your Probe
  4. Logon to PRTG and select Setup > Administrative Tools > Load Lookup and File Lists

  5. Next, against your vCenter device, create a new Sensor of type EXE/Script Advanced
  6. Enter the following parameters and then click Continue to finish the setup:

    • Sensor Name: The name of the sensor (e.g. VSAN Cluster Status – Development)
    • EXE/Script: PRTG-vSphere-vSANHealth.ps1
    • Parameters: -vCenterServer "<FQDN of your vCenter>" -vSANCluster "<Name of the vSAN Cluster>"
    • Security Context: Use Windows Credentials of parent device
    • Timeout (Sec): 300
After configuration it should start reporting the status, and you can set up notifications, monitoring frequency and channel limits as desired.

Troubleshooting
  • The user executing the script (the Windows account specified in PRTG) must have sufficient rights on the vCenter Server
  • PRTG runs all PowerShell scripts under the 32-bit image; this will cause PowerCLI not to run unless the modules are installed under the x86 Modules folder. The easiest way to fix this is to run: Save-Module -Name VMware.PowerCLI -Path %SystemRoot%\SysWOW64\WindowsPowerShell\v1.0\Modules\
  • If you encounter a number of virtual machines named “vsan-healthcheck-disposable….” in your environment, this is due to the timeout value (which is statically set to 2 seconds) not being high enough in the VSAN.psm1 module; find any references to VsanQueryVcClusterHealthSummary and replace the integer 2 with something higher (e.g. 30 seconds)
The Code
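Below is a minimal sketch of the sensor’s shape rather than the published script itself; it assumes PowerCLI’s Test-VsanClusterHealth cmdlet (available from PowerCLI 6.5.1) in place of the VSAN.psm1 function the real script uses, and assumes the custom lookup maps 0/1/2/3 to Green/Yellow/Red/Unknown:

param(
    [Parameter(Mandatory=$true)][string]$vCenterServer,
    [Parameter(Mandatory=$true)][string]$vSANCluster
)
Import-Module VMware.PowerCLI

# Connects using the Windows credentials of the PRTG security context (SSPI)
Connect-VIServer -Server $vCenterServer | Out-Null

# Run the vSAN health check (a stand-in for the VSAN.psm1 call in the real script)
$health = Test-VsanClusterHealth -Cluster (Get-Cluster -Name $vSANCluster)

# Map the overall health colour to the numeric states assumed for the lookup
$value = switch ($health.OverallHealthStatus) {
    "Green"  { 0 }
    "Yellow" { 1 }
    "Red"    { 2 }
    default  { 3 }
}

# PRTG EXE/Script Advanced sensors return their results as XML on stdout
Write-Output "<prtg><result><channel>vSAN Health</channel><value>$value</value><ValueLookup>prtg.customlookups.vsan.status</ValueLookup></result></prtg>"
Disconnect-VIServer -Server $vCenterServer -Confirm:$false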

 

Configuring Storage IO Control IOPS Capacity for vCloud Director Org VDC Storage Profiles

Recently I began a small project to expose and control Storage IO Control in vCloud Director 8.20+. In order to leverage the capabilities there are a few things that need to be configured/considered. Before you begin you need to determine the capabilities (IOPS) of each of your datastores; this is set as a custom attribute on each datastore and is exposed to vCloud Director as the “Available IOPS” for a datastore. There are a few things to note before you begin:

  1. You cannot enable IOPS support on a VMware Virtual SAN datastore
  2. You cannot enable IOPS support if the Storage Profile contains datastores that are part of a Storage DRS cluster; if any datastore in the Storage Profile is in an SDRS cluster you can’t leverage SIOC in vCloud Director
  3. Each Datastore can have a value set between 200-4000 (IOPS)
  4. You need to have vSphere Administrator rights on the vCenters hosting the datastores to complete the below
  5. The tagged datastores must be added to a SIOC enabled Storage Policy which is mapped to vCloud as a Provider VDC Storage Profile
  6. The Organisational VDC Storage Profile can then have SIOC capabilities set against it using the REST API (or PowerShell using my vCloud Storage Profile module)

Step 1. Set the iopsCapacity Custom Attribute

In order to expose SIOC in vSphere to vCloud Director, custom attributes have to be added to the datastores using the vSphere Managed Object Browser (MOB) as outlined in VMware KB2148300; however it’s much easier to do this through the vSphere Client or vSphere Web Client.

  1. Logon to the vSphere H5 Client (https://<vCenter>/ui), select Tags & Custom Attributes from the menu, then select Custom Attributes and click Add
  2. Enter the attribute iopsCapacity, select the Type Datastore and click Add
  3. Next select Storage from the main menu, select each datastore for which you wish SIOC capabilities to be exposed in vCloud, and from the Actions menu select Tags & Custom Attributes > Edit Custom Attributes
  4. Set the value for iopsCapacity and click OK
  5. Next, tag the datastores with a relevant tag and create a new Storage Profile with the VMware Storage IO Control provider for the SIOC-enabled datastores (a PowerCLI alternative to the attribute steps is sketched below)
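If you would rather script the attribute steps than click through the UI, a PowerCLI sketch along these lines should work, assuming your PowerCLI version supports datastore-typed custom attributes (the datastore names are placeholders):

Import-Module VMware.PowerCLI

# Create the iopsCapacity custom attribute for datastores (one-off operation)
New-CustomAttribute -Name "iopsCapacity" -TargetType Datastore

# Set the IOPS capacity (200-4000) on each SIOC-enabled datastore
Get-Datastore -Name "SIOC-DS-01","SIOC-DS-02" | Set-Annotation -CustomAttribute "iopsCapacity" -Value 4000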

Step 2. Configure Storage Profiles in vCloud Director

  1. After this has been set on all the relevant datastores, logon to vCloud Director and select vCenters > Refresh Storage Policies
  2. Add the Storage Profile to the relevant Organizations (Organizations > Organisational VDCs > Storage Policies)
  3. Review the Provider VDC and confirm that the IopsCapacity value shows a non-zero value when using the Get-ProviderVdcStorageProfile cmdlet (open PowerShell, connect to vCloud Director and import the module Module-vCloud-SIOC.psm1 available from here)
  4. Set the Storage IO Control settings using the Set-OrgVdcStorageProfile cmdlet:

$objOrgVDCStorageProfile = Get-OrgVdcStorageProfile -OrgName "PigeonNuggets" | ? {$_.Name -eq "SIOC"}
$objOrgVDCStorageProfile | Set-OrgVdcStorageProfile -SIOCEnabled $true -DiskIopsMax 1000 -DiskIopsDefault 100 -DiskIopsPerGBMax 100

The Org VDC Storage Profile is now configured for SIOC, which is implemented in vSphere. SIOC as implemented in vCloud Director needs further work (manually tagging the datastores with capabilities, and API-only exposure, is a bit rough); however the capabilities are beginning to be exposed, and further configuration can be made on individual virtual disks via the API (hopefully I will get to this in the near future). Hopefully this is of some value for you. #LongLiveVCD

Copy-VMGuestFile “The request was aborted” Exception – Mystery Solved !

Today I thought I would examine an issue that has come up from time to time over the past 6 or 7 years which I have never sat down and examined properly: when running the PowerCLI Copy-VMGuestFile cmdlet for larger files, an exception is thrown: “The request was aborted: The request was cancelled”.

The fix is simple: the WebOperationTimeoutSeconds parameter for PowerCLI needs to be increased from the default of 300 seconds to a larger value to allow enough time for the transfer to complete, or set to infinite. A big thanks to @vmkdaily on the VMware {code} Slack for the solution.

To set to infinite simply execute Set-PowerCLIConfiguration -WebOperationTimeoutSeconds -1 as an Administrator, restart PowerCLI and then rerun the cmdlet.

Some further information
Below is a summary of the mechanism of how the GuestFileManager and other guest processes work for transferring files between the guest and the local machine via the API. The GuestOperationsManager is a really important and versatile library as it allows for two-way execution and information passing between guests and orchestration platforms WITHOUT network connectivity between the management plane and the virtual machine, which becomes particularly useful for operations at scale. I use Invoke-VMScript and Copy-VMGuestFile quite a bit for everything from automating VM Hardware upgrades and guest reconfiguration/customisation to shipping and executing patches or build automation for customers in a Service Provider environment where I have no network inter-connectivity to their environments.

The VIX API operations were moved into the vSphere API in vSphere 5, and the process flow for a call is as follows:

  1. A vSphere Web services client program calls a function in the vSphere API.
  2. The client sends a SOAP command over https (port 443) to vCenter Server.
  3. The vCenter Server passes the command to the host agent process hostd, which sends it to VMX.
  4. VMX relays the command to VMware Tools in the guest.
  5. VMware Tools has the Guest OS execute the guest operation.

In PowerCLI this is exposed via the GuestOperationsManager. For the Copy-VMGuestFile cmdlet a call is made to the $GuestOperationsManager.FileManager.InitiateFileTransferXXXX methods with the Virtual Machine object, the guest authentication details, the destination where the file will be placed on the guest (plus filename and attributes) and the size of the file. The returned value is a unique URI with a Token/Session ID which is used via an HTTP PUT to upload the file. The below is an example if you wish to have a play around:
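A minimal sketch of the to-guest direction (the VM name, file paths and guest credentials below are placeholders rather than anything from the original listing):

Import-Module VMware.PowerCLI

$vm = Get-VM -Name "MyTestVM"                  # placeholder VM name
$localFile = Get-Item "C:\Temp\payload.zip"    # local file to push to the guest

# Guest OS credentials used by VMware Tools to perform the operation in-guest
$auth = New-Object VMware.Vim.NamePasswordAuthentication
$auth.Username = "Administrator"
$auth.Password = "GuestPassword"
$auth.InteractiveSession = $false

# Get the GuestOperationsManager and its FileManager from the service content
$gom = Get-View $global:DefaultVIServer.ExtensionData.Content.GuestOperationsManager
$fileMgr = Get-View $gom.FileManager

# InitiateFileTransferToGuest returns a one-time URI containing a session token
$fileAttr = New-Object VMware.Vim.GuestFileAttributes
$uri = $fileMgr.InitiateFileTransferToGuest($vm.ExtensionData.MoRef, $auth, "C:\Temp\payload.zip", $fileAttr, $localFile.Length, $true)

# The URI may contain a '*' placeholder for the server name; substitute it
$uri = $uri.Replace("*", $global:DefaultVIServer.Name)

# Upload the file content with an HTTP PUT against the returned URI
Invoke-WebRequest -Uri $uri -Method Put -InFile $localFile.FullName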


PowerCLI : Get/Set cmdlets for DNS Configuration of the vCenter Server Appliance

This week, after responding to a post on the VMware Community forums, I started playing around with the vSphere REST API. I have quite a bit of experience with the vCloud REST API, which is the basis for all of the vCloud PowerCLI code, but not a lot with JSON or the vSphere API. So I have thrown together some methods, with an aim for reuse across all vSphere APIs, that I thought I would share. The module is named Module-vCSA-Administration.psm1 and at present contains three main methods:

  • Connect-VIServerREST : Establishes a connection to the vSphere REST API on a vCenter Server
  • Get-VCSANetworkConfigDNS : Get the DNS Networking configuration for the vCSA
  • Set-VCSANetworkConfigDNS : Amend the DNS Networking configuration for the vCSA

The support methods in the module (Get-vSphereRESTResponseJSON, Update-vSphereRESTDataJSON, Publish-vSphereRESTDataJSON) provide a very simple wrapper that can be used to make any REST API call and extend the functionality. I hope someone finds these useful; I will probably extend these modules further, but this was a good introduction to playing with the JSON-based REST API for vSphere.

Usage
I generally try to document as much as possible, so you can use the Get-Help cmdlet to review the help for each of the cmdlets (e.g. Get-Help Connect-VIServerREST).

In order to use the module you must first load it using the Import-Module <Path to Module> cmdlet.

Use Connect-VIServerREST to connect to the REST API. The REST API uses the vSphere SSO service to authenticate sessions, so use the credentials you would normally use to logon. The method will connect to the REST API and store the session information in a global variable $DefaultVIServerREST for use by the other methods.



Once the connection is established you can use the other methods to retrieve or set the DNS configuration either setting to DHCP or setting static servers using the Get-VCSANetworkConfigDNS or Set-VCSANetworkConfigDNS cmdlets.
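For reference, here is a rough sketch of the raw calls the module wraps, using the vSphere 6.5 appliance REST endpoints (the vCSA FQDN is a placeholder and certificate trust is assumed):

$vcsa = "vcenter.pigeonnuggets.com"    # placeholder vCSA FQDN
$cred = Get-Credential

# Authenticate against the CIS session service using HTTP Basic authentication
$basic = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes("$($cred.UserName):$($cred.GetNetworkCredential().Password)"))
$session = Invoke-RestMethod -Method Post -Uri "https://$vcsa/rest/com/vmware/cis/session" -Headers @{ Authorization = "Basic $basic" }

# Subsequent calls pass the returned token in the vmware-api-session-id header
$headers = @{ "vmware-api-session-id" = $session.value }

# Retrieve the current DNS server configuration of the appliance
Invoke-RestMethod -Method Get -Uri "https://$vcsa/rest/appliance/networking/dns/servers" -Headers $headers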

Module is available from: https://raw.githubusercontent.com/AdrianBegg/vSphere/master/Module-vCSA-Administration.psm1


Identify VMs affected by VMware Tools and RSS Incompatibility Issue

Last month VMware published a blog post (https://blogs.vmware.com/apps/2017/03/rush-post-vmware-tools-rss-incompatibility-issues.html) about an issue affecting Windows Server 2012/2012 R2 and newer guest OSes running VMware Tools versions 9.10.0 up to 10.1.5 with the paravirtual VMXNET3 driver. Basically, Windows Receive Side Scaling (RSS) is not functional. RSS enables network adapters to distribute the kernel-mode network processing load across multiple processor cores in multi-core computers. The distribution of this processing makes it possible to support higher network traffic loads than would be possible if only a single core were used. This still hasn’t been fixed, but when it is you may need to determine which machines are still impacted.

In order to determine which machines are exposed/potentially impacted in your environment by this bug (and also to check whether RSS is flagged as enabled in the guest), the following script can be used. Once this bug is fixed, if you have multi-core machines that don’t have RSS enabled, you should consider enabling it !
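As a rough stand-in for the published script, the sketch below shows one way to enumerate candidate VMs (Windows guests with a VMXNET3 adapter). Note that ToolsVersion here is VMware’s internal build number, which still needs to be mapped against the affected 9.10.0 to 10.1.5 range:

Import-Module VMware.PowerCLI

Get-VM | Where-Object { $_.Guest.OSFullName -like "*Windows*" } | ForEach-Object {
    # Only VMs with at least one paravirtual VMXNET3 adapter are in scope
    $vmxnet3 = $_ | Get-NetworkAdapter | Where-Object { $_.Type -eq "Vmxnet3" }
    if ($vmxnet3) {
        [PSCustomObject]@{
            Name         = $_.Name
            GuestOS      = $_.Guest.OSFullName
            ToolsVersion = $_.ExtensionData.Guest.ToolsVersion   # internal build number
            Vmxnet3Count = ($vmxnet3 | Measure-Object).Count
        }
    }
}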

For the latest version please refer to https://github.com/AdrianBegg/vSphere/


NSX – BEWARE ! VTEP Port can NOT be used by ANY virtual machine on an ESXi host running NSX

So I would call this a bug/design issue, however VMware have just noted it in a KB. BEWARE of the use of the VXLAN Tunnel End Point (VTEP) port (UDP 8472 or 4789 by default) by ANY virtual machine that is hosted on an NSX cluster (regardless of whether it is on a VXLAN segment or a standard Port Group), as the traffic will be dropped by the host with a VLAN mismatch. This affects all outbound traffic (i.e. connections from machines inside ESXi with a destination port that matches the VTEP port, e.g. UDP 4789).
VMware today have updated KB2079386 to state “VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.” This was the result of a very long-running support call involving a VM on a VLAN-backed Port Group which was having its traffic on UDP 8472 silently dropped without explanation. The KB is not quite accurate; it should read “the VTEP port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.” This is because the hypervisor will drop outbound packets with the destination set to the VTEP port regardless of whether it is 8472, 4789, etc.

Why is this an issue ?

The VXLAN standard (initially described in RFC 7348) has been implemented by a number of vendors for things other than NSX; one such example is physical Sophos wireless access points, which use the VXLAN standard for “Wireless Zones” and communicate with the Sophos UTM (which can be a virtual machine) on UDP port 8472. If the UTM is running on a host that has NSX deployed it simply won’t work, even if it is running on a Port Group that has nothing to do with NSX.

There are surely other products using this port, which begs the question: as a cloud provider, or even as an IT group, how do you explain to customers that they can’t host that product on ESXi if NSX is deployed, even when the traffic is not on the VLAN carrying the VXLAN encapsulation ?!? The feedback from VMware support regarding this issue has been that these are reserved ports and should not be used…

What’s Going On ?

As a proof of concept I ran up the following lab:

  • Host is ESXi 6.5a running NSX 6.3
  • My VTEP Port is set to 4789 (UDP)
  • NSX is enabled on cluster “NSX”
  • I have a VM “VLANTST-INSIDE” running on host labesx1 (which is part of the NSX cluster) on dvSwitch Port Group “dv-VLAN-101”, which is a VLAN-backed (non-VXLAN) Port Group
  • I have a VM “VLANTEST” running outside of ESXi on the same VLAN

With a UDP server running on the test machine inside ESXi on UDP 4789, the external machine can connect without dramas:

When the roles are reversed the behaviour changes; with a UDP server running on the machine external to ESXi on UDP 4789, the initial connection can be seen but no traffic is observed:
 
When attempting on any other port, there are no issues:
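For anyone wanting to reproduce the test, a minimal UDP listener/sender pair in PowerShell looks something like this (the host name and port are placeholders for your setup):

# Listener: run on the receiving VM (e.g. the VM inside ESXi)
$listener = New-Object System.Net.Sockets.UdpClient(4789)
$remoteEP = New-Object System.Net.IPEndPoint([System.Net.IPAddress]::Any, 0)
$bytes = $listener.Receive([ref]$remoteEP)       # blocks until a datagram arrives
[System.Text.Encoding]::ASCII.GetString($bytes)

# Sender: run on the other VM
$sender = New-Object System.Net.Sockets.UdpClient
$payload = [System.Text.Encoding]::ASCII.GetBytes("test")
$sender.Send($payload, $payload.Length, "vlantst-inside.pigeonnuggets.com", 4789) | Out-Null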

So if we run pktcap-uw --capture Drop on the ESXi host labesx1.pigeonnuggets.com we can see that the packets are being dropped by the hypervisor with the Drop Reason ‘VlanTag Mismatch’:

It appears that the network stack is inspecting packets destined for the VTEP UDP port and filtering them if they do not match the VLAN which is carrying VXLAN; if the destination port number is the VTEP port, the packet is treated as VXLAN and dropped regardless of what the payload actually is.

What are the options ?

So the only option I have found to resolve this is to change your VTEP port, which is not ideal, but there are not really many options at this time. So if a product is conflicting, logon to vCenter and select Networking & Security > Installation > Logical Network Preparation > VXLAN Transport > Change.

This is a non-disruptive change and won’t affect your VXLAN payloads. Hopefully this will be fixed at some point….