Troubleshooting the Lightwave vmware-stsd service not starting

A quick post today. It has been a very busy two months for me, planning a wedding and a move to Germany, but I hope to release my very alpha-level PowerShell cmdlets for Photon Platform 1.2.1 in a few weeks; I have been cracking away at them in my spare time. But for now, a quick post on troubleshooting Lightwave, as there is not much content out there on the platform.

First check the status of the loaded services using # systemctl status (or simply # systemctl). In my case the vmware-stsd.service was failing unexpectedly. Inspecting the service with # systemctl status vmware-stsd.service showed that the start script was exiting with success but the Main PID was exiting.

After reviewing the script /opt/vmware/sbin/vmware-stsd.sh I found that the initialization error logs are written to /var/log/vmware/sso/utils/vmware-stst-stsd.err. Running # tail /var/log/vmware/sso/utils/vmware-stst-stsd.err showed that the service was failing because the tcserver.pid file had not been removed after an unclean shutdown.

After cleaning up the orphaned tcserver.pid file (# rm /var/log/vmware/sso/tcserver.pid) and issuing a # systemctl restart vmware-stsd.service, Robert's your father's brother and everything starts up as expected.
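
Before deleting the PID file by hand it can be worth confirming that it really is stale. The following is a minimal sketch, not part of Lightwave: the function name remove_stale_pidfile is hypothetical, and it assumes the file holds a single numeric PID on its first line. The default path is the one used in this post.

```shell
# Sketch: remove a PID file only when the process it names is no longer running.
# remove_stale_pidfile is a hypothetical helper, not something Lightwave ships.
remove_stale_pidfile() {
    pidfile="${1:-/var/log/vmware/sso/tcserver.pid}"

    # Nothing to clean up if the file is absent.
    if [ ! -f "$pidfile" ]; then
        echo "no pid file at $pidfile"
        return 0
    fi

    # Assumption: the first line of the file is the PID; keep digits only.
    pid="$(head -n 1 "$pidfile" | tr -cd '0-9')"

    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # Run as root (as in this post) so a permission error is not mistaken for "gone".
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; leaving $pidfile alone"
        return 1
    fi

    rm -f "$pidfile"
    echo "removed stale $pidfile"
}
```

It could then be chained with the restart, for example: # remove_stale_pidfile && systemctl restart vmware-stsd.service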

Load Balancing VMWare Photon Platform 1.2 with Citrix Netscaler

Updated 22/06/2017 : Fixed HSTS Policy configuration on the Netscaler (was bound as a Request policy not a Response Policy) and updated quorum discussion after testing with three Photon Controller Nodes

Greetings; this post is the fourth in a series for June in which I have been focusing on the Photon Platform 1.2. This post outlines how to publish the application through a Citrix Netscaler and my findings along the way. The Photon Platform deployment appliance deploys a HAProxy Load Balancer as part of the deployment. The HAProxy deployment has the following limitations when deployed using the photon-setup installer:

  1. Only one Load Balancer VM is deployed (requires HAProxy skills to scale the LB platform)
  2. The installer does not allow an “external URI” to be defined for Lightwave or the Photon Controller, which means that the cookies issued and the auth redirects behave incorrectly when using DNS names (I have not yet been able to figure out how to change these settings through the API or config files)
  3. As yet, the SSL certificates are all signed by the Lightwave infrastructure and are not trusted
  4. Because of the way the deployment appliance deploys the solution, the response from https://PhotonController:9000/v1/system/auth returns only a single IP for Lightwave, and it is not dynamically updated if that endpoint goes offline (Issue 134 has been logged regarding this)
  5. When the primary Lightwave server goes offline users can’t authenticate, as the Open ID Connect Clients are not registered on additional servers and, further, the endpoint registration does not occur on the Photon Controller Platform
  6. The Photon Controllers are deployed in a cluster with (by default) a Majority Quorum; the Quorum setting can be adjusted via the API.

So out of the box there are a few issues with the scalability and availability of the Management Platform; however, you can still replace the Load Balancing and improve the availability (a little bit). Several issues discovered during the prototyping (Issue 133 and Issue 134) have been logged on Github. I will continue to look for ways to address these issues, but be aware of the following points of failure in Photon Platform 1.2 that I found during my testing:

  1. Authentication (Lightwave) : All configuration is on a single instance
  2. The Management User Interface : Will not work if the Load Balancer is offline (the API still functions on Port 9000 but 443 stops working)

Hopefully as these issues are addressed this document will evolve and the Platform will become more resilient to failures however I would recommend deploying the Netscalers in HA mode to address HA Issue #2 listed above.

Configuration

Step 1. Allocate VIP, create some DNS records and Generate some SSL Certificates

The first step is to allocate the Virtual IP addresses (VIPs) that will be used for providing public access to the backend services (Lightwave and Photon Controller) and perform any NAT/firewalling needed to make these accessible to clients. Once this has been done, create DNS (A) records for the two services (Lightwave for Authentication and Photon Platform for the Management/API access), e.g. lightwave.pigeonnuggets.com and photonplatform.pigeonnuggets.com

Next generate some Web Server certificates against your Enterprise PKI/Public Certificate authority with the Subject Name (and Subject Alternative Names) set to the DNS records created.

Step 2. Prepare Lightwave

Lightwave is used to generate the access tokens (in the form of a cookie) that are used by Photon Controller to authenticate client requests. When a client contacts the Photon Controller without an access token, the Photon Controller constructs a 302 Redirect for the client which contains:

  1. The domain to authenticate
  2. The Claims that the token should contain
  3. Where the clients should be redirected back to after authentication
  4. A client id and;
  5. The hostname of the originating request

Lightwave will then generate a cookie with the Domain Name that matches the provided host in the Redirect URI and pass this to the requester for use with Photon. Before servicing the request, Lightwave validates whether the “Redirect_URI” provided is allowed for the Client with the Client Id provided, and as such we need to add the new domain names to the list of allowed URIs. To do this:

  1. Navigate to the LightWave Domain Controller administration page (https://lightwavefqdn/lightwaveui/), enter the LightWave domain and, when prompted, sign in with the LightWave administrator account
  2. Select Service Providers from the side menu, select Open ID Connect Client and locate the Client ID for the Photon Controller Management UI; you can locate this by examining the 302 Redirect URL you get when navigating to the Photon Controller via the IP, or by looking for the entry with “https://<IP of load balancer>:4343/logout_callback”. Click Edit
  3. Amend the Logout URI to https://<fqdn of DNS record for Photon Controller LB>:4343/logout_callback, add the URIs that use the new DNS names, clicking Add for each, and then click Save
  4. If you intend to decommission the default load balancer or not allow connections via IP, clean up any references to it by clicking the X next to the relevant objects
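
If you would rather read the Client ID out of the 302 Redirect URL than hunt for it in the UI, something along these lines may help. This is a rough sketch: extract_client_id is a hypothetical helper, and it assumes the redirect carries the ID in a client_id query parameter, which is the usual OpenID Connect convention but is not something I have confirmed against every Lightwave build.

```shell
# Sketch: pull the client_id query parameter out of a redirect URL.
# extract_client_id is a hypothetical helper name.
# Assumption: the 302 Location URL contains "client_id=<value>" in its query string.
extract_client_id() {
    # Print only the value between "client_id=" and the next "&" or "#".
    printf '%s\n' "$1" | sed -n 's/.*[?&]client_id=\([^&#]*\).*/\1/p'
}
```

To capture the redirect in the first place, the response headers can be dumped with # curl -k -s -o /dev/null -D - "https://<IP of load balancer>:4343/" and the Location line passed to the function. Whether the root of that port is the URL that issues the redirect in your deployment is an assumption; use whichever URL your browser is redirected from.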

Step 3. Configure Netscaler Backend Objects and Request Rewrite Policies

The diagram (also available as a PDF) outlines the configuration on the Load Balancer to deliver the Lightwave and Photon Controller Platform management plane to users. For Photon Platform to keep working, it is important that the VIP for the Load Balancer is the IP that was assigned to the HAProxy machine. The Photon Controller uses the External URI (the IP) to make backend web requests when there is more than one Photon Platform deployed, and if you don’t reuse the VIP it breaks. I don’t as yet have a better solution; during testing I wasn’t able to amend these external_URI values successfully…watch this space.

As mentioned earlier, I was not able to determine how to set or change the “External URI” post deployment for the Lightwave and Photon Controller platforms, and as a result some of the responses to/from the client and the Photon Controller must be rewritten with the correct external URIs (instead of internal IP addresses).

Health Monitors and Load Balancing Objects

The following configuration defines the basic Load Balancing objects and the health monitors; for the health of the Photon Controllers an API call is made to https://server:9000/v1/available, which returns whether the service is available
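
To see what the monitor will see, the same endpoint can be probed by hand from any machine with curl. This is only a sketch: the function names are hypothetical, and treating an HTTP 200 as "available" is my assumption about how the endpoint reports health rather than documented behaviour.

```shell
# Sketch: probe the availability endpoint of a Photon Controller and report UP or DOWN.
# Both function names are hypothetical.

# Assumption: HTTP 200 means available; anything else (including 000 for
# "could not connect") is treated as down.
status_to_state() {
    if [ "$1" = "200" ]; then
        echo UP
    else
        echo DOWN
    fi
}

check_photon_controller() {
    host="$1"
    # -k skips certificate validation, since the Lightwave-signed certificates are not trusted.
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code="$(curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' \
        "https://${host}:9000/v1/available")" || code="000"
    status_to_state "$code"
}
```

Running # check_photon_controller <IP of Photon Controller> against each node in turn is a quick way to confirm that the backend members should show as healthy before binding them on the Netscaler.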

Rewrite and Responder Policy

The following rewrite and responder policies are used to:

  • Insert the required headers
  • Replace the internal IP addresses in the headers with the Load Balancer VIP DNS names
  • Replace the API Authentication endpoint URI
  • Redirect to Port 4343 if a browser client hits the root of the API service
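
To make the header replacement concrete, here is the same transformation expressed as a small shell function. It is purely an illustration of what the rewrite policy has to achieve and is not Netscaler configuration; rewrite_location is a hypothetical name and the IP address in the usage example is made up.

```shell
# Sketch: swap an internal address for the public DNS name in a header value.
# rewrite_location is a hypothetical helper; this is NOT Netscaler policy syntax.
rewrite_location() {
    header="$1"
    internal="$2"
    public="$3"

    # Escape the dots so the IP address is matched literally, not as wildcards.
    pattern="$(printf '%s' "$internal" | sed 's/\./\\./g')"

    # Only replace the host when it is followed by ":" or "/", so that
    # 192.168.1.10 does not also match 192.168.1.100.
    printf '%s\n' "$header" | sed "s|://${pattern}\([:/]\)|://${public}\1|"
}
```

For example, rewrite_location 'Location: https://192.168.1.10:4343/login' 192.168.1.10 photonplatform.pigeonnuggets.com yields a Location header pointing at https://photonplatform.pigeonnuggets.com:4343/login. Note that a URL-encoded address inside a query string (such as an encoded redirect_uri) would not be caught by this simple pattern.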

My complete ns.config for the solution above is also available, with the certificate passwords and the nsroot password removed; it should be pretty easy to implement.

The Result

Hopefully you should now have the solution Load Balanced through the Netscaler via DNS names and not directly via IP. I hope you find this information helpful and that it saves you some pain. Hopefully some of the issues with the Lightwave authentication configuration will be addressed so that the solution can be deployed in a highly available, scalable manner. Enjoy.

Customizing the Lightwave SSO Sign-On Page

The following is a brief post outlining how to customise the SSO Sign-On page for Project Lightwave. The Project Lightwave Single Sign-On Service is used as the authentication engine for the Photon Platform. Users are shown this page whenever they log on to the service, and parts of the page can be customized using the Lightwave Administration UI. The following walks through the configurable options and the anatomy of the page.

Name : HTML to Display in the Header of the SSO Page
Display Banner Checkbox : If selected, users must check a box “I agree to XXXXXXX” before they are able to log in; otherwise they are shown the error “In order to use our services, you must agree to XXXXXXXX”. If this is not enabled (and Hide Banner and Title is disabled), the link to the Banner is shown but users are not required to agree.
Hide Banner and Title: If this is enabled the Banner is not shown to the users
Content: HTML to display when users click the “I agree to XXXXXXX” link

To change the settings and customize these options, select Policies & Configuration from the side menu, select Login Banner and click Edit. You can then edit each of the settings discussed above.

Configuring Active Directory Integration with Lightwave on the VMware Photon Platform 1.2

So for the month of June I have decided to lab the Photon Platform 1.2 by VMware and will be posting a bunch of content related to the product. The Photon Platform leverages Project Lightwave as its directory service.

Project Lightwave is an open source project comprised of enterprise-grade, identity and access management services targeting critical security, governance, and compliance challenges for Cloud-Native Apps within the enterprise. For vSphere Admins Lightwave performs many of the same functions as the Platform Services Controller;

  • Lightwave Directory Service – standards based, multi-tenant, multi-master, highly scalable LDAP v3 directory service
  • Lightwave Certificate Authority – directory integrated certificate authority helps to simplify certificate-based operations and key management across the infrastructure.
  • Lightwave Certificate Store – endpoint certificate store to store certificate credentials.
  • Lightwave Authentication Services – cloud authentication services with support for Kerberos, OAuth 2.0/OpenID Connect, SAML and WSTrust
  • Lightwave Domain Name Services – directory integrated domain name service to ensure Kerberos Authentication to the Directory Service and Authentication Service (STS)

The following outlines how to configure the Lightwave Domain Controllers to use Active Directory Domain Services as an identity provider, which will enable users to leverage their Active Directory credentials to access and administer the Platform.

Before you begin you will need the following:

  1. A service account (just a Domain User account with ability to read the Active Directory Domain)
  2. The LDAPS certificate for the domain controller; to obtain this, open the Local Computer\Personal\Certificates store on the domain controller, locate the certificate issued from the Domain Controller certificate template, and export it (without the private key) in Base64 format
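
It is easy to export the certificate in DER (binary) format by mistake, so a quick sanity check before uploading may save a failed attempt. The sketch below only looks for the text marker that a Base64 export starts with; is_base64_cert is a hypothetical helper name and the file name in the usage note is an example.

```shell
# Sketch: check that an exported certificate file is Base64 (PEM) rather than DER.
# is_base64_cert is a hypothetical helper name.
is_base64_cert() {
    # A Base64 export begins with the BEGIN CERTIFICATE marker on its first line.
    # tr strips the carriage return that Windows line endings leave behind.
    head -n 1 "$1" | tr -d '\r' | grep -q '^-----BEGIN CERTIFICATE-----$'
}
```

Usage: # is_base64_cert dc.cer && echo "Base64 export looks fine". If OpenSSL is installed, # openssl x509 -in dc.cer -noout -subject -enddate will additionally print the subject and expiry date so you can confirm it is the right certificate.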

Step 1. Navigate to the LightWave Domain Controller administration page (https://lightwavefqdn/lightwaveui/) and enter the LightWave domain and when prompted enter the LightWave administrator account

Step 2. Select Identity Sources from the side menu and click Add

Step 3. Select Active Directory as LDAP and click Next

Please Note: At the time of writing the option Active Directory (Integrated Windows Authentication) does not appear to function, and there are no UI options to add the machine to the domain; I will investigate further at a later time, but I imagine that the Lightwave machines need to be added to the domain via the CLI first.

Step 4. Enter the details for the LDAPS service and the Base DN for the Users and Groups; I have just used the root of the domain however you can scope these to Containers further down your tree as per your requirements

Step 5. Select Choose File, select the certificate for the Domain Controller’s LDAPS service and click Next

Step 6. Enter the Service Account credentials (Username in the UPN format) and click Test Connection; if the connection is successful click Next

Step 7. Finally review and click Save to complete the configuration


Step 8. Next select Users & Groups from the side-menu and under Groups select Administrators and click Membership

Step 9. Select the domain from the drop-down menu and locate the User or Group to grant the permissions (in the below example the group R-Photon-Admins), check the checkbox next to the object and select Add Member followed by Save

Finally; sign out of the Platform and Sign back in with the Active Directory account entering the Domain Account Username and Password and clicking Login

NOTE: Do not use the “Use Windows session authentication” option; it did not work during my testing (it throws “Internal Processing error”). And voila, your Active Directory environment can be leveraged for identity.