Deploying VMware Photon Platform 1.2

I am late to the party in the container platform space, so I am spending some overdue R&D time assessing VMware Photon Platform from a “vSphere/vCloud” administrator's point of view. My goal is to explore how it could be deployed at scale, the challenges and advantages of the platform, and to develop content on design, deployment, and Day 1 and Day 2 considerations. These will be long posts, my apologies :S

VMware Photon Controller is an open-source, multi-tenant host controller that you can use to manage hardware, containers, VMs, and host clusters. It provides a distributed, high-scale control plane optimized for cloud-native applications, which include containers and developer frameworks such as Pivotal Cloud Foundry (PCF).

Quick Summary of Architecture

VMware Photon Platform is an open source “container-optimized cloud platform” for Cloud-Native Applications.

  • No vCenter Server; Photon Controller is a decentralized Control Plane
  • ESXi hosts are used as the virtualization platform; it's designed to scale!
  • No scaling limitations for hosts; you can have hundreds or thousands of ESXi hosts
  • vCloud Director-type resource management
  • Has the concept of fault domains, groups of underlying hypervisors, etc.
  • One-click provisioning of Kubernetes as a service; fully supported by VMware
  • NSX-T and vSAN integrations

Advantages for vCloud/vSphere Admin

  • Familiar management and troubleshooting for the underlying hypervisor; if you’re a vSphere admin, then troubleshooting the hypervisor (e.g. performance issues) is going to be a familiar process
  • Trusted platform : ESXi is a solid hypervisor
  • Cost : You can deploy a pretty capable solution for low/zero cost
  • Interoperability and API : Interoperability with existing toolsets
  • Open Source : If you don’t like something, change it, if something breaks, fix it or at least have a look at why
  • Multi-tenancy and scalability : Both are built in, and the scale is massive

Disadvantages

  • Deployment effort: It's moderate; the configuration is not too forgiving of typos etc. Playing around with the configuration and troubleshooting installation issues took a moderate effort, so budget some time to lab
  • Documentation and community content on components (e.g. Project Lightwave) is light (no pun intended) and is often hidden throughout the source tree; be prepared to troubleshoot problems to resolution yourself
  • vSphere Integrated Containers deployment is a lot easier :) and has many advantages (but also disadvantages); they both have their place; Photon is for scaling.

Lessons Learnt/Important Notes

I am going to put all this at the front because these are the little things that caught me. The following are some of the lessons learnt through my lab deployment, which will hopefully save you some time:

Before you begin 

  1. Review the issue registers for Photon and Lightwave (https://github.com/vmware/photon-controller/issues); there are a number of bugs that might catch you
  2. Read the Release Notes

Installation/Design

  1. If you plan to deploy NSX-T you must deploy this before you deploy the Photon Platform
  2. The Lightwave appliance (the equivalent of the Platform Services Controller in vSphere) is basically a domain controller and has its own DNS service, which MUST be used by all of the Photon Platform components for everything to work. This matters when you install and configure the Photon Platform appliances: their “dns” setting needs to point to the IP of the Lightwave appliances, and the DNS of the Lightwave appliances needs to point to your actual DNS infrastructure.
  3. The appliances deployed from the Photon installer are thick provisioned, so you require a minimum of 40GB free on your LUN per appliance. A small deployment has three appliances (Lightwave, Photon Controller and Load Balancer); if deploying with HA, check the disk requirements and make sure you also have enough space for the vswap files created when the machines are powered on.
  4. Lightwave has a password policy; the Lightwave Administrator and root passwords must have at least one uppercase, one lowercase, one digit, and one punctuation character.
  5. If you intend to use ESXi 6.0 “Free”;
    1. Leave the node in evaluation mode while you're building; the ESXi 6.0 Free license restricts the vSphere 6.0 API, which I used to configure the host. Once you install the license you lose some PowerCLI capability
    2. ESXi 6.5 “free” has only a restriction on guests, which are limited to 8 vCPUs; for container host platforms this may be sufficient, but assess it against your needs
  6. SSH must be enabled on the ESXi hosts to deploy the Photon Agents
  7. You can deploy many Lightwave and Photon Controller Appliances but only one HAProxy Load Balancer through the deployment appliance
  8. You can deploy your management plane components on the same hosts as your “Cloud” hosts (Hosts that will host customer containers) however you may wish to separate your management components on their own hosts/storage to prevent “noisy” containers impacting your management plane and to simplify your patching/maintenance strategies
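The password policy in item 4 is easy to trip over in a configuration file, so it can help to check candidate passwords before deploying. The following is a minimal Python sketch of the four character classes listed above; the function name is my own, and Lightwave may enforce further rules (such as length) that this post does not cover.

```python
import string


def meets_lightwave_policy(password):
    """Check the four character classes described in the post.

    Assumption: only the rules listed above are checked; any additional
    Lightwave rules (for example minimum length) are not modelled here.
    """
    return (
        any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


if __name__ == "__main__":
    for candidate in ("changeme", "Passw0rd!"):
        print(candidate, meets_lightwave_policy(candidate))
```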

Troubleshooting/Documentation

  1. The Photon Controller 1.2 documentation wiki is not a complete resource; for an open-source offering it is a decent start, but be prepared to troubleshoot issues yourself. The log locations are documented on the wiki
  2. There is documentation hidden in text files throughout the source on GitHub; I recommend you go hunting through the GitHub repository for text files, as there are tools and other tidbits “hidden” away

Post Deployment

  1. When connecting to the Web Interface you MUST connect via the IP address and not the DNS name of the Load Balancer (https://<ip-address-of-load-balancer>:4343). If you attempt to connect using the FQDN after the deployment, you will just end up in a redirect loop. I have not investigated fully, but it appears to be due to the token/header forwarded to the Photon Controller.
  2. Installer Appliance
    1. Change the password (root:changeme) to something secure; it holds configuration data with passwords in plain text, so secure it
    2. Turn it off; you don’t need it after the build is complete (if you're labbing, don’t delete it because you’re probably going to redeploy the environment a few times)

Step 1. Designing/Building Hardware/ESXi Platform

Without a vCenter and access to the vSphere API (if using the Free ESXi license) there are a few challenges in managing the infrastructure platform: no Update Manager, no Auto Deploy or Host Profiles, no agent upgrade management and limited host/platform monitoring out of the box. In my environment I use the following approach for a “light touch” deployment and patching of new hosts:

  1. A PXE Network Installation Service which performs a zero touch base install of ESXi 6.5d (Refer to this document for setting this up)
  2. A script that configures a host for Photon Platform and deploys the agent;
    1. Configures NTP, SSHD, vSwitches, iSCSI and installs the license
    2. Optionally deploys the Photon Agents and adds to the Cloud
  3. A script to place Photon Platform Cloud ESXi hosts into “Maintenance” and to patch and deploy the Agents

You could do this a number of ways, but in my environment all I need to do is create a DHCP reservation for the new host, add a DNS entry and then PXE boot into the ESXi installer, where a Kickstart script gives the host a basic deployment. I then customise the host with the following basic script:

Next you will need to allocate some network address space for the deployment. The following addressing is required:

  • Networking for ESXi Hosts (Management/iSCSI/etc.)
  • Static IP address for each of the Photon Management Controllers; you need at least one but for HA and load you should have several in different availability zones
  • Static IP for each of the Lightwave Appliances; these are like Windows Domain Controllers, so for HA you should have at least one in each site
  • Static IP for the Load Balancer Appliance
  • Static IP for the vSAN Management Appliance if deploying
  • An IP address for the Deployment Appliance
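To keep the static addresses in the list above straight, I find it useful to write the plan down before touching the installer. The following is a small Python sketch using the standard `ipaddress` module; the role names, the starting offset and the 192.0.2.0/24 documentation subnet are all placeholders of mine, not values the platform requires.

```python
import ipaddress


def plan_addresses(subnet, roles, first_host=10):
    """Assign consecutive static addresses from `subnet` to each role.

    `first_host` is the offset from the network address of the first
    assignment; it is an arbitrary choice for this sketch.
    """
    network = ipaddress.ip_network(subnet)
    plan = {}
    for offset, role in enumerate(roles, start=first_host):
        address = network.network_address + offset
        # Refuse addresses that fall outside the subnet or on its broadcast.
        if address not in network or address == network.broadcast_address:
            raise ValueError(f"{subnet} is too small for {len(roles)} roles")
        plan[role] = str(address)
    return plan


# Hypothetical roles for a small lab deployment.
roles = ["installer", "lightwave-1", "photon-controller-1", "load-balancer"]
plan = plan_addresses("192.0.2.0/24", roles)

if __name__ == "__main__":
    for role, address in plan.items():
        print(f"{role:22} {address}")
```

Remember that each of these names also needs a matching DNS entry where your design calls for one.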

Step 2. Download the Deployment Appliance and Deploy

An appliance has been developed which you can deploy onto a host and basically roll the platform out via a YAML configuration file. The process is relatively straight forward however the YAML file needs some attention.

  1. Download the latest OVA from here
  2. Connect to the prepared ESXi host and deploy the OVA (installer-ova-nv-1.2-dd9d360.ova): select storage, agree to the EULA, enter the networking details for the appliance and click Finish
  3. Once the deployment completes, power on the appliance, SSH to it using the default credentials (root:changeme) and change the password

Step 3. Create a YAML Deployment Configuration File

The installer script uses a YAML configuration file which describes the environment and the components to deploy. In my deployment I am not using vSAN (basically only using “free” components in this lab). For details on deploying vSAN please refer to this document.

The YAML file is made up of 6 distinct parts; three are mandatory (Compute, Lightwave, Photon) for the deployment and the others are optional (Load Balancer, vSAN, NSX):

  • Compute : ESXi hosts are defined
    • Pretty self-explanatory; however, the dns address is the IP of the Lightwave appliance and allowed-datastores lists the datastores that are presented through to Photon Platform
  • Lightwave : A VMware open-source directory service. Lightwave furnishes a directory service, a certificate authority, a certificate store, and an authentication service (basically PSC for Photon). The options are mostly straightforward; however, a few need some explanation:
    • domain : The domain is the “SSO Authentication Domain”. It can be anything, but for simplicity align it with a DNS namespace in your organisational DNS hierarchy. I don’t recommend making it the same as your Active Directory Forest/Domain if you plan on using Active Directory as an authentication provider
    • password : Must meet a default password policy (1 upper case, 1 lower case, 1 digit, 1 special character)
    • hostref : The host (as defined in the compute section) on which to deploy the appliance
    • datastore : The datastore on that host to install the appliance onto
    • network : The Port Group on the host that the NIC will be connected to
    • dns : In this section (and this section only) these are the actual DNS servers for your organisation
  • Photon Controller : The Management Appliance
    • img-store datastore : The datastore on an ESXi hypervisor that holds images for VMs, Kubernetes clusters, etc.
    • cloud hostref : The ESXi hypervisor that will manage/host the containers
    • hostref : The host (as defined in the compute section) on which to deploy the appliance
    • datastore : The datastore on that host to install the appliance onto
    • network : The Port Group on the host that the NIC will be connected to
    • dns : The IP address of the Lightwave appliance(s)
  • Load Balancer : An HAProxy appliance; this is optional but provides a basic load balancing service for the Photon Controller
    • hostref : The host (as defined in the compute section) on which to deploy the appliance
    • datastore : The datastore on that host to install the appliance onto
    • network : The Port Group on the host that the NIC will be connected to
    • dns : The IP address of the Lightwave appliance(s)
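The dns rule above is the one that caught me most often, so here is a hedged Python sketch of the check I do by eye: every component except Lightwave should point its dns at a Lightwave appliance, and Lightwave itself should point at the organisation's real DNS servers. The function and argument names are my own and the data is passed in as plain Python values; this does not read the installer's YAML schema.

```python
def check_dns_settings(lightwave_ips, lightwave_dns, other_dns):
    """Return a list of human-readable problems with the DNS settings.

    lightwave_ips: IPs of the Lightwave appliances.
    lightwave_dns: DNS servers configured in the Lightwave section.
    other_dns: component name -> DNS servers configured for it
               (ESXi hosts, Photon Controllers, Load Balancer).
    """
    problems = []
    lightwave = set(lightwave_ips)

    # Following the post: Lightwave forwards to the real DNS infrastructure.
    if not lightwave_dns:
        problems.append("lightwave: no upstream dns servers configured")
    elif all(server in lightwave for server in lightwave_dns):
        problems.append("lightwave: dns only points at Lightwave itself")

    # Everything else must resolve through Lightwave.
    for component, servers in other_dns.items():
        stray = [server for server in servers if server not in lightwave]
        if not servers:
            problems.append(f"{component}: no dns servers configured")
        elif stray:
            problems.append(
                f"{component}: dns {stray} does not point at a Lightwave appliance"
            )
    return problems


# Placeholder addresses from the 192.0.2.0/24 documentation range.
problems = check_dns_settings(
    lightwave_ips=["192.0.2.11"],
    lightwave_dns=["192.0.2.53"],
    other_dns={
        "esxi-1": ["192.0.2.11"],
        "photon-controller-1": ["192.0.2.53"],
    },
)

if __name__ == "__main__":
    for problem in problems:
        print(problem)
```

In this example only photon-controller-1 is reported, because it points at the organisation's DNS server instead of Lightwave.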

 The following is an example YAML for a basic single node deployment:

The following is a more highly available configuration, with two Lightwave servers and two Photon Controllers, to give you an idea of scaling the Management Plane:

 

Once you have created a configuration file check it is valid with http://www.yamllint.com/
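If you would rather not paste a file containing passwords into a website, a narrow offline pre-check can at least catch one classic typo: YAML does not allow tab characters for indentation. The Python sketch below only looks for that single problem and is no substitute for a real YAML parser; the keys in the sample text are made up for illustration.

```python
def find_tab_indentation(yaml_text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Everything before the first non-whitespace character is indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Hypothetical snippet: the third line is indented with a tab.
sample = "compute:\n  hosts:\n\tesxi-1: {}\n"

if __name__ == "__main__":
    print(find_tab_indentation(sample))
```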

Step 4. Deploy the Platform

Finally, once you have a validated YAML configuration file, log on to the installer appliance using SSH and either create the configuration file from Step 3 with vi or upload it to /root. Then execute photon-setup platform install -config config.yaml, where config.yaml is your configuration file, and watch the magic happen.

Once the installation completes successfully, you should be able to connect to the Web Interface by navigating to the IP address of the Load Balancer (https://<ip-address-of-load-balancer>:4343). You MUST connect via the IP address and not the DNS name; if you attempt to connect using the FQDN after the deployment, you will just end up in a redirect loop.
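If you script any post-deployment checks, it is worth building the URL from the IP address so the FQDN never sneaks in. A minimal Python sketch follows; the function name is mine, and the port is the 4343 mentioned above.

```python
import ipaddress


def web_ui_url(load_balancer_ip, port=4343):
    """Build the Web Interface URL, refusing anything that is not an IP address."""
    # ip_address() raises ValueError for hostnames such as an FQDN.
    address = ipaddress.ip_address(load_balancer_ip)
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"https://{host}:{port}"


if __name__ == "__main__":
    print(web_ui_url("192.0.2.13"))
```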

Next Steps

  1. Configuring Active Directory Integration with Lightwave
  2. Customizing the Lightwave SSO Sign-On Page

I will hopefully have a lot more content on this topic in the next few weeks; I am learning more about the product every day. I am currently looking at deploying Kubernetes-as-a-service, replacing the certificates with Enterprise PKI certificates, alternative load balancing strategies, automation of maintenance via the API/PowerShell, backup and restore, and upgrades, so there are a lot of things to come :) Watch this space …

Further Reading and Troubleshooting Resources

  1. Photon Platform Wiki – Troubleshooting
  2. Running Containers at Scale with Photon Platform
  3. Photon Platform Getting Started Guide
  4. Cormac Hogan’s Photon Controller Blog Posts – Excellent detailed resources
  5. VMware Community DevOps Forums