After a recent outage a customer was unable to bring their VCSA back online. A quick investigation showed that while the VCSA was powered on after the host failure, the virtual NIC was disconnected.
So the customer simply logged into the host and edited the NIC settings to get things back online, right? Not quite. This VCSA was on a Distributed Port Group and was running on the very cluster it managed. With the VCSA offline, the DVS configuration couldn’t be modified, so the VCSA couldn’t be reattached to its existing Distributed Port Group. No Virtual Standard Switches existed on the host, and all vmnics were assigned to the Distributed Virtual Switch.
How do you bring the VCSA back online? The easy/fast answer is an Ephemeral Distributed Port Group but this customer didn’t have any configured. More on this later.
We are left with a somewhat painful option: using the host CLI, we must force a vmnic out of the Distributed Virtual Switch. Using this vmnic, we’ll create a new Virtual Standard Switch and attach the VCSA to a Standard Port Group.
Create a new VSS:
esxcli network vswitch standard add --vswitch-name=<New-VSS-Name>
Get your existing DVS uplinks: (Be sure to record the port ID and DVS name)
esxcli network vswitch dvs vmware list
Remove a vmnic uplink from the DVS:
esxcfg-vswitch -Q <vmnicN> -V <Port-ID> <DVS-Name>
Add the vmnic to the newly created VSS:
esxcli network vswitch standard uplink add --uplink-name=<vmnicN> --vswitch-name=<New-VSS-Name>
Create a Port Group on the newly created VSS: (set the VLAN ID, if required)
esxcfg-vswitch -A <New-PortGroup> -v <VLAN-ID> <New-VSS-Name>
Now, in the ESXi GUI, simply edit the VM NIC and attach it to the newly created VSS Port Group.
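The port ID lookup and the command sequence above can be sketched as a small Python helper. It only builds the command strings so they can be reviewed before being pasted into the host shell; it runs nothing on the host. The sample output, the function names, and the example values (DSwitch-Prod, vSwitchRecovery, VCSA-Recovery) are illustrative assumptions, and the layout of the `dvs vmware list` output may differ between ESXi versions, so check it against your own host.

```python
import shlex

# Illustrative sample of `esxcli network vswitch dvs vmware list` output.
# The layout is an assumption; compare it with what your host prints.
SAMPLE_DVS_LIST = """\
DSwitch-Prod
   Name: DSwitch-Prod
   Uplinks: vmnic1, vmnic0
   DVPort:
         Client: vmnic0
         DVPortgroup ID: dvportgroup-20
         In Use: true
         Port ID: 16

         Client: vmnic1
         DVPortgroup ID: dvportgroup-20
         In Use: true
         Port ID: 17
"""


def find_uplink_port(dvs_list_output, vmnic):
    """Return (dvs_name, port_id) for the DVPort whose client is `vmnic`."""
    dvs_name = None
    client = None
    for line in dvs_list_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            dvs_name = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Client:"):
            client = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Port ID:") and client == vmnic:
            return dvs_name, stripped.split(":", 1)[1].strip()
    raise ValueError(f"{vmnic} is not an uplink in the supplied output")


def build_recovery_commands(dvs_list_output, vmnic, vss_name, portgroup, vlan_id=None):
    """Return the recovery commands, in order, as a list of strings."""
    q = shlex.quote
    dvs_name, port_id = find_uplink_port(dvs_list_output, vmnic)
    # The VLAN option is only added when a VLAN ID is supplied.
    vlan_part = f" -v {int(vlan_id)}" if vlan_id is not None else ""
    return [
        # Create the temporary standard switch.
        f"esxcli network vswitch standard add --vswitch-name={q(vss_name)}",
        # Pull the chosen vmnic out of its DVS uplink port.
        f"esxcfg-vswitch -Q {q(vmnic)} -V {q(port_id)} {q(dvs_name)}",
        # Attach the freed vmnic to the standard switch.
        f"esxcli network vswitch standard uplink add --uplink-name={q(vmnic)} --vswitch-name={q(vss_name)}",
        # Create the Port Group the VCSA NIC will be attached to.
        f"esxcfg-vswitch -A {q(portgroup)}{vlan_part} {q(vss_name)}",
    ]


if __name__ == "__main__":
    for cmd in build_recovery_commands(
        SAMPLE_DVS_LIST, "vmnic1", "vSwitchRecovery", "VCSA-Recovery", vlan_id=1362
    ):
        print(cmd)
```

Keeping the lookup separate from the command builder makes it easy to confirm the DVS name and port ID before anything is removed from the DVS, and those two values are also what you need later to put the uplink back.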
Of course, once the recovery is complete, the VCSA and the stolen vmnic uplink must be moved back to the DVS. Painful, isn’t it?
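As a rough sketch of the way back, the helper below assembles one possible CLI sequence for returning the borrowed uplink to the DVS, using the port ID and DVS name recorded earlier. It assumes the VCSA NIC has already been moved back to its Distributed Port Group, and the same cleanup can be done from the vSphere Client once vCenter is reachable again. The function name and example values are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def build_revert_commands(vmnic, port_id, dvs_name, vss_name, portgroup):
    """Return the commands that hand the borrowed uplink back to the DVS."""
    q = shlex.quote
    return [
        # Detach the borrowed vmnic from the temporary standard switch.
        f"esxcli network vswitch standard uplink remove --uplink-name={q(vmnic)} --vswitch-name={q(vss_name)}",
        # Re-attach it to the DVS uplink port recorded before the recovery.
        f"esxcfg-vswitch -P {q(vmnic)} -V {q(str(port_id))} {q(dvs_name)}",
        # Remove the temporary Port Group, then the temporary standard switch.
        f"esxcli network vswitch standard portgroup remove --portgroup-name={q(portgroup)} --vswitch-name={q(vss_name)}",
        f"esxcli network vswitch standard remove --vswitch-name={q(vss_name)}",
    ]


if __name__ == "__main__":
    for cmd in build_revert_commands(
        "vmnic1", 17, "DSwitch-Prod", "vSwitchRecovery", "VCSA-Recovery"
    ):
        print(cmd)
```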
Now that we’ve recovered the VCSA, let’s implement the more graceful solution.
Enter the Ephemeral Distributed Port Group.
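An Ephemeral Distributed Port Group is a special Port Group that isn’t subject to the usual port binding restrictions. What does that mean? With static binding, vCenter assigns the port when a VM is connected to the Port Group. With ephemeral binding, the port is created when the VM’s NIC connects, so a VM can be attached to the Port Group directly from the host even when vCenter is offline.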
You add an Ephemeral Port Group the same way you’d add any other Distributed Port Group. Right-click your DVS, select “Distributed Port Group” and click “New Distributed Port Group…”
Give your new Port Group a meaningful name. I usually match the corresponding Distributed Port Group and add EPH to denote it is the Ephemeral version of the Port Group. For example, if my Distributed Port Group is vSphere_VLAN_1362, I may use vSphere_VLAN_1362_EPH for the Ephemeral Port Group. Once you have a name, click Next.
On the Configure settings page, configure the Port Group as usual, with one change: in the Port binding dropdown, select “Ephemeral – no binding.” Enter your remaining settings as usual (for this example I set the VLAN ID to 1362) and click Next.
Confirm the details and click Finish.
Now, you may be tempted to move your VMs into this newly created Ephemeral Port Group and just let them live there. This is not recommended. The Ephemeral Port Group is designed for recovery only, and I’ve seen some oddities when reconfiguring or attaching advanced features to VMs that are in Ephemeral Port Groups. If you find yourself in a place where you need to use an Ephemeral Port Group, remember that switching between Port Groups with the same VLAN ID is effectively non-disruptive.
When using this method of recovery, I move any VMs out of the Ephemeral Port Group within 24 hours of a completed recovery. That is not a hard and fast rule, but it is something I use to remind myself to keep things in their proper Port Groups.
There are other places where I recommend creating Ephemeral Port Groups, but it very much depends on customer requirements. For instance, a Windows shop will probably want to have an Ephemeral Port Group lying in wait for their Domain Controllers. Ephemeral Port Groups for DNS and DHCP services are other common recommendations.