I’ve been pinged a few times over the years with people saying that the new protected network feature does not work for them. This setting is set per vNIC of the virtual machine.
The issue lies in how & what people test, bar any number of other reasons why a live migration might not start or complete. What people tend to do is disable a NIC to which the vSwitch is connected. But a Protected Network is about media sense loss detection of network disconnects and this requires the NIC to be actually there and enabled. Remember, we’re talking about the NIC on the host connected to the virtual switch. A physical link failure here, meaning that the virtual switch the protected virtual network adapter no longer has network connectivity, will lead to all the VMs with the protected network enabled do be live migrated to another node in the cluster that still has a connected virtual switch for the same network. The latter is to avoid senseless virtual machine migrations to other nodes that might also have lost connectivity due to a failed physical switch.
So the point is that testing by disabling the NIC in the OS will not do. You need to unplug the cables to the virtual switch or disable the port on the switch or even shutdown the switch (a bit drastic).
Do note that it can take a little time for the live migration to kick in, it varies a bit, but it beats having to wait for the issue to be resolved. You’ll see event id 1255 logged when the VMs lose network connectivity:
In this day and age with NIC teaming to redundant switches & the fact that you might be using converged networking these tests aren’t as simple as you might think. Also don’t pull out all if the cables used for clustering if you want the cluster to be able to help you out here with a live migration. Because when the other cluster nodes can’t talk to the node your testing in any way it will be kicked out of the cluster, the VMs will go down, be moved to another node and started. This might seem obvious but if you a are using a teamed 10Gbps solution in a converged setup this might cause exactly that.
Another thing to note is that if you have a virtual switch with a dedicated backup network exposed to hosts & VMs that can tolerate down time you might want to disable protected networks on that vNIC as you don’t want to live migrate the VMs of when that network has an issue. It all depends on your needs & tastes.
Last but not least please behave, and don’t do anything silly in production when testing this. Be careful in your testing.