Check vhid, password and virtual IP address on a Kemp LoadMaster

Introduction

Recently I was implementing a highly available Kemp LoadMaster X15 system. I prepared everything, documented the switch and LM-X15 configuration, and created a Visio diagram to visualize it all. That, together with the migration and rollback scenario, was presented to the team lead and the engineer who was going to work on this with us. I told the team lead that all would go smoothly if my preparations were good and I did not make any mistakes in the configuration. Well, guess what, I made a mistake somewhere and had to solve a Kemp LoadMaster Bad digest – md2=[31084da3…] md=[20dcd914…] – Check vhid, password and virtual IP address log entry.

Check vhid, password and virtual IP address

While everything appeared to be working well, we saw the following entry inundate the system message file log:

<date> <LoadMasterHostName> ucarp[2193]: Bad digest – md2=[xxxxx…] md=[xxxxx…] – Check vhid, password and virtual IP address

every second …

Wait a minute, as far as I knew, all was OK. The VHID was unique for the HA pair and we did not have duplicate IP addresses set anywhere on other network appliances. So what was this about?

Figuring out the cause

Well, we have a bond0 on eth0 and eth2 for the appliance management. We also have eth1, which is a special interface used for L1 health checks between the LoadMasters. We don’t use a direct link (the appliances sit in different racks), so we configure them with an IP on a separate, dedicated subnet. Then we have the bonds with the VLANs for the actual workloads, served via Virtual Services.

We have heartbeat health checks on bond0, on eth1, and on at least one VLAN per bond for the workloads.

Confirm that Promiscuous mode and PortFast are enabled. Check!
HA is configured for multicast traffic in our setup so we confirm that the switch allows multicast traffic. Check!

Make sure that switch configurations that block multicast traffic, such as ‘IGMP snooping’, are disabled on the switch/switch ports as needed. Check!

So what else? Let’s check our configuration against the other possible causes the documentation lists:

  1. There is another device on the network with the same HA Virtual ID. The LoadMasters in an HA pair should have the same HA Virtual ID. It is possible that a third device could be interfering with these units. As of LoadMaster firmware version 7.2.36, the LoadMaster selects an HA Virtual ID based on the shared IP address of the first configured interface (the last 8 bits, see the sketch after this list). You can change the value to whatever number you want (in the range 1 – 255), or you can keep it at the value already selected. Check!
  2. An interface used for HA checks is receiving a packet from a different interface/appliance. If the LoadMaster has two interfaces connecting to the same switch, with Use for HA checks enabled, this can also cause these error messages. Disable the Use for HA checks option on one of the interfaces to confirm the issue. If confirmed, either leave the option disabled or move the interface to a separate switch.
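
As a quick illustration of how that default HA Virtual ID is derived: the last 8 bits of an IPv4 address are simply its final octet. A one-liner to see that (the shared IP here is made up):

# Hypothetical shared IP on the first configured interface
$SharedIP = [System.Net.IPAddress]'192.168.10.73'
# The default HA Virtual ID is the last 8 bits of the address, i.e. the final octet: 73
($SharedIP.GetAddressBytes())[-1]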

I am sure there is no interference from another appliance. Check! As we had checked every other possible cause, that second scenario caught my attention. Could it be?

Time for some packet captures

So we took a TCP dump on bond0 and looked at it in Wireshark. You can make a TCP dump via the Debug Options under System Log Files.

Debug Options; once there, find the TCP dump option.

Select your interface, click Start, wait 10 seconds or so, click Stop, and download the dump.

TCP dump

Do note that Wireshark identifies this as VRRP, but the LoadMaster uses CARP (the open source alternative), so set it to decode as CARP; that way you’ll see more interesting information in the Info column.

No, not proprietary VRRP but CARP

Also filter on ip.dst == 224.0.0.18 (the multicast address). What we see here is that on eth0 we receive multicasts from eth1. That is the case described in the documentation. Aha!

Aha, we see CARP multicasts from eth1 on eth0; that is what we call a clue!
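
If you prefer the command line, tshark can do the decode and the filtering in one go. A sketch, assuming the dump was saved as bond0.pcap (CARP shares IP protocol 112 with VRRP, which is why the decode-as is needed):

# Decode IP protocol 112 as CARP instead of VRRP, then show only the CARP multicasts
tshark -r bond0.pcap -d ip.proto==112,carp -Y 'ip.dst == 224.0.0.18'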

So now what, do we need to move eth1 to another switch to solve this? Or disable the HA check? No, luckily not. Read on.

The fix for Check vhid, password and virtual IP address

No, I did not use one or more separate switches just to plug in the heartbeat HA interfaces of the LoadMasters. What I did is create a separate VLAN for the HA heartbeat (eth1) uplink interfaces on the switches. This way I ensure that they are in a separate multicast domain from the management interface uplinks on the switches.

By selecting a different VLAN for the MGMT and heartbeat interface uplinks, they end up in different multicast TV VLAN groups by default.

By default, the Multicast TV VLAN Membership is per VLAN. The reason the actual workload interfaces did not cause an issue when we enabled HA checks is that these were trunk ports with a number of allowed VLANs, different from the management VLAN, which prevents this error from being logged in the first place.
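
The switch-side change is nothing fancy. The exact syntax depends on your switch vendor, but a Cisco-IOS-style sketch (the VLAN ID and port are made up) shows the idea: a dedicated VLAN for the heartbeat uplinks, with only those ports in it.

! Hypothetical VLAN ID and interface; vendor syntax will differ
vlan 210
 name LM-HA-HEARTBEAT
interface GigabitEthernet1/0/10
 description LoadMaster eth1 heartbeat uplink
 switchport mode access
 switchport access vlan 210
 spanning-tree portfast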

That this works was confirmed in a packet trace taken on the LM-X15 after making the change.

No more packets received from a different interface. Mission accomplished.

So that was it. The error was gone and we could move along with the project.

Conclusion

Well, I should have known, as normally I put those networks not just in a separate subnet, but also make sure they are on different VLANs. This goes to show that no matter how experienced you are and how well you prepare, you will still make mistakes. That’s normal and that’s OK; it means you are actually doing something. The key is how you deal with a mistake, and that is why I wrote this: to share how I found the root cause of the issue and how I fixed it. Mistakes are a learning opportunity, so use them as such. I know many organizations frown upon mistakes, but really, they should grow up and stop acting so silly.

Azure App Service now supports NAT Gateway


It almost snuck by me, but on November 15th, 2020, Microsoft announced that a web app in Azure App Service now supports NAT Gateway. That might not seem like a big deal, but it can come in quite handy! Also, we have been waiting for this for quite a while.

NAT Gateway is now also supported by web apps in Azure App Service

Why is this useful?

For one, the NAT Gateway provides a dedicated, fixed IP address for outgoing traffic. That can be quite handy for whitelisting use cases. You could use Azure Firewall if you want to control egress traffic over a dedicated fixed IP address by FQDN, but then you miss out on the second benefit: scalability. On top of that, Azure Firewall is expensive overkill if all you need is a dedicated IP for outbound traffic.

An Azure NAT Gateway also helps with scaling the web application because it delivers 64,000 usable outbound SNAT ports. The Azure App Service itself has a limited number of connections you can have to the same address and port.

How to use a NAT Gateway with Azure App Service

  1. Integrate your app with an Azure virtual network. You need to use Regional VNet Integration in order to leverage an Azure NAT Gateway. Regional VNet Integration is available for web apps in a Standard, Premium V2 or Premium V3 App Service plan. It will work with both Function apps and web or API apps. Note that some Standard App Service plans cannot use Regional VNet Integration if they run on older App Service deployments on older hardware stamps; see Clarify if PremiumV2 is required for VNET integration.
  2. Route all the outbound traffic into your Azure virtual network.
  3. Provision a NAT Gateway in the same virtual network and configure it with the subnet used for VNet Integration.

From now on, outbound (egress) traffic initiated by your web app in Azure App Service will go out over the IP address of the NAT Gateway.
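
For those who script their deployments, here is a rough sketch of steps 2 and 3 in Azure PowerShell. All resource names and the address prefix are made up, and note that, at the time of writing, routing all outbound traffic was done with the WEBSITE_VNET_ROUTE_ALL app setting:

# Hypothetical names; adjust to your environment
$rg = 'rg-webapp'

# Step 2: route all outbound traffic from the web app into the virtual network
# (careful: -AppSettings replaces all existing settings, so merge them first in real use)
Set-AzWebApp -ResourceGroupName $rg -Name 'mywebapp' -AppSettings @{ 'WEBSITE_VNET_ROUTE_ALL' = '1' }

# Step 3: provision a NAT Gateway with a static public IP ...
$pip = New-AzPublicIpAddress -ResourceGroupName $rg -Name 'pip-natgw' -Location 'westeurope' -Sku Standard -AllocationMethod Static
$natGw = New-AzNatGateway -ResourceGroupName $rg -Name 'natgw-webapp' -Location 'westeurope' -Sku Standard -PublicIpAddress $pip

# ... and attach it to the subnet used for Regional VNet Integration
$vnet = Get-AzVirtualNetwork -ResourceGroupName $rg -Name 'vnet-webapp'
Set-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'snet-integration' -AddressPrefix '10.0.1.0/24' -NatGateway $natGw
$vnet | Set-AzVirtualNetwork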

Have fun with it.

Replacing a failed disk in a stand-alone Storage Spaces with Mirror Accelerated Parity

Introduction

I use Storage Spaces in various environments for several use cases, even with clients (see Move Storage Spaces from Windows 8.1 to Windows 10). In this blog post, we’ll walk through replacing a failed disk in a stand-alone Storage Spaces setup with Mirror Accelerated Parity. I have a number of DELL R740XD stand-alone servers with a ton of storage that I use as backup targets. See A compact, high capacity, high throughput, and low latency backup target for Veeam Backup & Replication v10 for a nice article on a high-performance design with such servers. They deliver the repositories for the extents in Veeam Backup & Replication Scale-out Backup Repositories. They have NVMe drives for the performance tier in Storage Spaces with Mirror Accelerated Parity.

Even with the best hardware and vendor, a disk can fail, and yes, it happened to one of our NVMe drives. Reseating the disk did not help and we were one disk short in Device Manager. So yes, the disk was dead (or worse, the slot where it was seated, but that is less likely).

Replacing a failed disk in a stand-alone Storage Spaces with Mirror Accelerated Parity

So let’s take a look by running

Get-PhysicalDisk 

I immediately see that we have an NVMe disk that has lost communication; it is gone and also no longer displays a disk number. It seems to be broken.


That means we need to get rid of it in the storage spaces pool so we can replace it.

Getting rid of the failed disk properly

I put the disk that lost communication into a variable

$ProblemDisk = Get-PhysicalDisk | Where-Object OperationalStatus -like '*lost*'

We then retire the problematic disk

$ProblemDisk | Set-PhysicalDisk -Usage Retired

We then run Get-PhysicalDisk again and yes, we see the disk was retired.

Replacing a failed disk in a stand-alone Storage Spaces with Mirror Accelerated Parity

Now grab that retired disk and save it to a variable by running

$RetiredDisk = Get-PhysicalDisk | Where-Object Usage -like '*Retired*'

Now remove the retired disk from the storage pool by running

Get-StoragePool -FriendlyName BackupStoragePool | Remove-PhysicalDisk -PhysicalDisk $RetiredDisk

Let this complete and check again with Get-PhysicalDisk; you will see the problematic disk is gone. Note that there are only 7 NVMe disks left.


It does not show up as an unrecognized disk that is still visible to the OS somehow, so we cannot try to reset it to get it back into action. We need to replace it, so we requested a replacement disk from DELL support and swapped them out.
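
While waiting for the replacement disk, you can keep an eye on the pool and the virtual disks with two stock cmdlets (BackupStoragePool is the pool name used in this environment):

# The pool and its virtual disks will typically report a degraded health status
# until the disk is replaced and the repair has completed
Get-StoragePool -FriendlyName BackupStoragePool | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus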

Putting the new disk into service

Now that we have our new disk, we want to add it to the storage pool. You should see the new disk in Disk Manager as online and not initialized, or when you select Add Physical Disk in the storage pool in Server Manager.

But we were doing so well in PowerShell, so let’s continue there and add the new disk to the storage pool. Run

$DiskToAddToPool = Get-PhysicalDisk | Where-Object CanPool -eq $True

Get-StoragePool -FriendlyName BackupStoragePool | Add-PhysicalDisk -PhysicalDisk $DiskToAddToPool

When you run Get-PhysicalDisk again, you will see that there are no disks left that can be pooled, meaning they are all in the storage pool. And we have 8 NVMe disks again. Good!

Now run

Optimize-StoragePool -FriendlyName BackupStoragePool

And let it run. You can check up on its progress via this little script.

while ($true) {
    # Show any running storage jobs (the rebalance job in this case)
    Get-StorageJob
    Write-Host 'Waiting for the storage jobs to complete ...'
    Start-Sleep -Seconds 10
}
Keeping an eye on the storage pool optimization process

That’s it. All is well again and rebalanced. It also ensures the storage capacity contributed by the replacement disk will be available in the performance tier when I want to create an extra virtual disk. Storage Spaces at its best, giving me the opportunity to leverage NVMe alongside other disks while maximizing the benefits of ReFS.
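
If you want to verify that capacity, you can query the storage tiers. The tier friendly names differ per environment, so the 'Performance' name below is an assumption:

# List the tiers in the pool, then check what size a new tiered virtual disk could get
Get-StorageTier | Select-Object FriendlyName, MediaType, ResiliencySettingName
Get-StorageTierSupportedSize -FriendlyName 'Performance' | Select-Object TierSizeMin, TierSizeMax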

For more info on stand-alone Storage Spaces and PowerShell, see Deploy Storage Spaces on a stand-alone server.

Conclusion

As you have seen, replacing a failed disk in a stand-alone Storage Spaces setup with Mirror Accelerated Parity is not too hard to do. You just need to wrap your head around how Storage Spaces works and investigate the commands a little. For that, I recommend practicing on a virtual machine.

LoadMaster LMOS 7.2.52 firmware feature enhancements


Let’s take a look at some of the recently released LoadMaster LMOS 7.2.52 firmware feature enhancements. You can read the release notes on the Kemp web site. Next to security updates, there are two entries in the new features and change notices that caught my attention. The first is the new feature that delivers the ability to use SNI in a SubVS, as well as SNI hostname pass-through. The second is the enhancement that lets us configure per-VS health check settings. Both are very welcome, as I have to deal with such scenarios and needs frequently. So let’s take a look.

Ability to use SNI in SubVS, as well as SNI-Hostname Pass-Through

The Server Name Indication (SNI) feature has been enhanced to support the following:

  • The ability to pass through the original hostname as the SNI hostname to the Real Server.
  • The ability to specify a different (manual) SNI hostname per SubVS. This is the same as the previous functionality to specify this on the parent Virtual Service (Reencryption SNI Hostname) but on the SubVS level with content switching.


These new features make scenarios where you want to consolidate as many services as possible onto the fewest possible IP addresses easier and less confusing to implement.

The Pass-through SNI hostname check box is available in the SSL Properties section of the Virtual Service modify screen. When this is enabled and re-encryption is in use, the received SNI hostname is passed through as the SNI used to connect to the Real Server. If the Virtual Service has a Reencryption SNI Hostname set, this overrides the received SNI.

The pass-through SNI hostname can be overridden at the VS level

It is also possible to set the re-encryption SNI hostname in a SubVS (in the Basic Properties section). If it is set in a SubVS, this overrides the parent Virtual Service value and/or the received SNI value.

You can also define the Reencryption SNI Hostname at the subVS level
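
To summarize the precedence, here is a little PowerShell sketch that models the SNI selection logic as I read the documentation. It is obviously not LoadMaster code, just an illustration:

# Sketch of the SNI selection precedence described above (not actual LoadMaster code)
function Resolve-ReencryptionSni {
    param(
        [string]$SubVsSniHostname, # Reencryption SNI Hostname set on the SubVS
        [string]$VsSniHostname,    # Reencryption SNI Hostname set on the parent VS
        [string]$ReceivedSni,      # SNI hostname received from the client
        [bool]$PassThroughSni      # the Pass-through SNI hostname check box
    )
    if ($SubVsSniHostname) { return $SubVsSniHostname } # SubVS overrides everything
    if ($VsSniHostname)    { return $VsSniHostname }    # then the parent VS value
    if ($PassThroughSni)   { return $ReceivedSni }      # else pass the client SNI through
    return $null                                        # otherwise no SNI is sent
}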

Per-VS Health Check Settings

Until now, health check settings were global only, located on the Rule & Checking > Check Parameters UI page: Check Interval, Connect Timeout, and Retry Count.

We still have global Service Check Parameters, but they can now be overridden at the VS and SubVS level

Now, we can also configure these settings on the Virtual Service Real Servers tab. This means we can tune the health check behavior for specific VSs and SubVSs. By default, Real Server health checks use the global settings, and the VS or SubVS settings follow the global settings as they change. So the default behaves as in previous releases.

You can mix: customize the check interval, use the default value for the timeout, and leverage the global setting for the retry value.

Once you change a check parameter at the VS or SubVS level, however, those custom settings remain unchanged regardless of changes made to the global setting. The UI indicates whether the value currently in use is the global value or a custom one.
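
If you manage your LoadMasters through the RESTful API, these per-VS values can be set there too. A hedged sketch with Invoke-RestMethod: /access/modvs is the documented endpoint to modify a Virtual Service, but the three check parameter names below are placeholders, so look up the real ones in the 7.2.52 API reference:

# Placeholder parameter names; consult the LoadMaster API documentation for the real ones
$lm   = 'https://lm.example.com'
$cred = Get-Credential # a LoadMaster user with API permissions
Invoke-RestMethod -Uri "$lm/access/modvs?vs=1&checkinterval=20&checktimeout=10&checkretries=3" -Credential $cred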

Conclusion

I am really happy with the enhancements introduced with FW 7.2.52. The two I highlighted above are the ones I really come up against time and again. Not having a re-encryption SNI hostname at the SubVS level was something we could work around (see How to Re-Encrypt Multiple SNIs on the Same IP and Port with Kemp LoadMaster – PART 1 and How to Re-Encrypt Multiple SNIs on the same IP and port with a Kemp LoadMaster – PART 2). But having this feature on the SubVS makes life a lot easier.

Being able to set the Real Server health check interval, timeout, and retry count helps us in those scenarios where we have services with different needs. These global settings were always a balancing act between all the services, so this capability is very welcome as well.

It is great to see Kemp Technologies’ offerings evolve and improve. They have established themselves quite well over the years. It makes me happy that, way back, I chose them for their price/value and excellent support. I still remember the first HA pair (LM-2200) I ever deployed in 2011, for a real-time reference GPS positioning system. That was the beginning of a very successful journey building solutions with LoadMasters, using both physical and virtual appliances.