Check vhid, password and virtual IP address Kemp LoadMaster

Introduction

Recently I was implementing a highly available Kemp LoadMaster X15 system. I prepared everything, documented the switch and LM-X15 configuration, and created a Visio diagram to visualize it all. That, together with the migration and rollback scenario, was presented to the team lead and the engineer who was going to work on this with us. I told the team lead that all would go smoothly if my preparations were good and I did not make any mistakes in the configuration. Well, guess what, I made a mistake somewhere and had to solve a Kemp LoadMaster “Bad digest – md2=[31084da3…] md=[20dcd914…] – Check vhid, password and virtual IP address” log entry.

Check vhid, password and virtual IP address

While everything was working well, we saw the following entry flood the system message log:

<date> <LoadMasterHostName> ucarp[2193]: Bad digest – md2=[xxxxx…] md=[xxxxx…] – Check vhid, password and virtual IP address

every second …
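To put a number on "every second", the log can be tallied per timestamp. The following is a minimal sketch, assuming the system log uses the classic syslog timestamp layout; the sample lines, the host name `lm1`, and the helper name `count_bad_digests` are hypothetical and only the text after `ucarp[...]` comes from the entry shown above.

```python
import re
from collections import Counter

# Pattern for the ucarp "Bad digest" message quoted above. The timestamp and
# host name layout is an assumption (classic syslog); adjust it to your log.
BAD_DIGEST = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"ucarp\[\d+\]:\s+Bad digest\b"
)

def count_bad_digests(lines):
    """Return a Counter mapping each timestamp to its number of Bad digest entries."""
    hits = Counter()
    for line in lines:
        match = BAD_DIGEST.match(line)
        if match:
            hits[match.group("stamp")] += 1
    return hits

# Hypothetical sample lines, not taken from a real appliance.
sample = [
    "Sep  1 10:00:01 lm1 ucarp[2193]: Bad digest - md2=[aaaa...] md=[bbbb...] - Check vhid, password and virtual IP address",
    "Sep  1 10:00:01 lm1 sshd[812]: Accepted publickey for admin",
    "Sep  1 10:00:02 lm1 ucarp[2193]: Bad digest - md2=[aaaa...] md=[bbbb...] - Check vhid, password and virtual IP address",
]
per_second = count_bad_digests(sample)
```

A steady count of one or more per timestamp points at a continuous stream of unexpected advertisements rather than a one-off glitch.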

Wait a minute, as far as I know all was OK. The VHID was unique for the HA pair and we did not have duplicate IP addresses set anywhere on other network appliances. So what was this about?

Figuring out the cause

Well, we have a bond0 on eth0 and eth2 for the appliance management. We also have eth1, which is a special interface used for L1 health checks between the LoadMasters. We don’t use a direct link (different racks), so we configure them with an IP on a separate dedicated subnet. Then we have the bonds with the VLANs for the actual workloads via Virtual Services.

We have heartbeat health checks on bond0, eth1, and on at least one VLAN per bond for the workloads.

Confirm that Promiscuous mode and PortFast are enabled. Check!
HA is configured for multicast traffic in our setup so we confirm that the switch allows multicast traffic. Check!

Make sure that switch configurations that block multicast traffic, such as ‘IGMP snooping’, are disabled on the switch/switch ports as needed. Check!

Now let’s look at possible causes and check our configuration:

So what else? The documentation states as possible other causes the following:

  1. There is another device on the network with the same HA Virtual ID. The LoadMasters in a HA pair should have the same HA Virtual ID. It is possible that a third device could be interfering with these units. As of LoadMaster firmware version 7.2.36, the LoadMaster selects a HA Virtual ID based on the shared IP address of the first configured interface (the last 8 bits). You can change the value to whatever number you want (in the range 1 – 255), or you can keep it at the value already selected. Check!
  2. An interface used for HA checks is receiving a packet from a different interface/appliance. If the LoadMaster has two interfaces connecting to the same switch, with Use for HA checks enabled, this can also cause these error messages. Disable the Use for HA checks option on one of the interfaces to confirm the issue. If confirmed, either leave the option disabled or move the interface to a separate switch.
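The first cause can be checked on paper before touching a switch. The sketch below derives the default HA Virtual ID as the last 8 bits of the shared IP address, as the quoted documentation describes, and lists pairs that end up with the same value. The pair names and addresses are hypothetical, and a clash only matters when the devices can actually see each other's advertisements on the same layer 2 segment.

```python
import ipaddress

def default_vhid(shared_ip: str) -> int:
    """Last 8 bits of the shared IPv4 address (the default described in the docs)."""
    return int(ipaddress.IPv4Address(shared_ip)) & 0xFF

def vhid_conflicts(pairs: dict) -> dict:
    """Group HA pairs by derived VHID and keep only VHIDs used more than once."""
    by_vhid = {}
    for name, shared_ip in pairs.items():
        by_vhid.setdefault(default_vhid(shared_ip), []).append(name)
    return {vhid: names for vhid, names in by_vhid.items() if len(names) > 1}

# Hypothetical HA pairs and shared IP addresses, for illustration only.
pairs = {
    "lm-pair-a": "10.10.1.20",
    "lm-pair-b": "10.10.2.20",  # same last octet on another subnet
    "lm-pair-c": "10.10.1.30",
}
conflicts = vhid_conflicts(pairs)
```

Here `lm-pair-a` and `lm-pair-b` both derive VHID 20, which is the kind of overlap worth ruling out when a third device is suspected.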

I am sure there is no interference from another appliance. Check! As we had checked every other possible cause the line in red caught my attention. Could it be?

Time for some packet captures

So we took a TCP dump on bond0 and looked at it in Wireshark. You can make a TCP dump via debug options under System Log Files.

Debug Options, once there find TCP dump.

Select your interface, click start, wait 10 seconds or so, click stop, and download the dump.

TCP dump

Do note that Wireshark identifies this as VRRP, but the LoadMaster uses CARP (open source), so set it to decode as CARP. That way you’ll see more interesting information in the Info column.

No, not proprietary VRRP but CARP

Also filter on ip.dst == 224.0.0.18 (the multicast address). What we see here is that on eth0 we receive multicasts from eth1. That is the case described in the documentation. Aha!

Aha, we see CARP multicasts from eth1 on eth0, that is what we call a clue!
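The same check can be done offline on exported packet data. This is a rough sketch, assuming you have already extracted (source IP, destination IP) pairs from the capture, for example by exporting the packet list from Wireshark. The addresses, the subnet, and the function name `foreign_advertisements` are hypothetical.

```python
import ipaddress

# Multicast address used by the HA advertisements.
CARP_MULTICAST = ipaddress.IPv4Address("224.0.0.18")

def foreign_advertisements(packets, local_subnet):
    """Return source IPs of 224.0.0.18 packets whose source lies outside local_subnet."""
    subnet = ipaddress.ip_network(local_subnet)
    foreign = set()
    for src, dst in packets:
        if ipaddress.IPv4Address(dst) != CARP_MULTICAST:
            continue  # not an HA advertisement
        if ipaddress.IPv4Address(src) not in subnet:
            foreign.add(src)
    return sorted(foreign)

# Hypothetical (source, destination) pairs from a capture on the management interface.
packets = [
    ("192.168.10.11", "224.0.0.18"),     # peer on the management subnet
    ("172.16.99.12", "224.0.0.18"),      # sender from the heartbeat subnet
    ("192.168.10.11", "192.168.10.50"),  # ordinary unicast traffic
]
leaked = foreign_advertisements(packets, "192.168.10.0/24")
```

Any address in `leaked` is an advertisement arriving on an interface where it does not belong, which is the condition the documentation warns about.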

So now what, do we need to move eth1 to another switch to solve this? Or disable the HA check? No, luckily not. Read on.

The fix for Check vhid, password and virtual IP address

No, I did not use one or more separate switches just to plug in the heartbeat HA interfaces on the LoadMasters. What I did is create a separate VLAN for the eth1 HA heartbeat uplink interfaces on the switches. This way I ensure that they are in a separate multicast group from the management interface uplinks on the switches.

By selecting a different VLAN for the MGMT and heartbeat interface uplinks, they are in different TV VLAN groups by default.

By default the Multicast TV VLAN Membership is per VLAN. The reason the actual workload interfaces did not cause an issue when we enabled HA checks is that these were trunk ports with a number of allowed VLANs, different from the management VLAN, which prevents this error being logged in the first place.

That this works was confirmed in the packet trace from the LM-X15 after making the change.

No more packets received from a different interface. Mission accomplished.

So that was it. The error was gone and we could move along with the project.

Conclusion

Well, I should have known, as normally I put those networks not just in a separate subnet but also make sure they are on different VLANs. This goes to show that no matter how experienced you are and how well you prepare, you will still make mistakes. That’s normal and that’s OK; it means you are actually doing something. The key is how you deal with a mistake, and that’s why I wrote this: to share how I found the root cause of the issue and how I fixed it. Mistakes are a learning opportunity, use them as such. I know many organizations frown upon a mistake, but really, they should grow up and not act this silly.

Custom Route Tables in Azure Virtual WAN are live!

Introduction

Last week, around August 26-27th 2020, Custom Route Tables in Azure Virtual WAN lit up in my Azure tenants. Awesome news. Normally this should have happened the week of the 3rd of August 2020. However, some delay happened. Now that it is here, it has arrived in silence, which I find odd. This is a major capability that offers so much of what we need to make Azure Virtual WAN shine. But it is here, ready to shine at Microsoft Ignite.

Watch my video on custom route tables in Azure virtual WAN

Custom Route Tables in Azure Virtual WAN

What do we have now? You can read up on Azure Virtual WAN route tables over here. I have made a video about all this which you can find on my blog and on my Vimeo channel. Please take a look for some walkthroughs and links to some other blog posts by me on Azure Virtual WAN.

Labels

First of all, let’s discuss the labels. Labels logically group route tables. These are very helpful when propagating routes from connections to multiple route tables. The Default Route Table has a built-in label called ‘Default’. When you propagate connection routes to the ‘Default’ label, it automatically applies to all the Default Route Tables across every hub in the Virtual WAN.

Creating a label

Associations

Now, we can discuss associations. Each connection is associated with one route table. This means that the connection can send to the destination indicated as routes in the route table it is associated with. The routing configuration of the connection will show the associated route table. This is very important for connected VNETs. Multiple connections can be associated with the same route table. Note that all VPN, ExpressRoute, and User VPN connections are associated with the same (default) route table.

Association

By default, all connections are associated with the Default route table in a virtual hub. Each virtual hub has its own Default route table. You can add one or more static routes to the default Route table. Static routes take precedence over dynamically learned routes for the same prefixes.
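The precedence rule can be illustrated with a few lines of Python. This is only a toy model, not the Azure API: it merges routes per prefix and lets a static route replace a dynamically learned one for the same prefix. The prefixes and next-hop names are hypothetical, and the sketch deliberately ignores longest-prefix matching across different prefixes.

```python
import ipaddress

def effective_routes(static_routes, learned_routes):
    """Merge routes per prefix; a static route wins over a learned one for the same prefix."""
    table = {
        ipaddress.ip_network(prefix): ("learned", next_hop)
        for prefix, next_hop in learned_routes.items()
    }
    for prefix, next_hop in static_routes.items():
        table[ipaddress.ip_network(prefix)] = ("static", next_hop)
    return table

# Hypothetical routes in a hub's Default route table.
learned = {"10.0.0.0/8": "vpn-branch", "192.168.0.0/16": "vpn-branch"}
static = {"10.0.0.0/8": "firewall-nva"}
table = effective_routes(static, learned)
```

In this model the 10.0.0.0/8 prefix ends up pointing at `firewall-nva`, while 192.168.0.0/16 keeps its learned next hop.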

Propagations

Last but not least, connections dynamically propagate routes to one or more route tables. VPN, ExpressRoute, and User VPN connections propagate routes to the same set of route tables. With connections like a Site-2-Site VPN, ExpressRoute, or Point-2-Site VPN, routes are propagated from the virtual hub to the on-premises router using BGP.

Propagations for Branches
Propagation for a connected VNET

A “None” route table is also available for each virtual hub. Propagating to the None route table implies that no routes are propagated from the connection.
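To make labels, associations, and propagations concrete, here is a small in-memory model. It is a sketch for reasoning only and does not use any Azure SDK: the class names, hub names, table names, and prefixes are all hypothetical. Each connection is associated with exactly one route table, and propagating to a label lands the connection's prefixes in every route table carrying that label, across hubs. An empty label set stands in for propagating to "None".

```python
from dataclasses import dataclass, field

@dataclass
class RouteTable:
    hub: str
    name: str
    labels: set = field(default_factory=set)
    routes: dict = field(default_factory=dict)  # prefix -> originating connection

@dataclass
class Connection:
    name: str
    prefixes: list
    associated_table: str  # exactly one route table per connection
    propagate_to_labels: set = field(default_factory=set)  # empty set models "None"

def propagate(connections, tables):
    """Copy each connection's prefixes into every table carrying one of its labels."""
    for conn in connections:
        for table in tables:
            if table.labels & conn.propagate_to_labels:
                for prefix in conn.prefixes:
                    table.routes[prefix] = conn.name

# Hypothetical Virtual WAN with two hubs and one custom route table.
tables = [
    RouteTable("hub-we", "Default", {"Default"}),
    RouteTable("hub-ne", "Default", {"Default"}),
    RouteTable("hub-we", "RT_Isolated", {"isolated"}),
]
connections = [
    Connection("vnet-shared", ["10.1.0.0/16"], "Default", {"Default"}),
    Connection("vnet-quiet", ["10.2.0.0/16"], "RT_Isolated", set()),
]
propagate(connections, tables)
```

After the call, both Default tables know `10.1.0.0/16`, while `vnet-quiet` has propagated nothing anywhere, which is the behavior described for the None route table.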

Some need to ask

Finally, some customers need to reach out to support in order to get Azure Virtual WAN Custom route tables to light up.

Contact Microsoft support if and when needed to enable custom route tables for you.

As a result, I suggest you do so to start kicking the tires and then dive in deeper. This is a cornerstone technology for Azure networking going forward.

Automation

I have not found any documentation or guidance regarding automation with PowerShell, Azure CLI, or ARM templates yet. I expect this to be forthcoming as this is much needed. As a result, I hope we’ll see this by Microsoft Ignite 2020.

Conclusion

Azure Virtual WAN with the secured Virtual Hub and custom route tables offers the capabilities we have been waiting for. With these capabilities in place, Azure Virtual WAN is the future of Azure virtual networking. Therefore, I fully expect to hear a lot more about it during Microsoft Ignite in September. I personally will focus on this part of networking in the coming months. It will be a standard part of any Azure initiative and project in the near future.

Azure Virtual WAN is for everyone

Do you need to be a Fortune 500 Global business?

When it comes to Azure Virtual WAN, you might have the impression it is only useful for huge, international entities.  Entities like the big Fortune 500 companies, with a significant, distributed global presence.

I can understand why. That is where the attention is going, and it makes for excellent examples to showcase. Also, the emphasis with SD-WAN has too often been about such cases. SD-WAN also enables economically feasible, reliable, and redundant connectivity for smaller locations and companies than ever before. My take is that Azure Virtual WAN is for everyone!

Azure Virtual WAN is for everyone

I would also like to emphasize that Azure Virtual WAN is so much more than just SD-WAN. That does not detract from SD-WAN’s value. SD-WAN is a crucial aspect of it in terms of connectivity to and from your Azure environment. I would even say that the ability to leverage Microsoft’s global network via Azure Virtual WAN is the most significant force multiplier that SD-WAN has gotten in the past year.

Network appliance vendors are signing on to integrate with Azure Virtual WAN for a good reason. It makes sense to leverage one of the biggest, best, and fastest global networks in the world to provide connectivity for your customers. 

One extreme use case would be to use Azure Virtual WAN only as an SD-WAN carrier just to connect your sites without using anything in Azure. An example of this would be a business that is still on-prem but wants to move to Azure. That is a good start. It modernizes connectivity between the locations while becoming ready to move workloads to Azure, where the landing zone is integrated into Azure Virtual WAN when it is time to do so.

A Medium Enterprise example

But let’s step back a minute. The benefits of Azure Virtual WAN go beyond SD-WAN deployments for multinational companies spanning the globe. Make no mistake about this. SD-WAN is also very interesting for Small and Medium Enterprises (SME), and the benefits of Azure Virtual WAN go beyond on-premises to Azure connectivity. It extends to connecting any location to any location.

SD-WAN leveraging Azure Virtual WAN and the Microsoft Global Network

On-premises connectivity is more than a data center, a corporate HQ, and branch offices with ExpressRoute and/or Site-to-Site VPN (S2S). It is also a user via a Point-to-Site VPN (P2S). All of these can be anywhere in the world but also distributed across your city, country, or continent. Think about what that means for “remote work by default” shops. Every individual, whether working with you as an employee,  partner, customer, consultant or contractor, can be connected to your Azure virtual WAN and your on-premises locations thanks to the any-to-any connectivity.

Some people might have an NGFW at home, depending on their role and needs. Many others will be fine with a point-to-site VPN, which serves both work-from-home profiles as well as road warriors.

People, if this Coronavirus global pandemic has not awakened you to this importance and possibility of remote work, I do not know what to tell you. Drink a lot more coffee?

For example, a national retailer, a school, a medical provider with lots of small local presences can all benefit from Azure Virtual WAN. When they merge with others, within or across the borders, Azure Virtual WAN with SD-WAN puts them in a great position to extend and integrate their network.

There is more to Azure Virtual WAN than SD-WAN

We have not touched on the other benefits Azure Virtual WAN brings. These benefits are there, even if you have no on-premises locations to connect. That would be another extreme, Azure Virtual WAN without any SD-WAN deployment. While the on-premises deployment of apps goes down over time, it will not go away 100% for everyone. Also, even in a 100% cloud-native environment, having other connectivity options than over the internet and public services can help with security, speed, and cost reduction.

The Any-to-Any capabilities, the ease of use, leading to operational cost saving, are game-changing. Combined with the integration with Azure Firewall manager to create a Secure Virtual HUB and custom routing, it makes for a very flexible way of securing and managing network access and security.

Hybrid scenarios

Don’t think that SMEs will only have 2 to 5 subscriptions, or even less if they are just consumers of cloud services outsourced to a service provider, with one or a couple of vNETs.

If you do not have many subscriptions, you can still have a lot of vNETs. You create vNETs per application, business unit, etc. On top of that, in many cases, you will have development, testing, acceptance, and production environments for these applications.

You might very well do what we do, and what we see more of again: lots of subscriptions. You can create subscriptions for every application environment, business unit, etc. The benefits are a clear and easy-to-measure distinction in ownership, responsibilities, costs, and security. That means a company can have dozens to hundreds of subscriptions that way. These can all have multiple vNETs. When an SME wants to protect itself against downtime, two regions come into play. That is where the hub-to-hub transitive nature excels.

Azure Virtual WAN – Hybrid scenario

Now, managing VNET peering, transit vNETs, Network Gateways, Firewalls, and route tables all become a bit of a chore fast when the environment grows. Rolling all that work into a convenient, centralized virtual global service makes sense to reduce complexity, reduce operational costs, and simplify your network architecture and design.

Going cloud first and cloud native

In a later stage, your organization can reduce its on-premises footprint and go for an all cloud-based approach. Be realistic, there might very well be needs for some on-premises solutions, but Azure Stack has you covered there. You can leverage Azure Stack HCI, Edge, or even Hub for those needs but still integrate deployment, management, operations, and monitoring into Azure.

Azure Virtual WAN – Cloud first scenario

Global Transit Architecture with Azure Virtual WAN

I still need to drive the capabilities and benefits of the Global Transit Architecture with Azure Virtual WAN home for you. For one, it is any-to-any by default. You can control and limit this where needed, but it works automagically for you out of the box. Second, this is true for ExpressRoute, S2S VPN, P2S VPN, VNET peers, and virtual hubs in all directions.

  • Branch-to-VNet
  • Branch-to-branch
    • ExpressRoute Global Reach and Virtual WAN
  • Remote User-to-VNet
  • Remote User-to-branch
  • VNet-to-VNet
  • Branch-to-hub-hub-to-Branch
  • Branch-to-hub-hub-to-VNet
  • VNet-to-hub-hub-to-VNet

This means that a user with a P2S VPN connected to a virtual hub has access to a datacenter that connects to that same hub or another one within the same Virtual WAN. You can go crisscross all over the place. I love it. Remember that we can secure this, control this.
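The any-to-any idea is simple enough to model in a few lines. The following toy sketch assumes every endpoint is attached to exactly one virtual hub and that hubs within the same Virtual WAN reach each other directly. The endpoint and hub names are hypothetical, and the model ignores any custom routing or firewall restrictions you may add.

```python
def transit_path(src, dst, attachments):
    """Return the hop list between two endpoints attached to virtual hubs."""
    hub_src, hub_dst = attachments[src], attachments[dst]
    if hub_src == hub_dst:
        return [src, hub_src, dst]       # same hub, e.g. Remote User-to-branch
    return [src, hub_src, hub_dst, dst]  # hub-to-hub transit

# Hypothetical endpoints and the hub each one connects to.
attachments = {
    "p2s-user": "hub-west-europe",
    "datacenter-er": "hub-west-europe",
    "branch-s2s": "hub-east-us",
    "vnet-app": "hub-east-us",
}
same_hub = transit_path("p2s-user", "datacenter-er", attachments)
cross_hub = transit_path("p2s-user", "vnet-app", attachments)
```

The connection type of the endpoint never appears in the function, which is the point: in the default any-to-any setup the path depends only on which hubs are involved.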

Any-to-Any – crisscross along locations and connection types – Image adapted from MSFT

Think about that for a moment. When I am on the road connected via a P2S VPN to an Azure virtual hub, I can reach my datacenter (ExpressRoute), my office, store, factory, and potentially even my home office (S2S VPN). Next to that, I can reach all my vNETs. It is the same deal when I am working from home or in the office, store, or factory. That is impressive. The default is any-to-any, automagically done for you. But you can restrict and secure this to your needs with custom routing and a secure virtual hub (Azure Firewall Manager).

Conclusion

The benefits of Azure Virtual WAN are plenty, for many scenarios in large, medium and small enterprises. So, I invite you to take a better look. I did. As a result, I have been investing time in diving into its possibilities and potential. I will be presenting on this topic to share my insights into what, to me, is the future of Azure networking. Do not think this is only for the biggest corporations or organizations.

Spline

Back in 2010, I introduced the first 10Gbps networking into my solutions. Cost effective and focused on single rack needs. I built my first Leaf-Spine based network somewhere in 2011-2012. Nothing major, but it did lead to the most cost-effective and efficient redundant 10Gbps network in every rack. The solution enabled cross rack and cross row connectivity (3 rows of 3 racks). As we were prepping for Windows Server 2012 we made sure we had DCB in that design covered. We loved it.

We isolated all the needs of the ops team from corporate networking to enable them “to own the stack”. Ops remained the owner of the entire stack: network, storage, virtualization, data protection, core infrastructure, etc. That meant we could do RoCE right and got the networking done at a great value for money ratio. Owning the stack has always been the way to avoid expensive silos. The only people who didn’t like it were those that made money or derived political power by controlling resources. We got shit done fast, efficiently, and effectively at prices well below what people paid for a lot less “service”.

The leaf-spine design has remained a favorite of mine. Perfection is not of this world; leaf-spine has challenges just like anything else, but that doesn’t detract from its usefulness for building great solutions. One challenge that always remains is truly fair load balancing, along with congestion and blocking. Depending on your size, with a decent deployment you might never know of these challenges, let alone how they are solved. With the extension of the network to the clouds, it remained a solid choice in a hybrid world. It also formed the basis for more cloud-like network designs on-premises. Some variations on leaf-spine exist and design choices depend on the context, needs, and possibilities.

Somewhere in those years the term “spline” made its appearance in the leaf-spine world and I was puzzled for a moment. What is a Spline? It is nothing more than the smallest possible form of a leaf-spine in a single tier, which is quite popular as it can integrate into existing environments by itself and enable scenarios some big corporations’ network teams won’t or can’t ever enable. Basically, it is what I did in the early days to get 10Gbps into existing environments without too much pushback. So, it’s both a technical solution and a diplomatic tool as well as a nice marketing term.

What is Spline?

As said, a Spline is nothing more than the smallest possible form of a leaf-spine. That comes down to only 2 switches in a single tier. In this single tier, these 2 switches combine the roles of the leaf and spine, hence the name “Spline”. This is a nice marketing term for two small switches with ample ports and bandwidth for a small-sized deployment where leaf-spine would be overkill and cost prohibitive.

The switches are high-density, multi-rate devices of 1 or at most 2 rack units. Port speeds can be anything from 1/10/25/40/50/100 Gbps depending on the model and vendor, the available modules, and the cables used. It’s a viable choice for smaller deployments when one can have some margin for growth and wiggle room.


Mellanox SN2010 & SN2100 are prime examples of great switches for a Spline


The modular DELL S6100 ON is another example of a switch to build splines with.

A single tier provides for the lowest latency possible by definition, no tiers need to be crossed – it doesn’t get any better. Predictable (it is always the same) distance and bandwidth is there as, again, there are no tiers to cross.

You can use layer 2 (MLAG, VLT, vPC) or layer 3 (ECMP) interlinks. You don’t lose any flexibility or options here. As such, it will work with traditional virtualization, containers, HCI and with Routing on Host, network virtualization.

What you lose is scale out. You need the leaf-spine to scale out bandwidth and port count in a flexible way. You can scale up by using bigger switches.

Top use cases for spline

Actually, many smaller solutions probably use a “spline” with layer 2 networking for S2D deployments. It’s easy to get 1 to 4 S2D clusters in a rack, depending on the size of the clusters and the number of ports and bandwidth in the switches. Two redundant smaller switches that don’t take up more than 1 or 2 units provide them with ample 25Gbps ports, or 100Gbps ports you can split up to your needs.
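A quick port budget shows where the "1 to 4 clusters" range comes from. This is a back-of-the-envelope sketch under stated assumptions: every node uses one port on each of the two switches for redundancy, and a few ports per switch are reserved for the interlink and uplinks. The port counts and cluster sizes are hypothetical examples, not the specifications of any particular switch.

```python
def clusters_per_spline(ports_per_switch, reserved_ports, nodes_per_cluster):
    """Number of whole clusters that fit when each node takes one port on each switch."""
    if nodes_per_cluster <= 0:
        raise ValueError("nodes_per_cluster must be positive")
    usable = ports_per_switch - reserved_ports
    if usable < 0:
        raise ValueError("more ports reserved than available")
    return usable // nodes_per_cluster

# Hypothetical spline: 18 node-facing ports per switch, 2 of them kept in reserve.
four_node = clusters_per_spline(18, 2, 4)      # small clusters
eight_node = clusters_per_spline(18, 2, 8)     # medium clusters
sixteen_node = clusters_per_spline(18, 2, 16)  # one large cluster
```

With these example numbers the rack holds four 4-node clusters, two 8-node clusters, or a single 16-node cluster. Splitting 100Gbps ports into lower-speed ports shifts the budget further, which the sketch does not model.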


Another prime candidate for a spline is StarWind’s storage solutions. They have great offerings for varied needs and don’t force every need into the one type fits all solution of HCI. But in the end, you can use a spline in any environment where you need lots of bandwidth, high throughput, and low latency.

When and where I don’t like Spline

I don’t like large Splines. They are the same old story with potentially huge chassis switches that bring back all the drawbacks, only flattened into a single tier. They are prohibitively expensive, so normally there are only 2 of them with a huge number of ports, leading to cabling expenses and logistical issues depending on the data center you’re in. Upgrading one of those 2 huge chassis switches tends to bring down a large part of your network (potentially half) and carries a greater risk. So, we’re back to why leaf-spine became so popular and remains popular.

When I look at it from a reverse perspective it’s like someone took a 1 rack or one deployment stamp design and created a giant version of it. All this in an attempt to scale it up instead of out. In reality, it was probably giant switches looking for a new sales pitch. It might work for some, but I would not design a solution based on this. It might have a familiar look and feel to some people but I never liked them very much, design-wise, concept-wise, money wise … but that’s me. Where you can use them if you have the appetite for that is in client networking. Still not a big fan but hey that’s where I tolerate stacks when needed (limited uplinks) as it doesn’t impact 24/7 operations as much and the clients accept the downtime & risk.

Conclusion

A spline is a great design for a rack-sized deployment. In a pinch you can cross racks or even rows but the cabling of that all can become costly. Depending on what’s allowed and possible in your data-center it might not even be an option. Pro Tip: choose your location wisely and never ever tolerate the one size fits all approach of a hosting provider, corporate network team or co-loco. That basically always means they are optimizing for their needs and budgets, not yours.

When using bigger deployments or where growth is very likely, I go for small leaf-spine deployments instead of scaled-up splines.

A Spline can be converted into part of a leaf-spine, so it allows for change and evolution in your network. You normally won’t lose your investment.

What I do not like about spline is when it is used to leverage those huge chassis switches again. It brings all the drawbacks in cost, lack of scale-out, limited redundancy and higher risk back into the picture. Simply flattening a bad idea in a single tier doesn’t make it great.