Load balancing UDP for an RD Gateway farm with a KEMP LoadMaster

When implementing load balancing for RD Gateway we must take care not to forget to load balance the UDP traffic. Your RDP connection will still work over HTTPS alone if you forget this, but you’ll miss out on the benefits UDP brings:

  • A better experience over bad, unreliable network connections with high packet loss
  • A better experience with high-end graphics and, in general, a better graphical experience over WAN links.

Many people have been load balancing their gateways since Windows Server 2008 (R2), when UDP was not yet in play, and because things keep working without it, it’s easy to forget. The most important thing you need to know is that when leveraging UDP for RDP 8/8.1, the UDP session traffic has to use Direct Server Return (DSR) as the forwarding method for the real servers when we configure load balancing for an RD Gateway farm with a KEMP LoadMaster. I’m focusing on the UDP part here, not the HTTPS part. That’s been done enough and the KEMP info on that is sufficient. The UDP part could do with some extra info.

The reason for this is that when UDP is leveraged for high-end graphics we want to avoid sending all that graphical network traffic through the load balancer. There is no real added value being provided there in this UDP use case, but the load might get quite high. This is where DSR is leveraged when configuring the LoadMaster. That means we also need to configure our real servers to use Direct Return as the forwarding method. If you forget this you’ll lose UDP with RDP 8.1, but you might not notice immediately if you’re not looking for it, as the HTTPS connection alone will let you connect and work, albeit with a reduced experience.

To read more on why it’s done this way (even if it seems complex and has drawbacks) see http://kemptechnologies.com/ca/white-papers/direct-server-return-it-you/ and you’ll notice that for graphics it is a great idea. Selecting Direct Server Return as the forwarding method (see later) makes the LoadMaster change the destination MAC address of the incoming packet on the fly (very fast) to the MAC address of one of the real servers. When the packet reaches the real server it must think it owns the VS IP address, which it doesn’t. So we use the loopback adapter to let the real server reply as if it does, but we don’t respond to ARPs, as that would cause issues with the load balancer, which owns the real IP of the virtual service. That’s where the 254 metric we configure in the demo below comes into play. Note that the real server responds over its normal NIC, which is great and helps keep firewall rules from ruining the party.

That’s also why DSR, which leverages the loopback adapter on the RD Gateway servers, requires you to configure the weak host / strong host behavior in the network configuration of those servers: the server isn’t answering for an address of its own! I’ll not go into details on this here, but basically, since Windows Vista and Windows Server 2008 the security model has changed from weak host to strong host. This means that a system (that is not acting as a router) cannot send or receive any packets on a given interface unless the destination/source IP in the packet is assigned to that interface. In the “weak host” model, this restriction does not apply. Read more about this here. Let’s walk through this UDP/DSR/weak host setup & configuration.

On your LoadMaster you’ll create a virtual service for the UDP traffic.

  • Select Virtual Services > Add New.
  • Enter the IP address of your RD Gateway farm.
  • Set 3391 as the Port.
  • Select udp for the Protocol.
  • Click Add this Virtual Service.

Open up the Standard Options to configure the following:

image

  • We don’t need Layer 7, as the UDP connections are tied to the HTTPS connection and will spawn and die with it.
  • We select Source IP Address as the Persistence Mode, as the RD Gateway needs persistence to guarantee the connections stay together on the same RD Gateway server. Don’t set the timeout value too high, so it isn’t remembered too long.
  • We select least connections, as that’s the best option in most cases: let the farm node with the least load take on new connections. This is handy after downtime, for example.

Now head over to the Real Servers section

image

  • Make sure the Real Server Check Parameters option is set to ICMP Ping, which is what the LoadMaster uses to check whether the RD Gateway servers are alive.
  • Click Add New to add an RD Gateway server; you’ll do this for each farm member.

image

  • Enter the Real Server Address for each RD Gateway.
  • Enter 3391 as the Port.
  • Select Direct return as the Forwarding method.
  • Click Add This Real Server.
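If you prefer to script this, the LoadMaster also exposes a RESTful API (enable it first under Certificates & Security > Remote Access). Below is a rough, hypothetical PowerShell sketch of creating the UDP virtual service and adding a real server. The command and parameter names are my recollection of KEMP’s API documentation and the IP addresses are placeholders, so verify everything against your LoadMaster’s API reference before use.

# Rough, hypothetical sketch; all addresses are placeholders and you may need to
# trust or bypass the LoadMaster's self-signed certificate for Invoke-RestMethod.
$lm   = "https://192.168.2.40"      # LoadMaster management address (placeholder)
$cred = Get-Credential              # the bal account or another admin account

# Create the UDP virtual service on port 3391 (the VIP below is a placeholder)
Invoke-RestMethod -Uri "$lm/access/addvs?vs=192.168.2.50&port=3391&prot=udp" -Credential $cred

# Add an RD Gateway farm member as a real server; repeat per farm member.
# Check the KEMP API docs for the parameter that sets the forwarding method to Direct Return.
Invoke-RestMethod -Uri "$lm/access/addrs?vs=192.168.2.50&port=3391&prot=udp&rs=192.168.2.81&rsport=3391" -Credential $cred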

When you’re done it looks like this:

image

So now we need to check whether the real servers are seen as online and healthy …

image

If one RD Gateway server is down or has an issue you’ll see this … no worries, the LoadMaster sends all clients to the other farm member.

image

Configure the RD Gateway farm servers to work with DSR

We’re not done yet; we need to configure our RD Gateway servers in the farm to work with DSR.

Go to Device Manager, right-click on the computer name and select Add legacy hardware

image

Click Next on the welcome page of the wizard …

image

Select “Install the hardware that I manually select from a list (Advanced)” and click Next …

image

Scroll down to network adapters, select it and click Next …

image

Under Manufacturer choose Microsoft and as Network Adapter scroll down to Microsoft KM-TEST Loopback Adapter, select it and click Next.

image

Click Next to install it …

image

image

Click Next to close the wizard.

image
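As an aside, if you have to do this on several farm members you can also install the loopback adapter from an elevated prompt instead of clicking through the wizard. A small sketch, assuming devcon.exe (from the Windows Driver Kit) is available on the server:

# Install the Microsoft KM-TEST Loopback Adapter from the command line
# (assumes devcon.exe from the Windows Driver Kit is present on the server)
devcon.exe install "$env:windir\inf\netloop.inf" *msloop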


Now go to your network connections and change the name so you can easily identify the loopback adapter …

image

image

In the properties of the loopback adapter we disable everything we don’t need. In this case we only need IPv4 and nothing else. We also need to configure the TCP/IP settings for the loopback adapter, so open up the TCP/IPv4 properties of that NIC …

image

Enter the IP address of the virtual service for UDP on the LoadMaster and, very important, enter a subnet mask of 255.255.255.255 for the loopback address. It’s a subnet of one host: the VIP address. Do not enter a gateway!

image

Now go to the advanced settings, deselect Automatic metric and fill out 254. This step prevents the server from responding to ARP requests for the VIP with the MAC address of the loopback adapter.

image

Also uncheck “Register this connection’s addresses in DNS” to avoid any name resolution problems for the real servers.

image

Finally, disable NetBIOS over TCP/IP.

image

What we are doing with all of the above is preventing normal network traffic to these real servers from being affected by the loopback adapter, whose one and only function is to enable DSR and nothing else. It’s a bit “paranoid”, but it pays to be careful and prevent problems.
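For reference, here’s a rough PowerShell sketch of the loopback adapter configuration described above. The adapter names “Ethernet 3” and “KEMP-DSR-LOOPBACK” and the VIP 192.168.2.50 are placeholders of my own, so adapt them to your environment.

# Rename the freshly installed KM-TEST loopback adapter (names/IPs are placeholders)
Rename-NetAdapter -Name "Ethernet 3" -NewName "KEMP-DSR-LOOPBACK"

# Disable every binding except IPv4 on the loopback adapter
Get-NetAdapterBinding -Name "KEMP-DSR-LOOPBACK" |
    Where-Object { $_.ComponentID -ne "ms_tcpip" } |
    Disable-NetAdapterBinding

# Assign the VIP with a 255.255.255.255 (/32) mask and no default gateway
New-NetIPAddress -InterfaceAlias "KEMP-DSR-LOOPBACK" -IPAddress 192.168.2.50 -PrefixLength 32

# Use a fixed interface metric of 254 instead of the automatic metric
Set-NetIPInterface -InterfaceAlias "KEMP-DSR-LOOPBACK" -AddressFamily IPv4 -AutomaticMetric Disabled -InterfaceMetric 254

# Don't register this connection's addresses in DNS
Set-DnsClient -InterfaceAlias "KEMP-DSR-LOOPBACK" -RegisterThisConnectionsAddress $false

# Disable NetBIOS over TCP/IP on the loopback adapter (2 = disable)
Get-CimInstance Win32_NetworkAdapterConfiguration -Filter "Description LIKE '%KM-TEST%'" |
    Invoke-CimMethod -MethodName SetTcpipNetbios -Arguments @{ TcpipNetbiosOptions = [uint32]2 }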

Dealing with Strong Host / Weak Host setting in W2K8 and higher

We still need to deal with the strong host security model: allow the LAN interface to receive traffic for the KEMP loopback interface, and allow the KEMP loopback interface to receive and send traffic from/to the LAN interface. This is done by executing the following commands (LAN and KEMP-DSR-LOOPBACK are the names of the interfaces on these servers):

netsh interface ipv4 set interface LAN weakhostreceive=enabled
netsh interface ipv4 set interface KEMP-DSR-LOOPBACK weakhostreceive=enabled
netsh interface ipv4 set interface KEMP-DSR-LOOPBACK weakhostsend=enabled
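If you prefer PowerShell over netsh, the NetTCPIP module offers the same knobs. A quick sketch, again assuming the interfaces are named LAN and KEMP-DSR-LOOPBACK, including a check of the result:

# PowerShell equivalent of the netsh weak host commands above
Set-NetIPInterface -InterfaceAlias "LAN" -AddressFamily IPv4 -WeakHostReceive Enabled
Set-NetIPInterface -InterfaceAlias "KEMP-DSR-LOOPBACK" -AddressFamily IPv4 -WeakHostReceive Enabled -WeakHostSend Enabled

# Verify the weak host settings on all IPv4 interfaces
Get-NetIPInterface -AddressFamily IPv4 | Format-Table InterfaceAlias, WeakHostReceive, WeakHostSend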

That’s it. You should now see HTTP/UDP connections in your RD Gateway monitoring when using a load balancer that has been set up correctly. Remember, if this isn’t configured correctly you’ll still connect, but you’ll lose the benefits the UDP connections offer.

Another thing you need to be aware of in your RD Gateway configuration is that for UDP to work with DSR, the UDP Transport Settings need to be configured for “all unassigned” IP addresses. Otherwise DSR won’t work and you’ll lose UDP. This makes sense: you’ll receive traffic on the VIP on your real servers. It’s just like DSR with a web server, where in IIS you bind both the LAN and the loopback adapter to port 80 or 443 for the site.

image
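A quick way to sanity check this on a farm member (my own habit, not part of the KEMP guidance): with “all unassigned” selected, the gateway should be listening for UDP on 0.0.0.0:3391, so DSR traffic sent to the VIP is accepted as well.

# On an RD Gateway farm member: the UDP transport should listen on 0.0.0.0:3391
netstat -an -p udp | findstr ":3391"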

We can see that one client is connected via RDSGW01 to two servers (Viking and Spartan) leveraging HTTP and UDP. The load balancing is done via the KEMP LoadMasters in a geo-redundant fashion.

image

Yes, my geo load balanced RD Gateway farms are providing UDP support for the servers and clients we RDP into.

image

Combined with those servers and clients being spread among the sites, this provides enough business continuity to keep the shop running when a site fails, so it’s more than just connectivity!

A highly redundant Application Delivery Controller Setup with KempTechnologies

Introduction

The goal was to make sure the KempTechnologies LoadMaster Application Delivery Controller was capable of handling the traffic to all load balanced virtual machines in a high volume data and compute environment. Needless to say, the solution had to be highly available.

A highly redundant Application Delivery Controller Setup with KempTechnologies

The environment offers rack and row as failure units in power, networking and compute. Hyper-V cluster nodes are spread across racks in different rows. Networking is highly to continuously available, allowing for planned and unplanned maintenance as well as the failure of switches. All racks have redundant PDUs that are remotely managed over Ethernet. There is a separate out-of-band network with remote access.

The two KEMP LoadMasters are mounted in a different row and a different rack to spread the risk and maintain high availability. Eth0 & eth2 are in an active-passive bond for a redundant management interface; eth1 is used to provide a secondary backup link for HA. These use the switch independent redundant switches of the rack, which also uplink (VLT) to the Force10 switches (themselves spread across racks and rows). The two 10 Gbps ports are in an active-passive bond to trunked ports of the two redundant, switch independent 10 Gbps switches in the rack, so we also have protection against port or cable failures.

image

A tip: use TRUNK for the port mode, not GENERAL, with DELL switches.

This design gives us a lot of capabilities. We have redundant networking for all networks. We have an active-passive LoadMaster pair, which means:

  • Failover when the active one fails
  • Non-service-interrupting firmware upgrades
  • The rack is the failure domain. As each rack is in a different row we also mitigate “localized” issues (power, maintenance affecting the rack, …)

Combine this with the fact that these are bare metal LoadMasters (DELL R320 with iDRAC – see Remote Access to the KEMP R320 LoadMaster (DELL) via DRAC Adds Value) and we have out-of-band management even when we have network issues. The racks are provisioned with PDUs that are managed over Ethernet, so we can even cut the power remotely if needed to resolve issues.

Conclusion

The results are very good: we get “zero ping loss” failover between the LoadMaster nodes during testing.

We have a solid, redundant Application Delivery Controller deployment that does not break the switch independent TOR setup that exists in all racks/rows. It’s active-passive at the controller level and active-passive at the network (bonding) level. If that is an issue, the TOR switches should be configured as MLAGs. That would enable LACP for the bonded interfaces. At the LoadMaster level these could then be configured as a cluster to get an active-active setup, if the restrictions this imposes are not a concern for your environment.

Important Note:

Some high end switches, such as the Force10 series with VLT, support attaching single homed devices (devices not attached to both members of a VLT). While VLT and MLAG are very similar, MLAGs come with their own needs & restrictions, and not all switches that support MLAG can handle single homed devices. The obvious solution is not to attach single homed devices, but that is not always a possibility with certain devices. That means other solutions are needed, which could lead to a significant rise in the number of switches required, defeating the economics of affordable redundant TOR networking (cost of switches, power, rack space, operations, …), or to leveraging MSTP and configuring a dedicated MSTP network for a VLAN, which also might not always be possible or feasible to solve the issue. Those single homed devices might very well need to be in the same VLANs as the dual homed ones. Stacking would also solve the above issue, as the MLAG restrictions do not apply. I do not like stacking, however, as it breaks the switch independent redundant network design, even during planned maintenance, as a firmware upgrade brings down the entire stack.

One thing that is missing is the ability to fail over when the network fails. There is no concept of a “protected” network. This could help mitigate issues where a virtual service is down due to network problems: the LoadMaster could try to fail over to see if we have more success on the other node. For certain scenarios this could prevent long periods of downtime.

NIC Firmware/Driver Updates Reset Your RSS/VMQ optimizations

When optimizing your RSS/VMQ settings for maximum performance you’ll normally (I hope) do this in PowerShell. Save that script with some comments on why you configure it that way and make it part of your Hyper-V host deployment scripts.

Why? Automation is king, and you’ll need that script again for sure. Why? Well, there is this “tendency” for NIC firmware/driver updates to reset your RSS/VMQ optimizations back to their defaults. That’s a bit of a bummer if you have to redo all the work instead of having a script ready to go. I have seen many a deployment where the configuration was missing after firmware/driver upgrades, so please, check!

image

Figure: Where has my optimized configuration gone after a driver/firmware upgrade?

The good news is this isn’t a showstopper issue, as things will keep working, but without your optimizations, and with VMQ, issues might occur depending on the NIC team setup for your vSwitch. When doing NIC teaming for your virtual switch it’s important to get it right. With switch dependent teaming (LACP/static) the NICs in the team need to use overlapping processor sets (Min Queues). With switch independent teaming the NICs in the team need to use non-overlapping processor sets, so you need to configure each NIC in the team to use different processors (Sum of Queues).

On top of that you might want to (or should) separate the RSS and VMQ cores from each other. SMB Direct for CSV/live migration also helps achieve this, as there we leverage CPU offloading to the NIC.
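As an illustration, here’s a minimal sketch of such a script for a switch independent team of two NICs (Sum of Queues, non-overlapping processor sets). The NIC names, base processors and processor counts are assumptions for the example only; adapt them to your hosts and NUMA layout and keep the result with your deployment scripts.

# Example only: two team members with non-overlapping processor sets (Sum of Queues)
# NIC1 gets cores 2-14, NIC2 gets cores 16-28 (even numbers, hyperthreading enabled)
Set-NetAdapterVmq -Name "NIC1" -BaseProcessorNumber 2 -MaxProcessorNumber 14 -MaxProcessors 7
Set-NetAdapterVmq -Name "NIC2" -BaseProcessorNumber 16 -MaxProcessorNumber 28 -MaxProcessors 7

# Check the settings again after every firmware/driver update
Get-NetAdapterVmq -Name "NIC1","NIC2" |
    Format-Table Name, BaseProcessorNumber, MaxProcessorNumber, MaxProcessors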

Musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V

When you have been reading up on what’s new in Windows Server 2016 Hyper-V networking you have probably read about Switch Embedded Teaming (SET). Basically, this takes the concept of teaming and has it done by the vSwitch, which means you don’t have to team at the host level. The big benefit this opens up is that RDMA can be leveraged on vNICs. With host based teaming the RDMA capabilities of your NICs are no longer exposed, i.e. you can’t leverage RDMA. Now this has become possible, and that’s pretty big.

clip_image001

With the rise of 10, 25, 40, 50 and 100 Gbps NICs and switches the lure to go fully converged becomes ever louder. Given that we now no longer lose RDMA capabilities on the vNICs exposed to the host, that call only sounds louder to many. But wait, there’s even more to lure us to a fully converged solution: we now no longer lose RSS on those vNICs either! All good news.

I have written an entire whitepaper on convergence and its benefits, drawbacks, risks & rewards. I will not repeat all that here. One point I need to make is that lossless traffic and QoS are paramount to the success of fully converged networking. After all, we don’t want lossy storage traffic and we need to assure adequate bandwidth for all our types of traffic. For now, in Technical Preview 3, we have support for Software Defined Networking (SDN) QoS.

What does that mean in regard to what we already use today? There is no support for native QoS and vSwitch QoS in Windows Server 2016 TPv3. There is, however, mention of DCB (PFC/ETS), which is hardware QoS, in the TechNet docs on Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET). Cool!

But wait a minute. When we look at all kinds of traffic in a converged Hyper-V environment we see CSV (storage traffic), live migration (all variations) and backups over SMB3, all potentially leveraging SMB Direct. Due to the features and capabilities in SMB3 I like that, don’t get me wrong. But it also worries me a bit when it comes to handling QoS on the hardware side of things.

In DCB, Priority Flow Control (PFC) is the lossless part and Enhanced Transmission Selection (ETS) is the minimum bandwidth QoS part. But how do we leverage ETS when all types of traffic use SMB Direct? On the host it all gets tagged with the same priority. ETS works by tagging different priorities to different workloads and assuring minimum bandwidths out of a total of 100%, without reserving bandwidth for a workload that doesn’t need it. Here’s a blog post on ETS with a demo video: DCB ETS Demo with SMB Direct over RoCE (RDMA).
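To make the point concrete, this is roughly what a typical DCB configuration for SMB Direct looks like today (a sketch, assuming RoCE NICs, the Data Center Bridging feature installed, matching switch configuration and the usual priority 3 convention; the NIC names are placeholders). Notice how everything that is SMB Direct, whether it is CSV, live migration or backup traffic, ends up in the single priority 3 traffic class:

# Tag all SMB Direct traffic (port 445) with 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Make priority 3 lossless with PFC, leave the other priorities lossy
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Give that one traffic class a minimum bandwidth share with ETS
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Enable DCB on the physical NICs (names are placeholders)
Enable-NetAdapterQos -Name "NIC1","NIC2"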

Does this mean an SDN QoS only approach to deal with the various types of SMB Direct traffic, or do they have some aces up their sleeves?

This isn’t a new “concern” of mine, but with SET and the sustained push for convergence it does have the potential to become an issue. We already have the SMB bandwidth limit feature for live migration. That’s what is used to prevent live migration from starving CSV traffic when needed. See Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit.
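For reference, using it is as simple as the sketch below (the 750MB value is just an example of mine). Note that it caps bytes per second rather than a percentage, which is exactly the limitation discussed next.

# Install the SMB Bandwidth Limit feature once per host
Add-WindowsFeature FS-SMBBW

# Cap SMB traffic in the live migration category at 750 MB/s (example value)
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB

# Review the configured limits
Get-SmbBandwidthLimit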

Now, in real life I have rarely, if ever, seen a hard need for this, but it’s there to make sure you have something when needed. It hasn’t caused me issues yet, but I put performance & scale first, in a “non-economies of scale” world compared to hosters. As such, convergence is a tool I use with moderation. My testing shows that when traffic competes without ETS, all types get part of the cake, but not in a very predictable or consistent way. The SMB bandwidth limit is a bit of a “bolted on” solution => you can see the perf counters push down the bandwidth in an epic struggle to contain it, but as said, it’s a struggle, not a nice flat line.

Also, Set-SmbBandwidthLimit is not a percentage but a hard maximum bandwidth limit, so when you lose a SET member the math is off and you could be in trouble fast. Perhaps it’s these categories that could or will be used, but it doesn’t seem like the most elegant solution/approach. That, with ever more traffic leveraging SMB Direct, makes me ever more curious. Some switches offer up to 4 lossless queues now, so perhaps that’s the way to go, leveraging more priorities … Interesting stuff! My preferred and easiest QoS tool, getting even bigger pipes, is an approach that convergence and the evolution of network needs keep pushing over. Anyway, I’ll be very interested to see how this is dealt with. For now, I’ll conclude my musings on Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V.
Also Set-SmbBandwidthLimit is not a percentage, but hard max bandwidth limit, so when you lose a SET member the math is off and you could be in trouble fast. Perhaps it’s these categories that could or will be used but it doesn’t seem like the most elegant solution/approach. That with ever more traffic leveraging SMB Direct make me ever more curious. Some switches offer up to 4 lossless queues now so perhaps that’s the way to go leveraging more priorities … Interesting stuff! My preferred and easiest QoS tool, get even bigger pipes, is an approach convergence and evolution of network needs keeps pushing over. Anyway, I’ll be very interested to see how this is dealt with. For now I’ll conclude my musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V