Reflections on Getting Windows Network Load Balancing To Work (Part 1)

This is part 1 in a series on Windows Network Load Balancing. Part 2 can be found here: https://blog.workinghardinit.work/2010/07/23/reflections-on-getting-windows-network-load-balancing-to-work-part-2/
Introduction

This will not be an extensive NLB installation & configuration manual. You’ll find plenty of material on that searching the internet. I would like to reflect on some issues and options when using Windows Network Load Balancing.

I will not be discussing NLB solutions using just one NIC with multicast. I think they lack so badly in resilience, configuration and troubleshooting capabilities that I never consider using them, not even in the lab. Even in a lab you need to work like in real life, bar some exceptions. Apart from having no available slots in a server to add NICs you have no excuse not to use multiple NICs, and even then, just make sure you find a way. NIC ports are very cheap nowadays and especially in a virtual environment there is nothing stopping you from adding some extra virtual ports. Do yourself a favor and always use two or more NIC ports. Even in the year 2000 I grinned when I read that one of the drawbacks was the cost of the extra NIC. Really, you have a real business need and are prepared to pay for multiple servers to set up a Windows Network Load Balancing cluster but you can’t spring for an extra NIC? Remember, in those days servers really meant hardware, and in the Windows 2000 era you needed Windows 2000 Advanced Server or Windows 2000 Datacenter Server.

What I also will not discuss any further beyond the following is hardware load balancing. Yes, good hardware load balancers have extra functions and features that can be very valuable and even necessary for certain deployments. They can be rather expensive for some budgets but they are very capable devices. It is up to you as an engineer to look at the needs, the budget, the risks and benefits of technologies for a business case and come up with good, affordable and working solutions. In some cases that solution will be Windows Network Load Balancing, in other cases it will be hardware load balancing. Needs, circumstances and environments differ, and so do the solutions.

Another thing I’ll wipe off the map from the start is the use of a crossover cable to connect the private NIC. Do not use one. It is not supported and will cause issues or fail.

Then there is the confusion around the use of default gateways, the question of whether the private and the NLB NIC must or must not be on the same subnet, and the routing and forwarding differences between Windows 2003 & Windows 2008 (R2). These are the issues I’ll address later in Part 2. But first we need to talk about unicast & multicast a bit. This is unavoidable when using Windows Network Load Balancing. To complete the information here I will provide some examples using two NICs on the same and on different subnets with different default gateway and routing solutions, and also an example using multiple independent clusters (3 NICs).

Things to consider when using unicast & multicast

A topic I will not address too much is which is better: unicast or multicast. Well, that depends on the needs, the environment and whether the products or solutions used support it. For example, when using VMware guests you’ll have to use multicast if you want it to work without breaking things like VMotion. Another example: ISA Server 2006 didn’t support multicast until the release of a hotfix (that was later included in SP1 and higher). It also depends on the network gear that’s available, etc.

My take on it all is the following. Use what works best given the circumstances. If you have no access to the switch configuration or your networking gear has issues with multicast NLB, you can whine all day long that it’s better than unicast but you won’t get anywhere. When practical I use unicast with multiple NICs and when the circumstances or the products used allow for it, I use multicast with multiple NICs. Which is best is a discussion that sometimes smells of “mine is bigger than yours” and I hope you never had that phase and if you did, you’ve left that far behind together with your other growing pains. Thank you.

Why are Unicast & Multicast so Important

Unicast or multicast mode defines how the MAC address of the cluster virtual IP is handled. The network delivers packets for the cluster virtual IP based on the cluster MAC address advertised by the cluster. The cluster MAC address is used because all traffic for the NLB cluster needs to be delivered to all nodes.
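To make the MAC handling a bit more tangible, here is a small Python sketch (not part of the original setup) of how the cluster MAC address is commonly documented to be derived from the cluster virtual IP w.x.y.z: 02-BF-w-x-y-z in unicast mode, 03-BF-w-x-y-z in multicast mode and 01-00-5E-7F-y-z in IGMP multicast mode. The function name nlb_cluster_mac and the sample addresses are assumptions for illustration only; always check the result against what NLB Manager shows for your own cluster.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Derive the cluster MAC from the cluster virtual IP (illustrative helper)."""
    # Split the IPv4 address w.x.y.z into its four octets.
    w, x, y, z = ipaddress.IPv4Address(cluster_ip).packed
    if mode == "unicast":
        octets = (0x02, 0xBF, w, x, y, z)
    elif mode == "multicast":
        octets = (0x03, 0xBF, w, x, y, z)
    elif mode == "igmp":
        # IGMP multicast mode only uses the last two octets of the IP.
        octets = (0x01, 0x00, 0x5E, 0x7F, y, z)
    else:
        raise ValueError("mode must be 'unicast', 'multicast' or 'igmp'")
    return "-".join(f"{octet:02X}" for octet in octets)


if __name__ == "__main__":
    for mode in ("unicast", "multicast", "igmp"):
        print(mode, nlb_cluster_mac("10.0.0.10", mode))
```

Because every node answers for this one shared MAC address, the switch behavior around that address (flooding, static entries, IGMP) is what the unicast versus multicast choice really comes down to.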

I will not go into detail on how unicast and multicast work. That has been done very well on CISCO’s web site (http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml), TechNet (http://technet.microsoft.com/en-us/library/cc782694(WS.10).aspx) and by Thomas Shinder (http://www.isaserver.org/articles/basicnlbpart2.html).

Unicast issues to consider
  • You need two NIC ports. This is because of the “bogus MAC address” (see the CISCO link above for an explanation). Oh please … give me a break already! Again, don’t even consider using a single NIC NLB solution in production.
  • Port Flooding can’t be stopped on the switch level. A valid argument in many cases.
  • It does work in most environments and with just about all network gear.

The good news is that you can prevent flooding by putting a hub, or a switch configured as a hub, in front of the upstream switch. If you have enough nodes in the NLB cluster this might be a good way to go as you will be attaching 8, 16 or more nodes anyway. If you have only two or three nodes that might be a bit of overkill that takes up room in the rack and uses power. Another way is to use a VLAN to separate the traffic. This works well unless you need the NLB subnet to be the same as the rest or can’t get it configured (politics, rules, existing environment …).

Multicast issues to consider
  • You can use a one NIC solution. Multicast allows setting up an NLB cluster with only one NIC which, by some, is considered a benefit. I think I was very clear already about this. I never implement single NIC Windows Network Load Balancing solutions.
  • Port Flooding. But here we have some good news for switch admins. Multicast also allows you to stop port flooding by using static ARP entries on the switches upstream of your server. This is very valuable. When you only have a couple of nodes in the NLB cluster or can’t create or use VLANs to separate the NLB traffic, this is a very good reason to use multicast. See also http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml. This is one of the reasons multicast is considered better by some people, but as mentioned you can prevent flooding by using a “hub” in front of the upstream switch or by separating the traffic using another VLAN, which for larger NLB clusters is not that much overhead. You might still need to do that if for some reason the static ARP solution on the switch ports of the NLB NICs can’t be done. You can also use IGMP snooping to examine the contents of multicast packets and associate a port with a multicast address. If this is not possible, the static ARP entries mentioned above do the job.
  • As mentioned on TechNet (http://technet.microsoft.com/en-us/library/cc782694(WS.10).aspx), upstream routers might not support mapping a unicast IP address (the cluster IP address) to a multicast MAC address. In these situations, you must upgrade or replace the router. If that’s not possible then you can’t use multicast.
  • So you’ll need to talk to your network people (or to yourself if you do the networking as well) to get it figured out and see what they prefer, allow, tolerate and support.
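As a rough illustration of the static entries mentioned in the list above, the following Python sketch prints the kind of lines a network admin might add on a Cisco-style switch for a multicast NLB cluster. It is modeled loosely on the CISCO configuration example linked above. The helper name cisco_static_entries, the VLAN number and the interface names are assumptions, and the exact command syntax (for example "mac address-table" versus "mac-address-table") differs per platform and IOS version, so treat the output as a starting point for the conversation with your network people, not as a ready-made configuration.

```python
import ipaddress


def cisco_static_entries(cluster_ip, vlan, interfaces):
    """Build static ARP and static MAC lines for a multicast NLB cluster (sketch)."""
    w, x, y, z = ipaddress.IPv4Address(cluster_ip).packed
    # Multicast-mode cluster MAC 03-BF-w-x-y-z, written in Cisco dotted notation.
    raw = bytes((0x03, 0xBF, w, x, y, z)).hex()
    mac = ".".join(raw[i:i + 4] for i in range(0, 12, 4))
    return [
        # Map the unicast cluster IP to the multicast cluster MAC.
        f"arp {cluster_ip} {mac} ARPA",
        # Pin the cluster MAC to the ports of the NLB NICs to stop the flooding.
        f"mac address-table static {mac} vlan {vlan} interface {' '.join(interfaces)}",
    ]


if __name__ == "__main__":
    # Hypothetical values: cluster IP 10.0.0.10, VLAN 200, two switch ports.
    for line in cisco_static_entries("10.0.0.10", 200, ["Gi1/0/1", "Gi1/0/2"]):
        print(line)
```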
Virtualization comes into the picture

In a virtualized environment the discussion on the “best” way of preventing port flooding also changes a bit. You don’t need so many physical ports but they do often become more scarce and valuable as the number of NIC ports on the virtualization hosts is limited. Also, a lot of virtualization technologies need their specific little tweaks to get stuff working right, depending on the version etc.

Closing thoughts on unicast/multicast

So in the end, when choosing between unicast and multicast NLB, take a long hard look at the environment, the possibilities and needs, the politics and the available skill sets, then pick the one that is best suited for that particular situation. It’s not that much of an issue until you meet some CISCO or Juniper networking gurus who’ll jabber on for hours about how the NLB/multicast implementation sucks.

In part 2 we’ll talk a bit about subnets, default gateways, routing, forwarding and the strong host model in Windows 2008 (R2).

Calling x64 CLI Tools in x86 Scripting Tools and Processes

Every now and then I get the same question from people who only recently decided to make the switch to x64 Windows operating systems. I’ve been running on x64 since Vista RTM and I’m very happy with it. When those people start scripting with their tools, which are 32 bit, calling some CLI tool in %windir%\System32, they can run into an annoying issue that expresses itself in the correct yet somewhat misleading “WshShell.Exec: The system cannot find the file specified.”. But you know it’s there in %windir%\System32, you checked and double checked!

When your scripting tool is 32 bit and you run your script, it usually launches a 32 bit version of the CLI tool you’re calling. This behavior is a result of file redirection. This is a transparent process that’s part of the Windows-on-Windows 64-bit (WOW64) subsystem that is used to run 32 bit apps. When a 32 bit application calls a CLI tool in the %windir%\system32 directory, WOW64 silently redirects this to %windir%\SysWOW64, where 32 bit apps can happily run without a worry on an x64 operating system. Yes, indeed: %windir%\system32 is for x64 code only and %windir%\SysWOW64 is for 32 bit code.
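As a mental model of that redirection, here is a simplified Python sketch that maps the path a process asks for to the path it really gets on x64 Windows. The function name wow64_view and the default C:\Windows location are assumptions for illustration. The real redirector has more rules than this (certain subdirectories of System32 are exempt, for instance) and the sketch does not expand environment variables, so it only captures the basic idea.

```python
import ntpath


def wow64_view(path, windir=r"C:\Windows", process_is_32bit=True):
    """Approximate which path a process really opens on 64 bit Windows (sketch)."""
    if not process_is_32bit:
        return path  # 64 bit processes see System32 as it is
    system32 = ntpath.join(windir, "System32")
    syswow64 = ntpath.join(windir, "SysWOW64")
    sysnative = ntpath.join(windir, "Sysnative")
    clean = ntpath.normpath(path)
    lowered = ntpath.normcase(clean)  # Windows paths are case-insensitive
    # Sysnative is checked first so its result is not redirected a second time.
    for virtual, real in ((sysnative, system32), (system32, syswow64)):
        prefix = ntpath.normcase(virtual)
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            return real + clean[len(virtual):]
    return path


if __name__ == "__main__":
    print(wow64_view(r"C:\Windows\System32\wbadmin.exe"))
    print(wow64_view(r"C:\Windows\Sysnative\wbadmin.exe"))
```

A 32 bit caller asking for System32\wbadmin.exe ends up in SysWOW64, while the Sysnative alias leads it to the real System32.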

What’s in a name 🙂 Some people argue they should have used system32 for 32 bit and system64 for x64, but I’m sure they had their reasons for what they did (i.e. it would have been hell for some reason, I guess). Other suggestions have also been made by people who are far better qualified than I am. For example by Mark Russinovich, a hard core systems developer, in http://blogs.technet.com/b/markrussinovich/archive/2005/05/07/running-everyday-on-64-bit-windows.aspx.

Now all this can happen transparently for the user if the tools used have both an x64 and an x86 version. Cmd.exe and ping.exe are fine examples. If you run some VBScript in my favorite scripting tool (Sapien PrimalScript), which is 32 bit, it will launch a 32 bit cmd.exe, which launches the 32 bit version of cscript.exe, which in turn will launch ping.exe (using WScript.Shell) in %windir%\SysWOW64 by silently redirecting your %windir%\system32 path. No worries, you don’t know any better and the result is the same. So it’s usually not a problem if there is both an x64 and an x86 version of the CLI tool, as you have seen in the ping.exe example. When a 32 bit process calls a tool in %windir%\system32 it’s redirected to %windir%\SysWOW64 and uses the 32 bit version. No harm done.

The proverbial shit hits the fan when you call a CLI tool that only has an x64 version. As the scripting tool is x86, its call is redirected to SysWOW64 and the script fails miserably as the CLI tool can’t be found there. This can be pretty annoying when writing and testing scripts. The CLI backup tool of Windows Backup is a prime example. It does not have a 32 bit version. Consider this little script for example:

Option Explicit

Dim oShell
Dim oExecShell
Dim sBackupCommandString
Dim sText

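Set oShell = CreateObject("WScript.Shell")
' Sysnative bypasses file redirection when the calling process is 32 bit
'sBackupCommandString = "%windir%\sysnative\wbadmin get disks"
' From a 32 bit process this path is redirected to SysWOW64, where wbadmin does not exist
sBackupCommandString = "%windir%\system32\wbadmin get disks"

Set oExecShell = oShell.Exec(sBackupCommandString)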

Do While oExecShell.Status = 0
    Do While Not oExecShell.StdOut.AtEndOfStream
        sText = oExecShell.StdOut.ReadLine()
        Wscript.Echo sText 
    Loop    
Loop

Set oShell = Nothing
Set oExecShell = Nothing

There is a lot of file redirection going on here to %windir%\SysWOW64 when running this code in the 32 bit scripting tool. That tool launches the 32 bit cmd.exe and thus the 32 bit cscript.exe, which then launches a 32 bit shell and tries to run "%windir%\system32\wbadmin get disks", which is also redirected to %windir%\SysWOW64, where wbadmin cannot be found, throwing the error: “WshShell.Exec: The system cannot find the file specified.”. If you don’t have a 32 bit code editor just launch the script manually from a 32 bit command prompt to see the error.

The solution, as demonstrated here, is to use sysnative, as in “%windir%\Sysnative\wbadmin.exe get disks”. Uncomment that line and comment out the line with sBackupCommandString = "%windir%\system32\wbadmin get disks". Do the same test again and voilà, it runs. So there you have it, you can easily test your script now. Just make sure that when the time comes to put it out in the wild you replace it with the real path if the calling process is x64, which for example wscript.exe and cscript.exe are when you launch them from an x64 shell (explorer.exe or cmd.exe), which is the default on an x64 operating system. The x86 versions run when you launch them from an x86 shell. But remember, the default on x64 operating systems is x64, and sysnative only functions when called from a 32 bit process (it’s a virtual directory that doesn’t really exist).
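If you would rather not swap the path by hand between testing and deployment, the choice can be made at run time. The Python sketch below shows one way to reason about it, assuming that a 32 bit process on x64 Windows can be recognized by the PROCESSOR_ARCHITEW6432 environment variable, which WOW64 sets for such processes. The function name native_system_dir and its parameters are hypothetical, and the same logic would have to be ported to your own scripting language.

```python
import os
import struct


def native_system_dir(environ=None, pointer_bits=None):
    """Return the directory that reaches the native (x64) system tools (sketch)."""
    environ = os.environ if environ is None else environ
    if pointer_bits is None:
        # Pointer size of the running interpreter: 32 or 64.
        pointer_bits = struct.calcsize("P") * 8
    windir = environ.get("windir") or environ.get("SystemRoot") or r"C:\Windows"
    # A 32 bit process on a 64 bit OS runs under WOW64 and needs the Sysnative alias.
    is_wow64 = pointer_bits == 32 and "PROCESSOR_ARCHITEW6432" in environ
    return windir + ("\\Sysnative" if is_wow64 else "\\System32")


if __name__ == "__main__":
    # Hypothetical environment of a 32 bit process on x64 Windows.
    fake_env = {"windir": r"C:\Windows", "PROCESSOR_ARCHITEW6432": "AMD64"}
    print(native_system_dir(fake_env, pointer_bits=32) + r"\wbadmin.exe get disks")
```

An x64 process, or a 32 bit process on a 32 bit operating system, simply gets the regular System32 path back, so sysnative is never used where it does not exist.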

Sysnative was introduced in Vista/Windows 2008 x64. Not only 32 bit script editor users are affected by this; all 32 bit processes launching tools in %windir%\system32 are. See more on MSDN via this link: http://msdn.microsoft.com/en-us/library/aa384187(VS.85).aspx. For the folks running XP or Windows 2003 x64, it is perhaps time you consider upgrading to Windows 2008 R2 or Windows 7 x64? If you can’t, no need to worry, you’re in luck. Microsoft did create a hotfix for you (http://support.microsoft.com/?scid=kb;en-us;942589) that introduces sysnative on those platforms. So welcome to the x64 universe, beware of file redirection in WOW64 and happy scripting 🙂

Netdom computername: Alternate Names are little gems

I’ve had the distinct pleasure of tapping into the knowledge of Jose Barreto and learning that the Netdom Computername command, which provides alternate names for Windows 2008 (R2), works with SMB 2.0. For some file server replacements on 2008 we deliberately stayed away from DNS aliases combined with disabling strict name checking, because using that combination will revert to SMB 1.0. That means you can’t take advantage of the improved throughput you get with SMB 2.0. Tonight I was happy to find out that netdom computername /add:<NewAltDNSName> will create a DNS entry and SPN for that name, and using it will not make Windows revert to SMB 1.0. This is neat! Go have a look at http://technet.microsoft.com/de-de/library/cc835082(WS.10).aspx to find out more.