Using Host Names in IIS in Combination with a KEMP LoadMaster

At a client the change over of a web site from old servers to new ones lead to the investigation of an issue with the hardware load balancer. Since that web site is related to an existing surveyors solutions suite that already had a KEMP LoadMaster 2200 in use the figured we’d also use it for the web site and no longer use WNLB.

Now the original web site had multiple DNS entries and host header names defined in IIS (see Configure a Host Header for a Web Site (IIS 7)) . Host header names in IIS allow you to host multiple web sites on an IIS server using the same IP address and port. A small added security benefit is that surfing on IP address fails which means we marginally disrupt some script kiddies & get an extra security checkbox marked during an audit Winking smile.

In our example we needed:

  • ntrip.surveyor.lab
  • www.surveyor.lab

Note: The real names have been changed as well as the reasons why as this has some business & historical justifications that don’t matter here.

ntrip.surveyor.lab needs to be handled by the load balanced web servers in the solution. The www.surveyor.lab needs to be redirected to another web server to keep the business happy. However for political reasons we have to keep the DNS record for www.surveyor.lab pointing to the load balanced servers, i.e. the load master VIP.

Now without host names IIS al worked fine until we wanted to use HTTP redirect. As the web site is the same IP address for both names we either redirected them both or none. To fix this we needed two sites in IIS. The real one hosting ntrip.surveyor.lab and a “fake” one hosting the www.surveyor.lab that we want to redirect. Well as both are hosted on the same IP address and port on the IIS server we need to use host names. But then the sites became unavailable.

When checking the LoadMaster configuration, the virtual service for the web servers seemed well.

image

Is this a limitation of hardware load balancing or this specific Loadmaster? Some searching on the internet made it look like I was about the only on on the planet dealing with this issue so no help there.

Kemp Support Rocks

I already knew this but this experience reaffirms it. KEMP Technologies really does care about their customers and are very fast & responsive. I threw a quick question on twitter to @KempTech on Twitter and they responded very fast with some pointers. After that I replied with some more details, they offered to take it on via other means as twitter has it limits. OK, no problems. The next morning I got an e-mail from one of their engineers (Ekkehard) with more information and a request for more input from our side. I quickly made a VISIO diagram of the current and the desired situation. Based on this he let me know this should work.

image

He asked for a copy of the configuration and already pointed to the solution:

And what exactly happens – does the RS turn “red” in the “View/Modify Services” view? That might be caused by the health check settings…
(Remember that a 302 is considered NOT ok, so you had to enter the proper check URL and or / HTTP1.1 hostname)

But at that moment I did not realize this yet. I saw no error or the real server turning red indicating it was down. So we went through the configuration and decided to test without forcing layer 7 to see what happened. This didn’t make a difference and it wasn’t really a solution if it had as we needed layer 7 and layer 7 transparency.

Ekkehard also noticed my firmware was getting rather old (don’t fix what isn’t broken Smile) and suggest an upgrade (5.1-24 to 5.1.-74). So I did, reboot and tested some more settings. To make sure I didn’t miss anything I threw a network sniffer (WireShark) against the issue. And guess what?  As soon as I added a host name to the IIS web site bindings I didn’t even get any request from my client on that server anymore. So it was definitely being stopped at the Loadmaster. Without it request from a client came through perfectly.  That was not IIS doing as with a host name nothing came into the server. So why would the LoadMaster stop traffic to a real server? Because it’s down, that’s why, just like Ekkehard has indicated in one of his mails but we didn’t see it then.

Better check again and sure enough, the health service told me the real servers are down. Hey … that’s new. Did the previous firmware not show this, or just slower? I can’t say for sure. It’s either me being to impatient, a hiccup, the firmware or premature dementia Confused smile

Root Cause

So what happens? The default health check uses HTTP 1.0. You can customize it with a path like  /owa or such but in essence it uses the IP address of the real server and guess what. With a Host header name in IIS that isn’t allowed other wise it can’t figure out what website you want to go to if you’re using this feature to run multiple sites on the same IP address and port. So we need to check the health based on host name. Can the LoadMaster do that for us? Yes it can!

The fix

You need to enable HTTP 1.1 and fill out the host name you want to use for health checking.  In our case that’s ntrip.surveyor.lab. That’s all there’s to it. Easy as can be if you know. And Ekkehard knew he indicated to this in his quoted mail above.

HTTP1 1host

 

Lessons Learned

So how did I not know this? Isn’t this documented? Sure enough on page  56 of the LoadMaster manual it says the following:

7  HTTP  The LoadMaster opens a TCP connection to the Real Server on the Service port (port80). The LoadMaster sends a HTTP/1.0 HEAD request the server, requesting the page ―/‖.  If the server sends a HTTP response with a status code of 2 (200-299, 301, 302, 401) the LoadMaster closes the connection and marks the server as active.  If the server fails to respond within the configured response time for the configured number of times or if it responds with a different status code, it is assumed dead.  HTTP 1.0 and 1.1 support available, using HTTP 1.1 allows you to check host header enabled web servers.

Typical, you read the exact line of information you need AND understand it after having figured it out. Now linking that information (yes we always read all manuals completely Embarrassed smile) to the situation at hand isn’t always that fast a process but I got there in the end with some help from KEMP Technologies.

One hint is perhaps to mention this is in the handy tips that pop up when you hover over a setting in the LoadMaster console. I rely on this a lot and a mention of “HTTP 1.1 allows you to check host header enabled web servers” might have helped me out. But it’s not there. A very poor excuse I know … Embarrassed smile

image

Host Header Names & HTTP redirection

After having fix this issue I proceeded to configure HTTP redirect in IIS 7.5. For this is used two sites. One was just a fake site tied to the www.surveyors.lab hostname in IIS on port 80.

image

For this site I created a HTTP redirect to www.bussines.lab/surveyors/services. This works just fine as long as you don’t forget the http:// in the redirect URL.

image

So it has to be http://www.bussines.lab/surveyors/services or you’ll get a funky loop effect looking like this:

http://www.surveyors.lab/www.bussines.lab/surveyors/services/www.bussines.lab/surveyors/services/www.bussines.lab/surveyors/services

Firefox will tell you you have a loop that will never end but Internet Explorer doesn’t, it just fails. You do get that URL as a pointer to the cause of the issue. That is if you can relate it to that.

The other was the real site  and was configured with following bindings and without redirection.

image

Don’t forget to do this on all real servers in the farm! The next thing I need to find out is how to health check two host names in the LoadMaster as I have two websites with the same IP address, port but different host names.

Hyper-V, KEMP LoadMaster & DFS Replication Provide FTP Solutions For Surveyors Network

Remember the blog entry about A Hardware Load Balancing Exercise With A Kemp Loadmaster 2200 KEMP Loadmaster to provide redundancy for a surveyor’s GPS network? Well, we got commissioned to come up with a redundant FTP solution for their needs last month and this blog is about what we came up with. The aim was to make due with what is already available.

FTP 7.5 in Windows 2008 R2

We use the FTP Server available in Windows 2008 R2 which provides us with all functionality we need: User Isolation and FTP over SSL.

The data from all the GPS stations is sent to the FTP server for safekeeping and is to be used to overcome certain issues customers might have with missing data from surveying solutions. This data is not being made available to customers by default, it’s only for special cases & purposes. So we need to collect the data in its own folder named after its account so we can configure user isolation. This also prevents GPS Stations from writing in locations where it shouldn’t.

As every GPS Station slogs in with the “Station” account it ends up in the “Station” folder as root FTP folder and can’t read or write out of that folder. The survey solution service desk can FTP into that folder and access any data they want.

The data that’s being provided by the software solution (LanSurvey01 and lanSurvey02) is to be sent to its own folder “Data” that is also set up with user Isolation to prevent the application from reading or writing anywhere else on the file system.

The data from should be publicly available to the customers and for this, we created a separate FTP site called “Public” that is configured for anonymous access to the same Data folder but with read permissions only. This way the customers can get all the data they need but only have read access to the required data and nothing more.

For more information on setting up FTP 7.5 and using FTP over SSL you might take a look here http://learn.iis.net/page.aspx/304/using-ftp-over-ssl/ and read my blog on FTP over SSL Pollution of the Gene Pool a Real Life “FTP over SSL” Story

High Availability

In the section above we’ve taken care of the FTP needs. Now we still need redundancy. We could use Windows NLB but since this network already uses a KEMP Loadmaster due to the fact that the surveyor’s software has some limitations in its configuration capabilities that don’t allow Windows Network Load Balancing being used.

We want both the GPS stations and the surveyor’s application servers to be able to send FTP data when one of the receiving FTF servers is down for some reason (updates, upgrades, maintenance, or failure). What we did is set up a VIP for use with FTP on the Kemp Loadmaster. This VIP is what is used by the GPS Stations and the application to write and by the customers to read the FTP data.

DFS-R to complete the solution

But up until now, we’ve been ignoring an issue. When using NLB to push data to hosts we need to ensure that all the data will be available on all the nodes all of the time. You could opt to only have the users access the FTP service via an NLB VIP address and push the data to both nodes without using NLB. The latter might be done at the source but then you have twice the amount of data to push out. It also means extra work to configure and maintain the solution. We could copy the data to one FTP node and copy it from there. That works but leaves you very vulnerable to a service outage when the node that gets the original copy is down. No new data will be available. Another issue is the fact that you need a rock-solid way to copy the data and have it done it a timely manner, even after downtime of one or more of the nodes.

As you read above we provide an NLB VIP as a target for the surveyor’s application and the GPS Stations to send their data to. This means the data will be sent to the FTP NLB array even if one of the nodes is down for some reason. To get the data that arrives from 2 application servers and from 40 GPS Stations synchronized and up to date on both the NLB nodes we use the Data File System – Replication (DFS-R) built into Windows 2008 R2. We have no need for a DFS-Namespace here, so we only use the replication feature. This is easy and fast to set up (add the DFS service from the File Server Role) and it doesn’t require any service downtime (no reboot required). The fact that both the FTP nodes are members of a Windows 2008 R2 domain does help with making this easy. To make sure we have replication in all direction we opt to set it up as a full and the replication schedule is 24/7, no days off J Since we chose to replicate the FTP root folder we have both the Data and the Stations folders covered as well as the folder structure needed to have FTP user Isolation function.

This solution was built fast and easily using Windows 2008 R2 out of the box functionality: FTP(S) with User Isolation and DFS-R. The servers are running as hyper-V guests in a Hyper-V cluster providing high availability through Live Migration.

Reflections on Getting Windows Network Load Balancing To Work (Part 2)

This is part 2 in series on Windows Network Load Balancing. Part 1 can be found here: https://blog.workinghardinit.work/2010/07/01/reflections-on-getting-windows-network-load-balancing-to-work-part-1/

On Default Gateways, Routing & Forwarding.

Here’s a bullet list of what people tend to trip over when configuring NLB network settings.

  • No support for multiple Default Gateways that are on multiple subnets
  • The default gateway does not have to be empty on the NLB NIC
  • The Private and the NLB NIC can be on separate or the same subnets
  • You can have multiple Default Gateways if they are on the same subnet
  • Don’t forget about static routes where and when needed.
  • Beware of the strong host model in Windows 2008 (R2) for both IPv4 & IPv6 (WK3 it was only for IPv6)
  • Mind the order of the connections in Adapters and Bindings.

Now let’s address the subjects in this list.

No support for multiple Default Gateways that are on multiple subnets

When using IP addresses from different subnets you cannot have a default gateway on every NIC because that will cause routing issues. This is not different for the NIC’s used in Windows NLB. So you can have only one NIC with a Default Gateway and if the other NICs need to route somewhere you need to add static persistent routes. Those routes must be persistent or they will not survive a reboot of the server. In the figure below you see a classic two NIC NLB cluster with the Default Gateway Empty on the NLB NIC. This could be a valid setup for an intranet. You can add routes for the subnet in the company that need to be able to talk to the NLB Cluster and you’re golden. The Private NIC gets a default gateway and acts like any other NIC in your network.

In this example we have the Default Gateway on the Private NICs they can route internally and to the internet. If you need traffic to & from the internet form the NLB NIC you could enable forwarding on the NLB NIC or enable weak host behavior which can be done more atomic than what you achieve by enabling forwarding. If you only need to route internally we could use the same approach of enabling forwarding instead of adding static persistent routes for the NLB NIC. But then you don’t isolate & protect traffic that neatly and it will route to everywhere the default gateway can get.

So we prefer to play with static persistent routes in this case. We’ll briefly look at some examples now. If you only need to route internally (i.e. to reach the database or a client PC) from the NLB NIC we add the needed static persistent routes on the NLB NICs using the route command.

In order for the NLB NICs to reach the database with strong host model and no forwarding enabled:

Route add -p 10.30.0.0 mask 255.255.0.0 10.10.0.1

To reach the client PC’s:

Route add -p 10.20.0.0 mask 255.255.0.0 10.10.0.1

(Using route print you can look at the routes and using route delete you can get rid of them.)

Or by using netsh, (it’s advised to use netsh from Windows 2008 on)

netsh interface ipv4 add route 10.30.0.0/16 “NLB NIC” 10.10.0.1

netsh interface ipv4 add route 10.20.0.0/16 “NLB NIC” 10.10.0.1

(you can look at the routing table by using netsh interface ipv4 show route, with netsh interface ipv4 delete route you get ridd of then, see http://technet.microsoft.com/en-us/library/cc731521(WS.10).aspx for more information.

You could also connect to the database over the PRIVATE NIC and then you don’t need that route. If you can configure it like that it’s a good solution. But all situations differ.

You can also play with the weakhost / stronghost model behaviour:

netsh interface ipv4 set interface Private NIC weakhostsend=enabled

netsh interface ipv4 set interface Private NIC weakhostreceive=enabled

netsh interface ipv4 set interface NLB NIC weakhostsend=enabled

netsh interface ipv4 set interface NLB NIC weakhostreceive=enabled

Now don’t just blindly enable on every NIC you can find on the server. Test what you really need and use only that. I leave that as an exercise to the readers. It really depends on the situation and needs for your particular situationJ. Keep in mind that when you enable weakhostsend and weakhostreceive on every NIC this reverts your Windows 2008 servers back to Windows 2003 behavior and this might not be needed or wanted. So just enable what you need for optimal security.

Naturally enabling forwarding will do the trick as well, as this creates a weak host model. Depending on how many NICs you use and how traffic must flow you might have to do it on more than one NIC, normally the one(s) without a default gateway.

netsh interface ipv4 set interface “NLB NIC” forwarding=enabled

 

If you want to see the configuration of the NIC you can run:

           netsh interface ipv4 show interface l=verbose

That will produce something like below:

Interface Local Area Connection Parameters

IfLuid                             : ethernet_5
IfIndex                            : 3
State                              : connected
Metric                             : 10
Link MTU                           : 1500 bytes
Reachable Time                     : 21500 ms
Base Reachable Time                : 30000 ms
Retransmission Interval            : 1000 ms
DAD Transmits                      : 3
Site Prefix Length                 : 64
Site Id                            : 1
Forwarding                         : disabled
Advertising                        : disabled
Neighbor Discovery                 : enabled
Neighbor Unreachability Detection  : enabled
Router Discovery                   : dhcp
Managed Address Configuration      : enabled
Other Stateful Configuration       : enabled
Weak Host Sends                    : disabled
Weak Host Receives                 : disabled

Use Automatic Metric               : enabled
Ignore Default Routes              : disabled
Advertised Router Lifetime         : 1800 seconds
Advertise Default Route            : disabled
Current Hop Limit                  : 0
Force ARPND Wake up patterns       : disabled
Directed MAC Wake up patterns      : disabled


The default gateway does not have to be empty on the NLB NIC

It is not a hard requirement to leave the Default Gateway on the NLB NIC empty and put it on the private NIC. You can set it on the NLB NIC and leave the private NIC’s gateway empty instead. An example of this you can see in the demo. This is the best choice in my opinion when you need the NLB NIC to route to destinations you don’t know how to reach, i.e. the internet, so for public websites. The prime function of the default gateway is exactly to help with that. When you don’t know where to send it, send it to the Default Gateway. If you need to reach other internal subnets from the Private NIC, just use static routes. Don’t use the NLB NIC as that is internet facing in this case. You can see an example of this in the figure below. Also in this case you’ll find that you do not have to enable forwarding on the NIC using netsh, as the NIC that has to answer to the unknown IP Address has the Default Gateway. This setup works great for example in a managed domain environment for internet access where the NLB NICs are internet facing and the private NIC is for management, Active Directory, Backups, etc.

In this example we have the Default Gateway on the NLB NICs so it can route internet traffic. Any routes needed in the Private NIC subnet are added as persistent static routes. An example of this is to reach the database server.

As traffic from the Private range is never supposed to go via the NLB Public range and vice versa we do not need to care about forwarding or strong host /weak host models. We can keep traffic nicely separated and that is a good thing. If you build this on Windows 2008(R2) just like you did on Windows 2003 it would work out of the box and you might not even know about a change in default behavior from weak host model to strong host model.

To get the PRIVATE NIC to reach the database server you’d add static routes and be done with it.

Add needed static persistent routes using the route command:

Route add -p 10.20.0.0 mask 255.255.0.0 172.16.2.1

Or by using netsh, (it’s advised to use netsh from Windows 2008 on)

netsh interface ipv4 add route 10.20.0.0/16 “PRIVATE NIC” 172.16.2.1

No requirement to have different subnets for Private and NLB NICs  / Multiple Gateways When the subnets are the same

There is no requirement to have different subnets for every NIC. Sometimes I read that this is a requirement on forums when someone is having issues but it’s not. You can also experiment with multiple Default Gateways if they are on the same subnet (WARNINGS APPPLY*)

So here you can play with giving every NIC a default gateway (same subnet, so no issues), with static persistent routes, with enabling forwarding and weak host / strong host configuration. I tend to use only one gateway and use static persistent routes. If I need to relay I’ll go for weak host minimal configuration or revert to forwarding.

WARNINGS APPLY*: When you start having multiple NIC’s for multiple NLB Clusters on the same NLB nodes, things can get a bit complicated and unpredictable. So I prefer only to use a default gateway on both NICs when you have two NIC , one for private (management) traffic and one for the NLB cluster traffic. Once you have multiple NIC’s for multiple NLB clusters (1 private NIC + 2 or more NLB cluster NICs) you can no longer play this game safely, even if they are all on the same subnet, without running into trouble I have experienced. You can get an event id 18 “NLB cluster [X.X.X.X]: NLB detected duplicate cluster subnets. This may be due to network partitioning, which prevents NLB heartbeats of one or more hosts from reaching the other cluster hosts. Although NLB operations have resumed properly, please investigate the cause of the network partitioning” . Also in this situation you can’t have a default gateway on the management NIC and one on one of the NLB NIC’s without a default gateway on the second NLB NIC. Forget that. You can get issues with a node remaining in “converging” forever and what’s worse the NLB cluster will send traffic to all nodes so 1/x connections will fail. Rebooting one node might help but once you reboot ‘m both you run the risk of this happening and you really don’t want that. Once you dealing with multiple cluster IP addresses on multiple separate NIC’s you’d better stick to one default gateway on one of the NIC’s and nowhere else.  This kind of makes me wonder if it’s pure luck that it works with 2 cluster NICs or not, with multiple and with reboots of the nodes I know we run into trouble and that’s no good.

It’s also smart not to mix static routes with forwarding to achieve the same thing. And please have the exact same configuration on very particular NIC on every node. Not one node with NLB NIC 1 routing via static routes and the other node using forwarding on NLB NIC 1. That’s asking for inconsistent behavior.

We’ll briefly look at some examples now.

If you only need to route internally (i.e to reach the database or a client PC) we add the needed static persistent routes on the NLB NICs using the route command.

In order for the NLB NICs to reach the database with strong host model and no forwarding enabled:

Route add -p 10.30.0.0 mask 255.255.0.0 10.10.0.1

To reach the client PC’s:

Route add -p 10.20.0.0 mask 255.255.0.0 10.10.0.1

(Using route print you can look at the routes and using route delete you can get rid of them.)

Or by using netsh, (it’s advised to use netsh from Windows 2008 on)

netsh interface ipv4 add route 10.30.0.0/16 “NLB NIC” 10.10.0.1

netsh interface ipv4 add route 10.20.0.0/16 “NLB NIC” 10.10.0.1

(you can look at the routing table by using netsh interface ipv4 show route, with netsh interface ipv4 delete route you get ridd of then, see http://technet.microsoft.com/en-us/library/cc731521(WS.10).aspx for more information.

You can also just enter the default gateway on the NLB NICs as well. All NICs are on the same subnet this will cause no issues. Just remember that traffic will also go to where ever that gateway routes, even to the internet.

We already know we can play with the weakhost / stronghost model:

netsh interface ipv4 set interface Private NIC weakhostsend=enabled

netsh interface ipv4 set interface Private NIC weakhostreceive=enabled

netsh interface ipv4 set interface NLB NIC weakhostsend=enabled

netsh interface ipv4 set interface NLB NIC weakhostreceive=enabled

Again don’t just blindly enable on every NIC you can find on the server. Test what you really need and use only that. I leave that as an exercise to the readers. As I’ve said before, it really depends on the situation and needs for your particular situation. Keep in mind that when you enable weakhostsend and weakhostreceive on every NIC this will just revert your Windows 2008 server into Windows 2003 behavior and this might not be needed or wanted. So just enable what you need for optimal security.

There is a very good explanation of strong and weak host behavior by “The Cable Guy” at http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx I strongly advise you to go take a look.

And naturally enabling forwarding will do the trick in this scenario as well, as this creates a weak host model. Depending on how many NICs you use and how traffic must flow you might have to do it on more than one NIC, normally the one(s) without a default gateway.

netsh interface ipv4 set interface “NLB NIC” forwarding=enabled

When & Why Use Three NICs or more?

NLB supports using multiple network adapters to configure separate clusters. This allows for configuring multiple independent clusters on each host. We used to have only virtual clusters meaning that you could configure multiple clusters on a single network adapter. Anyone who ever had to trouble shoot some networking or configuration issues on a production NLB will appreciate the ability to limit interruptions and problems to one cluster instead of 2 or more. As an example of this I had to trouble shoot a CAS/HUB Exchange Implementation two node NLB implementation. The NLB Cluster of the CAS role had this very issue, but since it was running on its own cluster with a separate NIC the HUB role NLB cluster has no issues what so ever. Another good reason to use more NIC is to separate traffic, for example FTP versus HTTP on the same NLB Cluster.

One of the worst things that can happen is that an issue messes up the proper functioning of the NLB itself. That way even if the virtual IP remains available no host or only some of the hosts get network traffic. That means the cluster is unavailable or is only partially responding. This is a bad situation to be in and can be hard to trouble shoot. Since it’s a high availability technology you can bet someone is looking over your shoulder that has a vested interest in getting that resolved as soon as possible.

Mind the order of the connections in Adapters and Bindings

Make sure the PRIVATE NIC that is to be used for private network traffic (DNS, AD, RDP, …) is listed first. That prevent any issues (speed, functionality) of those services and you experience will be much better. This is illustrated in the figures below. LAN-HUB is the PRIVATE NIC here. The others are for NLB (yup it’s an Exchange 2010 setup).

Conclusion & recapitulation

I’ll finish with some closing musings on single & multiple default gateway and getting/sending network traffic where it needs to go.

When you enter a gateway on the second, third and so on NIC next to the one on the first NIC you’ll get a warning:

—————————

Microsoft TCP/IP

—————————

Warning – Multiple default gateways are intended to provide redundancy to a single network (such as an intranet or the Internet). They will not function properly when the gateways are on two separate, disjoint networks (such as one on your intranet and one on the Internet). Do you want to save this configuration?

—————————

Yes No

—————————

This will not work reliable when you have multiple subnets. This is why you use static persistent routing entries. Depending on your needs you can also use forwarding or the weak host model and even combine those with static persistent routes if needed of desired. Now the above also means that if you have multiple NICs with IP addresses on the same subnet you can indeed enter a Default Gateway on all of them.

If you don’t have or cannot have a Default Gateway filled in you are left with two options. If you know what needs to go where you can add static routes, which is basically telling the NIC the IP of a gateway to send traffic to for a certain destination. This is assuming you can reach that IP and that the traffic is not from a source/ to a destination that has no route defined and firewall allow for it, etc.

If you have no route or you can’t specify one (i.e. you can’t predict where traffic will have to go) you have one other option left and that is to route the traffic via the NIC that does have a Default Gateway. This used to work out of the box on Windows 2003 and earlier, but it doesn’t work out of the box since Windows 2008 (R2). That is because by default NICs in Windows 2008(R2) operate in a strong host model. So it will not receive or send traffic destined for some other IP than itself or send traffic originating somewhere else than itself. For that you’ll need to set the NIC properties to weak host send and receive or you need to enable forwarding. Actually forwarding is disabled by default on Windows 2003 as well. The big difference is that Windows 2003 operates in a weak host manner (send/receive) as opposed to Windows 2008 (R2) strong host mode. By enabling forwarding we put the Windows 2008 server in weak host mode and as such it works (see RFC1122). On the internet you’ll find both solutions, but the link between the two is often never made. Using weak host receiving and weak host sending allows for more atomic, custom configurations than forwarding.

Contact me via the web site or leave a comment if you have any questions or suggestions.

Post Script / Side Note because someone asked J

Basically you can have multiple gateways on a server but only one default gateway. You can add more than one default gateway on the same NIC but then they will only be used when the default gateway filled out in is not available, it will then try the next one and so forth. You can add multiple gateways to a single NIC or one or more to multiple NICs but that can, get messy very quickly. Whether it is wise to provide gateway redundancy in such a manner is another discussion. See also KB article http://support.microsoft.com/kb/157025. Be mindful of the extra configurations you’ll need (Dead Gateway Detection). This is a rather uncommon scenario on a windows server. You can use it for redundancy or when you want the traffic to go to a certain default gateway instead of another when it is available (so separate traffic for example for cost or to reduce the traffic load).

And then there’s adding a default gateway that’s on another subnet than the IP address of the NIC. In that case you get this warning:


—————————

Microsoft TCP/IP

—————————

Warning – The default gateway is not on the same network segment (subnet) that is defined by the IP address and subnet mask. Do you want to save this configuration?

—————————

Yes No

—————————

All pretty cool stuff you can do to mess with peoples head and understanding of what’s going on (it can work if the router on the local subnet has a route the subnet where that default gateway lives and PROXY ARP is working … but we’re not going to turn this into a networking course or pretty soon we’ll be installing RRAS and turn the server into a router.

Reflections on Getting Windows Network Load Balancing To Work (Part 1)

This is part 1 in series on Windows Network Load Balancing. Part 2 can be found here: https://blog.workinghardinit.work/2010/07/23/reflections-on-getting-windows-network-load-balancing-to-work-part-2/
Introduction

This will not be an extensive NLB installation & configuration manual. You’ll find plenty of material on that searching the internet. I would like to reflect on some issues and options when using Windows Network Load Balancing.

I will not be discussing NLB solutions using just one NIC with multicast. I think they lack so badly in resilience, configuration and troubleshooting capabilities that I never consider using them, not even in the lab. Even in a lab you need to work like in real live, bar some exceptions. Apart from no available slots in a server to add NICs you have no excuse not to and even then, just make sure you do. NIC ports are very cheap nowadays and especially in a virtual environment there is nothing stopping you from adding some extra virtual ports. Do yourself a favor and always use two or more NIC ports. Even in the year 2000 I grinned when I read that one of the drawbacks was the cost of the extra NIC. Really, you have a real business need and are prepared to pay for multiple servers to set up a Windows Network Load Balancing cluster but you can’t spring for an extra NIC? Remember in those days servers really meant hardware and in the Windows 2000 era you needed Windows 2000 Advanced Server or Windows 2000 Datacenter Server.

What I also will not discuss any further beyond the following is hardware load balancing. Yes good hardware load balancers have extra functions and features that can be very valuable and even necessary for certain deployments. They can be rather expensive for some budgets but they are very capable devices. It is up to you as an engineer to look at the needs, the budget, the risks and benefits of technologies for a business case and come up with good, affordable and working solutions. In some cases that solution will be Windows Load Balancing, in other cases it will be hardware load balancing. Needs, circumstances and environments differ, so do the solutions.

Another thing I’ll wipe of the map from the start is the use of a cross over cable to connect the private NIC. Do not use one. It is not supported and will cause issues or fail.

Then there is the confusion around the use of default Gateways, the fact if the private and the NLB NIC must or must not be on the same subnet, routing and forwarding differences between of Windows 2003 & Windows 2008 (R2). These are the issues I’ll address later in Part 2. But first we need to talk about unicast & multicast a bit. This is unavoidable when using Windows Network Load Balancing. To complete the information here I will provide some examples using two NICs on the same and on different subnets with different default gateway and routing solutions, and also an example using multiple independent clusters (3 NICs)

Things to consider when using unicast & multicast

A topic I will not address too much is which is better: unicast or multicast. Well that depends on the needs, the environment and if the products or solutions uses support it. For example when using VMware guests you’ll have to use multicast if you want it to work without breaking things like VMotion. Another example, ISA server 2006 didn’t support multicast until the release of a hotfix that was later included in SP1 and higher). It also depends on the network gear that’s available, etc.

My take on it all is the following. Use what works best given the circumstances. I you have no access to the switch configuration or your networking gear has issues with multicast NLB you can whine all day long that it’s better than unicast but you’ll won’t get anywhere. When practical I use unicast with multiple NICs and when the circumstances or the products used allow for it, I use multicast with multiple NICs. Which is best is a discussion that sometimes smells of “mine is bigger than yours” and I hope you never had that phase and if you did, you’ve left that far behind together with your other growing pains. Thank you.

Why are Unicast & Multicast so Important

Unicast or multicast mode defines how the cluster virtual IP its MAC address is handled. The network traffic sends packets for the cluster virtual IP based on the cluster MAC address advertised by the cluster. The cluster virtual IP MAC address is used because all traffic for the NLB cluster need be delivered to all nodes.

I will not go into detail on how unicast and multicast works. That has been done very well on CISCO’s web site http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml), TechNet (http://technet.microsoft.com/en-us/library/cc782694(WS.10).aspx) and by Thomas Shindler (http://www.isaserver.org/articles/basicnlbpart2.html)

Unicast issues to consider
  • You need two NICs ports. This is because of the “bogus MAC address” (see the CISCO link above for an explanation). Oh please … give me a break already! Again don’t even consider using a single NIC NLB solution in production.
  • Port Flooding can’t be stopped on the switch level. A valid argument in many cases.
  • It does work in most environments and with just about all network gear.

The good news is that you can prevent flooding by using a hub or a switch configured as a hub to in front of the upstream switch. If you have enough nodes in the NLB this might be a good way to go as you will be attaching 8, 16 or more nodes anyway. If you have only two or three nodes that might be a bit overkill that takes up room in the rack and uses power. Another ways is to uses VLAN to separate the traffic. This works well unless you have a need for the NLB subnet to be the same as the rest or can’t get it configured (politics, rules, existing environment …)

Multicast issues to consider
  • You can use a one NIC solution. Multicast allows setting up an NLB cluster with only one NIC which, by some, is considered a benefit. I think I was very clear already about this. I never implement single NIC Windows Network Load Balancing solutions.
  • Port Flooding. But here we have some good news for switch admins. Multicast also allows you to stop port flooding by using static arp entries on the switches upstream of your server. This is very valuable. When you only have a couple of nodes in the NLB or can’t create or use VLANs to separate the NLB traffic this is a very good reason to use multicast. See also http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml. This one of the reasons multicast is considered better by some people, but as mentioned you can prevent flooding by using a “hub” in front of the upstream switch or by separating the traffic using another VLAN which for lager NLB clusters is not that much overhead. You might still need to do that if for some reason the static arp solution on the switch ports of the NLB NICs can’t be done. You can also use IGMP snooping to examine the contents of multicast packets and associate a port with a multicast address. If this is not possible the static arp entries come mentioned above do the job.
  • As mentioned on TechNet (http://technet.microsoft.com/en-us/library/cc782694(WS.10).aspx)upstream routers might not support mapping a unicast IP address (the cluster IP address) with a multicast MAC address. In these situations, you must upgrade or replace the router. If that’s not possible than you can’t use multicast.
  • So you’ll need to talk to your network people (or to yourself if you do the networking as well) to get it figured out and see what they prefer, allow, tolerate and support.
Virtualization comes into the picture

In a virtualize environment the discussion on the “best” way of preventing port flooding also changes a bit. You don’t need so many physical ports but they do often become more scares and valuable as the number of NIC ports on the virtualization hosts are limited. Also a lot of virtualization technologies need their specific little tweaks to get stuff working right depending on the version etc.

Closing thoughts on unicast/multicast

So in the end when choosing between unicast and multicast NLB take a long had look at the environment, the possibilities and needs, the politics, available skillsets than pick the one that is best suited for that particular situation. It’s not that on an issue until you meet some CISCO or Juniper networking guru’s who’ll jab on for hours on how the NLB/multicast implementation sucks.

In part 2 we’ll talk a bit about subnets, default gateways, routing, forwarding and the strong host model in Windows 2008 (R2).