This is part 2 in a series on Windows Network Load Balancing. Part 1 can be found here: https://blog.workinghardinit.work/2010/07/01/reflections-on-getting-windows-network-load-balancing-to-work-part-1/
On Default Gateways, Routing & Forwarding.
Here’s a bullet list of what people tend to trip over when configuring NLB network settings.
- No support for multiple Default Gateways that are on multiple subnets
- The default gateway does not have to be empty on the NLB NIC
- The Private and the NLB NIC can be on separate or the same subnets
- You can have multiple Default Gateways if they are on the same subnet
- Don’t forget about static routes where and when needed.
- Beware of the strong host model in Windows 2008 (R2) for both IPv4 & IPv6 (in W2K3 it was only used for IPv6)
- Mind the order of the connections in Adapters and Bindings.
Now let’s address the subjects in this list.
No support for multiple Default Gateways that are on multiple subnets
When using IP addresses from different subnets you cannot have a default gateway on every NIC, because that will cause routing issues. The NICs used in Windows NLB are no different. So you can have only one NIC with a default gateway, and if the other NICs need to route somewhere you need to add static persistent routes. Those routes must be persistent or they will not survive a reboot of the server. In the figure below you see a classic two-NIC NLB cluster with the default gateway left empty on the NLB NIC. This could be a valid setup for an intranet: you add routes for the subnets in the company that need to be able to talk to the NLB cluster and you're golden. The Private NIC gets a default gateway and acts like any other NIC in your network.
In this example we have the default gateway on the Private NICs, so they can route internally and to the internet. If you need traffic to & from the internet from the NLB NIC, you could enable forwarding on the NLB NIC or enable weak host behavior, which can be configured more granularly than what you achieve by enabling forwarding. If you only need to route internally, we could use the same approach of enabling forwarding instead of adding static persistent routes for the NLB NIC. But then you don't isolate & protect traffic as neatly, and it will route to everywhere the default gateway can reach.
So we prefer to work with static persistent routes in this case. We'll briefly look at some examples now. If you only need to route internally (e.g. to reach the database or a client PC) from the NLB NIC, we add the needed static persistent routes on the NLB NICs using the route command.
In order for the NLB NICs to reach the database with strong host model and no forwarding enabled:
Route add -p 10.30.0.0 mask 255.255.0.0 10.10.0.1
To reach the client PCs:
Route add -p 10.20.0.0 mask 255.255.0.0 10.10.0.1
(Using route print you can look at the routes and using route delete you can get rid of them.)
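To make the effect of those two routes concrete, here is a minimal Python sketch of a longest-prefix-match lookup over one node's routing table. It is a simplification: it ignores metrics and the strong/weak host source-address check. The NIC names and the 10.10.0.1 gateway come from the example above. The Private NIC's gateway 192.0.2.1 is a hypothetical placeholder.

```python
import ipaddress

# Hypothetical routing table of one NLB node. The NLB NIC routes and the
# 10.10.0.1 gateway come from the example; the Private NIC default gateway
# 192.0.2.1 is a made-up placeholder.
ROUTES = [
    # (destination prefix, next hop, interface)
    ("0.0.0.0/0", "192.0.2.1", "Private NIC"),   # default gateway
    ("10.30.0.0/16", "10.10.0.1", "NLB NIC"),    # database subnet
    ("10.20.0.0/16", "10.10.0.1", "NLB NIC"),    # client PC subnet
]


def lookup(destination, routes=ROUTES):
    """Return (next hop, interface) of the most specific matching route,
    or None when nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop, nic)
        for prefix, hop, nic in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins, so a /16 static route beats the /0 default route.
    _, hop, nic = max(matches, key=lambda m: m[0].prefixlen)
    return hop, nic


print(lookup("10.30.1.5"))  # ('10.10.0.1', 'NLB NIC')
print(lookup("8.8.8.8"))    # ('192.0.2.1', 'Private NIC')
```

Database and client traffic leave through the NLB NIC via the static routes. Everything else follows the default gateway on the Private NIC.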
Or use netsh (from Windows 2008 on, netsh is the advised tool):
netsh interface ipv4 add route 10.30.0.0/16 "NLB NIC" 10.10.0.1
netsh interface ipv4 add route 10.20.0.0/16 "NLB NIC" 10.10.0.1
(You can look at the routing table by using netsh interface ipv4 show route, and with netsh interface ipv4 delete route you get rid of them. See http://technet.microsoft.com/en-us/library/cc731521(WS.10).aspx for more information.)
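The route command takes a dotted subnet mask, while netsh takes a prefix length. A small helper like the one below (my own sketch, not a Microsoft tool) translates one form into the other. It always emits straight double quotes around the interface name, because curly quotes copied from a web page are not treated as quotes by cmd.exe. As far as I know, netsh stores routes persistently by default (its store= parameter defaults to persistent), which is why the netsh lines need no equivalent of -p.

```python
import ipaddress


def route_to_netsh(destination, mask, gateway, interface):
    """Translate the arguments of 'route add -p DEST mask MASK GW' into the
    equivalent 'netsh interface ipv4 add route' line."""
    # ip_network accepts "address/netmask" and converts it to a prefix
    # length; it raises ValueError if the destination has host bits set.
    prefix = ipaddress.ip_network(f"{destination}/{mask}")
    # Straight double quotes: curly quotes would end up in the interface name.
    return f'netsh interface ipv4 add route {prefix} "{interface}" {gateway}'


for dest in ("10.30.0.0", "10.20.0.0"):
    print(route_to_netsh(dest, "255.255.0.0", "10.10.0.1", "NLB NIC"))
```

This reproduces the two netsh lines above from the route add arguments. A typo such as 10.30.1.0 with mask 255.255.0.0 raises an error instead of silently producing a wrong route.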
You could also connect to the database over the PRIVATE NIC and then you don’t need that route. If you can configure it like that it’s a good solution. But all situations differ.
You can also play with the weakhost / stronghost model behaviour:
netsh interface ipv4 set interface "Private NIC" weakhostsend=enabled
netsh interface ipv4 set interface "Private NIC" weakhostreceive=enabled
netsh interface ipv4 set interface "NLB NIC" weakhostsend=enabled
netsh interface ipv4 set interface "NLB NIC" weakhostreceive=enabled
Now don't just blindly enable this on every NIC you can find on the server. Test what you really need and use only that. I leave that as an exercise to the reader; it really depends on the needs of your particular situation. Keep in mind that enabling weakhostsend and weakhostreceive on every NIC reverts your Windows 2008 servers to Windows 2003 behavior, which might not be needed or wanted. So just enable what you need for optimal security.
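To reason about which NIC actually needs which flag, here is a deliberately simplified Python model of strong vs. weak host send/receive. I am assuming that weakhostsend is evaluated on the outgoing NIC and weakhostreceive on the receiving NIC. That is my reading of the Cable Guy article linked later in this post, so verify it there. Forwarding is not modelled. All addresses, including the VIP 10.10.0.100, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    name: str
    addresses: set = field(default_factory=set)
    weak_host_send: bool = False     # netsh ... weakhostsend=enabled
    weak_host_receive: bool = False  # netsh ... weakhostreceive=enabled


def accepts(receiving, dst, host_nics):
    """Would the host accept a packet for dst arriving on `receiving`?
    (Assumption: the flag that matters is the one on the receiving NIC.)"""
    if dst in receiving.addresses:
        return True                  # fine under both models
    if receiving.weak_host_receive:  # weak host receive: any local address
        return any(dst in nic.addresses for nic in host_nics)
    return False                     # strong host receive drops it


def can_send(outgoing, src):
    """May a packet with source address src leave through `outgoing`?
    (Assumption: the flag that matters is the one on the outgoing NIC.)"""
    # Strong host send: only if the outgoing NIC owns the source address.
    return src in outgoing.addresses or outgoing.weak_host_send


# Hypothetical scenario: the VIP lives on the NLB NIC, but the default
# gateway (and therefore the route to the internet) is on the Private NIC.
nlb = Nic("NLB NIC", {"10.10.0.11", "10.10.0.100"})
private = Nic("Private NIC", {"192.0.2.10"})
print(can_send(private, "10.10.0.100"))  # False: strong host send blocks it
private.weak_host_send = True
print(can_send(private, "10.10.0.100"))  # True after weakhostsend=enabled
```

Under this model, only the Private NIC needs weakhostsend for replies from the VIP to follow the default gateway. That is the kind of minimal configuration worth testing before enabling all four settings.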
Naturally enabling forwarding will do the trick as well, as this creates a weak host model. Depending on how many NICs you use and how traffic must flow you might have to do it on more than one NIC, normally the one(s) without a default gateway.
netsh interface ipv4 set interface "NLB NIC" forwarding=enabled
If you want to see the configuration of the NIC you can run:
netsh interface ipv4 show interface l=verbose
That will produce something like below:
Interface Local Area Connection Parameters
IfLuid : ethernet_5
IfIndex : 3
State : connected
Metric : 10
Link MTU : 1500 bytes
Reachable Time : 21500 ms
Base Reachable Time : 30000 ms
Retransmission Interval : 1000 ms
DAD Transmits : 3
Site Prefix Length : 64
Site Id : 1
Forwarding : disabled
Advertising : disabled
Neighbor Discovery : enabled
Neighbor Unreachability Detection : enabled
Router Discovery : dhcp
Managed Address Configuration : enabled
Other Stateful Configuration : enabled
Weak Host Sends : disabled
Weak Host Receives : disabled
Use Automatic Metric : enabled
Ignore Default Routes : disabled
Advertised Router Lifetime : 1800 seconds
Advertise Default Route : disabled
Current Hop Limit : 0
Force ARPND Wake up patterns : disabled
Directed MAC Wake up patterns : disabled
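When you have several nodes to check, reading this output by eye gets tedious. Here is a minimal Python sketch that turns the "Key : value" lines of the verbose output for a single interface into a dictionary and picks out the three settings that decide strong vs. weak host behavior. The function names are my own. It assumes the output was captured to text first, and that each key is separated from its value by " : " as in the sample above.

```python
def parse_netsh_verbose(text):
    """Turn the 'Key : value' lines of netsh verbose interface output
    (one interface) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # header lines such as '... Parameters' have no separator
            settings[key.strip()] = value.strip()
    return settings


def host_model(settings):
    """Pick out the settings that decide strong vs. weak host behaviour."""
    keys = ("Forwarding", "Weak Host Sends", "Weak Host Receives")
    return {key: settings.get(key) for key in keys}


SAMPLE = """Interface Local Area Connection Parameters
IfIndex : 3
Forwarding : disabled
Weak Host Sends : disabled
Weak Host Receives : disabled
"""
print(host_model(parse_netsh_verbose(SAMPLE)))
```

Because the keys are stripped, padded output such as "Forwarding          : disabled" parses the same way.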
The default gateway does not have to be empty on the NLB NIC
It is not a hard requirement to leave the default gateway on the NLB NIC empty and put it on the Private NIC. You can set it on the NLB NIC and leave the Private NIC's gateway empty instead. You can see an example of this in the demo. In my opinion this is the best choice when you need the NLB NIC to route to destinations you don't know how to reach, e.g. the internet for public websites. That is exactly the prime function of the default gateway: when you don't know where to send traffic, send it to the default gateway. If you need to reach other internal subnets from the Private NIC, just use static routes. Don't use the NLB NIC for that, as it is internet facing in this case. You can see an example of this in the figure below. In this case you'll also find that you do not have to enable forwarding on the NIC using netsh, as the NIC that has to answer the unknown IP address has the default gateway. This setup works great, for example, in a managed domain environment for internet access where the NLB NICs are internet facing and the Private NIC is used for management, Active Directory, backups, etc.
In this example we have the Default Gateway on the NLB NICs so it can route internet traffic. Any routes needed in the Private NIC subnet are added as persistent static routes. An example of this is to reach the database server.
As traffic from the Private range is never supposed to go via the NLB Public range and vice versa we do not need to care about forwarding or strong host /weak host models. We can keep traffic nicely separated and that is a good thing. If you build this on Windows 2008(R2) just like you did on Windows 2003 it would work out of the box and you might not even know about a change in default behavior from weak host model to strong host model.
To get the PRIVATE NIC to reach the database server you’d add static routes and be done with it.
Add needed static persistent routes using the route command:
Route add -p 10.20.0.0 mask 255.255.0.0 172.16.2.1
Or use netsh (from Windows 2008 on, netsh is the advised tool):
netsh interface ipv4 add route 10.20.0.0/16 "PRIVATE NIC" 172.16.2.1
No requirement to have different subnets for Private and NLB NICs / Multiple Gateways When the subnets are the same
There is no requirement to have different subnets for every NIC. I sometimes read on forums that this is a requirement when someone is having issues, but it's not. You can also experiment with multiple default gateways if they are on the same subnet (WARNINGS APPLY*).
So here you can play with giving every NIC a default gateway (same subnet, so no issues), with static persistent routes, with enabling forwarding and weak host / strong host configuration. I tend to use only one gateway and use static persistent routes. If I need to relay I’ll go for weak host minimal configuration or revert to forwarding.
WARNINGS APPLY*: When you start having multiple NICs for multiple NLB clusters on the same NLB nodes, things can get a bit complicated and unpredictable. So I only use a default gateway on both NICs when there are just two NICs: one for private (management) traffic and one for the NLB cluster traffic. Once you have multiple NICs for multiple NLB clusters (1 private NIC + 2 or more NLB cluster NICs), you can no longer play this game safely, even if they are all on the same subnet; I have run into trouble doing so. You can get event ID 18 “NLB cluster [X.X.X.X]: NLB detected duplicate cluster subnets. This may be due to network partitioning, which prevents NLB heartbeats of one or more hosts from reaching the other cluster hosts. Although NLB operations have resumed properly, please investigate the cause of the network partitioning”. Also, in this situation you can't have a default gateway on the management NIC and on one of the NLB NICs without also having one on the second NLB NIC. Forget that. You can get issues with a node remaining in “converging” forever and, what's worse, the NLB cluster will send traffic to all nodes, so 1/x of the connections will fail. Rebooting one node might help, but once you reboot them both you run the risk of this happening again, and you really don't want that. Once you're dealing with multiple cluster IP addresses on multiple separate NICs, you'd better stick to one default gateway on one of the NICs and nowhere else. This makes me wonder whether it's pure luck that it works with 2 cluster NICs. With more NICs and with reboots of the nodes I know we run into trouble, and that's no good.
It's also smart not to mix static routes with forwarding to achieve the same thing. And please have the exact same configuration on each particular NIC on every node: not one node with NLB NIC 1 routing via static routes and the other node using forwarding on NLB NIC 1. That's asking for inconsistent behavior.
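One way to catch such inconsistencies is to collect each node's per-NIC settings (for example from the verbose netsh output shown earlier in this post) and diff them. Below is a minimal Python sketch of that comparison. The node names, NIC names and values are hypothetical, and how you gather the data per node is left open.

```python
def config_drift(nodes):
    """Report NIC settings that differ between NLB nodes.

    nodes maps node -> {nic -> {setting -> value}}; a missing NIC or
    setting shows up as None. Returns {(nic, setting): {node: value}}.
    """
    pairs = {
        (nic, setting)
        for nics in nodes.values()
        for nic, settings in nics.items()
        for setting in settings
    }
    drift = {}
    for nic, setting in sorted(pairs):
        values = {node: nics.get(nic, {}).get(setting) for node, nics in nodes.items()}
        if len(set(values.values())) > 1:  # not identical on every node
            drift[(nic, setting)] = values
    return drift


# Hypothetical nodes: NODE1 routes via a static route, NODE2 via forwarding.
nodes = {
    "NODE1": {"NLB NIC 1": {"Forwarding": "disabled", "Static routes": "10.30.0.0/16"}},
    "NODE2": {"NLB NIC 1": {"Forwarding": "enabled"}},
}
print(config_drift(nodes))
```

An empty result means every node has the same value for every collected setting. Anything else is exactly the mixed static-route/forwarding situation to avoid.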
We’ll briefly look at some examples now.
If you only need to route internally (e.g. to reach the database or a client PC), we add the needed static persistent routes on the NLB NICs using the route command.
In order for the NLB NICs to reach the database with strong host model and no forwarding enabled:
Route add -p 10.30.0.0 mask 255.255.0.0 10.10.0.1
To reach the client PCs:
Route add -p 10.20.0.0 mask 255.255.0.0 10.10.0.1
(Using route print you can look at the routes and using route delete you can get rid of them.)
Or use netsh (from Windows 2008 on, netsh is the advised tool):
netsh interface ipv4 add route 10.30.0.0/16 "NLB NIC" 10.10.0.1
netsh interface ipv4 add route 10.20.0.0/16 "NLB NIC" 10.10.0.1
(You can look at the routing table by using netsh interface ipv4 show route, and with netsh interface ipv4 delete route you get rid of them. See http://technet.microsoft.com/en-us/library/cc731521(WS.10).aspx for more information.)
You can also just enter the default gateway on the NLB NICs. Since all NICs are on the same subnet, this will cause no issues. Just remember that traffic will then also go wherever that gateway routes, even to the internet.
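With a default gateway on every NIC, Windows still has to pick one route. As I understand it, from Windows Vista/2008 on it prefers the most specific prefix and then the lowest combined route metric plus interface metric. That is why adjusting metrics is a way to steer which NIC is used. The Python sketch below models only that selection rule. The metric values are illustrative, not Windows defaults.

```python
import ipaddress

# Hypothetical same-subnet setup: both NICs carry the same default gateway.
# Metrics are illustrative values, not Windows defaults.
ROUTES = [
    # (prefix, gateway, interface, route metric, interface metric)
    ("0.0.0.0/0", "10.10.0.1", "Private NIC", 0, 10),
    ("0.0.0.0/0", "10.10.0.1", "NLB NIC", 0, 20),
]


def best_route(destination, routes=ROUTES):
    """Most specific prefix first; among equals, lowest combined metric."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[3] + r[4]),
    )


print(best_route("8.8.8.8")[2])  # Private NIC: 0 + 10 beats 0 + 20
```

Raising the Private NIC's interface metric above the NLB NIC's would flip the choice, while a more specific static route still wins regardless of metric.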
We already know we can play with the weakhost / stronghost model:
netsh interface ipv4 set interface "Private NIC" weakhostsend=enabled
netsh interface ipv4 set interface "Private NIC" weakhostreceive=enabled
netsh interface ipv4 set interface "NLB NIC" weakhostsend=enabled
netsh interface ipv4 set interface "NLB NIC" weakhostreceive=enabled
Again, don't just blindly enable this on every NIC you can find on the server. Test what you really need and use only that. I leave that as an exercise to the reader. As I've said before, it really depends on the needs of your particular situation. Keep in mind that enabling weakhostsend and weakhostreceive on every NIC will just revert your Windows 2008 server to Windows 2003 behavior, which might not be needed or wanted. So just enable what you need for optimal security.
There is a very good explanation of strong and weak host behavior by “The Cable Guy” at http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx. I strongly advise you to take a look.
And naturally enabling forwarding will do the trick in this scenario as well, as this creates a weak host model. Depending on how many NICs you use and how traffic must flow you might have to do it on more than one NIC, normally the one(s) without a default gateway.
netsh interface ipv4 set interface "NLB NIC" forwarding=enabled
When & Why Use Three NICs or more?
NLB supports using multiple network adapters to configure separate clusters. This allows for configuring multiple independent clusters on each host. We used to have only virtual clusters, meaning that you could configure multiple clusters on a single network adapter. Anyone who has ever had to troubleshoot networking or configuration issues on a production NLB cluster will appreciate the ability to limit interruptions and problems to one cluster instead of 2 or more. As an example, I had to troubleshoot a two-node CAS/HUB Exchange NLB implementation. The NLB cluster of the CAS role had this very issue, but since it was running as its own cluster on a separate NIC, the HUB role NLB cluster had no issues whatsoever. Another good reason to use more NICs is to separate traffic, for example FTP versus HTTP on the same NLB cluster.
One of the worst things that can happen is that an issue messes up the proper functioning of NLB itself. Even if the virtual IP remains available, no hosts or only some of the hosts get network traffic. That means the cluster is unavailable or only partially responding. This is a bad situation to be in and can be hard to troubleshoot. Since it's a high availability technology, you can bet someone with a vested interest in getting it resolved as soon as possible is looking over your shoulder.
Mind the order of the connections in Adapters and Bindings
Make sure the PRIVATE NIC that is used for private network traffic (DNS, AD, RDP, …) is listed first. That prevents issues (speed, functionality) with those services, and your experience will be much better. This is illustrated in the figures below. LAN-HUB is the PRIVATE NIC here. The others are for NLB (yup, it's an Exchange 2010 setup).
Conclusion & recapitulation
I'll finish with some closing musings on single & multiple default gateways and getting/sending network traffic where it needs to go.
When you enter a gateway on a second, third, etc. NIC in addition to the one on the first NIC, you'll get a warning:
---------------------------
Microsoft TCP/IP
---------------------------
Warning – Multiple default gateways are intended to provide redundancy to a single network (such as an intranet or the Internet). They will not function properly when the gateways are on two separate, disjoint networks (such as one on your intranet and one on the Internet). Do you want to save this configuration?
---------------------------
Yes No
---------------------------
This will not work reliably when you have multiple subnets. This is why you use static persistent routing entries. Depending on your needs you can also use forwarding or the weak host model, and even combine those with static persistent routes if needed or desired. The above also means that if you have multiple NICs with IP addresses on the same subnet, you can indeed enter a default gateway on all of them.
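The rule of thumb above can be written as a simple check: gateways filled in on NICs that share one subnet are fine, while gateways spread over different subnets are the disjoint case the warning is about. Below is a minimal Python sketch of that check. The input layout and the example addresses are hypothetical (172.16.2.1 and 10.10.0.1 are borrowed from the examples in this post).

```python
import ipaddress


def disjoint_default_gateways(nics):
    """True when default gateways are configured on NICs in different subnets,
    the situation the 'Multiple default gateways' warning is about.

    nics maps NIC name -> (address/prefix, default gateway or None).
    """
    subnets = {
        ipaddress.ip_interface(address).network
        for address, gateway in nics.values()
        if gateway is not None
    }
    return len(subnets) > 1


# Hypothetical configurations.
same_subnet = {"Private NIC": ("10.10.0.11/16", "10.10.0.1"),
               "NLB NIC": ("10.10.0.12/16", "10.10.0.1")}
split = {"PRIVATE NIC": ("172.16.2.11/24", "172.16.2.1"),
         "NLB NIC": ("10.10.0.11/16", "10.10.0.1")}
print(disjoint_default_gateways(same_subnet))  # False: fine to fill in both
print(disjoint_default_gateways(split))        # True: keep one, add static routes
```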
If you don't have, or cannot have, a default gateway filled in, you are left with two options. If you know what needs to go where, you can add static routes, which basically tell the NIC the IP of a gateway to send traffic to for a certain destination. This assumes you can reach that gateway IP, that a route exists for the traffic's source and destination, that firewalls allow the traffic, etc.
If you have no route, or you can't specify one (i.e. you can't predict where traffic will have to go), you have one other option left: route the traffic via the NIC that does have a default gateway. This used to work out of the box on Windows 2003 and earlier, but it no longer does since Windows 2008 (R2). That is because by default NICs in Windows 2008 (R2) operate in a strong host model: a NIC will not receive traffic destined for an IP other than its own, nor send traffic originating from an IP other than its own. For that you'll need to set the NIC properties to weak host send and receive, or enable forwarding. Note that forwarding is disabled by default on Windows 2003 as well; the big difference is that Windows 2003 operates in a weak host manner (send/receive), as opposed to the strong host mode of Windows 2008 (R2). By enabling forwarding we put the Windows 2008 server in weak host mode, and as such it works (see RFC 1122). On the internet you'll find both solutions, but the link between the two is rarely made. Using weak host receive and weak host send allows for more granular, custom configurations than forwarding.
Contact me via the web site or leave a comment if you have any questions or suggestions.
Post Script / Side Note, because someone asked: basically you can have multiple gateways on a server, but only one default gateway is in use at any time. You can add more than one default gateway on the same NIC, but the extra ones will only be used when the default gateway filled in first is not available; it will then try the next one and so forth. You can add multiple gateways to a single NIC, or one or more to multiple NICs, but that can get messy very quickly. Whether it is wise to provide gateway redundancy in such a manner is another discussion. See also KB article http://support.microsoft.com/kb/157025. Be mindful of the extra configuration you'll need (Dead Gateway Detection). This is a rather uncommon scenario on a Windows server. You can use it for redundancy, or when you want traffic to go to a certain default gateway instead of another when it is available (to separate traffic, for example for cost or to reduce the traffic load). And then there's adding a default gateway that's on another subnet than the IP address of the NIC. In that case you get this warning:
---------------------------
Microsoft TCP/IP
---------------------------
Warning – The default gateway is not on the same network segment (subnet) that is defined by the IP address and subnet mask. Do you want to save this configuration?
---------------------------
Yes No
---------------------------
All pretty cool stuff you can use to mess with people's heads and their understanding of what's going on. (It can work if the router on the local subnet has a route to the subnet where that default gateway lives and proxy ARP is working … but we're not going to turn this into a networking course, or pretty soon we'll be installing RRAS and turning the server into a router.)
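Both ideas from this side note can be sketched in a few lines of Python. The first function mirrors the check behind the "not on the same network segment" warning. The second illustrates only the "try the gateways in list order" behaviour. As far as I know, real Dead Gateway Detection decides that a gateway is dead based on failing TCP retransmissions and has its own settings, none of which this sketch models. The function names and addresses are hypothetical.

```python
import ipaddress
from typing import Callable, List, Optional


def gateway_on_subnet(address, mask, gateway):
    """Mirror the check behind the 'default gateway is not on the same
    network segment' warning."""
    nic = ipaddress.ip_interface(f"{address}/{mask}")
    return ipaddress.ip_address(gateway) in nic.network


def pick_gateway(gateways: List[str], is_alive: Callable[[str], bool]) -> Optional[str]:
    """Simplified ordered fallback: the first gateway in the list that is
    considered alive wins. Real Dead Gateway Detection is not modelled."""
    for gateway in gateways:
        if is_alive(gateway):
            return gateway
    return None


print(gateway_on_subnet("172.16.2.11", "255.255.255.0", "172.16.2.1"))  # True
print(gateway_on_subnet("172.16.2.11", "255.255.255.0", "10.10.0.1"))   # False
# First gateway considered dead, so the second one in the list is used.
print(pick_gateway(["172.16.2.1", "172.16.2.2"], lambda gw: gw != "172.16.2.1"))
```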
Hi,
I am setting up a website with NLB for high availability. I have two identical web servers with public IPs 121.96.x.63 and 121.96.x.66, and a VIP of 121.96.x.61. It works well when I use a client computer in the same IP range (say a computer going through my ISA server with IP 121.96.x.59). But when I go outside this range (for example, I browse the site from another location with public IP 10.8.67.3), I only see “cannot display the webpage”. My web servers have two NICs each (local and public) with NLB in UNICAST mode.
What version of Windows are you running? Which of the 2 NICs holds your gateway, and can that gateway route to any address, i.e. the internet?
Oh, I forgot to mention, it's Windows Server 2003 SE. Currently the Local NIC gateway is blank, while the gateway on both Public NICs is set to 121.96.x.1 (provided by our internet provider).
Sounds correct. I would test the routing with a non-NLB IP address to rule out routing issues.
I am trying to set this up using vmware esx 4.
I have 2 servers (each on a separate host).
Each server has 2 NICs, nic1 (LAN NIC) and nic2 (NLB NIC).
Both NICs are configured with static IP addresses on the same subnet.
On NIC2, I have disabled DNS registration and NetBIOS. NIC2 also does not have a default gateway, as it does not let me input one.
I set up NLB in multicast mode, as specified by VMware.
The only way i can ping the VIP is the following:
1. From a server on the same subnet, and this server has to be on the same esx host.
The vip is not pingable from any subnet(including its own) on a different esx host.
*As a side note/question, I cannot ping the NLB NIC from anywhere... granted, this NIC is not configured with a default gateway.
Help!
A couple of things to check:
Can you reach a server on that subnet without NLB configured from another subnet? In other words, does routing work at all?
For the NLB NIC: remember weak host sending and receiving in Windows 2008 (R2), or enable forwarding …
What operating system are you on?
The NLB NIC should allow a gateway. It might give a warning with some advice, but you should be able to put one in.
Hey, really nice doc. Thank you for writing it. 🙂
You’re most welcome. Glad you got something out of it.
Hey.
I'm confused. I have 2 Windows 2008 R2 servers in NLB. There's a known problem on this OS where clients from other subnets cannot reach the NLB virtual address.
I understand I should use the arp -s command to set a permanent record for my default gateway... but I cannot use ARP on my network. Is there a parallel command in netsh?
thank u.
Make sure you have the correct routing on the network. Also check that you have used forwarding or weak host settings in Windows 2008 R2. Don't forget to add a static route if you need/want to. I don't know the details of your setup, whether you need to use arp -s, or why you can't use it, but you could use netsh as below. Please DO NOT FORGET to run the cmd prompt with elevated permissions.
Delete entry: netsh interface ip delete neighbors "NICNAME" "GatewayIP"
Add: netsh interface ip add neighbors "Network card name here" "Gateway.IP.goes.here" "MAC-address-of-gateway-with-dash-here"
I tried to use that command... but then there's no communication at all to my server. I can't ping it from anywhere!
Hello. You'll have to check your switches, routing, server configuration, NLB setup, firewall, etc. It's impossible to pinpoint your issue without all the details; there are many factors involved here. If you need help, I suggest you document your configuration and post it to the Microsoft forums; http://social.technet.microsoft.com/Forums/en-US/winserverClustering/threads is perhaps a good choice. That way everyone frequenting that forum can help out and others can learn from it as well.
Best Regards
Thanks a lot , your ARP command solved a problem I was battling with ALL day!!!!!!!!!
Many many kudos on this WNLB effort. This is by FAR the best online writing I’ve found to date that puts it all together and so adequately addresses the surrounding confusion on the subject. I’ve been assigned the task of discovering (yes, exists but not documented), reviewing, validating and providing recommendations for the necessary remediation to address our intermittent connectivity issues affecting our business critical WNLB applications running in a virtualized infrastructure. All I have been able to find up to now are some of the references you’ve provided here. Again, very well done and much appreciated. BTW do the Win 2008 R2 configurations apply to Win 2008 SP2 also?
Thanks in advance.
Mike M.
Hello Mike, thank you for your kind words, they are much appreciated. This blog post might never really be finished, as I still learn every time I have to come up with a WNLB solution. To answer your question: yes, what's written here is valid for Windows 2008 and Windows 2008 R2. The new NLB functionality in W2K8R2 is discussed here http://blogs.msdn.com/b/clustering/archive/2010/09/14/10061515.aspx but it doesn't change the network configuration.
Thanks a lot for this – very informative. I’ve just been tasked with bringing up our first 2K8 loadbalanced cluster – everything works fine in 2K3, but it’s certainly ‘different’ in 2K8 :). The weak host/strong host thing had me baffled for a while – despite reading a couple of very dry Technet and MSDN blogs about it. The routing information here tied it all together nicely – cheers!
I’m glad it was of use to you. Thx for the feedback 🙂
I'm scheduled to attack remediation tomorrow night. There may be other questions before then, but one thing looms right now. As it relates to MS E2K7 SP1 OWA running on W2k08 SP2: one node of the NLB multicast mode cluster (single NIC, which will be dual NIC tomorrow 🙂) has HTTP Keep-Alive enabled and the other doesn't. Do you know what the best practice is here? Keep HTTP Keep-Alive enabled on both nodes or disable it on both? I can't seem to find much on this online.
Thanks again,
Mike M.
Leave it on, that is the default. Don’t change it unless specifically instructed to do so by the application vendor or your developers/engineers. It can cause issues when just disabled and I have never seen MS tell us to disable it, I have seen them tell people to enable it again for OWA.
Hi, let me preface this by saying thank you so much again for your efforts here as well as the feedback, as it was so instrumental. I also hope this post may help someone else as you have helped me. Okay, so last Sat 1/24 I implemented NLB reconfiguration/corrective actions based on my review/discovery and testing, including the following: switched from a single to a dual NIC WLBS NLB multicast mode cluster, reset all the VMware hosts' vSwitch and port-group “notify switches” settings to Yes, configured all W2k8 roles, features, services, network settings and updates of the two E2K7 SP1 CAS/Hub transport NLB nodes to match identically, and configured the VMware hosts' up-linked load-balanced dual Cisco 6500 switch configuration with the recommended static ARP and “mac-address-table static mac-address vlan-id interface-type disable-snooping” command. Everything went great except for 2 unexpected results.
1. The NLB cluster would not converge until I put both VM CAS/Hub transport NLB nodes on the same VMware host. Subsequently I put them on separate hosts and the cluster remained converged. I’m still trying to figure out why that happened.
2. Adding the “mac-address-table static mac-address vlan-id interface-type disable-snooping” command to both switch A and B caused switch B not to route the multicast packet traffic to the VIP until the command was removed from switch B. So it works fine with the configuration only on switch A. But what happens if we lose switch A? I have a service request ticket open with VMware PSS.
Thanks again and hope someone can benefit.
Hello. Thanks for your detailed feedback. It's been a while since I played with VMware and WNLB, but doesn't the Notify Switches setting have to be set to “No”?
Hi, as per the VMware KBs and PSS, “No” is for unicast mode. You can find it here: VMware KB Article 1006525 (bullet point #5), KB Article 1006558 (under Additional Information: “For use with VMotion, ensure Notify Switches is set to Yes.”), and KB Article 1556 (Note #8: No for unicast mode).
Mike M
Any chance you can do a test with a layer 2 Cisco switch? That way you could drop the disable-snooping and you might have more success.
I can't change switches, but I did re-add the mac-address-table config and dropped “disable-snooping” on both switchA and switchB; still no go (no routing to the VIP on switchB). So I removed the config from switchB again. Here's show mac-add… on coreA, “edited for post”.
COREA#sh mac-address-table multicast
 vlan   mac address      type    learn  qos  ports
-----+----------------+--------+------+----+--------------------------------
  90   0333.3333.3333   static  Yes    -
  40   NLB0.VIP0.MAC0   static  No     -    Gi2/2,Gi2/3,Gi2/4,Gi2/5
                                            Gi2/6,Gi2/7,Gi2/8
  40   04444.4444.4444  static  Yes    -
COREA#sh mac-address-table multicast igmp-snooping
 vlan   mac address      type    learn  qos  ports
-----+----------------+--------+------+----+--------------------------------
  90   0333.3333.3333   static  Yes    -
  40   0444.4444.4444   static  Yes    -
No snooping for NLB VIP so maybe it’s okay. But just wanna make sure we’re okay if we lose switchA.
If Cisco says you need it for WNLB, you probably do. It would be nice to hear what VMware has to say on this combination.
Happy to report that issue #2 from my post on January 27, 2011 at 05:50 has been resolved by our Cisco guy by adding “Port-channel1 and Port-channel100” to the “mac-address-table static” command on each of the Cisco Core A and Core B switches. BTW, my post on January 28, 2011 at 05:23 should have indicated Core A and Core B, not just A. Also, VMware PSS has not gotten back to us with a resolution yet. Guess I'll let them know when they do.
Mike M.
Wow, very nicely written article! This was very informative and concise.
One question I had, if you put the SQL Server Database on the same VLAN as NLB Node1 and Node2, how would the static routes be configured?
I am trying to make traffic come in and out of NLB NICs but to have Private NICs access the SQL Database and feed results to NLB Nics.
Technet Question Link: http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/d34b0d24-41b2-4142-b83d-500182ab9add/
So both the NLB NIC and the Private NIC are on the same subnet. This means you don't have any need for static routes. Giving both NICs the same gateway leaves you with the question of which one will be used, and can also lead to some issues with partitioning. One way to deal with that is to give the gateway only to the NIC that needs to route and not to the other, which makes it very clear how traffic gets routed, if it gets routed at all. If you leave both gateways filled in, you can play with the metrics to determine which one is used for routing. This also prevents some partitioning or other converging issues during a reboot of the nodes.
So if the NLB NICs are used for a web site, service or application on the intranet and don't need to route, leave that gateway blank and fill it in for the Private NIC, or vice versa. Now, forcing access to the database over the Private NIC … well, they are on the same subnet, so this is a hard one. The gateway is server wide, the choice of NIC … mmmmmm. Off the top of my head: try playing with the NIC order in the advanced settings and set the Private NIC first; if your app chooses the first NIC, that might work. If not, give the Private NIC a lower metric to make it the most cost-effective, so it is chosen over the other.
Thanks for your reply, appreciate it. I think to avoid any issues, I will configure the network exactly as pictured above. I was under the wrong impression for my setup. Thanks again. Great blog 🙂
Lots of good info here. However, I'm still not clear why you would “need” 2 NICs in unicast mode if you only have a single internal VLAN. Let's say you have 4 web servers in an NLB farm on the 10.10.10.x network. Your back-end SQL server is also on 10.10.10.x. You have a simple firewall that NATs from the NLB VIP 10.10.10.100 to/from the internet. External users hit the site using the NAT'd public IP, which maps internally to 10.10.10.100, and the web servers then talk to the SQL server at 10.10.10.200. What benefit do you gain from multiple NICs in this scenario?
Hello, Thanks for your comment.
There are two main reasons to have 2 NICs. One of them is related to unicast.
1) Separating the front-end traffic from the back-end traffic (e.g. web server requests from clients versus communication with the database)
2) Inter-host communication in unicast mode
In unicast mode, each host in the cluster has the same IP address and the same MAC address. This makes them look identical from a networking perspective. So unicast mode with one NIC prevents communication among the hosts of the cluster.
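To see why unicast hosts look identical, here is a Python sketch of how NLB derives the cluster MAC address from the cluster (virtual) IP address. It follows the scheme as I know it: 02-BF plus the four IP octets in unicast mode, 03-BF plus the four octets in multicast mode, and 01-00-5E-7F plus the last two octets in IGMP multicast mode. Please verify it against the network address NLB Manager shows for your cluster. The VIP 10.10.0.100 is hypothetical. The Cisco-style formatter is my own addition, producing the dotted notation used in the mac-address-table output earlier in the comments.

```python
import ipaddress


def nlb_cluster_mac(vip, mode):
    """Derive the cluster MAC for a cluster IP (scheme as I understand it):
    unicast 02-BF-w-x-y-z, multicast 03-BF-w-x-y-z, igmp 01-00-5E-7F-y-z."""
    octets = ipaddress.IPv4Address(vip).packed  # 4 bytes, e.g. 0A 0A 00 64
    if mode == "unicast":
        prefix, tail = [0x02, 0xBF], octets
    elif mode == "multicast":
        prefix, tail = [0x03, 0xBF], octets
    elif mode == "igmp":
        prefix, tail = [0x01, 0x00, 0x5E, 0x7F], octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02X}" for b in [*prefix, *tail])


def to_cisco(mac):
    """03-BF-0A-0A-00-64 -> 03bf.0a0a.0064 (Cisco dotted notation)."""
    digits = mac.replace("-", "").lower()
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


vip = "10.10.0.100"  # hypothetical cluster IP
print(nlb_cluster_mac(vip, "unicast"))             # 02-BF-0A-0A-00-64
print(to_cisco(nlb_cluster_mac(vip, "multicast"))) # 03bf.0a0a.0064
```

In unicast mode every host presents that same 02-BF address on the cluster NIC, which is why the hosts cannot tell each other apart over a single NIC.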
Thanks for the useful info. I was having issues with hosts in other subnets connecting to my CAS array and clustered SMTP instance because I didn’t have a default gateway set on the NLB NICs. Thanks for pointing me in the right direction.
Great information about NLB. I have set up 2 Exchange 2010 CAS/HUB servers in an NLB array with a dual-NIC configuration. The “Private NIC” has the default gateway and the “NLB nic” has no gateway. I ran netsh interface ipv4 set interface "NLB nic" forwarding=enabled and everything works; the VIP IP address is responding. But then I noticed that the NLB port rules don’t work as expected. I have set up port rules for the Outlook RPC ports only, but the NLB VIP IP address is answering on ports 21, 80, 443, etc. Why? Is it because of forwarding=enabled?
//Themac
Something must be listening on those ports on that IP address.
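One way to check that answer is to probe the VIP from a client. NLB port rules decide how traffic on the covered ports is distributed; traffic that no port rule covers is handled by the host with the highest priority (the default host), so a service such as IIS or FTP that listens on all addresses will still answer on the VIP. The Python sketch below simply tests whether a TCP connection to a given address and port is accepted. On the server itself, netstat -ano shows which process owns a listening socket.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

For example, looping over ports 21, 80 and 443 with is_listening("10.10.10.100", port), where 10.10.10.100 stands in for a hypothetical VIP, shows which of them actually answer.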
Very useful post. I always prefer this kind of configuration, but here I have a very different requirement: we have two different regional sites, one in Dubai and another in Russia. We have created a Windows NLB cluster on the Dubai site servers with IP addresses 192.168.0.11 and 192.168.0.12, and the virtual IP is 192.168.0.100. Now I want to add a Russia site server to the Dubai site Windows NLB cluster. We haven’t decided on the subnet for the Russia site yet. As per my understanding we cannot create a Windows NLB cluster across two different subnets, so what are the possible workarounds?
What is the connectivity between those two sites? You could create a stretched VLAN, but make sure that the connection is rock solid, or you’ll be converging every time there is a network hiccup.
Depending on the type of NLB (# of NICs used, unicast/multicast) and the requirements/business needs, you could make this work, but thinking about the costs, effort and possible reliability issues, you might very well be better off using a “Geographical Load Balancer Appliance” or service to deal with this. And then you still have to account for how stateless or sticky your affinity can or must be.
Whichever route you go, I’d do a POC, because this sounds like a very tall order for Windows NLB to handle.
Currently we are using an IPsec VPN, but we have requested our ISP to upgrade the line to MPLS, so there will be no more connectivity issues.
The client is ready to invest in any number of NICs. Personally I preferred multicast, but for security reasons the ISP doesn’t allow multicast traffic on the network.
Is it possible to have the ISP place a static route for the VIP and the static IP of the Russia site, and broadcast the MAC address?
Given the details, without going into it myself, and given my experience with NLB: a tall order indeed. I’d go with a reasonably priced load balancer appliance, such as one from Kemp Technologies. In fact, if you do a bit of research you’ll find Microsoft itself is giving up on NLB; there is very little development on it. But if you must, I couldn’t think of a better site to come to for assistance, Microsoft included. This site certainly bailed me out of a difficult situation a couple of years ago. Best. MM
Hi, as already mentioned, this is by far the best and most accurate post I have come across. However, I am still confused. I have been asked to set up NLB on two Windows 2008 R2 VMs running on different ESX hosts in separate datacenters, on different subnets and VLANs. I have managed to get the two nodes converged, but that is all. The vIP is on the same subnet as host1.
If I disable the NLB NIC in host1, the vIP is unreachable (meaning it is not working on host2). I am unable to ping the vIP from outside the vIP’s subnet.
Each VM has two NICs, one public and one NLB. I have enabled forwarding, weakhostsend and weakhostreceive on both VMs. I have added a static route on the NLB NIC to the default gateway address of the public NIC. I have changed the binding order of the NICs. As recommended by VMware I am trying multicast, but it doesn’t work in unicast either. There are lots of other posts regarding static ARP tables etc., but I am not sure what should be entered into these. Any pointers would be very much appreciated.
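For context on the weakhostsend and weakhostreceive settings mentioned in that comment: under the strong host model, Windows only accepts a packet on an interface if its destination address is assigned to that interface, and only sends a packet out of an interface if its source address is assigned to that interface. Enabling the weak host settings on an interface relaxes those checks to any address on the host. The Python sketch below models just those two checks; the interface names and addresses are hypothetical, and it ignores forwarding, which is a separate setting.

```python
def accepts(dst_ip, in_if, addresses, weak_receive_ifs=frozenset()):
    """Receive check: strong host unless weakhostreceive is enabled on in_if."""
    if dst_ip in addresses[in_if]:
        return True
    if in_if in weak_receive_ifs:
        return any(dst_ip in ips for ips in addresses.values())
    return False


def can_send(src_ip, out_if, addresses, weak_send_ifs=frozenset()):
    """Send check: strong host unless weakhostsend is enabled on out_if."""
    if src_ip in addresses[out_if]:
        return True
    if out_if in weak_send_ifs:
        return any(src_ip in ips for ips in addresses.values())
    return False


# Hypothetical two-NIC node: the VIP lives on the NLB NIC, the default gateway is on the Public NIC.
addresses = {
    "NLB": {"10.10.10.100", "10.10.10.11"},
    "Public": {"10.10.20.11"},
}

print(can_send("10.10.10.100", "Public", addresses))              # False (strong host)
print(can_send("10.10.10.100", "Public", addresses, {"Public"}))  # True (weakhostsend on Public)
```

Note that these settings only change what a host will accept and send; they do not make hosts on different subnets share the cluster MAC.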
You have to stretch the subnet across the data centers to make this work. The NLB VIP needs the member hosts to be on the same subnet.
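A quick sanity check for that requirement: every member host’s dedicated IP should fall inside the VIP’s subnet. The Python sketch below uses the standard ipaddress module; the /24 prefix, the function name and the 10.20.0.5 address are hypothetical, while the 192.168.0.x addresses come from the Dubai example earlier in this thread.

```python
import ipaddress


def hosts_outside_vip_subnet(vip_with_prefix, host_ips):
    """Return the host IPs that are NOT in the VIP's subnet (an empty list means all is well)."""
    subnet = ipaddress.ip_interface(vip_with_prefix).network
    return [ip for ip in host_ips if ipaddress.ip_address(ip) not in subnet]


print(hosts_outside_vip_subnet("192.168.0.100/24", ["192.168.0.11", "192.168.0.12"]))  # []
print(hosts_outside_vip_subnet("192.168.0.100/24", ["192.168.0.11", "10.20.0.5"]))     # ['10.20.0.5']
```

Passing this check is necessary but not sufficient: the hosts also need to share the same Layer 2 segment, which is what stretching the subnet/VLAN across the data centers provides.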
Thanks for the quick response. I thought you could have member hosts on different subnets and just route between them.
Thanks for the above. One more quick question: would the nodes converge if there were still a routing issue between the sites? I have both nodes converged and can ping the vIP from each node, but not from outside the subnet. Is it now just a matter of static routes and neighbors on the nodes themselves?
Do any of the possibilities in the answer given here http://social.technet.microsoft.com/Forums/en/winserverClustering/thread/7fc1e2ff-413e-4239-9594-d3205d4bb829 apply?
Another quick one: now that all nodes are on a stretched subnet between the datacenters, I still need to add ARP routes for the vIP, as it is still routed. Is that right?
All other rules still apply, I’d say: if you needed them in one location for the subnet, you’ll need them here as well.
First of all, thanks so much! The articles are excellent! I’m trying to figure out the best solution to build a cluster of 4 computers across 2 cities, 2 in each city. We have a private network between them provided by a third-party company. The idea is just for security: we need 4 identical machines running web servers, with identical synchronized databases and at least one synchronized folder. This structure will run a system like YouTube, but private, with 2 interfaces: web and desktop. It is not a huge system; the worst case will be 1000 concurrent users. The low-resolution videos will be stored in this cluster. I haven’t found what to do. It will run Windows Server, but we don’t have the money to pay for an Enterprise license. So the idea is to run NLB, but between the two cities there will probably be just one network. Between the 2 machines in each city we can use as many NICs as we need, because that will be our own LAN, not the third-party company’s. Do you know the best solution for this case?
Between cities is a challenge. Have you already tested your application and use case with NLB to make sure it’s suitable for that approach? That would be my first step. The link between the sites and the amount of traffic would be a point of concern. “YouTube-like” sounds like it could mean heavy traffic going between the sites.
Not sure if anyone still looks at this, but I have a weird issue:
I have two 2008 R2 servers with 2 NICs each that are in an NLB cluster together. On each box, 1 NIC is for NLB and the other is on the same subnet with the same gateway. Both servers are communicating over NLB (converged status), and it is working properly and serving up SharePoint. However, the issue is that you cannot access the internet from the servers (cannot navigate to google.com)! The NLB NIC says it has internet access, but the other one does not (they both should, and other servers on the same subnet do not have this issue, so I believe it is NLB that is redirecting). I have tried the forwarding command and just about every combination of enabling/disabling NICs, but with no success. Any help would be appreciated!
I also want to add, this has been working for about a year. A reboot caused all of this.
I configured a 2-node Windows Server 2012 NLB cluster on VMware 5.1 in multicast mode. Each node has 2 NICs, with a gateway on both NICs. Everything works fine. The only problem is Event 18 on both nodes: “NLB cluster [x.x.x.x]: NLB detected duplicate cluster subnets. This may be due to network partitioning, which prevents NLB heartbeats of one or more hosts from reaching the other cluster hosts. Although NLB operations have resumed properly, please investigate the cause of the network partitioning”. One of the nodes remains in the converging state for a long time, and when this happens the VIP does not respond to RDP. Please advise.
NLB NIC:
C:\Windows>netsh interface ipv4 show interface 26
Interface NLB Parameters
----------------------------------------------
IfLuid : ethernet_17
IfIndex : 26
State : connected
Metric : 10
Link MTU : 1500 bytes
Reachable Time : 24000 ms
Base Reachable Time : 30000 ms
Retransmission Interval : 1000 ms
DAD Transmits : 3
Site Prefix Length : 64
Site Id : 1
Forwarding : disabled
Advertising : disabled
Neighbor Discovery : enabled
Neighbor Unreachability Detection : enabled
Router Discovery : dhcp
Managed Address Configuration : enabled
Other Stateful Configuration : enabled
Weak Host Sends : disabled
Weak Host Receives : disabled
Use Automatic Metric : enabled
Ignore Default Routes : disabled
Advertised Router Lifetime : 1800 seconds
Advertise Default Route : disabled
Current Hop Limit : 0
Force ARPND Wake up patterns : disabled
Directed MAC Wake up patterns : disabled
ECN capability : application
Management NIC:
C:\Windows>netsh interface ipv4 show interface 13
Interface Prod Parameters
----------------------------------------------
IfLuid : ethernet_11
IfIndex : 13
State : connected
Metric : 10
Link MTU : 1500 bytes
Reachable Time : 30000 ms
Base Reachable Time : 30000 ms
Retransmission Interval : 1000 ms
DAD Transmits : 3
Site Prefix Length : 64
Site Id : 1
Forwarding : disabled
Advertising : disabled
Neighbor Discovery : enabled
Neighbor Unreachability Detection : enabled
Router Discovery : dhcp
Managed Address Configuration : enabled
Other Stateful Configuration : enabled
Weak Host Sends : disabled
Weak Host Receives : disabled
Use Automatic Metric : enabled
Ignore Default Routes : disabled
Advertised Router Lifetime : 1800 seconds
Advertise Default Route : disabled
Current Hop Limit : 0
Force ARPND Wake up patterns : disabled
Directed MAC Wake up patterns : disabled
ECN capability : application
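Comparing two dumps like these by eye is error-prone. The Python sketch below parses the "Key : Value" lines that netsh interface ipv4 show interface prints and reports the settings that differ. Run against the two listings above, only IfLuid, IfIndex and Reachable Time differ; Forwarding, Weak Host Sends and Weak Host Receives are disabled on both NICs. The parsing rule (split on the first " : ") is an assumption based on the format shown here, and the function names are hypothetical.

```python
def parse_netsh_interface(text):
    """Parse 'Key : Value' lines from 'netsh interface ipv4 show interface' output."""
    settings = {}
    for line in text.splitlines():
        if " : " in line:
            key, value = line.split(" : ", 1)
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


nlb = parse_netsh_interface("IfIndex : 26\nReachable Time : 24000 ms\nForwarding : disabled")
prod = parse_netsh_interface("IfIndex : 13\nReachable Time : 30000 ms\nForwarding : disabled")
print(diff_settings(nlb, prod))
# {'IfIndex': ('26', '13'), 'Reachable Time': ('24000 ms', '30000 ms')}
```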
I got your reply: try a DGW on only 1 NIC (the one exposed to the clients) and use static persistent routing for answers that need to go via the 2nd NIC (no DGW).
Both NICs are on the same subnet. NLB is not exposed to the internet.
Which NIC should I choose for the DGW assignment?
NIC 1 – Management: This is used for the server to function.
NIC 2 – NLB: Only used for NLB.
How do I do this: “use static persistent routing for answers that need to go via 2nd NIC”?
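A static persistent route is added with route -p add destination mask netmask gateway [metric m] if interface-index; the -p flag stores the route so it survives a reboot. The Python sketch below only builds and validates that command line from CIDR notation; the helper name, the destination 10.20.0.0/16 and the gateway 10.10.10.1 are hypothetical, and interface index 26 is the NLB NIC’s IfIndex from the listing above. Run the resulting command in an elevated prompt and verify it with route print.

```python
import ipaddress


def persistent_route_cmd(destination_cidr, gateway, if_index, metric=None):
    """Build a Windows 'route -p add' command line (hypothetical helper)."""
    network = ipaddress.ip_network(destination_cidr)  # raises ValueError if host bits are set
    ipaddress.ip_address(gateway)                      # raises ValueError on a malformed gateway
    parts = ["route", "-p", "add", str(network.network_address),
             "mask", str(network.netmask), gateway]
    if metric is not None:
        parts += ["metric", str(metric)]
    parts += ["if", str(if_index)]
    return " ".join(parts)


print(persistent_route_cmd("10.20.0.0/16", "10.10.10.1", 26))
# route -p add 10.20.0.0 mask 255.255.0.0 10.10.10.1 if 26
```

The gateway must be reachable on the subnet of the NIC the route is bound to; since both NICs here share one subnet, the same gateway works for either.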
Help! I have 2 physical servers with the CAS role, and now we have 2 newly installed VM servers. When we add the NLB VM NICs along with the physical NLB NICs, the VM NICs converge, but the physical NICs go from converged back to converging. Are there any settings that need to be made while adding VM NICs to NLB?
Did you enable MAC spoofing? See http://blogs.msdn.com/b/clustering/archive/2010/07/01/10033544.aspx
The virtual machines are on VMware, not Hyper-V, and we are trying to configure multicast.
There are lots of parts involved; it could be a number of things (network switch firmware, support, configuration...). Start here: http://www.bing.com/search?q=VMware+Multicast+WNLB&src=IE-SearchBox&FORM=IENTSR
Hi there,
Just to confirm: I’m running two CAS servers in a VMware environment. I have only 1 subnet (we’ll call it the .1 network). I followed the MAC spoofing suggestion (as per VMware) of turning off Notify Switches on a dedicated port group I called “NLB-Data”. This port group is on the same subnet (and VLAN of the .1 network) as the production/management NICs (the main server NICs, with the default gateway defined). I’ve added an additional NIC to each VM on this specially configured port group (NLB-Data), given them IP addresses (.1 addresses) withOUT a default gateway, and labeled them “NLB”. I set up NLB selecting the NIC labeled NLB and created the VIP (on the .1 network). I added the second node following the same method, and all is green in NLB Manager. I ran the netsh command and enabled forwarding on the “NLB” interface (the NIC without a gateway) on both CAS servers. All seems good. I can reboot and refresh and all statuses are green. I assume this is the correct setup? (The OS is Server 2012.)
Thanks, very awesome write up!
Actually, I don’t have this enabled yet on my CAS servers, but I DID do this for an ADFS POC deployment. I want to ensure this setup is correct before I apply it to our Exchange environment.
It depends on how the traffic flows, but as you have only one subnet anyway, you could just put the gateway on the NLB NICs you use for the VIP. All routing will be done there. The second NIC is only for management and normally doesn’t need routing; if it does, a static route would do as well. Multiple permutations are possible, depending on needs and possibilities. Also see https://blog.workinghardinit.work/2014/03/28/windows-nlb-on-windows-server-2012-r2-hyper-v-a-personal-preferred-configuration-using-igmp-with-multicast/ and https://blog.workinghardinit.work/2014/03/24/windows-nlb-nodes-misconfigured-after-simultaneous-live-migration-on-windows-server-2012-r2/ and