Virtualization with Hyper-V & The NUMA Tax Is Not Just About Dynamic Memory

First of all, to join in this little discussion you need to know what NUMA is and what it does. You can read up on that on the Intel (or AMD) web sites, for example http://software.intel.com/en-us/blogs/2009/03/11/learning-experience-of-numa-and-intels-next-generation-xeon-processor-i/ and http://software.intel.com/en-us/articles/optimizing-software-applications-for-numa/. Also have a look at the SQL Skills blog post http://www.sqlskills.com/blogs/jonathan/post/Understanding-Non-Uniform-Memory-AccessArchitectures-(NUMA).aspx, which has some great pictures to help visualize the concepts.

What Is It And Why Do We Care?

We all know that a CPU contains multiple cores today: 2, 4, 6, 8, 12, 16, etc. So in terms of a physical CPU we tend to talk about a processor that fits in a socket and about cores for logical CPUs. When hyper-threading is enabled you double the number of logical processors seen and used. It is said that Hyper-V handles hyper-threading well, so you can leave it on; the logic being that it will never hurt performance and can help to improve it. I suggest you test it, as there was a performance bug with it once. A processor today contains its own memory controller, and access to the memory attached to that processor is very fast. The NUMA node concept is older than multi-core processor technology, but today you can state that a NUMA node generally translates to one processor/socket and that all cores contained in that processor belong to the same NUMA node. Sometimes a processor contains two NUMA nodes, like the AMD 12-core processors. In the future, with the ever-increasing number of cores, we'll perhaps see even more NUMA nodes per processor. You can state that all Intel processors since Nehalem with QuickPath Interconnect and AMD processors with HyperTransport are NUMA processors. But to be sure, check with your vendors before buying. Assumptions, right?

Beyond NUMA nodes there is also a thing called processor groups, which help Windows use more than 64 logical processors (its former limit) by grouping logical processors. Windows handles 4 such groups, meaning Windows today can support 4 x 64 = 256 logical processors in total. Because memory access within a NUMA node is a lot faster than access between NUMA nodes, you can see where a potential performance hit is waiting to happen. I tried to create a picture of this concept below. Now you know why I don’t make my living as a graphical artist.
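If you want to see what Windows itself reports for that topology on a given host, here's a minimal sketch using only documented Win32 APIs via ctypes; run it on the Hyper-V host (or any Windows 7 / W2K8R2-era box) you're curious about.

```python
# Minimal sketch: ask Windows how many NUMA nodes, processor groups and
# logical processors it sees on this host (documented kernel32 APIs only).
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

highest_node = wintypes.ULONG(0)
if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest_node)):
    raise ctypes.WinError(ctypes.get_last_error())

ALL_PROCESSOR_GROUPS = 0xFFFF
print("NUMA nodes:", highest_node.value + 1)
print("Processor groups:", kernel32.GetActiveProcessorGroupCount())
print("Logical processors:", kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS))
```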

To make it very clear: NUMA is great and helps us in a lot of ways. But under certain conditions and with certain applications it can cause us to take a (serious) performance hit. And if there is anything certain to ruin a system administrator’s day, it is a brand new server with a bunch of CPUs and loads of RAM that isn’t running any better (or is even running worse) than the one you’re replacing. Current hypervisors like Hyper-V are NUMA aware and the better server applications like SQL Server are as well. That means that under the hood they are doing their best to optimize CPU & memory usage for performance. They do a very good job actually, and you might, depending on your environment, never ever notice any issue or even the existence of NUMA.

But even with a NUMA-aware hypervisor and NUMA-aware applications you run the risk of having to go to remote memory. The introduction of Dynamic Memory in Windows 2008 R2 SP1 even increases this likelihood, as there is a lot of memory reassigning going on. Dynamic Memory actually educated a lot of Hyper-V people on what NUMA is and what to look out for. Until Dynamic Memory came on the scene, and the evangelizing that came with it by Microsoft, it was “only” the people virtualizing SQL Server, Exchange & other big, hungry applications who were very aware of NUMA with its benefits and potential drawbacks. If you’re lucky the application is NUMA aware, but not all of them are, even the big names.

A Peek Into The Future

As it bears on this discussion, what is interesting is that leaked screenshots from Hyper-V 3.0 or vNext … have NUMA configuration options for both memory and CPU at the virtual machine level! See Numa Settings in Hyper-V 3.0 for a picture. So the times that you had to script WMI calls (see http://blogs.msdn.com/b/tvoellm/archive/2008/09/28/looking-for-that-last-once-of-performance_3f00_-then-try-affinitizing-your-vm-to-a-numa-node-.aspx) to assign a VM to a NUMA node might be over soon (speculation alert), and it seems like a natural progression from the ability to disable NUMA spanning with W2K8R2 SP1 Hyper-V in case you need to avoid NUMA issues at the Hyper-V host level. Hyper-V today is already pretty NUMA aware and as such it will try to get all memory for a virtual machine from a single NUMA node; only when that can’t be done will it span across NUMA nodes. So as stated, Hyper-V with Windows Server 2008 R2 SP1 can prevent this from happening, as we can disable NUMA spanning for a Hyper-V host now. The downside is that a virtual machine then can’t get more memory even if it’s available on the host.
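For reference, here's a minimal sketch that simply enumerates the VMs in the root\virtualization namespace that the affinitizing script in the linked post works against. It uses the third-party Python "wmi" package (my tooling choice, any WMI-capable scripting language will do); the actual NUMA node assignment itself is done on the VM's setting data as shown in that post, this only lists what you would be touching.

```python
# Hedged sketch: list the VMs on a W2K8R2 Hyper-V host via the
# root\virtualization WMI namespace. Requires "pip install wmi".
import wmi

virt = wmi.WMI(namespace=r"root\virtualization")

for system in virt.Msvm_ComputerSystem():
    # The host itself also appears as an Msvm_ComputerSystem instance;
    # actual virtual machines carry the caption "Virtual Machine".
    if system.Caption == "Virtual Machine":
        print(system.ElementName, system.Name)  # friendly name, VM GUID
```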

A working approach to reduce possible NUMA overhead is to limit the number of CPUs (sockets) to 2, as this gives each CPU the largest share of the memory, in this case 50%. With 4 CPUs each one only controls 25%, etc. So with more CPUs (and NUMA nodes), the risk of NUMA spanning grows very fast. For memory-intensive applications scaling out is the way to go. Actually, you could state that we scale up the NUMA nodes per socket (lots of cores with the largest amount of directly accessible memory possible) and as such do not scale up the server. If you can, keep your virtual machines tied to a single CPU on a dual-socket server to try and prevent any indirect memory access and thus a performance hit. But that won’t always work. If you ever wondered when an 8/12/16-core CPU comes in handy, well voila, here is a perfect case: packing as many cores as possible on a CPU becomes very handy when you want to limit sockets to prevent NUMA issues but still need plenty of CPU cycles. This should work as long as you can address large amounts of RAM per socket at fast speeds and the CPU internally isn’t cut up into too many NUMA nodes, which would be scaling out NUMA nodes within the same CPU and we don’t want that, or we’re back to a performance penalty.
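To make that arithmetic explicit, here's a trivial sketch; the 128 GB host size and the one-NUMA-node-per-socket assumption are just illustrative numbers, not a sizing recommendation.

```python
# Illustrative sketch: the share of host RAM that is local to any one NUMA
# node shrinks as the socket count grows (assuming one node per socket and
# memory populated evenly across sockets).
total_ram_gb = 128  # made-up example host

for sockets in (1, 2, 4, 8):
    local_fraction = 1 / sockets
    print(f"{sockets} socket(s): {local_fraction:.0%} of RAM "
          f"({total_ram_gb * local_fraction:.0f} GB) is local to each node")
```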

Stacking The Deck

One way of stacking the deck in your favor is to keep the heavy apps on their own Hyper-V cluster. Then you can tweak it all you want to optimize for SQL Server, Exchange, etc. When you throw these virtual machines into your regular clusters or, for crying out loud, onto a VDI cluster, you’re going to wreak havoc on the performance. Just like mixing server virtualization & VDI is a bad idea (don’t do it), throwing vCPU-hungry, memory-hogging servers on those clusters just kills the performance and capacity of a perfectly good cluster. I have gotten into arguments over this, as some think one giant cluster for whatever need is better. Well no, you’ll end up micromanaging placement of VMs with very different needs on that cluster, effectively “cutting” it up into smaller “cluster parts”. Now are separate clusters for different needs always the better approach? No, it depends. If you only have some small SQL Server needs you can get away with one nice cluster. It depends, I know, the eternal consultant’s answer, but I have to say it. I don’t want to get angry emails from managers because someone set up a 6-node cluster for a couple of SQL Server Express databases. There are also concepts called testing, proof of concept, etc. It’s called evidence-based planning. Try it; it has some benefits that become very apparent when you’re going to virtualize beefy SQL Server, SharePoint, and Exchange servers.

How do you even know it is happening, apart from empirical testing? Aha, excellent question! Take a look at the “Hyper-V VM Vid Numa Node” counter set and read this blog entry on the subject: http://blogs.msdn.com/b/tvoellm/archive/2008/09/29/hyper-v-performance-counters-part-five-of-many-hyper-vm-vm-vid-numa-node.aspx. And keep an eye on the event log for http://technet.microsoft.com/hi-in/library/dd582929(en-us,WS.10).aspx (for some reason there is no comparable entry for W2K8R2 on TechNet).
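If you just want to see which instances of that counter set exist on a host, a quick and dirty sketch like the one below, wrapping the built-in typeperf.exe tool, will do. The counter object name comes straight from the post above; the exact counters under it may differ per Hyper-V version.

```python
# Hedged sketch: list the instances/counters of the "Hyper-V VM Vid Numa Node"
# performance object using the built-in typeperf.exe tool.
import subprocess

result = subprocess.run(
    ["typeperf", "-qx", "Hyper-V VM Vid Numa Node"],
    capture_output=True, text=True, check=False,
)
print(result.stdout or result.stderr)
```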

Conclusions

To conclude, all of the above is why I’m interested in some of the latest generation of servers. The architecture of the hardware allows the processors to address twice the “normal” amount of memory when you only put two CPUs on a quad-socket motherboard. The Dell PowerEdge R810 and the M910 have this; it’s called FlexMem Bridge and it allows more memory to be available without a performance hit. They also allow for more memory per socket at higher speeds. Normally, if you hang a lot of memory directly off one CPU you see a speed drop: a DELL R710 with 48 GB of RAM runs at 1066 MHz, but put 96 GB in there and you fall back to 800 MHz. So yes, bring on those new quad-socket motherboards with just 2 sockets used, a bunch of fast, directly accessible memory in a neat 2U server package with lots of space for NIC cards & FC HBAs if needed. Virtualization heaven 🙂 That’s what I want, so I can give my VMs running SQL Server 2008 R2 & “Denali” (when can I call it SQL Server 2012?) a bigger amount of directly accessible memory from their NUMA node. This can be especially helpful if you need to run NUMA-unaware applications like SAP or such.

Testing is the way to go for knowing how well a NUMA-aware hypervisor and a NUMA-aware application figure out the best approach to optimize the NUMA experience together. I’m sure we’ll learn more about this as more information becomes available and as technology evolves. For now, we optimize for performance with NUMA where we can, when we can, with what we have 🙂

For Exchange 2010 (we even have virtualization support for DAG mailbox servers now as well) scaling out is easier, as we have all the neatly separated roles and control just about everything down to the mail client. With SQL Server applications this is often less clear. There is a varied selection of commercial and homegrown applications out there and a lot of them can’t even scale out, only up. So your mileage may vary. But for resource- & memory-heavy applications under your control, for now, scaling out is the way to go.

Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)

This is a 2nd post in a series of 4. Here’s a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

Introduction

In this post we continue along the train of thought we set out in a previous blog post, “Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)”. Let’s say you want to set up a Hyper-V cluster for SQL Server virtualization. Your business & IT managers told you they need you to provide the best performance you can get. They followed up on that statement with a real budget, so you can buy high-end servers (blades or rack) and spec them out optimally for SQL Server. You take into consideration NUMA issues, vCPU:pCPU ratios, SQL Server memory demands, the current 4 vCPU limit in Hyper-V, etc. By the way, this will be more than 16 vCPUs with Windows Server 8, which leads me to believe the 64 GB memory ceiling for virtual machines will also be broken. But for now this means that with regard to CPU & memory you’ve done all you can. That leaves only networking and I/O to deal with. Now the I/O is food for another & very extensive discussion, but basically you have to design that around the needs of the application(s) or you’ll be toast. The network part is what we’ll tackle here.
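As a side note on that planning exercise, a back-of-the-envelope vCPU:pCPU check is easy to keep around; the node count, core count and VM sizing below are made-up planning inputs, not a recommendation.

```python
# Illustrative sketch: sanity-check the vCPU:pCPU ratio you're planning.
nodes = 4
cores_per_node = 12      # physical cores, ignoring hyper-threading
vcpus_per_vm = 4         # the current per-VM maximum discussed above
vm_count = 20

physical_cores = nodes * cores_per_node
assigned_vcpus = vm_count * vcpus_per_vm
print(f"vCPU:pCPU ratio = {assigned_vcpus / physical_cores:.2f}:1")
# For performance-critical SQL Server workloads you generally want to keep
# this ratio much lower than you would accept for, say, a VDI cluster.
```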

Without going into details, what does a Hyper-V cluster need in terms of networking?

Who/What | Function | Traffic | Connection Type
Host Management | Hyper-V host connectivity. | Relatively low bandwidth, but don’t forget about deploying VMs or backups. | Public
VM Network | Provides network connectivity to the VMs. | Very dependent on the VMs using it. | Dedicated Hyper-V
Cluster Heartbeat | Internal cluster communication to determine the status of other cluster nodes. | Not much traffic, but low latency or the cluster might think it’s in trouble due to dropped packets. OK to combine with CSV. | Private Cluster Network
Cluster Shared Volume (CSV) | For updating CSV metadata & scenarios where redirected I/O is required. | Mostly idle. When in redirected I/O it demands high bandwidth & low latency. | Private Cluster Network
Live Migration | Used to transfer running VMs from one cluster node to another. | Mostly idle. When live migrating it demands high bandwidth & low latency. | Private Cluster Network

Host Management: It is fine to leave this on 1Gbps, unless you need to deploy massive amounts of VMs or your backups are consuming all the bandwidth. If so, consider dedicated NICs for those roles and/or 10Gbps. Also note that you might be able to leverage your SAN for virtual machine deployment / backups.

VM Network: Use multiple “single” NICs or NIC teams to spread both the load and the risk. Remember that you can lose the host management or CSV network of a node without affecting your virtual machine connectivity, but not the virtual machine network(s). So don’t put all your eggs in one basket; do consider multiple NICs and NIC teaming. Do remember that there are other bottlenecks than bandwidth for a virtual machine running apps, so don’t go completely overboard, as there is no single magic bullet here for virtual machine performance. 2 or 3 will do perfectly fine. What about backups in the guest? Yes, that’s an extra burden, but there are better solutions than that, and if you hit a bandwidth issue with guest-based backups it’s time to investigate those seriously. As you will see in this series I’m not a miser with NIC ports, but there’s no need to have one for every 2 virtual machines. If you have really high bandwidth needs consider 10Gbps, not a truckload of NIC ports.

Heartbeat: Due to the mostly moderate needs it is often combined with the CSV traffic.

Cluster Shared Volume (CSV): Well, you have the metadata updates for the Cluster Shared Volumes. But that’s not all. You also have redirected access when you’re doing backups, defragmenting your CSV storage or when the storage paths are unavailable. So go for 10Gbps when you can, especially since this is your backup path for Live Migration traffic!

Side Note: Don’t say that Redirected Access over the CSV network will never happen when you have redundant storage paths. We’ve seen it happen in an environment with dual FC HBA cards, dual SAN controllers and the works. Redirected Access saved our service availability during that event! What happened exactly and how it all ties together is a long and complicated story, but in essence an arbitrated loop management module went haywire and caused a loop; the root cause of this was a defective disk. When that event was over, one of the controllers went nuts, decided this wasn’t his cup of tea and called it a day. Guess what? Some servers could not fail over to the other controller, as something went wrong in the internal workings of the SAN itself; dual HBAs didn’t help here. How did our services stay available? Thanks to Redirected Access. It was at 1Gbps speeds so that hurt a little, but we kept ‘m running. Our vendor worked through this with us, but things were pretty bad and it was pucker time. However, this is one example where we kept our services running for 24 hours (whilst working at the issue with the vendor) via Redirected Access. The bad thing was we needed to take the spare controller offline & restart both to get the replacement controller recognized, yes, a complete shutdown of the cluster nodes to restart both SAN controllers. I still remember the mail I sent and the call I made to management that I was shutting down the business for 30 minutes. But it was not because of Hyper-V; quite the opposite, it helped us out a lot!

Also note that when you run software VSS-based backups or disk defragmentation on your CSV storage you’ll be running in Redirected Access mode. See https://blog.workinghardinit.work/2011/06/02/some-feedback-on-how-to-defrag-a-hyper-v-r2-cluster-shared-volume/ Some Feedback On How To Defrag A Hyper-V R2 Cluster Shared Volume.

Live Migration: The bigger and better the pipe the faster Live Migration gets done. With high density or resource (memory) intensive servers this becomes a lot more important. Think of SQL Server, Exchange consuming 16, 24, 32 or more GB of memory. So do consider 10Gbps.

iSCSI: As we are using Fibre Channel in our SAN we did not include iSCSI in the networking needs table above. Now I do want to draw your attention to the need for iSCSI in the virtual machines themselves. This is needed for clustering within the virtual machines. Today this is almost a requirement, as clustering in the guest becomes more and more important. You’ll need at least two NIC ports in production for this, if possible on two separate cards for ultimate redundancy. Now as a best practice we won’t share the iSCSI NICs between the hosts and the guests. I do this in the lab but won’t have it in production. So that could mean at least two more NIC ports. With 10Gbps you’ll have ample performance, but depending on your I/O needs you might want 4 if you’re using 1Gbps, so those NIC numbers are rising fast.

What | Function | Traffic | Connection Type
iSCSI guest | Virtual machine shared storage. | High bandwidth need; low latency is required to get good I/O. | Dedicated to Hyper-V
iSCSI host | Host shared storage. | High bandwidth need; low latency is required to get good I/O. | Excluded from the cluster, dedicated to the host.

What to move to 10Gbps?

Cool, you think, let’s throw some 10Gbps NICs & switches into our network. After that, depending on the rest of your network equipment & components, your virtual machines might be able to talk to other virtual and physical servers on the network at speeds up to 10Gbps, or at least 1Gbps. I kind of hope that none of you are running 100Mbps in your server racks today. And last but not least, with your 10Gbps network you’ll be able to get the best performance for your CSV and Live Migration traffic. Life is good!

Until your network engineer hears about your plans. All of a sudden it’s not so cool anymore. You certainly woke the network people up! They’re nervous now that they have seen all the double (redundancy) lines you’ve drawn on your copy of the schema representing the rack / server room network. They start mumbling things about redundancy, loops, RSTP, MSTP, LAG, stacking and a boatload of other acronyms that sound like you’ve heard ‘m before but can’t quite place. They also talk about doom and gloom scenarios that might very well bring down the network. So unless you are the network admin you should dust off your communication skills and get them on board. For your sake I hope they’re not the kind of engineers who state that most network problems can be solved by removing the servers and applications that ruin the nirvana of their network design. If so, they’ll be very wary of that “virtual switch” you’re talking about as well.

The Easy Way Out – A Dedicated CSV & Live Migration Network

Let’s say that you need a lot more time to get a fully integrated solution for the 10Gbps network architecture figured out and set up. But your manager states you need to improve the Live Migration and other cluster network speeds today. What are your options? Based on the above information your boss is right: the networks that will benefit the most from a move to 10Gbps are CSV and Live Migration (and Heartbeat, which piggybacks along with CSV). Now you have to remember that those cluster networks (subnets/VLANs) are for the Heartbeat, CSV and Live Migration cluster traffic only. So basically the only requirement you have is that these run on separate subnets/VLANs (to present them as distinct networks to your failover cluster) and that every node of the cluster can communicate over those subnets/VLANs. This means that you can leave the switches for those networks completely isolated from the rest of the network, as shown in the picture below. I used some very common and often used DELL PowerConnect switches (5424, 6248, 8024F) in some scenario drawings for this blog series. DELL could make that 8024F an unbeatable price/quality deal if they would make them stackable. The sweet thing about stackable switches is that you can do active-active NIC teaming across switches rather than active-passive. I never went that way, as I’m waiting to see what virtual switch innovations Hyper-V 3.0 will bring us. You see, I’m a little cheap after all.
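As a quick sanity check that every node really can talk to every other node over those isolated subnets, something as simple as the sketch below does the job; the node names and 10.10.x.x addresses are made-up placeholders for whatever your CSV and Live Migration subnets actually use.

```python
# Hedged sketch: ping every node's CSV and Live Migration address to confirm
# the isolated cluster networks are reachable from this node (Windows ping).
import subprocess

cluster_networks = {
    "CSV":           {"node1": "10.10.1.1", "node2": "10.10.1.2", "node3": "10.10.1.3"},
    "LiveMigration": {"node1": "10.10.2.1", "node2": "10.10.2.2", "node3": "10.10.2.3"},
}

for network, nodes in cluster_networks.items():
    for node, address in nodes.items():
        ok = subprocess.run(["ping", "-n", "1", address],
                            capture_output=True).returncode == 0
        print(f"{network}: {node} ({address}) {'reachable' if ok else 'NOT reachable'}")
```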

But naturally, feel free to think about these scenarios with your preferred ProCurve, Cisco, Juniper, NetGear … switches in mind.

Suddenly things are cool again. The network people get time to figure out an integrated & complete long-term solution and you can provide your nodes with 10Gbps for cluster-only traffic. Buy a couple of 10Gbps switches & NICs and you’re on your way. Is this a good idea? I can’t make that call for you. I just provide some ideas. You decide.

The Case For Physically Isolating Them

Now you might wonder if this isn’t very wasteful in resources. Well, not necessarily. If your cluster is big enough, let’s say 12-16 nodes, or if you have a couple of clusters (4 clusters with 5 nodes for example), this might not be overly expensive. Unless you’re on a converged network, you do (I hope) the same for your storage networks, isolate them that is. You have to when you’re using fibre and you’d better do it when using iSCSI. It provides for the best performance and less complex switch configurations. Remember I mentioned that high availability requires some complexity. Try to keep that complexity as low as possible, and when you introduce complexity make sure you can manage it. This serves two purposes. One is making sure that the complexity doesn’t ruin your high availability, and two is that you’ll be happy you did it when it comes to troubleshooting and fixing issues. Now you might say that this ruins the concept of converged networks. Academically this is true, but when you are filling up ports on switches for a single purpose there is no room for anything else anyway. Don’t lose sight of the aim of a converged network. That is to have the ability to use the same hardware/technology when possible for multiple needs. This gives you options and capabilities where and when needed. It’s not about always using all technology and protocols on each and every switch. Don’t forget also that you’ll need to address QoS/performance on a converged network per type of traffic. There is also the fact that in brownfield scenarios you’re dealing with replacing a part of the infrastructure, and this example is a good way to get 10Gbps where needed without making any change to the existing network infrastructure. This reduces risk and impact. As a matter of fact, if you plan this right you can do this without service interruption. That means going node by node (maintenance mode, evacuate all VMs), moving the CSV network first for example, and only then the Live Migration network. You’re leveraging the ability of the cluster networks to take on each other’s role here to achieve this.

Another good reason to physically isolate the networks is security. There was an exploit for manipulating VMs during live migrations in 2008 (http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf). You can protect against this via very careful switch configuration and VLAN design. But isolating the switches is very easy, clean, and effective as well. Overkill? I don’t know, but perhaps not if you do work for intelligence agencies.

Ethernet Out-of-Band (OOB) Port For Management

Don’t forget you still need to be able to manage those switches but today, in this class of equipment you get an Ethernet Out-of-Band (OOB) port for that. This one you can safely uplink to your regular management network. So if you really don’t need communication with the rest of the network you have no functional reason not to isolate them.

Money, Cost? No Value!

Still, you think, isn’t this very expensive? Well, look at the purpose. Manageable complexity, high availability, and your management stated to eliminate, where possible, any limitation on performance and approved the budget for it all. Put this into perspective. The SQL Server Datacenter editions running on these clusters, combined with the cost of development & maintenance of the databases and applications relying on this infrastructure, put that extra money spent on a couple of switches really into perspective. On top of that, you’re not wasting those switches. When the network people get their plans finished they’ll be integrated into the final solution if still needed and possible. Don’t forget that you might use all ports for just cluster traffic depending on the number of hosts you have! So even without integrating them into the rest of the network, you’re still getting very solid results. On top of that, sometimes you get to build solutions where budget is not the first, last, and only concern. Sweet! I do know some people who’ll call me a money-wasting nut case 🙂. But get real: when you’re building highly available, highly performing failover clusters and you’re in a discussion about the cost of a couple of NIC ports and you are going to adjust your design over that, perhaps you have a sponsorship issue. Put this into perspective. Building a Hyper-V cluster is not a competition where the one who uses the least NIC ports/cards and switch ports/switches wins. That’s why it hurts when I see designs like this claiming victory:

What I want to see is more like this:

But that will never fit into a blade design! Really? Have you seen blades like the DELL M910? It’s a beast, comparable to the R810. It was the first blade I really felt like buying. Cisco also entered that market with guns drawn and is pushing HP to keep performing. So again, put the NIC/switch and NIC port/switch port count into perspective against what you’re trying to achieve. To quote Anton Ego: “… you know what I’m craving? A little perspective, that’s it. I’d like some fresh, clear, well-seasoned perspective.”

Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment

This is a 1st post in a series of 4. Here’s a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into  Your Network Infrastructure (Part 4/4)

A lot of early and current Hyper-V clusters are built on 1Gbps network architectures. That’s fine and works very well for a large number of environments. Perhaps at this moment in time you’re running solutions using blades with 10Gbps mezzanine cards/switches, with or without cutting up the bandwidth available for all the different network needs a Hyper-V cluster has, or should have, for optimal performance and availability. This depends on the vendor and the type of blades you’re using. It also matters when you bought the hardware (W2K8 or W2K8R2 era) and whether you built the solution yourself or bought a fast track or reference architecture kit, perhaps even including all Microsoft software and complete with installation services.

I’ve been looking into some approaches to introducing a 10Gbps network for use with Hyper-V clusters, mainly for Cluster Shared Volume (CSV) and Live Migration (LM) traffic. In brownfield environments that are already running Hyper-V clusters there are several scenarios to achieve this, but I’m not offering the “definitive guide” on how to do this. This is not a best practices story. There is no one size fits all. Depending on your capabilities, needs & budget you’ll approach things differently, reflecting what’s best for your environment. There are some “don’t do this in production whatever your environment is” warnings that you should take note of, but apart from that you’re free to choose what suits you best.

The 10Gbps implementations I’m dealing with are driven by one very strong operational requirement: reduce the live migration time for virtual machines with a lot of memory running under a decent to heavy load. So here it is all about bandwidth and speed. The train of thought we’re trying to follow is that we do not want to introduce 10Gbps just to share its bandwidth between 4 or more VLANs, as you might see in some high-density blade solutions. There it often has to do with the limited number of NIC/switch ports in some environments where they also want high availability. In high-density scenarios the need to reduce cabling is also more urgent. All this is also often driven by a desire to cut costs or keep them down as much as possible. But as technology evolves fast, my guess is that within a few years we won’t be discussing the cost of 10Gbps switches anymore, and even today there are very good deals to be made. The reduction of cabling saves on labor & helps achieve high density in the racks. I do need to stress however that way too often discussions around density, cooling and power consumption in existing data centers or server rooms are not as simple as they appear. I would state that to achieve real and optimal results from an investment in blades you have to have the server room, cooling, power and UPS designed around them. I won’t even go into the discussion over when blade servers become a cost-effective solution for SMB needs.

So back to 10Gbps networking. You should realize that Live Migration and Redirected Access with CSV absolutely benefit from getting a 10Gbps pipe just for their needs. For VMs consuming 16 GB to 32 GB of memory this is significant. Think about it. Bringing 16 seconds back to 4 seconds might not be too big of a deal for a node with 10 to 15 VMs. But when you have a dozen SQL Servers that take 180 to 300 seconds to live migrate and you reduce that to 20 to 30 seconds, that helps. Perhaps not so much during automated maintenance, but when it needs to be done fast (e.g. on a node indicating serious hardware issues) those times add up. To achieve such results we gave the Live Migration & CSV networks each a dedicated 10Gbps network. They consume about 50% of the available bandwidth, so even a failover of the CSV traffic to our Live Migration network or vice versa should be easily handled. On top of the “big pipes” you can test jumbo frames, VMQ, …
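To put some rough numbers on that, here's a back-of-the-envelope sketch of the memory-copy phase: transfer time is roughly VM memory divided by usable bandwidth. The 80% efficiency figure is my own assumption, and real migrations retransmit dirty pages, so treat the output as a lower bound rather than a prediction.

```python
# Rough estimate of the live migration memory-copy time: memory / usable bandwidth.
def rough_copy_time(memory_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    usable_gbps = link_gbps * efficiency   # assumed protocol/copy overhead
    return memory_gb * 8 / usable_gbps     # GB -> gigabits -> seconds

for memory_gb in (16, 32):
    for link_gbps in (1, 10):
        print(f"{memory_gb} GB VM over {link_gbps} Gbps: "
              f"~{rough_copy_time(memory_gb, link_gbps):.0f} s")
```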

Now the biggest part of that Live Migration time is in the “Brown-Out” phase (event ID 22508 in the Hyper-V-Worker log), during which the memory transfer happens. Those are the times we reduce significantly by moving to 10Gbps. The “Black-Out” phase, during which the virtual machine is brought online on the other node, creates a snapshot with the last remaining delta of “dirty memory pages”, followed by quiescing the virtual machine for the last memory copy to be performed, and finally by the unquiescing of the virtual machine, which is then running on the other node. This is normally measured in hundreds of milliseconds (event ID 22509 in the Hyper-V-Worker log). We do have a couple of very network-intensive applications that sometimes have a GUI issue after a live migration (the services are fine but the consoles monitoring those services act up). We plan on moving those VMs to 10Gbps to find out if this reduces the “Black-Out” phase a bit and prevents that GUI from acting up. When I can give you more feedback on this, I’ll let you know how that worked out.

An Example of these events in the Hyper-V-Worker event log is listed below:

Event ID 22508:

‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ migrated with a Brown-Out time of 64.975 seconds.

Event ID 22509:

‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ migrated with a Black-Out time of 0.811 seconds, 842 dirty page and 4841 KB of saved state.

Event ID 22507:

Migration completed successfully for ‘XXXXXXXX-YYYY-ZZZZ-QQQQ-DC12222DE1’ in 66.581 seconds.
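If you want to pull those Brown-Out/Black-Out numbers out of the log in bulk to compare before and after the move to 10Gbps, a small wrapper around the built-in wevtutil tool like the sketch below works. Note that the channel name is my assumption, so verify it on your host (wevtutil el | findstr Hyper-V-Worker) and adjust if needed.

```python
# Hedged sketch: query the last 20 live migration events (IDs 22507/22508/22509)
# from the Hyper-V-Worker log with the built-in wevtutil tool.
import subprocess

CHANNEL = "Microsoft-Windows-Hyper-V-Worker-Admin"   # assumption, verify locally
QUERY = "*[System[(EventID=22507 or EventID=22508 or EventID=22509)]]"

result = subprocess.run(
    ["wevtutil", "qe", CHANNEL, "/q:" + QUERY, "/f:text", "/c:20", "/rd:true"],
    capture_output=True, text=True, check=False,
)
print(result.stdout or result.stderr)
```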

In these 10Gbps efforts I’m also about high availability, but not when that would mean sacrificing performance because I need to keep costs down and perhaps use approaches that are only really economical in large environments. The scenarios I’m dealing with are not about large hosting environments or cloud providers. We’re talking about providing the best network performance to some Hyper-V clusters that will be running SQL Server, for example, or other high-resource applications. These are relatively small environments compared to hosting and cloud providers. The economics and the needs are very different. As a small example of this: saving ten thousand switch ports means saving something like 500 times the price of a switch. To them that matters a lot more, not just in volume but also in relation to the other costs. They’re probably running services with an architecture that survives losing servers and doesn’t require clustering. It all runs on cheap hardware with high energy efficiency, as they don’t care about losing nodes when the service has been designed with that in mind. Economies of scale is what they are all about. They’d go broke building all that on highly redundant hardware and fail at achieving their needs. But most of us don’t work in such an environment.

I would also like to remind you that high availability introduces complexity. And complexity that you can’t manage will sink your high availability faster than a torpedo amidships downs a cruiser. So know what you do, why, and when to do it. One final piece of advice: TEST!

So to conclude this part, take note of the fact that I’m not discussing the design of a “fast track” setup that I’ll resell for all kinds of environments, where I’d need a very cost-effective rinse & repeat solution that comes in Small, Medium & Large varieties with all bases covered. I’m not saying those aren’t good or valuable, far from it, a lot of people will benefit from them, but I’m serving other needs. If you wonder why they want to virtualize the applications at all: it has to do with disaster recovery & business continuity and replicating the environment to a remote site.

I intend to follow up on this in future blog posts when I have more information and some time to write it all up.

Hotfixes For Hyper-V & Failover Clustering Can Be Confusing KB2496089 & KB2521348

As I’m building or extending a number of Hyper-V clusters in the next 4 months, I’m gathering/updating my list of Windows 2008 R2 SP1 hotfixes relating to Hyper-V and Failover Clustering. Microsoft once published KB2545685: Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters, but that list is not kept up to date; the two hotfixes mentioned there are in the list below. I also intend to update my list for Windows Server 2008 SP2 and Windows 2008 R2 RTM, as I will run into those as well and it’s nice to have a quick reference list.

I’ll include my current list below. Some of these fixes are purely related to Hyper-V, some to a combination of Hyper-V and clusters, some only to clustering and some to Windows in general. But they are all ones that will bite you when running Hyper-V (in a failover cluster or stand-alone). Now for the fun part with some of these hotfixes, which I’ll address in this blog post: confusion! Take a look at the two hotfixes marked with * and ** in the list (KB2496089 and KB2521348) and the discussion below. Are there any others like this I don’t know about?

* KB2496089 is included in SP1 according to “Updates in Win7 and WS08R2 SP1.xls”, which can be downloaded here (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=269), but the Dutch language KB article states it applies to W2K8R2 SP1: http://support.microsoft.com/kb/2496089/nl

Article ID: 2498472 – Last Review: Tuesday, February 10, 2011 – Revision: 1.0

Prerequisites

This hotfix requires one of the following operating systems:

  • Windows Server 2008 R2
  • Service Pack 1 (SP1) for Windows Server 2008 R2

For all supported x64-based versions of Windows Server 2008 R2

File name | File version | File size | Date | Time | Platform
Vmms.exe | 6.1.7600.20881 | 4,507,648 | 15-Jan-2011 | 04:10 | x64
Vmms.exe | 6.1.7601.21642 | 4,626,944 | 15-Jan-2011 | 04:05 | x64

When you try to install the hotfix on SP1, it will install. So is it really in there? Compare file versions! Well, after installing the hotfix on a W2K8R2 SP1 Hyper-V server the version of vmms.exe was 6.1.7601.21642, and on a Hyper-V server with just SP1 it was 6.1.7601.17514. By the way, these are English versions of the OS, no language packs installed.

With hotfix installed on SP1

Without hotfix installed on SP1
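If you want to check a node yourself, something like the sketch below reads the vmms.exe file version so you can compare it against the versions discussed here; it shells out to PowerShell's Get-Item, which is simply a convenience choice on my part.

```python
# Hedged sketch: read the file version of vmms.exe on this host.
import subprocess

cmd = r"(Get-Item $env:windir\System32\vmms.exe).VersionInfo.FileVersion"
version = subprocess.run(
    ["powershell", "-NoProfile", "-Command", cmd],
    capture_output=True, text=True, check=True,
).stdout.strip()
print("vmms.exe version:", version)
```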

To make matters even more confusing: the Dutch KB article states it applies to both W2K8R2 RTM and W2K8R2 SP1, but the English version of the article has been modified and now only mentions W2K8R2 RTM.

http://support.microsoft.com/kb/2496089/en-us

Article ID: 2496089 – Last Review: February 23, 2011 – Revision: 2.0

For all supported x64-based versions of Windows Server 2008 R2

File name | File version | File size | Date | Time | Platform
Vmms.exe | 6.1.7600.20881 | 4,507,648 | 15-Jan-2011 | 04:10 | x64

So what gives? Has SP1 for W2K8R2 been updated with the fix included, and did the SP1 version I installed in the lab (the official one right after it went RTM) not yet include it? Do the service packs differ by language, i.e. only the English one got updated? Sigh :-/ Now for the good news: ** it’s all very academic because of KB2521348, “A virtual machine online backup fails in Windows Server 2008 R2 when the SAN policy is set to “Offline All””, which brings the vmms.exe version to 6.1.7601.21686, and this hotfix supersedes KB2496089. See http://blogs.technet.com/b/yongrhee/archive/2011/05/22/list-of-hyper-v-windows-server-2008-r2-sp1-hotfixes.aspx where this is explicitly mentioned.

Ramazan Can mentions hotfix KB2496089 and whether it is included in SP1 in the comments on his blog post http://ramazancan.wordpress.com/2011/06/14/post-sp1-hotfixes-for-windows-2008-r2-sp1-with-failover-clustering-and-hyper-v/, but I’m not very convinced it is indeed included. The machines I tested on are running W2K8R2 English RTM updated to SP1, not installations from media including SP1, so perhaps there could also be a difference there. It should not matter whether you install SP1 before adding the Hyper-V role, so that can’t be the cause.

Anyway, keep your systems up to date and running smoothly, but treat your Hyper-V clusters with all due care and attention. A quick way to check which of the KB articles below are already installed on a node is sketched right after the list.

  1. KB2277904: You cannot access an MPIO-controlled storage device in Windows Server 2008 R2 (SP1) after you send the “IOCTL_MPIO_PASS_THROUGH_PATH_DIRECT” control code that has an invalid MPIO path ID
  2. KB2519736: Stop error message in Windows Server 2008 R2 SP1 or in Windows 7 SP1: “STOP: 0x0000007F”
  3. KB2496089: The Hyper-V Virtual Machine Management service stops responding intermittently when the service is stopped in Windows Server 2008 R2
  4. KB2485986: An update is available for Hyper-V Best Practices Analyzer for Windows Server 2008 R2 (SP1)
  5. KB2494162: The Cluster service stops unexpectedly on a Windows Server 2008 R2 (SP1) failover cluster node when you perform multiple backup operations in parallel on a cluster shared volume
  6. KB2496089: The Hyper-V Virtual Machine Management service stops responding intermittently when the service is stopped in Windows Server 2008 R2 (SP1)*
  7. KB2521348: A virtual machine online backup fails in Windows Server 2008 R2 (SP1) when the SAN policy is set to “Offline All”**
  8. KB2531907: Validate SCSI Device Vital Product Data (VPD) test fails after you install Windows Server 2008 R2 SP1
  9. KB2462576: The NFS share cannot be brought online in Windows Server 2008 R2 when you try to create the NFS share as a cluster resource on a third-party storage disk
  10. KB2501763: Read-only pass-through disk after you add the disk to a highly available VM in a Windows Server 2008 R2 SP1 failover cluster
  11. KB2520235: “0x0000009E” Stop error when you add an extra storage disk to a failover cluster in Windows Server 2008 R2 (SP1)
  12. KB2460971: MPIO failover fails on a computer that is running Windows Server 2008 R2 (SP1)
  13. KB2511962: “0x000000D1” Stop error occurs in the Mpio.sys driver in Windows Server 2008 R2 (SP1)
  14. KB2494036: A hotfix is available to let you configure a cluster node that does not have quorum votes in Windows Server 2008 and in Windows Server 2008 R2 (SP1)
  15. KB2519946: Timeout Detection and Recovery (TDR) randomly occurs in a virtual machine that uses the RemoteFX feature in Windows Server 2008 R2 (SP1)
  16. KB2512715: Validate Operating System Installation Option test may identify Windows Server 2008 R2 Server Core installation type incorrectly in Windows Server 2008 R2 (SP1)
  17. KB2523676: GPU is not accessed leads to some VMs that use the RemoteFX feature to not start in Windows Server 2008 R2 SP1
  18. KB2533362: Hyper-V settings hang after installing RemoteFX on Windows 2008 R2 SP1
  19. KB2529956: Windows Server 2008 R2 (SP1) installation may hang if more than 64 logical processors are active
  20. KB2545227: Event ID 10 is logged in the Application log after you install Service Pack 1 for Windows 7 or Windows Server 2008 R2
  21. KB2517329: Performance decreases in Windows Server 2008 R2 (SP1) when the Hyper-V role is installed on a computer that uses Intel Westmere or Sandy Bridge processors
  22. KB2532917: Hyper-V Virtual Machines Exhibit Slow Startup and Shutdown
  23. KB2494016: Stop error 0x0000007a occurs on a virtual machine that is running on a Windows Server 2008 R2-based failover cluster with a cluster shared volume, and the state of the CSV is switched to redirected access
  24. KB2263829: The network connection of a running Hyper-V virtual machine may be lost under heavy outgoing network traffic on a computer that is running Windows Server 2008 R2 SP1
  25. KB2406705: Some I/O requests to a storage device fail on a fault-tolerant system that is running Windows Server 2008 or Windows Server 2008 R2 (SP1) when you perform a surprise removal of one path to the storage device
  26. KB2522766: The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2
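As promised above, here’s a minimal sketch for checking which of the KB articles in the list are already present on a node. It uses the standard Win32_QuickFixEngineering WMI class through the third-party Python "wmi" package (my choice of tooling, not a requirement); keep in mind that fixes rolled into a service pack won’t show up there as separate entries.

```python
# Hedged sketch: report which of the hotfixes listed above are installed,
# based on Win32_QuickFixEngineering. Requires "pip install wmi".
import wmi

KB_LIST = [
    "KB2277904", "KB2519736", "KB2496089", "KB2485986", "KB2494162",
    "KB2521348", "KB2531907", "KB2462576", "KB2501763", "KB2520235",
    "KB2460971", "KB2511962", "KB2494036", "KB2519946", "KB2512715",
    "KB2523676", "KB2533362", "KB2529956", "KB2545227", "KB2517329",
    "KB2532917", "KB2494016", "KB2263829", "KB2406705", "KB2522766",
]

installed = {fix.HotFixID for fix in wmi.WMI().Win32_QuickFixEngineering()}

for kb in KB_LIST:
    print(f"{kb}: {'installed' if kb in installed else 'not found'}")
```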