Virtualization with Hyper-V & The NUMA Tax Is Not Just About Dynamic Memory

First of all to be able to join in this little discussion you need to know what NUMA is and does. You can read up on that on the Intel (or AMD) web site like http://software.intel.com/en-us/blogs/2009/03/11/learning-experience-of-numa-and-intels-next-generation-xeon-processor-i/ and http://software.intel.com/en-us/articles/optimizing-software-applications-for-numa/. Do have a look at the following SQL Skills Blog http://www.sqlskills.com/blogs/jonathan/post/Understanding-Non-Uniform-Memory-AccessArchitectures-(NUMA).aspx which has some great pictures to help visualize the concepts.

What Is It And Why Do We Care?

We all know that a CPU contains multiple cores today. 2,4,6,8,12,16 etc. cores. So in terms of a physical CPU we tend to talk about a processor that fits in a socket and about cores for logical CPUs. When hyper threading is enabled you double the logical processors seen and used. It is said that Hyper-V can handle hyper threading so you can leave it on. The logic being that it will never hurt performance and can help to improve it. I suggest you test it Smile as there was a performance bug with it once.  A processor today contains it own memory controller and access to memory from that processor is very fast. The NUMA node concept is older than the multi core processor technology but today you can state that a NUMA node translates to one processor/socket and all cores contained in that processor belong to the same NUMA node. Sometimes a processors contains two NUMA node like the AMD 12 core processors. In the future, with the ever increasing number of cores, we’ll perhaps see even more NUMA nodes per processor. You can state that all Intel processors since Nehalem with Quick Path Interconnect and AMD processors with Hyper-Transport are NUMA processors. But To be sure, check with your vendors before buying. Assumptions right?

Beyond NUMA nodes there is also a thing called processor groups which help Windows to use more than 64 logical processors (its former limit) by grouping logical processors into groups of which Windows handle 4 meaning in total Windows today can support 4*64=256 logical processors. Due to the fact that memory access within a NUMA node is a lot faster than between NUMA nodes you can see where a potential performance hit is waiting to happen. I tried to create a picture of this concept below. Now you know why I don’t make my living as a graphical artist Eye rolling smile

imageimage

 

To make it very clear NUMA is great and helps us in a lot of ways. But under certain conditions and with certain applications it can cause us to take a (serious) performance hit. And if there is anything certain to ruin a system administrators day than it is a brand new server with a bunch of CPUs and loads of RAM that isn’t running any better (or worse?) than the one you’re replacing. Current hyper visors like Hyper-V are NUMA aware and the better servers like SQL Server are as well. That means that under the hood they are doing their best to optimize the CPU & memory usage for performance. They do an very good job actually and you might, depending on your environment never, ever know of any issue or even the existence of NUMA.

But even with a NUMA knowledgeable hyper visor and NUMA aware applications you run the risk of having to go to remote memory. The introduction of Dynamic Memory in Windows 2008 R2 SP1 evens increases this likelihood as there is a lot of memory reassigning going on. Dynamic Memory actually educated a lot of Hyper-V people on what NUMA is and what to look out for. Until Dynamic Memory came on the scene, and the evangelizing that came with it by Microsoft, it was "only" the people virtualizing  SQL Server or Exchange & other big hungry application that were very aware of NUMA with its benefits and potential draw backs. If  you’re lucky the application is NUMA aware, but not all of them are, even the big names.

A Peak Into The Future

As it bears on this discussion, what is interesting that leaked screenshots from Hyper-V 3.0 or vNext  … have NUMA configuration options for both memory and CPU at the virtual machine level! See Numa Settings in Hyper-V 3.0 for a picture. So the times that you had to script WMI calls (see http://blogs.msdn.com/b/tvoellm/archive/2008/09/28/looking-for-that-last-once-of-performance_3f00_-then-try-affinitizing-your-vm-to-a-numa-node-.aspx) to assign a VM to a NUMA node might be over soon (speculation alert) and it seems like a natural progression from the ability to disable NUMA with W2K8R2SP1 Hyper-V in case you need it to avoid NUMA issues at the Hyper-V host level. Hyper-V today is already pretty NUMA aware and as such it will try to get all memory for a virtual machine from a single NUMA node and only when that can’t be done will it span across NUMA nodes. So as stated, Hyper-V with Windows Server 2008 R2 SP1 can prevent this form happening as we can disable NUMA for a Hyper-V host now. The downside is that you can’t get more memory even if it’s available on the host.

NumaSpanning

A working approach to reduce possible NUMA overhead is to limit the number of CPUs to 2 as this gives the largest amount of memory to the CPUs, in this case 50%. 4 CPUs only control 25%, etc.So with more CPU (and NUMA nodes) the risk of NUMA spanning is getting bigger very fast. For memory intensive applications scaling out is the way to go. Actually you could state that we do scale up the NUMA nodes per socket (lots of cores with the most amount of direct accessible memory possible) and as such do not scale up the server. If you can keep your virtual machines tied to a single CPU on a dual socket server to try and prevent any indirect memory access and thus a performance hit. But that won’t always work. If you ever wondered when an 8/12/16 core CPU comes in handy, well voila … here a perfect case: packing as many cores on a CPU becomes very handy when you want to limit sockets to prevent NUMA issues but still need plenty of CPU cycles. This should work as long as you can address large amounts of RAM per socket at fast speeds and the CPU internally isn’t cut up into to many multiple NUMA nodes, which would be scaling out NUMA node in the same CPU and we don’t want that or we’re back to a performance penalty.

Stacking The Deck

One way of stacking the deck in your favor is to keep the heavy apps on their own Hyper-V cluster. Then you can tweak it all you want to optimize for SQL Server, Exchange, … etc. When you throw these virtual machines in your regular clusters or for crying out loud on a VDI cluster your going to wreak havoc on the performance. Just like mixing server virtualization & VDI is a bad idea (don’t do it), throwing vCPU hungry, memory hogging servers on those cluster is just killing of performance and capacity of a perfectly good cluster. I have gotten into arguments over this as some thing one giant cluster for whatever need is better. Well no, you’ll end up micro managing placement of VM with very different needs on that cluster effectively “cutting” it up in smaller “cluster parts”. Now is separate clusters for different needs always the better approach? No, it depends, If you only have some small SQL Server needs you  get away with one nice cluster. It depends, I know, the eternal consultants answer, but I have to say it. I don’t want to get angry mails from managers because someone set up a 6 node clusters for a couple of SQL Server Express databases Winking smile There are also concepts called testing, proof of concept, etc. It’s called evidence based planning. Try it, it has some benefits that become very apparent  when you’re going to virtualize beefy SQL Server, SharePoint and Exchange servers.

How do you even know it is happening apart from empirical testing. Aha, excellent question! Take a look at the "Hyper-V VM Vid Numa Node" counter set and read this blog entry by on this subject http://blogs.msdn.com/b/tvoellm/archive/2008/09/29/hyper-v-performance-counters-part-five-of-many-hyper-vm-vm-vid-numa-node.aspx. And keep an eye on the event log for http://technet.microsoft.com/hi-in/library/dd582929(en-us,WS.10).aspx (for some reason there is no comparable entry for W2K8R2 on TechNet)

Conclusions

To conclude, all of the above people is why I’m interested in the some of the latest generation of servers. The architecture of the hardware allows for a the processor to address twice the "normal" amount of memory when you only put dual CPUs on a quad socket motherboard. The Dell PowerEdge R810 and the M910 have this and it’s called a FlexMemory Bridge and that allows more memory to be available without a performance hit. They also allow for more memory per socket at higher speeds. If you put a lot of memory directly addressable to one CPU you see a speed drop. A DELL R710 with 48 GB of RAM runs at 1033 MHZ  but put 96 GB in there and you fall back to 800 Mhz. So yes, bring on those new quad socket motherboards with just 2 sockets used, a bunch of fast direct accessible memory in a neat 2 unit server package with lost of space for NIC cards & FC HBAs if needed. Virtualization heaven :-) That’s what I want so I can give my VMs running SQL Server 2008 R2 & "Denali" (when can I call it SQL Server 2012?) a bigger amount of direct accessible memory form their NUMA node. This can be especially helpful if you need to run NUMA unaware applications like SAP or such. Testing is the way to go for knowing how well a NUMA aware hyper visor and a NUMA aware application figure out the best approach to optimize the NUMA experience together.  I’m sure we’ll learn more about this as more and more information becomes available and as technology evolves.  For now we optimize for performance with NUMA where we can, when we can with what we have :-) For Exchange 2010 (we even have virtualization support for DAG mailbox servers now as well) scaling out is easier as we have all the neatly separate roles and control just about everything down to the mail client. With SQL Server applications this is often less clear. There is a varied selection of commercial and home grown applications out there and a lot of them can’t even scale out, only up. So your mileage of what you can achieve may vary. But for resource & memory heavy applications under your control, for now, scaling out is the way to go.

A Brighter Future For Public Folders?

The Exchange Team posted a blog entry asking for feedback on how we use public folders. Nice to see they are taking an interest again. The past 4 years the mantra was “move away from them”, “do it now while you still have the time”, etc. SharePoint was always put forwards as number one replacement option. For some scenarios this is indeed a good choice but let’s face it, for some public folder uses there is no decent replacement and that hurts us as they haven’t seen any decent improvements in the last 2 Exchange releases. I know public folders have always been a bit problematic and finicky for us administrators. They tend to need a bit of voodoo and patience to trouble shoot and get running smoothly (see  blog post by me for an example of this). But instead of using that of an excuse to get rid of them they could also choose to invest in making them as reliable and robust as mail databases. Giving them the same high availability features might also be a welcome improvement, especially now with DAGs in Exchange 2010.

Especially in the Exchange 2007 era Microsoft was promoting getting rid of them actively. But they are still around because so many people use them and they have not decent alternative for all scenarios. In that respect they do listen to their customers. But we want improvements. Some of the functionality we need is there but we really need more robust, reliable and high available public folders. As as shared mail instrument for both sending and receiving mail in a team public folders beat shared mailboxes and SharePoint any time.  It also shines for maintaining a shared repository of contacts. I’m not a proponent of using public folder for a document repository but I understand that its relative simple usage and data protection via replicas still sounds attractive to some versus the complexity of SharePoint. Sure SharePoint has more to offer but perhaps they don’t need those capabilities and to make matters even less attractive; it’s quite an effort to migrate from public folders to SharePoint.

So that left us public folders users feeling a bit abandoned with a message of get out but no easy path to go anywhere else that serves all our needs. So until today all my customers are still and want to  keep using public folders. They are a worried however that one day they will be left out in the cold. But perhaps there is a better future on the horizon for public folders.  They are asking us to “Help us learn more about how you use public folders today!” in that blog post. The emphasis is on “usage scenarios, folder management habits or thought process around public folder data organization”. So if you need and use public folders in any way and you’d like for them to get more attention and evolve into more robust and functional instruments give Microsoft your feedback. Exchange 2010 has brought us great features & very affordable high availability together with support for virtualization. Now we either need a better alternative to public folders than the ones we got now or (my preference) we need better public folders. Since consumption of public folders occurs mostly in Outlook I would suggest the latter. And while we’re asking, bring back access to folder shares in OWA Winking smile.

Exchange 2010 SP1 Rollup 3 Pulled – BlackBerrys sending duplicate messages

Just a quick notification. Due to the duplicate message issue with RIM Blackberry devices and Exchange 2010 Sp1 Rollup 3 Microsoft is temporarily pulling RU3. If you don’t use BES and have no other issues, don’t sweat it. If you wanted RU for UDP support with Outlook 2003 or to fix the DAG Copies GUI bug you’ll have to wait especially if you have Blackberry devices. More the the Exchange Team Blog here.

Exchange 2010 SP1 Rollup 3 Released: Fixes Bug since SP1 in EMC & Brings Back UDP Support

UPDATE March 9th 2011: I have installed Exchange 2010 SP1 Rollup 3 at one site and this did indeed fix this issue finally.

The Microsoft Exchange Team Blog just announced the release here Released: Update Rollup 3 for Exchange 2010 SP1 and Exchange 2007 SP3. This is good news for all the folks out there that got bitten by the Exchange 2010 SP1 bug that causes the Exchange Management Shell (EMC) not to show all database copies after upgrading to exchange 2010 SP1. I’ve blogged about this in EMC Does Not Show All Database Copies After Upgrade To Exchange 2010 SP1 and chimed in to the discussion at Database copies are not all showing up in EMC after SP1 upgrade on the Exchange forums. So apart from cheers for the UDP notifications returning in support of Outlook 2003 let’s hear it for a the EMC case sensitivity bug getting fixed Smile

After while Microsoft also blogged about this Database copies fail to display after upgrading to Exchange 2010 Service Pack 1

We got notified around October 13th that they would included the fix in Exchange 2010 SP1 Roll Up 3 but that they where working on an interim update. They dropped the ball there because communication died about the latter and we were left to conclude we would have to wait for Rollup 3. Well that took it’s time. It’s now march 2011. One of the reasons I think it took so long for Rollup 3 to arrive is the decision for to re-add UDP support for Exchange 2010 for use with Outlook 2003 as blogged about in Microsoft Listens To Customers & Adds UDP Notification Support Back to Exchange 2010

In the ends we will have silly and long unaddressed bug fixed and a welcome aid in migrating customers to Exchange 2010 that are running Outlook 2003. I do wonder however if the bug had been with  PowerShell in the EMS and not in the EMC if Microsoft would have fixed this sooner.  Sure it wasn’t an issue as you could manage everything perfectly using PowerShell and it was only a GUI bug but for some users/customers this is not as obvious  and it made ‘m feel a bit like 2nd class citizens so we had to do some extra “damage” control on that front as well.