KB Article 2522766 & KB Article 2135160 Published Today

At this moment I don't have any more Hyper-V clusters to support that are below Windows Server 2008 R2 SP1. That's good, as I only have one list of patches to keep up to date for my own use. For those of you still taking care of Windows Server 2008 R2 RTM Hyper-V clusters, you might want to take a look at KB article 2135160 FIX: "0x0000009E" Stop error when you host Hyper-V virtual machines in a Windows Server 2008 R2-based failover cluster, which was released today. The underlying cause, however, is (yet again) a C-State issue, one that had already been fixed in relation to another problem published as KB article 983460 Startup takes a long time on a Windows 7 or Windows Server 2008 R2-based computer that has an Intel Nehalem-EX CPU installed.

And for both Windows Server 2008 R2 RTM and SP1, you might take a look at an MPIO issue that was also published today (you're running Hyper-V on a cluster and I bet you're using MPIO for redundant storage access): KB article 2522766 The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2.
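
If you want a quick way to see whether a cluster node already has these fixes, something like the little sketch below will do. To be clear, this is just my own quick-and-dirty check, not anything official: the KB numbers are the ones from this post, and the rest is a thin Python wrapper around wmic qfe (Win32_QuickFixEngineering), which only lists updates that are actually installed on the box.

```python
import subprocess

# KB numbers mentioned in this post; adjust to your own patch list.
wanted = {"KB2135160", "KB2522766", "KB983460"}

# wmic qfe reports installed updates via Win32_QuickFixEngineering.
installed = subprocess.run(
    ["wmic", "qfe", "get", "HotFixID"],
    capture_output=True, text=True, check=True
).stdout.split()

for kb in sorted(wanted):
    status = "installed" if kb in installed else "MISSING"
    print(f"{kb}: {status}")
```

Run it on each node (or push it out with whatever tool you prefer) and you know where you stand.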

It’s time I add a page to this blog for all the fixes related to Hyper-V and Failover Clustering with Windows Server 2008 R2 SP1, for my own reference 🙂

Hyper-V Is Right Up There In Gartner’s Magic Quadrant for x86 Server Virtualization Infrastructure

So how do you like THEM apples?

Well, take a look at this, people: on June 30th Gartner published its Magic Quadrant for x86 Server Virtualization Infrastructure.

Figure 1: Magic Quadrant for x86 Server Virtualization Infrastructure (Source: Gartner 2011)

That’s not a bad spot if you ask me. And before the “they paid their way up there” remarks flow in: Gartner works how Gartner works, and it works like that for everyone (read: the other vendors on there), so that remark could fly right back into your face if you’re not careful. To get there in three years’ time is not a bad track record, and if you believe some of the people out there, this can’t be true. Keep in mind that Microsoft only had Virtual Server to offer before Hyper-V was available, and I was not using that anywhere. No, not even for non-critical production or testing, as the lack of x64 VM support made it a “no go” product for me. So the success of Hyper-V is quite an achievement. Back in 2008, I did go with Hyper-V as a highly available virtualization solution, after having tested and evaluated it during the Beta and Release Candidate time frame. Some people thought I was making a mistake.

But the features in Hyper-V were “good enough” for most of the needs I had to deal with, and yes, I knew VMware had a richer offering and was proven technology, something people never forgot to mention to me for some reason. I guess they wanted to make sure I hadn’t been living under a rock the last couple of years. They never mentioned the cost or certain trends, however, or looked at the customer’s real needs. Hyper-V was a lot better than what most of the environments I had to serve were running at the time. In 2008 the people I needed to help were using VMware Server or Virtual Server. Both were (and are) free, but for anything more than lightweight applications on the “not that important” list they are not suitable. If you’re going to do virtualization structurally, you need high availability to avoid the risks of putting all your eggs in one basket. However, as you might have guessed, these people did not use ESX. Why? In all honesty: the associated cost.

In the 2005-2007 time frame, servers had not yet reached the cost/performance ratio they hit in 2008, and they were a far cry from where they are now. Those organizations didn’t do server virtualization because, from a cost perspective, both the licensing fees for the functionality and the hardware procurement meant it just didn’t fit in yet. By 2008 the hardware cost barrier had come down, and with Hyper-V 1.0 we got a hypervisor that we knew could deliver something good enough to get the job done at a cost they could cover. We also knew that Live Migration and Dynamic Memory were in the pipeline and that the product would only get better. Having tested Hyper-V, I knew I had a technology to work with at a very reasonable price (or even for free), and that included high availability.

Combine this with the notion at the time that hypervisors were becoming commodities and that people were announcing the era of the cloud. Where do you think the money needs to go? Management and applications. Where did Microsoft help with that? The System Center suite: System Center Virtual Machine Manager and Operations Manager. Are those perfect in their current incarnations? Nope. But have you looked at the SCVMM 2012 Beta? Do you follow the buzz around Hyper-V 3.0 or vNext? Take a peek and you know where this is going. Think private and hybrid cloud. The real substance of the Microsoft stack lies in the hypervisor and management combination: management tools and integration capabilities that help with application delivery and hence with the delivery of services to the business. Even if you have no desire or need for the public cloud, do take a look. Having a private cloud capability enhances your internal service delivery. Think of it as “Dynamic IT on steroids”. Having a private cloud is a prerequisite for having a hybrid cloud, which aids in the use of the public cloud when that time comes for your organization. And if it never comes, no problem: you’ll have gotten the best internal environment possible, with no money or time lost. See my blog post Private Clouds, Hybrid Clouds & Public Clouds for more musings on this.

Are Hyper-V and System Center the perfect solution for everyone in every case? No, sir. No single product or product stack can be everything to everyone. The entire VMware versus Hyper-V mud-slinging contest is at best amusing when you have the time and are in the mood for it. Most of the time I’m not playing that game. The consultant’s answer is correct: “It depends”. Very few people know all virtualization products very well and have equal experience with them. But when you’re looking to use virtualization to carry your business into the future, you should have a look at the Microsoft stack and see if it can help you. True objectivity is very hard. We all have our preferences and monetary incentives, and there are always those who’ll take it to extreme levels. There are still people out there claiming you need to reboot a Windows server daily and have BSODs all over the place. If that is really the case, they should not be blaming the technology. If the technology were that bad they wouldn’t need to try and convince people not to use it; they would run away from it by themselves, and I would be asking you if you want fries with your burger. Things go “boink” sometimes with any technology; really, you’d think it was created by humans, go figure. At BriForum 2011 in London this year it was confirmed that we’re seeing more and more multi-hypervisor use at medium to large organizations. That means there is a need for different solutions in different areas, and Hyper-V is doing particularly well in greenfield scenarios.

Am I happy with the choices I made? Yes. We’re getting ready to do some more Hyper-V projects, and those plans even include SCVMM 2012 & SCOM 2012, together with an upgrade path to Hyper-V vNext. I mailed the Gartner link to my manager, pointing out that my obstinate choice back then turned out rather well 😉

Hyper-V 3.0 Leaked Screen Shots From Windows 8 Create A Buzz

Well, last Monday, June 20th 2011, was quite an active day on Twitter thanks to some leaked Windows 8 screen shots that lifted a tip of the veil on Hyper-V 3.0, a.k.a. Hyper-V vNext. You can take a peek here (Windows Now by Robert McLaws) and here (WinRumors) to see for yourself.

Now, Scott Lowe also blogged about this, with some more detail. The list below is the one from Scott Lowe’s blog http://blogs.virtualizationadmin.com/lowe/2011/06/20/hyper-v-30-%e2%80%93-what%e2%80%99s-coming/ but I added some musings and comments to certain items.

  • Virtual Fibre Channel Adapter ==> Nice, I guess the competition from iSCSI was felt. How this will turn out with regards to SAN/driver/HBA support will be interesting, and there is a mention of a virtual fibre channel SAN in the screenshots …
  • Storage Resource Pools & Network Resource Pools ==> This could become sweet … I’m dreaming about my wish-list feedback to Microsoft, but without details I won’t speculate any further.
  • New .VHDX virtual hard drive format (up to 16TB + power failure resiliency) ==> This is just plain sweet; we’re no longer bound by 2TB LUNs on our physical storage (SAN), and now we can take that to the next level.
  • Support for more than 4 cores! (My machine has 12 cores) ==> I say “Bring it on!”
  • NUMA – Memory per Node, Cores per Node, Nodes per Processor Socket ==> Well, well … what will this translate into? Help dealing with Dynamic Memory? Aid in the virtualization of SQL Servers (i.e. better support for scaling up; for now, scaling out works better there)?
  • Hardware Acceleration (Virtual Machine Queue & IPsec Offload)
  • Bandwidth Management ==> Ah, that would be nice 🙂
  • DHCP Guard ==> This is supposed to drop DHCP traffic from VMs “masquerading” as DHCP servers. It could be very useful, but we need details. Will a DHCP server need to be authorized? What about non-Windows VMs, do you add “good” DHCP servers to an allow list?
  • Router Guard ==> Same as above but for rogue routers. Drops router advertisement and redirection messages from unauthorized virtual machines pretending to be routers. So this sounds like an allow list.
  • Monitor Port. Provides for monitoring of network traffic in and out of a virtual machine and forwards the information to a monitoring virtual machine. ==> Do I hear some cheering network engineers?
  • Virtual Switch Extensions. So far, there appear to be two filters added: NDIS Capture LightWeight Filter and WFP vSwitch Layers LightWeight Filter.

All of this is pretty cool stuff and has many of us wanting to get our hands on the first beta 🙂 I’ve been running Windows Server tweaked as a desktop since Windows 2003, so I already have Hyper-V in that role, but hey, bring it on. I’m very eager to get started with this. I have visions of System Center Virtual Machine Manager 2012 and Hyper-V 3.0 with very capable, recent SAN technology … 😀

Consider CPU Power Optimization Versus Performance When Virtualizing

Over the past couple of years I’ve read, heard, seen and responded to reports of users dealing with performance issues when trying to save the planet with the power-saving options on CPUs. As this is often enabled by default, they frequently don’t even realize it is in play. Now, for most laptop users this is fine, and even for a lot of desktop users it delivers on the promise of lower energy consumption. Sure, there are always some power users and techies who need every last drop of pure power, but on the whole life is good this way. So you reduce your power needs, help save the planet and hopefully save some money along the way as well. Even when you’re filthy rich and money is no object to you whatsoever, you could still be in a place where there are no extra watts available, because capacity is maxed out or because the spare capacity has been reserved for special events like the London Olympics, so keeping power consumption in check becomes a concern for you as well.

Now, this might make good economic sense for a lot of environments (mobile computing), but in other places it might not work out that well. So when you have all this cool and advanced power management running, in some environments you need to take care not to turn your virtualization hosts into underachievers. Perhaps that’s putting it too strongly, but hey, I need to wake you up to get your attention. The more realistic issue is that people are running more and more heavy workloads in virtual machines, and that the hosts used for that contain more and more cores per socket, use very advanced CPU functionality and have huge amounts of RAM. Look at these KB articles: KB 2532917 Hyper-V Virtual Machines Exhibit Slow Startup and Shutdown and KB 2000977 Hyper-V: Performance decrease in VMs on Intel Xeon 5500 (Nehalem) systems. All this doesn’t always compute (pun intended) very well.
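
A first, very basic thing I tend to check on a host that feels sluggish is which power plan it is actually running. Below is a minimal Python sketch that simply wraps the built-in powercfg tool; nothing fancy, and the "Balanced" string check is just my own shortcut for the default plan name on an English installation.

```python
import subprocess

# Quick sanity check: what power plan is this Hyper-V host actually running?
# powercfg ships with Windows, so we just wrap it here.
result = subprocess.run(
    ["powercfg", "/getactivescheme"],
    capture_output=True, text=True, check=True
)
print(result.stdout.strip())  # e.g. "Power Scheme GUID: ... (Balanced)"

# If the host shipped on "Balanced" (the out-of-the-box default), that is the
# first thing to question when VMs feel sluggish under load.
if "Balanced" in result.stdout:
    print("Host is on the Balanced plan - worth reviewing for heavy VM workloads.")
```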

Most Hyper-V consultants will also be familiar with the blue screen bugs related to C-states, like You receive a “Stop 0x0000007E” error on the first restart after you enable Hyper-V on a Windows Server 2008 R2-based computer and Stop error message on a Windows Server 2008 R2-based computer that has the Hyper-V role installed and that uses one or more Intel CPUs that are code-named Nehalem: “0x00000101 – CLOCK_WATCHDOG_TIMEOUT”, on top of the KB articles mentioned above. I got bitten by the latter one a few times (yes, I was a very early adopter of Hyper-V). Don’t start bashing Microsoft too hard on this; VMware and other vendors are dealing with their own C-state (core parking) devils (just Google for it), and read the articles to realize this is sometimes a hardware/firmware issue. A colleague of mine told me that some experts are advising to just turn C-states off in a virtualization environment. I’ll leave that to the situation at hand, but it is an area that you need to be aware of and watch out for. As always, and especially if you’re reading this in 2014, realize that all information has a time-limited shelf life based on the technology at the time of writing. Technology evolves, and who knows what CPUs and hypervisors will be capable of in the future? Also, these bugs have been listed on most Hyper-V blogs as they emerged, so I hope you’re not totally surprised.

It’s not just the C-states we need to watch out for; the P-states have given us some performance issues as well. I’ve come across some “strange” results in virtualized environments, ranging from “merely confused” system administrators to customers suffering from underperforming servers, both physical and virtual actually. All those fancy settings like SpeedStep (Intel) or Cool’n’Quiet (AMD) might cause some issues. Perhaps not in your environment, but it pays to check it out and be aware of them, as servers arrive with those settings enabled in the BIOS and Windows Server 2008 R2 uses them by default. Oh, if you need some reading on what C-states and P-states are, take a look at C-states and P-states are very different.
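
If you want to see what the active plan is actually doing to your P-states, you can dump the processor power management settings with powercfg as well. The sketch below assumes the standard powercfg aliases (SCHEME_CURRENT for the active plan and SUB_PROCESSOR for the processor power management subgroup) are available; check powercfg /aliases on your own box to be sure.

```python
import subprocess

# Dump the processor power management settings of the active power plan.
# SUB_PROCESSOR is the powercfg alias for the "Processor power management"
# subgroup (minimum/maximum processor state live here).
output = subprocess.run(
    ["powercfg", "/query", "SCHEME_CURRENT", "SUB_PROCESSOR"],
    capture_output=True, text=True, check=True
).stdout

# Print only the lines we care about: the setting names and their AC values.
for line in output.splitlines():
    if "Power Setting GUID" in line or "Current AC Power Setting" in line:
        print(line.strip())
```

A minimum processor state of 5% combined with a sluggish ramp-up is exactly the kind of thing that shows up as “underachieving” hosts.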

Some confusion can arise when virtual machines report less speed than the physical CPUs can deliver, worsened by the fact that it sometimes varies between VMs on the same host. As long as this doesn’t cause performance issues, it can be lived with by most people except the inquisitive minds. When performance takes a dive, servers start to respond more slowly and apps wind down to a glacial pace, you see productivity suffer, which causes people to get upset. To add to the confusion, SCVMM allows you to assign a CPU type to your VMs as a hint to help with intelligent placement of the virtual machines (see What is CPU Type in SCVMM 2008 R2 VM Processor Hardware Profile?), which confuses some people even more. And guess whose desk all of that ends up on?

When talking about performance on servers, we see issues that pit power (and money, and penguin) savings against raw performance. We’ve seen some SQL Servers and other CPU-hungry GIS application servers underperform big time (15% to 20%) under certain conditions. How is this possible? Well, CPUs are trimmed down in voltage and frequency to reduce power consumption when the performance is not needed. The principle is that they will spring back into action when it is needed. In reality, this “springing” back into action isn’t that responsive. It seems that the gradual trimming down or beefing up of the CPU’s voltage and frequency isn’t that transparent to the processes needing it, probably because constant, real-time, atomic adjustments aren’t worth the effort or are technically challenging. For high-performance demands this is not good enough and can lead to more money spent on extra servers and time spent on different approaches (code, design and architecture) to deal with a somewhat artificial performance issue. The only time you’re not going to have these issues is when your servers are either running apps with mediocre to low performance needs, or when they are so hungry for performance that those CPUs never get trimmed down; they just don’t get the opportunity. There is a lot to think about here, and now add server virtualization into the mix. No, my dear application owner, Task Manager’s CPU information is not the real, raw info you can depend on for the complete truth and nothing but the truth. Many years ago CPU-Z was my favorite tool to help tweak my home PC. Back then I never thought it would become part of my virtualization toolkit, but it’s easy and faster than figuring it out with all the various performance counters.
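
For a first impression, without installing anything, you can also just ask WMI what the CPUs claim to be running at versus their rated speed. A rough sketch is below; note that Win32_Processor’s CurrentClockSpeed isn’t refreshed in real time on every platform, which is exactly why CPU-Z and the performance counters remain handy.

```python
import subprocess

# Compare the clock speed WMI currently reports against the rated maximum,
# per physical processor. Caveat: CurrentClockSpeed can be stale on some
# platforms, so treat this as a hint, not as gospel.
raw = subprocess.run(
    ["wmic", "cpu", "get", "Name,CurrentClockSpeed,MaxClockSpeed", "/format:list"],
    capture_output=True, text=True, check=True
).stdout

cpu = {}
for line in raw.splitlines():
    if "=" not in line:
        continue
    key, _, value = line.strip().partition("=")
    cpu[key] = value
    if all(k in cpu for k in ("Name", "CurrentClockSpeed", "MaxClockSpeed")):
        current, maximum = int(cpu["CurrentClockSpeed"]), int(cpu["MaxClockSpeed"])
        print(f"{cpu['Name']}: {current} MHz of {maximum} MHz "
              f"({100 * current // maximum}% of rated speed)")
        cpu = {}  # reset for the next processor block
```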

Now don’t think this is an “RDBMS only” problem and that, since you’re a VDI guy or a GIS or data-crunching guy, you’re out of the woods. VDI and other resource-hungry applications (like GIS and data crunching) that show heterogeneous patterns in CPU needs can suffer as well, and you’d do well to check on your vCPUs and pCPUs and how they are running under different loads. I actually started looking at SQL Server after first seeing the issue with a freaked-out GIS application running its vCPUs at 100% while the pCPUs were all relaxed about it. It made me go “hang on, I need to check something”, and that’s when I ran into a TechNet forum post on Hyper-V core parking performance issues, leading to some interesting articles by Glenn Berry and Brent Ozar, who are dealing with this on physical servers as well. The latter article even mentions an HP iLO card bug that prevents the CPU from throttling back up completely. Ouch!
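
A quick way to reproduce that “VM at 100%, host relaxed” picture yourself is to put the Hyper-V virtual processor and logical processor counters side by side. The sketch below just streams them with typeperf for half a minute; the counter paths are the standard Hyper-V ones, but adjust the instances to your own host.

```python
import subprocess

# What the guests think they are doing (virtual processors) versus what the
# host's logical processors are actually doing.
counters = [
    r"\Hyper-V Hypervisor Virtual Processor(*)\% Guest Run Time",
    r"\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time",
]

# typeperf ships with Windows: take 15 samples, one every 2 seconds,
# and stream them to the console for a first impression.
subprocess.run(["typeperf", *counters, "-si", "2", "-sc", "15"], check=True)
```

If the virtual processors are pegged while the logical processors barely break a sweat, power management is one of the first suspects to rule out.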

Depending on your findings and needs, you might just want to turn SpeedStep or Cool’n’Quiet off, either in the BIOS or in Windows. Food for thought: what if one day some vendors decide you don’t need to be able to turn that off, and it disappears from your view and ultimately from your control … The “good enough is good enough” world can lead to a very mediocre world. Am I being paranoid? Nope, not according to Ron Oglesby (you want VDI reality checks? check him out) in his blog post SpeedStep and VDI? Is it a good thing? Not for me, where Cisco UCS 230 blades are causing him problems.

So what do I do? Well, to be honest, when the need for stellar, pure, raw performance is there, the power savings go out the window whenever I see that they’re causing issues. If they don’t, fine, then they can stay. So yes, this means no money saved, no reduction in cooling costs, and penguins (not Linux, but those fluffy birds at the South Pole that can’t fly) losing square footage of ice surface. Why? Because the business wants and needs the performance, and they are nagging me to deliver it. When you have a need for that performance, you’ll make that trade-off and it will be the correct decision. Their fancy new servers performing worse than, or no better than, what they replaced, and the virtualization project getting bashed for failing to deliver? Ouch! That is unacceptable. But, to tell you the truth, I kind of like penguins. They are cute. So I’m going to try and help them with Dynamic Optimization and Power Optimization in System Center Virtual Machine Manager 2012. Perhaps this has a better chance of providing power savings for performance-critical setups than the advanced CPU capabilities do. With this approach you have nodes running at full power while distributing the load, and you shut down entire nodes when there is overcapacity. I’ll be happy to report how this works out in real life. But do mind that this is very environment-dependent; you might not have any issues whatsoever, so don’t try to fix what isn’t broken.

The thing is, in most places you can’t hang around for weeks fine-tuning every little configuration option in the CPUs in collaboration with developers and operations. The production needs, costs and time constraints (by the time they notice any issues, “playtime” has come and gone) just won’t allow for it. I’m happy to have those options where I have the opportunity to use them, but in most environments I’ll stick with easier and faster fixes due to those constraints. Microsoft also advises us to keep an eye on power-saving settings in the KB article Degraded overall performance on Windows Server 2008 R2 and offers some links to more guidance on the subject. There is no “one size fits all” solution. By the way, some people claim that the best performance results come from leaving SpeedStep on in the BIOS and disabling it in Windows. Others swear by disabling it in the BIOS. I just tend to use what I can, where I can, and go by the results. It’s all a bit empirical, and this is a cool topic to explore, but as always time is limited and you’re not always in a position where you can try it all out at will.
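
And the “easier and faster fix” in practice often boils down to this: flip the host to the High Performance plan, measure again, and decide based on results. A tiny sketch, assuming the standard SCHEME_MIN alias (High Performance) is present on the box; SCHEME_BALANCED takes you back to the default if it turns out not to matter in your environment.

```python
import subprocess

# Switch the host to the High Performance power plan. SCHEME_MIN is the
# built-in powercfg alias for High Performance (minimum power savings).
subprocess.run(["powercfg", "/setactive", "SCHEME_MIN"], check=True)

# Confirm the change took effect.
subprocess.run(["powercfg", "/getactivescheme"], check=True)
```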

In the end, it comes down to making choices. That’s not as hard as you think, as long as you make the right choices for the right reasons. Even with physical desktops that are Wake-on-LAN (WOL) enabled so users can remotely boot them when they want to work from home or while traveling, I’ve been known to tell the bean counters that they had to pick one of two options: have all options available to their users, or save the penguins. You see, WOL with a machine that has been shut down works just fine. But when machines go into hibernation or standby, you have to allow the NICs to wake the computer from hibernation or standby for WOL to work, or users won’t be able to connect to them remotely. See more on this at http://technet.microsoft.com/en-us/library/ee617165(WS.10).aspx. This does mean they’ll wake up a lot more than necessary because of non-targeted network traffic. So what? Think of the benefits! An employee wanting to work a bit at 20:00 on her hibernating PC at the office, so she can take a couple of hours the next morning to bring her kid to the doctor, can do so. Priceless, because that mother knows what a great boss and company she works for.
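
For the curious: you can see exactly which devices are allowed to wake a machine with powercfg’s device queries. The sketch below only reports; actually arming a NIC would be a powercfg /deviceenablewake "<exact device name>" call, and as always, test on one machine before letting it loose on the whole fleet.

```python
import subprocess

# Which devices are currently armed to wake this machine, and which ones
# could be armed? "wake_armed" and "wake_programmable" are the standard
# powercfg device query types.
armed = subprocess.run(
    ["powercfg", "/devicequery", "wake_armed"],
    capture_output=True, text=True, check=True
).stdout
programmable = subprocess.run(
    ["powercfg", "/devicequery", "wake_programmable"],
    capture_output=True, text=True, check=True
).stdout

print("Devices armed to wake this machine:\n" + armed)
print("Devices that could be armed:\n" + programmable)
```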