Microsoft Management Summit 2013 Registration opens on December 3rd, 2012

Just a heads-up for everyone planning to attend the Microsoft Management Summit 2013 (MMS 2013): registration opens on December 3rd, 2012.

So keep an eye on the MMS 2013 site and register as soon as you get the chance. This event tends to sell out fast.

Microsoft Management Summit 2012

From the 16th until the 20th of April 2012, Microsoft is running an important conference for anyone involved with systems management in the Microsoft sphere. It is, of course, the Microsoft Management Summit 2012 (MMS 2012) in Las Vegas (Nevada, USA).

This is a conference that is held in very high regard, and I’ve heard through the grapevine that it’s one of the favorite conferences for Microsoft employees to attend themselves, due to its high quality and its focus on the System Center suite. I’ve never had the opportunity to attend, and I would like to go.

It’s very likely that the System Center 2012 suite of products will be officially launched at MMS 2012, and there will be an abundance of learning opportunities around it. As said above, I’d really love to go, and I encourage anyone who can make it to attend. Yes, I know it’s in the United States, so for us non-US residents that can mean long and expensive travel, and we’ll need a budget to stay in Las Vegas for a week. But it’s the only conference of its kind; there is no MMS Europe or anything like it. That said, with initiatives like the “Best of MMS” TechNet events, Microsoft and the community make an effort to deliver the content and information to a much larger audience, which is great.

If you’re in the target group for this conference and you’re interested, take a look here. They even have a cost/benefit sheet to help convince your management ;-)

A lot of you are probably already playing with the System Center 2012 betas and release candidates, but if you’re not, you might want to get a head start by downloading the System Center 2012 Evaluation Products and perhaps even by joining the Community Evaluation Program for System Center 2012 (Private Cloud) and Configuration Manager 2012.

Consider CPU Power Optimization Versus Performance When Virtualizing

Over the past couple of years I’ve read, heard, seen and responded to reports of users dealing with performance issues when trying to save the planet with the power-saving options on CPUs. As these are often enabled by default, users often don’t even realize they are in play. For most laptop users this is fine, and even for a lot of desktop users it delivers on the promise of lower energy consumption. Sure, there are always some power users and techies who need every last drop of raw power, but on the whole life is good this way: you reduce your power needs, help save the planet and hopefully save some money along the way as well. And even if you’re filthy rich and money is no object whatsoever, you could still be in a place where no extra watts are available, because capacity is maxed out or has been reserved for special events like the London Olympics, so keeping power consumption in check becomes a concern for you as well.

This might make good economic sense in a lot of environments (mobile computing), but in other places it might not work out that well. So when you have all this cool and advanced power management running, in some environments you need to take care not to turn your virtualization hosts into underachievers. Perhaps that’s putting it too strongly, but hey, I need to wake you up to get your attention. The more realistic issue is that people are running heavier and heavier workloads in virtual machines, and the hosts used for that contain more and more cores per socket, use very advanced CPU functionality and have huge amounts of RAM. Look at these KB articles: KB2532917: Hyper-V Virtual Machines Exhibit Slow Startup and Shutdown and KB 2000977: Hyper-V: Performance decrease in VMs on Intel Xeon 5500 (Nehalem) systems. All this doesn’t always compute (pun intended) very well.

Most Hyper-V consultants will also be familiar with the blue screen bugs related to C-states, such as “You receive a ‘Stop 0x0000007E’ error on the first restart after you enable Hyper-V on a Windows Server 2008 R2-based computer” and “Stop error message on a Windows Server 2008 R2-based computer that has the Hyper-V role installed and that uses one or more Intel CPUs that are code-named Nehalem: ‘0x00000101 – CLOCK_WATCHDOG_TIMEOUT’”, on top of the KB articles mentioned above. I got bitten by the latter one a few times (yes, I was a very early adopter of Hyper-V). Don’t start bashing Microsoft too hard on this: VMware and other vendors are dealing with their own C-state (core parking) devils (just Google it), and reading the articles makes clear that sometimes this is a hardware/firmware issue. A colleague of mine told me that some experts advise simply turning C-states off in a virtualization environment. I’ll leave that to the situation at hand, but it is an area you need to be aware of and watch out for. As always, and especially if you’re reading this in 2014, realize that all information has a limited shelf life, based on the technology at the time of writing. Technology evolves, and who knows what CPUs and hypervisors will be capable of in the future? Also, these bugs were covered on most Hyper-V blogs as they emerged, so I hope you’re not totally surprised.

It’s not just the C-states we need to watch out for; the P-states have given us some performance issues as well. I’ve come across some “strange” results in virtualized environments, with consequences ranging from merely confused system administrators to customers suffering from underperforming servers, both physical and virtual. All those fancy settings like SpeedStep (Intel) or Cool’n’Quiet (AMD) might cause issues. Perhaps not in your environment, but it pays to check and to be aware of them, as servers arrive with those settings enabled in the BIOS and Windows Server 2008 R2 uses them by default. Oh, if you need some reading on what C-states and P-states are, take a look at “C-states and P-states are very different”.

Some confusion can arise when virtual machines report a lower speed than the physical CPUs can deliver, made worse by the fact that it sometimes varies between VMs on the same host. As long as this doesn’t cause performance issues, most people, except the inquisitive minds, can live with it. But when performance takes a dive, servers start to respond more slowly and apps wind down to a glacial pace, productivity suffers and people get upset. To add to the confusion, SCVMM allows you to assign a CPU type to your VMs as a hint to help SCVMM with intelligent placement of the virtual machines (see What is CPU Type in SCVMM 2008 R2 VM Processor Hardware Profile?), which confuses some people even more. And guess on whose desk all that ends up?

When talking about performance on servers, we see issues that pit power (and money, and penguin) savings against raw performance. We’ve seen some SQL servers and other CPU-hungry GIS application servers underperform big time (15% to 20%) under certain conditions. How is this possible? Well, CPUs are trimmed down in voltage and frequency to reduce power consumption when the performance is not needed. The principle is that they will spring back into action when it is needed. In reality, this “springing” back into action isn’t that responsive. It seems that the gradual trimming down or beefing up of the CPU’s voltage and frequency isn’t that transparent to the processes needing it, probably because constant, real-time, atomic adjustments aren’t worth the effort or are technically challenging. For high-performance demands this is not good enough, and it could lead to more money spent on extra servers and more time spent on different approaches (code, design and architecture) to deal with a somewhat artificial performance issue. The only time you’re not going to have these issues is when your servers are either running apps with mediocre to low performance needs, or are so hungry for performance that the CPUs never get the opportunity to trim down. There is a lot to think about here, and now add server virtualization into the mix. No, my dear application owner, Task Manager’s CPU information is not raw data you can depend on for the truth, the whole truth and nothing but the truth. Many years ago CPU-Z was my favorite tool to help tweak my home PC. Back then I never thought it would become part of my virtualization toolkit, but it’s easier and faster than figuring it out with all the various performance counters.
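If you want something a bit more systematic than eyeballing CPU-Z, the two checks I keep doing by hand can be written down in a few lines: which cores are running well below their rated clock, and how much slower a run is compared with a baseline. This is a minimal Python sketch, assuming you collect the clock readings and timings yourself (CPU-Z, performance counters, a repeatable test job); the function names, the 90% threshold and the sample numbers are made up for illustration.

```python
from typing import Dict, List


def throttled_cores(current_mhz: Dict[str, float], rated_mhz: float,
                    threshold: float = 0.9) -> List[str]:
    """Return the cores whose current clock is below threshold * rated clock.

    current_mhz maps a core label to a reading you collected yourself
    (hypothetical input format); rated_mhz is the CPU's nominal clock.
    """
    if rated_mhz <= 0:
        raise ValueError("rated_mhz must be positive")
    return sorted(core for core, mhz in current_mhz.items()
                  if mhz < threshold * rated_mhz)


def slowdown_percent(baseline_seconds: float, measured_seconds: float) -> float:
    """How much longer a run took than the baseline run, as a percentage."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (measured_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    readings = {"cpu0": 1596.0, "cpu1": 2660.0, "cpu2": 1596.0, "cpu3": 2660.0}
    print(throttled_cores(readings, rated_mhz=2660.0))  # ['cpu0', 'cpu2']
    print(round(slowdown_percent(100.0, 118.0), 1))      # 18.0
```

Running the same job with the power-saving settings on and off and feeding both timings to slowdown_percent is a cheap way to see whether your hosts are in the 15% to 20% club or not.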

Now don’t think this is an “RDBMS only” problem and that, since you’re a VDI guy or a GIS or data-crunching guy, you’re out of the woods. VDI and other resource-hungry applications (like GIS and data crunching) that show heterogeneous patterns in CPU needs can suffer as well, and you’d do well to check on your vCPUs and pCPUs and how they are running under different loads. I actually started looking at SQL Server after first seeing the issue with a freaked-out GIS application running its vCPUs at 100% while the pCPUs were all relaxed about it. It made me go “hang on, I need to check something”, and that’s when I ran into a TechNet forum post on Hyper-V core parking performance issues, leading to some interesting articles by Glenn Berry and Brent Ozar, who are dealing with this on physical servers as well. The latter article even mentions an HP iLO card bug that prevents the CPU from throttling back up completely. Ouch!

Depending on your findings and needs, you might just want to turn SpeedStep or Cool’n’Quiet off, either in the BIOS or in Windows. Food for thought: what if one day some vendors decide you don’t need to be able to turn that off, and it disappears from your view and ultimately from your control … The “good enough is good enough” world can lead to a very mediocre world. Am I being paranoid? Nope, not according to Ron Oglesby (want VDI reality checks? Check him out) in his blog post “SpeedStep and VDI? Is it a good thing? Not for me.”, where Cisco UCS 230 blades are causing him problems.

So what do I do? Well, to be honest, when the need for stellar, pure raw performance is there, the power savings go out the window whenever I see that they’re causing issues. If they don’t, fine, then they can stay. So yes, this means no money saved, no reduction in cooling costs, and penguins (not Linux, but those fluffy birds at the South Pole that can’t fly) losing square footage of ice. Why? Because the business wants and needs the performance and is nagging me to deliver it. When you need that performance you’ll make that trade-off, and it will be the correct decision. Fancy new servers performing worse than, or no better than, what they replaced, and the virtualization project getting bashed for failing to deliver? Ouch! That is unacceptable. But to tell you the truth, I kind of like penguins. They are cute. So I’m going to try to help them with Dynamic Optimization and Power Optimization in System Center Virtual Machine Manager 2012. Perhaps, for performance-critical setups, this has a better chance of providing power savings than the advanced CPU capabilities. With this approach, the nodes that are running do so at full power, while the load is distributed and entire nodes are shut down when there is overcapacity. I’ll be happy to report how this works out in real life. But do mind that this is very environment-dependent and you might not have any issues whatsoever, so don’t try to fix what isn’t broken.
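To make the Power Optimization idea a bit more tangible, here is the arithmetic behind “run fewer hosts at full power and shut the rest down”. This is a simplified model of my own, not the algorithm SCVMM 2012 actually uses: it assumes identical hosts, load expressed as a single number per host, a hypothetical utilization ceiling (max_utilization) and a hypothetical floor of hosts that always stay on (min_hosts).

```python
import math
from typing import List


def hosts_to_keep_on(host_loads: List[float], host_capacity: float,
                     max_utilization: float = 0.75, min_hosts: int = 2) -> int:
    """Hosts needed to carry the total load without any running host going
    above max_utilization of its capacity.

    Simplifications: identical hosts and perfectly divisible load. The result
    is never below min_hosts (a hypothetical safety floor) and never above
    the number of hosts that exist.
    """
    if host_capacity <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("invalid capacity or utilization target")
    needed = math.ceil(sum(host_loads) / (host_capacity * max_utilization))
    return min(len(host_loads), max(min_hosts, needed))


def hosts_to_power_down(host_loads: List[float], host_capacity: float,
                        max_utilization: float = 0.75, min_hosts: int = 2) -> int:
    """How many hosts could be switched off after consolidating the load."""
    return len(host_loads) - hosts_to_keep_on(host_loads, host_capacity,
                                              max_utilization, min_hosts)


if __name__ == "__main__":
    # Illustrative load per host, as a percentage of one host's capacity.
    loads = [40, 35, 30, 45, 25, 50]
    print(hosts_to_keep_on(loads, 100.0), hosts_to_power_down(loads, 100.0))  # 3 3
```

Real placement also has to respect memory, VM sizes and failover headroom, which this sketch ignores; that is the kind of work you want Dynamic Optimization and Power Optimization to handle for you.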

The thing is, in most places you can’t hang around for weeks fine-tuning every little CPU configuration option in collaboration with developers and operations. Production needs, costs and time constraints (by the time anyone notices issues, “playtime” has come and gone) just won’t allow for it. I’m happy to have those options where I get the opportunity to use them, but in most environments I’ll stick with easier and faster fixes because of those constraints. Microsoft also advises keeping an eye on power-saving settings in the KB article Degraded overall performance on Windows Server 2008 R2, which offers some links to more guidance on this subject. There is no “one size fits all” solution. By the way, some people claim the best performance results come from leaving SpeedStep on in the BIOS and disabling it in Windows. Others swear by disabling it in the BIOS. I just tend to use what I can where I can and go by the results. It’s all a bit empirical, and this is a cool topic to explore, but as always time is limited and you’re not always in a position to try it all out at will.

In the end, it comes down to making choices. This is not as hard as you think, as long as you make the right choices for the right reasons. Even with physical desktops that are Wake-on-LAN (WOL) enabled, so users can remotely boot them when they want to work from home or while traveling, I’ve been known to tell the bean counters that they had to pick one of two: have all options available to their users, or save the penguins. You see, WOL works just fine with a machine that has been shut down. But when machines go into hibernation or standby, you have to allow the NICs to wake the computer from hibernation or standby for WOL to work, or the users won’t be able to connect to them remotely. See http://technet.microsoft.com/en-us/library/ee617165(WS.10).aspx for more on this. But this means the machines will wake up a lot more often than necessary because of non-targeted network traffic. So what? Think of the benefits! An employee who wants to work a bit at 20:00 on her hibernating PC at the office, so she can take a couple of hours off to take her kid to the doctor the next morning, can do so. That is priceless, as that mother knows what a great boss and company she works for.
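For those wondering what actually wakes the machine: a WOL “magic packet” is simply 6 bytes of 0xFF followed by the target NIC’s MAC address repeated 16 times, usually sent as a UDP broadcast. A minimal Python sketch, assuming the sender sits on the same subnet as the sleeping PC (or your routers forward directed broadcasts) and using UDP port 9, a common convention; the MAC address below is a placeholder.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on bad hex
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP."""
    packet = magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    pkt = magic_packet("00-11-22-33-44-55")  # placeholder MAC
    print(len(pkt))  # 102
```

Note that this only covers the sending side; the NIC on the sleeping PC still has to be allowed to wake the computer, as described above.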

System Center Virtual Machine Manager 2012 Using WSUS To Update Hyper-V Cluster Hosts & Other Fabric Servers

One very neat feature in System Center Virtual Machine Manager 2012 (SCVMM2012), which is currently in beta, is the integration with WSUS to automate the patching of Hyper-V cluster hosts (plus the library servers, SCVMM servers and update servers, i.e. the fabric). The fact that SCVMM 2012 gives you the complete toolset to take care of this is yet another great addition to the functionality in Virtual Machine Manager 2012. More and more I’m looking forward to using it in production, as it has so many improvements and new features. Combine that with what’s being delivered in System Center Operations Manager 2012 (SCOM2012) and the other members of the System Center family, and I’m quite happy with what is coming.

But let’s get back to the main subject of this blog: using WSUS and SCVMM2012 to auto-update the Hyper-V cluster hosts without interrupting the virtual machines running on them. Up until now we needed to script such a process in PowerShell, even though SCVMM2008R2 makes it easier, since it has Maintenance Mode, which evacuates all VMs from a particular host one by one. The workflow of such a script looks like this:

  • Place the Host Node in Maintenance Mode in SCOM 2007 R2 (So we don’t get pesky alerts)
  • Place the Host Node in Maintenance Mode in SCVMM2008R2 (this evacuates the VMs from the host via Live Migration to the other nodes in the cluster)
  • Patch the Host and restart it
  • Stop Maintenance Mode on the host node in SCVMM2008R2 (So it can be used to run VMs again)
  • Stop Maintenance Mode on the host node in SCOM 2007 R2 (we want it to be monitored again)
  • Rinse and repeat until all host nodes are done. Depending on the size of the cluster, you can do this with multiple nodes at the same time. Just remember that there can be only one Live Migration taking place per node. That means you need at least 4 nodes to do something like live migrating from Node A to Node B while live migrating from Node C to Node D. So you need to work out what’s optimal for your cluster, depending on the load and the number of nodes you have to work with.
  • Have the virtual machines redistributed so that the last host also gets its share of virtual machines
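The “rinse and repeat” part, with its one-Live-Migration-per-node limit, is the trickiest bit to get right in such a script, so here is a minimal Python sketch of the planning logic. Assumptions: node names are unique, each node being patched is paired with a single hypothetical migration target (in reality SCVMM’s Maintenance Mode picks the destinations through intelligent placement), and run_step is a hypothetical hook where your own PowerShell/SCOM/SCVMM automation would go; the step names are labels, not cmdlets.

```python
from typing import Callable, List, Tuple

# The workflow steps from the list above (labels only, not real cmdlets).
STEPS = [
    "scom_maintenance_on",   # no pesky alerts
    "vmm_maintenance_on",    # evacuate VMs via Live Migration
    "patch_and_restart",
    "vmm_maintenance_off",   # node may run VMs again
    "scom_maintenance_off",  # node is monitored again
]


def plan_batches(nodes: List[str], max_parallel: int) -> List[List[Tuple[str, str]]]:
    """Group nodes into batches of (node_to_patch, migration_target) pairs.

    No node appears twice within a batch, so each node takes part in at most
    one live migration at a time; with n nodes at most n // 2 run in parallel.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to evacuate VMs")
    size = max(1, min(max_parallel, len(nodes) // 2))
    batches = []
    for start in range(0, len(nodes), size):
        sources = nodes[start:start + size]
        targets = [n for n in nodes if n not in sources]
        batches.append(list(zip(sources, targets)))
    return batches


def rolling_patch(nodes: List[str], max_parallel: int,
                  run_step: Callable[[str, str, str], None]) -> None:
    """Walk the batches: every node in a batch goes through each step together."""
    for batch in plan_batches(nodes, max_parallel):
        for step in STEPS:
            for node, target in batch:
                run_step(step, node, target)
    run_step("rebalance_vms", "cluster", "")  # give the last host its share


if __name__ == "__main__":
    for batch in plan_batches(["A", "B", "C", "D"], max_parallel=2):
        print(batch)
    # [('A', 'C'), ('B', 'D')]
    # [('C', 'A'), ('D', 'B')]
```

With four nodes and max_parallel=2 the plan migrates A to C and B to D, then C to A and D to B, which matches the four-node example above. Capacity is not modeled: whether the remaining nodes can actually absorb the load is still yours to check.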

Now with SCVMM2012 we can do this out of the box using WSUS and all of this is achieved without ever interrupting any services provided by the guests as all virtual machines are kept running and are live migrated away from the host that will be patched. If you’re a shop that isn’t running System Center Configuration Manager you can still do this thanks to the use of WSUS and that’s great news.  There is an entire sub-section on the subject of Managing Fabric Updates in VMM 2012 already available on TechNet. But it goes beyond the Hyper-V host. It’s also the SCVMM server, the library server, and the Update Server that get patched. But don’t go wild now, that’s the entire scope of this. That means you still need regular WSUS or SCCM for patching the virtual machine guests and other physical servers. The aim of this solution is to patch your virtualization solution’s infrastructure as a separate entity, not your entire environment.

So how do we get this up and running? Well, it isn’t hard. Depending on your needs and environment, you can choose to run WSUS and SCVMM on the same server or not. If you choose the latter, make sure you install the WSUS Administration Console on the SCVMM server. This is achieved by downloading WSUS 3.0 SP2 and installing it. Otherwise, just use the WSUS role from the roles available on Windows Server 2008 R2, which handles the prerequisites for you as well. It is also advisable to install the WSUS role on a separate server when your SCVMM 2012 infrastructure is a highly available, clustered one (see http://technet.microsoft.com/en-us/library/gg675099.aspx for more information). Time-saving tip: create a separate domain account for the WSUS server integration; it cannot be the SCVMM 2012 service domain account.

Make sure you pay attention to the details in the documentation: don’t forget to install the WSUS 3.0 SP2 Administration Console on the SCVMM 2012 server or servers, and to restart the SCVMM service when asked to. That will save you some trouble. Also, realize that this WSUS server will only be used for updating the SCVMM 2012 fabric and nothing else. So we do not configure anything except the operating system (W2K8R2) and the languages needed. All other options and products that are not related to virtualization are unchecked, as we don’t need them. Combine this with Dynamic Optimization to distribute the VMs for you and you’re golden. A good thing to note here is that you’re completely in control: you, as the virtualization infrastructure / SCVMM 2012 fabric administrator, control what happens regarding updates, service packs, …
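In WSUS this scoping is done with the product and language checkboxes, not with code, but its effect boils down to a simple filter: anything that isn’t for the fabric’s operating system in a language you use never reaches the baseline. A conceptual Python sketch; the Update record and the catalog entries are hypothetical and do not reflect the WSUS API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Update:
    """Hypothetical, simplified view of an update in the catalog."""
    title: str
    product: str
    language: str


def fabric_scope(updates: Iterable[Update], products: Set[str],
                 languages: Set[str]) -> List[Update]:
    """Keep only updates for the selected products and languages."""
    return [u for u in updates
            if u.product in products and u.language in languages]


if __name__ == "__main__":
    catalog = [  # made-up entries for illustration only
        Update("OS security update", "Windows Server 2008 R2", "en"),
        Update("Office update", "Office 2010", "en"),
        Update("OS security update (FR)", "Windows Server 2008 R2", "fr"),
    ]
    selected = fabric_scope(catalog, {"Windows Server 2008 R2"}, {"en"})
    print([u.title for u in selected])  # ['OS security update']
```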

You do need to get used to the GUI a bit when playing around with SCVMM2012 for the first time to make sure you’re in the right spot, but once you get the hang of it you’ll do fine. I’ll leave you with some screenshots of my lab cluster being scanned to check the compliance status and then being remediated. It works pretty neatly.

Here are the hosts being scanned.

You can right-click and select Remediate per baseline, or select the host and choose Remediate from the context menu or the ribbon.

The crusader host is being remediated. I could see it being restarted in the lab.
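Conceptually, the scan and remediate cycle in these screenshots compares each host against a baseline of approved updates and only touches the hosts that are missing something. The Python sketch below models just that bookkeeping; the update and host names are placeholders, and install is a hypothetical hook standing in for what SCVMM does for you (evacuating the host, installing the updates, restarting it). It is not an SCVMM API.

```python
from typing import Callable, Dict, Set


def scan(baseline: Set[str], installed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per host, the baseline updates that are not installed yet."""
    return {host: baseline - have for host, have in installed.items()}


def is_compliant(missing: Dict[str, Set[str]]) -> Dict[str, bool]:
    """A host is compliant when nothing from the baseline is missing."""
    return {host: not gaps for host, gaps in missing.items()}


def remediate(baseline: Set[str], installed: Dict[str, Set[str]],
              install: Callable[[str, Set[str]], None]) -> None:
    """Patch non-compliant hosts one at a time, then record what was installed."""
    for host, gaps in sorted(scan(baseline, installed).items()):
        if gaps:
            install(host, gaps)      # hypothetical hook: evacuate, patch, restart
            installed[host] |= gaps  # bookkeeping only


if __name__ == "__main__":
    baseline = {"update-1", "update-2"}
    installed = {"host1": {"update-1"}, "host2": {"update-1", "update-2"}}
    print(is_compliant(scan(baseline, installed)))  # {'host1': False, 'host2': True}
    remediate(baseline, installed,
              lambda host, ups: print("patching", host, sorted(ups)))
    # patching host1 ['update-2']
    print(is_compliant(scan(baseline, installed)))  # {'host1': True, 'host2': True}
```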