KB2770917 Updating Host & Guest Integration Services Components – Most Current Version Depends on Guest OS

After installing http://support.microsoft.com/kb/2770917 on Windows Server 2012 Hyper-V hosts, the integration services components are upgraded from 6.2.9200.16384 to 6.2.9200.16433. Windows Server 2012 guests get that same upgrade and thus the newer integration services components. Guests with older OS versions needed a different approach, so I turned to all the great PowerShell support now available for Hyper-V to automate this. Pretty pleased with the results of our adventures in PowerShell scripting, I let the script loose on a Hyper-V cluster dedicated to test & development. That cluster also hosts some virtual machines running Windows Server 2003 SP2 (x64) and Windows XP SP3 (x86). Guess what: after running my script and verifying the integration services version, I see that those VMs still report version 6.2.9200.16384. No update. Didn't my new scripting achievement “take” on those older guests?

So I try the install manually and this is what I get:

Why is there no upgrade for these guests? Are they not needed, or do I have an issue? So I mount the ISO and dig around in the files, looking for a clue in the file dates:

It looks like there are indeed no updated components in there for Windows XP/W2K3. Next I look at the following registry key on the host, where I normally use the Microsoft-Hyper-V-Guest-Installer-Win6x-Package value to find out which integration services version my hosts are running:

Bingo! That seems to indicate that we indeed need one version for XP/W2K3 and another for W2K8(R2)/W2K12 and Vista/Windows 7/Windows 8. Cool, but I had to check whether this was indeed as it should be, and I’m happy to confirm all is well. Ben Armstrong (http://blogs.msdn.com/b/virtual_pc_guy/) confirmed that this is by design. There was an update needed for backup that only applied to Windows 8 / Windows Server 2012 guests. As this fix was in a common component for Windows Server 2008 and later, they all got the update. For the older OS versions this was not the case, hence no update is needed, which is reflected in all of the above. In short, your XP SP3 & W2K3 SP2 VMs are just fine running their current version of the integration services and are not in any kind of trouble.

This does leave me with another task. I was planning to enhance my script with progress feedback, some logging, and better logic for clustered and non-clustered environments, but now I also have to handle this case and use the registry keys on the host to verify which IC version I should check against per OS version. Checking only against the one related to the host isn’t good enough :)
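A minimal sketch of that per-OS check in Python, assuming (from the versions mentioned at the top of this post) that Windows XP/W2K3 guests are expected to stay at 6.2.9200.16384 while Vista/Windows Server 2008 and later guests should report 6.2.9200.16433 after KB2770917. The OS family labels, the EXPECTED_IC table and the guest names are hypothetical; a real script would read the expected versions from the host registry values mentioned above instead of hard-coding them.

```python
# Hypothetical table: guest OS family -> integration services version expected
# after KB2770917. A real script should derive these values from the host registry.
EXPECTED_IC = {
    "xp_w2k3": "6.2.9200.16384",  # no update shipped for XP SP3 / W2K3 SP2
    "win6x": "6.2.9200.16433",    # Vista / W2K8(R2) / Win7 / Win8 / W2K12
}


def parse_version(text):
    """Turn '6.2.9200.16433' into (6, 2, 9200, 16433) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def ic_status(os_family, reported):
    """Compare a guest's reported IC version with the one expected for its OS family."""
    expected = parse_version(EXPECTED_IC[os_family])
    actual = parse_version(reported)
    if actual < expected:
        return "update needed"
    if actual > expected:
        return "newer than expected"
    return "current"


if __name__ == "__main__":
    guests = [  # hypothetical inventory: (name, OS family, reported IC version)
        ("w2k3-test01", "xp_w2k3", "6.2.9200.16384"),
        ("w2k12-app01", "win6x", "6.2.9200.16384"),
        ("w2k8r2-web01", "win6x", "6.2.9200.16433"),
    ]
    for name, family, version in guests:
        print(f"{name}: {ic_status(family, version)}")
```

Comparing parsed tuples rather than raw strings matters here: as text, "16433" would sort before "9999".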

Windows Server 2012 VHDX Thin Provisioning Benefits Explored

Thin Provisioning With Hyper-V

Windows Server 2012 provides thin provisioning at the virtual layer via the VHDX file format. It also provides it at the physical storage layer when your storage supports it. For the latter, don’t forget that this also means Storage Spaces! So even in environments where budgets are really tight, you can now leverage this on the physical storage. It’s not just for the owners of feature-rich SANs anymore :)

Even if you use a storage subsystem that does not support thin provisioning at the physical layer, you will benefit from this mechanism when you use dynamic VHDX files. Not only will these grow less, but during shutdown they shrink by the size of the empty blocks. Pretty cool! I do, however, see a potential risk of increased fragmentation. Fragmentation has a negative impact on performance and needs defragmentation to remediate, which in turn affects IO performance. How much of a concern this is depends on your environment and needs. We’ll also have to see in real life how well dynamic VHDX files live up to the performance improvements they got in Windows Server 2012 to entice more people to use them. You have proponents and naysayers. I’m selective and let the circumstances and needs/requirements decide.

Thin Provisioning at the Virtual Layer

You can take a look at the TechEd 2012 session VIR301 by Senthil Rajaram to see how VHD and VHDX compare with regard to thin provisioning. I will not repeat all of that here. Instead, I am going to look at some other situations.

Important note: you get this UNMAP feature automatically in Windows. There’s no need to manually run the Optimize-Volume command we’ll use in the scenarios below. It runs automatically when the standard Defrag scheduled task runs, and the NTFS checkpointing mechanism sends the info down every 5 minutes, so these will normally take care of it all. As the defrag task “only” runs once a week by default, you might want to tweak it or create your own scheduled task if your environment needs it. In demos and labs we’re rather impatient geeks, so even the 5-minute interval of the checkpointing mechanism is too long; we run “Optimize-Volume -DriveLetter X -ReTrim” to get immediate gratification while testing. In real life it’s a zero-touch feature; you don’t need to babysit it.
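For lab runs, a small Python wrapper around that same command makes the retrim step repeatable across test VMs. This is a sketch assuming it runs elevated inside a Windows Server 2012 guest with powershell.exe on the PATH and the Optimize-Volume cmdlet available; the function names are hypothetical, and by default it only prints the command (dry run).

```python
import subprocess


def retrim_command(drive_letter):
    """Build the PowerShell invocation that sends TRIM/UNMAP for one volume now."""
    letter = drive_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Optimize-Volume -DriveLetter {letter} -ReTrim -Verbose",
    ]


def retrim(drive_letter, dry_run=True):
    """Print the command (dry run) or execute it and return the exit code."""
    cmd = retrim_command(drive_letter)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    retrim("X:")  # dry run: only shows what would be executed
```

In production you would not need this at all, as noted above; it only shortens the feedback loop in demos.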

Fixed VHDX versus Dynamic VHDX

For a fixed-size VHDX this optimization does nothing for the file size, and there is no shrink on shutdown either. The only benefit here is that the UNMAP can be passed down to the physical storage, where it helps if that storage supports it. At the virtual layer it doesn’t matter for a fixed-size VHDX disk.

Dynamic VHDX Disk

You’ll profit from the storage savings when the dynamically expanding VHDX file doesn’t need to grow as much. This reduces the overhead of expanding the disk, which is a performance benefit, and it even helps your storage that isn’t thin provisioning capable go further.

Watch Senthil’s presentation (from around minute 20) to see the benefits in action. With VHDX, if you “shift delete” the files inside the VM, then run “Optimize-Volume -DriveLetter X -ReTrim” (or let the defrag job run) and then copy new files, you’ll see that there is no additional file growth as long as you don’t exceed the current size of the VHDX. If you don’t do this, both the VHD and the VHDX file will grow.
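To make the copy, shift-delete, ReTrim, copy sequence concrete, here is a deliberately simplified toy model in Python. It is not the VHDX on-disk format: the file just counts allocated blocks, TRIM marks deleted blocks as reusable, and shutdown releases those blocks. It assumes the worst case without TRIM, namely that the guest writes the new data to offsets the file has never backed, so every write grows the file. Block counts are abstract units.

```python
class ToyDynamicDisk:
    """Toy model of a dynamically expanding virtual disk (not the VHDX format)."""

    def __init__(self):
        self.allocated = 0  # blocks backed by space in the host file (~ file size)
        self.reusable = 0   # allocated blocks the guest has unmapped via TRIM

    def write(self, blocks):
        # New data lands in unmapped blocks first; only the remainder grows the file.
        reused = min(blocks, self.reusable)
        self.reusable -= reused
        self.allocated += blocks - reused

    def delete(self, blocks, trim):
        # Without TRIM the host never learns that these blocks are free.
        if trim:
            self.reusable += blocks

    def shutdown(self):
        # Shrink the file by the unmapped (empty) blocks.
        self.allocated -= self.reusable
        self.reusable = 0


def copy_delete_copy(trim, size=10):
    disk = ToyDynamicDisk()
    disk.write(size)              # copy a file in
    disk.delete(size, trim=trim)  # "shift delete" it
    disk.write(size)              # copy a new file of the same size
    return disk.allocated


if __name__ == "__main__":
    print("with ReTrim:   ", copy_delete_copy(trim=True))   # 10
    print("without ReTrim:", copy_delete_copy(trim=False))  # 20
```

The model also shows why the benefit holds only as long as you don’t exceed the current size: once new writes outgrow the reusable pool, the file grows again.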

But there is another potential benefit that makes this important. Even with the block sizes that were increased to reduce the overhead of growing dynamic VHDX files, we still have to deal with fragmentation of the VHDX files on the storage where they live. The better empty blocks are reused, the less the dynamic files have to grow, which means fewer opportunities for fragmentation. Whether this compensates for the potential extra fragmentation caused by shrinking at shutdown, I don’t know. Whether the performance improvements for dynamic disks are good enough will depend on your environment and needs. Defragmentation can help mitigate this, but IO performance suffers during the defragmentation process. Do it, or better, schedule it, wisely!

Virtual SCSI controller attached versus virtual IDE controller attached

What about a guest (boot) VHDX disk attached to an IDE controller? I see a lot of single-disk virtual machines out there, so it would be a pity if this only worked for VMs with extra vSCSI disks attached. So let’s test this.

Below you see the disk sizes of the VHD and VHDX files and the type of controller each is attached to. As you can see, they had one or two 3.3 GB ISO files copied to them, which were then “shift deleted”. The size of the VHD(X) files reflects the amount of data they stored.

After running the defrag job or executing “Optimize-Volume -DriveLetter X -ReTrim” inside the VM, you’ll see the results below once you shut down the VM.

So as it turns out, the thin provisioning benefits work with IDE-attached VHDX files as well! Yes, inside a Windows Server 2012 virtual machine you get UNMAP support with IDE-attached VHDX disks too. Think of hosting companies with many thousands of single-disk virtual machines that can leverage this as well. This is something you might not expect after watching the video, as it only talks about virtual SCSI/FC controllers.

Conclusion

Doing tests like these is a bit artificial, but they do demonstrate how the technology works. In real life it will translate into efficiencies over time, depending on the data creation and deletion in your VHDX files. Think about hundreds or thousands of virtual machines in your environment leveraging this mechanism. Over time, at that scale, the amount of storage consumed will be reduced, which results in better economics. Combine that with thin provisioning support in Storage Spaces and you see that there are some very interesting scenarios to investigate. Somehow it’s starting to look like you can have your cake and eat it too :) You don’t need an expensive SAN to get these efficiencies at the physical storage layer, but if you have one and used to have to mess around with sdelete or agents, it’s easy to see the benefit you get here as well.

First Windows Server 2012 Cluster/Hyper-V related Patches

With November 2012 Patch Tuesday having come and gone, the first hotfixes (it’s a cumulative update) related to Windows Server 2012 are available. These are relevant to both Hyper-V and Failover Clustering (Scale-Out File Server). There is also an older hotfix that has been brought to our attention, related to certain versions of Windows Server 2008/R2 domain controllers, which is also important for Windows Server 2012 clustering. None of these are urgent/critical and they only apply in specific circumstances, but it’s good to keep up with them and protect your environment.

Windows 8 and Windows Server 2012 cumulative update: November 2012

http://support.microsoft.com/kb/2770917: A collection of small changes. For HA VMs (Hyper-V on a cluster) there are three minor CSV file system fixes in this hotfix: Improves clustered server performance and reliability in Hyper-V and Scale-Out File Server scenarios. Improves SMB service and client reliability under certain stress conditions.

Error code when the kpasswd protocol fails after you perform an authoritative restore: “KDC_ERROR_S_PRINCIPAL_UNKNOWN”

http://support.microsoft.com/kb/976424: Install on every domain controller running Windows Server 2008 Service Pack 2 or Windows Server 2008 R2 in order to add a Windows Server 2012 failover cluster. This fix is included in Windows Server 2008 R2 Service Pack 1, so check whether you need it in your environment.
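The applicability rule for KB976424 boils down to a small decision per domain controller. The Python sketch below is a hypothetical helper that encodes only what the paragraph above states (Windows Server 2008 SP2, and Windows Server 2008 R2 before SP1, need the fix; R2 SP1 already includes it); the OS labels and DC names are made up for illustration, and anything else is reported as not covered rather than guessed.

```python
def kb976424_advice(os_name, service_pack):
    """Hypothetical helper: what the KB976424 note means for one domain controller."""
    if os_name == "Windows Server 2008" and service_pack == 2:
        return "install KB976424"
    if os_name == "Windows Server 2008 R2":
        return "already included (SP1)" if service_pack >= 1 else "install KB976424"
    return "not covered by this note"


if __name__ == "__main__":
    dcs = {  # hypothetical inventory: name -> (OS, service pack level)
        "dc01": ("Windows Server 2008", 2),
        "dc02": ("Windows Server 2008 R2", 0),
        "dc03": ("Windows Server 2008 R2", 1),
    }
    for name, (os_name, sp) in dcs.items():
        print(name, "->", kb976424_advice(os_name, sp))
```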

I’m happy to see Microsoft acting fast on these issues, even if not critical, to serve & protect their customers’ deployments.

Troubleshooting Windows Server 2012 Host-Based CommVault Backups of Hyper-V Guests with the DELL Compellent Hardware VSS Provider: ‘Microsoft Hyper-V VSS Writer’ State: [5] Waiting for completion

We have been running CommVault Simpana 9.0 R2 SP7 in combination with the DELL Compellent Hardware VSS provider to do host-based backups of the virtual machines on our Windows Server 2012 Hyper-V cluster hosts, with great success and speed.

We’ve run into two issues so far. The first one, which I blogged about in DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied, was due to some missing permissions for the domain account we configured the Compellent Replay Manager Service to run with. The solution for that issue can be found in that same blog post.

The other one was that sometimes, during the backup of a Hyper-V host, we got an error from CommVault that put the job in a “pending” status, where it kept retrying and failing. The error is:

Error Code: [91:9], Description: Volume Shadow Copy Service (VSS) error. VSS service or writers may be in a bad state. Please check vsbkp.log and Windows Event Viewer for VSS related messages. Or run vssadmin list writers from command prompt to check state of the VSS writers.

When we look at the Compellent controller we see the following things happen:

  • The snapshots get made
  • They are mounted briefly and then dismounted.
  • They are deleted

The result at the CommVault end is that the job goes into a pending state with the above error. When we look at the state of the Microsoft Hyper-V VSS Writer by running “vssadmin list writers” from an elevated command prompt, we see:

Writer name: ‘Microsoft Hyper-V VSS Writer’
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
   State: [5] Waiting for completion
   Last error: Retryable error
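Checking that state by eye on every node gets old quickly. Below is a Python sketch that parses “vssadmin list writers” output and flags writers that are not in the [1] Stable state or that report a last error other than “No error”. It assumes English-language vssadmin output in the layout shown above (localized systems print different labels), and the function names are hypothetical. On a node you would run print(unhealthy(parse_writers(list_writers()))) from an elevated prompt.

```python
import re
import subprocess

STATE_RE = re.compile(r"State:\s*\[(\d+)\]\s*(.*)")
QUOTES = "'\u2018\u2019"  # straight and curly single quotes around the writer name


def parse_writers(text):
    """Parse 'vssadmin list writers' output into one dict per writer."""
    writers = []
    current = None
    for raw in text.splitlines():
        # Strip indentation; also tolerate '…' used as indentation in pasted copies.
        line = raw.strip().lstrip("\u2026").strip()
        if line.startswith("Writer name:"):
            current = {"name": line.split(":", 1)[1].strip().strip(QUOTES)}
            writers.append(current)
        elif current is not None:
            match = STATE_RE.match(line)
            if match:
                current["state_code"] = int(match.group(1))
                current["state"] = match.group(2).strip()
            elif line.startswith("Last error:"):
                current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers):
    """Writers that are not Stable ([1]) or that report an error."""
    return [
        w for w in writers
        if w.get("state_code") != 1 or w.get("last_error", "No error") != "No error"
    ]


def list_writers():
    """Run vssadmin (Windows, elevated prompt) and return its text output."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True, check=True
    )
    return result.stdout
```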

Note at this stage:

  1. Resuming the job doesn’t help (it actually keeps retrying by itself, but no joy).
  2. Killing the job and restarting it brings no joy either. On top of that, our friendly error “Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied.“ is back, but this time related to the error state of the ‘Microsoft Hyper-V VSS Writer’. The error has now changed a little and has become:

Writer name: ‘Microsoft Hyper-V VSS Writer’
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
   State: [5] Waiting for completion
   Last error: Unexpected error

To get rid of this one, we can restart the host or, less drastically, restart the Hyper-V Virtual Machine Management Service (VMMS.exe), which does the trick as well. Before you do this, drain the node by pausing it, and afterwards resume it with the option to fail back the roles. Windows Server 2012 makes it a breeze to do this without service interruption :)
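The drain, restart, resume sequence can also be scripted. The Python sketch below only builds, and optionally runs, the PowerShell commands. It assumes the FailoverClusters cmdlets Suspend-ClusterNode (with -Drain and -Wait) and Resume-ClusterNode (with -Failback Immediate) as shipped with Windows Server 2012, that “vmms” is the service name of the Hyper-V Virtual Machine Management service, and that it runs elevated on the node itself; the node name is hypothetical, and you should verify the parameters against your own module version before relying on it.

```python
import subprocess


def vmms_restart_plan(node):
    """Hypothetical plan: drain the node, restart VMMS, resume with failback."""
    return [
        # Pause the node and live migrate its roles away; -Wait blocks until done.
        f"Suspend-ClusterNode -Name {node} -Drain -Wait",
        # 'vmms' is the Hyper-V Virtual Machine Management service.
        "Restart-Service -Name vmms",
        # Resume the node and fail the roles back right away.
        f"Resume-ClusterNode -Name {node} -Failback Immediate",
    ]


def run_plan(commands, dry_run=True):
    """Print the steps, or run them in order via PowerShell on the node itself."""
    for command in commands:
        if dry_run:
            print("would run:", command)
            continue
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step stops the sequence before the next one runs.
        subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", command], check=True
        )


if __name__ == "__main__":
    run_plan(vmms_restart_plan("HVNODE01"))  # hypothetical node name, dry run
```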

The Cause: Almost or completely full partitions inside the virtual machines

Looking for solutions when CommVault is involved can be tedious, as their consultancy-driven sales model isn’t focused on making information widely available. Troubleshooting VSS issues can also be considered a form of black art at times. Since this is Windows Server 2012 RTM and the date is September 20th 2012 at the moment of writing, there are not yet any hotfixes related to host-level backups of virtual machines and such. CommVault Simpana 9.0 R2 SP7 is also fully patched.

This, combined with the fact that we did not see anything like this during testing (and we did a fair amount), makes us look at the guests. That’s the big difference on a large production cluster: all those unique guests, each with its own history. We also know from the past years with VSS snapshots in Windows Server 2008 (R2) that these tend to fail due to issues in the guests. Take a peek at Troubleshoot VSS issues that occur with Windows Server Backup (WBADMIN) in Windows Server 2008 and Windows Server 2008 R2, just for starters. As an example, we had already seen one guest (a dev/test server) with 5 users logged in doing all kinds of reconfigurations and installs go into a saved state during a backup, so it could be due to something rotten in certain guests. There is a lot to consider when doing these kinds of backups.

By comparing successful and failed backups, it really looked as if the problem was related to certain virtual machines. A lot of issues are caused by the VSS service not running, or not being able to make snapshots because of lack of space, so perhaps that was the case here as well?

We poked around a bit. First, let’s see what we can find in the Hyper-V specific logs, like the Microsoft-Windows-Hyper-V-VMMS-Admin event log. Ah, lots of errors relating to a number of guests!

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          19/09/2012 22:14:37
Event ID:      10102
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      undisclosed server
Description:
Failed to create the volume shadow copy inside of virtual machine ‘undisclosedserver’. (Virtual machine ID 84521EG0G-8B7A-54ED-2F24-392A1761ED11)
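To get a quick list of which guests are failing, you can pull the 10102 events from that log and extract the VM names. A Python sketch, assuming the built-in wevtutil tool (its qe subcommand with an XPath /q: filter plus the /f:text, /c: count and /rd:true options) and English event descriptions worded like the one above; the helper names are hypothetical.

```python
import re
import subprocess

LOG = "Microsoft-Windows-Hyper-V-VMMS-Admin"
# Matches the VM name in "... inside of virtual machine 'name'" (straight or curly quotes).
VM_RE = re.compile(r"inside of virtual machine [\u2018'](.+?)[\u2019']")


def query_10102(count=200):
    """Return the most recent event 10102 entries as text (Windows only)."""
    cmd = [
        "wevtutil", "qe", LOG,
        "/q:*[System[(EventID=10102)]]",
        "/f:text", f"/c:{count}", "/rd:true",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def failing_vms(event_text):
    """Distinct VM names from 'Failed to create the volume shadow copy' events."""
    return sorted(set(VM_RE.findall(event_text)))
```

On a host, failing_vms(query_10102()) gives the candidates to isolate with Live Migration.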

Well people, that is called a clue ;) So we used Live Migration to isolate the suspect VMs on a single node, ran backups and saw them fail, then did the same with a new, clean VM and it all worked. And indeed, looking at the guests involved when the CommVault backup fails, we see that the VSS service is running and healthy, but we do see all kinds of badness related to disk space:

  • Large SQL Server backup files put aside on the system partition or other disks
  • Application & service pack installers left behind
  • Log and tempdb volumes running out of space
  • Application logs running out of control

That latter one left 0 MB of free disk space on the system partition (the TFS Test Controller shitting itself), but we managed to clear just enough to get to just over 1 GB of free space, which was enough to make the backup succeed.

Servers, virtual or physical, should be locked down to prevent such abuse. I know, I know. Did I already tell you I do not reside in a perfect world? We cannot protect against dev and test server admins who act without much care on their servers. We’ll just keep hammering at it to raise their awareness, I guess. End-user and production servers we monitor well enough to proactively avoid issues. With dev & test servers we don’t, or the response team would spend all day reacting to the alerts that daily dev & test usage on those servers generates.

The fix

  • Clear at least 1 GB, or a bit more, inside each partition of the guests running on the host with a failing backup. I prefer to have at least a couple of GB free (10% to 15%: give yourself some headroom, people).
  • Then you can resume the backup job manually, or let CommVault do that for you if it’s still in a pending state.
  • If you’ve killed the job, make sure you restore the Microsoft Hyper-V VSS Writer to a healthy state as described above. Thanks to Live Migration this can be achieved without any downtime.
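A quick pre-backup check along the lines of that rule of thumb could look like the Python sketch below: it flags any volume with less free space than the larger of 1 GB and 10% of its capacity. The thresholds come from the text above; the function names and the idea of running it inside each guest are assumptions, and shutil.disk_usage only sees the volumes of the machine it runs on.

```python
import os
import shutil
import sys

GB = 1024 ** 3  # 1 GiB


def has_headroom(free, total, min_free=1 * GB, min_fraction=0.10):
    """True if free space meets both the absolute floor and the percentage."""
    return free >= max(min_free, total * min_fraction)


def check_volumes(paths):
    """Return (path, free_gb, total_gb) for every volume that lacks headroom."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)
        if not has_headroom(usage.free, usage.total):
            low.append((path, round(usage.free / GB, 1), round(usage.total / GB, 1)))
    return low


if __name__ == "__main__":
    # Pass volumes such as C:\ D:\ on the command line; defaults to the current root.
    paths = sys.argv[1:] or [os.path.abspath(os.sep)]
    for path, free_gb, total_gb in check_volumes(paths):
        print(f"{path}: only {free_gb} GB free of {total_gb} GB")
```

Raising min_fraction to 0.15 gives the more generous end of the 10% to 15% range.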

Conclusion

There is experimenting, testing, production testing, production, and finally real-life environments where not everything is done as it should be. Yes, really, the world isn’t perfect. Managers sometimes think it’s click, click, Next, click and voilà, we’ve got a complex multisite system running. Well, it isn’t like that, and you need some time and skills to make it all work. Yes, even in today’s “cheap, fast, easy, run your business from your smartphone” ecosystem of the private, hybrid and public cloud, where all is bliss and world peace reigns.

The DELL Compellent Hardware VSS provider & Replay Manager service handle all this without missing a beat, which is very comforting. Previous experiences with hardware VSS providers from other vendors make us think those would probably have blown up by now.