Upgrading a Hyper-V R2 Cluster to Windows 2008 R2 SP1

For all of you waiting to roll out Windows 2008 R2 SP1 to your Hyper-V cluster, here's a quick, screenshot-rich run-through of the process. Some people claim you need to shut down the cluster services and shut down the guests, but this is not the case. You can do a rolling upgrade, and your guests can stay online on the other nodes; just use live migration to get them there. Now I do recommend upgrading all the nodes to SP1 as soon as possible rather than staying in a mixed Windows 2008 R2 / Windows 2008 R2 SP1 situation in your cluster. But this mixed situation is what makes upgrading the nodes in the cluster possible without any downtime for the guests (if you have live migration), which is the aim of having a high availability cluster.
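By the way, during that mixed phase it's handy to keep track of which nodes are already at SP1. Here's a minimal sketch in Python (the node names are hypothetical placeholders; wmic, which ships with the OS, does the actual work and reports "Service Pack 1" in CSDVersion once KB976932 is on):

```python
import subprocess

# Hypothetical node names - replace with your own cluster nodes.
NODES = ["HV-NODE1", "HV-NODE2"]

for node in NODES:
    # CSDVersion is empty on an RTM install and reads
    # "Service Pack 1" once KB976932 has been applied.
    out = subprocess.check_output(
        ["wmic", "/node:" + node, "os", "get", "Caption,CSDVersion"])
    # wmic emits UTF-16 when its output is redirected.
    print(out.decode("utf-16", "replace").strip())
```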

Walk Through

Live migrate all the guests off the node you wish to upgrade to SP1. Make sure the host is fully patched, and disable any antivirus services if you are running them. I always reboot the node before a major upgrade to make sure the server is in a clean state, with no pending reboots or lingering processes that could cause issues.
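If you'd rather script the drain than click through Failover Cluster Manager, the FailoverClusters PowerShell module's Move-ClusterVirtualMachineRole cmdlet performs the live migration for a clustered VM. A minimal sketch, driven from Python (the node names are placeholders):

```python
import subprocess

# Placeholders: the node being drained and the node receiving its guests.
SOURCE = "HV-NODE1"
TARGET = "HV-NODE2"

# Find every clustered "Virtual Machine" resource owned by the source
# node and live migrate its group with Move-ClusterVirtualMachineRole.
ps = (
    "Import-Module FailoverClusters; "
    "Get-ClusterResource | Where-Object { "
    "$_.ResourceType.Name -eq 'Virtual Machine' -and "
    "$_.OwnerNode.Name -eq '" + SOURCE + "' } | "
    "ForEach-Object { Move-ClusterVirtualMachineRole "
    "-Name $_.OwnerGroup.Name -Node '" + TARGET + "' }"
)
subprocess.check_call(["powershell.exe", "-NoProfile", "-Command", ps])
```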

Navigate to the Service Pack 1 file for Windows 2008 R2 (it's called windows6.1-KB976932-X64.exe) and start it up:
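If you prefer a scripted or unattended installation over clicking through the wizard below, the standalone package takes command-line switches. A sketch, assuming the usual /quiet and /norestart switches (run the package with /? to confirm them on your build):

```python
import subprocess

# Path to wherever you staged the SP1 package (placeholder).
SP1 = r"C:\Temp\windows6.1-KB976932-X64.exe"

# /quiet suppresses the wizard, /norestart lets you control the
# reboot yourself; check "windows6.1-KB976932-X64.exe /?" first.
rc = subprocess.call([SP1, "/quiet", "/norestart"])
print("Installer exit code:", rc)
```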

You’ll have to accept the license terms:

And then the preparation process starts:

It is now ready to start the upgrade and yes we want it to reboot automatically when needed:

The upgrade process takes a while (about 17 minutes on my lab servers):


When it’s done it will reboot and bring you back to the logon screen. Multiple reboots might be needed to complete the upgrade process depending on what’s running on your server. In this case, we are dealing with dedicated Hyper-V nodes.

View when connected to the console

View when connected via RDP

After logging on you are greeted with this window:

And yes, this is indeed the case.

Reboot included, the entire process took about 22 to 23 minutes. In the Setup event log you'll find these messages:

  • Initiating changes for package KB976932. Current state is Absent. Target state is Installed. Client id: SP Coordinater Engine.
  • Package KB976932 was successfully changed to the Installed state.

Note: if an extra reboot is required, you'll see an extra entry in between these stating: "A reboot is necessary before package KB976932 can be changed to the Installed state."
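If you want to pull those servicing entries without scrolling through Event Viewer, wevtutil can dump the Setup log as text. A small sketch:

```python
import subprocess

# Dump the 50 most recent Setup log events as text (newest first)
# and keep only the lines mentioning the SP1 package.
out = subprocess.check_output(
    ["wevtutil", "qe", "Setup", "/rd:true", "/c:50", "/f:text"])
for line in out.decode("utf-8", "replace").splitlines():
    if "KB976932" in line:
        print(line.strip())
```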

When you have a cluster with nodes running both W2K8R2 RTM and W2K8R2 SP1 (a mixed situation, so to speak), you'll see the following notification in the cluster events:

You can live migrate the guests from the next node to the node already upgraded to SP1 and then repeat the process. Keep doing this until all your nodes are upgraded.
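Before draining the next node, make sure the freshly upgraded one is back Up in the cluster so it can receive guests. A quick check with cluster.exe (the node name is a placeholder):

```python
import subprocess

# Placeholder: the node that just came back from its SP1 reboot.
NODE = "HV-NODE1"

# cluster.exe ships with the failover clustering feature;
# "node <name> /status" prints the node's state (Up/Down/Paused).
out = subprocess.check_output(["cluster", "node", NODE, "/status"])
print(out.decode("utf-8", "replace"))
```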

As a final recommendation: if you use SCVMM 2008 R2, I would suggest waiting until you get the SCVMM 2008 R2 SP1 bits before you upgrade your clusters, especially when you combine it with SCOM 2007 R2 PRO Tips. Otherwise you don't need to wait; just realize that until you have SP1 for SCVMM 2008 R2 you won't be able to use the new Hyper-V functionality from it. In production, I would not recommend using the RC1 for this.

Please do not forget to update your guests with the new SP1 version of the Hyper-V Integration Components. This is needed to be able to use the new features like Dynamic Memory & RemoteFX. The Windows 2008 R2 RTM version of the Integration Components is 6.1.7600.16385:

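To check which version a guest is actually running, you can look at the VMBus driver the Integration Components install. A sketch to run inside the guest (requires the pywin32 package; I'm assuming vmbus.sys carries the same build number as the IC versions quoted here):

```python
# Run inside the guest. Requires pywin32 (win32api).
import win32api

# vmbus.sys is installed by the Integration Components; its file
# version should read 6.1.7600.16385 on RTM ICs and 6.1.7601.17514
# after the SP1 update.
path = r"C:\Windows\System32\drivers\vmbus.sys"
info = win32api.GetFileVersionInfo(path, "\\")
ms, ls = info["FileVersionMS"], info["FileVersionLS"]
print("vmbus.sys version: %d.%d.%d.%d" %
      (ms >> 16, ms & 0xFFFF, ls >> 16, ls & 0xFFFF))
```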

You can do this using Hyper-V Manager by selecting "Insert Integration Services Setup Disk" and running the setup; this will require a reboot.

Click to start the upgrade process.

It will ask to upgrade or repair the previous version:

Work in progress:

Done and asking for a reboot:

SCVMM 2008 R2 can also be used; here you shut down the guest before updating the "virtual guest services", as they are called in SCVMM 2008 R2. It can be annoying that the nomenclature differs. The good thing here is that you can upgrade multiple guests using VMM 2008 R2. Hans Vredevoort did a blog post on this here: http://www.hyper-v.nu/blogs/hans/?tag=windows-server-2008-r2-sp1. After the upgrade you can see that the version of the Integration Components for Windows 2008 R2 SP1 is 6.1.7601.17514:

Building A New Lab For 2011 And Beyond

Well, with all this (Hyper-V) clustering, virtualization, System Center suite, Exchange 2010 & Lync, SQL Server and iSCSI demands on my lab network, I really need to refresh my hardware. It sounds a bit like a paradox, but such is life for the people building all this stuff: yes, they still need some hardware, pretty beefy machines actually, to set it all up, test it, break it, fix it and keep learning. I've outgrown my four-year-old lab material, into which I can't put more than 4 GB of RAM. Now that I have finished all my infrastructure projects for 2010, I have time to focus on improving my old setup. Or at least I hope so; things are very busy. Thanks to the W2K8R2 SP1 beta I could use Dynamic Memory, which helped me keep churning away at these and various Exchange setups, but now with Lync coming into the picture I want and need an upgrade. A couple of SQL Servers in various high availability setups help eat any remaining resources. Add to that the fact that I want to do some private cloud testing, and there it is: I need hosts with at least an Intel quad core (i7) and at least 16 GB of DDR3 memory. They should have room for extra NIC cards, and I always try to get some speedy disks where it matters.

Now, since Windows Server 2008 R2 added support for Second Level Address Translation (SLAT), which Intel calls Extended Page Tables (EPT) and which AMD calls Nested Page Tables (NPT) or Rapid Virtualization Indexing (RVI), we can make use of better graphics cards. Until now none of my processors had SLAT support. With the Intel i7 (Nehalem) processor I'm good to go. As all machines in my lab are Intel, I'm sticking with that brand; Hyper-V live migration doesn't work between CPU brands.
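If you want to verify SLAT support on a box you already own before spending any money, the Sysinternals Coreinfo tool dumps the virtualization-related CPU features, EPT/NPT included. A quick sketch (the path to Coreinfo is a placeholder):

```python
import subprocess

# Coreinfo (Sysinternals) with -v dumps virtualization-related
# features; look for EPT (Intel) or NPT (AMD) in the output.
out = subprocess.check_output([r"C:\Tools\Coreinfo.exe", "-v"])
print(out.decode("utf-8", "replace"))
```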

So here's a logical overview of my setup. This is what I already have in place with my current hardware, but drawn with my coveted hardware refresh 🙂 Oh yes, the dual 1Gbps switches for iSCSI are new for this setup; I'm adding one so I can play with MPIO in the lab.

For disks I use 300GB/16MB/10,000 rpm and 600GB/32MB/10,000 rpm Raptors, in combination with an external eSATA 1TB/2TB Western Digital Black disk for storage of VHDs, images, backups, etc. I have to buy some extras now. The faster disks are expensive, but a lab environment needs some performance, as waiting around for servers & virtual machines becomes a major annoyance when you need to get work done. The 10,000 rpm disks are great for iSCSI storage, for which I use the iSCSI Target from Windows Storage Server 2008 R2 via my TechNet subscription.

All this kit should keep me up and running from 2011 until the end of 2014. Is this expensive? Yes and no. I can reuse my 1 Gbps Intel NICs and most of my hard disks, and I already have my network switches, monitors and KVM switches. So all in all, it's the new motherboards, CPUs and memory that will eat most of the budget. It's a sum to put out, but here's a note to all IT pros out there: you need to invest in yourself every now and then.

I've blogged about this before in https://blog.workinghardinit.work/2010/02/04/having-a-lab-using-it/. Self-improvement and learning is a continuous process that never ends. Sure, it has some peak moments in financial cost when you need equipment, but remember you don't need to buy it all at once. Talk to your employer about this if you're not self-employed. Look at how much a 5-day advanced course or a conference costs; you can use a lab to learn and experiment for many years to come, so the potential ROI is very good. In the end, what my employers and customers get out of this is knowledge, insight, skills and results. Think about it; it helps to put the investment in perspective.

Sure, I invest more than just the hardware: my time, which is very valuable to me. You can't make more time; everyone has the same 24 hours in a day. Now, it really helps if you like this stuff and have fun whilst learning new technologies or setting up a proof of concept. In a way, what people put into their job and knowledge is an indicator of their professionalism. You do not become an expert by working 9 to 5 and only learning when a course is provided; it's not going to happen. Even a genius only stands out amongst his or her peers by putting in the effort. The same goes for you, but be smart about it: you can work yourself to death and not accomplish anything, so smart & hard is the way to go.

SCVMM 2008 R2 Phantom VM guests after Blue Screen

UPDATE: Microsoft posted a SQL cleanup script to deal with this issue. Not exactly a fix, and let's hope it gets integrated into SCVMM vNext 🙂 Look at the script here: http://blogs.technet.com/b/m2/archive/2010/04/16/removing-missing-vms-from-the-vmm-administrator-console.aspx. There is a link to this and another related blog post in the newsgroup link at the bottom of this article as well.

I've now seen an annoying hiccup in SCVMM 2008 R2 (November 2009) in combination with Hyper-V R2 live migration twice. In both cases a blue screen (due to the "Nehalem" bug, http://support.microsoft.com/kb/975530) was the cause. Basically, when a node in the Hyper-V cluster blue screens, you can end up with some (I've never seen all) VMs on that node being in a failed/missing state. The VMs, however, did fail over to another node and are actually running happily. They will even fail back to the original node without an issue. So, as a matter of fact, all things are up and running. Basically you have a running VM and a phantom one; there are just multiple entries in different states for the same VM. Refreshing SCVMM doesn't help, and a repair of the VM doesn't work either.
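A quick way to see whether a node already has the fix for that stop error is to ask wmic for the hotfix. A sketch:

```python
import subprocess

try:
    # wmic qfe lists installed hotfixes; KB975530 is the fix for
    # the "Nehalem" stop error mentioned above.
    out = subprocess.check_output(
        ["wmic", "qfe", "where", 'HotFixID="KB975530"',
         "get", "HotFixID,InstalledOn"])
    # wmic emits UTF-16 when its output is redirected.
    print(out.decode("utf-16", "replace").strip())
except subprocess.CalledProcessError:
    # wmic exits non-zero when nothing matches, i.e. the
    # hotfix is not installed on this node.
    print("KB975530 not found - patch this node.")
```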

While it isn't a showstopper, it is very annoying and confusing to see a VM guest in a missing state, especially since the VM is actually up and running; you're just seeing a phantom entry. However, be careful when deleting the phantom VM: you'll throw away the running VM as well, since they point to the same files.

Removing the failed/orphaned VM in SCVMM is a no-go when you use shared storage such as CSV, because the phantom entry points to the same files as the running one, and those files are visible to both the node with the good VM and the node with the phantom. Deleting it will ruin your good VM as well.

Snooping around in the SCVMM database tables revealed multiple VMs with the same name but with separate GUIDs (a read-only query to confirm this is sketched below the workarounds). In production it's really a no-go to mess around with those records, not even as a last resort, because we don't know enough about the database schema and its dependencies. So I have found two workarounds that do work (I have used both):

  1. Export the good VM for safekeeping, delete the missing/orphaned VM entry in SCVMM (which takes the good one with it, hence the export) and import the exported VM again. This means downtime for the VM guest.
  2. Remove the Hyper-V cluster from VMM and re-add it. This has the benefit of causing no downtime for the good VM, and the bad/orphaned entry is gone.
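For the curious, here's the kind of read-only query that showed me the duplicates. Look, but don't touch. This is a sketch only: the server and database names are placeholders (VirtualManagerDB is the VMM default), it assumes pyodbc, and dbo.tbl_WLC_VObject is the table Microsoft's cleanup script (linked above) works against, so verify it against your own VMM build first.

```python
import pyodbc

# Placeholders: your SQL Server instance and VMM database name.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=VMMSQL01;"
    "DATABASE=VirtualManagerDB;Trusted_Connection=yes")
cur = conn.cursor()

# SELECT only - never UPDATE or DELETE VMM records by hand.
cur.execute(
    "SELECT Name, COUNT(*) AS Copies FROM dbo.tbl_WLC_VObject "
    "GROUP BY Name HAVING COUNT(*) > 1")
for name, copies in cur.fetchall():
    print("%s appears %d times" % (name, copies))
```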

Searching the net didn't reveal much info, but I did find these threads discussing the issue: http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/1ea739ec-306c-4036-9a5d-ecce22a7ab85 and http://social.technet.microsoft.com/Forums/en/virtualmachinemgrclustering/thread/a3b7a8d0-28dd-406a-8ccb-cf0cd613f666.

I’ve also contacted some Hyper-V people about this but it’s a rare and not well-known issue. I’ll post more on this when I find out.