Fixing Hiccups in The SCVMM2008R2 GUI & Database

As you might know from experience, the System Center Virtual Machine Manager GUI and database sometimes get out of sync with what’s really going on in the cluster. I’ve blogged about this before in SCVMM 2008 R2 Phantom VM guests after Blue Screen and in System Center Virtual Machine Manager 2008 R2 Error 12711 & The cluster group could not be found (0x1395).

The Issue

Recently I had to troubleshoot the “Missing” status of some virtual machines on a Hyper-V cluster in SCVMM2008R2. Rebooting the hosts and guests, restarting agents, … none of the usual tricks for this behavior seemed to do the trick. The SCVMM2008R2 installation was also fully up to date with service packs & patches, so the issue didn’t originate there.

Repair was greyed out, so that was no use. We could have removed the host from SCVMM and added it again. That resets the database entries for that host and can help fix the issue, but it still isn’t guaranteed to work, and you don’t learn what the root cause or the solution is. But none of our usual tricks worked. We could have deleted the VMs from the database, but we didn’t have duplicates. Sure, that doesn’t delete any files or the VM itself, so it should show up again afterwards, but why risk it not showing up again and having to go through fixing that.

The Cause

The VMs were in a “Missing” state after an attempted live migration during a manual patching cycle, where the host was restarted before the “start maintenance mode” action had completed. A couple of those VMs were also being live migrated at the same time with the Failover Cluster GUI. A bit of confusion all around, so to speak, but luckily all VMs were fully operational and servicing applications & users, so no crisis there.

The Fix

DISCLAIMER

I’m not telling you to use this method to fix this issue, but you can at your own risk. As always, please make sure you have good and verified backups of anything that’s of value to you :)

We had to investigate. The good news was that all VMs were up and running, there was no downtime at the moment, and the cluster seemed perfectly happy :)

But there we see the first clue. The virtual machines on the cluster are not running on the node SCVMM thinks they are running on, hence the “Missing” status.

First of all, let’s find out which host the VM is really running on in the cluster, and which host SCVMM thinks it is running on. We run this little query against the VMM database, which gives us all hosts known to SCVMM.

SELECT [HostID],[ComputerName] FROM [VMM].[dbo].[tbl_ADHC_Host]

HostID                                        ComputerName
559D0C84-59C3-4A0A-8446-3A6C43ABF618          node1.test.lab
540C2477-00C3-4388-9F1B-31DBADAD1D8C          node2.test.lab
40B109A2-9E6B-47BC-8FB5-748688BFC0DF          node3.test.lab
C2DA03CE-011D-45E3-A389-200A3E3ED62E          node4.test.lab
6FA4ABBA-6599-4C7A-B632-80449DB3C54C          node5.test.lab
C0CF479F-F742-4851-B340-ED33C25E2013          node6.test.lab
D2639875-603F-4F49-B498-F7183444120A          node7.test.lab
CE119AAC-CF7E-4207-BE0B-03AAE0371165          node8.test.lab
AB07E1C2-B123-4AF5-922B-82F77C5885A2          node9.test.lab

(9 row(s) affected)

Voilà, and now the fun starts. The SCVMM GUI tells us “MissingVM” is missing on node4.

We check this in the database to confirm:

SELECT Name, ObjectState, HostId
FROM VMM.dbo.tbl_WLC_VObject
WHERE Name = 'MissingVM'
GO

Which is indeed node4:

Name       ObjectState  HostId
---------  -----------  ------------------------------------
MissingVM  220          C2DA03CE-011D-45E3-A389-200A3E3ED62E

(1 row(s) affected)


In SCVMM we see that the move of the VM, between node4 and node6, failed.

image

Now let’s take a look at what the cluster thinks … and yes, there it is, running happily on node6 and not on node4. There’s the mismatch causing the issue.
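If you’d rather check this from the command line than from the Failover Cluster GUI, you can ask the cluster for the VM’s owner node directly. A minimal sketch, assuming the FailoverClusters PowerShell module is available on a cluster node and that the VM’s cluster group carries the VM name (group naming can differ per environment):

# Load the failover clustering module (Windows Server 2008 R2 and later).
Import-Module FailoverClusters

# Show which node really owns the VM's cluster group right now.
# The group name is assumed to match the VM name; adjust if yours differs.
Get-ClusterGroup -Name "MissingVM" | Select-Object Name, OwnerNode, State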

So we need to fix this. We can live migrate the VM with the Failover Cluster GUI to the node SCVMM thinks the VM still resides on and see if that fixes it. If it does, great! You do have to give SCVMM some time to detect the change and update its records.

But what to do if that doesn’t work out? We can take the HostId of the node where the VM is really running in the cluster (which we can see in the Failover Cluster GUI) from the query we ran above and then update the record:

UPDATE VMM.dbo.tbl_WLC_VObject
SET HostId  = 'C0CF479F-F742-4851-B340-ED33C25E2013'
WHERE Name = 'MissingVM'
GO

We then reset the ObjectState to 0 to get rid of the “Missing” status. SCVMM would do this automatically, but it takes a while.

UPDATE VMM.dbo.tbl_WLC_VObject
SET ObjectState = '0'
WHERE Name = 'MissingVM'
GO
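If you have to repeat this for several VMs, you can run both updates from PowerShell. This is a minimal sketch, assuming SQL Server’s Invoke-Sqlcmd cmdlet (shipped with the SQL Server 2008 management tools) is available and that the VMM database is named VMM on the local instance; the VM name and HostId are the values from this example:

# Point the VM's database record at the node it really runs on and clear
# the "Missing" state. The same risk and disclaimer as the manual queries apply.
$vmName = "MissingVM"
$realHostId = "C0CF479F-F742-4851-B340-ED33C25E2013"   # HostId of node6 from the earlier query

Invoke-Sqlcmd -ServerInstance "." -Database "VMM" -Query @"
UPDATE dbo.tbl_WLC_VObject
SET HostId = '$realHostId', ObjectState = 0
WHERE Name = '$vmName';
"@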

After some patience & refreshing, all was well again, and a test with live migrations proved that everything worked once more.

As I said before, people get creative in how they achieve things, and the inconsistencies and differences in functionality between Hyper-V Manager, Failover Cluster Manager, and SCVMM 2008 R2 can lead to some confusing situations. I’m happy to see that in Windows 8, actions you should perform using the Failover Cluster GUI or PowerShell are blocked in Hyper-V Manager. But SCVMM really needs a “reset” button that makes it check & validate that what it thinks matches reality.

Upgrading Hyper-V Cluster Nodes to Windows Server 2012 (Beta) – Part 3

This is a multipart series based on some lab tests & work I did.

  1. Part 1 Upgrading Hyper-V Cluster Nodes to Windows Server 2012 (Beta) – Part 1
  2. Part 2 Upgrading Hyper-V Cluster Nodes to Windows Server 2012 (Beta) – Part 2
  3. Part 3 Upgrading Hyper-V Cluster Nodes to Windows Server 2012 (Beta) – Part 3

And we have arrived at part three of my adventures while “transitioning” my Hyper-V cluster nodes to Windows Server 2012. I prefer the term transition as it is more correct. We still cannot do a rolling upgrade of a cluster. We still need to create a new cluster and recuperate the evicted nodes.

I’ll repeat myself here (again) by stating that I did not reinstall the evicted nodes but upgraded them. Why? Because I can, and I wanted to try it out and see what happens. For production purposes I advise you to rebuild nodes from scratch using a well-defined and automated plan if possible. I already mentioned this in Upgrading Hyper-V Cluster Nodes to Windows Server 2012 (Beta) – Part 1.

Moving the Storage & Hyper-V Guests

So we stopped Part 2 at a newly created cluster without any storage. That’s what we’ll be taking care of in this part.  Let’s recap what we already mentioned at the end of Part 2.

We have several options for storage here. We could assign new storage, but a Quick Storage Migration between clusters using SCVMM2008R2 doesn’t fly, as SCVMM2008R2 can’t manage Windows Server 2012 clusters, and I don’t know if it ever will. We could do a good old manual or scripted export of the VMs to the new storage and import them from there, which takes a considerable amount of time. You also need to have the extra storage available.

We can also recuperate the old storage with the VMs still on there. This could get tricky, as no two clusters should be able to see & use the storage at the same time. The benefit is that we can just use the Windows Server 2012 import type “Register the virtual machine in-place” (use the existing unique ID) and be done with it. We’ll try that one. We’ll still have some downtime, but it should be pretty fast. It’s only from Windows Server 2012 on that we’ll be able to do Shared Nothing Live Migrations between clusters :) and life will be good. If you have a SAN, you could also use clones to get this job done with less risk: you work on cloned data and keep the original around, instead of using it for the process described below.

So how do we approach this?

Since Windows Server 2008, storage & clustering isn’t the pain it could be in earlier versions. The disk manager handles all of that, and it makes life a lot easier. All disks presented to a cluster node are offline to the operating system until you bring them online, even if they contain data or are presented to another host, whether that host is a member of another cluster or a stand-alone machine. Pretty cool. It also means you can have all your nodes online during the process. Bringing a disk online, formatting it with NTFS if needed, and adding it to the cluster as storage can all be done on just one of the nodes.
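On a Windows Server 2012 node you can see this behavior from PowerShell as well. A minimal sketch, assuming the in-box Storage module; it simply lists the disks the node can see and whether they are still offline:

# List all disks the node sees; freshly presented LUNs show up as offline.
Get-Disk | Select-Object Number, FriendlyName, Size, IsOffline, OperationalStatus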

As you recall, I unplugged the evicted node from the iSCSI storage (you could also disable the ports) before I upgraded it. The entire iSCSI configuration got upgraded perfectly, so all I needed to do was plug the iSCSI cables back in, and the storage appeared offline. My old cluster node was still up and running, accessing it. Pretty slick! And great as a demo, but you can play it safer. That was fun :) but perhaps we won’t be that brave in a production environment.

Options

You could decide to bring all LUNs over at once or one at a time. The process is the same. If you do it one by one, you’ll have to rely on the above behavior to protect the LUNs against corruption, or you can un-present the LUNs remaining on the old cluster from the new cluster so you’ll never have an issue. We’ve done both, and it works out rather fine in testing. Windows clustering is really doing its best to prevent you from shooting yourself in the foot :)

Let’s say I go LUN by LUN. I can just remove the VMs from the old cluster using the Failover Cluster GUI so they are no longer highly available on that cluster. When I have no more clustered VMs on a CSV LUN, I can shut down all the guests in Hyper-V Manager and stop right there.
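That de-clustering step can be scripted too. A minimal sketch, assuming the FailoverClusters module on a node of the old Windows Server 2008 R2 cluster and a hypothetical VM group name; removing the group only deletes the cluster resources, not the VM’s configuration or VHD files:

Import-Module FailoverClusters

# Remove the VM's cluster group and its resources so the VM is no longer
# highly available; the virtual machine itself stays intact on disk.
Remove-ClusterGroup -Name "MyGuestVM" -RemoveResources -Force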

On the old cluster I remove that LUN from the CSV storage and from the cluster storage. At that moment that LUN is already taken offline for you!

image

Pardon the silly size, but I didn’t have space left to make a realistic screenshot :)

Great, Windows is protecting us against any possible data corruption! So now I can un-present the LUN from the old cluster nodes. The next step is to enable the iSCSI ports, present that LUN to the new cluster node or nodes (depending on where you are in the node-by-node process), or just plug in the cable.

You’ll then see the new LUN, offline, on the new cluster. We can bring the LUN online so it will be available to add to the cluster. Just right-click that disk and select “Online”.

image

image
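The same can be done from PowerShell on the new Windows Server 2012 node. A minimal sketch with the Storage module, assuming the LUN showed up as disk number 3 (check with Get-Disk first):

# Bring the newly presented LUN online and make it writable.
Set-Disk -Number 3 -IsOffline $false
Set-Disk -Number 3 -IsReadOnly $false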

 

Right click on storage

image

 

Select a disk that’s available to add to the cluster.

02

 

Things have gotten a lot simpler with CSV in Windows Server 2012. No more enabling it with a funky warning message that’s well meant but rather confusing and annoying. You just right-click the disk and choose “Add to Cluster Shared Volumes” and that’s it.

image

 

And there it is. That disk in our new cluster is ready to use as a CSV.

image
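Those two GUI steps, adding the available disk to the cluster and promoting it to a CSV, map to two cmdlets as well. A minimal sketch, assuming a single available disk; with several disks you’d filter on name or size first:

Import-Module FailoverClusters

# Add the disk that's available to the cluster as cluster storage ...
$disk = Get-ClusterAvailableDisk | Add-ClusterDisk

# ... and then promote it to a Cluster Shared Volume.
Add-ClusterSharedVolume -Name $disk.Name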

 

So we can now use a nifty new capability in Windows Server 2012 Hyper-V: “Register the virtual machine in-place” (use the existing unique ID).

05

 

The wizard starts.

06

 

Select the folder where your VM or VMs live. Yes, you can do multiple, given that your folder structure allows for this.

07

 

It’s found one VM in our folder.

08

 

We click Next

10

 

We select “Register the virtual machine in-place” (use the existing unique ID) and click “Next”.

11

If something is not right, like some forgotten “saved” states, you’ll get a chance to dump those or cancel the process to deal with it properly before trying again.

12

 

If virtual network names do not match, you’ll get the opportunity to correct that by specifying which virtual switch to use.

13

 

If all was well in the first place, or after you’ve fixed any issues like the ones demonstrated above, you’re good to go. Click “Finish” and enjoy your Windows Server 2012 Hyper-V guest.

18
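The same in-place registration can be scripted with the new Hyper-V PowerShell module. A minimal sketch, assuming a hypothetical CSV path; to my understanding, -Register is the switch that matches the “Register the virtual machine in-place” option:

# Find the VM's configuration file on the CSV (path is hypothetical) ...
$vmConfig = Get-ChildItem "C:\ClusterStorage\Volume1\MyVM\Virtual Machines" -Filter *.xml |
            Select-Object -First 1

# ... and register the VM in place, keeping its existing unique ID.
Import-VM -Path $vmConfig.FullName -Register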

 

At this point you can already start your VMs. I know the next step is to make all these VMs highly available, but here we have some good news as well: you can now make running VMs highly available. Yeah! They no longer need to be shut down. All this is done via the well-known process, so I’m not going to walk through it entirely here. But the screenshot of making a running VM highly available is worth posting :)

addrunningvm
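For completeness, making the (running) VM highly available can also be done from PowerShell. A minimal sketch, assuming a hypothetical VM name:

Import-Module FailoverClusters

# Make the running VM highly available on the new cluster; in Windows
# Server 2012 this no longer requires shutting the guest down.
Add-ClusterVirtualMachineRole -VirtualMachine "MyVM"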

Upgrading Hyper-V Cluster Nodes to Windows 8 (Beta) – Part 2

This is a multipart series based on some lab tests & work I did.

  1. Part 1 Upgrading Hyper-V Cluster Nodes to Windows Server 8 (Beta) – Part 1
  2. Part 2 Upgrading Hyper-V Cluster Nodes to Windows 8 (Beta) – Part 2
  3. Part 3 Upgrading Hyper-V Cluster Nodes to Windows 8 (Beta) – Part 3

Here’s part two of my adventures while upgrading, or rather “transitioning”, my Hyper-V cluster nodes to Windows 8. Transition is more correct, as you cannot upgrade a cluster; you create a new cluster and recuperate the nodes. I did, however, not reinstall them but upgraded them. Why? Because I can, and I wanted to try it out and see what happens. For production purposes I advise you to rebuild nodes from scratch using a well-defined and automated plan if possible. I already mentioned this in Upgrading Hyper-V Cluster Nodes to Windows Server 8 (Beta) – Part 1.

So we stopped Part 1 with an evicted and upgraded node. We’ll want to create a new cluster with that node and then transition the other nodes over to the new Windows 8 cluster one by one, or in batches, depending on how many you can afford to take down at one time. In this part we’ll just build our new Windows 8 cluster with a single node. It’s a good thing this is possible, as we can start a transition with just one node. This is an easy part.

First of all we create a new cluster. It will all look very familiar if you’ve ever created a Windows 2008 (R2) cluster.

image

 

The Create Cluster Wizard appears. Read all the advice you want and click “Next”.

02

 

We select the node that we evicted from the old cluster and upgraded to Windows 8

03

04

 

You now run the validation test for your cluster

06

Let’s run ‘m all and see what it has to say.

07

 

We get a summary of which nodes will be tested and what tests will be run. Click “Next”.

08

 

The tests are running.

09

 

We get a pass with some warnings, so we click “View Report” to take a look. It’s OK: we only have one node, we don’t have storage yet, and networking-wise we still need to configure some things, but we can create a one-node cluster. So click “Finish”.

10

image

 

I named my new cluster “warriors”, the old one was called “warrior”.

12

 

I define the IP Address for the Access Point for administering the cluster

13

 

We’re ready to create the cluster so we click “Next” and the creation process starts

15

16

 

And we’re informed we have successfully created a cluster. Click “Finish”. Any experienced cluster builder should find this process very familiar and without surprises.

17
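The whole wizard run can be condensed into a couple of cmdlets if you prefer PowerShell. A minimal sketch, using the cluster name from this lab; the node name and IP address are hypothetical stand-ins for your own:

Import-Module FailoverClusters

# Validate the single evicted-and-upgraded node ...
Test-Cluster -Node "node1.test.lab"

# ... and create the new one-node cluster with its administrative access point.
New-Cluster -Name "warriors" -Node "node1.test.lab" -StaticAddress "192.168.1.50"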

 

So now we have a cluster consisting of one node, and we haven’t got any storage assigned yet.

We have several options for storage here. We could assign new storage, but a Quick Storage Migration between clusters using SCVMM2008R2 doesn’t fly, as SCVMM2008R2 can’t manage Windows 8 clusters, and I don’t know if it ever will. We could do a good old manual or scripted export and import of the VMs, which takes a considerable amount of time.

We can recuperate the old storage with the VMs still on there. This could get tricky, as no two clusters should be able to see & use the storage at the same time. The benefit is that we can just use the Windows 8 import type “Register the virtual machine in-place” (use the existing unique ID) and be done with it. We’ll try that one. We’ll still have some downtime, but it should be pretty fast. It’s only from Windows 8 on that we’ll be able to do Shared Nothing Live Migrations between clusters :) We’ll address that in Part 3.

Integration Services Version Check Via Hyper-V Integration/Admin Event Log

I’ve written before (see "Key Value Pair Exchange WMI Component Property GuestIntrinsicExchangeItems & Assumptions") on the need for, and PowerShell ways of, determining the version of the integration services or integration components running in your guests. These need to be in sync with the one running on the hosts, meaning that all the hosts in a cluster should be running the same version, as should the guests.
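As a refresher, the gist of that approach looks something like the sketch below. It assumes the Windows Server 2008 R2 WMI namespace root\virtualization and that the guests expose the intrinsic KVP item "IntegrationServicesVersion"; guests that are off or lack the components won’t report anything:

# Query every VM on this host and read the integration services version the
# guest reports through the Key Value Pair exchange component.
$ns = "root\virtualization"
$vms = Get-WmiObject -Namespace $ns -Query `
    "SELECT * FROM Msvm_ComputerSystem WHERE Caption = 'Virtual Machine'"

foreach ($vm in $vms) {
    $kvp = Get-WmiObject -Namespace $ns -Query `
        "ASSOCIATORS OF {$($vm.__PATH)} WHERE ResultClass = Msvm_KvpExchangeComponent"
    if ($kvp) {
        foreach ($item in $kvp.GuestIntrinsicExchangeItems) {
            $xml = [xml]$item
            $name = ($xml.INSTANCE.PROPERTY | Where-Object { $_.NAME -eq 'Name' }).VALUE
            if ($name -eq 'IntegrationServicesVersion') {
                $data = ($xml.INSTANCE.PROPERTY | Where-Object { $_.NAME -eq 'Data' }).VALUE
                "{0} : {1}" -f $vm.ElementName, $data
            }
        }
    }
}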

During an upgrade with a service pack this gets the necessary attention: scripts (PowerShell) are written to check versions and create reports, and normally you end up with a pretty consistent cluster. Over time, virtual machines are imported, inherited from another cluster, or created on a test/developer host and shipped to production. I know, I know, this isn’t something that should happen, but I don’t always have the luxury of working in a perfect world.

Enough said. This means you might end up with guests that are not running the most recent version of the integration tools. Apart from checking manually in the guest (which is tedious; see my blog "Upgrading a Hyper-V R2 Cluster to Windows 2008 R2 SP1" on how to do this) or running the previously mentioned script, you can also check the Hyper-V event log.

Another way to spot virtual machines that might not have the most recent version of the integration tools is via the Hyper-V logs. In Server Manager you drill down under “Diagnostics” to “Event Viewer” and then navigate your way through "Applications and Services Logs", "Microsoft", "Windows" until you hit “Hyper-V-Integration”.

image

Take a closer look and you’ll see the warning about 2 guests having an older version of the integration tools installed.

image

As you can see, it records a warning for every virtual machine whose integration services are older than those of the host running Hyper-V. This makes it easy to grab a list of guests needing some attention. The downside is that you need to check all hosts; not too bad for a small cluster, but not very efficient on the larger ones.
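That per-host legwork is easy to script. A minimal sketch, assuming remote event log access to each node and hypothetical node names; since the exact channel name may vary, it is discovered with -ListLog rather than hard-coded:

# Collect integration-services warnings from every cluster node.
$nodes = "node1", "node2", "node3"   # hypothetical node names

foreach ($node in $nodes) {
    # Find the Hyper-V-Integration admin channel on this node.
    $log = Get-WinEvent -ComputerName $node -ListLog "*Hyper-V-Integration*" |
           Select-Object -First 1 -ExpandProperty LogName

    # Level 3 = Warning; these are the "older version" entries.
    Get-WinEvent -ComputerName $node -FilterHashtable @{ LogName = $log; Level = 3 } `
        -ErrorAction SilentlyContinue |
        Select-Object MachineName, TimeCreated, Message
}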

So just remember this as another way to spot virtual machines that might not have the most recent version of the integration tools. It’s not a replacement for some cool PowerShell scripting or the BPA tools, but it is a handy quick way to check the version for all the guests on a host when you’re in a hurry.

It might be nice if integration services version management became easier in the future, meaning a built-in way to report on the versions in the guests and an easier way to deploy them automatically when they’re not part of a service pack (this is the case when the guest OS and the host OS differ, or when you can’t install the SP in the guest for some application compatibility reason). You can do this in bulk using SCVMM, and of course scripting with PowerShell comes to the rescue here again, especially when dealing with hundreds of virtual machines in multiple large clusters. Orchestration via System Center Orchestrator can also be used. Integration with WSUS would be another nice option for those that don’t have Configuration Manager or Orchestrator, but that’s not supported as far as I know for now.