UPDATE: Microsoft posted a SQL clean-up script to deal with this issue. It's not exactly a fix, so let's hope it gets integrated into SCVMM vNext 🙂 You can find the script here: http://blogs.technet.com/b/m2/archive/2010/04/16/removing-missing-vms-from-the-vmm-administrator-console.aspx. There is a link to this and another related blog post in the forum threads at the bottom of this article as well.
I’ve seen an annoying hiccup in SCVMM 2008 R2 (November 2009) in combination with Hyper-V R2 Live Migration twice now. In both cases a blue screen (due to the “Nehalem” bug http://support.microsoft.com/kb/975530) was the cause. Basically, when a node in the Hyper-V cluster blue screens, you can end up with some (I’ve never seen all) of the VMs on that node being in a Failed/Missing state. The VMs, however, did fail over to another node and are actually running happily. They will even fail back to the original node without an issue. So, as a matter of fact, everything is up and running. Basically you have a running VM and a phantom one: there are just multiple entries in different states for the same VM. Refreshing SCVMM doesn’t help, and repairing the VM doesn’t work either.
While it isn’t a show stopper, it is very annoying and confusing to see a VM guest in a Missing state, especially since the VM is actually up and running. You’re just seeing a phantom entry. However, be careful when deleting the phantom VM, as you’ll throw away the running VM as well: they point to the same files.
Removing the failed/orphaned VM in SCVMM is a no-go when you use shared storage, like for example CSV, because the phantom entry points to the same files as the running one and those files are visible to both the node with the good VM and the node with the phantom one. Meaning it will ruin your good VM as well.
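If you want to confirm from the VMM side that the duplicate entries really do share the same storage before you touch anything, a minimal read-only check along these lines can help. This is a sketch only: it assumes the VMM 2008 R2 PowerShell snap-in is installed where you run it, “vmmserver01” is a placeholder, and the exact property names/values (e.g. a Status of “Missing”) may differ in your environment.

```
# Read-only check: list VM entries that share a name and the files they point to.
# Assumes the VMM 2008 R2 PowerShell snap-in; "vmmserver01" is a placeholder.
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager -ErrorAction SilentlyContinue
Get-VMMServer -ComputerName "vmmserver01" | Out-Null

Get-VM | Group-Object Name | Where-Object { $_.Count -gt 1 } | ForEach-Object {
    Write-Host "Duplicate entries for VM '$($_.Name)':"
    # Status shows which entry is the phantom (e.g. Missing); Location shows the files they point to.
    $_.Group | Select-Object Name, ID, Status, Location | Format-Table -AutoSize
}
```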
Snooping around in the SCVMM database tables revealed multiple VMs with the same name but with separate GUIDs. In production it’s really a NO GO to mess around with those records, not even as a last resort, because we don’t know enough about the database schema and its dependencies.
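For the curious, this is roughly the kind of read-only query that surfaces the duplicates. Again, a sketch only: the server name, the default database name VirtualManagerDB, and the table/column names (dbo.tbl_WLC_VObject, Name, ObjectId) are assumptions, so verify them against your own instance and keep it strictly SELECT-only; modifying anything here is exactly what you should not do.

```
# Read-only peek at the VMM database to list VM names that appear more than once.
# ASSUMPTIONS: server name, default database "VirtualManagerDB" and the table/column
# names (dbo.tbl_WLC_VObject, Name, ObjectId) -- verify against your own instance.
$connectionString = "Server=SQLSERVER01;Database=VirtualManagerDB;Integrated Security=True"
$query = @"
SELECT Name, ObjectId
FROM dbo.tbl_WLC_VObject
WHERE Name IN (SELECT Name FROM dbo.tbl_WLC_VObject GROUP BY Name HAVING COUNT(*) > 1)
ORDER BY Name
"@

$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
$connection.Open()
$command = $connection.CreateCommand()
$command.CommandText = $query
$reader = $command.ExecuteReader()
while ($reader.Read()) {
    # One line per entry: the same VM name should show up with different GUIDs.
    "{0}  {1}" -f $reader["Name"], $reader["ObjectId"]
}
$connection.Close()
```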
So I have found two workarounds that do work (I’ve used them both):
- Export the good VM for safekeeping, delete the missing/orphaned VM entry in SCVMM (which takes the good one with it, hence the export) and import the exported VM again. This means downtime for the VM guest.
- Remove the Hyper-V cluster from VMM and re-add it. This has the benefit that it causes no downtime for the good VM and that the bad/orphaned entry is gone (see the sketch after this list).
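Below is a rough outline of that second workaround using the VMM 2008 R2 cmdlets. The server, cluster and host group names are placeholders, and the exact parameter names may differ slightly on your build, so treat it as a sketch rather than a ready-to-run script.

```
# Workaround 2 (sketch): remove the Hyper-V cluster from VMM and re-add it.
# The VMs keep running on the hosts; only VMM's view of the cluster is rebuilt.
# "vmmserver01", "hvcluster01" and "All Hosts" are placeholders.
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager -ErrorAction SilentlyContinue
Get-VMMServer -ComputerName "vmmserver01" | Out-Null

# Drop the cluster (and its hosts) from the VMM console.
$cluster = Get-VMHostCluster -Name "hvcluster01"
Remove-VMHostCluster -VMHostCluster $cluster

# Re-add the cluster; VMM rediscovers the nodes and their VMs without the phantom entries.
$hostGroup  = Get-VMHostGroup | Where-Object { $_.Name -eq "All Hosts" }
$credential = Get-Credential   # domain account with admin rights on the cluster nodes
Add-VMHostCluster -Name "hvcluster01.domain.local" -VMHostGroup $hostGroup -Credential $credential
```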
Searching the net didn’t reveal much info, but I did find this thread that discusses the issue http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/1ea739ec-306c-4036-9a5d-ecce22a7ab85 as well as this one http://social.technet.microsoft.com/Forums/en/virtualmachinemgrclustering/thread/a3b7a8d0-28dd-406a-8ccb-cf0cd613f666.
I’ve also contacted some Hyper-V people about this, but it’s a rare and not well-known issue. I’ll post more on this when I find out more.