November 2019 updates caused Windows Server 2012 reboot loop

Introduction

Some laggards that still have some Windows Server 2012 virtual machines running got a bit of a nasty surprise last weekend. A number of them went into a boot loop. Apparently the November 2019 updates caused Windows Server 2012 reboot loop. Well, we have dealt with update issues before like here in Quick Fix Publish : VM won’t boot after October 2017 Updates for Windows Server 2016 and Windows 10 (KB4041691). No need to panic.

Symptoms

Not all Windows Server 2012 virtual machines were affected. We did not see any issues with Windows Server 2012 R2, 2016 or 2019. Well the symptom is a reboot loop and below I have a sequence of what it looked like visually.

November 2019 updates caused Windows Server 2012 reboot loop
Restarting
Looking good so far …
OK, stage 2 of 4 … stlil seems OK.
November 2019 updates caused Windows Server 2012 reboot loop
Could still be OK … but no, its starts again in an endless loop.

So what caused this Windows Server 2012 reboot loop?

Fix

We turned of the virtual machines on which the November 2019 updates caused Windows Server 2012 reboot loop. We started them up again in “Safe mode” which completed successfully. Finally, we then did a normal reboot and that completed as well. All updates had been applied bar one. That was the 2019-11 Servicing Stack Update for Windows Server 2012 (KB4523208).

We manually installed it via Windows update and that succeeded.

When reading the information about this update https://www.catalog.update.microsoft.com/ScopedViewInline.aspx?updateid=5fa2a68f-e7cd-43c7-a48a-5e080472cb77 its states it need to be installed exclusively.

Maybe that was the root cause. It got deployed via WSUS with the other updates for November 2019.

Anyway, all is well now. I remind you that we have at least 2 ways of restoring those virtual machines. In case we had not been able to fix them. Have known good backups and a way to restore them people.

Conclusion

We fixed the issue and patched those servers completely. So all is well now. Except for the fact that they now , once more, have been urged to get of this operating system asap. You don’t go more than N-2 behind. It incurs operational overhead and risks. They did not test updates against this old server VM and got bitten. Technology debt without a plan is never worth it.

Manually Merging Hyper-V Checkpoints

A Last-ditch Effort

First of all you need to realize this might not work. It’s a last-ditch effort. There is a reason why you have backups (with tested restores) and why you should monitor your environment for things that are not as they should be. Early intervention can pay off.

Also, see blog post on a couple of more preferred actions.

If you have lost checkpoints, you have basically lost data and corruption/data inconsistencies are very much a possibility and reality. If the files have been copied and information about what file is the parent the dates/timestamps are what you have to go by. You might not know for sure if you have them all.

Setting up the demo

For demo purposes, we take a test VM and ad files to indicate what checkpoint we’re at.

We start with ORGINAL.TXT on the desktop and we create a checkpoint, which we rename accordingly.

image

We add a file called CHECK01.TXT and we create a checkpoint, which we rename accordingly.

image

We add a file called CHECK02.TXT and we create a checkpoint, which we rename accordingly.

image

We add a file called NOW.TXT no more checkpoints are taken.

image

The file names represent the content you’d see disappear if you applied the checkpoint and we have reflected this in the name for the checkpoints.

image

As we want to merge all the snapshots and end up with a usable VHDX we’ll work back from the most recent differencing disk until all is merged. As you can see this is a straightforward situation and I hope you’ll never be running having to deal with a vast collection of subtrees.

Finding out what are the parents of avhdx files

In this demo, it’s pretty obvious what snapshots exist and what avhdx files they represent. We’ve even shown you the single tree visualized in Hyper-V Manager. In reality, bad things have happened and you don’t see this information anymore. So you might have to find out yourself. This is done via inspect disk in Hyper-V manager. If you’re confused about what the parent is of (a)vhdx files this tool will help you find out or show you what the most recent one was.

image

Sometimes the original files have been renamed or moved and that it will show you the last known valid parent.

image

Manually Merging the checkpoints

Remember to make a copy of all files as a backup! Also, make sure you have enough free disk space … you need working space! You might need another shot at this. As we want to merge all the snapshots and end up with a usable vhdx we’ll work back from the most recent differencing disk until all is merged in the oldest one which is the vhdx. You can look at the last modified timestamps to find out the correct order in which to work. The most recent avhdx is the one used in the virtual machine configuration file and locate the information for the virtual hard disk.

image

The configuration file’s avhdx is the one containing the “NOW” running state of the VM.

Note: You might find some information that you need to rename the extension avhdx to vhdx (or avhd to vhd). The reason for this was that in Windows 2008 Hyper-V Manager did not show avhd files in the Edit virtual disk wizard. You can still do this and it will still works, but you do not need to. Ever since Windows Server 2008 R2 avhd (and with since Windows Server 2012 avhdx) files do show up in Hyper-V Managers Disk edit.

For some insights as to why the order is important read this blog by Ben Armstrong What happens when a snapshot is being merged? [Hyper-V]

WARNING: If you did not start with the most recent one and work your way down, which is the easiest and least confusing way all is not lost. But you will have to reconnect the first more recent (a)vhdx to one to its new parent. This is needed as by merging a snapshot out of order more recent one will have lost it’s will have lost its original parent.

Here’s how to do this: Select reconnect. This is the page you’ll get if you’d start edit disk wizard as all other options are unavailable due to the missing parent.

image

The wizard will tell you what used to be the parent and allow you to select a new one. Make sure to tick the check box for Ignore ID mismatch or the reconnect will fail as you’re previous out of order merge has created a new a(vhdx). If your in this pickle by renaming (a)vhdx files or after a copy this isn’t needed by the way.

image
Follow the wizard after that and when your done you can launch the edit disk wizard again and perform a merge. It’s paramount that you do not mix up orders when doing so that you reconnect to the parent this or you’ll end up in a right mess. There are many permutations, keep it simple!. Do it in order. If you start having multiple checkpoint trees/subtrees things can get confusing very fast.

You might also have to reconnect if the checkpoints have lost their connection the what they know to be their last parent for other reasons. In that case, you do this and when that’s done, you merge. Rinse and repeat. The below walk-through assumes you have no reconnects to be done. If so it will tell you like in the example above.

Walk through:

Open the Edit Disk Wizardimage

Select the most recent avhdx & click “Next”

image

image

We choose to merge the avhdx

image

In our case into its parent disk

image
Verify the options are correct and click “Finish”

image

Let the wizard complete

image

That’s it. You’ve merged the most recent snapshot into it’s parent. That means that you have not lost the most recent state of the virtual machine as when it was running before you shut it down. This can be verified by mounting the now most recent avhdx and looking at the desktop for my user profile. You can see the NOW.txt text file is there!

OK, dismount the avhdx and now it’s rinse and repeat.

image

image

You do this over and over again until your merge the last avhdx into the vhdx.

image

Than you have the vhdx you will use to create a new virtual machine.

image

Make sure you get the generation right.

image

Assign memory

image

Connect to the appropriate virtual switch or not if you’re not ready to do this yet

image

Use your vhdx disk that’s the remaining result of your merging efforts

image

image

When you boot that virtual machine you’ll see that all the text files are there. It’s as if you’ve deleted the checkpoints in the GUI and retained “NOW” in the vhdx.

image

image

Last but not least, you can use PowerShell or even DiskPart for this but I found that most people in this pickle value a GUI. Use what you feel most comfortable with.

Thanks for reading and hope this helps someone. Do remember “big boy” rules apply. This is not safe, easy or obvious in each and every situation so you are responsible for everything you do in your environment. If your in to deep, way over your head, etc. call in some expert help.

3 Ways To Deal With Lingering Hyper-V Checkpoints Formerly Known as Snapshots

Lingering or phantom Hyper-V checkpoints or snapshots

Once in a while the merging of checkpoints, previously known as snapshots, in Hyper-V goes south. An example of this is when checkpoints are not cleaned up and the most recent avhdx or multiple of these remains in use as active virtual disk/still even as you don’t see them anymore as existing in the Hyper-V Manager UI for example. When that happens you can try looking at the situation via PowerShell to see if that show the same situation. Whatever the cause, once in while I come across virtual machines that have one or more avhdx (or avdh) active that aren’t supposed to be there anymore. In that case you have to do some manual housekeeping.

Now please, do not that in Windows Server 2012(R2) Hyper-V replica is using checkpoints and since Windows Server 2012 R2 backups also rely on this. Just because you see a snapshot you didn’t create intentionally, don’t automatically think they’re all phantoms. They might exits temporarily for good reason Winking smile. We’re talking about dealing with real lingering checkpoints.

Housekeeping

Housekeeping comes in a couple of variants form simply dusting of to industrial cleaning. Beware of the fact that the latter should never be a considered a routine operation. It’s not a normal situation. It’s a last ditch resort and perhaps you want to call support to make sure that you didn’t miss anything else.

Basically you have tree options. In order of the easiest & safest to do first these are:

  1. Create a new checkpoint and delete it. Often that process will take care of merging the other (older) lingering avhd/avhdx files as well. This is the easiest way to deal with it and it’s as safe as it gets. Hyper-V cleans up for you, you just had to give it a kick start so to speak.
  2. Shut down the VM and create a new checkpoint. Export that newly created checkpoint. Yes you can do that. This will create a nicely exported virtual machine that only has the relevant vhd/vhdx files and no more checkpoints (avhd/avhdx). Do note that this vhd/vhdx is dynamically expanding one. If that is not to your liking you’ll need to convert it to fixed. But other than that you can dump the old VM (don’t delete everything yet) and replace it by importing the one you just exported. For added security you could first copy the files for save guarding before you attempt this. image
  3. Do manual mergers. This is a more risky process & prone to mistakes. So please do this only on a copy of the files. That way you’ll give Microsoft Support Services a fighting change if things don’t work out or you make a mistake. Also note that in this case you get one or more final VHDX files which you’ll use to create a new virtual machine with to boot from. It’s very hands on.

So that’s the preferred order of things to try/do in regards to safety. The 3rd option, is the last resort. Don’t do it before you’ve tried options 1 and 2. And as said above, if you do need to go for option 3, do it on copies.If you’re unsure on how to proceed with any of this, get an expert involved.

There’s actually another option which is very save but not native to Hyper-V. In the running virtual machine which current state you want to preserve do a V2V using Disk2vhd v2.01. Easy and sort of idiot proof if such a thing exists.

In a next blog post I’ll walk you through the procedure for the 3rd option. So if this is your last resort you can have practiced it before you have to use it in anger. Bit please, if needed, and do make sure it’s really needed as discussed above, try 1 first. If that doesn’t do it. Then try option 2. If that also fails try option 3. Do not that for option 2 and 3 you will have to create a new virtual machine with the resulting VHDX, having the required settings documented will help in this case.

DELL Has Great Windows Server 2012 R2 Feature Support – Consistent Device Naming–Which They Help Develop

The issue

Plug ‘n Play enumeration of devices has been very useful for loading device drivers automatically but isn’t deterministic. As devices are enumerated in the order they are received it will be different from server to server but also within the system. Meaning that enumeration and order of the NIC ports in the operating system may vary and “Local Area Connection 2” doesn’t always map to port 2 on the  on board NIC. It’s random. This means that scripting is “rather hard” and even finding out what NIC matches what port is a game of unplugging cables.

Consistent Device Naming is the solution

A mechanism that has to be supported by the BIOS was devised to deal with this and enable consistent naming of the NIC port numbering on the chassis and in the operating system.

But it’s even better. This doesn’t just work with on board NICs. It also works with add on cards as you can see. In the name column it identifies the slot in which the card sits and numbers the ports consistently.

In the DELL 12th Generation PowerEdge Servers this feature is enabled by default. It is not in HP servers for some reason, you need to turn in it on manually.

I first heard about this feature even before Windows Server 2012 Beta was released but as it turns out Dell has been involved with the development of this feature. It was Dell BIOS team members that developed the solution to consistently name network ports and had it standardized via PCI SIG.  They also collaborated with Microsoft to ensure that Windows Server 2012 would support all this.

Here’s a screen shot of a DELL R720 (12th Generation PowerEdge Server) of ours. As you can see the Consistent Device Naming doesn’t only work for the on broad NIC card. It also does a fine job with add on cards of which we have quite a few in this server.image

It clearly shows the support for Consistent Device Naming for the add on cards present in this server. This is a test server of ours (until we have to take it into production) and it has a quad 1Gbps Intel card, a dual Intel X520 DA card and a dual port Mellanox 10Gbps RoCE card. We use it to test out our assumptions & ideas. We still need a Chelsio iWarp card for more testing mind you Winking smile

A closer look

This solution is illustrated the in the “Device Name column” in the screen shot below. It’s clear that the PnP enumerated name (the friendly name via the driver INF file) and the enumerated number value are very different from the number in Name column ( NIC1, NIC2, NIC2, NIC4) even if in this case where by change the order is correct. If the operating system is reinstalled, or drivers changed and the devices re-enumerated, these numbers may change as they did with previous operating systems.

image

The “Name” column is where the Consistent Device Naming magic comes to live. As you can see you are able to easily identify port names as they are numbered consistently, regardless of the “Device Name” column numbering and in accordance with the numbering on the chassis or add on card. This column name will NEVER differ between identical servers of after reinstalling a server because it is not dependent on PnP. Pretty cool isn’t it! Also note that we can rename the Name column and if we choose we can keep the original name in that one to preserve the mapping to the physical hardware location.

In the example below thing map perfectly between the Name column and the Device Name column but that’s pure luck.image

On of the other add on cards demonstrates this perfectly.image