Upgrading MySQL 5.7.21 to 8.0.11

Upgrading MySQL 5.7.21 to 8.0.11 for WordPress 4.9.6

The process of upgrading MySQL 5.7.21 to 8.0.11 for WordPress 4.9.6 is actually quite easy (see the official MySQL documentation). And as I maintain my WordPress version and plugins regularly, that's not a big deal for me.

Prepare the upgrade

  1. Backup your VM
  2. Backup your MySQL database (see the command sketch right after this list)
  3. Verify the restores work
  4. I also copy the data folder (in my case C:\MySQLDataFolder\Data), which I keep separate from the MySQL installation files as it helps me with upgrades. The only things in the C:\Program Files\MySQL\MySQL Server X.Y folder (I use the x64 bits) are the MySQL application files and the my.ini file.
  5. Create the C:\Program Files\MySQL\MySQL Server 8.0 folder (for the x64 version of MySQL, otherwise use the C:\Program Files (x86) folder). Copy the contents of the zip file with the MySQL files and folders in there.
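For steps 2 and 4, a minimal sketch from an elevated command prompt could look like this. The backup target paths are just examples I made up; adjust them and the 5.7 bin path to your own layout, and make sure the target folders exist:

rem Dump all databases, including stored routines and events, to a single SQL file
"C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqldump.exe" -u root -p --all-databases --routines --events > C:\Backup\mysql-5.7-full.sql
rem Mirror the data folder to a backup location; do this with the MySQL service stopped so the copied files are consistent
robocopy C:\MySQLDataFolder\Data C:\Backup\MySQLData /MIR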


I then copy the my.ini file from the current installation (C:\Program Files\MySQL\MySQL Server 5.7) into C:\Program Files\MySQL\MySQL Server 8.0 as well. It might be necessary to edit this file a bit more later, but I start out with an exact copy and one change to point to the new basedir: 5.7 becomes 8.0 in this case.

As you notice, I don't keep the MySQL data and the ini file in the ProgramData folder on Windows. It's fine to leave it all there, if you prefer that.

# Path to installation directory. All paths are usually resolved relative to this.
basedir="C:/Program Files/MySQL/MySQL Server 8.0/"
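Since I keep the data outside the installation folder, I also double check that the datadir entry still points to my data location. In my layout (adjust the path to yours) that entry looks like this:

# Path to the database root, kept outside the MySQL installation folder
datadir="C:/MySQLDataFolder/Data/"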

Perform the upgrade

  • Stop the MySQL service:
    NET STOP MYSQL
  • I then remove the service from the OS:
    mysqld --remove
  • Install the MySQL service again, now with the new version path. As you see, I explicitly specify the default name of the service as MySQL and point it to where my ini file lives, so I know which ini file this service uses.
    "C:\Program Files\MySQL\MySQL Server 8.0\bin\mysqld.exe" --install MySQL --defaults-file="C:\Program Files\MySQL\MySQL Server 8.0\my.ini"
  • Start the MySQL service:
    NET START MYSQL

If all goes well, that's it: your new MySQL version is running. If so, we can jump to the part where you run the upgrade command to upgrade the system and user databases. You can verify all went well in the error log. The error log's name (mine is called WORKINGHARDINIT.err) is defined in the my.ini and is found under the data folder. Any problems will be logged there as well. This approach makes it easy to go back if the service won't start, as all files of the previous MySQL install are still there and you just have to install it as a service again.
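A quick way to confirm the new binaries are actually the ones serving requests is to ask the server for its version from an elevated command prompt (path as in my setup, adjust if yours differs):

"C:\Program Files\MySQL\MySQL Server 8.0\bin\mysql.exe" -u root -p -e "SELECT VERSION();"

If that returns 8.0.11, the service is running the new install.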

Most common issues I have seen

My.ini file mistakes

The thing that goes wrong most often and causes the MySQL service not to start, based on some of the support I have given to people (including myself), is the following: certain options in your ini file are not compatible with the MySQL version you just installed.

Specifically for MySQL 8.0 make sure you comment out query_cache_size=0 (put a # in front of it) or remove the entry from the my.ini file.
#query_cache_size=0

If not, the MySQL service won't start. The error logged is:
[ERROR] [MY-011071] [Server] unknown variable 'query_cache_size=0'

Next to that, if you have the sql-mode entry in there, this sometimes causes issues, so comment out that line as well, or at least remove the offending entry, which might take some trial and error.
# sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
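The offending part in that example is NO_AUTO_CREATE_USER, which was removed in MySQL 8.0. If you prefer to keep an explicit sql-mode rather than commenting the whole line out, a value without that option should be accepted; verify it against your own requirements:

sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"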

As you can see, the error log is the pointer to many issues with the MySQL service failing to start, so do look there; that's how you find out what to do.

Data file ibdata1 is not writable

Sometimes you can see an error like this:
[ERROR] [MY-012271] [InnoDB] InnoDB: The innodb_system data file 'ibdata1' must be writable

The two most common reasons are that the file is locked because you started mysqld.exe manually and didn't close it, or because some process you don't know about or cannot find is accessing it. A restart is then normally the easiest solution.
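If you would rather find the culprit than reboot, the Sysinternals Handle tool (a separate download, not part of Windows) can show which process holds the file open. From an elevated prompt, something like:

handle.exe ibdata1

Any process listed still has the data file open and needs to be closed before the MySQL service can start.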

Sometimes people run into this due to permission problems (but with an in-place upgrade this should not really happen). The MySQL service account (the default one or the one you created and assigned) needs full control over the data folder. Check that.
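To grant that from an elevated prompt, a sketch could look like the line below. I'm assuming here that the service runs under the NT SERVICE\MySQL virtual account; check the Log On tab of the service properties and substitute whatever account yours actually uses (if it runs as LocalSystem, permissions are rarely the problem):

icacls "C:\MySQLDataFolder\Data" /grant "NT SERVICE\MySQL:(OI)(CI)F" /T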

Thinking you lost your blog

Another issue might be that MySQL is running but WordPress can't find your blog. This can be caused by one or more missing files, such as ibdata1, in the data folder root (hence I always make a copy of the entire data folder before I start an upgrade, for safe keeping). The service might start, but in that case you'll see a lot of warnings similar to this in the error log:

[Warning] InnoDB: Cannot open table wordpress592/wp_options from the internal data dictionary of InnoDB though the .frm file for the table exists.

So if these files are gone or corrupted, put them back from your spare copy or grab them from a backup. Most often these files are gone because of some bad advice to delete them to fix a permission issue, or by mistake. If they are not there they are recreated with the correct permissions, but that causes the above issue.

Once you know the basics and you are careful, the upgrade process is mostly straightforward for most IT enthusiasts. Once MySQL is up and running and you can access your WordPress database again, it's time for the last step: upgrading the databases.

Perform the MySQL upgrade run

Finally, whenever you have done an upgrade, don't forget to run mysql_upgrade.exe. This will take care of any upgrades needed to your system and user databases. Until you do, you'll see warnings about that in the error log. If you don't look there you might not even notice much, but it pays to complete the entire upgrade process.

To do so, from an elevated command prompt navigate to C:\Program Files\MySQL\MySQL Server 8.0\bin and run mysql_upgrade.exe -u root -p
Then enter the password and the upgrade process will kick off. This takes a while; how long depends on the amount of work the script has to do.

It first deals with the system databases; when it finds anything out of date it will take care of it, like the sys schema in this case.


Next it checks the user databases (in my case the WordPress database),


…  not much work to do here clearly.


That's it. You're all up to date. I normally do this about once or twice per year to make sure the blog server is up to date (performance, security, capabilities) and I don't build up tech debt and potentially more involved and risky upgrade scenarios. With the free community edition, that's a zero-cost game, bar a little effort on your part.

As a final note, when everything has proven to be up and running as it should, do some housekeeping and clean up the old files and folders of previous MySQL installs that you no longer need.

Latency kills

Introduction

I was investigating a very problematic Windows Server 2016 Hyper-V cluster. That cluster was performing horribly. "Everything" was hanging, stalling or crashing; RHS.exe errors were flying around while WER dumps got created by the dozen. Things were extremely slow, up to the point where functionality was just failing. The "fun" thing was that the cluster validation wizard, while slow, gave that cluster a big thumbs up and a supported status, as if all was well.

Prying around

Time to pry around a bit and see if we could find something wrong. We saw live migrations stall, fail, last forever in pending, or get stuck at a certain percentage, sometimes finally succeeding with ridiculous blackout times. We could not open virtual machine properties, or only very slowly. The FCI GUI was highly unresponsive, but so was the Hyper-V Manager GUI and even PowerShell. Those were hanging at even loading the virtual machines or enumerating them with Get-VM. Everything was slow to the point it timed out or crashed. Restarting the services (Cluster, Hyper-V) didn't do anything, and restarting VMMS was super slow or just got stuck. It was a depressing sight, for which people tended to blame Hyper-V / Microsoft.

As the title gives away, it was latency. Not just ordinary high latency. Real bad latency. The kind of latency that kills. Extreme latency produces symptoms that are similar to bugs or corrupt components of roles and features. We have a tendency to look at those first in the event logs, and then we look at the network and its usual suspects (VMQ, SET, DCB). But nothing pointed to an issue that I could find.

So, storage maybe? Well, we did find one Hyper-V host in the cluster with one HBA port producing too many errors, so we disabled that FC port for testing. No joy; after a clean reboot of all nodes the Hyper-V cluster remained problematic. So on to the storage array itself.

Well, holy smoke! On the two CSV volumes in that cluster we saw latencies that were so bad I could not even believe a single VM would boot. It actually made my appreciation for Hyper-V and clustering grow, as it managed to do at least a couple of things. With such latencies I would expect the services to just crash and call it a day.
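For a quick sanity check of what the hosts themselves see, the standard PhysicalDisk performance counters already tell you a lot; a minimal PowerShell sketch (the counter names are the built-in Windows ones, the sample interval and count are arbitrary):

# Sample read and write latency on all disks every 5 seconds, 12 samples
Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Read','\PhysicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 12

The per-HBA latency discussed below was taken on the storage array side, not on the hosts.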


The horrific latency on one of the CSV LUNs.

Looking at the logs, we saw that the latencies occurred on the FC HBAs of the controllers. Each one was above 50ms, peaking at 150-250ms, with one huge peak at almost 500ms. We saw this on all four HBAs.


The latency on one of the 4 FC HBAs on one of the controllers. Not a good day. All HBAs had high latencies like this.

The issues were not at the host level (host HBAs), nor even at the IOPS/bandwidth level of the storage itself. The latency, for some reason, was spiking. Further investigation led to the conclusion that the issue was related to synchronous replication going totally wrong. Moving the replication mode to asynchronous fixed that. We're now investigating why this happened and how to prevent it from happening again. But that's another story.


Latency on one of the 4 FC HBAs on one of the controllers after we fixed the issue.

Do not assume anything

So, there you go. Everything depends on everything in some direct or indirect way. It's all connected, and that, my friends, is why I'm a proponent of "service resilience engineering", where the responsible team owns the entire stack. That is how you can act fast.

Frustrations about host level backups of Hyper-V guest clusters with Windows Server 2016

Introduction

With Windows Server 2016 came the hope and promise of improved backups for Hyper-V environments. And indeed Microsoft delivered on that and has given us faster, more scalable and more reliable backups. With VHD sets also came the promise of host based backups for guest clusters.

The problem is that this promise, or, as it is perhaps better to be mild and careful, that expectation, has not been met. Decent, robust host based backups of guest clusters in Windows Server 2016 are still not a reality. For me this means it blocks a few scenarios, and we're working on alternatives. This is a missed opportunity, I think, for MSFT to excel at virtualization.

The problem

Doing host based backup of guest clusters with VHD Set disks is supported in Windows Server 2016 under certain conditions.

At RTM it became clear that CSV inside the guest cluster was not supported.

You need a healthy cluster with all disks online

These requirements are reflected in the errors discovered during backup of VHDS in guest clusters:

Error code: ‘32768’. Failed to create checkpoint on collection ‘Hyper-V Collection’

Reason: We failed to query the cluster service inside the Guest VM. Check that cluster feature is installed and running.

Error code: ‘32770’. Active-active access is not supported for the shared VHDX in VM group

Reason: The VHD Set disk is used as a Cluster Shared Volume. This cannot be checkpointed

Error code: ‘32775’. More than one VM claimed to be the owner of shared VHDX in VM group ‘Hyper-V Collection’

Reason: Actually we test whether the VHDS is used by exactly one owner, so having 0 owners also creates this error. The reason here was that the shared drive was offline in the guest cluster

Unfortunately, this is not the only problem people are facing. Quite often the backup software doesn't support backing up VHD Sets, or when it does, the backups fail. Some of those failings, like being unable to checkpoint the VHD Set, have been addressed via Windows Updates. But there are other issues.

Let’s look at the two most common ones.

Issue 1

You can make one backup, and all subsequent backups fail. This is due to the avhdx files being in use and locked. This means that as long as the cluster is up and running, the recovery checkpoint chain keeps growing. This can be "cleaned up" or merged, but only by taking down the cluster.
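You can watch that chain build up on the hosts with a small PowerShell sketch like the one below; it simply lists recovery checkpoints (the kind created for host level backups) on all VMs on the host you run it on:

# List recovery checkpoints on all VMs on this Hyper-V host
Get-VMSnapshot -VMName * | Where-Object { $_.SnapshotType -eq 'Recovery' } | Select-Object VMName, Name, CreationTime

If that list keeps growing after every backup run, you are in the situation described here.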

At the first backup, life seems good.


The recovery checkpoint as a collection is indeed working.


All attempts at another backup fail.


Shutting down all cluster VMs and starting them up again does merge the recovery checkpoints.

Issue 2

You can make backups successfully, but the recovery checkpoints never get merged.

This sounds “better” but it isn’t. There is no way to merge the checkpoint. Manually merging the checkpoints of a VHD Set is bad voodoo.

Both situations get you into problems and I have found no solution so far. At the time of writing I'm back in the "never ending" recovery checkpoint chain situation. But that can change back to the first issue, I guess. Sigh.

I have found no solution so far

For now I have been unable to solve these problems. There is no fix or even a workaround. The only way to get out of this stalemate is to shut down every node of the guest cluster and then restart them all. Just a restart of the guest nodes of the cluster doesn't do the trick of releasing the checkpoint files and merging them. While this allows you to take one backup successfully again, the problem returns immediately. For your reference, that was my issue with the October 2017 CU (KB)
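If you need to script that brute-force workaround, a minimal sketch could look like this; the names in $guestNodes are hypothetical placeholders for your guest cluster node VMs:

# Hypothetical guest cluster node VM names; replace with your own
$guestNodes = 'GUESTCLU-N1', 'GUESTCLU-N2'
# Shut down every node of the guest cluster so the recovery checkpoints can be released and merged
Stop-VM -Name $guestNodes
# Start the nodes again afterwards
Start-VM -Name $guestNodes

Remember this only buys you one more successful backup before the problem returns.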

The other scenario we run into is that the backups do work but the recovery checkpoints never, ever merge. Not even when you shut down all the guest VM cluster nodes and start them again. With frequent backups that turns into a disaster of a never ending chain of recovery checkpoints. This is actually the situation I was in again after the November 2017 updates on both guests and hosts (KB4049065: Update for Windows Server 2016 for x64-based Systems and KB4048953: 2017-11 Cumulative Update for Windows Server 2016 for x64-based Systems).

To me this situation is blocking the use of guest clustering with VHD Sets where a backup is required. For many reasons we do not wish to go the route of iSCSI or vFC to the guest. That doesn’t cut it for us.

Conclusion

Host level backups of guest clusters in Windows Server 2016 are still a no go, despite the good hopes we had that VHD Sets, which we were eagerly awaiting, would address this limitation. For many of us this is a show stopper for the successful virtualization of guest clusters. Every month we try again and we're not getting anywhere. Hence the frustration and the disappointment.

More than a year after Windows Server 2016 RTM we still cannot do a consistent host level backup of a Hyper-V guest cluster, not even of those without CSV, but also not of those with standard clustered disks. Trust me on the fact that many of us have given this feedback to Microsoft. They know, and I suggest you keep voicing your concerns to them in order to keep it on their radar screen and higher on the priority list. You can do this by opening support calls and by asking for it on UserVoice. Please Microsoft, we need these workloads to be first class citizens. I'm clearly not the only unhappy camper out there, as noticeable in various support forums: Cannot create checkpoint when shared vhdset (.vhds) is used by VM – 'not part of a checkpoint collection' error and Backing up a Windows Failover Cluster with Shared vhdx?

Quick Fix Publish : VM won’t boot after October 2017 Updates for Windows Server 2016 and Windows 10 (KB4041691)

If you had WSUS (or SCCM) running tonight with auto approval on, you might have woken up this morning to virtual machines that can't boot anymore.


Great, another update gone wrong. Time to restore from backup, as that can be the fastest way to restore services when in a pickle, if you have a good solution for that in place. For the others, you can do what I did below. Actually, a couple of us MVPs were on this issue at a number of sites as our first task this morning. But first, the root cause.

Well, read this link, Express update delivery ISV support, and you have all you need. Basically the delta and the full cumulative update of October (KB4041691 – https://support.microsoft.com/en-us/help/4041691) ended up in WSUS without you explicitly putting them there. That should not happen; normally the delta is not published for download, and heaven forbid auto approved. You could also have manually approved everything without really knowing what and why. Not a great idea at all.


So your VM gets offered both of them and that is BAD!


Normally you only get into this pickle if you somehow managed to install both of these yourself or via other tools (see the link above), which you shouldn't do.

Now, if you don't have decent restore capabilities from backups or snapshots, there is another way out: removing the updates.

Boot into the problematic VM and select Troubleshoot.


Select to open the command prompt and stay away from any other auto repair options.


Microsoft advises to get rid of the SessionsPending reg key. To do so load the software registry hive as follows:

reg load hklm\temp c:\windows\system32\config\software

Delete the SessionsPending registry key, if it exists, by running:

reg delete "HKLM\temp\Microsoft\Windows\CurrentVersion\Component Based Servicing\SessionsPending" /v Exclusive

Unload the software registry hive:

reg unload HKLM\temp

Run dism /image:c:\ /get-packages to find the updates installed that caused the issue


The yellow ones are the ones of interest, and you can see the first one never even got an install time.
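If you find the full /get-packages output unwieldy, a small convenience is to filter it down to just the package identity and state lines (assuming, as in this scenario, that the problematic packages show a pending state):

dism /image:c:\ /get-packages | findstr /i "Identity State"

That makes it easier to spot anything that is not simply reported as Installed.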

We now use DISM to remove these updates. Do first create the C:\Temp folder (md C:\Temp) if it doesn't exist yet!

dism /image:c:\ /remove-package /packagename:myproblematicpackagetoremove /scratchdir:c:\temp


When done, close the command prompt, shut down the VM and then start it.


It will take a while, but it will succeed and you'll be greeted by a logon screen. Good luck!

Important: Do not try any other repair options, or removing the updates with DISM might fail. We chose to remove all 3 updates from tonight to make sure. It might suffice to remove the delta one alone, but we wanted to have a VM back exactly as it was last night so more testing can be done before it is deployed again.

So, basically: don't auto approve updates blindly, but test, validate and roll out in phases. Have great backups and TESTED restores. All in all, we were only bitten in the lab, a couple of test/dev VMs and some of our infra VMs. Most of these are redundant and are patched staggered, so our services were never badly affected. That gave us time to troubleshoot, investigate and warn our colleagues. As you can see here, the issue was a delta update that made it into WSUS and was installed together with the full CU. Just manually downloading the CU and testing it would not have given you a heads-up about the issue. This is a reminder that you need to test your real live situation and processes as realistically as possible. When you're done with testing and cleaning up any fallout of this issue, make sure to patch your systems again!

Update: this also goes for Windows 10 Updates

Also see fellow MVP Mikael Nystrom's blog post: https://deploymentbunny.com/2017/10/11/the-october-2017-update-inaccessible-boot-device/

Update: we now also have the official MSFT response & fix for each and every scenario right here https://support.microsoft.com/en-us/help/4049094/windows-devices-may-fail-to-boot-after-installing-october-10-version-o