Microsoft Azure AD Sync service fails to start – event id 528

IMPORTANT UPDATE: Microsoft released Azure AD Connect 2.1.1.0 on March 24th, 2022 which fixes the issue described in this blog post). You can read about it here Azure AD Connect: Version release history | Microsoft Docs The fun thing is they wrote a doc about how to fix it on March 25th, 2022. The best option is to upgrade to AD Connect 2.1.1.0 or higher.

IMPORTANT UPDATE 2: Upgrade to version 2.1.15.0 (or higher) as that version also addresses LocalDB corruption issues!
Introduction

On Windows Server 2019 and Windows Server 2022 running AD Connect v2, I have been seeing an issue since October/November 2021 where Microsoft Azure AD Sync service fails to start – event id 528. It does not happen in every environment, but it does not seem to go away when it does. It manifests clearly by the Microsoft Azure AD Sync service failing to start after a reboot. If you do application-consistent backups or snapshots, you will notice errors related to the SQL Server VSS writer even before the reboot leaves the Microsoft Azure AD Sync service in a bad state. All this made backups a candidate for the cause. But that does not seem to be the case.

Microsoft Azure AD Sync service fails to start - event id 528
Microsoft Azure AD Sync service fails to start – event id 528

In the application event log, you’ll find Event ID 528 from SQLLocalDB 15.0 with the below content.

Windows API call WaitForMultipleObjects returned error code: 575. Windows system error message is: {Application Error}
The application was unable to start correctly (0x%lx). Click OK to close the application.
Reported at line: 3714.

Getting the AD Connect Server operational again

So, what does one do? Well, a Veeam Vanguard turns to Veeam and restores the VM from a restore point that a recent known good AD Connect installation.

But then the issue comes back

But then it comes back. Even worse, the AD Connect staging server suffers the same fate. So, again, we restore from backups. And guess what, a couple of weeks later, it happens again. So, you rebuild clean AD Connect VMs, and it happens again. We upgraded to every new version of AD Connect but no joy. You could think it was caused by failed updates or such, but no.

The most dangerous time is when the AD Connect service restarts. Usually that is during a reboot, often after monthly patching.

Our backup reports a failure with the application consistent backup of the AD Connect Server, often before Azure does so. The backup notices the issues with LocalDB before the AD Sync Service fails to start due to the problems.

The failing backups indicate that there is an issue with the LoclaDB database …

However, if you reboot enough, you can sometimes trigger the error. No backups are involved, it seems. That means it is not related to Veeam or any other application consistent backup. The backup process just stumbles over the LocalDB issue. It does not cause it. The error returns if we turn off application-consistent backups in Veeam any way. We also have SAN snapshots running, but these do not seem to cause the issue.

We did try all the tricks from an issue a few years back with backing up AD Connect servers. See https://www.veeam.com/kb2911 but even with the trick to prevent the unloading of the user profile
COM+ application stops working when users logs off – Windows Server | Microsoft Docs we could not get rid of the issue.

So backups, VSS, it seems there is a correlation but not causation.

What goes wrong with LocalDB

After a while, and by digging through the event and error logs of a server with the issue, we find that somehow, the model.mdf and model.ldf are toast for some inexplicable reason on a pseudo regular basis. Below you see a screenshot from the C:\Windows\ServiceProfiles\ADSync\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances\ADSync2019\Error.log. Remember your path might differ.

That’s it, the model db seems corrupt for some reason.

You’ll find entries like “The log scan number (37:218:29) passed to log scan in database ‘model’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf).”

Bar restoring from backup, the fastest way to recover is to replace the corrupt model DB files with good ones. I will explain the process here because I am sure some of you don’t have a recent, good know backup.

Sure, you can always deploy new AD Connect servers, but that is a bit more involved, and as things are going, they might get corrupted as well. Again, this is not due to cosmic radiation on a one-off server. Now we see it happen sometime three weeks to a month apart, sometimes only a few days apart.

Manual fix by replacing the corrupt model dd files

Once you see the  SQLLocalDB event ID 528 entries in the application logs when your Microsoft Azure AD Sync service fails to start, you can do the following. First, check the logs for corruption issues with model DB. You’ll find them. To fix the problem, do the following.

Disable the Microsoft Azure AD Sync service. To stop the service that will hang in “starting” you will need to reboot the host. You can also try and force kill ADSync.exe via its PID

Depending on what user account the AD Sync Service runs under, you need to navigate to a different path. If you run under NT SERVICE\ADSync you need to navigate to

Microsoft Azure AD Sync service fails to start - event id 528
The account the Microsoft Azure AD Sync service runs under

C:\Windows\ServiceProfiles\ADSync\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances\ADSync2019

Welcome to the home of the AD Connect LocalDB model database

If you don’t use the default account but another one, you need to go to C:\Users\ YOURADSyncUSER\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances\ADSync2019

Open a second explorer Windows and navigate to C:\Program Files\Microsoft SQL Server\150\LocalDB\Binn\Templates. From there, you copy the model.mdf and modellog.ldf files and paste those in the folder you opened above, overwriting the existing, corrupt model.mdf and model.ldf files.

You can now change the Microsoft Azure AD Sync service back to start automatically and start the service.

If all goes well, the Microsoft Azure AD Sync service is running, and you can synchronize to your heart’s content.

Conclusion

If this doesn’t get resolved soon, I will automate the process. Just shut down or kill the ADSync process and replace the model.mdf and model.ldf files from a known good copy.

Here is an example script, which needs more error handling but wich you can run manually or trigger by monitoring for event id 528 or levering Task Scheduler. As always run this script in the lab first. Test it, make sure you understand what it does. You are the only one responsible for what you run on your server! Once you are done testing replace Write-Host with write-output or turn it into a function and use cmdletbinding and param to gain write-verbose if you don’t want all the output/feedback. Bothe those options are more automation friendly.

cls
$SQLServerTemplates = "C:\Program Files\Microsoft SQL Server\150\LocalDB\Binn\Templates"
$ADConnectLocalDB = "C:\Windows\ServiceProfiles\ADSync\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances\ADSync2019"

Write-Host -ForegroundColor Yellow "Setting ADSync startup type to disabled ..."
Set-Service ADSync -StartupType Disabled

Write-Host -ForegroundColor Yellow "Stopping ADSync service  ..."
Stop-Service ADSync -force

$ADSyncStatus = Get-Service ADSync

if ($ADSyncStatus.Status -eq 'Stopped'){
    Write-Host -ForegroundColor Cyan "The ADSync service has been stopped  ..."
}
else {
    if ($ADSyncStatus.Status -eq 'Stopping' -or $ADSyncStatus.Status -eq 'Starting'){
        
        Write-Host -ForegroundColor Yellow "Setting ADSync startup type to disabled ..."
        Set-Service ADSync -StartupType Disabled

        Write-Host -ForegroundColor Red "ADSync service was not stopped but stuck in stoping or starting ..."
        $ADSyncService = Get-CimInstance -class win32_service | Where-Object name -eq 'ADSync'
        $ADSyncProcess = Get-Process | Where-Object ID -eq $ADSyncService.processid

        #Kill the ADSync process if need be ...
        Write-Host -ForegroundColor red "Killing ADSync service processs forcfully ..."
        Stop-Process $ADSyncProcess -Force

        #Kill the sqlserver process if need be ... (in order to be able to overwrite the corrupt model db files)
        Write-Host -ForegroundColor red "Killing sqlserver process forcfully ..."
         $SqlServerProcess = Get-Process -name "sqlservr" -ErrorAction SilentlyContinue
         if($SqlServerProcess){
        Stop-Process $SqlServerProcess -Force}

        }
    }

$ADSyncStatus = Get-Service ADSync
if ($ADSyncStatus.Status -eq 'Stopped'){

    Write-Host -ForegroundColor magenta "Copy known good copies of model DB database to AD Connect LocaclDB path file ..."
    Copy-Item "$SQLServerTemplates\model.mdf" $ADConnectLocalDB

    Write-Host -ForegroundColor magenta "Copy known good copy of model DB log file to AD Connect LocaclDB path ..."
    Copy-Item "$SQLServerTemplates\modellog.ldf" $ADConnectLocalDB


    Write-Host -ForegroundColor magenta "Setting ADSync startup type to automatic ..."
    Set-Service ADSync -StartupType Automatic

    Write-Host -ForegroundColor magenta "Starting ADSync service ..."
    Start-Service ADSync
}

$ADSyncStatus = Get-Service ADSync
if ($ADSyncStatus.Status -eq 'Running' -and $ADSyncStatus.StartType -eq 'Automatic'){
    Write-Host -ForegroundColor green "The ADSync service is running ..."
}
else {
    Write-Host -ForegroundColor Red "ADSync service is not running, something went wrong! You must trouble shoot this"
}


That fixes this cause for when Microsoft Azure AD Sync service fails to start – event id 528. For now, we keep an eye on it and get alerts from the AD Connect health service in Azure when things break or when event id occurs on the AD Connect servers. Let’s see if Microsoft comes up with anything.

IMPORTANT UPDATE: Microsoft released Azure AD Connect 2.1.1.0 on March 24th 2022 which fixes the issue described in this blog post). You can read about it here Azure AD Connect: Version release history | Microsoft Docs The fun thing is the wrote a doc about how to fix it on March 25th 2022. The best option is top upgrade to AD Connect 2.1.1.0 or higher.

PS: I am not the only one seeing this issue Azure AD Sync Connect keeps getting corrupted – Spiceworks

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server for WordPress 5.31

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server for WordPress

Many of my readers will know I run a tight ship when it comes to keeping my blog infrastructure up to date and as secure as I can. This means, among other things, I keep my operating system recent and patched. The same goes for my MySQL database, PHP versions, and WordPress + plugins.

Recently I wanted to upgrade to PHP 7.4 that was just released and should be compatible with WordPress 5.3.1.

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server

But once I switched to PHP 7.4 my site responded with the dreaded 500 Internal Server Error. Instead of going trial and error I decided to zoom in fast to the root cause to resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server.

Troubleshooting steps

The fasted way to figure out what issues PHP has is to open an elevated command prompt and navigate to the install path of your PHP version. For me, that would be C:\Program Files\PHP\php-7.4.0-nts-Win32-vc15-x64. Note that IIS with requires the non thread safe version of PHP.

Once there run php -m

I got greeted by this warning.

PHP Warning: ‘vcruntime140.dll’ 14.0 is not compatible with this PHP build linked with 14.16 in Unknown on line 0

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server

That ‘s clear enough. I need to upgrade my Visual C++ runtime. I have but I need a more recent version. Basically the one for Visual ++ 2105-2019. You can grab those here for both x64 and x86.

https://aka.ms/vs/16/release/VC_redist.x64.exe
https://aka.ms/vs/16/release/VC_redist.x86.exe

I tend to install both. Just in case I have an issue with x64 PHP I can try the x86 version to try and resolve it, Normally I default to x64 bit and for the past years that has worked very well.

Once I had installed those my blog sprung back to life and that was it. If you run php -m again you won’t see the warning anymore. No need to switch themes (I have a simple one that has longevity). I also do not have to empirically disable/enable plugins. I select quality, free and modern plugins that I keep up to date or replace when needed. So, for me, that was the fix. Since most troubleshooting guided don’t start here, give it a shot. The HTTP Error 500.0 – Internal Server Error is an annoying one to troubleshoot. The PHP error log might not tell you anything as it goes wrong before it can log anything and enabling more detailed error handling does not always pinpoint something precisely. It might even send you on a wild goose chase.

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server
Not very specific is it?

If you did change the feature settings for the error pages for your site to get more detailed information to “Detailed errors” don’t forget to set it back. Configure it back to the default of  “Detailed error for local requests and custom error pages for remote requests” or your custom pages. You don’t want the details shown publically.

That’s it. I hope it helps someone who tries to resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server

Force Mellanox ConnectX-4 Lx 25Gbps to 10Gbps speed

Introduction

As you might remember I wrote a blog post about SFP+ and SFP28 compatibility. In this i discuss future proofing your network investments and not having to upgrade everything all at once. One example is that buying 25Gbps NICs when your main network infrastructure is still on 10Gbps is not an issue. 25Gbps normally handles 10Gbps well so you don’t have do replace all parts in the fabric at the same time but you can start with either the network fabric or the server NICs. It’s a way of future proofing the investments you make today.

When installing Mellanox ConnectX-4 Lx 25Gbps NICs in a bunch of servers we hit an issue when connected them to the DELLEMC N4000 10Gbps switches. The intent is to replace these with 25/50/100Gbps in the future.

The links did not come up.

The links did not come up. The switch ports are normally forced 10 Gbps in our setups so we check that. The speed was indeed set fix to 10Gbps. When changing that to auto-negotiate the link would come up at 1Gbps.

Naturally you check everything from cabling to used transceivers (BCM84754 on the switches) but that all checked out. We also check the firmware on the switches to determine if they were up to date and perhaps a new version fixed a known issue related to this. But no hardware wise everything was up to date on the switches and on the NICs.

Note that these links worked fine when used with 10 Gbps cards like the ConnectX-3 Pro. The DELL branded transceivers on the switches were BCM84754 (Broadcom)

The fix: Force Mellanox ConnectX-4 Lx 25Gbps to 10Gbps speed

I do not need to tell you that when you want 10Gbps getting 1Gbps doesn’t fly well. The fix was easy enough. We put the switch ports back to 10Gbps fixed speed. Auto-negotiate doesn’t deliver. No worries we fix the ports anyway. We then used mlc5cmd.exe Mellanox tool to change the NIC ports from auto-negotiate to fixed.

On hosts with Mellanox Connect-X4 NICs you open an elevated command prompt.

Navigate to C:\Program Files\Mellanox\MLNX_WinOF2\Management Tools. Run the below command to check the current link speed.

mlx5cmd.exe -LinkSpeed -Name “MyNicName ” -Query

Note 10 and 25 Gbps are supported, so it’s autonegotiate.

We force the link speed to 10Gbps:

mlx5cmd.exe -LinkSpeed -Name “MyNicName ” -Set 10

Link speed is forced to 10Gbps

The link comes up at 10Gbps

Likewise you can force the link to 25Gbps. If you want to change it back to the default you can force the link speed to auto-negotiate again.

mlx5cmd.exe -LinkSpeed -Name “MyNicName” -Set 0

See https://community.mellanox.com/s/article/mlx5cmd-exe for more information on this tool.

Do note that the switch port also needs to be set to 10Gbps fixed. As you can see below the command will notify you when those are still on auto.

The change was done but still no uplink when the switch port isn’t fixed to 10Gbps.

Conclusion

So my statement hold true the path to 25/50/100Gbps is one you can do step by step with future proofing. You might run into some issues but these are normally fixable. I have shared with you how to fix failing or wrong speed negotiations on 25 Gbps RDMA NICs (Mellanox ConnectX-4 Lx) when connecting to 10Gbps ports. I’m pretty sure the same holds true for other models. I have also had cards where things work out of the box but don’t give up when you hit an issue. I hope this helps some of you out there.

Windows Server 2019 is a supported guest OS on Windows Server 2016 Hosts

Is this even a concern?

While many of you are probably already running Windows Server 2019 VMs in test and production without a worry a little hiccup in the Microsoft documentation cause some concern. So, yes, it is, or rather, it was. Some people noticed or were told that Windows Server 2019 is not a supported guest OS on Windows Server 2016 Hosts. That was a mistake in the documentation and confused some people and account managers. But, yes. Windows Server 2019 is a supported guest OS on Windows Server 2016 Hosts. No worries!

The documentation mistake has been fixed

When we look at Supported Windows guest operating systems for Hyper-V on Windows Server and GitHub https://github.com/MicrosoftDocs/windowsserverdocs/commit/2c54e781c64e0cc3fec2cef349a762b972987870#diff-5347e6e782aa2be9a9ec94ff6ef0436b today we’ll see that the mistake has been corrected. In good tradition of Hyper-V the host support guest OS versions up to N+1. This means that Windows Server 2019 is a supported guest OS on Windows Server 2016 Hosts.

But until recently you might have seen the below.

This is what caused the concern. It was a simple mistake. So please if someone tells you Windows Server 2019 guests are or might not be supported on a Windows Server 2016 host, tell them to check again and point them to the above links.

Windows Server 2019 is a supported guest OS on Windows Server 2016 Hosts

The good news is that the mistake is fixed and all is well. I’m sorry if your decision makers or managers who got shown those documents before that mistake was fixed got scared but all is well. Windows Server 2019 is a supported guest OS on Windows Server 2016 Hosts. Be happy and start rolling out and upgrading as soon as you have all things like backup covered. I know I am.