Intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019

Introduction

Someone asked me to help investigate an issue that was hindering client/server applications. They suffered from intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019. Normally everything went fine, but roughly one in every 250 to 500 connections from the client servers to a database on SQL Server 2016 failed with the error below.

System.ServiceModel.CommunicationException: The InstanceStore could not be initialized. —> System.Runtime.DurableInstancing.InstancePersistenceCommandException: The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}CreateWorkflowOwner was interrupted by an error. —> System.Data.SqlClient.SqlException: A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – An existing connection was forcibly closed by the remote host.) —> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host

Now actually retrying the connection in the code could work around this. Good code should have such mechanisms implemented. But in the end, there was indeed an issue.
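As an illustration of the retry workaround mentioned above, here is a minimal sketch in Python. It is not the code of the affected applications. The connect callable is hypothetical: it stands in for whatever call opens the database connection, and the attempt count and delays are assumptions you would tune yourself.

```python
import random
import time


def connect_with_retry(connect, attempts=3, base_delay=0.2, retry_on=(ConnectionError,)):
    """Call connect() until it succeeds or the attempts run out.

    connect is any zero-argument callable that opens a connection
    (hypothetical here; in a real application it would wrap the
    database driver's connect call).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff with a little jitter so that many
            # clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay / 2))
```

A retry like this hides the symptom from the users, but it does not remove the cause, which is why we kept digging.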

Intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019

I did a quick verification of any network issues. The network was fine, so I could take this cause off the table. The error itself did not happen constantly, but rather infrequently. All indications pointed to a (TLS) configuration issue. “An existing connection was forcibly closed by the remote host” was the clearest hint for this. But then one would expect this to happen every time.

We also checked that the Windows Server 2012 R2 hosts were fully up to date and had either .NET 4.7 or .NET 4.8 installed. These versions normally support TLS 1.2 without issues.

Also, on Windows Server 2012 R2 TLS 1.2 is enabled by default and does not require editing the registry to enable it. You only have to set the values below if you have disabled it and want to re-enable it.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client]
"DisabledByDefault"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server]
"DisabledByDefault"=dword:00000000

These values were not present on the Windows Server 2012 R2 hosts, but they were also not present on the Windows Server 2016 or 2019 hosts we used for comparison.

We also checked whether KB3154520 (Support for TLS System Default Versions included in the .NET Framework 3.5 on Windows 8.1 and Windows Server 2012 R2) was installed. This allows using the operating system defaults for SSL and TLS instead of the hardcoded .NET Framework defaults.

For 64-bit operating systems:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001

For 32-bit operating systems:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001


Note: If the application has set ServicePointManager.SecurityProtocol in code or through config files to a specific value, or uses the SslStream.AuthenticateAs* APIs to specify a specific SslProtocols enum, the registry setting behavior does not occur. But in our test code, we also have control over this, so we can test a lot of permutations.

Test code

To properly dive into the issue we needed to reproduce the error at will or at least very fast. So for that, we contacted a dev and asked him to share the code paths that actually made the connections to the databases. This was to verify whether there were multiple services connecting and maybe only one had issues. It turned out it was all the same. It all failed in the same fashion.

So we came up with a test program to try and reproduce it as fast as possible even if it occurred infrequently. That ability, to test configuration changes fast, was key in finding a solution. With this test program, we did not see this issue with clients that were running on Windows Server 2016 or Windows Server 2019. Not with the actual services in the test or production environments, nor with our automated test tool. Based on the information and documentation of .NET and Windows Server 2012 R2 this should not have been an issue there either. But still, here we are.
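The idea behind such a test program can be sketched in a few lines of Python. This is not the actual tool we used, just a minimal illustration: hammer and the connect callable are hypothetical names, and connect is assumed to open one new, non-pooled connection per call.

```python
from collections import Counter


def hammer(connect, attempts=5000):
    """Open `attempts` connections; return (failure count, Counter of error types)."""
    errors = Counter()
    for _ in range(attempts):
        try:
            conn = connect()
        except Exception as exc:  # count every failure, whatever its type
            errors[type(exc).__name__] += 1
            continue
        # Close the connection again if the object supports it.
        close = getattr(conn, "close", None)
        if callable(close):
            close()
    return sum(errors.values()), errors
```

Running the same number of attempts before and after a configuration change gives two failure counts you can compare directly, which is what makes testing a change a matter of minutes.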

The good thing about the test code is that we can easily play with different settings in regards to the TLS version specified in the code. We noted that using TLS 1.1 or 1.0 would show a drop in connection errors versus TLS 1.2 but not eliminate them. No matter what permutation we tried we just got a difference in frequency of the issue. We were not able to get rid of the error. Now that we had tried to deal with the issue on the application level we decided to focus on the host.

The host

Even without TLS 1.2 being enforced, the Windows Server 2012 R2 client hello in its connection to the SQL Server host uses TLS 1.2. The cipher suite that gets selected (TLS_DHE_RSA_WITH_AES_256_GCM_SHA384) is actually considered weak. This is evident when we look at the client and server hello to port 1433.

Client hello
Server hello

Trying to specify the TLS version in the test code did not resolve the issue. So we now tried solving it on the host, going for best practices on the Windows Server 2012 R2 client side. In the end, I did the following:

  • Allowed only TLS 1.2
  • Allowed only secure ciphers and ordered those for Perfect Forward Secrecy
  • Enforced that the OS always selects the TLS version
  • Enforced the use of the most secure TLS version (TLS 1.2, as that is the only one we allow)
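To give an idea of what the first bullet translates to, here is a small Python sketch that generates .reg file text for the SCHANNEL protocol keys. It is an illustration under assumptions, not the exact configuration we deployed: it covers only the protocol versions (not the cipher suite order or the .NET settings), and it assumes the commonly documented Enabled and DisabledByDefault values. Review the output and test it before importing it anywhere.

```python
SCHANNEL = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Protocols"
)
PROTOCOLS = ["SSL 2.0", "SSL 3.0", "TLS 1.0", "TLS 1.1", "TLS 1.2"]


def tls12_only_reg(allowed=("TLS 1.2",)):
    """Return .reg file text that enables only the protocols in `allowed`."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for protocol in PROTOCOLS:
        on = protocol in allowed
        enabled = 1 if on else 0
        disabled_by_default = 0 if on else 1
        # Each protocol has a Client and a Server subkey.
        for role in ("Client", "Server"):
            lines.append(f"[{SCHANNEL}\\{protocol}\\{role}]")
            lines.append(f'"Enabled"=dword:{enabled:08x}')
            lines.append(f'"DisabledByDefault"=dword:{disabled_by_default:08x}')
            lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tls12_only_reg())
```

The community script and IISCrypto mentioned further down do this and a lot more, so treat the sketch as a way to read what those tools change rather than as a replacement for them.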

Note that there is no support for TLS 1.3 at the moment of writing.

After this, we ran our automated tests again and did not see even one occurrence of these issues anymore. This fixed it. It worked without any other changes except for one. There was one application that had not been upgraded to .NET 4.6 or higher and enforced TLS 1.1. So we leaned on the app owners a bit to have them recompile their code. In the end, they went with 4.8.
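How many clean attempts does it take before "not even one occurrence" means something? A back-of-the-envelope sketch in Python, assuming the attempts are independent and the failure rate is constant (both are simplifications, and the function name is made up for this example):

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Attempts needed so that at least one failure shows up with the given
    probability, assuming independent attempts with a fixed failure rate."""
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


if __name__ == "__main__":
    for rate in (1 / 250, 1 / 500):
        print(rate, clean_runs_needed(rate))
```

For the failure rates we saw, one in 250 to one in 500, that comes to roughly 750 to 1,500 attempts for 95% confidence. A test run that makes several thousand connections without a single error is therefore a strong signal that the issue is gone.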

In the network capture of connections, we see they now select a secure cipher suite (TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384). With this setup, we did not experience the following infrequent error anymore: “System.Data.SqlClient.SqlException: A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – An existing connection was forcibly closed by the remote host.) —> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host”

Client Hello
Server Hello
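To compare the suite from the first capture with the one selected after the change, it helps to split the names into their parts. The helper below is a small sketch for exactly this naming pattern (TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>). It is a hypothetical convenience function, not a general parser for every suite name Windows knows.

```python
def describe_suite(name):
    """Split a suite name of the form TLS_<kx>_<auth>_WITH_<cipher>_<hash>."""
    left, _, right = name.strip().partition("_WITH_")
    parts = left.split("_")
    if parts[0] != "TLS" or len(parts) != 3 or not right:
        raise ValueError(f"unexpected suite name: {name!r}")
    # The last component is the hash; everything before it is the cipher.
    cipher, _, digest = right.rpartition("_")
    return {
        "key_exchange": parts[1],
        "authentication": parts[2],
        "cipher": cipher,
        "hash": digest,
    }


if __name__ == "__main__":
    print(describe_suite("TLS_DHE_RSA_WITH_AES_256_GCM_SHA384"))
    print(describe_suite("TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384"))
```

Side by side, the two suites differ in the key exchange (DHE before, ECDHE after) and in the cipher mode (GCM before, CBC after). The authentication and hash parts are the same.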

The cause

Our assumption is that once in a while an older/weaker cipher was used and that this caused the connection error. By implementing best practices we disabled those ciphers, which prevents this from happening. As we did not change the source code on most machines, the applications either used the .NET Framework defaults or the settings specified in code.

You can easily implement best practices by using a community script like Setup Microsoft Windows or IIS for SSL Perfect Forward Secrecy and TLS 1.2. There is also a tool like IISCrypto or IISCryptoCLI. Potentially, you can always automate this and devise a way to configure this wholesale with the tools of your choice if needed.

The solution

As you can read above, what finally did the trick was implementing a TLS 1.2 only best practices configuration on the Windows Server 2012 R2 hosts.

If you run into a similar issue please test any solution, including mine here, before implementing it. The TLS versions that software and operating systems require or are able to use differ. So any solution needs to be tested end to end for a particular use case. Even more so if ServicePointManager.SecurityProtocol in code or through config files, or the SslStream.AuthenticateAs* APIs, are in play.

In this particular environment forcing TLS 1.2 and having the OS control which TLS version is being used by the .NET applications worked. Your mileage may differ and you might need to use a different approach to fix the issue in your environment. Big boy rules apply here!

Conclusion

Tech debt will sooner or later always rear its ugly head. There is a reason I upgrade and update regularly and well ahead of deadlines. I have always been doing that as much as possible (SSL Certs And Achieving “A” Level Security With Older Windows Version). But in this case, it seemed more like a bug than a configuration issue. It only happened every now and then, which meant troubleshooting was a bit more difficult. But that was addressed with a little test program. This helped us test configuration changes fast and easily to fix these intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019. I shared this case as it might help other people out there struggling with the same issue.

Hyper-V Amigos Showcast Episode 20 and 21

Introduction

This is just a quick blog post to let you know the Hyper-V Amigos have released 2 webcasts recently. These are Hyper-V Amigos Showcast Episode 20 and 21. You will find a link to the videos and a description of the content below.

Hyper-V Amigos Showcast – Episode 20

In episode 20 of the Hyper-V Amigos ShowCast, we continue our journey in the different ways in which we can use storage spaces in backup targets. In our previous “Hyper-V Amigos ShowCast (Episode 19)– Windows Server 2019 as Veeam Backup Target Part I” we looked at stand-alone or member servers with Storage Spaces, with both direct-attached storage and SMB file shares as backup targets. We also played with Multi Resilient Volumes.

For this webcast, we have one 2 node S2D cluster set up for the Hyper-V workload (Azure Stack HCI). On a second 2 node S2D cluster, we host 2 SOFS file shares. Each on their own CSV LUN. SOFS on S2D is supported for backups and archival workloads. And as it is SMB3 and we have RDMA capable NICs we can leverage RDMA (RoCE, Mellanox ConnectX-5) to benefit from CPU offloading and superb throughput at ultra-low latency.

Hyper-V Amigos Show Cast Episode 20

Some extra information

The General Purpose File Server (GPFS role) is not supported on S2D for now. You can use GPFS with shared storage and in combination with continuous availability. This performs well as a high available backup target as well. The benefit here is that this is cost-effective (Windows Server Standard licenses will do) and you get to use the shared storage of your choice. But in this show cast, we focus on the S2D scenario and we didn’t build a non-supported scenario.

You would normally expect to notice the performance impact of continuous availability when you compare the speeds with the previous episode where we used a non-high available file share (no continuous availability possible). But we have better storage in the lab for this test, the source system is usually the bottleneck and as such our results were pretty awesome.

The lab has 4 Tarox server nodes with a mix of Intel Optane DC Memory (Persistent Memory or Storage Class Memory), Intel NVMe and Intel SSD disks. For the networking, we leverage Mellanox ConnectX-5 100Gbps NICs and SN2100 100Gbps switches. Hence we both had a grin on our face just prepping this lab.

As a side note, the performance impact of continuous availability and write-through is expected. I have written about it before here. The reason why you might contemplate using it, next to a requirement for high availability, is the small but realistic data corruption risk you have with SMB shares that are not continuously available. The reason is that they do not provide write-through for guaranteed data persistence.

We also demonstrate the “Instant Recovery” capability of Veeam to make workloads available fast and point out the benefits.

Hyper-V Amigos Showcast – Episode 21

In episode 21 we are diving into leveraging the Veeam Agent for Windows integrated with Veeam Backup & Replication (v10 RC1) to protect our physical S2D nodes. For shops that don’t have an automated cluster node build process set up, or that rely on external help to come in and do it, this can be a huge time saver.

We walk through the entire process and end up doing a bare metal recovery of one of the S2D nodes. The steps include:

  • Setting up an Active Directory protection group for our S2D cluster
  • Creating a backup job for a Windows Server, where we select failover cluster as the type (which has only “Managed by Backup Server” as the mode)
  • Running a backup
  • Creating the Veeam Agent Recovery Media (the most finicky part)
  • Finally, restoring one of the S2D hosts completely using the bare metal recovery option

Some more information

Now, we had some issues in the lab. One of them was a BSOD on the laptop used to make the recording; another was being a bit too impatient when booting from the ISO over a BMC virtual CD/DVD. Hence we had to glue some parts together and fast forward through the boring bits. We do appreciate that watching a system boot for 10 minutes doesn’t make for good infotainment. Other than that, it went fine and we were able to demonstrate the process from the beginning to the end.

As is the case with any process, you should test and experiment to make sure you are familiar with it. That makes it all a little easier, and it hurts a little less when the day comes you have to do it for real.

We hope the show cast helps you look into some of the capabilities and options you have with Veeam in regards to protecting any workload. Long gone are the days that Veeam was only about protecting virtual machines. Veeam is about protecting data wherever it lives: in VMs, physical servers, workstations, PCs, laptops, on-prem, in the cloud, and in Office 365. On top of that, you can restore it wherever you want to avoid lock-in and costly migration projects and tools. Check it out.

Conclusion

We will be doing more webcasts on Veeam Backup & Replication v10 in 2020, as it will be generally available in Q1 as far as I can guess.


But with Hyper-V Amigos Showcast Episode 20 and 21, that’s it for 2019. Enjoy the holidays during this festive season. The Hyper-V Amigos wish you a Merry X-Mas and a very happy New Year in 2020!

Veeam Vanguard Renewals and Nominations 2020

Introduction

Are you working with Veeam software solutions? Are you passionate about sharing your experiences, knowledge, and insights? If so, you might want to consider a nomination for the Veeam Vanguard program. If you are already a Veeam Vanguard I’m pretty sure you already know submissions for Veeam Vanguard Renewals and Nominations 2020 are open.

Veeam Vanguard Renewals and Nominations

As we are nearing the end of 2019, Veeam has opened the Veeam Vanguard Renewals and Nominations for 2020.

Describing the Veeam Vanguard program is not easily done. But Nikola Pejková has done a great job of doing exactly that in Join the Veeam Vanguard 2020 class! She also explains how to nominate someone or yourself. Read the blog post and find out if this is something for you. I enjoy being a part of it because I get to learn with and from some of the best minds in the industry. This allows me to help others better while also keeping up with the changing IT landscape.

My fellow Veeam Vanguards and me in a Q&A session with the Veeam R&D and PM teams at the Veeam Vanguard Summit.

I would like to emphasize that the diversity of the Veeam Vanguard is paramount to me. It works because we have people in there from around the globe, from all kinds of backgrounds and job roles. This helps open up discussions with different points of view and experiences. Customers, consultants, and partners look at needs and solutions from their perspectives. Having us together in the Vanguard benefits us all and prevents tunnel vision.

Nominate someone, yourself or be nominated

Nikola explains how to do this in her blog, so read Join the Veeam Vanguard 2020 class! and apply to become a Vanguard! It is quite an experience. Quality people who are active in the community and help by sharing their knowledge are welcomed and appreciated. Maybe you’ll find yourself to be a Veeam Vanguard in 2020!

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server for WordPress 5.3.1

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server for WordPress

Many of my readers will know I run a tight ship when it comes to keeping my blog infrastructure up to date and as secure as I can. This means, among other things, I keep my operating system recent and patched. The same goes for my MySQL database, PHP versions, and WordPress + plugins.

Recently I wanted to upgrade to PHP 7.4 that was just released and should be compatible with WordPress 5.3.1.

Resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server

But once I switched to PHP 7.4 my site responded with the dreaded 500 Internal Server Error. Instead of going trial and error I decided to zoom in fast to the root cause to resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server.

Troubleshooting steps

The fastest way to figure out what issues PHP has is to open an elevated command prompt and navigate to the install path of your PHP version. For me, that would be C:\Program Files\PHP\php-7.4.0-nts-Win32-vc15-x64. Note that IIS requires the non-thread-safe (NTS) version of PHP.

Once there run php -m

I got greeted by this warning.

PHP Warning: ‘vcruntime140.dll’ 14.0 is not compatible with this PHP build linked with 14.16 in Unknown on line 0
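If you want to check several servers for this, a few lines of Python can pick the warning out of the php -m output. This is only a sketch: the function name is made up, and the pattern is based solely on the warning text shown above, so other PHP builds may word it differently.

```python
import re

# Pattern derived from the warning text shown in this post (an assumption).
WARNING = re.compile(
    r"'(?P<dll>[^']+)'\s+(?P<found>[\d.]+) is not compatible with this PHP build "
    r"linked with (?P<needed>[\d.]+)"
)


def runtime_mismatch(php_output):
    """Return (dll, found, needed) if the output has the runtime warning, else None."""
    # Normalise typographic quotes, which show up when the text is copied from a web page.
    text = php_output.replace("\u2018", "'").replace("\u2019", "'")
    match = WARNING.search(text)
    if match is None:
        return None
    return match.group("dll"), match.group("found"), match.group("needed")
```

Feed it the combined standard output and standard error of php -m, for example captured with subprocess.run. A result other than None tells you which DLL is too old and which version the PHP build was linked with.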


That’s clear enough. I need to upgrade my Visual C++ runtime. I have one installed, but I need a more recent version: the one for Visual C++ 2015-2019. You can grab it here for both x64 and x86.

https://aka.ms/vs/16/release/VC_redist.x64.exe
https://aka.ms/vs/16/release/VC_redist.x86.exe

I tend to install both. Just in case I have an issue with x64 PHP, I can try the x86 version to try and resolve it. Normally I default to x64, and for the past years that has worked very well.

Once I had installed those my blog sprung back to life and that was it. If you run php -m again you won’t see the warning anymore. No need to switch themes (I have a simple one that has longevity). I also do not have to empirically disable/enable plugins. I select quality, free and modern plugins that I keep up to date or replace when needed. So, for me, that was the fix. Since most troubleshooting guides don’t start here, give it a shot. The HTTP Error 500.0 – Internal Server Error is an annoying one to troubleshoot. The PHP error log might not tell you anything, as it goes wrong before PHP can log anything, and enabling more detailed error handling does not always pinpoint something precisely. It might even send you on a wild goose chase.

Not very specific is it?

If you did change the feature settings for the error pages for your site to “Detailed errors” to get more detailed information, don’t forget to set it back. Configure it back to the default of “Detailed errors for local requests and custom error pages for remote requests” or your custom pages. You don’t want the details shown publicly.

That’s it. I hope it helps someone who tries to resolve the 500 Internal Server Error for PHP 7.4 with IIS on Windows Server.