Intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019

Introduction

Someone asked me to help investigate an issue that was hindering client/server applications. They suffered from intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019. Normally everything went fine but one in every 250 to 500 connection of the server-client to a database on SQL Server 2016 they got the error below.

System.ServiceModel.CommunicationException: The InstanceStore could not be initialized. —> System.Runtime.DurableInstancing.InstancePersistenceCommandException: The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}CreateWorkflowOwner was interrupted by an error. —> System.Data.SqlClient.SqlException: A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – An existing connection was forcibly closed by the remote host.) —> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host

Now actually retrying the connection in the code could work around this. Good code should have such mechanisms implemented. But in the end, there was indeed an issue.

Intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019

I did a quick verification of any network issues. The network was fine, so I could take this cause of the table. The error itself did not happen constantly, but rather infrequently. All indications pointed to a (TLS) configuration issue. “An existing connection was forcibly closed by the remote host” was the clearest hint for this. But then one would expect this to happen every time.

We also checked that the Windows Server 201 R2 hosts were fully up to date and had either .NET 4.7 or .NET 4.8 installed. These versions support TLS 1.2 without issues normally.

Also, on Windows Server 2012 R2 TLS 1.2 is enabled by default and does not require editing the registry to enable it. You have to do this is you want to disable it and re-enable it.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client]
“DisabledByDefault”=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server]
“DisabledByDefault”=dword:00000000

These were not present on the Windows Server 2012 R2 host but also not on the Windows Server 2016 or 2019 hosts we use for comparison.

We also check if KB3154520 – Support for TLS System Default Versions included in the .NET Framework 3.5 on Windows 8.1 and Windows Server 2012 R2 is installed. This allows using the operating system defaults for SSL and TLS instead of the hardcoded .NET Framework defaults.

For 64-bit operating systems:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft.NETFramework\v2.0.50727]
“SystemDefaultTlsVersions”=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft.NETFramework\v2.0.50727]
“SystemDefaultTlsVersions”=dword:00000001
For 32-bit operating systems:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft.NETFramework\v2.0.50727]
“SystemDefaultTlsVersions”=dword:00000001


Note If the application has set the ServicePointManager.SecureProtocol in code or through config files to a specific value or uses the SslStream.AuthenticateAs* APIs to specify a specific SslProtocols enum, the registry setting behavior does not occur. But in our test code, we also have control over this. So we can test a lot of permutations.viu

Test code

To properly dive into the issue we needed to reproduce the error at will or at least very fast. So for that, we contacted a dev and asked him to share the code paths that actually made the connections to the databases. This is to verify if there were multiple services connecting and maybe only one had issues. It turned out it was all the same. It all failed in the same fashion.

So we came up with a test program to try and reproduce it as fast as possible even if it occurred infrequently. That ability, to test configuration changes fast, was key in finding a solution. With this test program, we did not see this issue with clients that were running on Windows Server 2016 or Windows Server 2019. Not with the actual services in the environments test or in production nor with our automated test tool. Based on the information and documentation of .NET and Windows Server 2012 R2 this should not have been an issue there either. But still, here we are.

The good thing about the test code is that we can easily play with different settings in regards to the TLS version specified in the code. We noted that using TLS 1.1 or 1.0 would show a drop in connection errors versus TLS 1.2 but not eliminate them. No matter what permutation we tried we just got a difference in frequency of the issue. We were not able to get rid of the error. Now that we had tried to deal with the issue on the application level we decided to focus on the host.

The host

Even with no TLS 1.2 enforced the Windows Server 2012 R2 client hello in its connection to the SQL Server host uses TLS 1.2. The cipher they select ( TLS_DHE_RSA_WITH_AES_256_GCM_SHA384) is actually considered weak. This is evident when we look at the client and server hello to port 1433.

TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019
Client hello
TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019
Server Hello

Trying to specify the TLS version in the test code did nor resolve the issue. So we now tried solving it on the host. We are going for best practices on the Windows Server 2012 R2 client side. ! In the end, I ended up doing the following

  • Allowed only TLS 1.2
  • Allowed only Secure ciphers & Ordered those for Perfect Forward Privacy
  • Enforce the OS to always select the TLS version
  • Enforce the use of the most secure (TLS 1.2 as that is the only one we allow)

Note that there is no support for TLS 1.3 at the moment of writing.

After this, we ran our automated tests again and did not see even one occurrence of these issues anymore. This fixed it. It worked without any other changes except for one. There was one application that had not been upgraded to .NET 4.6 or higher and enforced TLS 1.1. So we leaned on the app owners a bit to have them recompile their code. In the end, they went with 4.8.

In the network capture of connections, we see they now select a secure cipher (TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384).. With this setup, we did not experience the flowing infrequent error anymore. “System.Data.SqlClient.SqlException: A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 – An existing connection was forcibly closed by the remote host.) —> System.ComponentModel.Win32Exception: An existing connection was forcibly closed by the remote host

Client Hello
Server Hello

The cause

We had an assumption is that once in a while and older/weaker ciphers were used and that this causes the connection error. By implementing best practices we disabled those which prevent this from happening. As we did not change the source code om most machines it either used the .Net framework defaults or the specified settings.

You can easily implement best practices by using a community script like Setup Microsoft Windows or IIS for SSL Perfect Forward Secrecy and TLS 1.2. There is also a tool like IISCrypto or IISCryptoCLI. Potentially, you can always automate this and devise a way to configure this wholesale with the tools of your choice if needed.

The solution

As you can read above, what finally did the trick was implementing a TLS 1.2 only best practices configuration on the Windows Server 2012 R2 hosts.

If you run into a similar issue please test any solution, including mine here, before implementing it. TLS versions that software and operating systems require or are able to use differ. So any solution needs to be tested end to end for a particular use case. Even more so if ServicePointManager.SecureProtocol in code or through config files or via  SslStream.AuthenticateAs* APIs are in play.

In this particular environment forcing TLS 1.2 and having the OS control which TLS version is being used by the .NET applications worked. Your mileage may differ and you might need to use a different approach to fix the issue in your environment. Big boy rules apply here!

Conclusion

Tech debt will sooner or later always rear its ugly head. There is a reason I upgrade and update regularly and well ahead of deadlines. I have always been doing that as much as possible (SSL Certs And Achieving “A” Level Security With Older Windows Version) . But in this case that this seemed more like a bug than a configuration issue. It only happened every now and then which meant troubleshooting was a bit more difficult. But that was addressed with a little test program. this helped us test configuration changes to fix these Intermittent TLS issues with Windows Server 2012 R2 connecting to SQL Server 2016 running on Windows Server 2016 or 2019 fast and easily. I shared this case as it might help other people out there struggling with the same issue.

Leave a Reply, get the discussion going, share and learn with your peers.

This site uses Akismet to reduce spam. Learn how your comment data is processed.