Microsoft Active Directory Replication Status Tool won’t upgrade

For quick insight into the AD replication health of an environment, the Microsoft Active Directory Replication Status Tool is a very handy instrument. The only annoyance is that its license expires, which forces you to download a new version and upgrade. It's a bit of a convoluted way to update free software, but hey, it is handy and free.

image

And then again …

image

OK, I'll download the new one. But the Microsoft Active Directory Replication Status Tool won't upgrade: the installer claims the currently installed version is newer than the one you just downloaded from the Microsoft site. That's annoying. Did they post the wrong version?

image

Let's quickly install the new version in a VM. Comparing the executable in the current install with the one in the new install shows they are the same, so the license is the only thing causing an issue here, not an actual version difference.

Old version

image

New version

image

 

Let's look at the license.xml file in the C:\Program Files (x86)\Microsoft Active Directory Replication Status Tool\Licensing folder.

image

The only difference between the old and the new install is the license file. You can see that the new one has expiration dates in the future.
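To confirm for yourself that the license file is the only difference, you can hash every file in both install folders and list the ones that differ. A minimal Python sketch, assuming you have copied the old and the new install folders somewhere you can read them side by side (the script name and paths you pass in are your own):

```python
import hashlib
import sys
from pathlib import Path


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, forward slashes) to its SHA-256 digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def diff_installs(old_root: Path, new_root: Path) -> list[str]:
    """Return relative paths that exist on only one side or whose content differs."""
    old, new = file_hashes(old_root), file_hashes(new_root)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_installs.py <old install folder> <new install folder>
    for name in diff_installs(Path(sys.argv[1]), Path(sys.argv[2])):
        print(name)
```

If only Licensing/license.xml shows up, the binaries are identical and the "newer version" message is not about the code at all.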

image

So the fix is easy: uninstall the currently installed version of the AD Replication Status Tool wherever it is installed and install the one you downloaded. It seems to be exactly the same version, but that's how you get it working again with a fresh license.xml file. Note that you cannot copy the license file between machines; the signature it contains is not valid on another machine.

Hope this helps someone.

Troubleshooting Veeam B&R Error code: ‘32768’. Failed to create VM recovery snapshot

I recently had to move a Windows Server 2016 VM to another cluster (from a 2012 R2 to a 2016 cluster), and to do so I used shared-nothing live migration. Once the VM was happily running on the new cluster, I kicked off a Veeam backup job to get a first restore point for that VM. Better safe than sorry, right?

image

But the job and the retries failed for that VM. The error details are:

Failed to create snapshot Compellent Replay Manager VSS Provider on repository01.domain.com (mode: Veeam application-aware processing) Details: Job failed (‘Checkpoint operation for ‘FailedVM’ failed. (Virtual machine ID 459C3068-9ED4-427B-AAEF-32A329B953AD). ‘FailedVM’ could not initiate a checkpoint operation: %%2147754996 (0x800423F4). (Virtual machine ID 459C3068-9ED4-427B-AAEF-32A329B953AD)’). Error code: ‘32768’.
Failed to create VM recovery snapshot, VM ID ‘3459c3068-9ed4-427b-aaef-32a329b953ad’.
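The %%2147754996 in that message is simply the decimal form of the HRESULT 0x800423F4 printed right after it, and 0x800423F4 is VSS_E_WRITERERROR_NONRETRYABLE: a VSS writer inside the guest failed with a non-retryable error. A small Python sketch of that conversion; the lookup table is a hand-picked subset of the VSS writer error codes, not an exhaustive list:

```python
# A subset of VSS writer error HRESULTs (as defined in the Windows SDK's vsserror.h).
VSS_WRITER_ERRORS = {
    0x800423F0: "VSS_E_WRITERERROR_INCONSISTENTSNAPSHOT",
    0x800423F1: "VSS_E_WRITERERROR_OUTOFRESOURCES",
    0x800423F2: "VSS_E_WRITERERROR_TIMEOUT",
    0x800423F3: "VSS_E_WRITERERROR_RETRYABLE",
    0x800423F4: "VSS_E_WRITERERROR_NONRETRYABLE",
}


def decode_error_token(token: str) -> tuple[str, str]:
    """Turn a token like '%%2147754996' into (hex HRESULT, symbolic name if known)."""
    value = int(token.lstrip("%"))
    hex_code = f"0x{value:08X}"
    return hex_code, VSS_WRITER_ERRORS.get(value, "unknown (not in this table)")


print(decode_error_token("%%2147754996"))
```

Knowing it is a writer error, not a storage or RCT problem, points you inside the guest rather than at the host.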

Even when the job falls back to the native Windows VSS approach after the hardware VSS provider fails, it still does not work. At first that made me think of a bug that used to exist in Windows Server 2016 Hyper-V, where a storage live migration of any kind would break RCT and a new full backup was needed to fix it. That bug has long since been fixed, and no, a new full backup did not solve anything here. There are various reasons why creating a checkpoint can fail, so we need to dig deeper. As always, Event Viewer is your friend. What do we see? Three events during a backup, all of them SQL Server related.
image

image

image

On top of that, the SqlServerWriter is in a non-retryable error state when checking with vssadmin list writers.

image
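When a VM has many writers, it helps to filter the vssadmin list writers output down to the unhealthy ones. A minimal Python sketch, assuming English-language vssadmin output with the usual "Writer name:", "State:" and "Last error:" lines; running vssadmin itself requires Windows and an elevated prompt:

```python
import subprocess


def list_writers_output() -> str:
    """Run 'vssadmin list writers' (Windows only, elevated) and return its text output."""
    return subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True, check=True
    ).stdout


def parse_writers(output: str) -> list[dict[str, str]]:
    """Collect name, state and last error for each writer block in the output."""
    writers: list[dict[str, str]] = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            current = {"name": line.split(":", 1)[1].strip().strip("'")}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers: list[dict[str, str]]) -> list[dict[str, str]]:
    """Writers that are not Stable or report anything other than 'No error'."""
    return [
        w for w in writers
        if not w.get("state", "").endswith("Stable") or w.get("last_error") != "No error"
    ]
```

On the guest you would call `unhealthy(parse_writers(list_writers_output()))`; in this case it would flag the SqlServerWriter.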

It's very clear there is an issue with the SQL Server VSS Writer in this VM and that it causes the checkpoint to fail. You can search for manual fixes, but for an otherwise functional SQL Server I chose to do a repair install. The tooling for that is pretty good, and it's probably the fastest way to resolve the issue and any underlying ones we might otherwise still run into.

After a successful repair install of SQL Server, we are greeted by an all-green result screen.

image

Now we check vssadmin list writers again to make sure all writers are healthy. If not, restart the SQL Server VSS Writer service or another relevant service if possible. If restarting a service doesn't fix it, reboot the server. We did not need to do that: we just ran a new retry in Veeam Backup & Replication and it succeeded.

There you go. The storage live migration before the backup of that VM made me think we were dealing with an early Windows Server 2016 Hyper-V bug, but that was not the case. Troubleshooting is also about avoiding tunnel vision.

Dell iDRAC 6 Remote Console Connection Failed


I recently had the honor of fixing a really annoying issue with the iDRAC on rather old Dell hardware: R710 servers that are still pulling their weight. They have naturally been upgraded to the latest firmware, and Dell makes those updates available to anyone without the need for a support contract (happy users/customers). You can configure Java site exceptions just fine and use Firefox or Chrome to connect (IE is a different story: you can connect, but the view is messed up). Anyway, the browser isn't the big issue. The problem was that the Dell iDRAC 6 remote console connection consistently failed at the very last moment with “Connection Failed”.

image

image

Note: are you nuts?

Yes, I like 25/50/100Gbps RDMA, S2D, all-flash, etc. I do live the vanguard life on the bleeding edge, but part of that is finding solutions that fit the environment. In this case, they have multiple spare servers and extra disks on top of the ones they use in the lab or even in production, so even when a server or a component fails, they can use those to fix it. They have hands-on, savvy staff members to do that. No problem. This is not an organization driven by fear of risk and responsibility but by results and effective TCO/ROI. They know very well what they can and cannot handle. On top of that, they know very well which of the IT sector's sales and marketing promises and predictions are FUD and which are reality. This means they can make decisions that optimize for their needs and deliver real results.

Leveraging old hardware does mean that you'll sometimes run into silly but annoying issues, like older DRAC cards not playing well with modern client operating systems, browsers, and recent Java versions.

Most of the tricks to get those to work together can be found online, but sometimes even those fail. First of all, make sure all network requirements are in order (ports, firewall, etc.), and on top of that:

  • Upgrade the DRAC firmware to the latest version, v2.85.
  • Add the DRAC IP to the Java Exception Site List.
  • Change the Java network setting from Browser to Direct Connection.
  • Hack the Java config files.
  • Disable encrypted video on the DRAC.
  • Reset the DRAC.
  • On top of this, you can run an older version of the browser and Java, but at a certain point this becomes a silly option. At some moment the entire stack has moved ahead, a single trick like running an old version of Java won't do it anymore, and keeping a VM around at a 10-year-old tech/version level is a pain.
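For the "ports, firewall" part, a quick TCP reachability check from the client you connect with rules out the network before you start fiddling with Java. A minimal Python sketch; the port numbers are assumptions based on common iDRAC defaults (443 for the web interface, 5900 for the virtual console), so check what your DRAC is actually configured to use:

```python
import socket

# Assumed defaults; verify them in your own DRAC network/services settings.
DEFAULT_PORTS = {"web (HTTPS)": 443, "virtual console": 5900}


def check_ports(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Try a TCP connect to each port; True means something accepted the connection."""
    results = {}
    for label, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[label] = True
        except OSError:
            # Refused, timed out, or unreachable: treat all as "not reachable".
            results[label] = False
    return results
```

Something like `check_ports("drac01.example.local", DEFAULT_PORTS)` (hypothetical host name) only proves the TCP ports are open; it says nothing about certificates or Java, which is where the real problem turned out to be.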

The missing piece for me: generate & upload SHA256 certs

So let me share with you the extra step that got the remote console of the Dell R710 iDRAC working with the most recent version of Java, Windows 10, and the latest and greatest Firefox browser at the time of writing.

The trick that finally did it is to generate a CSR on the DRAC while you are connected to it and then upload a SHA256 certificate issued for that CSR. You see, many people never upload their own certs, and if they did, it might have been many years ago. Those old SHA1 certs are frowned upon by modern browsers and Java.

image

image
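To see whether a DRAC is still presenting a SHA1-signed certificate, you can fetch it and look for the signature algorithm identifier. A minimal Python sketch using only the standard library; the OID scan is a heuristic (it only knows a few RSA signature algorithms), and on an old iDRAC the local TLS library may refuse the handshake, in which case export the certificate from the browser and feed its DER bytes to `signature_algorithms` instead:

```python
import ssl

# DER encodings (tag 0x06, length 0x09, value) of common RSA signature algorithm OIDs.
SIGNATURE_OIDS = {
    bytes.fromhex("06092a864886f70d010105"): "sha1WithRSAEncryption",
    bytes.fromhex("06092a864886f70d01010b"): "sha256WithRSAEncryption",
    bytes.fromhex("06092a864886f70d01010c"): "sha384WithRSAEncryption",
    bytes.fromhex("06092a864886f70d01010d"): "sha512WithRSAEncryption",
}


def signature_algorithms(der: bytes) -> list[str]:
    """Heuristic: names of the known signature OIDs that occur in a DER certificate."""
    return [name for oid, name in SIGNATURE_OIDS.items() if oid in der]


def drac_signature_algorithms(host: str, port: int = 443) -> list[str]:
    """Fetch the certificate the host presents (no validation) and scan it."""
    pem = ssl.get_server_certificate((host, port))
    return signature_algorithms(ssl.PEM_cert_to_DER_cert(pem))
```

If the result contains sha1WithRSAEncryption, that DRAC is a candidate for the CSR and new certificate described here.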

Open the CSR file, copy its content, and submit it to a PKI you have or to a free online one such as getacert.com. Just fill out some random info in the request and you'll immediately get a SHA256 cert for download that is “valid” for a couple of months. That's enough for testing or getting out of a pickle; your own corporate CA will do better for long-term needs. Then upload that certificate to the DRAC.

image

On top of that, you'll need to reset the DRAC card and give it a few minutes.

image
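Rather than guessing how long "a few minutes" is, you can poll the DRAC until it accepts connections again. A minimal Python sketch; port 443 and the ten-minute deadline are assumptions, and an open HTTPS port does not guarantee every DRAC service (including the virtual console) is ready yet:

```python
import socket
import time


def wait_until_reachable(
    host: str,
    port: int = 443,
    deadline_s: float = 600.0,
    interval_s: float = 10.0,
    connect_timeout_s: float = 3.0,
) -> bool:
    """Poll a TCP port until it accepts a connection or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            pass  # Still rebooting (refused or timed out); try again later.
        if time.monotonic() + interval_s > end:
            return False
        time.sleep(interval_s)
```

Once it returns True, give the DRAC another minute before launching the remote console.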

Reconnect to the DRAC. After that, we could connect to the remote console on all R710 servers without failure, where before we kept getting the dreaded “Connection Failed” error.

That’s it! Good luck.

Hyper-V integration components 6.3.9600.18692

After the July 2017 round of patching, we got a new version of the Hyper-V integration components on Windows Server 2012 R2. Yes, since Windows Server 2016 this is something you no longer need to deal with manually. But hey, my guess is that many of you are still taking care of Windows Server 2012 R2 Hyper-V deployments. I'm still taking care of a couple of Windows Server 2012 R2 clusters, so don't be shy now.

The newest version (at the time of writing) is 6.3.9600.18692, which first appeared in the June 27, 2017—KB4022720 (Preview of Monthly Rollup) update. It has since been released in the July 11, 2017—KB4025336 (Monthly Rollup) update. You can keep track of the IC versions via this link: Hyper-V Integration Services: List of Build Numbers.

image

That means that you’ll need to upgrade the integration components for the VMs running on your Hyper-V (cluster) nodes after patching those.

image
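Deciding which VMs still need the update comes down to comparing dotted version numbers numerically, not as strings. A minimal Python sketch, assuming you have already collected a VM name to IC version mapping, for example from the IntegrationServicesVersion property that Get-VM reports on the host; the VM names below are hypothetical:

```python
# Target version from this post (first shipped in KB4022720, then KB4025336).
TARGET_IC_VERSION = "6.3.9600.18692"


def parse_version(text: str) -> tuple[int, ...]:
    """'6.3.9600.18692' -> (6, 3, 9600, 18692), so comparisons are numeric."""
    return tuple(int(part) for part in text.split("."))


def vms_needing_ic_update(ic_versions: dict[str, str], target: str = TARGET_IC_VERSION) -> list[str]:
    """Names of VMs whose reported IC version is lower than the target."""
    wanted = parse_version(target)
    outdated = []
    for name, version in ic_versions.items():
        if not version:
            # No version reported; leave it for a manual check.
            continue
        if parse_version(version) < wanted:
            outdated.append(name)
    return sorted(outdated)


inventory = {"VM01": "6.3.9600.18398", "VM02": "6.3.9600.18692", "VM03": "10.0.14393.0"}
print(vms_needing_ic_update(inventory))
```

A guest that reports a higher version, such as a 10.0.x build, compares as newer and is simply left alone, which is one way to keep newer guests out of the upgrade run.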

And yes, despite some issues we have seen with update QA in the past, we still keep our environment very well up to date, because when doing balanced risk management the benefits of a modern, well-patched environment are very much there, both for fixing bugs and for mitigating security risks. Remember WannaCry?

So my automation script has run against my Windows Server 2012 R2 clusters. Have you taken care of yours? I did adapt it to deal with the ever-growing number of Windows Server 2016 VMs we see running, yes, even on Windows Server 2012 R2 Hyper-V hosts.

image