BitLooker In Veeam Backup and Replication v9

When your backup is bigger than the amount of disk space used inside the virtual machine, you might wonder why that is. Well, it’s deleted data whose blocks have not been released for reuse by the OS yet. BitLooker in Veeam Backup & Replication v9, as announced at VeeamON 2015, offers a solution for this situation. BitLooker analyzes the NTFS MFT to identify deleted data. It uses this information to reduce the size of an image-based backup file and helps reduce the bandwidth needed for replication. It just makes sense!

I really like these additions that help optimize the consumption of backup storage. Now I immediately wondered if this would make any difference on the recent versions of Hyper-V that support UNMAP. Well, probably not. My take on this is that the Hyper-V virtual machine is already made aware of the deleted blocks via UNMAP, so they will not get backed up. This is one example of the excellent storage optimization capabilities of Hyper-V.

UNMAP

BitLooker is a great new addition to Veeam Backup & Replication v9, especially when you’re running legacy hypervisors like Windows Server 2008 R2 or older, or (at the time of writing) VMware. If you’ve been rocking Windows Server 2012 R2 for the last three years, Hyper-V already had your back with truly excellent UNMAP support in the virtual layer.

Hyper-V Virtual Machines and the Storage Optimizer

Windows Server 2012 (R2) has made many improvements to how storage optimization and maintenance are done. You can read a lot more about this in What’s New in Defrag for Windows Server 2012/2012R2. It boils down to a more intelligent approach depending on the capabilities of the underlying storage.

This is reflected in the Media type we see when we look at Optimize Drives.

This is my workstation. It looks pretty correct: a couple of SSDs and a couple of HDDs.

image

SSDs are optimized intelligently, by the way. When VSS is leveraged, SSDs do get fragmented, so once in a while they are “defragmented”. This has to do with keeping performance up to par. Read more about this in The real and complete story – Does Windows defragment your SSD? by Scott Hanselman.

The next example is a Hyper-V Cluster. You can see the local disks identified as HDD and the CSV as Thin provisioned disks. Makes sense to me, the SAN I use supports thin provisioned disks.

image

But now, let’s look at a Virtual Machine with virtual disks of every type known and on any type of storage we could find. All virtual disks are identified as “Thin provisioned disk”. How can that be?

image

What had me puzzled a little bit is that in a virtual machine each and every virtual disk is identified as thin provisioned disk. It doesn’t matter what type of virtual disk it is: fixed VHD/VHDX or dynamically expanding VHD/VHDX. It also doesn’t matter on what physical disk the virtual disk resides: SATA, SAS, SSD, SAN (iSCSI/FC) LUN or CSV, SMB Share …

So how does this work with a fixed VHD on a local SATA disk? A VHD doesn’t know about UNMAP, does it? And a SATA HDD? How does that compute? Well, my understanding is that all virtual disks, dynamically expanding or fixed, VHDX or VHD, are identified as thin provisioned disks, no matter what type of physical disk they reside on (CSV, SAS, SATA, SSD, shared or non-shared). This allows UNMAP commands to be sent from the guest to the Hyper-V storage stack below. (Storage Optimizer calls these retrims, which are a way of dealing with the limitations and imperfections of TRIM; again, see Scott Hanselman’s blog for this.) If it’s a VHD, those UNMAP commands are basically black holed, just like they would never be passed down to a local SATA HDD on the host that has no idea what UNMAP is or what it is used for.

But wait a minute… what about SSDs and defragmentation, you say, my VHDX lives on an SSD. Well, for one, virtual disks are not identified as SSD or HDD. The hypervisor deals with storage optimization at the virtual layer. The host OS handles the physical layer as intelligently as it can to optimize the disks. How that happens depends on the actual storage beneath. In the case of a modern SAN you’ll notice it’s also identified as a thin provisioned disk. SANs or hyper-converged storage arrays provide you with storage that is also virtual, with all kinds of features, and are often based on tiered storage, which will be a mix of SSD/SAS/NL-SAS and in some cases even NVMe flash. So what would an OS have to identify it as? The storage array must play its part in this.

So, if you ever wondered why that is, now you know. Hope you found this interesting!

Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect

Recently I was asked to take a look at why UNMAP was not working predictably in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was off the table, I had a pretty good idea of what might be the cause.

SAN snapshots are in play

As it turned out, everything was indeed working just fine. The unexpected behavior, which made it seem that UNMAP wasn’t working well or at least not at the moments they expected it to, was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and as such the data in them, UNMAP/TRIM will not free up space on the SAN with thinly provisioned LUNs. This is logical: the data is still stored on the SAN for those snapshots, so hard deleting it from the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion is directly impacted.

An example

  • Take a VM with a dynamically expanding VHDX that’s empty and mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding virtual hard disk of the virtual machine.
  • Create a SAN snapshot.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine. Watch the dynamically expanding VHDX grow in size, just like the space consumed on the SAN.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed by the LUN on the SAN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file. It shrinks to the size before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed by the LUN on the SAN: it drops, as the data you deleted is in the active part of your LUN (the second 30GB you copied). It will not drop any further than this, as the data kept safe in the frozen snapshot of the LUN remains there (the first 30GB you copied).
  • When you expire/delete that snapshot on the SAN, you’ll see the space consumed on the thinly provisioned SAN LUN drop to the initial size of this exercise.
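
The space accounting in the steps above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how any particular SAN implements snapshots: the ThinLun class and its method names are hypothetical, space is tracked in whole 1GB blocks, and the empty LUN is taken as 0GB consumed.

```python
class ThinLun:
    """Toy model of space consumed on a thinly provisioned LUN with snapshots."""

    def __init__(self):
        self.active = set()    # 1GB blocks referenced by the live volume
        self.snapshots = []    # each snapshot pins a frozen set of blocks

    def write(self, blocks):
        self.active |= set(blocks)

    def unmap(self, blocks):
        # UNMAP only releases blocks from the active view of the LUN.
        self.active -= set(blocks)

    def take_snapshot(self):
        self.snapshots.append(frozenset(self.active))

    def expire_snapshots(self):
        self.snapshots.clear()

    def consumed_gb(self):
        # A block stays allocated while the live volume or any snapshot uses it.
        pinned = set().union(*self.snapshots)
        return len(self.active | pinned)


def walk_through():
    """Replay the example and return the consumed space after each key step."""
    lun = ThinLun()
    first, second = range(0, 30), range(30, 60)
    trace = []

    lun.write(first)                  # create the first 30GB
    lun.take_snapshot()               # SAN snapshot freezes those blocks
    trace.append(lun.consumed_gb())   # 30
    lun.unmap(first)                  # delete + retrim: the snapshot still pins them
    trace.append(lun.consumed_gb())   # 30
    lun.write(second)                 # copy another 30GB
    trace.append(lun.consumed_gb())   # 60
    lun.unmap(second)                 # delete + retrim: active data is released
    trace.append(lun.consumed_gb())   # 30
    lun.expire_snapshots()            # snapshot expires
    trace.append(lun.consumed_gb())   # 0
    return trace


if __name__ == "__main__":
    print(walk_through())
```

The model only counts a block as free once neither the live volume nor a snapshot references it, which is why the first retrim appears to do nothing while the second one drops the consumed space right away.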

I hope this example gave you some insight into the behavior.

Conclusion

So people who have snapshot-based automatic data tiering, data protection, etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules and lifetimes. When you take them into consideration you’ll see that UNMAP does work predictably, albeit in a “delayed” fashion.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint, the VHDX is kept and you are writing to an avhdx (differencing disk), meaning that any UNMAP activity will only be reflected in the data in the active avhdx file and not in the “frozen” parent file.

Mind the UNMAP Impact On Performance In Certain Scenarios

The Problem

Recently we’ve been troubleshooting some weird SQL Server backup-to-file issues. The backups started failing like clockwork at 06:00 AM. We checked the NICs, the switches, the drivers, the LUNs, HBAs, … but all was well. We considered overstressed buffers or spanning tree issues as the root cause, but the clockwork regularity of it all was weird. We tried playing with some time-out parameters, but to little or no avail. Until the moment it hit me: the file deletions that clean up the old backups! We had recently enabled UNMAP on the SAN.

Take a look at the screenshot below and note the deletion times underlined in red. That’s with UNMAP enabled. Above is with UNMAP disabled. The backup jobs failed waiting for the deletion process.

image

This is a non-issue if your backup target is running something prior to Windows Server 2012. From Windows Server 2012 onward, UNMAP is enabled by default. I know about the potential performance impact of UNMAP when deleting one or more large files, due to the space reclamation kicking in. This is described in Plan and Deploy Thin Provisioning under the heading “Consider space reclamation and potential performance impact”. But as I’m quite used to talking about many, many terabytes of data, I kind of forget to think of 500 to 600GB of files as “big”. But it seemed a likely suspect, so we tested certain scenarios and bingo!
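
If you want to reproduce this kind of test yourself, a minimal sketch could look like the one below. It is not the procedure we used at the time: the function name time_file_deletion and the file sizes are assumptions for illustration. It writes a file of a given size into a directory on the volume under test and measures only how long the delete call takes, so you can compare runs with UNMAP enabled and disabled.

```python
import os
import tempfile
import time


def time_file_deletion(directory, size_bytes, chunk_bytes=1024 * 1024):
    """Create a file of size_bytes in directory, delete it, and return
    how many seconds the delete call took."""
    chunk = os.urandom(chunk_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        remaining = size_bytes
        while remaining > 0:
            piece = chunk[:min(chunk_bytes, remaining)]
            handle.write(piece)
            remaining -= len(piece)
        # Make sure the data really reached the disk before timing the delete.
        handle.flush()
        os.fsync(handle.fileno())

    start = time.perf_counter()
    os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical target: replace with a folder on the backup volume under test.
    target = tempfile.gettempdir()
    seconds = time_file_deletion(target, 16 * 1024 * 1024)
    print(f"delete took {seconds:.3f} s")
```

The test file here is tiny compared to the 500 to 600GB backup files in our case, so expect to scale the size up considerably before any difference becomes visible.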

Solutions

  1. Disable the file-delete notification that triggers real-time space reclamation. Find the following value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\DisableDeleteNotification and set it to 1.

    Note that this setting is host-wide, so it applies to all LUNs. Perhaps that server has many other roles or workloads to serve that could benefit from UNMAP. If not, this is not an issue. It is, however, very efficient in avoiding issues. You can still use the Defragment and Optimize Drives tool to perform space reclamation on demand or on a scheduled basis.

  2. Create LUNs that will have high deltas in a short time frame as fully provisioned LUNs (aka thick LUNs). As you do this per LUN and not on the host, it allows for more fine-grained action than disabling UNMAP host-wide. It makes no sense to have UNMAP do its work to reclaim the free space that deleting data created when you’ll just be filling up that space again within the next 24 hours in an endless cycle. Backup targets are a perfect example of this. A thick LUN avoids the entire UNMAP cycle, which you won’t miss as it didn’t make much sense there anyway, and it fixes your issue. The drawback is that you can’t do this for an existing volume, so it has some overhead and downtime involved, depending on the SAN solution you use. It also means that you have to convince your storage admins to give you fully provisioned LUNs, which might or might not be easy depending on how things are organized.
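
Before changing anything for the first solution, it can help to check what the host is currently set to. The sketch below only reads the DisableDeleteNotification value from the registry key named above, using Python’s standard winreg module; the function name is hypothetical, and it deliberately returns None on non-Windows systems or when the value does not exist rather than guessing a default.

```python
import sys

# Key and value as named in solution 1 (relative to HKEY_LOCAL_MACHINE).
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "DisableDeleteNotification"


def read_disable_delete_notification():
    """Return the current registry value (1 means delete notifications are
    disabled), or None when not on Windows or when the value is absent."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(f"{VALUE_NAME} = {read_disable_delete_notification()}")
```

Reading is harmless; actually setting the value to 1 is a host-wide change, so that is best done deliberately and with the note under solution 1 in mind.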

Conclusion

UNMAP has many benefits in both the physical and the virtual layer. As with all technologies, you have to understand its capabilities, requirements, benefits and drawbacks. Without this you might run into trouble.