The Problem
Recently we’ve been troubleshooting some weird SQL Server to file backup issues. They started failing like clockwork at 06:00 AM. We checked the NICs, the switches, the drivers, the LUNs, HBAs, … but all was well. We considered overstressed buffers or spanning tree issues as the root cause, but the clockwork regularity of it all was weird. We tried playing with some time-out parameters, but to little or no avail. Until the moment it hit me: the file deletions that clean up the old backups! We had recently enabled UNMAP on the SAN.
Take a look at the screenshot below and note the deletion times underlined in red. That’s with UNMAP enabled. Above is with UNMAP disabled. The backup jobs failed waiting for the deletion process.
This is not an issue if your backup target is running something prior to Windows Server 2012. On Windows Server 2012 and later, UNMAP is enabled by default. I know about the potential performance impact of UNMAP when deleting larger files due to the space reclamation kicking in. This is described in Plan and Deploy Thin Provisioning under the heading “Consider space reclamation and potential performance impact”. But as I’m quite used to talking about many, many terabytes of data, I tend to forget to think of 500 to 600 GB of files as “big”. Still, it seemed a likely suspect, so we tested certain scenarios and bingo!
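If you want to reproduce this kind of test yourself, a minimal sketch could look like the one below. It is not the procedure we used, just one way to measure how long a delete takes on a given volume. The helper name time_delete and its parameters are made up for illustration. Run it once with UNMAP enabled and once with it disabled, using a file size close to your real backup files, and compare the numbers.

```python
import os
import tempfile
import time


def time_delete(directory, size_bytes, chunk=1024 * 1024):
    """Create a file of size_bytes in directory, delete it, and
    return how many seconds the delete call took.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    block = os.urandom(chunk)  # one buffer, reused for every write
    with os.fdopen(fd, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n
        # Push the data to the volume so the delete has real blocks to free.
        f.flush()
        os.fsync(f.fileno())

    start = time.perf_counter()
    os.remove(path)  # the call whose duration we care about
    return time.perf_counter() - start
```

For example, time_delete("E:\\Backups", 50 * 1024**3) would time the deletion of a 50 GB test file, where E:\Backups is a placeholder for a folder on the LUN under test. Writing a file that size takes a while and needs the free space, so start small.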
Solutions
- Disable the file-delete notification that triggers real-time space reclamation. Find the value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\DisableDeleteNotification and set it to 1.
Note that this setting is host-wide, so it applies to all LUNs. Perhaps that server has other roles or serves other LUNs that could benefit from UNMAP. If not, this is not an issue. It is, however, very effective in avoiding the problem. You can still use the Defragment and Optimize Drives tool to perform space reclamation on demand or on a scheduled basis.
- Create LUNs that will have high deltas in a short time frame as fully provisioned LUNs (aka thick LUNs). As you do this per LUN and not on the host, it allows for more fine-grained action than disabling UNMAP. It makes no sense to have UNMAP do its work to reclaim the free space that deleting data created when you’ll just be filling up that space again in the next 24 hours in an endless cycle. Backup targets are a perfect example of this. This avoids the entire UNMAP cycle, which you won’t miss as it didn’t make much sense here anyway, and it fixes your issue. The drawback is that you can’t do this for existing volumes, so it involves some overhead and downtime, depending on the SAN solution you use. It also means that you have to convince your storage admins to give you fully provisioned LUNs, which might or might not be easy depending on how things are organized.
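For the first option, the registry value can also be read and set from a script. Below is a minimal sketch using Python’s standard winreg module, which only exists on Windows. The key and value name are the ones given above; the function names are made up for illustration, and writing the value needs an elevated session. Treat it as a starting point under those assumptions, not as a tested deployment script.

```python
# Key and value name as given in the post, relative to HKEY_LOCAL_MACHINE.
FS_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "DisableDeleteNotification"


def describe(value):
    """Translate the raw registry value into what it means for UNMAP."""
    if value is None:
        return "value not set"
    if value == 0:
        return "delete notifications enabled (real-time UNMAP active)"
    if value == 1:
        return "delete notifications disabled (no real-time UNMAP)"
    return f"unexpected value: {value!r}"


def read_delete_notification():
    """Return the current value, or None if it does not exist."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FS_KEY) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            return None
    return value


def set_delete_notification(disabled):
    """Write 1 (disable) or 0 (enable). Host-wide, so it affects all LUNs."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, FS_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(
            key, VALUE_NAME, 0, winreg.REG_DWORD, 1 if disabled else 0
        )
```

On the backup host you would call print(describe(read_delete_notification())) to check the current state and set_delete_notification(True) to turn real-time reclamation off. The built-in fsutil behavior query DisableDeleteNotify and fsutil behavior set DisableDeleteNotify 1 commands do the same job without any scripting.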
Conclusion
UNMAP has many benefits in both the physical and the virtual layer. As with all technologies, you have to understand its capabilities, requirements, benefits and drawbacks. Without this you might run into trouble.
Interesting, we have a Compellent SAN running backup software on Server 2012 and experienced the exact same type of problem. Our file deletes could be anywhere from 300GB to 2.5 TB per file. Sometimes this would stall the host for a couple of hours. The interesting thing for us was that when we disabled file notification, the delete delay stopped but the manual optimize and trim commands did NOT reclaim space on the Compellent. I had to return the registry key to the normal settings and then run retrim to get my reclamation. Is there some other step I am missing? My understanding was I would still be able to run optimize-volume manually. In my case, I seem to be stuck with slow deletes if I want to get my space back…
Not sure. We went for the fully provisioned LUNs. Reclaiming didn’t make sense as we fill them right back up.
Have you run Defrag, not just the retrim command?