Recently we’ve been trouble shooting some weird SQL Server to file backup issues. They started failing on the clock at 06:00 AM. We checked the NICs, the switches, the drivers, the LUNs, HBAs, … but it was all well. We considered over stressed buffers as the root cause or spanning tree issues but the clock steadiness of it all was weird. We tried playing with some time out parameters but with little to no avail. Until the moment it hit me, the file deletions that clean up the old backups!We had UNMAP enabled recently on the SAN.
Take a look at the screenshot below an note the deletion times underlined in red. That’s with UNMAP enabled. Above is with UNMAP disabled. The Backup jobs failed waiting for the deletion process.
This is a no issues if your backup target is running something prior to Windows Server 2012. if not, UNMAP is disabled by default. I know about the potential performance impact of UNMAP when deleting or more larger files due to the space reclamation kicking in. This is described here Plan and Deploy Thin Provisioning under the heading “Consider space reclamation and potential performance impact”. But as I’m quite used to talking about many, many terabytes of data I kind of forget to think of 500 to 600GB of files as “big” . But it seemed to a suspect so we tested certain scenarios and bingo!
Disable the file-delete notification that triggers real-time space reclamation. Find the following value HKEY_LOCAL_MACHINESYSTEMCurrentControlSetControlFileSystemDisableDeleteNotification and set it to 1.
Note that: This setting is host wide, so for all LUNs. Perhaps that server has many other roles or needs to server that could benefit from UNMAP. If not this is not an issue. It is however very efficient in avoiding issues. You can still use the Defragment and Optimize Drives tool to perform space reclamation on-demand or on a scheduled basis.
Create LUNs that will have high deltas in a short time frame as fully provisioned LUNs (aka thick LUNs). As you do this per LUN and not on the host it allows for more fine grained actions than disabling UNMAP. It makes no sense to have UNMAP do it’s work to reclaim the free space that deleting data created when you’ll just be filling up that space again in the next 24 hours in an endless cycle. Backup targets are a perfect example of this. This avoid the entire UNMAP cycle and you won’t mind as it doesn’t make much sense and fixes you issue. The drawback is you can’t do this for an existing volumes. So it has some overhead & downtime involved depending on the SAN solution you use. It also means that you have to convince you storage admins to give you fully provisioned LUNs, which might or might not be easy depending on how things are organized.
UNMAP has many benefits both in the physical and virtual layer. As with all technologies you have to understand its capabilities, requirements, benefits and draw backs. Without this you might run into trouble.
I’m playing and examining some of the ODX capabilities of our SANs (Dell, Compellent) at the moment. It all seems pretty impressive in the demo’s. But how does that behave in real live on our gear? How impressive is ODX? Well pretty darn impressive actually. And as all great power it needs to be wielded carefully, with insight and thought.
Let’s create some fixed virtual disks. 10 * 50GB vhdx and 10* 475GB vhdx. We run a simple quick PowerShell script:
You see this correctly, it’s 41.5088855 seconds. let’s round up to 42 seconds. That’s 20 fixed VHDX files. 10 of 50GB, 10 of 475GB in 42 seconds. That’s a total of 5.12TB of vhdx files.
Compared to creating a single 5TB vhdx file this isn’t to shabby as that get done in 26 seconds!
You can only dream of the kind of scenario’s this kind of power enables. Woooot!!!
Last year we renewed our SAN storage and our backup systems. They had been serving us for 5 years and where truly end of life as both technologies uses are functionally obsolete in the current era of virtualization and private clouds. The timing was fortunate as we would have been limited in our Windows 2012, Hyper-V & disaster recovery plans if we had to keep it going for another couple of years.
Now any time you dispose of old hardware it’s a good idea to wipe the data securely to a decent standard such as DoD 5220.22-M. This holds true whether it’s a laptop, a printer or a storage system.
We did the following:
Un-initialize the SAN/VLS
Reinitialize the SAN/VLS
Un-initialize the SAN/VLS
Swap a lot of disks around between SAN/VLS and disk bays in a random fashion
Un-initialize the SAN/VLS
Create new (Mirrored) LUNS, as large as possible.
Mounted them to a host or host
Run the DoD grade disk wiping software against them.
That process is completely automatic and foes faster than we were led to believe, so it was not really such a pain to do in the end. Just let it run for a week 24/7 and you’ll wipe a whole lot of data. There is no need to sit and watch progress counters.
Un-initialize the SAN/VLS
Have it removed by a certified company that assures proper disposal
We would have loved to take it to a shooting range and blast the hell of of those things but alas, that’s not very practical nor feasible . It would have been very therapeutic for the IT Ops guys who’ve been baby sitting the ever faster failing VLS hardware over the last years.
Here’s some pictures of the decommissioned systems. Below are the two old VLS backup systems, broken down and removed from the data center waiting disposal. It’s cheap commodity hardware with a reliability problem when over 3 years old and way to expensive for what is. Especially for up and out scaling later in the life time cycle, it’s just madness. Not to mention that those thing gave us more issues the the physical tape library (those still have a valid a viable role to play when used for the correct purposes). Anyway I consider this to have been my biggest technology choice mistake ever. If you want to read more about that go to Why I’m No Fan Of Virtual Tape Libraries
The old EVA 8000 SANs are awaiting removal in the junk yard area of the data center. They served us well and we’ve been early customers & loyal ones. But the platform was as dead as a dodo long before HP wanted to even admit to that. It took them quite a while to get the 3Par ready for the same market segment and I expect that cost them some sales. They’re ready today, they were not 24-12 months ago.
The next years the storage wares will rage and the landscape will change a lot. But We’re out of the storm for now. We’ll leverage what we got . One tip for all storage vendors. Start listening to your SME customers a lot more than you do now and getting the features they need into their hands. There are only so many big enterprises so until we’re all 100% cloudified, don’t ignore us, as together we buy a lot of stuff to. Many SMEs are interested in more optimal & richer support for their windows environments if you can deliver that you’ll see your sales rise. Keep commodity components, keep building blocks and from factors but don’t use a cookie cutter to determine our needs or “sell” us needs we don’t have. Time to market & open communication is important here. We really do keep an eye on technologies so it’s bad to come late to the party.
At the end of this day I was doing some basic IO tests on some LUNs on one of the new Compellent SANs. It’s amazing what 10 SSDs can achieve … We can still beat them in certain scenarios but it takes 15 times more disks. But that’s not what this blog is about. This is about goofing off after 20:00 following another long day in another very long week, it’s about kicking the tires of Windows and the SAN now that we can.
For fun I created a 300TB LUN on a DELL Compellent, thin provisioned off cause, I only have 250 TB
I then mounted it to a Windows 2008 R2 test server.
The documented limit of a Volume in Windows 2008 R2 is 256TB when you use 64K allocation size. So I tested this limit by trying to format the entire LUN and create a 300TB simple volume. I brought it online, initialized it to an GPT disk, created a simple volume with an allocation unit size of 64K and well that failed with following error:
There is nothing unexpected about this. This has to do with the maximum NTFS volume size supported on a GPT disk. It
depends on the cluster size that is selected at the time of formatting. NTFS is currently limited to 2^32-1 allocation units. This yields a 256TB volume, using 64k clusters. However, this has only been tested to 16TB, or 17,592,186,040,320 bytes, using 4K cluster size. You can read up on this in Frequently asked questions about the GUID Partitioning Table disk architecture. The table below shows the NTFS limits based on cluster size.
This was the first time I had the opportunity to test these limits I formatted part of that LUN to a size close to the limit and than formatted the remainder to a second simple volume.
I still need get a Windows Server 2012 test server hooked up to the SAN. To see if anything has changed there. One thing is for sure, you could put at least 3 64TB VHDX files on a single volume in Windows. Not too shabby . It’s more than enough to put just about any backup software into problems. Be warned, MSFT tested and guarantees performance & behavior up to 64TB in Windows Server 2012, but beyond that you’d better do your own due diligence.
The next thing I’ll do when I have a Windows Server 2012 host hooked up is, is create 64TB VHDX file and see if I can go beyond it before things break. Why, well because I can and I want to take the new SAN and Windows 2012 for a ride to see what boundaries we can push. The SANs are just being set up so now is the time to do some testing.
Close your account?
Your account will be closed and all data will be permanently deleted and cannot be recovered. Are you sure?