Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect

Recently I was asked to take a look at why UNMAP was not working predictably in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was off the table I had a pretty good idea what might be the cause.

SAN snapshots are in play

As it turned out, everything was indeed working just fine. The unexpected behavior that made it seem UNMAP wasn’t working well, or at least not at the moments they expected it to, was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and as such the data in them, UNMAP/TRIM will not free up space on the SAN with thinly provisioned LUNs. This is logical: as the data is still stored on the SAN for those snapshots, hard deleting it from the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion of the LUN is directly impacted.

An example

  • Take a VM with an empty dynamically expanding VHDX mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding virtual hard disk of the virtual machine. Watch the VHDX grow in size, just like the space consumed on the SAN.
  • Create a SAN snapshot.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file: it shrinks to the size it had before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it drops, as the data you deleted is in the active part of your LUN (the second 30GB you copied), but it will not drop any further than this, as the data kept safe in the frozen snapshot of the LUN remains there (the first 30GB you copied).
  • When you expire/delete that snapshot on the SAN you’ll see the space consumed on the thinly provisioned SAN LUN drop to the initial size of this exercise.
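The host- and guest-side commands in the steps above boil down to the sketch below. The VHDX path, VM and drive letter are examples from this exercise, not fixed names; the space consumed on the SAN LUN itself you read in your SAN’s management interface.

```powershell
# On the Hyper-V host: check the VHDX file size before and after each step
# (the path is an example, adjust it to your environment)
"{0:N1} GB" -f ((Get-Item 'C:\ClusterStorage\Volume1\DemoVM\DemoVM.vhdx').Length / 1GB)

# Inside the VM: hard delete the test data, then force UNMAP on drive D:
Remove-Item -Path 'D:\TestData' -Recurse -Force
Optimize-Volume -DriveLetter D -ReTrim -Verbose
```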

I hope this example gave you some insights into the behavior.

Conclusion

So people who have snapshot-based automatic data tiering, data protection etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules & lifetimes. When you take them into consideration you’ll see that UNMAP does work predictably, albeit in a “delayed” fashion.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint the VHDX is kept and you are writing to an AVHDX (differencing disk), meaning that any UNMAP activity will only reflect on data in the active AVHDX file and not in the “frozen” parent file.
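You can see this chain from PowerShell on the host: after taking a checkpoint the VM’s active disk is a differencing AVHDX whose parent is the frozen VHDX. The VM name below is just an example.

```powershell
# Show which file the VM is actually writing to; after a checkpoint this is
# an .avhdx differencing disk whose ParentPath points at the frozen .vhdx
Get-VMHardDiskDrive -VMName 'DemoVM' |
    Get-VHD |
    Select-Object Path, VhdType, ParentPath
```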

Mind the UNMAP Impact On Performance In Certain Scenarios

The Problem

Recently we’ve been troubleshooting some weird SQL Server to file backup issues. They started failing like clockwork at 06:00 AM. We checked the NICs, the switches, the drivers, the LUNs, the HBAs, … but all was well. We considered overstressed buffers or spanning tree issues as the root cause, but the clockwork regularity of it all was weird. We tried playing with some timeout parameters, but with little to no avail. Until the moment it hit me: the file deletions that clean up the old backups! We had recently enabled UNMAP on the SAN.

Take a look at the screenshot below and note the deletion times underlined in red. That’s with UNMAP enabled. Above is with UNMAP disabled. The backup jobs failed waiting for the deletion process.

image

This is a non-issue if your backup target is running something prior to Windows Server 2012, as those versions don’t support UNMAP. From Windows Server 2012 onwards, however, delete notification is enabled by default. I knew about the potential performance impact of UNMAP when deleting one or more larger files due to the space reclamation kicking in. This is described in Plan and Deploy Thin Provisioning under the heading “Consider space reclamation and potential performance impact”. But as I’m quite used to talking about many, many terabytes of data I kind of forgot to think of 500 to 600GB files as “big”. But it seemed a likely suspect, so we tested certain scenarios and bingo!
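It’s easy to reproduce this test yourself. The sketch below, with an example file path, first checks whether delete notification is active and then times the deletion of a large backup file; run it once with UNMAP on and once with it off and compare the results.

```powershell
# 0 = delete notification (real-time UNMAP) enabled, 1 = disabled
fsutil behavior query DisableDeleteNotify

# Time how long a hard delete of a large file takes (the path is an example)
Measure-Command { Remove-Item 'T:\Backups\FullBackup.bak' -Force } |
    Select-Object TotalSeconds
```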

Solutions

  1. Disable the file-delete notification that triggers real-time space reclamation. Find the value DisableDeleteNotification under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem and set it to 1.

    Note that this setting is host-wide, so it applies to all LUNs. Perhaps that server has many other roles or needs that could benefit from UNMAP. If not, this is not an issue. It is however very efficient in avoiding issues. You can still use the Defragment and Optimize Drives tool to perform space reclamation on demand or on a scheduled basis.
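    Sketched in PowerShell (run elevated; the drive letter is an example), disabling the notification and then reclaiming space on demand looks like this:

```powershell
# Disable real-time space reclamation host-wide (1 = delete notification off)
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' `
    -Name DisableDeleteNotification -Value 1

# Space reclamation can still be done on demand or from a scheduled task
Optimize-Volume -DriveLetter T -ReTrim -Verbose
```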

  2. Create LUNs that will have high deltas in a short time frame as fully provisioned LUNs (aka thick LUNs). As you do this per LUN and not on the host, it allows for more fine-grained actions than disabling UNMAP. It makes no sense to have UNMAP do its work to reclaim the free space that deleting data created when you’ll just be filling up that space again in the next 24 hours in an endless cycle. Backup targets are a perfect example of this. This avoids the entire UNMAP cycle, and you won’t mind, as it doesn’t make much sense there anyway and it fixes your issue. The drawback is that you can’t do this for an existing volume, so it has some overhead & downtime involved depending on the SAN solution you use. It also means that you have to convince your storage admins to give you fully provisioned LUNs, which might or might not be easy depending on how things are organized.

Conclusion

UNMAP has many benefits in both the physical and virtual layer. As with all technologies you have to understand its capabilities, requirements, benefits and drawbacks. Without this you might run into trouble.

Future Proofing Storage Acquisitions Without A Crystal Ball

Dealing with an unknown future without a crystal ball

I’ve said it before and I’ll say it again: Storage Spaces in Windows Server 2012 (R2) is the first step of MSFT to really make a difference in (or put a dent into) the storage world. See TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2 (that was a nice blog, by the way, to find out which resellers & vendors have no sense of humor & perspective). It’s not just Microsoft who’s doing so; there are many interesting initiatives at smaller companies to do the same. The question is not whether these offerings can match the feature sets, capabilities and scenarios of the established storage vendors’ offerings. The real question is whether the established vendors offer enough value for money to maintain themselves in a good-enough-is-good-enough world, which in itself is a moving target due to the speed at which technology & business needs evolve.

The balance of cost versus value becomes critical for selecting storage. You need it now and you know you’ll run it for 3 to 5 years. Perhaps longer, which is fine if it serves your needs, but you just don’t know. Due to the speed of change you can’t invest in a solution that will last you for the long term. You need a good fit now at reasonable cost with some headway for scale up / scale out. The ROI/TCO has to be good within 6 months or a year.

If possible get a modular solution: one where you can replace the parts that are the bottleneck without having to do a forklift upgrade. That allows for smaller, incremental, affordable improvements until you have either morphed into a new system altogether over a period of time, or have gotten out of the current solution what’s possible and the time has arrived to replace it. Never would I invest in an expensive, long-term, forklift, ultra-scalable solution. Why not? Too expensive and as such too high risk. The risk is due to the fact I don’t have one of these:

http://trustbite.co.nz/wp-content/uploads/2010/01/Crystal-Ball.jpg

So storage vendors need to perform a delicate balancing act. It’s about price, value, technology evolution, rapid adoption, diversification, integration, assimilation & licensing models in a good enough is good enough world where the solution needs to deliver from day one.

I for one will be very interested to see whether all storage vendors can deliver enough value to retain the mid-market or whether they’ll become top feeders only. The push to the cloud and the advancements in data replication & protection in the application and platform layers are shaking up the traditional storage world. Combine that with the fast pace at which SSD & flash storage are evolving, together with Windows Server 2012 having morphed into a very capable storage platform, and the landscape looks very volatile for the years to come. Think about ever more solutions at the application (Exchange, SQL Server) and platform layer (Hyper-V Replica), with orchestration on premises and/or in the cloud, and the pressure is really on.

So how do you choose a solution in this environment?

Whenever you are buying storage the following will happen: vendors, resellers & sales people are going to start pulling at you. Now, some are way better than others at this; some are even downright good at this whole process and proceed very intelligently.

Sometimes it involves FUD, doom & gloom combined with predictions of data loss & corruption by what seem to be prophets of disaster. The good thing is, when you buy whatever they are selling that day, they can save you from all that. The thing is, this changes with the profit margin and kickbacks they are getting. Sometimes you can attribute this to the time-limited value of technology: things evolve and today’s best is not tomorrow’s best. But some of them are chasing the proverbial $ so hard they portray themselves as untrustworthy fools.

That’s why I’m not too fond of the really big $ projects. Too much politics & sales. Sure, you can have people take care of that, but you are the only one there to look out for your own interests. To do that all you need to do is your own due diligence and be brave. Look, a lot of SAN resellers have never ever run a SAN, servers, Hyper-V clusters, virtualized SQL Server environments or VDI solutions in real live production environments for a sustained period of time. You have. You are the one whose needs it’s all about, as you will have to live and work with the solution for years to come. We did this exercise and it was worthwhile. We got the best value for money by looking out for our own interests.

Try this with a reseller or vendor: ask them how their hardware VSS providers & snapshot software deal with the intricacies of CSV 2.0 in a Hyper-V cluster. Ask them how it works and tell them you need references to speak to who are running this in production. Also make sure you find your own references. You can, it’s a big world out there, and it’s a fun exercise to watch their reactions.

As Aidan remarked in his blog on ODX – Not All SANs Are Created Equally:

These comparisons reaffirm what you should probably know: don’t trust the whitepapers, brochures, or sales-speak from a manufacturer.  Evidently not all features are created equally.

You really have to do your own due diligence. Some companies can afford the time, expense & personnel to have the shortlisted vendors deliver a system for them to test. Costs & effort rise fast if you need to get a setup that’s comparable to the production environment. You need to devise tests that mimic real-life scenarios in storage capacity, IOPS and read/write patterns, and make sure you don’t have bottlenecks outside of the storage system in the lab.

Even for those that can, this is a hard thing to do. Some vendors also offer labs at their Tech Centers or Solution Centers where customers or potential customers can try out scenarios. No matter what options you have, you’ll realize that this takes a lot of effort. So what do I do? I always start early. You won’t have all the information, questions & answers available after a few hours of browsing the internet & reading some brochures. You’ll also notice that there’s always something else to deal with or do, so give yourself time, but don’t procrastinate.

I did visit the Tech Centers & Solution Centers in Europe of the shortlisted vendors. Next to that I did a lot of reading, asked questions and talked to a lot of people about their views and experiences with storage. Don’t just talk to the vendors or resellers. I talked a lot with people in my network, at conferences and in the community. I even tracked down owners of the shortlisted systems and asked to talk to them. All this was part of my litmus test of the offered storage solutions. While perfection is not of this world, there is a significant difference between vendors’ claims and the reality in the field. Our goal was to find the best solution for our needs based on price/value and whose capabilities, usability & support excellence materialized with the biggest possible majority of customers in the field.

Friendly Advice To Vendors

So while the entire marketing and sales process is important for a vendor, I’d like to remind all of them of a simple fact: delivering what you sell makes for very happy customers whose simple stories of their experiences with the products will sell it by word of mouth. Those people can afford to talk about the imperfections & some vNext wishes they have. That’s great, as those might be important to you, but you’ll be able to see if they are happy with their choice and they’ll tell you why.

Upgrading Your DELL Compellent Storage Center Firmware (Part 2)

This is Part 2 of this blog. You’ll find Part 1 over here.

In part 1 we prepared our Compellent SAN for the installation of Storage Center 6.3.10, which has gone public. As said, 6.3.10 brings interesting features like ODX and UNMAP to us Windows Server 2012 Hyper-V users. It also introduces some very nice improvements to synchronous replication and Live Volumes. But here we’ll just do the actual upgrade; the preparations & health check were done in part 1, so we can get started right away.

Log in to your Compellent system and navigate to the Storage Management menu. Click on “System”, select “Update” and finally click on “Install Update”. It’s already there as we downloaded it in Part 1. Click on “Install Now” to kick it all off.

image

Click on Install now to launch the upgrade.

image

After initialization you could walk away for 10 minutes, but you might want to keep an eye on the progress of the process.

image

So go have a look at your storage center. Look at the Alert Monitor for example and notice that the “System is undergoing maintenance”.

image

When the controller that owns the VIP address of the SAN reboots, the VIP becomes unavailable. After a while you can log in again to the other controller via the VIP; if you can’t wait a few seconds, just use the IP address of the active controller. That will do.

image

When you log in again you’ll see the evidence of an ongoing SAN firmware upgrade. Nothing to panic about.

image

This is also evident in the Alert Monitor. CoPilot knows you’re doing the upgrade, so no unexpected calls to make sure your system is OK will come in. They’re there every step of the way. The cool thing is that this is the very first SAN we ever owned for which we don’t need engineers on site or complex and expensive procedures to do all this. It’s all just part of the outstanding customer service Compellent & DELL deliver.

image

You can also take a peek at your Enterprise Manager software to see paths going down and so on: the artifacts of sequential controller failovers during an upgrade. Mind you, you’re not suffering downtime in most cases.

image

Just be patient and keep an eye on the process. When you log in again after the firmware upgrade and your system is up and running again, you’ll be asked to rebalance the ports & IO load between the controllers on the system. You do, so click Yes.

image

image

When done you’ll return to the Storage Center interface. Navigate to “Help” and click on “About Compellent Storage Center”.

image

You can see that both controllers are running 6.3.10.

image

You’re rocking the new firmware. As you kept an eye on your hosts you should know these are good to go. Send off an e-mail to CoPilot support and they’ll run a complete health check on your system to confirm everything is fine. Now it’s time to start leveraging the new capabilities you just got.
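As a quick sanity check on the Hyper-V hosts you can also verify that ODX and UNMAP are enabled on the Windows side, so the new firmware features actually get used. Both settings shown below are the Windows Server 2012 defaults:

```powershell
# FilterSupportedFeaturesMode 0 (or absent) = ODX enabled on the host
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' |
    Select-Object FilterSupportedFeaturesMode

# DisableDeleteNotify = 0 means UNMAP/delete notification is enabled
fsutil behavior query DisableDeleteNotify
```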