There is a small environment that provides web presence and services. In total there are about 20 production virtual machines. These are all backed up to a Transparent Failover File Share on a Windows Server 2012 cluster that is used to host all the infrastructure and management services.
The LUN/volume for the backups has about 5.5TB of storage available. The folder layout is shown in the screenshot below. The backups are run “in guest” using native Windows Backup, which has the WindowsImageBackup subfolder as its target. Those backups are archived to an “Archives” folder. That archive folder is the one that gets deduplicated; the WindowsImageBackup folder is excluded.
This means that the most recent version is not deduplicated, guaranteeing the fastest possible restore times at the cost of some disk space. All older (> 1 day) backup files are deduplicated. We achieve the following with this approach:
- It provides us with enough disk space savings to keep archived backups around for longer in case we need them.
- It also provides enough storage to back up more virtual machines while still being able to maintain a satisfactory number of archived backups.
- Any combination of the above two benefits can be balanced against the business needs.
- It’s a free, zero-cost solution.
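For reference, here is a minimal sketch of that deduplication setup using the Windows Server 2012 Deduplication cmdlets. The drive letter and folder path are illustrative assumptions; adjust them to your own layout.

```powershell
# Requires the feature: Add-WindowsFeature FS-Data-Deduplication
Import-Module Deduplication

# Enable deduplication on the backup volume (assumed to be E: here)
Enable-DedupVolume -Volume 'E:'

# Only deduplicate files older than one day, so the most recent backup stays
# fully hydrated for the fastest possible restore
Set-DedupVolume -Volume 'E:' -MinimumFileAgeDays 1

# Exclude the live WindowsImageBackup target; only the archived copies are
# deduplicated (path assumed for illustration)
Set-DedupVolume -Volume 'E:' -ExcludeFolder 'E:\WindowsImageBackup'
```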
The Results
About 20 virtual machines are backed up every week (small delta and lots of stateless applications). As the optimization runs we see the savings grow. That’s perfectly logical: the more backups we make of virtual machines with a small delta, the more deduplication can shine. So let’s look at the results using Get-DedupStatus | fl
A couple of weeks later it looks like this.
Give it some more months, with more retained backups, and I think we’ll keep this around 88%-90%. From tests we have done (ddpeval.exe) we think we’ll max out at around 80% savings rate, but it’s a bit less here overall because we excluded the most recent backups. Guess what, that’s good enough for us. It beats buying extra storage or paying a wad of money for disk deduplication licenses from some backup vendor or appliance. Just using the built-in deduplication mechanisms in Windows Server 2012 saved us a bunch of money.
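For those who want to check their own numbers, a quick sketch of the cmdlets involved; the drive letter is again an assumption.

```powershell
# Overall deduplication status and savings per volume
Get-DedupStatus | Format-List

# Savings rate and saved space for the backup volume
Get-DedupVolume -Volume 'E:' | Select-Object Volume, SavingsRate, SavedSpace

# Kick off an optimization run by hand instead of waiting for the schedule,
# then check on its progress
Start-DedupJob -Volume 'E:' -Type Optimization
Get-DedupJob -Volume 'E:'
```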
The next step is to also convert the production Hyper-V cluster to Windows Server 2012 so we can do host-based backups with the native Windows Backup, which now supports Cluster Shared Volumes (another place where that 64TB VHDX size can come in handy, as Windows Backup now writes to VHDX).
Some interesting screenshots
The volume reports we’re using 3TB of data, so 2.4TB is free.
Looking at the backup folder, you see 10.9TB of data stored on 1.99TB of disk.
So the properties of the volume report more disk space used than the actual folder containing the data. Let’s use WinDirStat to have a look.
So the above agrees with the volume properties. In the details of this volume we again see about 2TB of consumed space.
Could it be that the volume is reserving some space to ensure proper functioning?
When you dive deeper, you get a cool view of the storage space used. Where Windows Explorer is aware of deduplication and shows the non-deduplicated size for the VHD file, WinDirStat does not; it always shows the size on disk, which is a whole lot less.
This is the same as when you ask for the properties of a file in Windows Explorer.
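If you want to reproduce that comparison without screenshots, a rough sketch: sum the logical file sizes under the backup folder and compare with what the volume and the dedupe engine report. The path and drive letter are assumptions.

```powershell
# Logical size of the backup data (what Explorer's "Size" field shows)
$logicalTB = (Get-ChildItem -Path 'E:\Backups' -Recurse -File |
              Measure-Object -Property Length -Sum).Sum / 1TB

# Actual consumption on the volume after deduplication
$vol    = Get-Volume -DriveLetter E
$usedTB = ($vol.Size - $vol.SizeRemaining) / 1TB

# What the dedupe engine reports as saved
$savedTB = (Get-DedupVolume -Volume 'E:').SavedSpace / 1TB

'{0:N2} TB logical, {1:N2} TB used on volume, {2:N2} TB saved by dedupe' -f $logicalTB, $usedTB, $savedTB
```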
Discussion
Is it the best solution for everyone? Not always, no. The deduplication is done on the target after the data is copied there, so in environments where bandwidth is seriously constrained and there is absolutely no technical and/or economical way to provide the needed throughput, this might not be a viable solution. But don’t dismiss this option too fast; in a lot of scenarios it is a very good and cost-effective feature. Technically and functionally it might be wiser to do it on the target, as you don’t consume too much memory (deduplication is a memory hog) and CPU cycles on the source hosts. Also nice is that these deduplicated files are portable across systems. VEEAM has demonstrated some nice examples of combining their deduplication with Windows dedupe, by the way, so this might also be an interesting scenario.
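To keep that memory and CPU load on the backup target and out of production hours, the optimization job can be scheduled and throttled. A sketch, with the schedule name, window and 50% memory cap as illustrative assumptions:

```powershell
# Run optimization at 23:00 for at most 6 hours, capped at 50% of memory,
# and back off when the system gets busy (all values are assumptions)
New-DedupSchedule -Name 'NightlyBackupDedup' -Type Optimization `
    -Start (Get-Date '23:00') -DurationHours 6 -Memory 50 -StopWhenSystemBusy

# Review the configured schedules
Get-DedupSchedule
```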
Financially, the cost of deduplication functionality with hardware appliances or backup software hurts like the kick of a horse straight to the head. So even if you have to invest a little in bandwidth and cabling, you might be a lot better off. Perhaps, as you’re replacing older switches with new 1Gbps or 10Gbps gear, you might be able to repurpose the old ones as dedicated backup switches. We’re using mostly repurposed switch ports and native Windows NIC teaming, and it works brilliantly. I’ve said this before: saving money whilst improving operations rarely gets you fired. The sweet thing is that this is achieved by building good and reliable solutions, which means they are efficient even if it costs some money to achieve that. Some managers focus way too much on efficiency from the start, as to them it means nothing more than a euphemism for saving every € possible. Penny wise and pound foolish. Bad move. Efficiency, unless it is the goal itself, is a side effect of a well-designed and optimized solution. A very nice and welcome one for that matter, but it’s not the be-all and end-all of a solution or you’ll have the wrong outcome.
Hello
Great post, I am actually contemplating doing the same thing for some clients.
My only problem is the part about moving the backup to the archive folder.
How do you handle that? I could of course try to write myself some scripts to do all the copying, and something to clean up archived folders older than XX days.
But I do not like to rely on a homemade script for backup, plus it wouldn’t necessarily give me any easy way to monitor it. Like what happens if a copy fails for some reason, or some other error occurs. Of course it could all be handled in the homemade script if one is good enough at scripting and has the time to implement alerting and monitoring in the script, but that seems like a lot of work.
So since you write in the blog that it’s in production somewhere, how are you handling the monitoring/alerting part of it?
Thanks for an interesting blog.
Best regards
Jonas Larsen
It is a scripted solution. We collect all feedback from the backup (log, progress, errors, success) in a log file that is sent to the archive on the backup target as a historical reference, and it is also sent to the IT guy, indicating success or possible failure. The archive copy and deletion are also scripted.
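For readers wondering what such a script can look like, here is a minimal sketch of the archive-and-prune approach described above. It is not the production script; the paths, retention and mail settings are assumptions for illustration.

```powershell
# Illustrative paths and settings - adjust to your environment
$source      = '\\backuptarget\Backups\SERVER01\WindowsImageBackup'
$archiveRoot = '\\backuptarget\Backups\SERVER01\Archives'
$archive     = Join-Path $archiveRoot ('{0:yyyy-MM-dd}' -f (Get-Date))
$logFile     = Join-Path $archive 'archive.log'
$retention   = 30   # days of archived backups to keep

New-Item -Path $archive -ItemType Directory -Force | Out-Null

# Copy the most recent backup into a dated archive folder and keep a log
robocopy $source $archive /E /R:2 /W:5 /NP "/LOG:$logFile"
$copyFailed = $LASTEXITCODE -ge 8   # robocopy exit codes of 8 or higher mean failure

# Prune archive folders older than the retention window
Get-ChildItem -Path $archiveRoot -Directory |
    Where-Object { $_.CreationTime -lt (Get-Date).AddDays(-$retention) } |
    Remove-Item -Recurse -Force

# Mail the log to the admin, flagging success or failure in the subject
$subject = if ($copyFailed) { 'Backup archive FAILED' } else { 'Backup archive OK' }
Send-MailMessage -To 'itguy@example.com' -From 'backup@example.com' `
    -SmtpServer 'smtp.example.com' -Subject $subject -Attachments $logFile
```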
MSFT now has an SMB 3.0 VSS provider, so basically the same backup model as for local disk could be feasible when backing up to a file share, but it’s not in the box today. That would remove the need to manage an archive, but no joy as of yet.
Cool, thanks for this. There aren’t a lot of people posting their results that I can find. I did some testing with SQL backups and Veeam backups as well, if anyone is interested. I wasn’t quite as thorough as Jonas 🙂 but you can get a feel for the numbers you can expect here: http://blog.randait.com/2012/12/server-2012-dedupe-results/
You’re welcome. Good to see some other numbers from more real environments as well 🙂 Thx!