Windows Server 2012 RTM in Production as Backup Media Server

As Brad Anderson suggested a few times this year, we are leading our businesses to a better IT future. Here’s a quick screenshot of one of our backup media agent servers hooked up to a bunch of NL-SAS disks (150TB). One host (an older model, no 10Gbps yet, running Windows 2008 R2) is sending backups over its dedicated 1Gbps interface to this media agent server, which has a 10Gbps interface. Performance and traffic look great. I love it when a plan comes together!

image

 

This is what it looks like on the host being backed up:

image

Now we’ll see what magic Data Deduplication can do for us with the RTM bits!

Windows Server Backup Benefits from Improvements in Windows Server 2012

Introduction

In certain environments we back up VMs and any remaining physical hosts using Windows Backup. Before you all think this is ridiculous, I advise you to think again. With some automation you can build a very reliable agentless backup solution with the built-in functionality. Windows Server 2012 brings good news for the smaller & perhaps low budget environments. Windows Server Backup is now capable of doing host level backups of the Hyper-V guests stored on Cluster Shared Volumes (CSV). This was not the case in Windows 2008 R2 and it is a vast improvement. This change is due to the fact that CSV has been changed so that it no longer requires a specialized API capable of dealing with its intricacies. All backup products can now back up CSVs without specialized APIs.

This is linked to huge improvements in how a CSV behaves during a backup. In the past, when you started a backup, the CSV ownership would be moved to the node that runs the backup and all access by other nodes was in redirected mode for the duration of the backup, unless you used a hardware VSS provider, and those were not trouble free either. If your backup software did not understand CSVs and use the CSV APIs you were out of luck. From Windows Server 2012 on you are only in redirected I/O mode for the time it takes to create the VSS snapshot. For the rest of the backup duration your nodes access the CSV disk in direct mode. So back to Windows Server Backup. You cannot back up the CSV as a disk volume but you can select Hyper-V from the items to include in the backup.

image

That will show all VMs running on that host, meaning you cannot back up VMs running on another host. Compared to Windows Server 2008 R2, where using the native Windows backup with VMs on a CSV LUN meant using in-guest backups, this is a major improvement.

Some Approaches to Using Windows Server Backup

Sometimes we run the backups to local disk and regularly copy those off to a file share. This has the benefit of providing the backup versioning you get from using a local disk. The drawback is that the backups can be rather big.

In VMs, that is with backups in the guest, we run those backups to a file share over a 1Gbps management network. Performance is good, but it leaves us with the issue that there is no versioning.

For that reason our backup script copies the entire backup folder for a server to an archive folder on that same JBOD. Depending on how much space you have and need, you can configure the retention time of these older backups. This way you can keep a large number of backups over time. A script runs every day that deletes the older backups based on the chosen retention time so you don’t run out of space.
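To give an idea of what that archive and clean-up logic can look like, here is a minimal sketch in Python. It is not the script we actually run; the folder layout, the time stamp format and the function names `archive_backup` and `prune_archives` are assumptions made for illustration only.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional

# Assumed naming convention for archived copies, e.g. 20120901-020000
STAMP_FORMAT = "%Y%m%d-%H%M%S"


def archive_backup(backup_dir: Path, archive_root: Path,
                   now: Optional[datetime] = None) -> Path:
    """Copy one server's whole backup folder to a time-stamped archive folder."""
    now = now or datetime.now()
    target = archive_root / backup_dir.name / now.strftime(STAMP_FORMAT)
    shutil.copytree(backup_dir, target)  # creates missing parent folders
    return target


def prune_archives(archive_root: Path, retention_days: int,
                   now: Optional[datetime] = None) -> List[Path]:
    """Delete archived copies older than retention_days, return what was removed."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    removed = []
    for server_dir in sorted(archive_root.iterdir()):
        if not server_dir.is_dir():
            continue
        for copy in sorted(server_dir.iterdir()):
            try:
                stamp = datetime.strptime(copy.name, STAMP_FORMAT)
            except ValueError:
                continue  # not one of our archive folders, leave it alone
            if copy.is_dir() and stamp < cutoff:
                shutil.rmtree(copy)
                removed.append(copy)
    return removed
```

You would call `archive_backup` right after the backup job finishes and run `prune_archives` from a daily scheduled task with the retention that fits your free space. Reading the age from the folder name rather than from file time stamps is a deliberate choice in this sketch, since copied folders tend to inherit the time stamps of the source.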

There is one way around the lack of versioning when writing backups to a share: mount a VHD that lives on a file share locally on the host where you are running the backup, or use a pass-through disk inside the VM. While you can get away with this, it becomes rather messy due to the management & flexibility drawbacks of pass-through disks. Mounting a VHD on a file share inside a VM is also a performance issue. So while possible and viable in certain scenarios, I don’t use this for more than a few hosts, and those are physical ones.

We had hoped that Windows Server 2012 with its support for VSS snapshots on SMB 3.0 shares would have enabled backup versioning in Windows Backup, just like it can do for backups to local disk. Unfortunately this is not the case. You’ll still get the same warning when backing up from a Windows Server 2012 host to an SMB 3.0 share as you used to get with previous versions:

image

How fast are backups and restores?

The largest environment is a couple of physical servers, a two-node Hyper-V cluster and about 22 virtual machines. That includes some with a larger amount of data, the biggest being 400 GB. Full backups are run weekly at night on all servers over 1Gbps and this works just fine.

We can back up a VM with a 50GB VHD (about 50% to 75% in use) and copy that backup to the archive folder in 20 minutes. We back up AND copy to archive a VM’s C: drive (20GB of data) and D: drive (190GB of data), two separate VHDs, in 2.5 hours.
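To put those numbers in perspective, a quick back of the envelope calculation helps. The sketch below (Python, decimal units, helper name made up for this post) turns the figures above into an average end-to-end rate. Keep in mind that the durations include both the backup and the copy to the archive, so this is not the speed on the wire.

```python
def throughput_mb_per_s(gigabytes: float, minutes: float) -> float:
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (minutes * 60)


# 50GB VHD, 50% to 75% in use, backed up and archived in 20 minutes
small_low = throughput_mb_per_s(50 * 0.50, 20)
small_high = throughput_mb_per_s(50 * 0.75, 20)

# 20GB + 190GB of data, backed up and archived in 2.5 hours
large = throughput_mb_per_s(20 + 190, 150)

print(f"small VM: {small_low:.1f} to {small_high:.1f} MB/s")
print(f"large VM: {large:.1f} MB/s")
```

That works out to roughly 21 to 31 MB/s for the small VM and about 23 MB/s for the big one, comfortably below the 125 MB/s a 1Gbps link can carry in theory.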

Some statistics: a bare metal restore of a VM or physical host over 1Gbps with a single VHD or volume holding the OS, applications and some data takes us 30 to 35 minutes in real life due to the overhead of setting a new VM up. If you just want to restore individual data you can do that as well. You can even mount the backup VHD and recover files via Windows Explorer.

This is what the logging we do and e-mail to the sysadmins looks like:

image

Where else do the Windows Server 2012 improvements help?

Data Deduplication

Now there is some good news. We ran ddpeval.exe against the JBOD LUNs where we store these archived backups and got some great results. We also copied such an archive folder to a Windows Server 2012 host and ran data deduplication against it. In that test we achieved an 84% to 85% deduplication rate, depending on how many versions of the backups we archive and what the delta is during that time frame. The latter is important. If we run dedupe only against domain controller backup archives we get up to 94%. Deduplication should not impact restore performance too much because in 99% of the cases you revert to the last backup, which sits in the WindowsBackup folder. Only if you need older backups will you work against the deduped files, unless the archive folder is on the same LUN as the original backups. I don’t have real life info about restores yet, just a small lab test.
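If you want to translate such percentages into disk space, the arithmetic is simple. The sketch below assumes the percentage is a space savings figure (the share of the logical data you no longer have to store); the function names are made up for illustration.

```python
def stored_size_tb(logical_tb: float, savings_pct: float) -> float:
    """Space needed on disk after dedup for a given logical data size."""
    return logical_tb * (1 - savings_pct / 100)


def dedup_ratio(savings_pct: float) -> float:
    """Express a savings percentage as an N:1 ratio."""
    return 1 / (1 - savings_pct / 100)


for pct in (84, 85, 94):
    print(f"{pct}% savings: 10 TB fits in {stored_size_tb(10, pct):.1f} TB, "
          f"ratio {dedup_ratio(pct):.1f}:1")
```

So 85% savings means 10 TB of archived backups would fit in about 1.5 TB, a ratio of roughly 6.7 to 1, and 94% works out to about 16.7 to 1.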

SMB 3.0 & Multichannel

In Windows Server 2012 you also get all the benefits of SMB 3.0 and Multichannel for your backup traffic.

Not Your Grandfather’s ChkDsk

The vastly improved ChkDsk is a comfort for worried minds when it comes to fixing potential corruption on a large LUN. Last but not least, the FLUSH command in NTFS makes using cheaper data disks safer.

VHDX

The VHDX format allows for 64TB. That means your Windows Server Backup can now handle LUNs larger than 2TB. This should be adequate.

Conclusion

With some creativity and automation via scripting you can leverage Windows Backup to be a nice and flexible solution. Although I feel that backup versioning to a file share is an improvement that is still missing, the new features help out a lot and, all in all, it’s not bad at all! So you see, even smaller organizations can benefit seriously from Windows Server 2012 and get more bang for their buck.

Disk to Disk Backup Solution with Windows Server 2012 – Part I

Backing Up 100 Plus Terabytes of Data Cheaply

When dealing with large amounts of data to back up you’re going to start bleeding money. Sure, people will try to sell you great solutions with deduplication, but in a lot of scenarios this is not very cost effective. Dedupe in either backup hardware or software is very expensive and in some scenarios the cost cannot be justified. It’s also not very portable, by the way, unless you stick with certain vendors. Once you get into backing up > 100TB you need to forget about overly expensive hardware & software. Just build your own solutions. Now, depending on your needs you might want to buy backup software anyway, but forget about dedupe licenses. Some of the more profitable hosting companies & cloud providers are not buying appliances or dedupe software either. They make real good money but they’d rather spend it on SUVs and swimming pools.

What Can You Do?

You can build your own solution. Really. You can put together some building blocks that scale up and out. You’ll need a dual-socket server with two 8-core CPUs and 24GB of RAM, perhaps 32GB. Plug in some 6Gbps SAS controllers, hook those up to a bunch of 3.5” disk bays with 12 * 2TB or 3TB disks each and you’re good to go. You can scale out to about 8 disk bays if you don’t cluster. Plug in a dual port 10Gbps card. You’ll need that as you’ll be hammering that server. If you need more than this system, then scale out: put in a second, a third, etc. 3.5TB to 4TB of backup capacity per hour in total should be achievable.
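A quick sanity check on those numbers, as a sketch in Python. The helper names are mine, the capacity is raw (before RAID, hot spares or Storage Spaces overhead) and the network figure is the theoretical line rate, which you will not reach in practice.

```python
def raw_capacity_tb(bays: int, disks_per_bay: int, disk_tb: float) -> float:
    """Raw capacity of one backup server, before any redundancy overhead."""
    return bays * disks_per_bay * disk_tb


def line_rate_tb_per_hour(gbps: float, ports: int = 1) -> float:
    """Theoretical network ceiling in TB per hour (decimal units)."""
    gigabytes_per_second = gbps * ports / 8
    return gigabytes_per_second * 3600 / 1000


small = raw_capacity_tb(bays=8, disks_per_bay=12, disk_tb=2)
large = raw_capacity_tb(bays=8, disks_per_bay=12, disk_tb=3)
one_port = line_rate_tb_per_hour(10)
two_ports = line_rate_tb_per_hour(10, ports=2)

print(f"raw capacity: {small} TB with 2TB disks, {large} TB with 3TB disks")
print(f"10Gbps ceiling: {one_port} TB/h on one port, {two_ports} TB/h on two")
```

Eight bays of 12 disks give 192 TB raw with 2TB disks and 288 TB with 3TB disks. A single 10Gbps port tops out at 4.5 TB per hour in theory, so 3.5TB to 4TB per hour means keeping one port very busy or spreading the load over both.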

image

When you buy the components from Supermicro and some online retailers you can do this pretty cheaply. Spare parts, you say? Buy some cold spares. You can have a dozen disks on the shelf, a SAS controller and even a disk shelf if you want. You could use hardware redundancy (RAID, hot spares) or use storage pools & spaces if you’re going the Windows Server 2012 route and save some extra money. Disk bay failure? Scale out so that even when you lose a node you still have three others up and running. Spread backups around. Don’t back up the same data only to the same node. I know it’s not perfect for deduplication with Windows 2012 that way but hey, you win some, you lose some. Checks & balances, right? If you need a bit more support, get some DELL PowerVaults or the like. It depends on what you’re comfortable with and how deep your pockets are.

You can buy more storage than dedupe will ever save you & still come out with money to pay for the electricity. Okay, it’s less good for the penguins but trust me, those companies selling those solutions would fry a penguin for breakfast every day if it would make them money. Now, talking about those penguins, the Windows Server 2012 deduplication feature could be providing me with the tools to save them, but that’s for another post. I hope this works. I’d love to see it work. I bet some would hate to see it work. So much, perhaps, that they might even consider making their backup format non-dedupable?

Tip for users: Don’t use really cheap green SATA disks. They’re pretty environment friendly but the performance sucks. My view on “Green IT” is to right-size everything, never oversubscribe, and let that infrastructure work hard for you. This will minimize the hardware needed, and the performance is way better than with all the power saving settings and green hardware, which will ruin the environment anyway as you’ll end up buying more gear to compensate for the lack of performance, unless you just suffer the bad performance. Keep the green disks for the home user’s picture, movie & music collection and use 2TB/3TB SAS/NL-SAS. Remember that when you don’t cluster (shared storage) you can make do nicely without the enterprise NL-SAS disks.

Now I’m not saying you should do what I suggest here, but you might find it useful to test this on your own scale for your own purposes. I did it for the money. For the money? Yup, for the money. No, not for me personally; I don’t have a swimming pool and I don’t even own a car, let alone an SUV. But saving your company 100.000 or more in cash isn’t going to get you into trouble, now is it? Or perhaps this is the only way you’re going to afford to back up that volume of data. People don’t throw away data and they don’t care about budgets, so you’d better be able to restore their data. Which reminds me, you will also need a backup software solution that doesn’t cost an arm and a leg. That’s also a challenge as you need one that can handle large amounts of data and has some intelligence when it comes to virtualization, snapshots etc. It also has to be easy to use, as simple as possible, as this helps ensure backups are made and are valid.

Are we trying to replace appliances or other solutions? No, we’re trying to provide lots of cheap and “fast enough” storage. Why fast enough? Reading the data & providing it to the backup device can be an issue as well: pure speed on the target side is not useful if the sources can’t deliver. We need this backup space for when the shit really hits the fan and all else has failed. That doesn’t have to be a SAN crash or a SAN firmware issue ruining all your nice snapshots. It can also be the business detecting a mistake in a large data set a mere 14 months after the fact, when all replicas, snapshots etc. have already expired. I’m sure you’ve got quality assurance that is so rock solid that this would never happen to you but hey, welcome to my world.

Why I’m No Fan Of Virtual Tape Libraries

After implementing a couple of SANs with backup solutions I have come to dislike Virtual Tape Libraries. This is definitely technology that, for us, has never delivered the promised benefits. To add insult to injury it is overly expensive and only good for practicing hardware babysitting. When discussing this I’ve been told that I want things too cheap and that I should have more FTEs to handle all the work. That’s swell, but the business and the people with the budgets are telling me exactly the opposite. That explains why in the brochures it’s all about reduced cost, with empty usage of acronyms like CAPEX and OPEX. But when that doesn’t really materialize, the message to the IT Pros is to get more personnel and cash. In the best case (compared to calling you a whining kid) they’ll offer to replace the current solution with the latest and greatest that has, wonder oh wonder, reduced CAPEX & OPEX. Fool me once, shame on you; fool me twice, shame on me.

So one thing I’m not planning to buy, ever again, is a Virtual Tape Library (VLS). Those things have a shelf life of about 2 years max. After that they turn into auto-disintegrating pieces of crap you’ll spend the rest of their serviceable life babysitting. This means regularly upgrading the firmware to get your LUN(s) back, if you get them back, that is. This means convincing tech support to come up with a better solution than restarting from scratch when they acknowledge that their OS never cleans up its own log files and thus one day just kicks the bucket. Luckily they did go the extra mile on that one after we insisted, and we got a workaround without losing all backups. Babysitting also means that replacing the battery kits of all shelves becomes a new hobby. You become so good at it that you have a better and faster way of doing it than the junior engineers they send, who happily exclaim “so this is what it looks like”. The latter is not a confidence builder. The disks fail at a rate of 1 to 2 per week until you have replaced almost all of them. Those things need to be brought down to fix just about anything. That means shutting down the disk shelves as well and cutting the power, not just a reboot, so yes, you need to be in the data center.

There is no RAID 6, no hot spares (global or otherwise). The disks cost an arm and a leg and have specialized hardware to make sure all runs fine and well. But in the end they are plain, cheap 7.200 rpm 500 GB SATA disks that cost way too much. The need for special firmware I can understand but the high cost I cannot. The amount of money you pay in support costs and in licensing the storage volume is more than enough to make a decent profit. Swapping disks and battery kits isn’t hard and we do it ourselves as waiting for a support engineer takes more time. We have spares at hand. We buy those when we buy the solution. We’ve grown wise that way. We buy a couple of units of all failure-prone items at the outset of any storage project. Having only RAID 5 means that one disk failure puts your virtual tapes at very high risk, so you need to replace it as soon as possible. Once they shipped us a wrong disk and our VTL went down the drain due to incorrect firmware on that disk. They demanded to know how it got in there. Well, Mr. Vendor, you put it in yourself as a replacement for a failed disk. In the first year it often happened that they didn’t have more than 1 spare disk to ship. If anyone else in your area has a VLS you’re bound to hit that limit and have to wait longer for parts. They must have upped the spare parts budget to have some more on hand just for us, as we now get a steady supply.

When you look at the complete picture, the cost of storage per GB on a VLS is as much as on 1st tier SAN storage. That one doesn’t fly well. At least the SAN has decent redundancy and is luckily a hundredfold more robust and reliable. Why buy a VLS when you can have a premier tier SAN for the same cost? For the VLS functionality? No sir, that also never lived up to its promises. It has come to the point that the VTL, due to the underlying issues with the device, is more error prone than our physical tape library. That’s just sad. Anyway, we never got the benefits we expected from those VTLs. For disk based backup I don’t want a Virtual Tape Library System anymore. It just isn’t worth the cost and hassle.

Look, you can buy 100 TB of SATA storage, put it in a couple of Supermicro disk bays, add 10Gbps Ethernet to your backup network and you’re good to go. Hey, that even gives you RAID 6, the ability to add hot spares etc. You buy some spare disks, controllers, NICs, and perhaps even just build two of these setups. That would give you redundant backup to two times 100 TB for under 60.000 €. A VLS with 100TB, support and licensing will set you back five times that. Extending that capacity costs an arm and a leg and you’re babysitting it anyway, so why bleed money while doing that?
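Put as cost per terabyte it looks like this. This is a rough sketch using only the round numbers from this post: 60.000 € is treated as the upper bound for the two self-built setups together and the VLS price as five times that amount. The helper name is made up.

```python
def cost_per_tb(total_eur: float, capacity_tb: float) -> float:
    """Purchase cost per terabyte of backup capacity."""
    return total_eur / capacity_tb


diy_total_eur = 60_000             # upper bound for both self-built setups
diy_capacity_tb = 2 * 100          # two setups of 100 TB each
vls_total_eur = 5 * diy_total_eur  # assumption: "five times that"
vls_capacity_tb = 100

diy = cost_per_tb(diy_total_eur, diy_capacity_tb)
vls = cost_per_tb(vls_total_eur, vls_capacity_tb)

print(f"self-built: at most {diy:.0f} EUR per TB")
print(f"VLS: about {vls:.0f} EUR per TB, {vls / diy:.0f} times as much")
```

That is at most 300 € per TB for the self-built setups against about 3.000 € per TB for the VLS, a factor of ten, and the self-built figure already includes the second copy.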

Does this sound crazy? Is this blasphemy? The dirty little secret they don’t like you to know is that’s how cloud players are doing it. First tier storage is always top notch, but if you talk about backing up several hundreds of terabytes of data, the backup solutions by the big vendors are prohibitively expensive. This industry looks a lot like a mafia racketeering business. Well, if you don’t buy it you’ll get into trouble, you’ll lose your data. You won’t be able to handle it otherwise. Accidents do happen. The guys selling it even dress like mobsters in their suits. But won’t you miss out on cool things like deduplication? If your backup software supports it you can still have that. The licensing cost for this isn’t that bad a deal when compared to VLS storage costs. And do realize that instead of 2 * 100 TB you could then make do with 2 * 25 TB. Hey, that price just dropped even more.

When it comes to provisioning storage, our strategy is to buy as much of our storage needs as possible during the acquisition phase. That’s the only time deals can be made. The amount of discount you’ll get will make you wonder what the markup in this market actually is. It must be huge. And storage can’t cost as much to build as they would like us to believe. Last time the storage sales guy even told us they were not making any money on the deal. Amazing how companies giving away their products have very good profit margins, highly paid employees with sales bonuses and 40.000 € cars. If a vendor or reseller ever tells you that things will get cheaper with time, they are lying. They are actually saying their profit margins will increase during the life cycle of your storage solution. Look, all 1st tier storage is going to be expensive. The best you can hope for is to get a good quality product that performs as promised and doesn’t let you down. We’ve been fortunate in that respect with our SAN solutions, but when it comes to backup solutions I’m not pleased with the state of that industry. Backups are extremely important life savers, but for some reason the technology and products remain very buggy, error prone and labor intensive, and become very expensive when the data volumes and backup requirements rise.