Disk to Disk Backup Solution with Windows Server 2012 – Part I

Backing Up 100 Plus Terabyte of Data Cheaply

When dealing with large amounts of data to backup you’re going to start bleeding money. Sure people will try to sell you great solutions with deduplication, but in a lot of scenarios this is not a very cost effective solution. The cost of dedupe in either backup hardware or software is very expensive and in some scenarios the cost cannot be justified. It’s also not very portable by the way unless in certain scenarios in which you stick with certain vendors. Once you get into backing up  > 100TB you need to forget about overly expensive hard & software. Just build your own solutions. Now depending on your needs you might want to buy backup software anyway but forget about dedupe licenses. Some of the more profitable hosting companies & cloud providers are not buying appliances or dedupe software either. They make real good money but they rather spend it on SUVs and swimming pools.

What Can You Do?

You can build your own solution. Really. You can put together some building blocks that scale up and out. You’ll a dual socket server with two 8 core CPUs and 24GB of ram, perhaps 32GB. Plug in some 6Gbps SAS controllers, hook those up to a bunch of 3.5” disk bays with 12 *2TB or 3TB disks each and you’re good to go. You can scale out to about 8 disk bays if you don’t cluster. Plug in a dual port 10Gbps card. You’ll need that as you be hammering that server. If you need more than this system, than scale out, put in a second, a third, etc. 3.5TB –4TB of backup capacity per hour in total should be achievable..

image

When you buy the components from super micro and some on line retailers you can do this pretty cheap. Spare parts you say? Buy some cold spares. You can have a dozen disk on the shelf, a SAS controller and even a shelf if you want. You could use hardware redundancy (RAID, hot spares) or use storage pools & spaces if you’re going the Windows Server 2012 route and save some extra money. Disk bay failure? Scale out so that even when you loose a node you still have tree others up and running. Spread backups around. Don’t backup the same data only to the same node. I know it’s not perfect for deduplication with Windows 2012 that way but hey, you win some, you lose some. Checks & balances right?  If you need a bit more support get some DELL PowerVaults or the like. It depends on what you’re comfortable with and how deep your pockets are.

You can by more storage than dedupe will ever save you & still come out with money to pay for the electricity. Okay it’s less good for the penguins but trust me, those companies selling those solutions would fry a penguin for breakfast everyday if it would make them money. Now talking about those penguins, the Windows Server 2012 deduplication feature could be providing me with the tools to save them Smile, but that’s for another post. I hope this works. I’d love to see it work. I bet some would hate to see it work. So much perhaps that they might even consider making their backup format non dedupableDevil?

Tip for users: Don’t use really cheap green SATA disks. They’re pretty environment friendly but the performance sucks. My view on “Green IT” is to right size everything, never to over subscribe and let that infrastructure work hard for you. This will minimize the hardware needed  and the performance is way better than all the power saving settings and green hardware. Which will ruin the environment anyway as you’ll end up buying more gear to compensate for lack of performance unless you’ll just suffer the bad performance. Keep the green disks for the home user’s picture, movie & music collection and use 2TB/3TB SAS/NL-SAS. Remember that when you don’t cluster (shared storage) you can make due nicely without the enterprise NL-SAS disks.

Now I’m not saying you should do what I suggest here, but you might find it useful to test this on your own scale for your own purposes. I did it for the money. For the money? Yup for the money. No not for me personally, I don’t have a swimming pool and I don’t even own a car, let alone an SUV. But saving your company a 100.000 or more in cash isn’t going to get you into trouble now is it? Or perhaps this is the only way you’re going to afford to back up that volume of data. People don’t throw away data and they don’t care about budgets you’d better be able to restore their data. Which reminds me, you will also need some backup software solutions that doesn’t cost an arm and a leg. That’s also a challenge as you need one that can handle large amounts of data and has some intelligence when I comes to virtualization, snapshots etc. It also has to be easy to use, as simple as possible as this helps ensure backups are made and are valid.

Are we trying to replace appliances or other solutions? No, we’re trying to provide lots of cheap and “fast enough” storage. Reading the data & providing it to the backup device can be an issue as well. Why fast enough? Pure speed on the target side is not useful if the sources can’t deliver. We need this backup space for when the shits really hits the fan and all else has failed.That doesn’t have to be a SAN crash or a SAN firmware issue ruing all your nice snapshots. It can also be the business detecting a mistake in a large data set a mere 14 months after the facts when all replicas, snapshots etc. have already expired. I’m sure you’ve got quality assurance that is so rock solid that this would never happen to you but hey, welcome to my world Sarcastic smile.

Shared Nothing Live Migration White Board Time – Scenario I

The Problem

Let’s say you are very happy with your SAN. You just love the snapshots, the thin provisioning, deduplication, automatic storage tiering, replication, ODX and the SMI-S support. Live is good! But you have one annoying issue. For example; to get the really crazy IOPS for your SQL Server 2012 DAG nodes you would have to buy 72 SSDs to add to you tier 1 storage in that SAN. That’s a lot of money I you know the price range of those. But perhaps you don’t even have SSDs in your SAN.To get the required amount of IOPS from your SAN with SAS or NL-SAS disks in second and respectively third level storage tier you would need to buy a ridiculous amount of disks and, let’s face it, waste that capacity. Per IOPS that becomes a very expensive and unrealistic option.

Some SSD only SAN vendors will happily sell you a SAN that address the high IOPS need to help out with that problem. After all that is their niche, their unique selling point, fixing IOPS bottle necks of the big storage vendors where and when needed. This is cheaper solution per IOPS than you standard SAN can deliver but it’s still a lot of money, especially if you need more than a couple of terabytes of storage. Granted they might give you some extra SAN functionality you are used to, but you might not need that.

Yes I know there are people who say that when you have such needs you also have the matching budgets. Maybe, but what if you don’t? Or what if you do but you can put 500.000 € towards another need or goal? Your competitive advantage for pricing your products and winning customers might come form that budget Winking smile

Creative Thinking or Nuts?

Let’s see if we can come up with a home grown solution bases on Windows Server 2012 Hyper-V. If we can this might solve your business need, save a ton of money and extend  (or even save) the usefulness of you SAN in your environment. The latter is possible because you successfully eliminated the biggest disk IO from you SAN.

The Solution Scenario

So let’s build 3 Hyper-V hosts, non-clustered, each with its own local SAS based storage with commodity SSD drives. You can use either storage pools/spaces with a non-raid SAS HBA or use a RAID SAS HBA with controller based virtual disks for this. If you’ve seen what Microsoft achieved with this during demos you know you can easily get to hundreds of thousands of IOPS. Let’s say you achieve half of what MSFT did in both IOPS and latency. Let’s just put a number on it => that’s about 500.000 IOPS and 5GB/s. Now reduce that for overhead of virtualization, the position of the moon and the fact things turn out a bit less than expected. So let’s settle for 250.000 IOPS and 2.5GB/s. Anybody here who knows what this kind of numbers would cost you with the big storage vendors their SANs? Right, case closed. Don’t just look at the cost, put it into context and look at the value here. What does and can your SAN do and at what cost?

OK we lose some performance due to the virtualization overhead. But let’s face it. We can use SR-IOB to get the very best network performance. We have hundreds of thousands of IOPS. All the cores on the hosts are dedicated to a single virtual machine running a SQL Server DAG node and bar 4Gb of RAM for the OS we can give all the RAM in the hosts to the VM. This one VM to one host mapping delivers a tremendous amount of CPU, Memory, Network and Storage capabilities to your SQL Server. This is because it gets exclusive use of the resources on the host, bar those that the host requires to function properly.

In this scenario it is the DAG that provides high availability to the SQL Server database. So we do not mind loosing shared storage here.

image

Because we have virtualized the SQL server you can leverage Shared Nothing Live Migration to move the virtual machines with SQL server to the central storage of the SAN without down time if the horsepower is no longer needed. That means that you might migrate another application to those standalone Hyper-V hosts That could be high disk IO intensive application, that is perhaps load balanced in some way so you can have multiple virtual machines mapped to the hosts (1 to 1, many to one). You could even automate this all and use the “Beast” as a dynamic resource based on temporal demands.

In the case of the SQL Server DAG you might opt to keep one DAG member on the SAN so it can be replicated and backed up via snapshot or whatever technology you are leveraging on that storage.

Extend to Other Use Cases

More scenarios are possible. You could build such a beast to be a Scale Out File Server or PCI RAID/Shared SAS if you need shared storage to build a Hyper-V cluster when your apps require it for high availability.

image

The latter looks a lot like a cluster in a box actually. I don’t think we’ll see a lot iSCSI in cluster in a box scenarios, SAS might be the big winner here up to 4 nodes (without a “SAS switch”, which brings even “bigger” scenarios to live with zoning,  high availability, active cables and up to 24Gbps of bandwidth per port).

Using a SOFS means that if you also use SMB 3.0 support with your central SAN you can leverage RDMA for shared nothing live migration, which could help out with potentially very large VHDs of your virtual SQL Servers.

Please note that the big game changer here compared to previous versions of windows is Shared Nothing Live Migration. This means that now you have virtual machine mobility. High performance storage and the right connectivity (10Gbps, Teaming, possibly RDMA if using SMB 3.0 as source and target storage) means we no longer mind that much to have separate storage silos. This opens up new possibilities for alleviating IOPS issues. Just size this example to your scenarios & needs to think about what it can do for you.

Disclaimer: This is white board thinking & design, not a formal solution. But I’d cannot ignore the potential and possibilities. And for the critics. No this doesn’t mean that I’m saying modern SANs don’t have a place anymore. Far from it, they solve a lot of needs in a lot of scenarios, but they do have some (very expensive) pain points.