DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied.

As you know by now, I’ve been building a high-throughput, large-volume Disk2Disk backup solution, and that has been rather successful. At optimal speed we get 2TB/hour per backup media server. As we currently have two, we can reach 4TB/hour at maximum throughput. That is for now; we’ll see if more is possible. The solution, which I’ll blog about later, is based on the Dell Compellent hardware VSS provider (Dell Compellent Replay Manager 6.2.0.9), Windows Server 2012, CommVault 9.0 R2 and PowerVault storage as the target disks. As it works now, it is saving us many hundreds of thousands of euros compared to dedicated D2D backup appliances or solutions.
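To put those throughput numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes the 2TB/hour per media server quoted above is sustained for the whole job. It also assumes that total throughput scales linearly with the number of media servers, which is an optimistic simplification. The function names are made up for illustration, and the 40TB data set in the example is a hypothetical figure.

```python
# Back-of-the-envelope backup window estimate.
# Assumption: every media server sustains the same throughput for the whole job
# and total throughput scales linearly with the number of media servers.

TB_PER_HOUR_PER_SERVER = 2.0  # optimal speed per media server quoted in this post


def aggregate_tb_per_hour(media_servers: int,
                          tb_per_hour: float = TB_PER_HOUR_PER_SERVER) -> float:
    """Total throughput in TB/hour for a given number of media servers."""
    return media_servers * tb_per_hour


def backup_window_hours(data_tb: float, media_servers: int,
                        tb_per_hour: float = TB_PER_HOUR_PER_SERVER) -> float:
    """Hours needed to back up data_tb terabytes at the aggregate throughput."""
    return data_tb / aggregate_tb_per_hour(media_servers, tb_per_hour)


def tb_per_hour_to_mb_per_s(tb_per_hour: float) -> float:
    """Convert TB/hour to MB/s using decimal units (1 TB = 1,000,000 MB)."""
    return tb_per_hour * 1_000_000 / 3600


if __name__ == "__main__":
    print(aggregate_tb_per_hour(2))                 # -> 4.0 (two media servers)
    print(backup_window_hours(40, 2))               # -> 10.0 (hypothetical 40 TB)
    print(round(tb_per_hour_to_mb_per_s(2.0), 1))   # -> 555.6 (MB/s per server)
```

In other words, 2TB/hour is roughly 555 MB/s of sustained writes per media server, using decimal units.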

The entire process has been a fairly smooth one, which was a relief as decent hardware VSS providers are not easy to come by, based on our experience and that of many colleagues. So, once again the DELL Compellent choice is turning out to be a good one. We did have to fix one small issue along the way.

Whilst using the DELL Compellent Hardware VSS Provider with CommVault Simpana 9.0 R2 to do host-level backups of the virtual machines on Windows Server 2012 Hyper-V cluster nodes, we ran into a small issue. The backups ran fine, but we saw this error being logged:

Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
. This is often caused by incorrect security settings in either the writer or requestor process.

Operation:
   Gathering Writer Data

Context:
   Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Name: System Writer
   Writer Instance ID: {3f4965d8-10ac-411b-bf6d-6a607f237775}

image

The description found here was not helpful in resolving this. This error is found all over the internet with just about any backup product. The possible causes and solutions that are suggested are as follows:

Change the Language for non-Unicode programs to English (United States)

This did not solve it in our case, and it would have surprised me if it had.

image

Eliminate the error condition by adding the access permissions for the domain account that the Compellent Replay Manager Service for Microsoft Servers (VSS) runs under to COM Security of the affected server

As the domain account used for this service is a member of the local Administrators group, which already has these permissions, this would also have surprised me, but we tested it anyway. It turns out this is not the cause either.

image

But for your information this is how it’s done:

You can eliminate the error condition by adding access permissions for the account to the COM Security of the affected server. The generic advice names the Network Service account; in our case we added the domain account the Compellent Replay Manager service runs under. To add the access permissions, do the following:

  • From the Start Menu, select Run. The Run dialog opens.
  • In the Open field, type dcomcnfg and click OK. The Component Services dialog opens.
  • Expand Component Services, Computers, and My Computer.
  • Right-click My Computer and click Properties on the pop-up menu. The My Computer Properties dialog opens.
  • Click the COM Security tab.
  • Under Access Permissions, click Edit Default. The Access Permissions dialog opens.
  • In the Access Permissions dialog, add the DOMAIN\compellentreplayservice account with Local Access and Remote Access allowed (this is a cluster, so not just the local host).
  • Close all open dialogs.
  • Restart the computer.

The real cause & solution

According to Microsoft support, this issue occurs when using a third-party backup program that utilizes Windows VSS (Volume Shadow Copy Service) and has its own requestor. This is indeed the case here. “It looks like the requestor (the backup application) does not allow system writer to call back into their process and hence generates the error in the application log.” This sounds very plausible.

The fix:

The following example grants access to the DOMAIN\compellentreplayservice account:

  1. Click Start and type regedit in the search box.
  2. In the Registry Editor window, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VSS\VssAccessControl.
  3. Add a DWORD value named DOMAIN\compellentreplayservice and set its value to 1.
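Clicking through regedit on every cluster node gets tedious, so the registry change can also be captured in a .reg file. The sketch below assumes you generate such a file, review it, and then import it on each node with regedit or `reg import`. The Python only builds the file's text and does not touch the registry. The function names are hypothetical, and the account is the example one from the steps; substitute your own service account.

```python
# Builds the text of a .reg file that adds a VssAccessControl entry for an
# account, mirroring the manual regedit steps. It does not touch the
# registry; review the output before importing it on each cluster node.

VSS_ACCESS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
                  r"\Services\VSS\VssAccessControl")


def reg_escape(name: str) -> str:
    """Escape backslashes and double quotes as the .reg format requires."""
    return name.replace("\\", "\\\\").replace('"', '\\"')


def build_vss_access_reg(account: str) -> str:
    """Return .reg content that sets a DWORD named after the account to 1."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{VSS_ACCESS_KEY}]",
        f'"{reg_escape(account)}"=dword:00000001',
        "",
    ]
    # .reg files use Windows (CRLF) line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Example account from the steps above; substitute your own service account.
    print(build_vss_access_reg(r"DOMAIN\compellentreplayservice"))
```

The backslash in DOMAIN\account must be doubled inside a .reg file, which is why the escaping helper exists. Regedit itself writes "Version 5.00" files as UTF-16 LE, so saving the output in that encoding is the safest choice.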

image

image

image

And voilà: the error is gone when running backups.

image

I assume no responsibility if you do this in your environment, but I can say that all this works perfectly in our CommVault Simpana 9.0 R2 setup whilst backing up the virtual machines on our Hyper-V cluster nodes at the host level using the DELL Compellent hardware VSS provider. And yes, this is Windows Server 2012: careful testing and planning help when you are an early adopter.

Migrating LUNs to your Compellent SAN

A Hidden Gem in Compellent

As you might well know, I’m in the process of doing a multi-site SAN replacement project to modernize the infrastructure at an undisclosed organization. The purpose is to have a modern, feature-rich, reliable and affordable storage solution that can provide the Windows Server 2012 rollout with modern features (ODX, SMI-S, …).

One of the nifty things you can do with a Compellent SAN is migrate LUNs from the old SAN to the Compellent SAN with absolutely minimal downtime. For us this has proven a really good way of migrating away from two HP EVA 8000 SANs to our new DELL Compellent environment. We use it to migrate file servers, Exchange 2010 DAG member servers (zero downtime), Hyper-V clusters, SQL Servers, etc. It’s nothing less than a hidden gem that not enough people are aware of, and it comes with the SAN. Some told me it was hard and not worth the effort … well, clearly they never used it and as such don’t know it. Or they work for competitors and want to keep this hidden.

The Process

First, you have to set up the zoning for all SANs involved on all fabrics. This needs to be done right, of course, but I won’t be discussing it here. I want to focus on the process of what you can do. This is not a comprehensive how-to. It depends on your environment, and I can’t write you a migration manual without digging into that. And I can’t do that for free anyway. I need to eat and pay bills as well.

Basically you add your target Compellent SAN as a host to your legacy SAN (in our case HP EVA 8000) with an operating system type of “Unknown”. This will provide us with a path to expose EVA LUNs to our Compellent SAN.

image

Depending on which server LUNs you are migrating, this is when you might have some short downtime for that LUN. If you have shared-nothing storage, as in an Exchange 2010 DAG or a SQL Server 2012 AlwaysOn Availability Group, you can do this without any downtime at all.

Stop any IO to the LUN if you can (suspend database copies, shut down databases and virtual machines) and take the CSVs or disks offline. Do whatever is needed to prevent application and data issues; this varies.

Next, we unpresent the server’s LUN on the legacy SAN.

image

After a rescan of the disks on the server you’ll see that disk/LUN disappear.

We then present this same LUN to the Compellent host we added above.

image

 

We then “Scan for Disks” in the Compellent Controller GUI. This will detect the LUN as an unassigned disk. That unassigned disk can be mapped to an “External Device” which we name after the LUN to keep things clear (“Classify Disk as External Device” in the picture below).

image

 

Then we right click that External Device and choose to “Restore Volume from External Device”.

image

This kicks off replication from the EVA LUN mapped to the Compellent target LUN. We can now map that replica to the host as you can see in this picture.

image

After this, rescan the disks on the server and voilà, the server sees the LUN again. Bring the disk/CSV back online and you’re good to go.

image

All the downtime you’ll have is at a well-defined moment in time that you choose. You can do this one LUN at a time or for multiple LUNs at once. Just don’t overdo it with the number of concurrent migrations, and keep an eye on the CPU usage of your controllers.
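One way to avoid overdoing it is to plan the LUNs in waves with a cap on concurrent migrations. The Python sketch below does that with a hypothetical planning helper. The LUN names and the limit of 2 are invented for the example. The right number depends on your controllers, so watch their CPU usage rather than trusting any fixed value.

```python
from typing import List


def plan_migration_waves(luns: List[str], max_concurrent: int) -> List[List[str]]:
    """Split LUNs into waves so no more than max_concurrent migrate at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [luns[i:i + max_concurrent] for i in range(0, len(luns), max_concurrent)]


if __name__ == "__main__":
    # Hypothetical LUN names; the limit of 2 is an example, not a recommendation.
    luns = ["FS01-Data", "EXCH-DB01", "HV-CSV01", "SQL01-Data", "SQL01-Log"]
    for number, wave in enumerate(plan_migration_waves(luns, max_concurrent=2), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

Each wave is started only when the previous one has finished replicating, so the downtime moments stay at times you choose.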

After the replication has completed the Compellent SAN will transparently map the destination LUN to the server and remove the mapping for the replica.

image

 

The next step is that the mirror is reversed. That means that while this replica exists the data written to the Compellent LUN is also mirrored to the old SAN LUN until you break the mirror.

image

 

Once you decide you’re done replicating and don’t want to keep both LUNs in sync anymore, you break the mirror.

image

 

You delete the remaining replica disk and you release the external disk.

image

 

Now you unpresent the LUN from the Compellent host on your old SAN.

image

 

After a rescan your disks will be shown as down in unassigned disks and you can delete them there. This completes the clean up after a LUN migration.

image
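The whole sequence for a single LUN can be kept as an ordered checklist. The Python sketch below is a hypothetical tracker for where each LUN is in that sequence. The step labels paraphrase this post and are not Compellent or HP EVA commands, and nothing here talks to either SAN.

```python
from dataclasses import dataclass, field
from typing import List

# The per-LUN sequence described in this post, in order. These are descriptive
# labels only; nothing here talks to the Compellent or the legacy SAN.
MIGRATION_STEPS = [
    "Stop IO and take the disk/CSV offline",
    "Unpresent the LUN from the server on the legacy SAN",
    "Present the LUN to the Compellent host on the legacy SAN",
    "Scan for Disks and classify the disk as an External Device",
    "Restore Volume from External Device",
    "Map the replica to the server, rescan, bring the disk/CSV online",
    "Wait for replication to complete and the mirror to reverse",
    "Break the mirror",
    "Delete the replica disk and release the external disk",
    "Unpresent the LUN from the Compellent host on the legacy SAN",
    "Rescan and delete the down disks from unassigned disks",
]


@dataclass
class LunMigration:
    """Tracks which steps of the sequence have been done for one LUN."""
    name: str
    completed: List[str] = field(default_factory=list)

    def is_done(self) -> bool:
        return len(self.completed) == len(MIGRATION_STEPS)

    def next_step(self) -> str:
        """Return the next step to perform, or 'Done' once all are completed."""
        return "Done" if self.is_done() else MIGRATION_STEPS[len(self.completed)]

    def complete(self, step: str) -> None:
        """Mark a step as done, refusing to skip ahead or go out of order."""
        if self.is_done():
            raise ValueError(f"{self.name}: migration already completed")
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"{self.name}: expected '{expected}', got '{step}'")
        self.completed.append(step)


if __name__ == "__main__":
    lun = LunMigration("FS01-Data")  # hypothetical LUN name
    lun.complete(MIGRATION_STEPS[0])
    print(lun.next_step())
```

Refusing out-of-order steps mirrors the point of the process: the only downtime is between taking the disk offline and bringing the replica online, so skipping ahead is exactly what you want to catch.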

 

Conclusion

When set up properly, it works very well. Sure, it takes some experimenting to deal with a few intricacies, but once you figure all that out you’re good to go and ready to deal with any hiccups that might occur. The main takeaway is that this provides minimal downtime at a moment that you choose, and you get it out of the box with your Compellent. That’s a pretty good deal, I say!

So as you can see this particular environment will be ready for Windows Server 2012 & Hyper-V. Life is good!

Multi Site SAN Storage & Windows Server 2012 Hyper-V Efforts Under Way

First some stats: 36 pallets of hardware handled over a period of 10 days, 29 of those over a period of 3 days. Most of it didn’t even exist at the beginning of the month; it was just an order. But DELL is a logistical force to be reckoned with. “Easy as DELL” is a reality: the speed at which they respond to requests and orders is amazing. For quality/price balance, service, logistics, speed and support, it’s hard to beat them.

A lot of people are used to dealing with slower processes and think SANs take at least 2 to 3 months to be delivered after ordering, so they get caught off guard by this. I’m happy to say I wasn’t; otherwise the data center would have been blocked by a tsunami of packaging material and hardware.

We’ve been busy unloading, unpacking, racking and partially cabling the new hardware coming in for a multi-site SAN project. And let’s not forget the labeling. While we are far from finished, this is good news: we’re finally working on the installation after the long, time-consuming process of procuring the equipment. That’s never an easy process, let alone a fast one. But I digress.

What are we working with?

  • Dell Compellent SANs (intra and inter site data protection / redundancy)
  • PowerVault MD3600 & MD1200 storage units for disk to disk backup capacity

Now to go from this

image

to this and beyond  …

IMGP0822

image

Takes quite a while, as you can imagine, and we still have a ton of stuff to do. I’ll be sharing my experiences and findings via this blog when I can.

My high-level design focuses on scale-out to achieve performance, flexibility and resiliency. We’ll build a modular scale-up and scale-out solution using commodity hardware rather than a mega-redundant, ultra-scalable, single and very expensive storage solution. You can read more on my views about this subject here: Some Thoughts Buying State Of The Art Storage Solutions Anno 2012. For the backup we are following the same approach. We cannot afford to pay the amounts of money that seem to be needed to buy high-end backup appliances. We have plans to leverage Windows Server 2012 to help us achieve this, but these are subjects for other blog posts later.

DELL PowerConnect 8024F Is Now Stackable

A colleague pointed me to the latest firmware update (4.2.0.4) for the DELL PowerConnect 8024F switches. As I was reading the release notes, one item in particular caught my attention: the PowerConnect 8024/8024F/M8024-k switches are now stackable. You can put up to 6 switches in one stack using the regular front ports (SFP+). You might remember from a previous blog post on 10Gbps, Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4), where I mentioned that we got a great deal on those switches. I also mentioned that the only thing lacking in these switches, and what would make this the best 10Gbps switch when comparing value for money, was the ability to stack them. I quote myself:

“They could make that 8024F an unbeatable price/quality deal if they would make them stackable.”

I’ve been called visionary before, but I won’t go into that insider joke right now. Now, it’s certainly not just my little blog post that made this update happen, but it is a nice New Year’s gift. More features and options with hardware you already own is always nice. So I guess a lot of people have made the same observation, both customers and DELL themselves. You could just “smell” from the available commands and configuration that this switch could be made stackable, and they did.

Is Ethernet-based stacking perfect? No (there is very little perfection in this world). The biggest drawback, if you need that feature, is the fact that you can’t hot plug the stacking links. But for all other practical purposes it’s a nice deal. Why? Well, now that these switches support Ethernet-based stacking, you will be able to choose more types of NIC Teaming for your servers. To be more precise, that means the teaming configurations that depend on stacking, such as active-active NIC Teaming across two switches. I find this pretty good news.

You all know I’m very enthusiastic about the NIC Teaming built into Windows 8, and I will use it where and when I can. But for many years to come there will be a lot of Windows Server 2008 R2 systems to support and install, so it’s always good to see your hardware vendors improving their gear to give you more options. For the pricing I got on the 8024F in the last project, and given the needs of that solution, we could live with not being able to stack. Stacking via Ethernet using other switches was way more expensive, not to mention the ones using stacking modules (real premium pricing). So we got the best deal for our needs.

For 10Gbps switches, stacking over Ethernet gives you up to 80Gbps with a maximum of 8 uplinks, so bandwidth is less of a concern. With 1Gbps switches it is a concern, which I think makes stacking modules the only way to go there if you need massive bandwidth, and you probably do. The drawback, as with all forms of inter-switch links (a LAG, for example), is that you lose ports for other purposes. But you need to look at your needs and do the math. I think buying with investment protection in mind is good, but don’t always buy in preparation for the time you’ll become a Fortune 500 company. That takes a while, and in the meantime you’ll be very well served anyway.
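The "do the math" part is simple arithmetic. Every stacking link adds 10Gbps of stack bandwidth and costs a front port. The sketch below uses the 10Gbps per link and the 8-link maximum stated above, and it assumes each stacking link consumes one front port on the switch you are counting. The 24-port total in the example is illustrative, so check the port count of your own model. The function names are hypothetical.

```python
MAX_STACK_LINKS = 8     # maximum number of stacking links mentioned above
LINK_SPEED_GBPS = 10    # 10Gbps SFP+ front ports used as stacking links


def stack_bandwidth_gbps(links: int, link_speed_gbps: int = LINK_SPEED_GBPS) -> int:
    """Aggregate stacking bandwidth for a number of Ethernet stacking links."""
    if not 1 <= links <= MAX_STACK_LINKS:
        raise ValueError(f"links must be between 1 and {MAX_STACK_LINKS}")
    return links * link_speed_gbps


def ports_left(total_ports: int, links: int) -> int:
    """Front ports still free for servers after reserving some for stacking."""
    if links > total_ports:
        raise ValueError("cannot use more stacking links than the switch has ports")
    return total_ports - links


if __name__ == "__main__":
    total = 24  # illustrative port count; check your own switch model
    for links in (2, 4, 8):
        print(f"{links} links: {stack_bandwidth_gbps(links)} Gbps of stack bandwidth, "
              f"{ports_left(total, links)} of {total} ports left")
```

Running the same numbers with 1Gbps links shows why bandwidth becomes the concern there: 8 links only add up to 8Gbps.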

Another related new feature is Nonstop Forwarding (NSF). NSF allows the forwarding plane of stack units to continue forwarding packets even while the control and management planes restart. The cause could be a power failure, a hardware or software error, or even an upgrade. As far as I know this feature is common to all stackable switches, and it is needed. Not that I’m saying the redundant loop in the stack is bad or overkill, far from it, but that takes care of other scenarios than the ones NSF is designed to handle.