Dell Compellent SCOS 6.7 ODX Bug Heads Up

UPDATE 3: Bad and disappointing news. After update 2 we’ve seen DELL change the CSTA (CoPilot Services Technical Alert)  on the customer website to “’will be fixed” in a future version. No according to the latest comment on this blog post that would be In Q1 2017. Basically this is unacceptable and it’s a shame to see a SAN that was one of the best when in comes to Hyper-V Support in Windows Server 2012 / 2012 R2 decline in this way. If  7.x is required for Windows Server 2016 Support this is pretty bad as it means early adopters are stuck or we’ll have to find an recommend another solution. This is not a good day for Dell storage.

UPDATE 2: As you can read in the comments below people are still having issues. Do NOT just update without checking everything.

UPDATE: This issue has been resolved in Storage Center 6.7.10 and 7.Ximage

If you have 6.7.x below 6.7.10 it’s time to think about moving to 6.7.10!

No vendor is exempt form errors, issues, mistakes and trouble with advances features and unfortunately Dell Compellent has issues with Windows Server 2012 (R2) ODX in the current release of SCOS 6.7. Bar a performance issue in a 6.4 version they had very good track record in regards to ODX, UNMAP, … so far. But no matter how good your are, bad things can happen.

DellCompellentModern

I’ve had to people who were bitten by it contact me. The issue is described below.

In SCOS 6.7 an issue has been determined when the ODX driver in Windows Server 2012 requests an Extended Copy between a source volume which is unknown to the Storage Center and a volume which is presented from the Storage Center. When this occurs the Storage Center does not respond with the correct ODX failure code. This results in the Windows Server 2012 not correctly recognizing that the source volume is unknown to the Storage Center. Without the failure code Windows will continually retry the same request which will fail. Due to the large number of failed requests, MPIO will mark the path as down. Performing ODX operations between Storage Center volumes will work and is not exposed to this issue.

You might think that this is not a problem as you might only use Compellent storage but think again. Local disks on the hosts where data is stored temporarily and external storage you use to transport data in and out of your datacenter, or copy backups to are all use cases we can encounter.  When ODX is enabled, it is by default on Windows 2012(R2), the file system will try to use it and when that fails use normal (non ODX) operations. All of this is transparent to the users. Now MPIO will mark the Compellent path as down. Ouch. I will not risk that. Any IO between an non Compellent LUN and a Compellent LUN might cause this to happen.

The only workaround for now is to disable ODX on all your hosts. To me that’s unacceptable and I will not be upgrading to 6.7 for now. We rely on ODX to gain performance benefits at both the physical and virtual layer. We even have our SMB 3 capable clients in the branch offices leverage ODX to avoid costly data copies to our clustered Transparent Failover File Servers.

When a version arrives that fix the issue I’Il testing even more elaborate than before. We’ve come to pay attention to performance issues or data corruption with many vendors, models and releases but this MPIO issue is a new one for me.

DELL Microsoft Storage Spaces Offerings

Dell was the 1st OEM to actively support and deliver Microsoft Storage Spaces solutions to its customers.

image

They recognized the changing landscape of storage and saw that this was one of the option customers are interested in. When DELL adds their logistical prowess and support infrastructure into the equation it helps deliver Storage Spaces to more customers. It removes barriers.

In June 2015 DELL launched their newest offering based on generation 13 hardware.

image

Recently DELL has published it’s docs and manuals for Storage Spaces with the MD1420 JBOD Storage Spaces with the MD1420 JBOD

You can find some more information on DELL storage spaces here and here.

I’m looking forward to what they’ll offer in 2016 in regards to Storage Spaces Direct (S2D) and networking (10/25/40/50/100Gbps). I’m expecting that to be a results of some years experience combined with the most recent networking stack and storage components. 12gbps SAS controller, NVMe options in Storage Spaces Direct. Dell has the economies of scale & knowledge to be one of the best an major players in this area. Let’s hope they leverage this to all our advantage. They could (and should) be the first to market with the most recent & most modern hardware to make these solutions shine when Windows Server 2016 RTM somewhere next year.

Dell Storage Replay Manager 7.6.0.47 for Compellent 6.5

Recently as a DELL Compellent customer version 7.6.0.47 became available to us. I download it and found some welcome new capabilities in the release notes.

  • Support for vSphere 6
  • 2024 bit public key support for SSL/TLS
  • The ability to retry failed jobs (Microsoft Extensions Only)
  • The ability to modify a backup set (Microsoft Extensions Only)

The ability to retry failed jobs is handy. There might be a conflicting backup running via a 3rd party tool leveraging the hardware VSS provider. So the ability to retry can mitigate this. As we do multiple replays per day and have them scheduled recurrently we already mitigated the negative effects of this, but this only gibes us more options to deal with such situations. It’s good.

image

The ability to modify a backup set is one I love. It was just so annoying not to be able to do this before. A change in the environment meant having to create a new backup set. That also meant keeping around the old job for as long as you wanted to retain the replays associated with that job. Not the most optimal way of handling change I’d say, so this made me happy when I saw it.

image

Now I’d like DELL to invest a bit more in make restore of volume based replays of virtual machines easier. I actually like the volume based ones with Hyper-V as it’s one snapshot per CSV for all VMs and it doesn’t require all the VMs to reside on the host where we originally defined the backup set. Optimally you do run all the VMs on the node that own the CSV but otherwise it has less restrictions. I my humble opinion anything that restricts VM mobility is bad and goes against the grain of virtualization and dynamic optimization. I wonder if this has more to do with older CVS/Hyper-V versions, current limitations in Windows Server Hyper-V or CVS or a combination. This makes for a nice discussion, so if anyone from MSFT & the DELL Storage team responsible for Repay Manager wants to have one, just let me know Smile 

Last but not least I’d love DELL to communicate in Q4 of 2015 on how they will integrate their data protection offering in Compellent/Replay manager with Windows Server 2016 Backup changes and enhancements. That’s quite a change that’s happing for Hyper-V and it would be good for all to know what’s being done to leverage that. Another thing that is high on my priority for success is to enable leveraging replays with Live Volumes. For me that’s the biggest drawback to Live Volumes: having to chose between high/continuous availability and application consistent replays for data protection and other use cases).

I have some more things on my wish list but these are out of scope in regards to the subject of this blog post.

Remote Access to the KEMP R320 LoadMaster (DELL) via DRAC Adds Value

If you have a virtual Loadmaster you gain a capability you do not have with an appliance: console access. You can have lost all network connectivity to the Loadmaster but you can still gain access over the Hyper-V console connection to the virtual machine. Virtual appliances are not the only or best choice for all environments and needs. When evaluating your options you should consider going for a bare metal solution like the DELL R320.

image

These are basically DELL servers and as such have a Dell Remote Access Card (DRAC) that allows for remote access independently of the production network. Great for when you need to resolve an issue where you cannot connect to the unit anymore and you’re not near the Loadmaster. It also allows for remote shutdown and start capabilities, mounting images for updates, … all the good stuff. Basically it offers all the benefits of a DELL Server with a DRAC has to offer.

image

That means I have an independent way into my load balancer to deal wit problems when I can no longer connect to it via the network interface or even when it is shut down. As we normally telecommute as much as possible, either from the offices, on the road or home this is a great feature to have. It sure beats driving to your data center at zero dark thirty if that is even a feasible option. image

I know that normally you put in two units for high availability but that will not cover all scenarios and if you have a data center filled with DELL PowerEdge servers that have DRAC and you cannot restore services because you cannot get to your load balancers that’s a bummer. It’s for that same reason we have IP managed PDU, OOB capabilities on the switches. The idea is to have options and be able to restore services remotely as much as possible. This is faster, cheaper and easier than going over there, so reducing that occurrence as much as possible is good. Knowledge today flies across the planet a lot faster than human being can.