My first Dell SC7020(F) Array Solution

Introduction

After the introduction of the SC7020 hybrid array, we now also have all-flash arrays (AFA) in the SC offerings, one of them being the SC7020F. I was lucky and got to leverage these with both iSCSI (10Gbps) for replication over IP to remote destinations and FC (16Gbps) fabrics for the main workloads.


As always, storage is decided upon based on needs, contextual limitations, budgets and politics. Given the state of affairs the SC7020(F) was the best solution we could come up with. In a diverse world there is still a need in certain environments for SAN based storage solutions, despite what some like to promote. I try not to be tribal when it comes to storage architectures but pick the best solution given the conditions as they are and as they will evolve in the environment where it will serve the needs of the business.

Some points of interest

When I first heard of and looked at the SC7020 it was to be the “unified” storage solution where block and file level capabilities were both available in the controllers. Given today’s multi-socket, multi-core systems with plenty of PCIe slots and RDMA capable cards that was a good idea. If Dell played this right and made sure the NAS capability provided first-class SMB 3 support, this could have been the SOFS offering in a box without the need for SME customers to set up a SOFS solution with separate Dell PowerEdge servers. I’m not saying that is the best solution for all customers or use cases but it would have been for some or even many. Especially since in real life not many storage vendors offer full SMB 3 support that is truly highly redundant without some small print on what is supported and what is not. But it was not to be. I won’t speculate about the (political?) reasons for this but I do see it as a missed opportunity for a certain market segment. Especially since the SC series has so much to offer to Windows / Hyper-V customers.

Anyway, read on, because when this door got closed another one opened. Read the release notes for SCOS 7.2; the most recent version can be found here. The original SC7020 7.1 SCOS reserved resources for the file level functionality. But that functionality isn’t there, so it’s interesting to read this part of the document:

SC7020 Storage System Update: Storage Center 7.2 performs the reallocation of SC7020 system resources from file and block to block only. The system resources that are reallocated include the CPUs, memory, and front-end iSCSI ports. An SC7020 running Storage Center 7.2 allows access to block storage from all the iSCSI ports on the SC7020 mezzanine cards. NOTE: In Storage Center 7.1, access to block storage was limited to the right two iSCSI ports on the SC7020 mezzanine cards.

Now think about what that means. NICs are in PCIe slots. PCIe slots connect to a CPU socket. This means that more CPU cores become available for block level operations such as dedupe and compression, but also for other CPU intensive operations. The same goes for memory. Think about background scrubbing, repair and data movement operations. That makes sense: why waste these resources, they are in there anyway. Secondly, this matters for an SC7020 with only flash disks or for the purpose-designed, flash-only SC7020F. When you make storage faster and reduce latency you need to make sure your CPU cycles can keep up. So, this is the good news. The loss of unified capabilities leads to more resources for block level workloads. As a Cloud & Datacenter MVP focusing on high availability I can build a SOFS cluster with PowerEdge servers when needed and be guaranteed excellent and full SMB 3 capabilities, backed by an AFA. Not bad, not bad at all.

Hardware considerations

With a complete dual controller SAN in only 3 rack units, with room for 30 2.5” (12Gbps SAS) disks, this form factor packs a lot of punch for its size.

With the newer SC series such as the SC7020(F) you are actually not required to use the “local” drives. You can use only expansion enclosures. That comes at the cost of letting the disk bays go to “waste” and having to buy one or more expansion enclosures. The idea is to leave some wiggle room for future controller replacements. With disks in the 3U chassis that’s another story. But in the end, many people run storage as long as they can and then migrate instead of doing mid-life upgrades. Still, it is a nice option to have when and where needed. If I had had the budget margin I might have negotiated a bit longer and harder and opted to leverage only disks in external disk bays. But it placed too big a dent in the economics and I don’t have a clear enough view of the future needs to warrant that investment in the option. The limit of 500 disks is more than enough to cover any design I’ve ever made with an SC series (my personal maximum was about 220 disks).


We have redundant power supplies and redundant controllers that are hot swappable. Per controller we get dual 8-core CPUs and 128GB of memory. A single array can scale up to 3PB, which is also more than I’ve had to deliver so far in a single solution. For those sizes we tend to scale out anyway, as a storage array is and remains a failure domain. In such cases federation helps break the silo limitations that storage arrays tend to have.

Configuration Options

For such a small form factor a single controller offers ample configuration options. There are 3 slots for expansion modules. It’s up to the designer to determine what goes into the system. You can select:

  • 4-port 16Gb FC or 4-port 8Gb FC card
  • 2-port 10Gb iSCSI or 4-port 10Gb iSCSI card with SFP+/RJ45 options
  • 4-port 12Gb SAS card

As we’re not using expansion enclosures we’ve gone for the below layout.

[Image: the expansion module layout we chose]

We also have 3 indicator lights, with the Info LED allowing you to identify a controller from Dell Storage Manager. When the Cache to Flash indicator (middle) lights up you’re running on battery power. The 3rd is the health status, indicating the controller’s condition (off/starting/running/errors).

DRAC observations

The DRAC on the SC7020 looks pretty decent. It’s not a separate dedicated port but shares the management interface. It does offer good DRAC functionality. You can have a VLAN on the management & DRAC logical interfaces when you so desire. For a storage controller, sharing the bandwidth between the management interface and the DRAC is no big deal. One drawback is when the port is broken, but hey, we have 2 controllers, right? The other drawback is that during a firmware upgrade of the NIC you’ll also lose DRAC access. For customers coming from the old SC40 controllers that’s progress no matter what. And as in reality these units are not in an unmanned Antarctic research facility, I can live with this.

The choice for single tier and 15TB MLC SSD

Based on budget, requirements, the environmental context and politics we opted to go for an All-Flash Array with only 15TB read intensive MLC disks. This was a choice for this particular use case and environment. So, don’t go copying it for just any environment. It all depends, as I have mentioned in many blog posts (Don’t tell me “It depends”! But it does!). Opting to use read intensive MLC SSDs means that the DWPD isn’t as high as with SLC, write intensive SSDs. That’s OK. We have large capacity drives and the capacity is needed. It’s not overkill that would lead us to use too few disks.

If these were systems that had 2 tiers or disk-based caching (SSD or even NVMe) and were focused on ingesting large daily volumes of data, that 1st tier would have been SLC SSD with lower capacity but a lot higher DWPD. But using larger SSDs allows us some real benefits:

  • A long enough lifetime for the cost. Sure, MLC has less durability than SLC, but hear me out. 30 * 15TB SSD means that even with a DWPD of 1 we can ingest a lot of data daily within our warranty period (see the quick sketch after this list). We went for 5 years.
  • Space & power savings. The aging systems the SC7020F is replacing consumed a grand total of 92 rack units. At the monthly cost they pay per rack unit, that is a yearly saving of over 100K euros. Over a period of 5 years that’s 500K. Not too shabby.
  • The larger drives allow for sufficient IOPS and latency specs for the needs at hand. The fact that UNMAP & ODX work very well with the SC Series (bar their one moment of messing it up) helps with space efficiency & performance as well in a Windows Server / Hyper-V environment.
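To put some rough numbers on the first two points, here is a minimal back-of-the-envelope sketch in Python. The per-rack-unit rate is an assumption of mine that merely reproduces the ballpark savings quoted above; it is not the actual contract price.

# Back-of-the-envelope check for the endurance and rack-space points above.
# The per-rack-unit rate is an illustrative assumption, not the real figure.

DRIVES = 30
DRIVE_CAPACITY_TB = 15
DWPD = 1  # drive writes per day for read-intensive MLC

daily_write_budget_tb = DRIVES * DRIVE_CAPACITY_TB * DWPD
print(f"Daily write budget across the array: {daily_write_budget_tb} TB/day")  # 450 TB/day

RACK_UNITS_FREED = 92
EUR_PER_RU_PER_MONTH = 90  # assumed; yields roughly the savings quoted in the post
WARRANTY_YEARS = 5

yearly_saving = RACK_UNITS_FREED * EUR_PER_RU_PER_MONTH * 12
print(f"Yearly rack space saving: ~{yearly_saving:,} EUR")                     # ~99,360 EUR
print(f"Over {WARRANTY_YEARS} years: ~{yearly_saving * WARRANTY_YEARS:,} EUR") # ~496,800 EUR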

Then there is the risk of not being able to read fast enough from a single disk, even at 1200 MB/s sequential reads, because it is so large. Well, we won’t be streaming data from these and the data is spread across all disks, so we have no need to read the entire disk capacity constantly. That should mitigate the risk.

Sure, I won’t be bragging about a million IOPS with this config, but guess what, I’m not on some conference floor doing tricks to draw a crowd. I’m building a valuable solution that’s both effective & efficient with equipment I have access to.

One concern that still needs to be addressed

15TB SSDs and the rebuild time when such a large drive fails. Rebuilds for now are many-to-one. That’s a serious rebuild time, with the consumption of resources that comes with it and the long risk exposure. What we need here is many-to-many rebuilds, and for that I’m asking/pushing/demanding that Dell EMC changes their approach to a capacity-based redundancy solution. The fact that it’s SSD and not 8TB HDD gives it some speed. But still, we need a change here. 30TB SSDs are on their way …
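To show why this worries me, here is a minimal sketch of the lower bound on rebuild time for a full 15TB drive, assuming a many-to-one rebuild is limited by the sustained write rate of the single replacement drive. The rates used are my own assumptions for illustration only.

# Rough rebuild-time estimate for a many-to-one rebuild of a 15 TB SSD.
# The sustained write rates are assumptions; real rebuild throughput also
# depends on controller load, RAID overhead and ongoing host I/O.

DRIVE_CAPACITY_TB = 15
capacity_bytes = DRIVE_CAPACITY_TB * 10**12

for mb_per_s in (400, 800, 1200):  # assumed sustained write rates to the single target drive
    seconds = capacity_bytes / (mb_per_s * 10**6)
    print(f"At {mb_per_s} MB/s sustained: ~{seconds / 3600:.1f} hours of rebuild")
# Even the optimistic case is measured in hours of degraded redundancy, which is
# why many-to-many (capacity-based) rebuilds matter as drives keep growing.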

Software considerations

SCOS 7.1 was a pretty important release, adding functionality. In combination with the SC7020(F) we can leverage all it has to offer. 7.2 is mandatory for the SC7020F. Dedupe/compression as well as QoS are the two most interesting capabilities that were added, especially for those use cases where we have no other options.

I deal a lot with image data in the places I work, which means that I don’t bank too much on deduplication. It’s nice to have, but I will never rely on some X-factor reduction for real world capacity. But if your data dedupes well, have a ball!

As stated above, for this particular use case I’m not leveraging tiering. That’s a budget/environment/needs decision I had to make and I think, all things given, it was a good choice.

There is now the option to change between replication types on the fly. This is important. If the results of synchronous replication (high availability or high consistency) are not working out, you can swap those settings without having to delete & recreate the replication.

We have the option to leverage Live Volumes / Live Migrations (the SC kind, not the Hyper-V kind) when required.

Combine that with functional UNMAP/ODX and a good hardware VSS provider to complete the picture for a Windows Server / Hyper-V environment.

Cost

With any investment we need to keep the bean counters happy. While I normally don’t care too much how the solution is bought and paid for, I do care about the cost and the ROI/TCO. In the end I’m about high value solutions, which is not the same as very expensive ones. So it did help that we got a great deal and that Dell Financial Services worked with management and the accounting department to create a lease-to-buy construction that worked for everyone involved. So there are flexible solutions to be found for any preference between full OPEX and full CAPEX.

Conclusion

Design, budget & time wise we had to make decisions really fast and with some preset conditions. Not an ideal situation, but we found a very good solution. I’ve heard and read some very bad comments about the SC, but for us the Compellent SANs have always delivered. And some highly praised kit has failed us badly. Sure, we’ve had to replace disks, one or 2 PSUs, a motherboard once and a memory DIMM. We’ve had some phantom error indications to deal with, and in 2012 we once ran into a memory issue which we fixed by adding memory. Not all support engineers are created equal, but overall, over 6 years’ time, the SC series has served us well. Well enough that when we have a need for centralized storage with a SAN we’re deploying SC where it fits the needs. That’s something some major competitors in this segment did not achieve with us. For a Windows Server / Hyper-V environment it delivers the IOPS, latency and features we need, especially when you have other needs than just providing virtualization hosts with storage and HCI might not be the optimal choice.

I made Veeam Vanguard 2018!

While attending the Microsoft MVP Global Summit 2018 I received notification that I was renewed as a Vanguard in 2018. This is my fourth year, as I was one of the inaugural members in 2015.


The Veeam Vanguard group is a collection of smart, hardworking IT experts that have a healthy interest in data protection and availability. No matter what you build in IT to support your business or customers, it needs to be protected against downtime. You also need the ability to perform disaster recovery and deliver business continuity for those days things are not going smoothly. Those requirements keep these technologists busy and honest. They have to deliver on them and they can’t talk their way out of not being able to do that when needed. The result is that this group of experts is very experienced and knowledgeable in both their specialties and in how to protect their workloads. Being part of the Veeam Vanguards means sharing that experience and knowledge and tapping into their collective brain power. I’m happy and proud to be a Veeam Vanguard as it is a great learning experience and it helps me to deliver even more value to my employers and all Veeam customers. It’s win-win all over. Thank you Veeam for the opportunity and recognition.

Attending the Microsoft MVP Global Summit 2018


Once more I’m flying to the USA (Bellevue/Redmond, in Washington) where I’m attending the Microsoft MVP Global Summit 2018. I’ll be spending my week at the Microsoft campus in Redmond and the offices in Bellevue. I feel grateful and honored to be part of this community and of the chances it offers to learn, connect and build a network of world-class experts.


The Microsoft MVP Global Summit is always a very busy week with both official and unofficial, planned and unplanned meetings. From breakfast till nightcaps we’re talking tech with peers from all over the world. The amount of expertise, experience and knowledge that descends on and near the Microsoft offices that week is nothing short of amazing. Especially when you consider that MVPs are not Microsoft employees. They are independent experts who care enough about the technology to share what they’ve learned with Microsoft and each other. We provide feedback: the good, the bad and the ugly.

I feel very lucky and privileged to once more get this opportunity that comes with being an MVP. As such I always try to attend by freeing up the time and the budget. It’s an investment. It also provides an opportunity to meet up with many of my fellow MVPs and the Microsoft employees we talk to, collaborate with and provide feedback to on a regular basis. Feedback is a dish we serve with care and respect. Likewise, we expect Microsoft to listen and, when possible, act. Our feedback is meant to be constructive and help, not to insult or cause pain.

If anyone still doubts the viability of remote teams spread across the world, you should see a bunch of MVPs interacting, troubleshooting, assisting each other, creating presentations and delivering results. All while spread around the globe.

Flying high above it all

I am and remain in the trenches because I don’t think you can design great solutions in the isolation of an ivory tower and without being in touch with reality. But I do tend to make frequent journeys and fly high above it all regularly to keep perspective.


Yes, I’m even flying above the clouds. That’s because “Cloud” has become a polluted word that means whatever a vendor/ISV/VAR/OEM wants it to mean as long as it helps them sell whatever it is they are selling. But hey, nothing new there, sales will be sales.

In order to make sure we talk about usable and valuable solutions that put customer needs first we literally have to rise above it. We need to look at the needs of the customers and find ways to serve them with the solutions we build and offer. Too often today customers are offered cookie cutter services that are designed to meet the profitability of the provider and whatever politics are at hand, and not primarily the needs of the customer. I see “cloud” projects last for years and fail to deliver, just as much as or more than they used to in the client/server era. The failures of bad ideas, politics over customers, lack of context, bad designs and architectures are still blamed on technology or companies. Nothing new there. Customers or employers are not resources to be mined for every penny. And because I pride myself in not playing that game to make money I keep investing in myself. Getting out of the echo chambers that projects and organizations tend to become is key in achieving that. Way too often the focus is on “a can-do attitude” and “loyalty”. People voicing concerns, discussing issues and speaking their mind are key to achieving success. Conformity and compliance are not; those are measurements, that’s all.

So, what is it I go and discuss for a week on end? Well, I cannot tell you, nor can I tell my employers or customers. Luckily, I get to discuss a lot with my fellow MVPs. It’s always a blast to see so many of them again.

Forging valuable solutions


I can tell you that it helps me make better decisions. It enables me to provide excellent advice, great designs and functional architectures. These are forged in the fires of reality with my knowledge and insights as a hammer. My context and situational awareness are the furnace and the technologies Microsoft and its partners provide are the resources that are turned into valuable solutions. Those are built, not bought. If that’s something you’d like, we can help get you in touch with many MVPs that have a wide variety of skills and are able to assist you.

Our regular schedule resumes after the summit

Anyway, while that week will be busy and while we’ll be anything but quiet over there, it all stays within our Non-Disclosure Agreement (NDA).


That means you won’t hear about anything under NDA and as such we’ll remain silent on what we see, hear, learn and discuss at the MVP Global Summit 2018. See you after “The Summit”!

It’s not as simple as renaming the avhdx to vhdx

This arrived via the feedback option on my blog:

Hi. I see through your website that you are an expert in vhdx / avhdx file. I had a system crash with data loss. I think this data is in an avhdx file. When I rename this file in vhdx, I can mount it but I have an error: the file is corrupted. Do you know a procedure to repair this type of file? I thank you in advance for your support!

Oh dear! An expert? While flattery can get you a long way in life with certain people, virtual disks are impervious to that sort of thing. Look, MVP, Veeam Vanguard, Dell Rockstar … tip of the spear, edge of the sword, it’s all fine and well, but a title is no good for splitting a piece of granite, and virtual disks don’t care about titles, just about how they are designed to work.

Before we dive into some more details: please use the comments section under the relevant blog post to ask questions. That way everyone can benefit from the answer. It’s all quite anonymous if you want it to be. Secondly, vendors like Microsoft have great public support forums with many thousands of pairs of eyes reading. That might also work better and faster for your needs.

Some details

When you have an avhdx, your data is stored in the avhdx and in its parent disks (possibly more avhdx files, but always at least one vhdx). While you can, under certain conditions, throw away what’s in an avhdx (and lose that data) and mount the vhdx, you cannot throw away the vhdx and hope to be able to access the data in an avhdx you rename to vhdx.
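To illustrate why renaming by itself does nothing: an avhdx is just a VHDX-format differencing disk, and whether it depends on a parent is recorded inside the file, not in its name. A minimal sketch in Python, assuming the documented VHDX on-disk signature (the helper name below is my own, purely for illustration):

# An .avhdx uses the exact same on-disk format as a .vhdx, so renaming it
# changes nothing about its content or its dependency on a parent disk.
import sys

def looks_like_vhdx(path: str) -> bool:
    # Per the published VHDX format, the file starts with the 8-byte ASCII
    # signature "vhdxfile"; differencing disks (.avhdx) carry the same signature.
    with open(path, "rb") as f:
        return f.read(8) == b"vhdxfile"

if __name__ == "__main__":
    path = sys.argv[1]  # e.g. a renamed .avhdx file
    print(f"{path} carries the VHDX signature: {looks_like_vhdx(path)}")
    # Whether this disk is standalone or still needs its parent chain is stored
    # in the file's metadata, not in the file name or extension.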


For a case of real data corruption, as opposed to a phantom error or a mixed-up VHDX/AVHDX chain where you can try to intervene, even manually if needed – and if you have the skills – you’ll have to recover or restore data.

If the storage on which the vhdx/avhdx reside is corrupted, a good but time-consuming run of chkdsk /f /r can do the job. I have done that before with success. But there are no guarantees in this game.

Other than that, or when the storage is gone, it is restore time. This can be done leveraging whatever backup solution you use or VSS snapshots on the storage side of things. Those options are your best bet. You can find some more info on manually manipulating vhdx/avhdx files here, but that doesn’t seem to be what you’re facing in this case.

If you don’t have recovery options in place, what can I say?

Stop what you’re doing and contact a good data recovery company. Only damage can come from trying if you don’t know what you’re doing. You can hope trial and error will fix it but that would be the triumph of hope over experience. You’re usually not that lucky. Trust me.

The snarky bit

I’ll fight like hell if I’m in a pickle and the data is valuable. But it’s near impossible to do it for someone else, as it’s hard, time consuming and often it’s a case where the files have been worked on before, so they tend to be messed up. If the data is not that valuable, just eat the loss.

In reality my time always seems less valuable than people’s data. Now, if you say you can help me retire early by trying anyway and are OK with a best effort, no guarantees given deal, I might do it. But I’m pretty sure investing in backups and restores is way cheaper and will lead to better results. Your data is important and valuable, even when my time is not. Just saying.