SMB Direct: Choosing A Flavor

I often get asked what to buy for implementing SMB Direct. It’s a non-trivial question actually and I’m not an expert, nor do I play one on TV. All joking aside, it’s a classic consulting answer: it depends. I don’t do free consulting in a blog post, even if that were possible, as there are many factors such as the characteristics and future of your organization. There’s also a lot of FUD & marketing flying around. Basically in real life you only have two vendors: Chelsio (iWarp) and Mellanox (RoCE/Infiniband). Hard to say which one is best. You make the best choice for your company and you live with it.

There is talk about other vendors joining the SMB Direct market, but it seems to be taking a while. This is not that strange. I’ve understood that in the early days of this century iWarp got a pretty bad reputation due to the many issues around it. Apparently offloading the TCP/IP stack to the NIC, which is what iWarp does, is not an easy endeavor. Intel had some old Net card a couple of years ago but has gotten out of the game. Perhaps they’ll step back in but that might very well take a couple of years.

Other vendors like Broadcom, Emulex & QLogic might be working on solutions but I’m not holding my breath. Broadcom has DCB and has been hinting at RDMA in its NICs for many years but as of the writing of this post there is nothing functional out there yet. But bar the slowness (is complexity slowing the process?) it will be very interesting to see what they’ll choose: RoCE or iWarp. That choice might be the most public statement we’ll ever see about what technology seems like the best bet for these companies. But be careful, I have seen technology choices that were based on living with design decisions made at another level, due to constraints in hardware & software that are no longer true today. So don’t just blindly do what others do.

Infiniband will remain a bit more of a niche I think and my guess is that RoCE is the big bet of Mellanox for the long term. 10Gbps and higher Ethernet switches are sold to everyone in the world. Infiniband, not so much. Does that make it a bad choice? Nope, it all depends. Just like FC is not a bad choice for everyone today, it depends.

Your options today

The options you have today to do SMB Direct are rather limited and bound to the different flavors and their vendor. Yes, vendor, not vendors.

  1. iWarp: Chelsio
  2. RoCE: Mellanox (v2 of RoCE has brought routability into the game, which counters one of iWarp’s biggest advantages next to operational ease. The “no fuss about DCB” story might not be 100% correct, though; the question is whether this matters. After all, many people do well with iSCSI, which is easy but has performance limits).
  3. Infiniband: Mellanox (QLogic was the only other remaining one, but Intel bought it from them. I have never ever seen Intel Infiniband in the wild).

Note: You can do iWarp (and even RoCE in theory) without DCB but in all realistic high traffic situations you’ll want to implement PFC to keep the experience and results good under load. Especially the ports connecting to the SOFS nodes could otherwise potentially drop packets. iWarp, being TCP/IP, will handle dropped packets but possibly at the cost of deteriorated performance. With RoCE you’re basically toast if you lose packets; it should be lossless. I’m not too convinced that pure offloaded TCP/IP scales. Let’s face it, what was the big deal about lossless iSCSI => DCB :-) I would really love to see Demartek testing these things out for us.

If you have a smaller environment, no need for routing and minimal politics, Infiniband is an option; I have seen companies select it as it is very cheap per Gbps. Lots of people have chosen iWarp due to its simplicity (which they heavily market) and routability. The popularity however has dropped due to price hikes that came with increased demand and no competition. RoCE is popular (I see it the most) and affordable but for this one you MUST do at least PFC. DCB support on switches is not an issue; even the budget friendly DELL PowerConnect N4000 series supports it, as did its predecessor the PC8100 series. Meaning if you have bought switches in the past 24 months and did your homework you’re good to go. Are routability and distance important? Well, perhaps not that much today. But the trend in networking is heading for layer 3 down to the rack, which will become more acceptable once a lot of the workload goodness in hypervisors (Live Migration, vMotion; yes, there is work being done on that) is lit up in layer 3. Then it might become a key feature.

More Tips On Dealing With Removing Short File Names When Migrating To a SMB3 Transparent Failover File Server Cluster

You might have read my blog posts on the capabilities and the process of migrating to a Transparent Failover File Server. If not, here they are:

These are a good read with some advice from real world experience and in this post I’ll offer some more tips. I’ve discussed the need to disable and get rid of short file names in my blog and offered other tips to prepare for your migration and get your file share LUNs in tip top, modern shape. But what if you run into short file name issues where you can’t seem to get rid of them?

Well here are 4 more things to check:

1) Get rid of the shadow copies used for Previous Versions

The reason you’d better get rid of them is that they can also contain short file names & way too long paths or file names. We don’t want them to ruin the party so we remove them all by disabling shadow copies on the LUNs to be copied. We can enable them again once the LUN is up and running in the new file cluster.

2) The logs indicate there are short file names you don’t have access to

If the NTFS permissions on the folder & file structure are OK you should not have too many problems, bar some files being locked because they are in use. Rerunning the fsutil command prior to migrating, with the Server service stopped, will prevent any connectivity to and use of the file shares by people ignoring the request to log off or shut down their clients, or by automated jobs that would otherwise keep accessing them.

But you might still get some indications in the log file(s) that state you can’t remove certain file names.


There is the good old trick of running your command under SYSTEM. That does the job! It helps get rid of short file name instances of folders you normally don’t have access to. If SYSTEM has rights you’ll be fine whether it’s a system folder or not. To do this the Sysinternals tools come in handy once again. You can launch a command prompt running under the NT AUTHORITY\SYSTEM account using psexec.exe by running the following from an elevated command prompt:

psexec -i -s cmd.exe or psexec  -s cmd.exe


The -s switch runs the remote process in the System account. Psexec temporarily installs a service running psexesvc.exe on the remote computer (or locally if that’s what you’re doing), which is removed when the app or process that’s running is closed. It’s obvious now, I hope, why you need an elevated command prompt to run this command.

Now should you do this by default? Nope. Just when you need to and as always have a realistic backup plan, a way to recover when things go south.

3) Antivirus sometimes prevents the removal of short file names

Disable antivirus; sometimes it holds a temporary entry in the registry for the file involved. At least that’s what I’ve seen as a transient issue in some of the large number of logs I gathered. Yeah, I ran a lot of fsutil against large NTFS volumes. What can I say? Due diligence pays off!

4) Run ChkDsk

Just make sure the volume is healthy and no repairs are needed. If you’re migrating from an older file server there might be outstanding issues, and a check disk on volumes with lots of files takes time. Some of the ones I’ve dealt with had more than 2 million files on a 2TB LUN and it can take 24 hours. Fun when you have 10 LUNs :-/

Dilbert Life Series: The War For Talent

Disclaimer: The Dilbert® Life series is a string of posts on corporate culture from hell and dysfunctional organizations running wild. This can be quite shocking and sobering. A sense of humor will help when reading this. If you need to live in a sugar coated world where all is well and bliss and think all you do is close to godliness, stop reading right now and forget about the blog entries. It’s going to be dark. Pitch black at times actually, with a twist of humor, if you can laugh at yourself.

Attracting & retaining talent

If you listen to the talking heads in the media, recruiters & companies and read business related publications you’ll have noticed that when it comes to “Human Resources” there is supposedly a global war on. A war for talent. It’s not just attracting the best and brightest employees that is a concern; retaining them is an even bigger challenge, it seems. When things are not to their liking they just pack up and fly off to the next awesome job opportunity. Those are supposedly available in vast numbers and give freedom to excel whilst paying great salaries.

They are talking about somebody else

Keeping employees happy is supposed to be a major concern in “the talent wars”. All companies are in this war we’re told. Perhaps even if just for the fact that no company will admit they are not looking for great talented employees. All evidence to the contrary I might add as a lot of organizations do not act as if they are in a war for talent at all. Good jobs don’t seem to be available in any decent number either. It often looks more like they are in a race to the bottom.

Last year one of our major newspapers had this as front page news: “War for talent? Forget it, that doesn’t exist”. They point to high unemployment, low wage jobs, social dumping, demographics, immigration and discrimination based on age, sex, race, … In short a slew of reasons to conclude the war for talent doesn’t exist. Basically it boils down to this: if companies were in a war for talent they couldn’t afford to lose, they couldn’t afford to act like this. Ergo, there is no war for talent.

I kind of disagree. There is most definitely a war for talent and there will always be one until computers & robotics outsmart us (dream on!). But let’s face reality, 95% of us are not considered talent at all, but a resource, so we’re not in that war. As a resource we’re as expendable as ammo in a war. As long as they can keep the supply line filled they’ll fire (pun intended) and waste those resources at will. Basically we’re lucky if we’re smart enough and young (cheap?) enough to be considered employable. Forget the lower 20% of our unskilled workforce, for them the deal is even rougher. And when you get fired at > 50, well good luck “grandpa”. All this while the talking heads blabber on about working beyond 67 …

You want proof? Look around you. Here’s where the war for talent is raging: A Google Programmer ‘Blew Off’ A $500,000 Salary At A Startup — Because He’s Already Making $3 Million Every Year. Well that isn’t me and probably not you either. Now don’t think everyone at Google is in that position, it’s a minority. => Techies CAN sue Google, Apple, Intel et al accused of wage-strangling pact. You see they want your talent, but not pay for it in free market.

Let’s look at some evidence that there might be no war for talent.

Toys & work force multipliers are not salary or a career

BYOD, a smartphone, tablet, laptop paid for by work. They bombard us with commercials about how we need to supply & support this if we want to stand a chance to even attract young talent. That’s only partially true. If I’m true top talent I’ll be able to afford those myself, thank you. I’d rather take a 6 figure salary, 30 days paid vacation & affordable quality health care. After all you need to take good care of talent, right?

Performance Reviews

A golden oldie. When judging by the annual performance review practices out there, they are trying to make talent walk by proving to them the organization is too hopeless to even stop totally useless evaluation practices.

November 14, 1993

In corporate life your management often has no clue what you do. They often don’t even understand it. To add injury to insult you often have to write them yourself.

January 06, 2003

Usually there’s only a stick

If you don’t have promotions, bonuses, rewards (not a merit badge, that’s just Neanderthal gamification done very, very wrong) or pay raises in place what’s with this war for talent anyway?

The fact that you can fire me if I’m not up to your standards? What kind of a messed up model is that? If we’re below standards you have a stick, I get that. If I meet, exceed or absolutely own those standards what exactly do you have to offer? Absolutely nothing?

March 10, 1995

Ouch! We cannot do anything for you, it’s out of our control, they’ll tell you. Could be, but I cannot get away with that answer when it comes to delivering results. Do you even offer a career path? Employees don’t get promoted and if they do, it’s without a pay raise. Pay raises themselves are dead except for the legal minimum.

The exit interview to improve retention

The exit interview is as useful as a post mortem in preventing death. It helps find out what went wrong after the fact, but it is slightly less accurate than a real post mortem because in general the deceased don’t lie to you when you’re probing around and they always show up, albeit they have to be carried in. Just tell yourself the people who left did so because, while you’re great & wonderful, they just didn’t fit in, and leave it at that. You’ll sleep better and waste less time.

You are creating your own hell

Most CxO types complain constantly about the lack of skilled employees that can think independently and have the ability to execute in order to achieve an end state. In reality that is their own fault. The system doesn’t work. They expect to buy and discard talent at will. Well, there isn’t enough talent to go around anymore because too many companies skip investing in developing it in favor of short term accounting benefits.

Talent needs time and opportunity to develop skills and expertise. No one wants to give that any more. So you’re creating your own shortage as it’s not magically going to start growing on trees. Secondly when you have people that have the intrinsic motivation, drive and abilities to develop themselves to be experts you don’t reward them. Instead you demand ever more from them and pay them nothing more than anyone else, or even less as you promote the bodies you can do without. We’re creating our own skills gap hell. But it’s easier to cry that you are a victim of a failing education system that doesn’t deliver experts that are experienced and cheap straight out of college.

Short term perceived gains for real long term damage & costs

Without the right people in the right place you no longer have analytical, design and architectural expertise. You have outsourced all that to vendors, “partners” and consultants. So now who can evaluate what is valid and valuable for you? No one. You’ll just get sold the flavor of the day that generates them the most profits. And if that doesn’t work there is always new stuff to sell you that will fix it. You fell for the trap of easy and cheap access to expertise, meaning you lost all the expertise you had yourself. You are now dependent on mercenaries and their aim is to make money for themselves and survive even if it means killing you. Every penny you spend wisely internally is an investment. Every penny you spend stupidly on a vendor is buying stuff that potentially makes you more dependent on them.

Companies are the ones to blame as they’re constantly in search of quick & dirty wins for short term (personal) gain. “Quick” is forgotten as fast as the word itself entails but the dirty part lingers around and stinks up the place long after the fact.

War for talent? Think again.

So exactly what’s the game play here? Employees doing exactly enough not to get fired? Because by the rules that ignore the above, everything we do above that level is a misallocation of our resources. That’s very, very Office Space like, dude.


In general it’s a race to the bottom leading to ever more mediocrity at ever higher costs and we all know who’ll get to pay the bill. Let’s hope some spin doctors can turn it into “good news”.

VEEAM Invests in Faster & More Efficient Data Protection With Backup & Replication 8

Ever more data to protect without breaking the systems or the bank

One of my major concerns today in IT, whether it is on premises or in the cloud, is the cost, time, reliability and feasibility of backups and restores. This is true for most of us. Due to the environments in which I deliver my services my main issue with backups is the quantity of data. The amount of data is staggering and growth is not showing a downward trend.

The big four: CPU, Memory, Network & Storage

Over the years we have seen a vast improvement in compute, memory, network and storage capabilities and pricing. CPUs are up to 18 cores per socket as I write this. DDR4 memory is here and the cost is relatively low. We have affordable 10Gbps networking to throw at the problem as well, or in some cases 8 to 16Gbps Fibre Channel. So when it comes to CPU, memory and network we’re pretty well served.

Storage is evolving as well and we’re getting ever bigger and, if you have the budget that is, faster storage arrays in different flavors. But it remains a challenge. First of all, getting the right amount of IOPS and storage capacity at an affordable price point is a balancing act. Secondly, when dealing with backups we need to manage the source IOPS & latency against the target. But that’s not all. While you might want to squeeze every last IOPS & 1ms latency out of your backup target, you can’t carelessly do that to your source storage. If you do, this might constitute a Denial Of Service attack against your applications and services. Even today storage QoS is either non-existent, in its infancy or at best limited to particular workloads on storage solutions.

The force multiplier: Backup software capabilities & approaches

If you’ve made sure the above 4 resources are not your killer bottleneck, the backup software, its methods, algorithms and the approach used will be either your biggest problem or your best friend. You need your backup software to be:

  • Capable
  • Scalable
  • Fast
  • Configurable
  • Scale Out

There are some challenging environments out there. To deal with this, backup software should be able to leverage the wealth of capabilities that compute, network, memory & storage are offering to protect large amounts of data reliably and fast. This should be done in a smart and operationally supportable manner. VEEAM has been working on this for a long time, they keep getting better at it with every release, and it allows for scale out designs in regards to backup targets.

VEEAM Backup & Replication 8.0

There are many improvements in v8 but a couple stand out.


Consistency groups (Hyper-V)

Backup jobs can execute more than one VM backup task simultaneously from the same volume snapshot with “Allow Processing of Multiple VMs with a single volume snapshot”.


This means you can reduce the number of snapshots significantly, where in the past you needed a volume snapshot per VM. VEEAM limits the maximum number of VMs you can back up per snapshot to 4 when using software VSS and to 8 with hardware VSS. They do this because under heavy load VSS/CSV sometimes has issues. This number can be tweaked to fit your needs (not all environments are created equal) with 2 registry values under the HKLM\SOFTWARE\Veeam\Veeam Backup and Replication key:

  • MaxVmCountOnHvSoftSnapshot (DWORD)
  • MaxVmCountOnHvHardSnapshot (DWORD)
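As a minimal sketch of how those two values could be set, the Python helper below only builds `reg add` command lines for you to review; it does not touch the registry itself. The key path with backslashes is my reading of the key named above, `reg_add_command` is a hypothetical helper name, and 4 and 8 are simply the limits mentioned above. Test on a lab server first.

```python
# Hypothetical helper: builds (but does not run) "reg add" command lines
# for the two Veeam registry values named in this post.
# Assumption: the key path below, with backslashes, is the key meant above.
VEEAM_KEY = r"HKLM\SOFTWARE\Veeam\Veeam Backup and Replication"


def reg_add_command(value_name: str, max_vms: int) -> str:
    """Return a reg.exe command that sets a DWORD value under VEEAM_KEY."""
    if max_vms < 1:
        raise ValueError("max_vms must be at least 1")
    return (
        f'reg add "{VEEAM_KEY}" /v {value_name} '
        f"/t REG_DWORD /d {max_vms} /f"
    )


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them
    # from an elevated command prompt on the Veeam server.
    print(reg_add_command("MaxVmCountOnHvSoftSnapshot", 4))
    print(reg_add_command("MaxVmCountOnHvHardSnapshot", 8))
```

Printing instead of executing is a deliberate choice here: you get to read exactly what would change before anything is written.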

Reducing the number of snapshots to be taken is good as it saves resources, speeds up things & as VSS can be finicky, not needing more than absolutely necessary is a good thing.
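To get a feel for the savings, here is a small back-of-the-envelope sketch in Python. It assumes the best case where all VMs of a job live on the same volume and can be grouped freely; `snapshots_needed` is a hypothetical helper, not something VEEAM exposes.

```python
import math


def snapshots_needed(vm_count: int, max_vms_per_snapshot: int) -> int:
    """Best-case number of volume snapshots needed to back up vm_count VMs.

    Assumes every VM sits on the same volume, so VMs can be grouped
    freely up to the per-snapshot limit.
    """
    if vm_count < 0:
        raise ValueError("vm_count cannot be negative")
    if max_vms_per_snapshot < 1:
        raise ValueError("max_vms_per_snapshot must be at least 1")
    return math.ceil(vm_count / max_vms_per_snapshot)


if __name__ == "__main__":
    vms = 20  # illustrative number of VMs on one CSV
    print("one snapshot per VM:", snapshots_needed(vms, 1))
    print("software VSS, 4 VMs per snapshot:", snapshots_needed(vms, 4))
    print("hardware VSS, 8 VMs per snapshot:", snapshots_needed(vms, 8))
```

In real life VMs are spread over several volumes and jobs, so treat the result as a lower bound rather than a prediction.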

Backup I/O Control.

Another improvement is Backup I/O Control, which delivers the capability to dynamically adjust the number of backup tasks based on I/O latency. Under Options you’ll find a new tab, I/O Control. It contains the parallel processing option that used to be under the “Advanced” tab in Veeam B&R 7.


The idea is to move to a more “policy driven” approach for handling the load backups can put on the storage. Until now we’d configure a fixed number of X tasks to run against the source storage in order to keep IOPS/latency in check. But this is very static, and in a dynamic / elastic “cloud” world this isn’t very flexible, nor is it feasible to keep it tuned to the best number for the current workload.

I/O Control lets you set limits on how much latency is acceptable for your data stores. Removing or adding VMs to the data store won’t invalidate your carefully set number of tasks allowed as it’s now the latency that’s used to dynamically tune that number for you.

I/O control has two settings:

“Stop assigning new tasks to datastore at: X ms”: VEEAM looks at the latency (IOPS) before assigning a proxy (backup target) to a virtual disk and won’t launch the task until the load has dropped. This prevents the depletion of IOPS by launching too many backups.

“Throttle I/O of existing tasks at: Y ms”: This will throttle the IO of already running  backup jobs when needed due to some application workloads in the VMs running on the source storage kicking in. The backups will be throttled so they’ll take longer but they won’t kill the performance of the applications while they are running.

These two settings allow for the dynamic and on the fly tweaking of the number of backup tasks running as well as their impact on the storage performance. Once you have determined what latency values are acceptable to you, you’re done; VEEAM handles the tweaking for you. The default values seem to reflect industry best practices (sustained > 20 ms is considered problematic).
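To make the interplay of the two thresholds concrete, here is a minimal Python sketch of the idea. It is not VEEAM’s implementation: the names are hypothetical, treating “at” as “at or above” is my assumption, and the 20 ms and 30 ms values in the example are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoControlDecision:
    assign_new_tasks: bool   # may new backup tasks be started?
    throttle_existing: bool  # should running tasks be slowed down?


def evaluate(latency_ms: float, stop_new_at_ms: float,
             throttle_at_ms: float) -> IoControlDecision:
    """Map a measured datastore latency to the two I/O Control actions.

    Assumption: a threshold triggers when the latency is at or above it.
    """
    return IoControlDecision(
        assign_new_tasks=latency_ms < stop_new_at_ms,
        throttle_existing=latency_ms >= throttle_at_ms,
    )


if __name__ == "__main__":
    # Illustrative thresholds: stop new tasks at 20 ms, throttle at 30 ms.
    for latency in (8, 25, 42):
        print(latency, "ms ->", evaluate(latency, 20, 30))
```

The two checks are kept independent on purpose, so the sketch does not assume anything about which threshold you set higher.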

The backup job log shows the latency being monitored.

With VEEAM B&R v8 Enterprise Plus you can even do this per data store, meaning you can optimize this per backup source. This recognizes that there is no “one size fits all perfectly” and allows for differentiation. Yet it does so in a way that does not compromise on the simplicity of use that VEEAM offers. This sounds easy but from experience I know it isn’t. VEEAM manages to offer a great balance between simplicity and functionality for companies of all sizes.

Select “Configure”


In the “Datastore Latency Settings” you can add one, more or all of the data stores you are protecting with VEEAM. This allows for differentiation when you have CSVs that are used for SQL Server VMs versus stateless web servers or other workloads that are not storage I/O intensive.


Select the datastore (in our case the CSV volumes in Hyper-V Cluster)


By selecting the desired datastore and clicking “Edit”  you can individually adjust the settings for that datastore.
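The per datastore idea boils down to “use the specific limits if there are any, otherwise the global ones”. A minimal Python sketch of that lookup follows; the datastore names and all millisecond values are made up for illustration and `limits_for` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LatencyLimits:
    stop_new_tasks_ms: int  # "Stop assigning new tasks to datastore at"
    throttle_ms: int        # "Throttle I/O of existing tasks at"


def limits_for(datastore: str,
               overrides: Dict[str, LatencyLimits],
               default: LatencyLimits) -> LatencyLimits:
    """Return the datastore-specific limits, or the global default."""
    return overrides.get(datastore, default)


if __name__ == "__main__":
    global_limits = LatencyLimits(stop_new_tasks_ms=20, throttle_ms=30)
    # Hypothetical CSV names: stricter limits for the SQL Server volume.
    per_datastore = {
        "CSV-SQL": LatencyLimits(stop_new_tasks_ms=10, throttle_ms=15),
    }
    print(limits_for("CSV-SQL", per_datastore, global_limits))
    print(limits_for("CSV-WEB", per_datastore, global_limits))
```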


Conclusion

It looks like we have some great additional capabilities in an already very good solution. I’ll be using these new capabilities in real life scenarios to see how they work out for us and to optimize the backups of the virtualized environment under my care. Hardware VSS providers, SANs and CSVs normally need some tweaking and care to make them run well, so that’s what we’ll be doing.