SMB Direct With RoCE in a Mixed Switches Environment

Posted on December 15, 2014 by workinghardinit

I’ve been setting up a number of Hyper-V clusters with Mellanox ConnectX-3 Pro dual port 10Gbps Ethernet cards. These Mellanox cards provide a nice number of queues (128) for DVMQ and also give us RDMA/SMB Direct capabilities for CSV & live migration traffic.

Mixed Switches Environments

Now RoCE and DCB are a learning curve for all of us and not for the faint of heart. DCB configuration is non-trivial, certainly not across multiple hops and different switches. Some say it’s to be avoided or can’t be done.

You can only get away with a single pair of (uniform) switches in smaller deployments. On top of that I’m seeing more and more different types of switches being used to optimize value, so it’s not just a lab exercise to do this. Combine this with the fact that DCB is an unavoidable technology in networking, unless it gets replaced with something better and easier, and you might as well try and learn. So I did.

Well, right now I’m successfully seeing RoCE traffic going across cluster nodes spread over different racks in different rows at excellent speeds. The core switches are DELL Force10 S4810 and the rack switches are PowerConnect 8132Fs. By borrowing an approach from spine/leaf designs this setup delivers bandwidth where they need it at a price point they can afford. They don’t need more expensive switches for the rack or the core as these do support DCB and give the port count needed at the best price point. This isn’t supposed to be the top in non-blocking network design. Nope, but what’s available & affordable today in your hands is better than perfection tomorrow. On top of that this is a functional learning experience for all involved.

We see some pause frames being sent once in a while and this doesn’t impact speed very much. It does guarantee lossless traffic, which is what we need for RoCE. When we live migrate 300GB worth of memory across the nodes in the different racks we get great results. It varies a bit depending on the load the switches & switch ports are under but that’s to be expected.
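To put "great results" in perspective, here is a minimal sketch of the best case the wire allows for moving that much memory. The assumptions are mine, not measurements from this setup: decimal units, ideal line rate, no protocol overhead, no re-copying of dirty memory pages, and perfect spreading of the traffic over all ports. Real migrations will take longer.

```python
# Back-of-the-envelope lower bound for moving VM memory during live migration.
# Assumptions (illustrative only): 1 GB = 10**9 bytes, ideal line rate,
# no protocol overhead, no dirty page re-copy, perfect spreading over all ports.

def min_transfer_seconds(memory_gb: float, link_gbps: float, ports: int = 1) -> float:
    """Return the theoretical minimum time in seconds to move memory_gb."""
    if memory_gb < 0 or link_gbps <= 0 or ports < 1:
        raise ValueError("need memory_gb >= 0, link_gbps > 0 and ports >= 1")
    gigabits_to_move = memory_gb * 8
    return gigabits_to_move / (link_gbps * ports)


if __name__ == "__main__":
    for ports in (1, 2):
        seconds = min_transfer_seconds(300, 10, ports)
        print(f"{ports} x 10Gbps: at least {seconds:.0f} s ({seconds / 60:.1f} min)")
```

So 300GB can never move faster than about 4 minutes over one 10Gbps port, or about 2 minutes over both ports of a dual port card. Anything in that neighborhood means the network is not the bottleneck.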

Now tests have shown us that we can live migrate just as fast with non-RDMA 10Gbps, leveraging “only” Multichannel, as we can with RDMA. So why even bother? The name of the game is low latency and preserving CPU cycles for SQL Server or storage traffic over SMB3. Why? We can just buy more CPUs/cores. Great, easy & fast, right? But then SQL licensing comes into play and it becomes very expensive. Also, storage scenarios under heavy load are not where you want to drop packets.

Will this matter in your environment? Great question! It depends on your environment. Sometimes RDMA is needed/warranted, sometimes it isn’t. But the Mellanox cards are price competitive and why not test and learn right? That’s time well spent and prepares you for the future.

But what if it goes wrong … ah well, if the nodes fail to connect over RDMA you still have Multichannel, and if the DCB stuff turns out not to be what you need or can handle, turn it off and you’ll be good.

RoCE stuff to test: Routing

Some claim it can’t be done reliably. But hey, they said that for non-uniform switch environments too. So will it all fall apart and will we need to standardize on iWarp in the future? Maybe, but isn’t DCB the technology used for lossless, high performance environments (FCoE but also iSCSI), so why would iWarp not need it? Sure, it works without it quite well. So does iSCSI, right, up to a point? I see these comments a lot more from virtualization admins that have a hard time doing DCB (I’m one so I do sympathize) than I see them from hard core network engineers. As I have RoCE cards and they have become routable now with the latest firmware and drivers, I’d love to try and see if I can make RoCE v2 or Routable RoCE work over different types of switches, but unless someone is going to sponsor the hardware I can’t even start doing that. Anyway, lossless is the name of the game whether it’s iWarp or RoCE. Who knows what we’ll be doing in 5 years? 100Gbps iWarp & iSCSI both covered by DCB vNext while FC, FCoE, Infiniband & RoCE have fallen into oblivion? We’ll see.

Dilbert Life Series: Mediocrity Kills aka Show Me Your Strategy Or Be Doomed

Posted on December 10, 2014 by workinghardinit

Disclaimer: The Dilbert® Life series is a string of posts on corporate culture from hell and dysfunctional organizations running wild. This can be quite shocking and sobering. A sense of humor will help when reading this. If you need to live in a sugar coated world where all is well and bliss and think all you do is close to godliness, stop reading right now and forget about the blog entries. It’s going to be dark. Pitch black at times actually, with a twist of humor, if you can laugh at yourself.

“Some men are born mediocre, some men achieve mediocrity, and some men have mediocrity thrust upon them.”
― Joseph Heller, Catch-22

I don’t do mediocre. There, I said it. I only do good to great. Well, sort of. The point is that no matter how good you are, you still mess up. While perfection is not of this world it doesn’t look too great on my résumé when I have to write “As a real team player I collaborated enthusiastically to achieve mediocrity”. Sure I might cover it up with fluff like “I integrated the lateral dynamics of horizontally deployed technologies across a vertically integrated stack to realize an optimal use of resources exposing their inherent value to the business while leveraging the synergies of the cloud”, but I won’t.

As no one likes to be mediocre we sometimes see creative attempts to make sure we all pass the bar but we won’t discuss that here. Whilst every organization will have its share of mediocre processes, way too many are mediocre as an entire organization.

Indicators of mediocrity

An addiction to meetings: How To Save A Company From Death By Meetings
An addiction to consultants & outsourcing: The do’s and don’ts when engaging consultants Part I & The do’s and don’ts when engaging consultants Part II, stimulated by not cultivating & leveraging talent: Dilbert Life Series: The War For Talent
An addiction to methodologies & cookbooks: The Dilbert® Life Series – White Collar Blues in Corporate Culture, The shortage of skilled employees, are we making it worse?
An addiction to buying “solutions” – aka the shopaholic organization
An addiction to political correctness & fake it till you make it: The Dilbert® Life Series: Mental Hygiene Is Counter Productive
Wasting time on commodities and calling it innovation.
Indecisiveness. Waiting for external factors to change. This means losing the initiative. In a fight this is very bad, your opponents are calling the shots.

Claiming to be innovative

Avoiding mediocrity is not about being original or “innovative” all of the time. Quite the opposite! Sometimes not being mediocre means using plain good commodity solutions that are great for the issue at hand. The good old 80/20 rule, “good enough is good enough” & commoditization delivers the best value for money here. Don’t spend vast amounts of money and time on custom or “boutique” solutions when a commodity will do. This has secondary benefits as well. That time and money can be used for some custom or creative design & work on the things that do matter a lot and make a big difference.

Groups providing false security

For some reason mediocrity tends to flourish more often in groups and committees. I see this way too much. The danger of sliding into mediocrity exists for an individual but it seems to become more prevalent in a group or organization. Some of my peers call this “the race to the bottom”:

“Mediocre people working for mediocre organizations delivering mediocre results”

Nobody wants to be that way, it just turns out like that. It has many reasons: The Peter Principle, The Dilbert Principle, B people hiring B people, human behavior in an environment where it’s wiser to conform & play politics than to get results, etc. Don’t underestimate the group pressure to conform, avoid mistakes, be a team player or a “can do” person. And then there is the desire to avoid responsibility, which also happens to be easier in a group. The bigger the group in a meeting the bigger the risk of this; a group reinforces indecisiveness & caters to fears.

Some organizations tolerate and even reward mediocrity. Management leads by example, whether they like it or not. The effects of this can be partially hidden and mitigated by real leadership in the group (competent employees, highly skilled external help), but it cannot be stopped. If management doesn’t care, they can’t expect others to care. If managers talk about team work & going the extra mile but don’t do so themselves, things break. If the need for safety, fear of failure or of not looking good is what drives them you won’t progress & see success. Success cannot be bought and you can’t lead from behind.

Mediocre groups can be manipulated quite easily. “Politicians” like this. It’s like water following the path of least resistance. By leveraging the group you make them accomplices and they can’t complain about decisions made over their heads. Some (most) probably know all too well that they are being manipulated, but why struggle if there is no benefit in it? It’s safer to conform, and when risk aversion sets in, great ideas die. Here’s a beautiful summary (thanks to Kathy Sierra):

[Image: Riskaversion2]

Avoiding reality is a game we all play to some extent. Think of the abuse of best practices, methodologies and such by clinging to them like a life raft, or actually thinking that following the bullet points will magically result in stellar results. This leads to needing ever more resources for ever diminishing returns on investment. The organization becomes an overly complex entity where avoiding responsibility is a top priority and perception is everything. ITIL done wrong will achieve exactly that. It drains all the fun out of work, and grinds progress to a halt. But no one is to blame as all rules were adhered to. Risk Avoidance As a Service (RAAS™).

Personal note: The power of a group lies in the excellence of the individuals and their ideas. Harvesting those to create the best possible solution is far from conformity to different points of view. It’s about leveraging the discussions, the different or opposite points of view to come to better solutions. In this respect I find the view that “people should learn to do what they’re told” misguided, dangerous & counter productive.

Who’s managing and who’s leading, if anyone?

It doesn’t take very long to walk into a group and observe who the real leaders are. Often these are not the people with the rank, title, mandate. In a lot of cases they are very different persons. This might sound great as a fail safe, but there’s only so many wrongs bottom up approaches can prevent or mitigate, let alone solve. “Bottom up” can only do so much.

This isn’t surprising as middle management is used as a dumping ground for people they can do without in critical functions and who are willing to sell their souls for the illusion of advancement. They often become a burden to employees & progress.

Now employees do notice this and it ruins trust. Sure, you can blame the culture and bad attitude, but hey, when the team or the organization fails it is their fault and their responsibility. No, this is not too harsh. They are all too eager to claim higher wages & ownership of success. Well, that knife has two edges and you can’t blame it on the culture. You get the culture you cultivate. Those that can’t handle that responsibility are the ones to fail as managers & most certainly as leaders. You cannot complain to your subordinates as a manager. Shit flows down, gripes flow up. Got it?

Read The Dilbert Life Series – A Bad Manager’s Priorities. Your personnel already has enough crap to deal with, just like you. Don’t add to it. Not that employees can’t be total fools and pains in the proverbial behind but hey, I have posts on that too.

Strategies, Tactics & Execution

Mediocrity is seen where real strategies, tactics & execution are missing. They just do or buy stuff, often without any understanding of the ecosystems they operate in and the relations between them. Their situational awareness is zero and that’s deadly. So we have “managers”, “architects”, “analysts”, both in house and consultants, that cannot even explain what a strategy is. They might claim or believe to have one, but they don’t. It’s opportunistic actions towards the flavor of the day. Such an organization is doomed to mediocrity and survival is by chance, not skill.

Who’s to blame?

Most people just try to survive or perhaps get ahead to a nicer job and/or a better paid one. But no one will admit to it on a performance review, so we have institutionalized lying. At best you’ll get justifications when you ask, but no real explanations. It’s not just as simple as managers being stupid or lazy. When it comes to strategy many are playing a game they don’t understand, let alone master. They are out of their depth and as such they are bound to lose. They’re being used.

However, it’s very much in vogue to blame the lack of Business – IT alignment for the woes in these volatile IT times. The problem is not IT or the business. It is the entire organization that allows for mediocrity. Sure, you read all over the internet that “IT is an old school ivory tower” and that it has to prove its value. It’s pure management failure, by managers who don’t seem to know who does what and why in their organization. The division is purely artificial. It’s man made and kept alive as it serves political, personal & careerist agendas. Book authors, coaches & business consultants smile as they collect their fees discussing this at length. Welcome to mediocrity and failure. You have exactly what you have built.

Nobody has any incentive to fix it either. There is good money to be made and job security to be had by prolonging the problem on both sides. Are these people to blame if someone keeps paying them for that? These woes are true both in the private and in the public sector. Bar some minor differences in buzz words they all get handled by the same players. These are the ones that deliver the lobbyists and advisers that turn out ever fewer services for ever higher costs. They sell “solutions”. One size fits all if possible. Gartner makes a killing from this situation and they do have a clear strategy for that.

No IT strategy? No map? You’re doomed, indecisiveness will kill you.

If you don’t map out your game on the field you play on you can have no strategy. Without that you just do stuff. At best it’s functional (which is an achievement by the way) but often not. Planning, methods, tools … all of these fall victim to indecisiveness. So execution becomes impossible.

Here’s the result of decisiveness & purpose of action. You create green waves. When all the lights are green, you can ride the green wave. No starting, no stopping, but a fluid, highly effective way of moving ahead towards your target.

You’re not always in that situation and the light will turn orange & red along the way. That’s life and it’s not too bad unless you get caught in deadlock traffic jams during rush hour.

That situation requires a solution as it’s stressful, frustrating and detrimental to achieving your goals. In extreme cases the time between the colors becomes shorter and shorter and eventually drops to zero …

There is another form of deadlock. Doing everything for everyone at the same time to avoid making choices. All the lights are on, on all sides, at all times. You do not get a clear signal or guidance.

Indecisive action kills or grinds you to a halt. Whatever the case you’re losing time and fail to reach your goals, either by doing everything for everyone at the same time or by being stuck in a mess. Game over.

SMB Direct: Choosing A Flavor

Posted on December 8, 2014 by workinghardinit

I often get asked what to buy for implementing SMB Direct. It’s a non-trivial question actually and I’m not an expert, nor do I play one on TV. All joking aside, it’s a classical consulting answer: it depends. I don’t do free consulting in a blog post, even if that were possible, as there are many factors such as the characteristics and futures of your organization. There’s also a lot of FUD & marketing flying around. Basically in real life you only have two vendors: Chelsio (iWarp) and Mellanox (RoCE/Infiniband). Hard to say which one is best. You make the best choice for your company and you live with it.

There is talk about other vendors joining the SMB Direct market. But it seems to be taking a while. This is not that strange. I’ve understood that in the early days of this century iWarp got a pretty bad reputation due to the many issues around it. Apparently offloading the TCP/IP stack to the NIC, which is what iWarp does, is not an easy endeavor. Intel had some old Net card a couple of years ago but has gotten out of the game. Perhaps they’ll step back in but that might very well take a couple of years.

Other vendors like Broadcom, Emulex & QLogic might be working on solutions but I’m not holding my breath. Broadcom has DCB and has been hinting at RDMA in its NICs for many years but as of the writing of this post there is nothing functional out there yet. But bar the slowness (is complexity slowing the process?) it will be very interesting to see what they’ll choose: RoCE or iWarp. That choice might be the most public statement we’ll ever see about what technology seems like the best bet for these companies. But be careful, I have seen technology choices based on working/living with design choices at another level due to constraints in hardware & software that are no longer true today. So don’t just do blindly what others do.

Infiniband will remain a bit more of a niche I think and my guess is that RoCE is the big bet of Mellanox for the long term. 10Gbps and higher Ethernet switches are sold to everyone in the world. Infiniband, not so much. Does that make it a bad choice? Nope, it all depends. Just like FC is not a bad choice for everyone today, it depends.

Your options today

The options you have today to do SMB Direct are rather limited and bound to the different flavors and their vendor. Yes, vendor, not vendors.

iWarp: Chelsio
RoCE: Mellanox (v2 of RoCE has brought routability into the game, which counters one of iWarp’s biggest advantages next to operational ease. The “no fuss about DCB” story might not be 100% correct, though. The question is whether this matters; after all, many people do well with iSCSI, which is easy but has performance limits).
Infiniband: Mellanox (QLogic was the only other remaining one, but Intel bought it from them. I have never ever seen Intel Infiniband in the wild).

Note: You can do iWarp (and even RoCE in theory) without DCB but in all realistic high traffic situations you’ll want to implement PFC to keep the experience and results good under load. Especially the ports connecting to the SOFS nodes could otherwise potentially drop packets. iWarp, being TCP/IP, will handle dropped packets but possibly at the cost of deteriorated performance. With RoCE you’re basically toast if you lose packets; it should be lossless. I’m not too convinced that pure offloaded TCP/IP scales. Let’s face it, what was the big deal about lossless iSCSI? DCB. I would really love to see Demartek testing these things out for us.

If you have a smaller environment, no need for routing and minimal politics, I have seen companies select Infiniband, which per Gbps is very cheap. Lots of people have chosen iWarp due to its simplicity (which they heavily market) and routability. The popularity however has dropped due to price hikes that came with increased demand and no competition. RoCE is popular (I see it the most) and affordable but for this one you MUST do at least PFC. DCB support on switches is not an issue; even the budget friendly DELL PowerConnect N4000 series supports it, as did its predecessor the PC8100 series. Meaning if you have bought switches in the past 24 months and did your homework you’re good to go. Are routability and distance important? Well, perhaps not that much today, but the trend in networking is heading for layer 3 down to the rack. That will become more acceptable once we see a lot of the workload goodness in hypervisors (Live Migration, vMotion; yes, there is work being done on that) being lit up in layer 3, so it might become a key feature.

More Tips On Dealing With Removing Short File Names When Migrating To a SMB3 Transparent Failover File Server Cluster

Posted on December 1, 2014 by workinghardinit

You might have read my blog posts on the capabilities and the process of migrating to a Transparent Failover File Server. If not, here they are:

These are a good read with some advice from real world experience and in this post I’ll offer some more tips. I’ve discussed the need to disable and get rid of short file names in my blog and offered other tips to prepare for your migration and get your file share LUNs in tip top, modern shape. But what if you run into short file name issues where you can’t seem to get rid of them?

Well, here are 4 more things to check:

1) Get rid of the shadow copies used for Previous Versions

The reason you’d better get rid of them is that they can also contain short file names & way too long path or file names. We don’t want them to ruin the party so we remove them all by disabling shadow copies on the LUNs to be copied. We can enable them again once the LUN is up and running in the new file cluster.

2) The logs indicate there are short file names you don’t have access to

If the NTFS permissions on the folder & file structure are OK you should not have too many problems, bar some files being locked because they are in use. Rerunning the fsutil command prior to migrating, with the server service stopped, will prevent any connectivity and use of file shares by people ignoring the request to log off or shut down their clients, or by automated jobs that otherwise keep accessing them.

But you might still get some indications in the log file(s) that state you can’t remove certain file names.

There is the good old trick of running your command under SYSTEM. That does the job! It helps get rid of short file name instances of folders that you normally don’t get access to. If SYSTEM has rights you’ll be fine whether it’s a system folder or not. To do this the Sysinternals tools come in handy once again. You can launch a command prompt running under the NT AUTHORITY\SYSTEM account using psexec.exe by running the following from an elevated command prompt:

psexec -i -s cmd.exe or psexec -s cmd.exe

The -s switch runs the remote process in the System account. Psexec temporarily installs a service (running psexesvc.exe) on the remote computer (or locally if that’s what you’re doing), which is removed when the app or process that’s running is closed. It’s obvious now, I hope, why you need an elevated command prompt to run this command.
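If you have many LUNs to process, composing the command line by hand gets error prone. Below is a minimal sketch of a helper that builds it. The exact fsutil invocation used for this migration is not shown in this post, so the `fsutil 8dot3name strip` subcommand with `/s` (recurse), `/v` (verbose), `/t` (test mode, nothing is removed) and `/l` (log file) is my assumption of a typical run, and the paths are hypothetical. The helper only builds and prints the command; it does not execute anything.

```python
# Sketch: compose a "strip short names as SYSTEM" command line.
# Assumptions: psexec and fsutil are on the PATH of an elevated prompt;
# the folder and log paths below are hypothetical examples.

def build_strip_command(path, log_file=None, test_only=True, as_system=True):
    """Return the argument list for stripping 8.3 names below `path`."""
    cmd = []
    if as_system:
        # psexec -s runs the process under the SYSTEM account
        cmd += ["psexec", "-s"]
    # /s = recurse into subdirectories, /v = verbose output
    cmd += ["fsutil", "8dot3name", "strip", "/s", "/v"]
    if test_only:
        # /t = test mode: report what would happen, remove nothing
        cmd.append("/t")
    if log_file:
        # /l = write the log to this file
        cmd += ["/l", log_file]
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    # Print only; review the line before running it yourself.
    print(" ".join(build_strip_command(r"D:\Shares", log_file=r"C:\Temp\strip.log")))
```

Defaulting to test mode is deliberate: you get the log of what would be touched first, and only switch `test_only` off once that log looks sane.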

Now should you do this by default? Nope. Just when you need to and as always have a realistic backup plan, a way to recover when things go south.

3) Antivirus sometimes prevents the removal of short file names

Disable antivirus; sometimes it holds a temporary entry in the registry for the file involved. At least that’s what I’ve seen as a transient issue in some of the large number of logs I gathered. Yeah, I ran a lot of fsutil against large NTFS volumes. What can I say? Due diligence pays off!

4) Run ChkDsk

Just make sure the volume is healthy and no repairs are needed. If you’re migrating from an older file server there might be outstanding issues, and a check disk on volumes with lots of files takes time. Some of the ones I’ve dealt with had more than 2 million files on a 2TB LUN and it can take 24 hours. Fun when you have 10 LUNs :-/
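For planning the maintenance window, a rough sketch of the arithmetic helps. It assumes, optimistically, that every LUN takes about the same time and that runs on different LUNs don’t slow each other down; the numbers are the ones mentioned above, the function name is just for illustration.

```python
import math

# Rough planning estimate for running ChkDsk over several LUNs.
# Assumptions (optimistic): equal run time per LUN, and concurrent runs
# on different LUNs do not slow each other down.

def chkdsk_wall_clock_hours(luns: int, hours_per_lun: float, concurrent: int = 1) -> float:
    """Return total elapsed hours when `concurrent` LUNs are checked at a time."""
    if luns < 0 or hours_per_lun < 0 or concurrent < 1:
        raise ValueError("need luns >= 0, hours_per_lun >= 0 and concurrent >= 1")
    batches = math.ceil(luns / concurrent)
    return batches * hours_per_lun


if __name__ == "__main__":
    for concurrent in (1, 2, 5):
        hours = chkdsk_wall_clock_hours(10, 24, concurrent)
        print(f"{concurrent} at a time: {hours:.0f} hours ({hours / 24:.0f} days)")
```

One at a time, 10 LUNs at 24 hours each is 10 days of ChkDsk, so it pays to start well ahead of the migration date.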

Working Hard In IT

My view on IT from the trenches