Happy New Year from a renewed Microsoft MVP in 2016

It’s January 1st, 2016, late in the afternoon here local time, and I have just received great news to start the new year with. It came by way of an e-mail notifying me that I have been renewed as a Microsoft Most Valuable Professional (MVP).

The Microsoft MVP Award provides us the unique opportunity to celebrate and honor your significant contributions and say “Thank you for your technical leadership.”


So it’s time for a happy New Year from a renewed Microsoft MVP in 2016. My expertise is now Cloud and Data Center Management. It’s quite an honor to be renewed. Somewhere, people think I make a big enough difference to be recognized, and that caresses my ego just a little bit. More importantly, however, it means I get the opportunity to keep working with a lot of passionate and talented people. The ability to participate in a global community and ecosystem focused on our areas of expertise is something I have enjoyed for many years now. Attending the MVP Summit is the cherry on the cake, and they sure do make you feel welcome at every place you stop on and around the campus.


My fellow MVPs are always very helpful; they are both an inspiration and a source of tremendous experience and knowledge. Being an MVP has opened opportunities to both learn and teach, professionally and personally. That’s what enabled me to grow in depth and breadth within my areas of expertise, which ultimately translates into our new expertise assignment, Cloud and Data Center Management.

Thank you!

It’s a good time to wish you all a happy New Year. Let me take a moment to express my gratitude to all loyal or accidental readers of WorkingHardInIT. A blog without readers would be a sad thing but luckily you’re all reading this blog more and more, year after year.


I’m grateful to you for your continued support and for spending the time reading my blog. To the people, businesses and organizations that have given me so many opportunities and support, and with whom I had the pleasure to work in 2015, I say thank you, and let’s continue to do so. I wish you all a marvelous 2016 with lots of joy, good health and tons of fun at and outside of work!

The road ahead

2016 will be an interesting year. There’s a lot going on in our industry; some of it is hype, a lot of it is real. That reality is sometimes sobering but often inspiring. Keep cool, don’t panic or go ballistic. Smart discipline with a good portion of common sense, insight and a solid yet flexible plan wins the day. You’ll also need some luck and to turn up at the right place at the right time every now and then, ready to make the most of an opportunity. You get the idea.

There are and have been, as always, personal and professional challenges. That’s a given. Only newbies and idiots make picture-perfect plans. They then get “dazzled” by the first punch on their snout, which sends their plans falling apart like shattered glass. Sometimes the challenges are bigger and harder. This can mean you need to work even harder, smarter and perhaps even longer. It can also mean cutting your losses and disengaging. No matter how good you are, how long, hard and smart you work, you cannot right all wrongs in this world. Leave that to the self-promoting LinkedIn blogs on “personal success and growth” aimed at ridiculously entitled people or the painfully naïve.


2016 will also have its challenges. They will be met with all the attention and dedication required where and when needed. They will be passed by or ignored where the effort just isn’t worthwhile. There are good places to go, nice things to do and great people to meet. If I can seize as many opportunities in 2016 (TechEd, ITPROCeed, E2EVC, VEEAMON, Microsoft MVP Summit, ExpertsLive) as I have been able to do in 2015, I’ll be a happy man, both professionally and personally.

How to get a dream job in 2016?

I’ve been asked that a couple of times. I’m not the one for handing out personal advice; that would only shock your parents and potentially shake your worldview as well. Professionally, I’d say: your profession, your career, is not the same as your job. It might be, but more often than not it isn’t. That’s OK. You can build a career in your (chosen) profession even despite your job or jobs. Most MVPs work very hard, and we put a lot of personal time into our technical skills and community. It isn’t the lifestyle of the rich and famous, as some would think when they read a blog about a conference or summit.

VEEAMON 2015 Party

Those are a fun part of the work, that’s for sure, but they don’t define our work days. It’s lots of work, learning and sharing, and many battles are uphill! We all have jobs that require us to do things we’d rather not have to do. Do what you need to do to stay afloat, but try to do as much of what you like and enjoy as possible. Do it smart and don’t waste your time or let others waste yours. The latter is something you should not do to other people either. When it comes to jobs, it’s not as simple as the sloganesque “do what you love” versus “work for money/the man/a pension/security” for most people. Sure, most don’t like to admit that they have to take crap, but we all do. Anything else is as much BS as every employer pretending everybody has to be, and is, an engaged, inspired team player going above and beyond what the job demands. That’s a bit too much like Office Space’s “Is this good for the company?” for comfort 😉

SMB Direct: Choosing A Flavor

I often get asked what to buy for implementing SMB Direct. It’s a non-trivial question actually, and I’m not an expert, nor do I play one on TV. All joking aside, it’s the classical consulting answer: it depends. I don’t do free consulting in a blog post, even if that were possible, as there are many factors, such as the characteristics and future plans of your organization. There’s also a lot of FUD & marketing flying around. Basically, in real life you only have two vendors: Chelsio (iWARP) and Mellanox (RoCE/InfiniBand). It’s hard to say which one is best. You make the best choice for your company and you live with it.

There is talk about other vendors joining the SMB Direct market, but it seems to be taking a while. This is not that strange. I’ve understood that in the early days of this century iWARP got a pretty bad reputation due to the many issues around it. Apparently offloading the TCP/IP stack to the NIC, which is what iWARP does, is not an easy endeavor. Intel had some older NetEffect cards a couple of years ago but has gotten out of the game. Perhaps they’ll step back in, but that might very well take a couple of years.

Other vendors like Broadcom, Emulex & QLogic might be working on solutions, but I’m not holding my breath. Broadcom has DCB and has been hinting at RDMA in its NICs for many years, but as of the writing of this post there is nothing functional out there yet. Bar the slowness (is complexity slowing the process?), it will be very interesting to see what they’ll choose: RoCE or iWARP. That choice might be the most public statement we’ll ever see about which technology seems like the best bet to these companies. But be careful: I have seen technology choices based on working/living with design choices made at another level, due to constraints in hardware & software that are no longer true today. So don’t just blindly do what others do.

InfiniBand will remain a bit more of a niche, I think, and my guess is that RoCE is Mellanox’s big bet for the long term. 10Gbps and higher Ethernet switches are sold to everyone in the world; InfiniBand, not so much. Does that make it a bad choice? Nope, it all depends. Just like FC is not a bad choice for everyone today. It depends.

Your options today

The options you have today to do SMB Direct are rather limited and bound to the different flavors and their vendor. Yes, vendor, not vendors.

  1. iWARP: Chelsio
  2. RoCE: Mellanox (v2 of RoCE has brought routability into the game, which counters one of iWARP’s biggest advantages, next to operational ease. The “no fuss about DCB” story might not be 100% correct either; the question is whether this matters. After all, many people do well with iSCSI, which is easy but has performance limits.)
  3. InfiniBand: Mellanox (QLogic was the only other remaining one, but Intel bought that business from them. I have never ever seen Intel InfiniBand in the wild.)

Note: You can do iWARP (and even RoCE, in theory) without DCB, but in all realistic high-traffic situations you’ll want to implement PFC to keep the experience and results good under load. Especially the ports connecting to the SOFS nodes could otherwise potentially drop packets. iWARP, being TCP/IP, will handle dropped packets, but possibly at the cost of deteriorated performance. With RoCE you’re basically toast if you lose packets; it should be lossless. I’m not too convinced that pure offloaded TCP/IP scales. Let’s face it, what was the big deal about lossless iSCSI => DCB 🙂 I would really love to see Demartek test these things out for us.
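To illustrate the host side of that PFC requirement, here’s a minimal PowerShell sketch for Windows Server 2012 R2 and later. The adapter names and the choice of priority 3 are assumptions on my part; match whatever your switches are configured for, and treat this as a starting point, not a definitive recipe:

```powershell
# Install the Data Center Bridging feature
Install-WindowsFeature Data-Center-Bridging

# Classify SMB Direct traffic (port 445) into 802.1p priority 3
New-NetQosPolicy "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control only for that priority, disable it for the rest
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply the QoS settings to the RDMA-capable adapters (names are hypothetical)
Enable-NetAdapterQos -Name "RDMA1","RDMA2"

# Make the host authoritative instead of accepting DCBX settings from the switch
Set-NetQosDcbxSetting -Willing $false
```

Remember the switch side is just as important: PFC has to be enabled on the same priority, end to end, or you gain nothing.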

If you have a smaller environment, no need for routing and minimal politics, I have seen companies select InfiniBand, which per Gbps is very cheap. Lots of people have chosen iWARP due to its simplicity (which they heavily market) and routability. Its popularity, however, has dropped due to price hikes that came with increased demand and no competition. RoCE is popular (I see it the most) and affordable, but for this one you MUST do at least PFC. DCB support on switches is not an issue; even the budget-friendly DELL PowerConnect N4000 series supports it, as did its predecessor, the PC8100 series. Meaning if you have bought switches in the past 24 months and did your homework, you’re good to go. Are routability and distance important? Well, perhaps not that much today. But the trend in networking is heading for layer 3 down to the rack, and once we see a lot of the workload goodness in hypervisors (Live Migration, vMotion; yes, there is work being done on that) being lit up in layer 3, it might become a key feature.

Microsoft Keeps Investing In Storage Big Time

Disclaimer: These are my musings on the limited info available about Windows Server vNext, based on the Technical Preview bits at the time of writing. So it’s not set in stone and has time-limited value.

Reading the documentation that’s already available on vNext of Windows, it’s clear that Microsoft is continuing its push towards the software-defined data center. They are also pushing high availability ever more towards the “continuous” side of things.

It’s early days yet and we have only just downloaded the Technical Preview, but what do we read in What’s New in Storage Services in Windows Server Technical Preview?

Storage Quality of Service

  • They are giving us more Storage Quality of Service, tied into the use of SOFS as storage over SMB3. As way too many NAS solutions don’t support SMB3, or only partially (in a restricted way), it’s clear to me that a self-built SOFS solution on a couple of servers is and remains the best SMB3 implementation on the market, and it has just gotten storage QoS.

Little rant here: at the people who claim that this is not capable of high performance, I usually laugh. Have you actually built a SOFS or TFFS with 10Gbps networking on modern enterprise-grade servers like the DELL R720 or R730? Did you look at the results from that relatively low-cost investment? I think not, really. And if you did and found it lacking, I’ll be very impressed by the workload you’re running. Nowadays you’ll force your storage to its knees earlier than your Windows file server.

  • It’s in the SOFS layer, so this does not tie you into Storage Spaces if you’re not ready for that yet but would like the benefits of SOFS. As long as you have shared storage behind the SOFS, you’re good.
  • It’s policy based and can apply to virtual machines, groups of virtual machines, a service or a tenant.
  • The virtual disk is the level where the policy is set & enforced.
  • Storage performance will dynamically adjust to meet the policies, and when demand is tied, performance will be fairly distributed.
  • You can monitor all this.

It’s right there in the OS.
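To give an idea of what that looks like in practice, here’s a hedged sketch using the Storage QoS cmdlets as they appear in the Technical Preview. The policy name, IOPS values and VM name are made up, and the cmdlets and parameters may still change before RTM:

```powershell
# On the Scale-Out File Server cluster: create a policy with minimum and maximum IOPS
New-StorageQosPolicy -Name "Gold" -MinimumIops 500 -MaximumIops 5000

# Look up the policy ID and apply it to a VM's virtual hard disks (VM name is hypothetical)
$policy = Get-StorageQosPolicy -Name "Gold"
Get-VM -Name "SQL01" | Get-VMHardDiskDrive |
    Set-VMHardDiskDrive -QoSPolicyID $policy.PolicyId

# Monitor the flows to see the policy being enforced
Get-StorageQosFlow
```

The nice part is that the policy lives with the virtual disk, so it follows the VM around the cluster.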

Storage Replica

This gives us “storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.”

Look, for Hyper-V we already had Hyper-V Replica (which is also being improved), but for other workloads we still relied on the storage vendors or 3rd-party solutions. Now I can have my storage replicas for service protection and continuity out of the box with Windows. WOW!

And as we read on …

  • Provide an all-Microsoft disaster recovery solution for planned and unplanned outages of mission-critical workloads.
  • Use SMB3 transport with proven reliability, scalability, and performance.
  • Stretch clusters to metropolitan distances.
  • Use Microsoft software end to end for storage and clustering, such as Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS.
  • Help reduce cost and complexity as follows:
    ◦ Hardware agnostic, with no requirement to immediately abandon legacy storage such as SANs.
    ◦ Allows commodity storage and networking technologies.
    ◦ Features ease of graphical management for individual nodes and clusters through Failover Cluster Manager and Microsoft Azure Site Recovery.
    ◦ Includes comprehensive, large-scale scripting options through Windows PowerShell.
  • Helps reduce downtime, and increase reliability and productivity intrinsic to Windows.
  • Provide supportability, performance metrics, and diagnostic capabilities.

I have gotten this to work in the lab, with some trial and error, but this is the Technical Preview, not a finished product. If they continue along this path, I’m pretty confident we’ll have a functional and operationally viable solution by RTM. Just think about the possibilities this brings!
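For reference, here is roughly what that lab exercise looked like with the Technical Preview cmdlets. Server names, volumes and replication group names are hypothetical, and the syntax may well change before RTM:

```powershell
# Validate the topology first; this produces an HTML report on requirements and performance
Test-SRTopology -SourceComputerName "SR-SRV01" -SourceVolumeName "D:" -SourceLogVolumeName "E:" `
    -DestinationComputerName "SR-SRV02" -DestinationVolumeName "D:" -DestinationLogVolumeName "E:" `
    -DurationInMinutes 10 -ResultPath "C:\Temp"

# Create the synchronous, block-level replication partnership
New-SRPartnership -SourceComputerName "SR-SRV01" -SourceRGName "RG01" `
    -SourceVolumeName "D:" -SourceLogVolumeName "E:" `
    -DestinationComputerName "SR-SRV02" -DestinationRGName "RG02" `
    -DestinationVolumeName "D:" -DestinationLogVolumeName "E:"

# Check the replication groups and their state
Get-SRGroup
```

Do run Test-SRTopology before anything else; it saves you from discovering a slow log volume or an undersized link after the partnership exists.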

Storage Spaces

Now, I have not read much on Storage Spaces in vNext yet, but I think it’s safe to assume we’ll see major improvements there as well. Which leads me to reaffirm my blog post here: TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2.

Microsoft is delivering more and greater software-defined storage in box. This means cost-effective yet very functional storage solutions. On top of that, they put pressure on the market to deliver more value if vendors want to stay competitive. As a customer, I welcome whatever solution fits my needs best. And as a consumer of large amounts of storage, in a world where we need to spend the money where it matters most, I like what I’m seeing.

Tip for Microsoft: configurability, reliability and EASY diagnostics and remediation are paramount to success. Sure, some storage vendor solutions aren’t too great on that front either, but some are awesome. Make sure you’re in the awesome category. Make it a great user experience from start to finish, in both deployment and operations.

Tip for you: if you’re not ready for prime time with Storage Spaces, SMB Direct etc., do what I’ve done. Use it where it doesn’t kill you if you hit some learning curves. What about Storage Spaces as a backup target, where you can now replicate the backups off to your disaster recovery site?

A Reality Check On Disaster Recovery & Business Continuity


Another blog post in “The Dilbert Life Series®” for those who are not taking everything personally. Every time business types start talking about business continuity, for some reason, call it experience or cynicism, my bullshit & assumption sensors go into high alert mode. They tend to spend a certain (sometimes considerable) amount of money on connectivity, storage, CPUs at a remote site and 2000 pages of documentation, and think that covers just about anything they’ll need. They’ll then ask you when the automatic or 5-minute failover to the secondary site will be up and running. That’s when the time has come to subdue all those inflated expectations and reduce the expectation gap between business and IT as much as possible. It should never have come to that in the first place.

But in this matter business people & analysts alike often read (or are fed) some marchitecture docs with a bunch of sales brochures which make it all sound very easy and quickly accomplished. They sometimes think that the good old IT department is saying “no” again just because they are negative people who aren’t team players and lack the necessary “can do attitude” in a world where their technology castle is falling down. Well, sorry to burst the bubble, but that’s not it. The world isn’t quite that black and white. You see, the techies have to make it work, and they’re the ones who have to deal with reality. Combine the above with a weak and rather incompetent IT manager bending over to the business (i.e. promising them heaven on earth) to stay in their good graces, and it becomes a certainty they’re going to get a rude awakening. Not that the realities are all that bad. Far from it, but the expectations can be so high and unrealistic that disappointment is unavoidable.

The typical flow of things

The business is under pressure from peers, top management, government & regulators to pay attention to disaster recovery. This inevitably leads to an interest in business continuity. Why? Well, we’re in a 24/7 economy, and your consumer right to buy a new coffee table online at 03:00 AM on a Sunday night is worth some effort. So if we can do it for furniture, we should certainly have it for more critical services. The business will hear about possible (technology) solutions and would like to see them implemented. Why wouldn’t they? It all sounds effective and logical. So why aren’t we all running off and doing it? Is it because IT is a bunch of lazy geeks playing FPS games online rather than working for their mythically high salaries? How hard can it be? It’s all over the press that IT is a commodity: easy, fast, dynamic and consumer driven, so “we” the consumers want our business continuity now! But hey, it costs money, time and a considerable, sustained effort, and we have to deal with the less-than-optimal legacy applications (90% of what you’re running right now).

Realities & 24/7 standby personnel

The acronyms & buzzwords the business comes up with after attending some tech briefing by vendors Y & Z (those are a bit like infomercials, but without the limited value those might have 😉) can be quite entertaining. You could say these people at least pay attention to the consumerized business types. Well, actually they don’t, but they do smell money, and lots of it. Technically they are not lying. In a perfect world things might work like that … sort of, sometimes, and maybe even when you need it. But will it really work well and reliably? Sure, that’s not the vendor’s fault. He can’t help that the cool “jump off a cliff” boots he sold you got you killed. Yes, they are designed for jumping off a cliff, but anything above 1 meter without other precautions and technologies might cause bodily harm or even death. Gravity and its effects, in combination with the complexity of your businesses, are beyond the scope of their product solutions and are entirely your responsibility. Will you be able to cover all those aspects?

Also don’t forget the people factor. Do you have the right people & skill sets at your disposal 24/7 for that time when disaster strikes? Remember that could be on a hot summer night in a weekend when they are enjoying a few glasses of wine at a BBQ party and not at 10:15 AM on a Tuesday morning.

So what terminology flies around?

They hear about asynchronous or even synchronous replication of storage or applications. Sure, it can work within a data center, depending on how well it is designed and set up. It can even work between data centers, especially for applications like Exchange 2010. But let’s face it, the technical limitations and the lack of support for this in many of the legacy applications will hinder this considerably.

They hear of things like stretched clusters and synchronous storage replication. Sure, they’ll sell you all kinds of licensed features to make this work at the storage level, with a lot of small print. Sometimes even at the cost of losing the functionality that made the storage interesting in the first place. At the network level, anything below layer 3 probably suffers from too much optimism. Sure, stretched subnets seem nice, but … how reliable are these solutions in real life?

Consider the latency and less reliable connectivity. You can and will lose the link once in a while. With active-active or active-passive data centers that depend on each other, both become single points of failure. And then there are all the scenarios where only one part of the entire technology stack that makes everything work fails. What if the application clustering survives but not the network, the storage or the database? You’re toast anyway. Even worse, what if you get into a split-brain scenario and have two sides writing data? Recover from that one, my friend; there’s no merge process for that, only data recovery. What about live migration or vMotion (state, storage, shared nothing) across data centers to avoid an impending disaster? That’s a pipe dream at the moment, people. How long can you afford for this to take, even if your link is 99.999% reliable? Chances are that in a crisis things need to happen fast to avoid disaster, and guess what: even in the same data center, during normal routine operations, we’re leveraging <1ms latency 10Gbps pipes for this. Are we going to get solutions that are affordable and robust? Yes, and I think the hypervisor vendors will help push the entire industry forward when I see what is happening in that space, but we’re not in Walhalla yet.

Our client server application has high availability capabilities

There are those “robust and highly available application architectures” (ahem) that only hold true if nothing ever goes wrong or happens to the rest of the universe. “Disasters” such as the server hosting the license dongle being rebooted for patching. Or, heaven forbid, your TCP/IP connection dropping some packets due to high-volume traffic. No, we can’t do QoS at the individual application level, and even if we could, it wouldn’t help. If your line-of-business software can’t handle a WAN link without serious performance impact or errors due to a dropped packet, it was probably written and tested on <1ms latency networks against a database with only one active connection. It wasn’t designed, it was merely written. It’s not because software runs on an OS that can be made highly available and uses a database that can be clustered that the application itself has any high availability, let alone business continuity capabilities. Why would that application be happy switching over to another link? A link that is possibly further away, running on fewer resources and quite possibly against less capable storage? For your apps to work acceptably in such scenarios, you would already have to redesign them.

You must also realize that a lot of acquired and home-written software has IP addresses in configuration files instead of DNS names. Some even have IP addresses in code. Some abuse local hosts files to deal with hard-coded DNS names … There are tons of very bad practices out there running in production. And you want business continuity for that? Not just disaster recovery, to be clear, but business continuity, preferably without dropping one beat. Have you done any real software and infrastructure engineering in your lifetime? Keeping a business running often looks like a MacGyver episode. Lots of creativity, ingenuity, super glue, wire, duct tape and a Swiss army knife or multi-tool. This is still true today; it doesn’t sound cool to admit to it, but it needs to be said.

We can make this work with the right methodologies and strict processes

Next time you think that, go to the top floor and jump off, adhering to the flight methodologies and strict processes that rule aerodynamics. After the loud thud of you hitting the deck, you’ll be nothing more than a pool of human waste. You cannot fly. On top of the unrealistic scenarios, things change so fast that documentation and procedures are very often out of date as soon as they are written.

Next time some “consultants” drop in selling you products & processes with fancy acronyms, proclaiming that rigorous adherence to these will save the day, consider the following. They make a bold assumption, given that they don’t know even 10% of the apps and processes in your company. Even bolder because they ignore the fact that what they discover in interviews often barely scratches the surface. People can only tell you what they actually know or dare tell you. On top of that, any discovery they do with tools is rather incomplete. If the job consists of merely pushing processes and methodologies around without reality checks, you could be in for a big surprise. You need the holistic approach here, otherwise it’s make-believe. It’s a bit like paratrooper training for night drops over enemy strongholds, to attack those and bring ’em down. Only the training is done in a heated classroom, during meetings and on a computer. They never put on all their gear, let alone jump out of an aircraft in the dead of night, regroup, hump all that gear to the rally points and engage the enemy in a training exercise. Well people, you’ll never be able to pull off business continuity in real life either if you don’t design and test properly, and keep doing that. It’s fantasy land. Even in the best of circumstances no plan survives its first contact with the enemy, and basically you would be doing the equivalent of a trooper firing his rifle for the very first time at night during a real engagement. That’s assuming you didn’t break your neck during the drop or get lost, and managed to load the darn thing in the first place.

You’re a pain in the proverbial ass to work with

Am I being too negative? No, I’m being realistic. I know reality is a very unwelcome guest in fantasy land, as it tends to disturb the feel-good factor. Those pesky details are not just silly technological “manual labor” issues, people. They’ll kill your shiny plans and waste tremendous amounts of money and time.

We can have mission-critical applications protected and provide both disaster recovery and business continuity. For that, the entire solution stack needs to be designed for it. While possible, this makes things expensive and often leaves it only a dream for custom-written and a lot of off-the-shelf software. If you need business continuity, the applications need to be designed and written for it. If not, all the money and creativity in the world cannot guarantee you anything. In fact, at best you get ugly and very expensive hacks on top of cheap, not highly available software that poses as “mission critical”.


Seriously people, business continuity can be a very costly and complex subject. You’ll need to think this through. When making assumptions, realize that you cannot go forward without confirming them. We operate by the mantra “assumptions are the mother of all fuckups”, which is nothing more than the age-old “trust but verify” in action. There are many things you can do for disaster recovery and business continuity. Do them with insight, know what you are getting into, and maybe forget about doing it without one second of interruption for your entire business.

Let’s say disaster strikes and the primary data center is destroyed. If you can restart and get running again with only a limited amount of work and productivity lost, you’re doing very well. Being down for only a couple of hours or days or even a week, will make you one of the top performers. Really! Try to get there first before thinking about continuous availability via disaster avoidance and automatic autonomous failovers.

One approach to achieve this is what I call “Pandora’s Box”. If a company wants business continuity for its entire stack of operations, you’ll have to leave that box closed and replicate it entirely to another site. When you’re hit with a major, long-lasting disaster, you eat the downtime and the loss of a certain delta, and fire up the entire box in another location. That way you avoid trying to micromanage its content. You’ll fail at that anyway. For short-term disasters you have to eat the downtime. Deciding when to fail over is a hard decision. Also, don’t forget about the process in reverse order. That’s another part of the ball game.

It’s sad to see that more money is spent on consultants & advisers daydreaming than on realistic planning and mitigation. If you want to know why this is allowed to happen, there’s always my series on the do’s and don’ts when engaging consultants, Part I and Part II. FYI, the last guru I saw brought into a shop was “convinced” he could open Pandora’s Box and remain in control. He has left the building by now, and it wasn’t a pretty sight, but that’s another story.