We Need Your Opinion On This Strategy, Vision, Management Issue …

Could you give us your opinion on this?

Lately people, managers, have asked me to give advice or at least my opinion on how to organize & manage IT. In the broad sense of the term. Infrastructure, software, services, support, on premise, cloud, data protection, security …  “Just think about it a bit”.

That question “Could you give us your opinion on this?” is a hard one for me.  I could say “read my blog”, the non technical posts. But my opinion is often too high level and they don’t they actually want that. They want a solution. And it’s not that I don’t think about it or don’t have an opinion. But I can’t focus on areas out of my expertise, my control and priorities.

Basically I cannot help them. Not because I’m that stupid or the matter is beyond our control. It’s because the way managers and organizations think is getting more and faster obsolete by the day.

The Issue

Our world, both privately and work related, is becoming more and more connected every day. That means there is a tremendous amount of input, leading to an ever continuing increase of permutations of ever more variables that come in to play. In short, complexity is on the rise at an enormous rate and will overwhelm us. Even worse is that this complexity only shows itself after things have gone wrong. That’s bad but, that also means there are probably many more relationships of cause and effect that haven’t even shown themselves yet. That kind of sounds like a time bomb.

How do you deal with this? Not in the way so many are asking for. And I’m not here to tell my managers or customers what they want to hear. I’m in the business of telling them what they need to hear as I deal in results, not services or studies. More often than not they are looking for processes and methodologies to keep central control over planning, execution, operations and change. All this while the rug is literally pulled away under their feet. There’s the problem.

Situations, technologies, solutions, frameworks, processes all have a time limited value that’s becoming shorter. So the idea that you can plan and control for many years ahead is obsolete in many areas in our ecosystem. There are just to many moving parts, that are changing too fast. So how do we manage this? What kind of leadership do you need? Well there is no easy answer.

How do I deal with this?

Personally I deal with this by working, collaborating & cooperating in a network, in “the community”. My insights, knowledge, help and support come from my network. Some of my colleagues, the contractors and consultants we hire are in that network. A lot of colleagues are not. Most managers are not. Why is that? They are stuck in a hierarchal world of centralized command and control that is failing them fast. At best they achieve good results, but very slow and at a very high expense. We can only hope that the results also don’t turn out bad. They want procedures & processes. Predictability & consistency but I deal with complexity in wide area of expertise that cannot readily be put into manuals and documentation. Not in a timely fashion. I’m in a dog fight (insert “Top Gun” theme). The processes & logistics provide the platform. Learn where procedures & methodologies work and where they’ll kill you. The knowledge and the skills we need are a living thing that feeds on a networked collective and are very much in flux.  I’m so much more better skilled and effective at my job through participating my global community than I can be tied into the confines of my current workplace they’d be mad not to leverage that, let alone prevent me from doing so. You can’t do it alone or in isolation.

An example

Yesterday was an extreme example in a busy week. I started work at 05:30 AM yesterday to set up a testing environment for questions I needed answered by a vendor who leverages the community at large. That’s required some extra work in the datacenter that I could have done by a colleague that was there today because I found out in time. I went to the office at 08:30. I worked all day on an important piece of work I mentioned in my network and was alerted to a potential issue. That led to knowledge sharing & testing. Meaning we could prevent that very potential issue and meanwhile we’re both learning. I went home at 18:30, dinner & testing. I was attending an MVP web cast at 20:00 PM till 21:00 PM learning new & better ways to trouble shoot clusters. I got a call at 19:10PM of a mate in Switzerland who’s running into SAN issues and I helped him out with the two most possible causes of this through my experience with SANs and that brand of HP SAN.  We did some more testing & research until 22:00 after which I wrote this blog up.

We don’t get paid for this. This is true mutual beneficial cooperation. We don’t benefit directly and it’s not “our problem” or job goal. But oh boy do we learn and grow together and in such help each other and our employers/customers. It’s a true long term investment that pays of day by day the longer you are active in the community and network. But the thing is, I can’t put that into a process or manual. Any methodology that has to serve centralized command and control structure while dealing with agile subjects is bound to fail. Hence you see agile & scrum being abused to the level it’s just doing stuff without the benefits.

Conclusion

This is just one small and personal example. Management and leadership will have to find ways of nurturing collaboration and cooperation beyond the boundaries of their control. The skillset and knowledge needed are not to be found in a corporate manual or in never ending in house meetings & committees. Knowledge gained has to flow to grow As such it flows both in an out of your organization. You’re delusional if you think you can stop that today and it’s not the same a leaking corporate secrets. Hierarchies & management based on rank and pay grades are going to fail. And if those managers in higher pay grades can’t make the organization thrive in this ever more connected, faster moving world, they might not be worth that pay grade.

I assure you that employees and consultants who live in the networked global community will quickly figure out if an organization can handle this. They will not and should not do their managers job. In fact they are already doing managers areal big favor by working and operating the way they do. They are leading at their level, they are leveraging their networks and getting the job done. They are taking responsibilities, they solve problems creatively and get results. It just doesn’t fit easily in an obsolete model of neatly documented procedures in a centralized command and control structure. They don’t need a manager for that, they need one that will make it possible to thrive in that ultra-connected ever changing fast paced world. Facilitate, stimulate and reward learning and taking responsibilities, not hierarchies. That way all people in your organization will lead or at least contribute to the best of their ability. You’ll need to trust them for that to work. If you don’t trust them, fine, but act upon it. Letting people you don’t trust work for and with you doesn’t work.

How to do this is a managers & leaders challenge. Not mine. I know when I’m out of my depth or when not to engage. The grand visions, the strategic play of a company is their responsibility. Getting results & moving forward will come from your perpetually learning, and engaged workforce, if you don’t mess it up. And yes, that is your responsibility. Cultures are cultivated by definition. So if the culture of the company is to blame for things going south, realize you’re the ones supposed to make it a good one. People don’t leave organizations, they leave managers 😉 And to paraphrase the words of Walt Disney … you’re in a world of hurt if they leave you but stay at their desk and on the pay roll. It’s called mediocrity, which also serves a purpose, providing commodities & cookie template services whilst letting others shine. But if you want to be a thriving, highly skilled, expertise driven center of excellence … it’s going to take lot of hard and sustained work and it’s not a one way street.

Live Migration over SMB Direct leaves more CPU cycles for Virtual RSS (vRSS) in Windows Server 2012 R2

I recently (January 22nd 2014) gave a WebCast presentation for the Dutch Windows Management User Group (@WMUG_NL) in which I made the case for using SMB Direct with Live Migration to save CPU cycles other (VM) workloads. There are several areas where the CPU cycles are better spent but I used vRSS to show case one scenario.

We’re using a 2 node Windows Server 2012 R2 Hyper-V cluster on Dell PowerEdge R720 servers with Mellanox ConnectX-3 (CSV  &  live migration) and Intel X520-DA (Hyper-V switch), all 10Gbps.

This is what a CPU bottleneck looks like that can be solved by using vRSS in Windows Server 2012 R2.image

The host machines are Hyper Threading enabled. The virtual switch is attached to a switch independent NIC team with dynamic mode. In this setup it’s normal that the sending VM is leveraging both members while the receiving VM traffic is coming in over one member of the host team.

No let’s enable vRSS in the VM and see what this does for this picture.image

Pretty impressive isn’t it. DidierTest03 is the sending VM running on host A and DidierTest04 is the receiving VM that has vRSS enabled and is running on Host B. For vRSS you need both hosts and VMs to run Windows Server 2012 or Windows 8.1. You can see the load is spread across 7 vCPUs in the VM. DidierTest04 has 8 vCPUs. I configured vRSS in the VM to be able to use 7 vCPUs and leave vCPU 0, the default one, alone to handle those workloads.

image

Given multiple Logical CPUs & vCPUs we can get line speed with 10Gbps inside a virtual machine. This, ladies and gentlemen is a thing of beauty.

Now tell me, if you have business related needs for those CPU cycles why would you not offload the work that needs to be done for live migration to the NIC via SMB direct? This is about getting maximum VM density, performance & ROI form your infrastructure, whilst saving on servers, power and cooling. When you see the smile on your clients or bosses face, just say “you’re welcome” and smile back Open-mouthed smile.

Failed Live Migrations with Event ID 21502 Planned virtual machine creation failed for virtual machine ‘VM Name’: An existing connection was forcibly closed by the remote host. (0x80072746) Caused By Wrong Jumbo Frame Settings

OK so Live Migration fails and you get the following error in the System even log with event id 21502:

image

Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).

Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).

There are some threads on the TechNet forums on this like here http://social.technet.microsoft.com/Forums/en-US/805466e8-f874-4851-953f-59cdbd4f3d9f/windows-2012-hyperv-live-migration-failed-with-an-existing-connection-was-forcibly-closed-by-the and some blog post pointing to TCP/IP Chimney settings causing this but those causes stem back to the Windows Server 2003 / 2008 era.

In the Hyper-V event log Microsoft-Windows-Hyper-V-VMMS-Admin you also see a series of entries related to the failed live migration point to the same issue: image

  
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:15 AM
Event ID:      20413
Task Category: None
Level:         Information
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
The Virtual Machine Management service initiated the live migration of virtual machine  ‘DidierTest01’ to destination host ‘SRV2’ (VMID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      22038
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Failed to send data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      21018
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      22040
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      21024
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      srv1.blog.com
Description:
Virtual machine migration operation for ‘DidierTest01’ failed at migration source ‘SRV1’. (Virtual machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27)

There is something wrong with the network and if all checks out on your cluster & hosts it’s time to look beyond that. Well as it turns out it was the Jumbo Frame setting on the CSV and LM NICs.

Those servers had been connected to a couple of DELL Force10  S4810 switches. These can handle an MTU size up to 12000. And that’s how they are configured. The Mellanox NICs allow for MTU Sizes up to 9614 in their Jumbo Frame property.  Now super sized jumbo frames are all cool until you attach the network cables to another switch like a PowerConnect 8132 that has a max MTU size of 9216. That moment your network won’t do what it’s supposed to and you see errors like those above. If you test via an SMB share things seem OK & standard pings don’t show the issue. But some ping tests with different mtu sizes & the –f (do no fragment) switch will unmask the issue soon. Setting the Jumbo Frame size on the CSV & LM NICs to 9014 resolved the issue.

Now if on the server side everything matches up but not on the switches you’ll also get an event id 21502 but with a different error message:

Event ID: 21502 The Virtual Machine Management Service failed to establish a connection for a Virtual machine migration with host XXXX. A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because connected host has failed to respond (0X8007274C)

image

This is the same message you’ll get for a known cause of shared nothing live migration failing as described in this blog post by Microsoft Shared Nothing Migration fails (0x8007274C).

So there you go. Keep an eye on those Jumbo Frame setting especially in a mixed switch environment. They all have their own capabilities, rules & peculiarities. Make sure to test end to end and you’ll be just fine.

I’m In Austin Texas For Dell World 2013

This is the night time sky line of where I’m at right now. Austin, Texas, USA. That famous “Lone Star State” that until now I only knew from the movies & the media. Austin is an impressive city in an impressive state and, as most US experiences I’ve had, isn’t comparable with anything in my home country Belgium. That works both ways naturally and I’m lucky I get to travel a bit and see a small part of the world.image

Dell World 2013

So why am I here?  Well I’m here to attend DELL World 2013, but you got that already Smile

image

That’s nice Didier but why DELL World? Well, several reasons. For one, I wanted to come and talk to as many product owners & managers, architects & strategists as I can. We’re seeing a lot of interest in new capabilities that Windows Server 2012 (R2) brought to the Microsoft ecosystem. I want to provide all the feedback I can on what I as a customer, a Microsoft MVP and technologist expect from DELL to help us make the most of those. I’m convinced DELL has everything we need but can use some guidance on what to add or enhance. It would be great to get our priorities and those of DELL aligned. Form them I expect to hear their plans, ideas, opinions and see how those match up. Dell has a cost/value leadership position when it comes to servers right now. They have a great line up of economy switches that pack a punch (PowerConnect) & some state of the art network gear with Force10. it would be nice to align these with guidance & capabilities to leverage SMB Direct and NVGRE network virtualization. Dell still has the chance to fill some gaps way better than others have. A decent Hyper-V network virtualization gateway that doesn’t cost your two first born children and can handle dozens to hundreds of virtual networks comes to mind. That and real life guidance on several SMB Direct with DCB configuration guidance. Storage wise, the MD series, Equalogic & Compellent arrays offer great value for money. But we need to address the needs & interest that SMB 3.0, Storage Spaces, RDMA has awoken and how Dell is planning to address those. I also think that OEMs need to pick up pace & change some of their priorities when it comes to providing answers to what their customers in the MSFT ecosystem ask for & need, doing that can put them in a very good position versus their competitors. But I have no illusions about my place in & impact on the universe.

Secondly, I was invited to come. As it turns out DELL has the same keen interest in talking to people who are in the trenches using their equipment to build solutions that address real life needs in a economical feasible way.  No, this is not “just” marketing. A smart vendor today communicates in many ways with existing & potential customers. Social media is a big part of that but also off line at conferences, events and both contributor and sponsor.  Feedback on how that works & is received is valuable as well for both parties. They learn what works &n doesn’t and we get the content we need. Now sure you’ll have the corporate social media types that are bound by legal & marketing constrictions but the real value lies in engaging with your customers & partners about their real technological challenges & needs.

Third is the fact that all these trends & capabilities in both the Microsoft ecosystem and in hardware world are not happening in isolation. They are happening in a world dominated by cloud computing in all it’s forms. This impact everything from the clients, servers, servers to the data centers as well as the people involved. It’s a world in which we need to balance the existing and future needs with a mixture of approaches & where no one size fits all even if the solutions come via commodity products & services. It’s a world where the hardware  & software giants are entering each others turf. That’s bound to cause some sparks Smile. Datacenter abstraction layer, Software Defined “anything” (storage, networking, …), converged infrastructure. Will they collaborate or fight?

So put these three together and here I am. My agenda is full of meetings, think tanks, panels, briefings and some down time to chat to colleagues & DELL employees alike.

Why & How?

Some time ago I was asked why I do this and why I’m even capable to do this. It takes time, money and effort.  Am I some kind of hot shot manager or visionary guru? No, not at all. Believe there’s nothing “hot” about working on a business down issue at zero dark thirty. I’m a technologist. I’m where the buck stops. I need things to work. So I deal in realities not fantasies. I don’t sell methods, processes or services people, I sell results, that’s what pays my bills long term. But I do dream and I try to turn those into realities. That’s different from just fantasy world where reality is an unwelcome guest. I’m no visionary, I’m no guru. I’m a hard working IT Pro (hence the blog title and twitter handle) who realizes all to well he’s standing on the shoulders of not just giants but of all those people who create the ecosystem in which I work. But there’s more. Being a mere technologist only gets you that far. I also give architectural & strategic advice as that’s also needed to make the correct decisions. Solutions don’t exist in isolation and need to be made in relation to trends, strategies and needs. That takes insight & vision. Those don’t come to you by only working in the data center, your desktop or in eternal meetings with the same people in the same place. My peers, employers and clients actively support this for the benefit of our business, customers, networks & communities. That’s the what, why and who that are giving me the opportunities to learn & grow both personally & professionally. People like Arlindo Alves and may others at MSFT, my fellow MVPs (Aidan Finn, Hans Vredevoort, Carsten Rachfahl, …), Florian Klaffenbach & Peter Tsai. As a person you have to grab those opportunities. If you want to be heard you need to communicate. People listen and if the discussions and subjects are interesting it becomes a two way conversation and a great learning experience. As with all networking and community endeavors you need to put in the effort to reap the rewards in the form of information, insights and knowledge you can leverage for both your own needs as well as for those in your network. That means speaking your mind. Being honest and open, even if at times you’re wrong. That’s learning. That, to me, is what being in the DELL TechCenter Rock StarDELL TechCenter Rock Star program is all about.

Learning, growing, sharing. That and a sustained effort in your own development slowly but surely makes you an “expert”. An expert that realizes all to well how much he doesn’t known & cannot possible all learn.  Luckily, to help deal with that fact, you can turn to the community.