The Dilbert® Life Series: Mental Hygiene Is Counterproductive

There are times when IT people need to vent. Usually they do that amongst their peers. Sometimes they disagree with each other and they express that. Why? Well, most of them are straight shooters, not politicians or diplomats. Now don't get me wrong. I do understand the benefits of politics & diplomacy and I most definitely see the need for them. They can achieve things more often than conflict or direct orders can, mainly because they make people think it was their own decision and/or choice. The drawback of politics is that it takes time, and in some situations, unfortunately, you don't have that luxury. Don't forget that IT pros sometimes work in rather stressful crisis situations. The bad part about politics is that it can also be perceived as "shady dealings", but this is actually not true. That's a negative connotation due to the often very poor quality of politicians. But I digress. I actually love diplomacy. It's the process that delivers me either the desired result or buys me enough time for my sniper to get the range. Either way, politics and diplomacy get the job done, provided you fulfill one prerequisite: having professional diplomats around. As you might have already guessed, that wouldn't exactly be me ;-). Politics, however, is not the same as "political correctness" run amok. Don't be afraid of people speaking their minds. Don't let the fear of others hearing some strong language or an unpopular issue being discussed guide you. That alone will not kill a reputation or wreck a well-oiled team.

Reputations have a major flaw. They take a lifetime to build and only a second to destroy. Are you telling me your approach to protecting a reputation is making sure no one ever hears a bad word out of the mouth of an employee who's ranting to blow off steam? Guess what? You're doomed to fail. Don't we need to protect people from being offended? Yes, but don't take it too far. Chances are the offence goes both ways. So don't restrict free speech & open communication too much. But perception is reality, right? Good lord, get a grip and grow a pair. People need to vent, express themselves and be allowed to do that in a not overly politically correct way amongst their peers. These people are in the trenches together; they deal with all the shit and stress. They shouldn't be worried or stressed about using the proper diplomatic approach to everything they say. Political correctness can be taken too far. It makes for a very hypocritical, bottled-up-with-frustration, unhealthy work environment. Amongst comrades you shouldn't have to worry about that. And for crying out loud, I really do hope that humanity's only hope for decent behavior isn't the fact that things are forbidden or regulated.

One shouldn't judge IT managers or team leaders by the fact that none of their team members ever curses or vents, let alone by some silly dress code. No, that T-shirt saying "You've read my T-shirt. That's enough social interaction for one day" will not ruin professional relationships. Acting on those things reminds me of micromanagers, meaning they focus on small issues for all kinds of reasons, none of which have anything to do with them being good managers. Do you want to know what your IT teams are worth? Look at the members. Do they stick up for each other? Are they not afraid to stand up and speak up about issues that are "threatening" one of them or their boss? Do they get the job done? No, I don't mean that they wear a tie, are in the office at 08:30 or never ever vent, I mean do they get the job done. Even at night, during those wee hours of the morning when needed, or even when it is simply more convenient for the business? That should tell you a lot. That's their PR without the glossy brochures.

Next to that, clamping down on how people express themselves has some other negatives associated with it:

  • First of all, you lose your eyes and ears. Trust me, your IT people are your boots on the ground. They see, hear & know a lot as they deal with the entire organization. No matter how many tests, technologies and reports you have at your disposal, your people are a very valuable source of information on what's going on in the company. IT as a bio-indicator, so to speak. From problems with vendors, storage issues and dysfunctional project managers to insane analysts and architects who've become a bit too enamored with the esoteric part of their job. In other words, if you want to know what's going on, let your IT staff speak their minds without fear. Create an environment where they can do that. Otherwise they'll shut up even when they'd better open their mouths.
  • You're flushing the morale of your troops down the drain. When people feel frustrated they need to vent, not be censored. Censoring them leads to unhappy employees, and instead of hearing "undesired" verbal statements about a situation you'll be hearing some very unsettling complaints about your stupid company. You might not like those either, but you'd better listen and learn from them instead of declaring that such talk "ist verboten".
  • Don't block the vents on a steam engine. They are there for a very important reason: they make sure the pressure doesn't build up too high, thereby preventing the engine from blowing up. Same thing here: speaking their minds relieves pressure and stress and prevents frustration. That's a good thing, as human beings under high pressure tend not to become diamonds, even if they are carbon-based life forms. Chances are they'll explode out of all proportion when it really shouldn't happen. A bit counterproductive, don't you think?

Now this doesn't mean you should stand for an all-out negative culture where all is piss and vinegar. Some venting is good; being a full-time complaining sourpuss is not. Lead by example. By all means avoid e-mailing vents and frustrations. Words are volatile and dissipate; e-mail is very persistent. Maintain professional courtesy whenever possible. While I think that respect needs to be earned, politeness and correctness can and should indeed be given. It goes a long way when dealing with people. And the beauty is that by allowing people to vent and speak their minds you help achieve this. All you have to do is maintain balance and not let the morale and the culture go south. So forget about dress codes, punch clocks and "mental hygiene" measures. They indicate another, much worse problem: management failure. Sure, you can blame the issues on that T-shirt or someone's venting. Perhaps you can even fool yourself into believing it. Perhaps it even helps you sleep at night. But it sure will not help you improve your business. For that you'll need to put the good managers, diplomats & politicians in the right place instead of trying to rely on never needing those particular skills.

Assigning Large Memory To Virtual Machine Fails: Event ID 3320 & 3050

We had a kind reminder recently that we shouldn't forget to complete all steps in a Hyper-V cluster node upgrade process. The proof of a plan lies in the execution :-). We needed to configure a virtual machine with a whopping 50GB of memory for an experiment. No sweat, we have plenty of memory in those new cluster nodes. But when we tried to do so it failed with a rather obscure error in System Center Virtual Machine Manager 2008 R2.

Error (12711)

VMM cannot complete the WMI operation on server hypervhost01.lab.test because of error: [MSCluster_Resource.Name="Virtual Machine MYSERVER"] The group or resource is not in the correct state to perform the requested operation.

(The group or resource is not in the correct state to perform the requested operation (0x139F))

Recommended Action

Resolve the issue and then try the operation again.


One option we considered was that SCVMM 2008 R2 didn't want to assign that much memory because one of the old hosts was still a member of the cluster and "only" had 48GB of RAM. But nothing that advanced was going on here. A look at the logs found the culprit pretty fast: lack of disk space.

We saw the following errors in the Microsoft-Windows-Hyper-V-Worker-Admin event log:

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          17/08/2011 10:30:36
Event ID:      3050
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      hypervhost01.lab.test
Description:
‘MYSERVER’ could not initialize memory: There is not enough space on the disk. (0x80070070). (Virtual machine ID DEDEFFD1-7A32-4654-835D-ACE32EEB60EE)

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          17/08/2011 10:30:36
Event ID:      3320
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      hypervhost01.lab.test
Description:
'MYSERVER' failed to create memory contents file 'C:\ClusterStorage\Volume1\MYSERVER\Virtual Machines\DEDEFFD1-7A32-4654-835D-ACE32EEB60EE\DEDEFFD1-7A32-4654-835D-ACE32EEB60EE.bin' of size 50003 MB. (Virtual machine ID DEDEFFD1-7A32-4654-835D-ACE32EEB60EE)

Sure enough, a smaller amount of memory, 40GB, less than the remaining disk space on the CSV, did work. That made me remember we still needed to expand the LUNs on the SAN to provide the storage space for the large BIN files associated with these kinds of large memory configurations. Can you say "luxury problems"? The BIN file contains the memory of a virtual machine or snapshot that is in a saved state. Now you need to know that the BIN file actually requires the same amount of disk space as the memory assigned to the virtual machine. That means it can require a lot of room. Under "normal" conditions these don't get this big, and we provide a reasonable buffer of free space on the LUNs anyway for performance reasons, growth etc. But this was a bit more than that buffer could absorb.
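To make that sizing rule concrete, here's a minimal back-of-the-envelope sketch (plain Python, with hypothetical numbers and a hypothetical safety buffer, not anything SCVMM does for you) of the check you'd want to do before assigning that much memory:

def bin_fits_on_csv(assigned_memory_gb, csv_free_gb, safety_buffer_gb=50):
    """Rough check: the saved-state BIN file needs about as much disk space
    as the memory assigned to the VM, plus whatever free-space buffer you
    keep on the CSV for performance and growth (buffer size is illustrative)."""
    required_gb = assigned_memory_gb + safety_buffer_gb
    return csv_free_gb >= required_gb, required_gb

fits, required = bin_fits_on_csv(assigned_memory_gb=50, csv_free_gb=45)
print(f"Need roughly {required} GB free on the CSV, fits: {fits}")
# -> Need roughly 100 GB free on the CSV, fits: False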

As the plan already stated that we needed to expand the LUNs a bit to be able to deal with these kinds of memory hogs, the storage to do so was available and the LUN wasn't maxed out yet. If not, we would have been in a bit of a pickle.

So there you go, a real-life example of what Aidan Finn warns about when using dynamic memory. Also see KB 2504962, "Dynamic Memory allocation in a Virtual Machine does not change although there is available memory on the host", which discusses the scenario where dynamic memory allocation seems not to work due to lack of disk space. Don't forget about your disk space requirements for the BIN files when using virtual machines with this much memory assigned. They tend to consume considerable chunks of your storage space. And even if you don't forget about it in your planning, please don't forget to execute every step of the plan ;-)

Introducing 10Gbps & Integrating It into Your Network Infrastructure (Part 4/4)

This is the 4th post in a series of 4. Here's a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

In my blog post "Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)", part of a series of thoughts on 10Gbps and Hyper-V networking, a discussion on NIC teaming brought up the subject of 10Gbps for virtual machine networks. This means our switches will probably no longer exist in isolation, unless those virtual machines don't ever need to talk to anything outside what's connected to those switches, which is very unlikely. That means we need to start thinking and talking about integrating the 10Gbps switches into our network infrastructure. So we're entering the network engineers' turf again and we'll need to address some of their concerns. But this is not bad news, as they'll help us prevent some bad scenarios.

Optimizing the use of your 10Gbps switches

Not everyone runs clusters big enough, or enough smaller clusters, to warrant an isolated network approach for just cluster networking. As a result you might want to put some of the remaining 10Gbps ports to work for virtual machine traffic. We've already pointed out that your virtual machines will not only want to talk amongst themselves (it's a cluster, and private/internal networks tend to defeat the purpose of a cluster; it just doesn't make any sense as then they are limited to a single node) but also need to talk to other servers on the network, both physical and virtual ones. So you have to hook up your 10Gbps switches from the previous example to the rest of the network. Now there are some scenarios where you can keep the virtual machine networks isolated as well within a cluster, in your POC lab for example, where you are running a small 100% virtualized test domain on a cluster in a separate management domain, but these are not the predominant use case.

But you don't only have to integrate with the rest of your network, you may very well want to! You've seen 10Gbps in action for CSV and Live Migration and you've got a taste for 10Gbps now; you're hooked and dream of moving each and every VM network to 10Gbps as well. And while you're at it, your management network and such as well. This is no different from the time you first got hold of 1Gbps networking kit in a 100Mbps world. Speed is addictive; once you're hooked you crave more :-)

How to achieve this? You could do it by replacing the existing 1Gbps switches. That takes money, no question about it. But think ahead: 10Gbps will be commonplace in a couple of years' time (read: prices will drop even more). Servers with 10Gbps LOM cards are here, or will be here very soon, from any major vendor. For Dell this means that the LOM NICs will be like mezzanine cards and you decide whether to plug in 10Gbps SFP+ or Ethernet jacks. When you opt to replace some current 1Gbps switches with 10Gbps ones you don't have to throw them away. What we did at one location is recuperate the 1Gbps switches for out-of-band remote access (ILO/DRAC cards), which in today's servers also run at 1Gbps speeds. Their older 100Mbps switches were taken out of service. No emotional attachment here. You could also use them to give some departments or branch offices 1Gbps to the desktop if they don't have that yet.

When you have ports left over on the now isolated 10Gbps switches and you don't have any additional hosts arriving in the near future requiring CSV & LM networking, you might as well use those free ports. If you still need extra ports you can always add more 10Gbps switches. But whatever the case, this means uplinking those cluster network 10Gbps switches to the rest of the network. We already mentioned in a previous post that the network people might have some concerns that need to be addressed, and rightly so.

Protect the Network against Loops & Storms

The last thing you want to do is bring down your entire production network with a loop and a resulting broadcast storm. You also don't want the otherwise rather useful Spanning Tree Protocol locking out part of your network and ruining your sweet cluster setup, or traffic specifically intended for your 10Gbps network being routed over a 1Gbps network instead.

So let us discuss some of the ways in which we can prevent all these bad things from happening. Now mind you, I'm far from an expert network engineer, so to all CCIE holders stumbling onto this blog post, please forgive me my prosaic network insights. Also keep in mind that this is not a networking or switch configuration course. That would lead us astray a bit too far, and it is very dependent on your exact network layout, needs, brand and model of switches etc.

As discussed in the blog post Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4), you need a LAG between your switches, as the traffic for the VLANs serving heartbeat, CSV, Live Migration and virtual machines, but now perhaps also the host management and optional backup networks, must flow between the switches. As long as you have only two switches that have a LAG between them or that are stacked, you don't have much risk of creating a loop on the network. Unless you uplink two ports directly with a network cable. Yes, that happens; I once witnessed a loop/broadcast storm caused by someone who was "tidying up" spare CAT5E cables by plugging all the loose ends into free switch ports. Don't ask. Lesson learned: disable every switch port not in use.

Now once you uplink those two or more 10Gbps switches to your other switches in a redundant way you have a loop. That's where the Spanning Tree Protocol comes in. Without going into detail, it prevents loops by blocking the redundant paths. If the operational path becomes unavailable, a new path is established to keep network traffic flowing. There are some variations on STP. One of them is the Rapid Spanning Tree Protocol (RSTP), which does the same job as STP but a lot faster. Think a couple of seconds to establish a path versus 30 seconds or so. That's a nice improvement over the early days. Another one that is very handy is the Multiple Spanning Tree Protocol (MSTP). The sweet thing about the latter is that you get blocking per VLAN, and in the case of Hyper-V or storage networks this can come in quite handy.

Think about it. Apart from preventing loops, which are very, very bad, you also want to make sure that the network traffic doesn't travel along unnecessarily long paths or over links that are not suited to its needs. Imagine the Live Migration traffic between two nodes on different 10Gbps switches travelling over the 1Gbps uplinks to the 1Gbps switches because STP blocked the 10Gbps LAG to prevent a loop. You might be saturating the 1Gbps infrastructure and that's not good.
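To give a feel for why path costs matter here, below is a deliberately simplified sketch (spanning tree actually computes costs toward the root bridge, not between arbitrary switch pairs) using the IEEE 802.1D-2004 default "long" path cost values; the exact values, and how your switches let you override them per port or per LAG, are vendor and configuration dependent.

# Default RSTP/MSTP "long" path costs from IEEE 802.1D-2004 (per link);
# older switches may use the short values or let you override them.
PATH_COST = {"1G": 20000, "10G": 2000}

# Direct path: the LAG between the two 10Gbps switches (one 10Gbps hop).
direct_10g_lag = PATH_COST["10G"]

# Detour: up to a 1Gbps core switch and back down again (two 1Gbps hops).
detour_via_1g = 2 * PATH_COST["1G"]

print(direct_10g_lag, detour_via_1g)   # 2000 versus 40000
# With default costs spanning tree prefers the 10Gbps LAG and blocks the
# detour; if you tune the costs yourself, keep that relationship intact.
print(direct_10g_lag < detour_via_1g)  # True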

I said MSTP could be very handy, so let's address this. You only need the uplink to the rest of the network for the host management and virtual machine traffic. Yet the heartbeat, CSV and Live Migration traffic also stops flowing when the LAG between the two 10Gbps switches is blocked by RSTP. This is because RSTP works at the LAG level for all VLANs travelling across that LAG and doesn't discriminate between VLANs. MSTP is smarter and only blocks the required VLANs, in this case the host management and virtual machine VLANs, as these are the only ones travelling across the link to the rest of the network.
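To make the difference tangible, here's a toy model (not a protocol implementation; the VLAN names and the mapping to MSTP instances are purely illustrative) of which VLANs lose the inter-switch LAG when spanning tree blocks it:

# Illustrative mapping of VLANs to MSTP instances (MSTIs); pick your own.
vlan_to_instance = {
    "host management": "MSTI-1",
    "virtual machines": "MSTI-1",
    "heartbeat": "MSTI-2",
    "CSV": "MSTI-2",
    "Live Migration": "MSTI-2",
}

def vlans_cut_off(protocol, blocked_instances=("MSTI-1",)):
    """Which VLANs lose the 10Gbps inter-switch LAG when it gets blocked?"""
    if protocol == "RSTP":
        # RSTP blocks the whole LAG, so every VLAN riding it is cut off.
        return sorted(vlan_to_instance)
    # MSTP blocks per instance: only VLANs in a blocked instance are cut off.
    return sorted(v for v, i in vlan_to_instance.items() if i in blocked_instances)

print(vlans_cut_off("RSTP"))  # everything, including CSV & Live Migration
print(vlans_cut_off("MSTP"))  # only host management & virtual machines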

We'll illustrate this with some pictures based on our previous scenarios. In this example we have the cluster networks going to the 10Gbps switches over non-teamed NICs. The virtual machine traffic goes to those same switches but over teamed NICs, as does the host management traffic. Let's first show the normal situation.

[Diagram: the normal situation]

Now look at a situation where RSTP blocks the purple LAG. Please do note that if the other network switches are not 10Gbps, the traffic for the virtual machines would be travelling over more hops and at 1Gbps. This should be avoided, but if it does happen, MSTP would prevent an even worse scenario. Now, if you were to define the VLANs for cluster network traffic on those (orange) uplink LAGs, you could use RSTP with a high cost, but in the event that RSTP blocks the purple LAG you'd be sending all heartbeat, CSV and Live Migration traffic over those main switches. That could saturate them. It's your choice.

[Diagram: RSTP blocks the purple LAG between the two 10Gbps switches]

In the picture below MSTP saves the day, providing loop-free network connectivity even if spanning tree for some reason needs to block the LAG between the two 10Gbps switches. MSTP would save your cluster network traffic connectivity, as those VLANs are not defined on the orange LAG uplinks and MSTP prevents loops by blocking VLAN IDs on LAGs, not by blocking entire LAGs.

[Diagram: MSTP keeps the cluster VLANs flowing over the 10Gbps LAG]

To conclude, I'll also mention a more "star-like" approach to uplinking switches. This has a couple of benefits, especially when you use stackable switches to uplink to. They provide the best bandwidth available for upstream connections and they provide good redundancy, because you can uplink the LAG to separate switches in the stack. There is no possibility of a loop this way and you get great performance on top. What's not to like?

[Diagram: star-like uplink approach using stackable switches]

Well, we've shown that each network setup has optimal, preferred network traffic paths. We can enforce these by proper LAG & STP configuration. Other, less optimal, paths can become active to provide resiliency for our network. Such a situation must be addressed as soon as possible and should be considered running on "emergency backup". You can avoid such events, except in the most extreme situations, by configuring the RSTP/MSTP costs for the LAGs correctly and by using multiple inter-switch links in every LAG. This not only provides extra bandwidth but also protects against cable or port failure.

Conclusion

And there you have it: over a couple of blog posts I've taken you on a journey through considerations about not only using 10Gbps in your Hyper-V cluster environments, but also about cluster networks as a whole. Some notes from the field, so to speak. As I told you, this was not a deployment or best practices guide. The major aim was to think out loud and share thoughts and ideas. There are many ways to get the job done and it all depends on your needs and existing environment. If you don't have a network engineer on hand and you can't do this yourself, you might be ready by now to get one of those business-ready configurations for your Hyper-V clustering. Things can get pretty complex quite fast. And we haven't even touched on storage design, management etc. The purpose of these blog posts was to think about how Hyper-V clustering networks function and behave and to investigate what is possible. When you're new to all this but need to make the jump into virtualization with both feet (and you really do), a lot of help is available. Most hardware vendors have fast tracks and reference architectures that list the components to order to build a Hyper-V cluster, and more often than not they or a partner will come and set it all up for you. This reduces both risk and time to production. I hope that if you don't have a greenfield scenario but want to start taking advantage of 10Gbps networking, this has given you some food for thought.

I'll try to share some real-life experiences, and what improvements we actually see, with 10Gbps speeds in a future blog post.

Experts2Experts Virtualization Conference London 2011 Selling Out Fast!

It seems a lot of Hyper-V expertise is converging on London in November 2011 for the small but brilliant Experts2Experts Virtualization Conference. I'm looking forward to learning a lot from them and listening to real-world experiences of people who deal with the technologies on a daily basis. It will also be nice to meet up with a lot of online acquaintances from the blogosphere and Twitter. The conference is selling out fast. That's due to the quality, small scale and very economical attendance fee. So if you want to meet up with and listen to the expertise the likes of Aidan Finn, Jeff Wouters, Carsten Rachfal, Ronnie Isherwood and hopefully Kristian Nese have to share, you'd better hurry up and register right now.

I'll be sharing some musings on "High Performance & High Availability Networks for Hyper-V Clusters".

Perhaps we’ll meet.