DELL PowerConnect 8024F Is Now Stackable

A colleague pointed me to the latest firmware update (4.2.0.4) for the DELL PowerConnect 8024F switches. As I was reading the release notes, one item in particular caught my attention: the PowerConnect 8024/8024F/M8024-k switches are now stackable. You can put up to 6 switches in one stack using the regular front ports (SFP+). You might remember from a previous blog post on 10Gbps, Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4), where I mentioned that we got a great deal on those switches. I also mentioned that the only thing these switches lacked, and what would make them the best 10Gbps switch in terms of value for money, was the ability to stack them. I quote myself:

“They could make that 8024F an unbeatable price/quality deal if they would make them stackable.”

I’ve been called visionary before, but I won’t go into that insider joke right now. Now, it certainly wasn’t just my little blog post that made this update happen, but it is a nice New Year’s gift. More features & options with hardware you already own is always nice. I guess a lot of people, both customers & DELL themselves, made the same observation. You could just “smell” from the available commands & configuration options that this switch could be made stackable, and they did it.

Is Ethernet based stacking perfect? No (there is very little perfection in this world). The biggest drawback, if you need that feature, is the fact that you cannot hot plug the stacking links. But for all other practical purposes it’s a nice deal. Why? Well, now that these switches support Ethernet based stacking, you will be able to choose from more types of NIC Teaming for your servers: the teaming configurations that depend on stacking, such as active-active NIC Teaming across two switches, to be more precise. I find this pretty good news.

You all know I’m very enthusiastic about using the NIC Teaming built into Windows 8, and I will use it where and when I can. But for many years to come there will be a lot of Windows 2008 R2 systems to support and install. So it’s always good to see your hardware vendors improving their gear to give you more options. For the pricing I got on the 8024F in the last project, and given the needs of that solution, we could live with not being able to stack. Stacking via Ethernet using other switches was way more expensive, not to mention switches that use stacking modules (real premium pricing). So we got the best deal for our needs.

For 10Gbps switches, stacking over Ethernet gives you up to 80Gbps with a maximum of 8 uplinks, so bandwidth is not as much of a concern. With 1Gbps switches it is, which I think makes stacking modules the only way to go there if you need massive bandwidth, and you probably do. The drawback, as with all forms of inter-switch links (a LAG for example), is that this method means you’re losing ports for other purposes. But you need to look at your needs and do the math. I think buying with investment protection in mind is good, but don’t always buy in preparation for the time you’ll become a Fortune 500 company. That takes a while, and in the meantime you’ll be very well served anyway.
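Doing that math is simple arithmetic. Below is a minimal sketch, assuming a switch with 24 front SFP+ ports (my assumption for the 8024F) and 10Gbps per stacking link; the function name and constants are hypothetical illustrations, not anything from Dell’s configuration or CLI. It shows how each extra stacking link adds bandwidth but takes a port away from your servers.

```python
# Back-of-the-envelope stacking math, assuming a 24-port 10Gbps switch.
PORTS_PER_SWITCH = 24   # assumption: number of front SFP+ ports on the switch
LINK_SPEED_GBPS = 10    # each stacking link is a regular 10Gbps front port
MAX_STACK_LINKS = 8     # maximum number of stacking uplinks mentioned above


def stacking_tradeoff(stack_links: int) -> tuple[int, int]:
    """Return (stacking bandwidth in Gbps, ports left for other purposes)."""
    if not 1 <= stack_links <= MAX_STACK_LINKS:
        raise ValueError(f"stack_links must be between 1 and {MAX_STACK_LINKS}")
    bandwidth = stack_links * LINK_SPEED_GBPS
    free_ports = PORTS_PER_SWITCH - stack_links
    return bandwidth, free_ports


for links in (1, 2, 4, 8):
    bw, free = stacking_tradeoff(links)
    print(f"{links} stacking link(s): {bw} Gbps, {free} ports left")
```

At the maximum of 8 links you get the 80Gbps quoted above but keep only two thirds of the ports, which is exactly the trade-off to weigh against your needs.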

Another related new feature is Nonstop Forwarding (NSF). NSF allows the forwarding plane of stack units to continue forwarding packets even while the control and management planes restart. The cause could be a power failure, a hardware or software error, or even an upgrade. As far as I know this feature is common to all stackable switches, and it is needed. Not that I’m saying a redundant loop in the stack is bad or overkill, far from it, but that loop takes care of different scenarios than the ones NSF is designed to handle.

My Recommendations For Our IT Pro at TechDays 2012 Belgium

I’m very busy with storage at the moment, and as I’m already doing some other conferences this year, we’re giving some other members of our IT team the opportunity to attend TechDays 2012. That doesn’t stop me from giving some hints to the junior but very smart & fast-learning IT Pro in our delegation (4 developers & 1 IT Pro), and I might as well share those hints with you. The idea of sending him to TechDays is to expose him to a variety of subjects related to current and future needs/trends in the industry and in our line of business. It’s not just focused on training. I want him to look at the bigger picture of all the parts in the puzzle. It’s about getting some context and tasting the possibilities. It’s also good to see some of our local IT talent (like Mike Resseler and Kurt Roggen) in action. Naturally I leave it up to him to make his own choices, and I hope he does. So here are, in chronological order, my recommendations:

February 14th

10:45-12:00
Monitoring and Operating a Private Cloud with System Center 2012

12:00-13:00 (If you want to lunch & learn)
Manage VM’s and Services across Private Clouds and Windows Azure with System …

13:00-14:15
Take the Spaghetti out of Windows Azure – an insight for IT Pro Techies Part 1 (John Craddock)

14:30-15:45
Take the Spaghetti out of Windows Azure – an insight for IT Pro Techies Part 2 (John Craddock)

16:15-17:30
System Center Virtual Machine Manager 2012, Fabric Management, creation and consumption of the cloud (Vijay Tewari)

17:45-19:00
Windows 8 Dynamic Access Control (John Craddock)

February 15th

09:00-10:15
Windows 8 Hyper-V: Availability (Bryon Surace)

10:45-12:00
Discover what’s new in Windows 8 Active Directory (Paul Loonen)

13:00-14:15
The Private Cloud, Principles, Patterns and Concepts (Tom Shinder)

14:30-15:45
Toolmaking for Administrators using Windows PowerShell (Jason Helmick)

16:15-17:30
Windows 8 Disk Dedupe (Mike Resseler)

17:45-18:45
What’s new in PowerShell V3! (Jason Helmick)

February 16th

09:00-10:15
Private Cloud Day Session 1- Building your Private Cloud Infrastructure (Kurt Roggen)

10:45-12:00
Private Cloud Day Session 2- Creating & Configure your Private Cloud (Kurt Roggen)

13:00-14:15
Private Cloud Day Session 3- Monitor & Operate your Private Cloud (Mike Resseler)

14:30-15:45
Private Cloud Day Session 4- Automating & Delivering Services in your Private Cloud (Mike Resseler & Kurt Roggen)

16:15-17:30
Private Cloud Day Session 5- A Solution for Private Cloud Security (Tom Shinder)

KB 2636573: Guest Crashes with Win2008R2 RTM/SP1 STOP 0xD1 in storvsc!StorChannelVmbusCallback During Live Migration

The BSOD

I helped hunt down this bug and tested the private fix. Some months ago, during the summer of 2011, I was putting some new Hyper-V clusters under stress tests. You know, letting them work very hard for a longer period of time to see if anything falls off or goes “boink”. It all looked pretty robust and, after some tweaking, also very fast. Just when you’re about to declare “we’re all set”, you see a BSOD on one of the guests being live migrated, happily announcing: “DRIVER_IRQL_NOT_LESS_OR_EQUAL STOP: 0x000000D1 (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000)”


Now that doesn’t make ME very happy. So I investigate to see if any more VMs are dropping dead during live migration, but we don’t see any. Known issues, like out-of-date versions of the integration tools, are not in play, nor are any other likely suspects.

We throw the MEMORY.DMP file into the debugger and come up with the following culprit:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

The driver probably at fault is storvsc.sys

Probably caused by : storvsc.sys ( storvsc!StorChannelVmbusCallback+2b8 )

Hmmmmmmm. We start searching the internet but don’t find much. We also throw it onto Twitter to see if the community comes up with something. Meanwhile we keep looking and find this little blog post by Microsoft support engineer Rob Scheepens:

http://blogs.technet.com/b/dip/archive/2011/10/21/win2008r2-rtm-stop-0xd1-in-storvsc-storchannelvmbuscallback-0x2b5.aspx

We pinged Rob and opened a case with MS support. That evening Hans Vredevoort (www.hyper-v.nu), who saw my tweet, mails me the details of a fellow MVP in the USA who has this same issue. We get in contact, and via both Microsoft & the Hyper-V community we start hunting down the cause of this bug. The progress on this issue can be read at the Microsoft blog above. You’ll notice that the fix is in the works now.

Hunting down the STOP error

What we established:

  • It only happens occasionally during a live migration and rather ad hoc: not every time, not after X live migrations, not after X amount of uptime.
  • It sometimes seems to happen only with guests running dynamically expanding VHDs attached to SCSI controllers in Hyper-V. But that’s not really the case, as I remember one occurrence with a fixed VHD attached to a SCSI controller. In our case, the VMs we could reproduce the issue with in a reasonable time were all SQL Server test and development guests running SQL Server 2008, where the dynamically expanding disks are used as “poor man’s thin provisioning”.
  • I have not heard of this on Windows 2008 hosts, only R2, but I have not tested this.

So it’s reproducible, but it takes intensive live migration activity. Meanwhile we received private instrumentation to install on both guests & hosts to collect “enriched” memory dumps when a guest experiences a BSOD. With PowerShell we have continuous live migrations running to reproduce the issue. The fact that we can live migrate over 10Gbps does help. You can get lucky, but in reality it takes many hundreds of live migrations to reproduce it, and on some machines many thousands. Not a joke: in total we did about 12000 live migrations to reproduce the issue on several VMs with different configurations and send memory dumps to MSFT, and 8000 live migrations to test the fix. So yes, you really need some PowerShell, and having a 10Gbps Live Migration network also helps.
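The real runs were driven by PowerShell; what follows is only a minimal Python sketch of the same idea, not that script. It ping-pongs a VM between two hosts until the guest fails or an iteration cap is reached. The `migrate` and `guest_is_healthy` callables, as well as the VM and host names, are hypothetical stand-ins for whatever actually performs the live migration and detects a crashed guest; nothing here is a real Hyper-V API.

```python
from typing import Callable, Optional


def stress_live_migration(
    vm: str,
    hosts: tuple[str, str],
    migrate: Callable[[str, str], None],      # hypothetical: live migrates `vm` to a host
    guest_is_healthy: Callable[[str], bool],  # hypothetical: e.g. checks the guest for a BSOD
    max_iterations: int = 1000,
) -> Optional[int]:
    """Ping-pong a VM between two hosts until the guest fails or the cap is hit.

    Returns the 1-based iteration at which the guest was found unhealthy,
    or None if every migration completed without a failure.
    """
    for i in range(1, max_iterations + 1):
        target = hosts[i % 2]  # alternate between the two hosts
        migrate(vm, target)
        if not guest_is_healthy(vm):
            return i
    return None


# Simulated run with fake callables: the guest "crashes" on the 437th migration.
migrations_done = 0


def fake_migrate(vm: str, host: str) -> None:
    global migrations_done
    migrations_done += 1


def fake_health_check(vm: str) -> bool:
    return migrations_done < 437


print(stress_live_migration("sql-test-01", ("HV01", "HV02"), fake_migrate, fake_health_check))  # prints 437
```

Passing the migration and health check in as callables keeps the loop itself testable without a cluster; the iteration count it returns is what tells you how rare the failure really is.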

All the collected MEMORY.DMP files from these live migration exercises were uploaded to Microsoft for analysis. That took a while, also because they had a boatload of live migrations to do, and I don’t know if their test lab has 10Gbps.

On Tuesday the 25th of October, Microsoft contacts us with good news. They have root-caused the problem and a hotfix is in the works. You can download it here: http://support.microsoft.com/kb/2636573

On Thursday the 27th of October we get access to a private fix, and after installing it we’ve been running thousands of live migrations without seeing the issue.
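How much confidence do thousands of clean live migrations actually buy you? Here is a rough sketch, assuming each migration is an independent trial with a fixed crash probability p. That is a simplification, since the real bug was timing dependent. If n migrations show zero failures, the one-sided 95% upper bound on p solves (1 - p)^n = 0.05, which is approximately 3/n (the so-called rule of three). The numbers below use the 8000 post-fix migrations mentioned earlier.

```python
def upper_bound_failure_rate(n_trials: int, confidence: float = 0.95) -> float:
    """Exact upper bound on p after n_trials independent trials with zero failures.

    Solves (1 - p) ** n_trials == 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


def rule_of_three(n_trials: int) -> float:
    """Common approximation of the 95% bound: 3 / n (because -ln(0.05) is about 3)."""
    return 3.0 / n_trials


n = 8000  # live migrations run to test the fix
print(f"exact 95% bound: {upper_bound_failure_rate(n):.6f}")
print(f"rule of three:   {rule_of_three(n):.6f}")
```

Under these assumptions, zero failures in 8000 runs bounds the per-migration crash rate at roughly 1 in 2,700 with 95% confidence. A few hundred clean runs would have proven very little for a bug that needed hundreds or thousands of migrations to show up.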

The public release of this hotfix is currently planned (HTP11-12) under KB2636573.

The details for the curious

Root Cause?

The root cause can be summarized as follows: “StorVSP was modifying guest memory while the VM’s virtual motherboard was being powered off.” Because of this, storvsc accesses a NULL pointer in a memory buffer that has already been freed. The result is a BSOD or STOP error in the virtual machine.

Only SCSI attached VHDs

OK, but why do we only see this with SCSI-attached VHDs? The issue happens during the power-down of the virtual machine’s motherboard, because there is a disk enumeration during the shutdown phase. And this enumeration only happens with SCSI disks.

Right! So the more VHDs we had attached to SCSI controllers in a virtual machine, the higher the likelihood of this happening.
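A toy model shows why the disk count matters. Assume, hypothetically, that each SCSI disk’s enumeration during power-down independently hits the race window with a small probability p. The chance that at least one disk hits it on a given migration is then 1 - (1 - p)^n, which grows almost linearly with n as long as n·p is small. The per-disk value below is purely illustrative, not a measured figure.

```python
def crash_probability(p_per_disk: float, scsi_disks: int) -> float:
    """Probability that at least one of `scsi_disks` enumerations hits the race.

    Toy model: each disk independently hits the timing window with p_per_disk.
    Disks that are not SCSI-attached are not enumerated, so they do not count.
    """
    if not 0.0 <= p_per_disk <= 1.0:
        raise ValueError("p_per_disk must be between 0 and 1")
    return 1.0 - (1.0 - p_per_disk) ** scsi_disks


P_ILLUSTRATIVE = 0.0005  # hypothetical per-disk probability, not a measurement

for disks in (0, 1, 2, 4, 8):
    p = crash_probability(P_ILLUSTRATIVE, disks)
    print(f"{disks} SCSI disks: {p:.4%} chance per live migration")
```

In this model a VM with no SCSI-attached disks never hits the problem, and anything that widens the timing window, such as the expansion-induced write delays discussed next, simply shows up as a larger p per disk.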

Why so much more likely with dynamically expanding VHDs

But we still saw this far more often with dynamically expanding disks. Why is that? It’s not that dynamically expanding disks trigger disk enumeration more than fixed disks do. However, it seems that any disk expansion, which causes write delays, can lead to a timing issue that makes the disk enumeration hit the issue described above. This significantly increases the risk of the STOP error, and it explains why the chance of this happening with fixed VHDs attached to SCSI controllers is significantly lower. This is in sync with what we saw: the virtual machines with a lot of dynamic disks attached to SCSI controllers and a lot of activity (and thus potential for expansion) were the ones where we could reproduce this the fastest.

Conclusion

It can take some time to hunt down certain bugs, especially the rare ones that only happen every now and then, where occurrences are few and far between. But when you put in some effort, Microsoft helps out and works on a fix. And no, you don’t need the most expensive support contract for that to happen. As a matter of fact, this call was logged as a free support call under the TechNet Plus subscription. And as it was a bug, they returned the call to us as unused.

Full Steam Ahead With Windows 8 & Hyper-V in 2012

Some History

There have been a good number of people who’ve always used a bit of Microsoft bashing, some a lot more and some a lot less, to gain some extra credibility or to position other products as superior. Sometimes this addressed at least some real challenges and issues with Microsoft products. A lot of the time it didn’t. I have always found this ridiculous. In the early years of this century I was told to get out of the Microsoft stack and into the LAMP stack to make sure I still had a job in a few years’ time. My reaction was to buy Inside SQL Server 2000, among other technology books. The paradox is that in some cases, like some storage integrators, the ones doing the bashing forget that their customers are often heavily invested in the Microsoft stack.

I Still Have A Job

As you might have realized already, I still have a job today. I’m very busy building more and better environments based on Microsoft technologies. Microsoft does not get everything right. Who does? Sometimes it takes more than a few tries, sometimes they fail. But they also succeed in a lot of their endeavors. They are capable of learning, adapting and providing outstanding results, with a very good support system to boot (I would dare say that you get out of it what you put into it). Given the size and nature of the company, combined with IT evolving at the speed of light, that’s not an easy task.

Today that ability translates into the upcoming release of Windows 8. Things like Hyper-V 3.0, the new storage and networking features, and the improvements to clustering and the file system are the current state of an evolution: a path from Windows 2000 over Windows 2003 (R2) to the milestone Windows 2008, which was improved with Windows 2008 R2. Now Windows 8, being the next generation, improves vastly on that very good and solid foundation. With Windows 8 we’ll take the next step forward in building highly scalable, highly available, feature-rich and very functional solutions in a very cost-effective manner. On top of that we can do more now than ever before, with less complexity and with affordable standard hardware. If you have a bigger budget, great: Windows 8 will deliver even more and better bang for the buck if and when your hardware vendors get on the bandwagon.

Windows 8 & Storage

One of the things the Windows BUILD Conference achieved is that it made me want to buy hardware I couldn’t get yet. Just try asking DELL or HP for RDMA support on 10Gbps and you get a bit of a blank stare.

Another thing is that it made me look at our storage roadmap again. Storage is one of the few sectors in IT that are still very expensive. Some of the storage vendors might start to feel a bit like a major network gear vendor, you know, the one that has also seen the effects of serious competition from high quality but lower cost kit. Just think about what Storage Pools/Spaces will do for affordable, easy to use and rich storage solutions. There is value both with standard off-the-shelf (read affordable) hardware and with modern SANs that leverage the Windows 8 features. Heed my warning, storage vendors. You’re struggling in the SMB market due to complexity, cost, way too much overhead and expensive services. Well, it’s only going to get worse. You’ll have to come up with better proposals or you’ll end up being high end / niche market players in the future. Let’s face it, if I can buy a Supermicro chassis with the disks of my choosing, I can build my own storage solution on the cheap and use Windows 8 to meet my storage needs. Perhaps it’s 80/20, but hey, that’s great. It’s not that much better with more expensive solutions (vendor disks are ridiculously overpriced), and the support process is sometimes a drain on your workforce’s time and motivation. And yes, you paid for that. Compare this with being able to buy some spare parts on the cheap and having it all available off the shelf from the vendors. No more calls, no more bureaucratic mess for return parts, no more IT illiterate operators to work through before you reach support that can be substandard as well. Once you reach a certain level of hardware quality there is not that much difference any more, except for price and service. Granted, some vendors are better at this than others. The really big ones often struggle to get this right.

I’ve been in this business long enough to know that all stuff breaks. SLAs are fine for lawyers and for management. CYA is part of doing business. But as an IT Pro in the field you need reliable people, gear and services. On top of that you have to design for failure. You know things will break, so fixing them should be as cheap, easy and fast as possible, while your design and architecture should cope with the effects of a failure. That’s what IT Pros need and that’s what keeps things running (not that SLA paper in your manager’s mailbox).

Show the Windows customers a bit more love than you have done in the past. Some in the storage industry like to look down on the Windows OS. But guess what, it is your largest customer base. Unless you want to end up in the same niche as a very expensive personal trainer for Hollywood stars (tip: there’s not a huge job market there), you’d better adjust to new realities. A lot of vendors are doing that already, some of them aren’t. To those: get over it and leverage the features in Windows 8. You’ll be able to sell to a more varied public, and at the high end you’ll have even better solutions to offer. Today I notice way too many storage integrators who haven’t even looked at Windows 8. It’s about time they started … really, like today. I mean, how do you want to sell me storage today if you can’t answer my questions on Windows 8 & System Center 2012 support and integration? To me this is huge! I want to know about ODX, RDMA, SMI-S, and yes, I want you to be able to tell me how your storage deals with CSVs. You should know about the consumption of SCSI-3 persistent reservations and have a rock solid hardware VSS provider. If you can do that, it creates the warm fuzzy feeling customers need to make that leap of faith.

When I look at the network improvements in Windows 8, things like RDMA, SMB 2.2, File Transfer Offload and what that means for file sharing and data intensive environments, I’m pretty impressed. Then there is Hyper-V 3.0 and its many improvements. Only a fool would deny that it is a very good, affordable & rich hypervisor with a bright future as far as hypervisors go (they are not the goal, just a means to an end). Live Storage Migration, an extensible virtual switch, monitoring of the virtual switch, Network Virtualization, Hyper-V Replica, … it’s just too much to mention here. But hop on over to Windows 8 Hyper-V Feature Glossary by Aidan Finn; he’s got a nice list of the new features relevant to the Hyper-V crowd. Again, we see improvements for all business sizes, from SMB to enterprise, including ISPs and Cloud providers. Windows 8 is breaking down barriers that used to prevent its use in various environments and scenarios. Objections based on missing features, scalability, performance or security in multi-tenant environments are being wiped off the map. If you want to see some musing on this subject, just look at Group Video Interview: What is your favorite Hyper-V feature in Windows 8?

2012 & Beyond

Hyper-V is growing. It has already won the hearts and minds of many smaller Microsoft shops, but it’s also growing in the enterprise. The hybrid world is here when you look at the numbers, even if it’s not yet the case in your neck of the woods. Why? Cost versus features. Good enough is good enough, especially when that good is rather great. On top of that the integration is top notch, it won’t cost you a fortune, and it will save you a lot of plumbing hassle.

Basically everyone can benefit from all this. You’ll get more, and better, at a lower or at least a more affordable cost. Even if you don’t use any Microsoft technologies, you’ll benefit from the increased competition. So everyone can be happy.