SMB Direct with DCB, PFC, ETS … How do I know it works?!

A question that comes up again and again is how you know SMB Direct is working. It stems from a nagging feeling that configuring DCB is a bit like playing wizard’s apprentice and that we might not completely know what we’re doing, i.e. a lack of experience.

Many have suspected me of brewing up DCB configurations in a dark corner of the data center where no one else dares venture. Those are unsubstantiated rumors. In coming blog posts we’ll address how to configure it end to end, and we’ll show how to find out if it’s really working and how to test that.

Finding out if it really works, testing and monitoring isn’t magic. It boils down to using tools you know. Performance counters for RDMA Activity and SMB Direct are natively available in Windows. Use them! The NIC vendors also provide very detailed counters; those are excellent and of great value when testing and confirming things work as they should. The latter is very important, because after people are satisfied SMB Direct works they want to know if DCB is configured correctly. Does PFC work, are pause frames being sent and received? Is it really lossless? Does ETS really kick in when needed, do I get the minimum bandwidth I configured? These are very valid questions people struggle with. But the answer eludes many, almost like the question of whether the refrigerator light really goes out when you close the door.
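To make "use them" a bit more tangible, here is a minimal Python sketch that builds a `typeperf` command line for the two built-in counter sets and checks typeperf-style CSV output for counters that moved off zero. The wildcard counter paths, the helper names and the sample data are my assumptions for illustration; check what your Windows version and NIC actually expose.

```python
import csv
import io

# Counter sets named in the text; the wildcard paths are an assumption and
# may need adjusting to what your Windows version and NIC expose.
COUNTER_PATHS = [r"\RDMA Activity(*)\*", r"\SMB Direct Connection(*)\*"]


def typeperf_command(samples=5, interval_s=1):
    """Build a typeperf command line that samples the counter sets above."""
    return ["typeperf", *COUNTER_PATHS, "-si", str(interval_s), "-sc", str(samples)]


def active_counters(csv_text):
    """Return the counter paths that had at least one non-zero sample.

    Expects typeperf-style CSV: a header row with a timestamp column followed
    by counter paths, then one row per sample.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, samples = rows[0], rows[1:]
    active = []
    for col, name in enumerate(header[1:], start=1):
        for row in samples:
            try:
                if col < len(row) and float(row[col]) != 0.0:
                    active.append(name)
                    break
            except ValueError:
                continue  # skip blank or non-numeric samples
    return active


# Made-up sample output, only to show the shape of the data.
SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\RDMA Activity(NIC1)\RDMA Inbound Bytes/sec","\\HOST\RDMA Activity(NIC1)\RDMA Outbound Bytes/sec"
"04/20/2015 10:00:01.000","0.000000","0.000000"
"04/20/2015 10:00:02.000","1250000.500000","0.000000"
'''

if __name__ == "__main__":
    print(" ".join(typeperf_command()))
    print(active_counters(SAMPLE))
```

If the list stays empty while you are copying files over SMB, the traffic is most likely not going over RDMA.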

It’s hard to do deep down in the network packets … that often requires a very specialized skillset and experience with packet analyzers etc. Nothing most of you can’t learn, but often this is not a priority. But with some creativity, the performance counters on Windows provided by the NIC vendors and the statistics counters on the switches, you can demonstrate that both PFC & ETS do work and kick in.
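The counter-based reasoning for PFC can be sketched roughly as follows: take a snapshot of the pause frame and discard counters before and after you push the link into congestion, then compare. This is only a sketch in Python; the `PortSnapshot` fields are hypothetical names, not the counters of any particular NIC or switch vendor.

```python
from dataclasses import dataclass


@dataclass
class PortSnapshot:
    """Counters for one priority on one port; field names are hypothetical."""
    pause_frames_sent: int
    pause_frames_received: int
    discards: int


def pfc_verdict(before, after):
    """Compare two snapshots taken around a congestion test."""
    pauses = (after.pause_frames_sent - before.pause_frames_sent) + (
        after.pause_frames_received - before.pause_frames_received
    )
    drops = after.discards - before.discards
    if drops > 0:
        return "lossy: discards increased, check the PFC configuration"
    if pauses > 0:
        return "PFC active: pause frames seen, no discards"
    return "inconclusive: no congestion observed, push more traffic"


if __name__ == "__main__":
    before = PortSnapshot(pause_frames_sent=10, pause_frames_received=5, discards=0)
    after = PortSnapshot(pause_frames_sent=250, pause_frames_received=5, discards=0)
    print(pfc_verdict(before, after))
```

The point is that pause frames going up while discards stay flat is what you want to see for the SMB Direct priority.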

So in upcoming blogs & videos I’ll demonstrate configuring SMB Direct over RoCE, leveraging two parts of DCB:

  • PFC (Priority Flow Control) – mandatory for SMB Direct over RoCE
  • ETS (Enhanced Transmission Selection) – optional but I advise you to leverage it for SMB Direct over RoCE

Actually, when going truly converged, no matter what route you take, QoS is not really optional any more.
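Whether ETS gives you the minimum bandwidth you configured comes down to simple arithmetic once you have throughput numbers per traffic class from a saturated link. A rough Python sketch, where the class names, the tolerance and the numbers are made up for illustration:

```python
def ets_shares(throughput_gbps, minimum_pct, tolerance_pct=2.0):
    """Return {class: (measured share %, meets minimum?)} for a saturated link.

    Only meaningful while every class is actually competing for bandwidth;
    ETS minimums do not show up on an idle link.
    """
    total = sum(throughput_gbps.values())
    if total <= 0:
        raise ValueError("no traffic measured")
    result = {}
    for name, gbps in throughput_gbps.items():
        share = 100.0 * gbps / total
        ok = share + tolerance_pct >= minimum_pct.get(name, 0.0)
        result[name] = (round(share, 1), ok)
    return result


if __name__ == "__main__":
    measured = {"smb_direct": 6.0, "default": 4.0}   # Gbps, illustrative
    configured = {"smb_direct": 60, "default": 40}   # ETS minimum in percent
    print(ets_shares(measured, configured))
```

A class that falls well below its configured minimum while the link is saturated is a hint that ETS is not applied where you think it is.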

The biggest challenge is to get people to wrap their heads around the concepts and their behavior. Once you do that you’ll understand how and why to configure it. It took me time and effort, there’s no way around it, but it’s well worth the effort.

Look, DCB is not 100% mature or perfect, especially in large-scale environments spanning more than 2 or 3 hops. Frak, while I love tinkering, testing and playing with this stuff, I have never been a “QoS first” person. If I can, I throw resources at the problem (CPU cycles, memory, bandwidth, …). QoS is like a gun. You only draw it when you must use it, and then you’d better do it right; otherwise you don’t touch it, bar for practice, training and education. While perfection is not of this world and improvements are being worked on (ECN), it does work and deliver. How many of you had a large-scale deployment (> 2 hops, > 20 switches) with FC, FCoE or iSCSI to worry about? So can it deliver what you need today in most scenarios? Yes! Can I fix the shortcomings of any random technology? No. Can I leverage current technologies with great success despite this? Yes! So can you. There is a reason I get hired and paid. Trust me, it’s not my looks, my bedside manner or charismatic appearance ;-).

Side note 1: I cannot possibly provide a step-by-step switch configuration guide, as the details vary by vendor, can be specific to the switch model/type, and all depend on your environment & needs. So no, I cannot and will not attempt to write a bunch of these. This would be way too much work and way too expensive (time, hardware etc.), so unless I’m paid very generously to do so, you’re out of luck. It might be cheaper to hire me or to come to the free community sessions, presentations and ATE evenings and study up.

Free VEEAM Endpoint Backup Goes RTM – First Upgrade Experiences!

VEEAM Endpoint backup has gone RTM and that’s great news. I’ve been using it since the beta version with great results. I moved to the release candidate when that became available and now I’m running RTM. The version number of the RTM bits is 1.0.0.1954.

You can download it here and put it into action straight away!

Quick Tips & Findings

There is no supported upgrade path from the beta release. As a matter of fact, the RTM version cannot read the backup files made by the beta. When trying to upgrade from beta to RTM you’ll be greeted with this message:

image

Now that’s OK. You should have been on the RC already, and there things are better :-). Mind you, there’s no way to do an in-place upgrade from the RC either, but RTM can read the backups made by the RC version!

With a clean install (green field or after uninstalling the beta or RC version) the installation will kick off.

Now in the case of our RC backups we tested 2 things:

  • Can we restore the existing backups? Yes we can!

  • How are the backups made by the RTM version handled with regard to the ones already present? We just reconfigured the backups to the same repository and kicked off a backup. A new backup job folder was created and the backup was made there. So our DBA’s great self-service SQL Server backup offloading repository made with the RC is still available for restores, while RTM backs up to its own new folder.

Well there you go, VEEAM Endpoint Backup just got launched in production. We still have to wait for the production ready update for integration with VEEAM Backup & Replication v8 but that will arrive soon enough. The future looks bright.

Windows Server Technical Preview and Hyper-V Server Technical Preview Expiration Extension

Great news for all of us who are running Windows Server Technical Preview v1 in our labs. It was due to expire on April 15th, but Microsoft announced they were working on a fix to extend that deadline. They did not mention an ETA for it, but it’s here now, see http://www.microsoft.com/en-us/download/details.aspx?id=46447

So download, install, reboot and you’re good to go until we get our hands on Technical Preview v2! We’ve been saved by the cavalry and life is good!

Changing the segment size of a virtual disk in DELL PowerVault MD Storage Series

It happens to the best of us: sometimes we select the wrong option during deployment and/or configuration of our original virtual disks. Or, even with the best of planning, the realities and use cases of your storage change, so the original choice might no longer be the optimal one. Luckily, on a DELL PowerVault MD storage device you do not need to delete the virtual disk or disks and lose your data to reconfigure the segment size. Even better, you can do this online as a background process, which is a must as it can take a very long time and would cause prohibitively long downtime if you had to take the data offline for that amount of time.

You have some control over the speed at which this happens via the priority setting, but do realize that this takes a (very) long time. Because it’s a background process you can keep working. I have noticed little to no impact on performance, but your mileage may vary.

How long does it take? Hard to predict. This is a screenshot of two 50TB virtual disks where the segment size is being adjusted online…

image

You cannot always go to the desired segment size in one step. Sometimes you have only an intermediate size available. This is the case in the example below.

image

The trick is to first move to that segment size and then repeat the process to reach the size you require.  In this case, we’ll first move to 256 KB and then to 512 KB segment size. So this again takes a long time. But again, it all happens online.
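The stepping logic is trivial, but to make it explicit, here is a small Python sketch that lists the intermediate sizes. It assumes each change can only double or halve the current segment size and that the disk starts at 128 KB; neither is stated by the array itself here, so treat both as assumptions, and the function name is just for illustration.

```python
def segment_size_steps(current_kb, target_kb):
    """List the segment sizes to step through, assuming each change can only
    double or halve the current size (an assumption, not vendor documentation)."""
    for size in (current_kb, target_kb):
        if size <= 0 or size & (size - 1):
            raise ValueError("sizes are assumed to be powers of two in KB")
    steps = []
    size = current_kb
    while size != target_kb:
        size = size * 2 if size < target_kb else size // 2
        steps.append(size)
    return steps


if __name__ == "__main__":
    # Assumed starting point of 128 KB, going to the 512 KB from the example.
    print(segment_size_steps(128, 512))
```

Every entry in that list is a separate long-running background operation, so plan your time accordingly.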

In conclusion, it’s great to have this capability. When you need to change the size when there is already data on the PowerVault virtual disks you have the ability to do so online while the data remains available. That this can require multiple steps and take a long time is not a huge deal. You kick it off and let it run. No need to sit there and watch it.