SMB Direct over RoCE Demo – Hosts & Switches Configuration Example

As mentioned in Where SMB Direct, RoCE, RDMA & DCB fit into the stack, this post’s only function is to give you an overview of the configurations used in the demo blogs/videos. First we’ll configure one Windows Server 2012 R2 host. This needs to be done on ALL hosts involved. The NICs we’re configuring are the two RDMA-capable 10GbE NICs we’ll use for CSV traffic, live migration and our simulated backup traffic. These are Mellanox ConnectX-3 RoCE cards that we hook up to a DCB-capable switch. The commands needed are below, with the explanation in the comments. Do note that the two policies, the priorities and the minimum bandwidths were chosen for this demo. What you need will depend on your environment.

#Install DCB on the hosts
Install-WindowsFeature Data-Center-Bridging
#Mellanox/Windows RoCE drivers don't support DCBx (yet?), disable it.
Set-NetQosDcbxSetting -Willing $False
#Make sure RDMA is enabled on the NICs (should be by default)
Enable-NetAdapterRdma -Name RDMA-NIC1
Enable-NetAdapterRdma -Name RDMA-NIC2
#Start with a clean slate
Remove-NetQosTrafficClass -confirm:$False
Remove-NetQosPolicy -confirm:$False

#Tag each RDMA NIC with the VLAN chosen for its PFC network
Set-NetAdapterAdvancedProperty -Name "RDMA-NIC1" -RegistryKeyword "VlanID" -RegistryValue 110
Set-NetAdapterAdvancedProperty -Name "RDMA-NIC2" -RegistryKeyword "VlanID" -RegistryValue 120

#SMB Direct traffic to port 445 is tagged with priority 4
New-NetQosPolicy "SMBDIRECT" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 4
#Anything else goes into the "default" bucket with priority tag 1 🙂
New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 1

#Enable PFC (lossless) on the priority of the SMB Direct traffic.
Enable-NetQosFlowControl -Priority 4
#Disable PFC on the other traffic (TCP/IP, we don't need that to be lossless)
Disable-NetQosFlowControl 0,1,2,3,5,6,7

#Enable QoS on the RDMA interface
Enable-NetAdapterQos -InterfaceAlias "RDMA-NIC1"
Enable-NetAdapterQos -InterfaceAlias "RDMA-NIC2"

#Set the minimum bandwidth for SMB Direct traffic to 90% (ETS, optional)
#No need to do this for the other priorities, as all those not configured
#explicitly go into the default class with the remaining bandwidth.
New-NetQoSTrafficClass "SMBDirect" -Priority 4 -Bandwidth 90 -Algorithm ETS
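
Before moving on to the switches it can help to read the settings back on each host. The following is a minimal, read-only sketch using the standard NetQos/NetAdapter cmdlets; it assumes the adapter names used in this demo (RDMA-NIC1 and RDMA-NIC2) and an elevated PowerShell session. The comments describe what you would expect to see given the configuration above, not captured output.

```powershell
# Read-only checks; nothing here changes the configuration.
# Adapter names are the ones used in this demo; adjust to your environment.
$nics = 'RDMA-NIC1', 'RDMA-NIC2'

# DCBX "Willing" should be False
Get-NetQosDcbxSetting

# RDMA should be enabled on both NICs
Get-NetAdapterRdma -Name $nics

# Expect the SMBDIRECT (priority 4) and DEFAULT (priority 1) policies
Get-NetQosPolicy

# PFC should be enabled for priority 4 only
Get-NetQosFlowControl

# Expect the SMBDirect traffic class: priority 4, 90%, ETS
Get-NetQosTrafficClass

# DCB/QoS state as seen by each adapter
Get-NetAdapterQos -Name $nics
```

If any of these differ between hosts, fix that first; a mismatch between hosts is much harder to spot once traffic is flowing.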

We also show you, in general terms, how to set up the switch. Don’t sweat the exact syntax and way of getting it done. It differs between switch vendors and models (we used DELL Force10 S4810 and PowerConnect 8100 / N4000 series switches): it’s all very alike and yet very specific. The important thing is that you see how what you do on the switches maps to what you did on the hosts.

!Disable 802.3x flow control (global pause)- doesn't mix with DCB/PFC
workinghardinit#configure
workinghardinit(conf)#interface range tengigabitethernet 0/0 - 47
workinghardinit(conf-if-range-te-0/0-47)#no flowcontrol rx on tx on
workinghardinit(conf-if-range-te-0/0-47)# exit
workinghardinit(conf)# interface range fortyGigE 0/48 , fortyGigE 0/52
workinghardinit(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx off
workinghardinit(conf-if-range-fo-0/48-52)#exit

!Enable DCB & Configure VLANs
workinghardinit(conf)#service-class dynamic dot1p
workinghardinit(conf)#dcb enable
workinghardinit(conf)#exit
workinghardinit#copy running-config startup-config
workinghardinit#reload

!We use a different VLAN per subnet
workinghardinit#configure
workinghardinit(conf)#interface vlan 110
workinghardinit (conf-if-vl-vlan-id*)#tagged tengigabitethernet 0/0-47
workinghardinit (conf-if-vl-vlan-id*)#tagged port-channel 3
workinghardinit(conf)#interface vlan 120
workinghardinit (conf-if-vl-vlan-id*)#tagged tengigabitethernet 0/0-47
workinghardinit (conf-if-vl-vlan-id*)#tagged port-channel 3
workinghardinit (conf-if-vl-vlan-id*)#exit


!Create & configure DCB Map Policy
workinghardinit(conf)#dcb-map SMBDIRECT
workinghardinit(conf-dcbmap-profile-name*)#priority-group 0 bandwidth 90 pfc on 
workinghardinit(conf-dcbmap-profile-name*)#priority-group 1 bandwidth 10 pfc off 
workinghardinit(conf-dcbmap-profile-name*)#priority-pgid 1 1 1 1 0 1 1 1
workinghardinit(conf-dcbmap-profile-name*)#exit

!Apply DCB map to the switch ports & uplinks
workinghardinit(conf)#interface range ten 0/0-47
workinghardinit(conf-if-range-te-0/0-47)# dcb-map SMBDIRECT 
workinghardinit(conf-if-range-te-0/0-47)#exit
workinghardinit(conf)#interface range fortyGigE 0/48 , fortyGigE 0/52
workinghardinit(conf-if-range-fo-0/48,fo-0/52)# dcb-map SMBDIRECT
workinghardinit(conf-if-range-fo-0/48,fo-0/52)#exit
workinghardinit(conf)#exit
workinghardinit#copy running-config startup-config 
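
To make the link between the switch and the hosts explicit, here is a small PowerShell sketch that expands the dcb-map above into a per-priority table. It only uses the values from this post; the variable names are made up for illustration. In the priority-pgid line, the position is the 802.1p priority (0 to 7) and the value is the priority group that priority lands in.

```powershell
# Values copied from the dcb-map SMBDIRECT above.
# Position in the list = 802.1p priority (0..7), value = priority group ID.
$priorityPgid = 1, 1, 1, 1, 0, 1, 1, 1

# The two priority groups defined in the dcb-map
$priorityGroups = @{
    0 = @{ BandwidthPct = 90; Pfc = 'on'  }
    1 = @{ BandwidthPct = 10; Pfc = 'off' }
}

# Build one row per priority showing its group, ETS share and PFC state
0..7 | ForEach-Object {
    $pg = $priorityPgid[$_]
    [pscustomobject]@{
        Priority      = $_
        PriorityGroup = $pg
        BandwidthPct  = $priorityGroups[$pg].BandwidthPct
        Pfc           = $priorityGroups[$pg].Pfc
    }
} | Format-Table -AutoSize
```

Priority 4 is the only one in group 0 (90%, PFC on), which lines up with the host side: the SMBDIRECT policy tags with priority 4, PFC is enabled for priority 4 only, and the SMBDirect traffic class gets 90% via ETS. Everything else shares group 1 with the remaining 10% and no PFC.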

With the hosts and the switches configured we’re ready for the demos in the next two blog posts. We’ll show Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS) in action with some tips on how to test this yourselves.

10 thoughts on “SMB Direct over RoCE Demo – Hosts & Switches Configuration Example”

  1. Is there a typo on this line sir? :

    Disable-NetQosFlowControl 0,1,2,>>>4<<<,5,6,7

    I presume you meant to skip 4 in the sequence, not 3, as you enable PFC for class 4 using the command on the previous line.

    Have I got that right?

    Cheers!

  2. Hi, I have two Force10 S4810p switches and I’m about to configure them with DCB. You tagged the VLANs on the Ethernet ports and port-channel 3. What is port-channel 3 for? I’m a newbie at networking :). I have stacked my two switches, fortyGig 52,60. Is that a good approach?

    thanks,

    • The port channel is just an illustration: if you need to connect to other switches you must also configure DCB/PFC for the uplinks, and as such tag those with the appropriate VLAN as well. Stacking will work, but since you have S4810 switches I’d go for VLT (separate control planes), as that will prevent the downtime during switch firmware upgrades which you will have with stacking (single control plane). For redundant switches and paths I prefer MLAG (VLT) when it is available, for this amongst other reasons. Good luck!

      • Thanks, for now we have to play with stacking I’m afraid. Can you take the cluster network as well on the RDMA card with these settings?

        I’m reading another post that describes almost the same settings as you posted but with some differences:
        #SMB Direct traffic to port 445 is tagged with priority 3
        New-NetQosPolicy "SMBDIRECT" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
        New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 1
        #New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
        #New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1

        #Enable PFC (lossless) on the priority of the SMB Direct traffic.
        Enable-NetQosFlowControl -Priority 3
        Disable-NetQosFlowControl 0,1,2,4,5,6,7

        Is this a more suitable config if I’m going to run the cluster network as well? Can you please help me understand how these QoS settings would look in the switch config for the DCB map policy? Most grateful!

        • Sure, I would not make it lossless. Some of the examples out there do that but it’s not needed; some of them even tag it with the same value as the RoCE traffic. The example you have seems fine. On the switch, just add it to the second or a third priority group. Now, the commented-out TCP/UDP policies in this example catch any TCP/UDP traffic not matched by SMBDIRECT. In reality that will be just about everything, and they give it the same priority as DEFAULT, so unless you want to do something different with it ETS-wise (by using a different PG) they are not of much use. Your mileage may vary.

  3. Did you mean for this line

    workinghardinit(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx off

    to be

    workinghardinit(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx on
    ?

    • The default is off/off, but in this use case we just disable flow control completely on all ports and uplinks, so it’s not very important what it is set to.

      • I’m new to these switches. I didn’t realize that putting “no” before “flowcontrol” would disable it entirely and thus make the rx/tx on/off settings irrelevant. So really one could use any of the 4 combinations:

        no flowcontrol rx on tx on
        no flowcontrol rx on tx off
        no flowcontrol rx off tx on
        no flowcontrol rx off tx off

        “no flowcontrol rx on tx on” feels like it conveys the intent more…

  4. I’ve got another one for you. Here is a scenario that concerns me and I was wondering if you would have any insight on it. Let’s say we have two switches; Switch A and Switch B. For simplicity, let’s say we have a LAG between them on a single 40gbe port. Now let’s say we have Server 1 connected to Switch B over a single 10gbe port. We also have Server 2 on Switch B on a 10gbe port and Server 3 on Switch A on a 10gbe port. All three servers are using priority 3. We have other servers connected that will also generate traffic with priority 3, the exact number of servers not being relevant. All ports and the LAG are configured for PFC on priority 3. There is no end to end flow control, pause frames are only sent between adjacent ports. Let’s say Server 2 and Server 3 are passing traffic to each other. This traffic would traverse the LAG. Now let’s say Server 1 suddenly gets bombarded with traffic from a ton of other servers. Server 1 can’t keep up with the traffic so it sends a pause out to Switch B. Switch B receives the pause and stops sending traffic to Server 1. At some point, Server 1 will tell Switch B to send data again and then tell it to pause again and so on. While this is happening, the servers sending data to Server 1 will continue to send traffic and will not be paused as pause frames are only sent between adjacent ports. At some point the buffers on Switch B will start to fill and it will then send pause frames out on the ports of the servers that are generating traffic destined to Server 1. This will also include the LAG. Now if Switch B sends a pause frame to Switch A for priority 3, won’t this interfere with Server 2 and Server 3’s communications? So essentially, the LAG will be paused when Server 1’s 10gbe port gets saturated and can’t keep up. Thus, reducing the 40gbe LAG down closer to 10gbe if one server is overwhelmed and it has traffic that traverses the LAG. Am I off here?

    The reason I’m asking this is because I’m setting up Microsoft Storage Spaces Direct and I’m trying to figure out the best way to set up the network. I’ve got 2 10gbe cards in each server and each card has 2 ports, so 4 total 10gbe ports per server. One port on one card goes to one switch. The other port on that card goes to the other switch. Same setup for the 2nd card. Now the question is: does one set up all four ports on the same subnet? S2D and clustering would use simplified SMB multichannel and this would seem to present the most amount of redundancy… but what would stop traffic from traversing the LAG and potentially slowing down? Or should one set up two separate subnets, one for each switch, so traffic stays on the respective switch? I suppose there is also the option of four separate subnets. Thoughts?
