SMB Direct over RoCE Demo – Hosts & Switches Configuration Example

As mentioned in “Where SMB Direct, RoCE, RDMA & DCB fit into the stack”, this post’s only function is to give you an overview of the configurations used in the demo blogs/videos. First we’ll configure one Windows Server 2012 R2 host. I hope it’s clear this needs to be done on ALL hosts involved. The NICs we’re configuring are the two RDMA-capable 10GbE NICs we’ll use for CSV traffic, live migration and our simulated backup traffic. These are Mellanox ConnectX-3 RoCE cards that we hook up to a DCB-capable switch. The commands needed are below and the explanation is in the comments. Do note that the choice of the two policies, priorities and minimum bandwidths is specific to this demo. What you need will depend on your environment.
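To give an idea of the shape of such a host configuration, here is a minimal sketch. It is not the exact demo listing. The adapter names `RDMA-NIC1` and `RDMA-NIC2`, the traffic class names, the priority 1 for default traffic and the bandwidth percentages are all illustrative assumptions; priority 4 for SMB Direct follows the PFC priority discussed in the comments.

```powershell
# Sketch only: adapter names, class names, priority 1 and the percentages are assumptions.

# Install the Data Center Bridging feature on the host.
Install-WindowsFeature Data-Center-Bridging

# Do not accept DCB settings from the switch via DCBX; configure the host explicitly.
Set-NetQosDcbxSetting -Willing $false -Confirm:$false

# Make sure RDMA is enabled on both NICs (hypothetical adapter names).
Enable-NetAdapterRdma -Name "RDMA-NIC1","RDMA-NIC2"

# Policy 1: tag SMB Direct (NetworkDirect, port 445) with 802.1p priority 4.
New-NetQosPolicy "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 4
# Policy 2: tag all other traffic with priority 1 (assumed value).
New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 1

# PFC (lossless) only for the SMB Direct priority, off for all others.
Enable-NetQosFlowControl -Priority 4
Disable-NetQosFlowControl -Priority 0,1,2,3,5,6,7

# Apply the QoS settings on the RDMA NICs.
Enable-NetAdapterQos -Name "RDMA-NIC1","RDMA-NIC2"

# ETS: minimum bandwidth per traffic class (percentages are placeholders).
New-NetQosTrafficClass "SMBDirect" -Priority 4 -BandwidthPercentage 50 -Algorithm ETS
New-NetQosTrafficClass "Other" -Priority 1 -BandwidthPercentage 40 -Algorithm ETS
```

Afterwards, `Get-NetQosPolicy`, `Get-NetQosFlowControl`, `Get-NetQosTrafficClass` and `Get-NetAdapterRdma` let you check what is actually in effect on the host.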

We also show you in general how to set up the switch. Don’t sweat the exact syntax and way of getting it done. It differs between switch vendors and models (we used Dell Force10 S4810 and PowerConnect 8100 / N4000 series switches); it’s all very alike and yet very specific. The important thing is that you see how what you do on the switches maps to what you did on the hosts.

With the hosts and the switches configured we’re ready for the demos in the next two blog posts. We’ll show Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS) in action with some tips on how to test this yourselves.

10 thoughts on “SMB Direct over RoCE Demo – Hosts & Switches Configuration Example”

  1. Is there a typo on this line sir? :

    Disable-NetQosFlowControl 0,1,2,>>>4<<<,5,6,7

    I presume you meant to skip 4 in the sequence, not 3, as you enable PFC for class 4 using the command on the previous line.

    Have I got that right?

    Cheers!

  2. Hi, I have two Force10 S4810p switches and I’m about to configure them with DCB. You tagged the VLANs on the Ethernet ports and port-channel 3. What is port-channel 3 for? I’m a newbie at networking :). I have stacked my two switches (fortyGig 52,60). Is that a good approach?

    thanks,

    • The port channel is just an illustration: if you need to connect to other switches, you must also configure DCB/PFC for the uplinks and, as such, tag those with the appropriate VLAN as well. Stacking will work, but since you have S4810 switches I’d go for VLT (separate control planes), as that prevents the downtime during switch firmware upgrades that you will have with stacking (single control plane). For redundant switches and paths I prefer MLAG (VLT) whenever it is available, for this reason among others. Good luck!

      • Thanks, for now we have to play with stacking I’m afraid. Can the cluster network run on the RDMA card as well with these settings?

        I’m reading another post that describes almost the same settings as you posted, but with some differences:
        #SMB Direct traffic to port 445 is tagged with priority 3
        New-NetQosPolicy "SMBDIRECT" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
        New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 1
        #New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
        #New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1

        #Enable PFC (lossless) on the priority of the SMB Direct traffic.
        Enable-NetQosFlowControl -Priority 3
        Disable-NetQosFlowControl 0,1,2,4,5,6,7

        Is this a more suitable config if I’m going to run the cluster network as well? Could you also explain what these QoS settings would look like in the switch config for the DCB map policy settings? Most grateful!

        • Sure, but I would not make it lossless. Some of the examples out there do that, but it’s not needed; some of them even tag it with the same value as the RoCE traffic. The example you have seems fine. On the switch, just add it to the second or a third priority group. Note that the TCP/UDP policies in this example catch just about any traffic that is not already covered by SMBDIRECT and DEFAULT, which in reality is just about everything, and give it the same priority as DEFAULT. So unless you want to treat it differently ETS-wise (by using a different PG), they are not of much use. Your mileage may vary.

  3. Did you mean for this line

    workinghardinit(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx off

    to be

    workinghardinit(conf-if-range-fo-0/48-52)#no flowcontrol rx on tx on
    ?

    • The default is off/off, but in this use case we just disable flow control completely on all ports and uplinks, so it’s not very important what it is set to.

      • I’m new to these switches. I didn’t realize that putting “no” before “flowcontrol” would disable it entirely and thus make the rx/tx on/off settings irrelevant. So really one could use any of the 4 combinations:

        no flowcontrol rx on tx on
        no flowcontrol rx on tx off
        no flowcontrol rx off tx on
        no flowcontrol rx off tx off

        “no flowcontrol rx on tx on” feels like it conveys the intent more…

  4. I’ve got another one for you. Here is a scenario that concerns me, and I was wondering if you would have any insight on it. Let’s say we have two switches, Switch A and Switch B. For simplicity, let’s say we have a LAG between them on a single 40GbE port. Now let’s say we have Server 1 connected to Switch B over a single 10GbE port. We also have Server 2 on Switch B on a 10GbE port and Server 3 on Switch A on a 10GbE port. All three servers are using priority 3. We have other servers connected that will also generate traffic with priority 3, the exact number of servers not being relevant. All ports and the LAG are configured for PFC on priority 3. There is no end-to-end flow control; pause frames are only sent between adjacent ports.

    Let’s say Server 2 and Server 3 are passing traffic to each other. This traffic would traverse the LAG. Now let’s say Server 1 suddenly gets bombarded with traffic from a ton of other servers. Server 1 can’t keep up with the traffic, so it sends a pause out to Switch B. Switch B receives the pause and stops sending traffic to Server 1. At some point, Server 1 will tell Switch B to send data again and then tell it to pause again, and so on. While this is happening, the servers sending data to Server 1 will continue to send traffic and will not be paused, as pause frames are only sent between adjacent ports. At some point the buffers on Switch B will start to fill, and it will then send pause frames out on the ports of the servers that are generating traffic destined for Server 1. This will also include the LAG.

    Now if Switch B sends a pause frame to Switch A for priority 3, won’t this interfere with Server 2 and Server 3’s communications? Essentially, the LAG will be paused when Server 1’s 10GbE port gets saturated and can’t keep up, reducing the 40GbE LAG down closer to 10GbE whenever one server is overwhelmed and has traffic that traverses the LAG. Am I off here?

    The reason I’m asking is that I’m setting up Microsoft Storage Spaces Direct and I’m trying to figure out the best way to set up the network. I’ve got two 10GbE cards in each server and each card has 2 ports, so 4 total 10GbE ports per server. One port on one card goes to one switch. The other port on that card goes to the other switch. Same setup for the 2nd card. Now the question is: does one set up all four ports on the same subnet? S2D and clustering would use simplified SMB Multichannel, and this would seem to offer the most redundancy… but what would stop traffic from traversing the LAG and potentially slowing down? Or should one set up two separate subnets, one for each switch, so traffic stays on the respective switch? I suppose there is also the option of four separate subnets. Thoughts?
