SFP+ and SFP28 compatibility

Introduction

25Gbps (SFP28) is en route to displace 10Gbps (SFP+) from its leading role as the workhorse in the datacenter. That means that 10Gbps is slowly but surely becoming “the LOM option”, taking over the role 1Gbps has held for many years. As far as expansion slots are concerned, 25Gbps cards are rising rapidly in popularity. The same is happening on the switches, where 25-100Gbps ports are readily available. As this transition takes place and we start acquiring 25Gbps or faster gear, the question of SFP+ and SFP28 compatibility arises for anyone involved in the planning.

Who needs 25Gbps?

When I got really deep into 10Gbps about 7 years ago I was considered a bit crazy and accused of over-delivering. That was until they saw the speed of a live migration. With Windows Server 2012 and later versions that was driven home even more by shared nothing and storage live migration, SMB 3 Multichannel and SMB Direct.

On top of that Storage Spaces and SOFS came onto the storage scene in the Microsoft Windows Server ecosystem. This led us to S2D and Storage Replica in Windows Server 2016 and later. This made the need for more bandwidth, higher throughput and low latency ever more obvious. Microsoft has a rather extensive collection of features & capabilities that leverage SMB 3 and as such can leverage RDMA.

In this time frame we also saw the strong rise of All Flash Array solutions with SSD and NVMe. Today we even see storage class memory come into the picture. All this means even bigger needs for high throughput at low latency, so the trend for ever faster Ethernet is not over yet.

What does this mean?

That means that 10Gbps is slowly but surely becoming the LOM option and is taking over the role 1Gbps has held for many years. In our expansion slots we see 25-100Gbps cards rise in popularity. The same is happening on the switches where we see 25, 50, 100Gbps or even higher. I’m not sure if 50Gbps is ever going to be as popular, but 25Gbps is for sure. In any case I am not crazy, but I do know how to avoid tech debt and get as much long term use out of hardware as possible.

When it comes to the optic components SFP+ is commonly used for 10Gbps. This provides a path to 40Gbps and 100Gbps via QSFP. For 25Gbps we have SFP28 (1 channel or lane for 25Gbps). This gives us a path to 50Gbps (2*25Gbps, two lanes) and to 100Gbps (4*25Gbps, four lanes) via QSFP28. In the end this is a lot more economical. But let’s look at SFP+ and SFP28 compatibility now.

SFP+ and SFP28 compatibility

When it comes to SFP+ and SFP28 compatibility we’re golden. SFP+ and SFP28 share the same form factor & are “compatible”. The moment I learned that SFP28 shares the same form factor with SFP+ I was hopeful that they would only differ in speed. And indeed, that hope became a sigh of relief when I experimentally confirmed for myself the following things I had read:

  1. I can plug an SFP28 module into an SFP+ port
  2. I can plug an SFP+ module into an SFP28 port
  3. Connectivity is established at the lowest common denominator, which is 10Gbps
  4. The connectivity is functional but you don’t gain the benefits SFP28 bring to the table.
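On a Windows Server host, one way to confirm at which speed a link actually came up is to look at the link speed the NIC reports. This is only a minimal sketch, assuming the in-box NetAdapter module; it shows the host side, not the switch side.

```powershell
# List the physical adapters with the link speed they negotiated.
# An SFP28 NIC connected to an SFP+ switch port is expected to report 10 Gbps here.
Get-NetAdapter -Physical |
    Sort-Object Name |
    Format-Table Name, InterfaceDescription, Status, LinkSpeed -AutoSize
```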

Compatibility for migrations & future proofing

For a migration path that is phased over time this is great news, as you don’t need to have everything in place from day one. I can order 25Gbps NICs in my servers now, knowing that they will work with my existing 10Gbps network. They’ll be ready to roll when I get my switches replaced 6 months or a year later. Older servers with 10Gbps SFP+ that are still in production when the new network gear arrives can keep working on the new SFP28 network gear.

  • SFP+: 10Gbps
  • SFP28: 25Gbps, but it can go up to 28Gbps, hence the name SFP28 and not SFP25. Note that SFP28 can handle 25Gbps, 10Gbps and even 1Gbps.
  • QSFP28: 100Gbps, which can be split into 4*25Gbps or 2*50Gbps. This gives you flexibility and port density.
  • 25Gbps / SFP28 is the new workhorse to deliver more bandwidth, better error control, less crosstalk and an economically sound upgrade path.

Do note that SFP+ modules will work in SFP28 ports and vice versa, but you have to be a bit careful:

  • Fix the port speed when you’re not running at the default speed.
  • On SFP28 modules you might need to disable options such as forward error correction.
  • Make sure a 10Gbps switch is OK with 25Gbps cables; it might not be.
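Where and how you fix the speed or turn off forward error correction depends on the switch vendor and the NIC driver, so no universal command exists. On the Windows side, a reasonable first step is to list which advanced properties the driver exposes. The adapter name below is hypothetical, and the display names you will see (speed, FEC and so on) differ per vendor.

```powershell
# "NODE-A-pNIC1" is a hypothetical adapter name; substitute your own.
# This only lists what the driver exposes, it changes nothing.
Get-NetAdapterAdvancedProperty -Name "NODE-A-pNIC1" |
    Format-Table DisplayName, DisplayValue, RegistryKeyword, RegistryValue -AutoSize
```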

If all your gear comes from a single vendor specializing in RDMA technology, like Mellanox, it detects all this and takes care of everything for you. Between vendors, and with 3rd party cables, pay extra attention to verifying that all will be well.

SFP+ and SFP28 compatibility is also important for future proofing upgrade paths. When you buy and introduce new network gear it is nice to know what will work with what you already have and what will work with what you might or will have in the future. Some people will get all new network switches in at once while others might have to wait for a while before new servers with SFP28 arrive. Older servers that are still around will not force you to keep older switches around just for them.

SFP28 / QSFP28 provides flexibility

Compatibility is also important for purchase decisions, as you don’t need to match 25Gbps NIC ports to 25Gbps switch ports. You can use QSFP28 cables and split them into 4 * 25Gbps SFP28.

The same goes for 50Gbps, where a 100Gbps QSFP28 port is split into 2 * 50Gbps QSFP28.

This means you can have switch port density and future proofing if you so desire. Some vendors offer modular switches where you can mix port types, such as the Dell EMC Networking S6100-ON.
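The port density argument is simple lane arithmetic. The sketch below works it out for a hypothetical 32 port QSFP28 switch. The port count is an assumption for illustration, and whether every port supports breakout depends on the switch model.

```powershell
# Hypothetical switch: 32 QSFP28 ports of 100Gbps each.
$qsfp28Ports  = 32
$lanesPerPort = 4    # a QSFP28 port carries 4 lanes of 25Gbps

# Every port split into 4 * 25Gbps (one lane per link)
$ports25 = $qsfp28Ports * $lanesPerPort

# Every port split into 2 * 50Gbps (two lanes per link)
$ports50 = $qsfp28Ports * ($lanesPerPort / 2)

"{0} x 100Gbps, or {1} x 50Gbps, or {2} x 25Gbps" -f $qsfp28Ports, $ports50, $ports25
```

With these numbers the same 32 physical ports can serve 64 links at 50Gbps or 128 links at 25Gbps.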

Conclusion

More bandwidth at less cost is a no-brainer. It also makes your bean counters happy as this is achieved with fewer switches and cables. That also translates to less space in a datacenter, less power consumption and less cooling. And the less material you have, the lower your operational costs (management and maintenance). This is only partially offset by our ever-growing need for more bandwidth. As converged networking matures and gets better, that also helps with the cost, even where economies of scale don’t matter that much. The transition to 25Gbps and higher is facilitated by SFP+ and SFP28 compatibility and that is good news for all involved.

Windows Server 2016 RDMA and the Hyper-V vSwitch – Part II

Introduction

In part I of this article I demonstrated that some of the rules with regard to SMB Direct and the Hyper-V vSwitch as we know them from Windows Server 2012 R2 have changed with Windows Server 2016. We focused on the fact that you can expose RDMA to a management OS vNIC created on a vSwitch. In Windows Server 2012 R2 you cannot expose RDMA capabilities via a vSwitch, even when you are using a non-teamed RDMA capable NIC. This is no longer true with Windows Server 2016.

While a demo with a vSwitch on a single NIC as we did in part I is nice, it’s unlikely you’ll use this often, if at all, in the real world. There we require redundancy and that means NIC teaming. To get it we normally use a vSwitch created on a native Windows NIC team. But native NIC teaming does not expose RDMA capabilities, and as such a vSwitch created against a Windows native NIC team cannot leverage RDMA either. That was one of the reasons why a fully converged scenario in Windows Server 2012 R2 was too limited for many scenarios. Loss of RSS on the vNICs exposed to the management OS was another. The solution to this in Windows Server 2016 Hyper-V comes with Switch Embedded Teaming (SET). Now using SET in each and every situation might not be a good idea. It depends. But we do need to know how to configure it. So let’s dive in.

Switch Embedded Teaming (SET) exposes RDMA to the vSwitch

Switch Embedded Teaming (SET) in Windows Server 2016 allows multiple identical NICs (same make, model, firmware and drivers, in order to be supported) to be used or “teamed” within the vSwitch itself. The important thing to note here is that this does not use Windows NIC teaming or LBFO (Load Balancing and Fail Over).

SET is the future and is needed for use with the Network Controller and Software Defined Networking in Windows. SET can also be used without these technologies. While today it supports a good deal of the capabilities of native Windows NIC teaming, it also lacks some of them. In general SET is meant for fully or partially converged scenarios with 10Gbps or faster NICs in a (hyper)converged Hyper-V deployment, not for 1Gbps networking.

Please see the “New Windows Server 2016 NIC and Switch Embedded Teaming User Guide for Download” for more information, as there is just too much to tell about it.

Setting it up

We start out with a 2-node cluster where each node has 2 RDMA NICs (Mellanox ConnectX-3) with RDMA enabled and DCB configured. Live migration of VMs between those nodes works over SMB Direct. All NICs are on the same subnet 172.16.0.0/16 (thanks to Windows Server 2016 Same Subnet Multichannel) and are on VLAN 110. In Failover Cluster Manager (FCM) that looks like below.

clip_image002

We’ll now use the rNICs to create a Switch Embedded Team.
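A minimal sketch of creating the Switch Embedded Team could look like this. The vSwitch name is the one used in this article; the two physical adapter names are hypothetical placeholders for the rNICs on the node.

```powershell
# "NODE-A-pNIC1" and "NODE-A-pNIC2" are hypothetical names for the two rNICs.
# -EnableEmbeddedTeaming $true makes this a SET vSwitch instead of a vSwitch
# bound to a single NIC or to a native LBFO team.
New-VMSwitch -Name "RDMA-SET-vSwitch" `
    -NetAdapterName "NODE-A-pNIC1", "NODE-A-pNIC2" `
    -EnableEmbeddedTeaming $true
```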

clip_image004

Note that the teaming mode is switch independent, the only option supported with SET in Windows Server 2016.
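To check the team members and the teaming mode from PowerShell, something along these lines should do, assuming the vSwitch name used in this article.

```powershell
# Shows the member adapters, teaming mode and load balancing algorithm of the SET.
Get-VMSwitchTeam -Name "RDMA-SET-vSwitch" | Format-List
```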

clip_image006

This also gives us a vNIC exposed to the management OS (default).

clip_image008

This is also visible as a vNIC in the management OS called “vEthernet (RDMA-SET-vSwitch)”.

clip_image010

This will be used to manage the host and to make its purpose clear we’ll rename it.

We’ll create 2 separate management OS vNICs for the RDMA traffic later. For now, we want the HOST-MGNT vNIC to have connectivity to the LAN and for that we need to tag it with VLAN 10.
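The rename and the VLAN tagging can be sketched as follows. This assumes the default management OS vNIC still carries the name of the vSwitch, which is what happens when the vSwitch is created with its default settings.

```powershell
# Rename the default management OS vNIC to make its purpose clear.
Rename-VMNetworkAdapter -ManagementOS -Name "RDMA-SET-vSwitch" -NewName "HOST-MGNT"

# Tag the management vNIC with the LAN VLAN (10 in this setup).
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "HOST-MGNT" -Access -VlanId 10

# Verify the result.
Get-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "HOST-MGNT"
```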

image

The vNIC actually “inherited” the IP configuration of one of our physical NICs and we need to change that to either DHCP or a correct LAN IP address and settings.

clip_image014

You can use the code below to set the HOST-MGNT vNIC to DHCP
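The original snippet is not reproduced here, so what follows is only a sketch of one way to do it. The interface alias is an assumption based on the “vEthernet (name)” convention; check yours with Get-NetAdapter first.

```powershell
# Assumed interface alias of the management vNIC.
$alias = "vEthernet (HOST-MGNT)"

# Switch the interface to DHCP and drop the inherited static DNS servers.
Set-NetIPInterface -InterfaceAlias $alias -Dhcp Enabled
Set-DnsClientServerAddress -InterfaceAlias $alias -ResetServerAddresses
```

If an inherited static default gateway lingers on the interface, you may have to remove that route as well.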

To finalize the HOST-MGNT vNIC configuration we enable priority tagging on it. If we don’t we won’t see any traffic other than SMB Direct tagged at all!
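Enabling priority tagging on the management vNIC comes down to a single parameter, sketched here for the HOST-MGNT vNIC.

```powershell
# Let priority (802.1p) tags on traffic from this vNIC pass instead of being reset.
Set-VMNetworkAdapter -ManagementOS -Name "HOST-MGNT" -IeeePriorityTag On
```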

clip_image016

Before we go any further we’ll remove the VLAN tag from the rNICs, as we don’t want it interfering with egress traffic being tagged by them or ingress traffic being filtered because it doesn’t match the VLAN ID on the rNICs.
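One way to clear the VLAN ID on the physical NICs is via their advanced properties. This sketch assumes the driver exposes the standardized VlanID registry keyword and uses hypothetical adapter names; some drivers might expose the setting differently.

```powershell
# "NODE-A-pNIC1" and "NODE-A-pNIC2" are hypothetical rNIC names.
foreach ($nic in "NODE-A-pNIC1", "NODE-A-pNIC2") {
    # VLAN ID 0 means the adapter itself no longer tags or filters on a VLAN.
    Set-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword "VlanID" -RegistryValue 0
}

# Verify the setting on both rNICs.
Get-NetAdapterAdvancedProperty -Name "NODE-A-pNIC*" -RegistryKeyword "VlanID"
```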

From here on we’ll focus on the RDMA capable vNICs we’ll create and use for SMB traffic.

We create 2 vNICs on the management OS for SMB Direct traffic.
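Creating the two vNICs could look like the sketch below. The names “SMB-1” and “SMB-2” are hypothetical; pick whatever fits your naming convention.

```powershell
# Add two management OS vNICs on the SET vSwitch for SMB Direct traffic.
Add-VMNetworkAdapter -ManagementOS -SwitchName "RDMA-SET-vSwitch" -Name "SMB-1"
Add-VMNetworkAdapter -ManagementOS -SwitchName "RDMA-SET-vSwitch" -Name "SMB-2"

# List all management OS vNICs to verify.
Get-VMNetworkAdapter -ManagementOS
```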

clip_image018

Now these vNICs need an IP address. These can be in the same subnet because Windows Server 2016 supports same subnet SMB Multichannel.
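As an illustration, assigning addresses in the 172.16.0.0/16 subnet might look like this. The interface aliases assume vNICs named “SMB-1” and “SMB-2”, and the addresses are made up for the example.

```powershell
# Hypothetical addresses in the 172.16.0.0/16 subnet used in this setup.
New-NetIPAddress -InterfaceAlias "vEthernet (SMB-1)" -IPAddress 172.16.1.11 -PrefixLength 16
New-NetIPAddress -InterfaceAlias "vEthernet (SMB-2)" -IPAddress 172.16.2.11 -PrefixLength 16
```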

We then also need to put the vNICs in the correct VLAN. Remember that DCB / PFC priority tagging needs a tagged VLAN to carry that priority. Right now, we can see that these are untagged.

clip_image020

So we tag them with VLAN ID 110
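A sketch of the tagging, again assuming the hypothetical vNIC names “SMB-1” and “SMB-2”:

```powershell
# Put both SMB vNICs in VLAN 110 as access ports.
foreach ($vnic in "SMB-1", "SMB-2") {
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName $vnic -Access -VlanId 110
}

# Show the VLAN configuration of all management OS vNICs.
Get-VMNetworkAdapterVlan -ManagementOS
```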

clip_image022

We enable jumbo frames on the vNICs. Remember that the physical NICs in the SET have jumbo frames enabled as well.
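Jumbo frames on the vNICs can be set through the advanced properties. The value 9014 is an assumption; use a value that matches the physical NICs and the switches. The aliases assume vNICs named “SMB-1” and “SMB-2”.

```powershell
foreach ($alias in "vEthernet (SMB-1)", "vEthernet (SMB-2)") {
    # *JumboPacket is the standardized registry keyword for the jumbo frame size.
    Set-NetAdapterAdvancedProperty -Name $alias -RegistryKeyword "*JumboPacket" -RegistryValue 9014
}
```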

clip_image024

Normally all traffic that originates from vNICs gets any QoS value reset to zero. There is one exception to this and that’s SMB Direct traffic. SMB Direct traffic gets tagged with its QoS priority and that is not reset to 0, as it bypasses the vSwitch completely. But if we set other priorities for DCB PFC and/or ETS on other types of traffic that pass over these vNICs, we must enable priority tagging on these vNICs as well or the priorities will be stripped away.
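For the SMB vNICs this could be done in one go, followed by a quick check of the setting on all management OS vNICs. The vNIC names are hypothetical.

```powershell
foreach ($vnic in "SMB-1", "SMB-2") {
    Set-VMNetworkAdapter -ManagementOS -Name $vnic -IeeePriorityTag On
}

# Verify which management OS vNICs have priority tagging enabled.
Get-VMNetworkAdapter -ManagementOS | Format-Table Name, IeeePriorityTag -AutoSize
```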

clip_image026

The association of vNICs to pNICs is random. It can also change when NICs are created or destroyed (disabling NICs, restarting the OS). We can map a vNIC to a particular pNIC. This prevents suboptimal use of the available pNICs and provides a well-known, predictable path for the traffic. We do this with the below PowerShell commands.
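The original commands are not reproduced here. A sketch using Set-VMNetworkAdapterTeamMapping, with hypothetical vNIC and pNIC names, could look like this:

```powershell
# Pin each SMB vNIC to its own physical team member.
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB-1" -PhysicalNetAdapterName "NODE-A-pNIC1"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB-2" -PhysicalNetAdapterName "NODE-A-pNIC2"

# Verify the mapping.
Get-VMNetworkAdapterTeamMapping -ManagementOS
```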

Last but not least, we should enable RDMA on our two vNICs or SMB Direct will not kick in at all.
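Enabling RDMA on the vNICs is done on their host side adapters. The aliases below assume vNICs named “SMB-1” and “SMB-2”.

```powershell
# Enable RDMA on the two management OS vNICs.
Enable-NetAdapterRdma -Name "vEthernet (SMB-1)", "vEthernet (SMB-2)"

# Verify which adapters, physical and virtual, have RDMA enabled.
Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize
```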

Right now, we have it all configured correctly on one node of our 2-node cluster. The SMB network looks like this now:

clip_image028

The cluster now looks like below.

clip_image030

We can live migrate VMs over SMB Direct in this mixed scenario where one node has physical RDMA NICs and the other node has SET with vNICs for RDMA.

clip_image032

When looking at this in report mode we clearly see Node-A send SMB Direct traffic (tagged with priority 4, green) over its RDMA enabled SET vNICs to Node-B which still has a complete physical rNIC set up (blue).

clip_image034

As you can see in the screenshots above, we now have RDMA / SMB Direct working with SET / RDMA vNICs on one node (Node-A) and pure physical RDMA NICs on the other (Node-B). This gives us bandwidth aggregation and redundancy. To complete the exercise, we configure SET on the other node as well. But it’s clear SET and RDMA will also work in a mixed environment.
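Besides performance counters, a quick sanity check from PowerShell is to look at what the SMB client sees. This is only a sketch; run it on a node while SMB traffic (a live migration, for instance) is flowing to the other node.

```powershell
# Interfaces as seen by the SMB client, including whether they are RDMA capable.
Get-SmbClientNetworkInterface

# Active SMB Multichannel connections and the interfaces they use.
Get-SmbMultichannelConnection
```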

We’ll discuss some details about certain aspects of the vNIC configuration in future articles. Things like the why and how of Set-VMNetworkAdapterTeamMapping and the use of -IeeePriorityTag. But for now, this is it. Go try it out! It’s the basis for anything you’ll do with SDNv2 in W2K16 and beyond.

You cannot connect multiple NICs to a single Hyper-V vSwitch without teaming on the host

Can you connect multiple NICs to a single Hyper-V vSwitch without teaming on the host?

Recently I got a question on whether a Hyper-V virtual switch can be connected to multiple NICs without teaming. The answer is no. You cannot connect multiple NICs to a single Hyper-V vSwitch without teaming on the host.

This question makes sense as many people are interested in the ease of use and the great results of SMB Multichannel when it comes to aggregation and redundancy. But the answer lies in the name “SMB”. It’s only available for SMB traffic. Believe it or not, there is still a massive amount of network traffic that is not SMB, and all that traffic has to pass through the Hyper-V vSwitch.

What can we do?

This means that any redundant scenario that has to support traffic other than SMB 3 needs a different solution than SMB Multichannel. Basically, this means using NIC teaming on the server. In the pre Windows Server 2012 era that meant 3rd party products. Since Windows Server 2012 it means native LBFO (switch independent, static or LACP). In Windows Server 2016 Switch Embedded Teaming (SET) was added to your choice of options. SET only supports switch independent teaming (for now?).
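As an illustration of the native LBFO route, the sketch below teams two NICs on the host and creates the vSwitch on top of the team interface. The team, vSwitch and adapter names are hypothetical.

```powershell
# Team two physical NICs with native LBFO ("NIC1" and "NIC2" are hypothetical names).
New-NetLbfoTeam -Name "LBFO-TEAM" -TeamMembers "NIC1", "NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false

# Create the vSwitch on the team interface, which by default carries the team name.
New-VMSwitch -Name "LBFO-vSwitch" -NetAdapterName "LBFO-TEAM" -AllowManagementOS $true
```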

If redundancy on the vSwitch is not an option you can use multiple vSwitches connected to separate NICs and physical switches, with Windows native LBFO inside the guests. That works but it’s a lot of extra work and overhead, so you only do this when it makes sense. One such example is SR-IOV, which isn’t exposed on top of an LBFO team.

DELL EMC World 2017 Concludes

Today DELL EMC World 2017 ends with a dinner with DELL EMC management and engineers to discuss our impressions of the information we took away from DELL EMC World 2017. I would like to thank the ever hard-working Sarah Vela for making this possible. It’s much appreciated.

Professionally I’m blessed with multiple opportunities to attend conferences and summits. That’s where I get to talk to the skilled and passionate people who work on the technologies we work with intensively. This is very much a two-way street where we learn from each other. And at many conferences I might also be a speaker or participate in advisory boards to provide feedback. Some of those latter discussions are under NDA. This is normal and I have NDAs with other companies as well. That’s the legal side of the trust we place in each other in order to discuss evolving and future technologies.

I attend multiple events from different players. Some of these disagree with me and that is fine. We learn from being challenged. It helps us define more clearly what we design and build as well as why and how. More and more, solutions become a diverse, multi-pronged combination of components, each with specific capabilities at our disposal. These change fast and so do our solutions, which is an element not to be ignored when designing them. That’s one takeaway from DELL EMC World that seems to have hit home. The other is that some companies are in a rather dire IT condition due to years of standstill.

I’m happy to see that today and tomorrow DELL EMC has the technologies needed for us to deliver modern IT solutions. The way in which we choose to do so is our choice and DELL EMC states it is committed to supporting that. As a testimony to that we got to see the DELL EMC Storage Spaces Direct Ready nodes based on the soon to be available generation 14 PowerEdge servers.

That is how we worked for many years with DELL and we have been assured we can continue to work that way with DELL EMC. That is what Michael Dell committed to and I have seen them deliver on that promise for many years. For me that’s enough to be confident in it until proven otherwise, even if that message was sometimes brought in a way that made me think Las Vegas had gotten the better of some conference managers. But let’s not let the form get in the way of the content.

On a final note, Dell EMC is not anti public cloud or pro on-premises. That’s how it should be and that’s how we deliver IT. We use the tools at our disposal to build the best possible solutions we can. What we use depends on the needs and changes as technology evolves. That’s OK. Saying you need hardware doesn’t make you a cloud hater or vice versa. The world is not that simple.