I attended and presented at E2EVC 2015 in Berlin from June 12th to June 14th. The networking was a blast. No “marchitecure” bull shit or vendor fairy tales what so ever and lots of very open discussions on the realities we’re seeing and facing in virtualization and cloud. Most account managers and esoteric presales would die a painful (but fast) death in this environment.
One session was with my Hyper-V Amigo buddy Carsten Rachfahl and was pure demo extravaganza, so no slides. My own session was “SMB Direct – The Secret Decoder Ring” and was an attempt to position this technology what by looking at the why and where followed by the how by who and when.
I hope a lot of people had at least a better understanding of SMB Direct, RDMA and DCB. The second aim was to take away the fear many people have of this tech by showcasing it in short demos. Time constraints where a challenge so it was not a 200 level session.
You cannot afford to ignore SMB3 and it’s capabilities related to storage traffic such as multichannel, RDMA and encryption. SMB Direct over RoCE seems to have a bright future as it continuous to evolve and improve in Windows Server 2106. The need for DCB (PFC and optionally ETS) intimidates some people. But it should not.
All these talks are at extremely affordable community driven events to make sure you can attend. The sessions are given by speakers who do this for the community (speakers and attendees do this in their own time and pay for their our own travel/expenses) and who work with these technology in real life and provide feedback to vendors on the issues or opportunities we see. This makes the sessions very interesting and anything but marketing, slide ware or sales pitches. See you there!
I recently had to trouble shoot a Windows Server 2012 R2 Hyper-V cluster where SMB Direct is leveraged for live migration. It seemed to work, sometime perfectly but at times it but it was in “slow” motion. The VMs got queued for live migration, it took some time for it started and sometimes it would finish or it would fail. This did not happen between all the nodes. I diligently checked out the SMB Direct network but that was OK on all nodes. Basically the LM network was perfectly fine.
To me this indicated that the hosts potentially had issues communicating with each other to coordinate the live migration. But pings and such looked good, there was connectivity, on the surface all seemed well. In the event log details we saw indications that this was indeed the case. Unfortunately I did not get the opportunity to take screenshots or copies of the events in this particular situation.
The nodes had a separate 2*1Gbps native team LAN access and backups. But diving deeper I noticed that they had set Jumbo Frames on some of those member NICs and not on others. So these setting differed from node to node and that was leading to the symptoms we described above.
You can use Jumbo Frames on your live migration network. Testing has shown this to be beneficial. When you’re doing SMB direct it won’t make such a big difference but it doen not hurt. When SMB Direct fails you’ll fall back to SMB with Multichannel and there it helps more! See Live Migration Can Benefit From Jumbo Frames. While SMB Direct (infiniband, RoCE & iWarp) know Jumbo frames the limited testing I have ever done there indicates only a small increase (2%) in throughput so I’m not sure it’s even worthwhile when doing RDMA.
When you can use Jumbo Frames on you host LAN NIC or team of NICs (handy is you use it to do backups as well) you need to be consistent end to end. Meaning ALL hosts, ALL NICS & all switches/ switch ports. Being inconsistent in this on the cluster nodes was what cause the slow to failing live migrations. You need to have good communications between the hosts themselves and AD. Just unplug the LAN from a Hyper-V cluster host to demo this => live migration from to that node and the rest of the cluster won’t work. Mismatching Jumbo Frames or potentially other network settings make this less obvious. Another “fun” example to trouble shoot is a NIC team where the member NICs are in different VLANs.
Many MVP’s attended Microsoft Ignite 2015 in Chicago to see what our future will look like.
Carsten published the “Hyper-V Amigo Chat” we did right after Ignite. The conference was a blast for us all. Tired but happy we chat about storage space direct, Nano Server, ReFS, Dedupe, Azure Stack, … Enjoy!