Configuring an interface bond in an Ubuntu Hyper-V guest

Introduction

In this post, we take a look at configuring an interface bond in an Ubuntu Hyper-V guest. But first, a quick word about NIC teaming and Hyper-V. In real life, teaming is most often done on physical hardware. But in the lab, or for some edge cases in production, you might want to use it in virtual machines. The use case here is virtual machines used for testing and knowledge transfer. We are teaching how to create Veeam Backup & Replication hardened repositories with XFS and immutability, and in that lab we are emulating a NIC team on hardware servers.

When you need redundant, highly available networking for your Hyper-V guests, you normally create a NIC team on the host and then use that team to build your vSwitch. You can use a traditional LBFO team (deprecated) or a SET (Switch Embedded Teaming) switch. The latter is the current technology and the way forward. But in this lab scenario, I am using LBFO, the native Windows NIC teaming.
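
For reference, if you do go the SET route on a host, creating the switch takes one line of PowerShell. This is just a sketch; the switch and adapter names are examples, not part of this lab:

    # Create a Switch Embedded Teaming (SET) vSwitch from two physical NICs.
    # "NIC1"/"NIC2" and the switch name are example values; adjust to your hardware.
    New-VMSwitch -Name "SET-vSwitch" -NetAdapterName "NIC1", "NIC2" `
        -EnableEmbeddedTeaming $true -AllowManagementOS $false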

99.9% of all use cases will use teaming on the Hyper-V host

Host teaming provides bandwidth aggregation, redundancy, and failover. In 99.99% of cases, you do not mess around with NIC teaming in the guest. Below we see a figure showing guest teaming. You need two physical NICs for genuine redundancy, each with its own virtual switch, uplinked to separate physical switches. Beware that only switch-independent teaming is supported in the guest OS, so configure the switches and switch ports accordingly.

Hyper-V in-guest NIC teaming
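
If you want to build the setup from the figure above yourself, the host side comes down to a few lines of PowerShell. A sketch, with all names as examples:

    # Two external vSwitches, each on its own physical NIC, uplinked to
    # separate physical switches. Names are examples; adjust to your environment.
    New-VMSwitch -Name "vSwitch-A" -NetAdapterName "pNIC-A" -AllowManagementOS $false
    New-VMSwitch -Name "vSwitch-B" -NetAdapterName "pNIC-B" -AllowManagementOS $false

    # Give the guest one vNIC per vSwitch; these become the team/bond members.
    Add-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-A" -SwitchName "vSwitch-A"
    Add-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-B" -SwitchName "vSwitch-B"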

In-guest teaming is rarely used for production workloads, bar some exceptions with SR-IOV, but that is another discussion. However, you might have a valid reason to use NIC teaming for lab work, testing, documenting configurations, teaching, etc. Luckily, that is easy to do. Hyper-V has a setting for your vNICs that enables them to be functional members of a NIC team in a Windows guest OS, as long as that OS supports native teaming. That is the case for Windows Server 2012 and later.

NIC teaming inside a Hyper-V Guest

For each vNIC that will be a member of the NIC team in the guest, you must put a checkmark next to “Enable this network adapter to be part of a team in the guest operating system”; there is nothing more to it. The big caveat here is that each member must reside on a different external vSwitch for failover to work correctly. Otherwise, you will see a “The virtual switch lacks external connectivity” error on the remaining member when failing over, along with packet loss.

Enable NIC teaming on the vNICs that are going to be team members in the Hyper-V settings
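
If you prefer scripting over clicking through the GUI, the same checkbox can be set from the host with PowerShell. The VM and adapter names below are just examples:

    # Equivalent of the "Enable this network adapter to be part of a team in the
    # guest operating system" checkbox, set per member vNIC.
    Get-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-A" |
        Set-VMNetworkAdapter -AllowTeaming On
    Get-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-B" |
        Set-VMNetworkAdapter -AllowTeaming On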

There is nothing more you need to do to make it work perfectly in a Windows guest VM. As you can see in the image below, both my LAN NIC and the team NIC get an address from the DHCP server.

Functional team in the virtual machine. Do test failover to make sure you got it right.

That’s great. But sometimes, I need to have a NIC team inside a Linux guest virtual machine. For example, recently on Ubuntu 20.04, I went through my typical motions to set up in-guest NIC teaming, or bonding in Linux speak. But, much to my surprise, I did not get an IP address from my DHCP server on my Ubuntu 20.04 guest bond. So, what could be the cause?

Configuring an interface bond in an Ubuntu Hyper-V guest

In Ubuntu, we use netplan to configure our networking, and in the image below you can see a sample configuration.

A minimal bond configuration in Ubuntu

I have created a bond using eth0 and eth1, and we should get an IP address from DHCP. The bonding mode is balance-rr. But why am I not getting an IP address? I did check the option “Enable this network adapter to be part of a team in the guest operating system” on both member vNICs.
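
Since you cannot copy-paste from a screenshot, here is a minimal netplan sketch along the same lines. The file name and renderer are assumptions; eth0, eth1, DHCP on the bond, and balance-rr are the values from my lab:

    # /etc/netplan/60-bond.yaml - minimal bond sketch (file name is an example)
    network:
      version: 2
      renderer: networkd
      ethernets:
        eth0:
          dhcp4: no
        eth1:
          dhcp4: no
      bonds:
        bond0:
          interfaces: [eth0, eth1]
          dhcp4: yes
          parameters:
            mode: balance-rr

You would then apply it with sudo netplan apply and check the result with ip -br link.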

Well, let’s look at the NIC interfaces and the bond. There we see something interesting.

Note that the bond and its member interfaces have the same MAC address, and that this address does not come from the Hyper-V host pool

Note that the bond has a MAC address that is the same as that of both member interfaces. Also note that this MAC address does not come from the Hyper-V host MAC address pool and is not what Hyper-V assigned to the vNICs, as you can see in the image below! That is the big secret.
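
You can check this for yourself as well. A quick way to compare both sides, with the VM name as an example:

    # On the Hyper-V host: the MAC addresses Hyper-V assigned to the member vNICs.
    Get-VMNetworkAdapter -VMName "LabGuest" | Format-Table Name, SwitchName, MacAddress

    # Inside the Ubuntu guest, compare with what the bond and its members report:
    #   ip -br link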

With MAC addresses unknown to the hypervisor, this smells of something that requires MAC spoofing, doesn’t it? So, I enabled it, and guess what? Bingo!
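
Enabling it is a checkbox per vNIC under the adapter’s advanced features, or, if you prefer PowerShell on the host, something like this (VM and adapter names are examples again):

    # Enable MAC address spoofing on both member vNICs of the Ubuntu guest bond.
    Get-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-A" |
        Set-VMNetworkAdapter -MacAddressSpoofing On
    Get-VMNetworkAdapter -VMName "LabGuest" -Name "LAN-B" |
        Set-VMNetworkAdapter -MacAddressSpoofing On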

So what is the difference with Windows when configuring an interface bond in an Ubuntu Hyper-V guest?

The difference with Windows is that an interface bond in an Ubuntu Hyper-V guest requires MAC address spoofing. You have to enable MAC spoofing on both vNICs that are members of the Ubuntu virtual machine’s bond. The moment you do that, you will see that you get a DHCP address on the bond and have network connectivity. But why is this needed? In Ubuntu (or Linux in general), the bond interface and its members get a generated MAC address assigned; the bond does not take over one of the MAC addresses of the member vNICs. So, we need MAC spoofing enabled on both member vNICs in the Hyper-V settings for this to work! In a Windows guest, the LBFO team gets one of the MAC addresses of its member vNICs assigned, so it does not require MAC spoofing.

With Ubuntu (Linux), you don’t even have to check “Enable this network adapter to be part of a team in the guest operating system” on the member vNICs. Note that a guest Linux bond does not need every member interface on a separate vSwitch for failover to work, not even if you enable “Enable this network adapter to be part of a team in the guest operating system.” However, putting all members on the same vSwitch is still ill-advised when you want real redundancy and failover.

A VM that would not route

This blog post addresses a troubleshooting exercise with a VM (virtual machine) that would not route. As it turned out, it had a default gateway of 0.0.0.0 set next to the actual gateway IP address. The VM still did its job, as the workload it serves is in the same subnet as its clients and, as it happens, in the same subnet as the DC and DNS. This meant it did not lose its trust with Active Directory.

But the admins could not RDP into that VM, nor would it update. As it did its job, many months went by until it fell too far behind on updates and they could not ignore it anymore. That’s how things go in real life.

Finding & fixing the issue

Superficially, the configuration of the VM was totally OK. The gateway for the NIC was correct.

Under Advanced we see no other entries that would cause any issues.

But we could not deny that we had a VM that would not route at hand. Let’s figure this out and fix it.

So what does one do? If you don’t trust the CLI, check the GUI, and if you don’t trust the GUI, check the CLI. As everything seemed fine in the GUI, we checked via the CLI. Name resolution worked fine, both internally and externally, when checking with nslookup. But actually getting anywhere outside the local subnet was not successful. Naturally, I did check whether any forward proxy was in play, but that was not the case, and this was an issue for more than just HTTP(S).
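
For what it is worth, the quick checks in PowerShell looked something like the sketch below; the host names and the WSUS port are placeholders:

    # Name resolution works, internally and externally, so DNS is not the problem.
    Resolve-DnsName dc01.contoso.local
    Resolve-DnsName www.bing.com

    # But anything outside the local subnet fails, which points at routing.
    Test-NetConnection 8.8.8.8 -TraceRoute
    Test-NetConnection wsus.contoso.local -Port 8530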

When I ran ipconfig /all, I quickly saw the culprit: a Default Gateway entry pointing to 0.0.0.0 next to the one for the actual gateway!

So where does that come from? Not from the GUI settings, as far as we can see. So I ran route print, and that showed us the root cause.
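
If you want to look at just the persistent routes, which is where this entry lived, Get-NetRoute can filter on the persistent store:

    # Show persistent default routes (what route print lists under "Persistent
    # Routes"); the bogus 0.0.0.0 gateway shows up here and not in the GUI.
    Get-NetRoute -DestinationPrefix "0.0.0.0/0" -PolicyStore PersistentStore |
        Format-Table DestinationPrefix, NextHop, InterfaceIndex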

So we needed to drop the route that sends traffic for 0.0.0.0 to the VM’s own IP address as the gateway. They had missed this because it does not show up in the GUI at all.

I dropped all persistent routes for 0.0.0.0 via route delete 0.0.0.0 mask 0.0.0.0 and checked that this deleted all persistent routes via route print.

At that moment, routing does not work at all, and we need to add the gateway back to the NIC. You can use the GUI or route add -p 0.0.0.0 MASK 0.0.0.0 192.168.2.7 IF 9. Once I did that, things lit up: the VM could download and install updates from the WSUS server, and the admins had remote desktop access again. In other words, routing worked again.
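
Putting the whole fix together, from an elevated prompt. The gateway address 192.168.2.7 and interface index 9 are this VM’s values, so substitute your own:

    # 1. Show the routing table, including the persistent routes section.
    route print

    # 2. Drop all (persistent) default routes, including the bogus 0.0.0.0 gateway.
    route delete 0.0.0.0 mask 0.0.0.0

    # 3. Add the real default gateway back as a persistent route.
    route add -p 0.0.0.0 mask 0.0.0.0 192.168.2.7 if 9

    # 4. Verify.
    route print
    ipconfig /all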

How did it happen? Ah, somewhere, somehow, someone added that route. I am not paid to do archaeology or forensics in this case, so I did not try to find out the what, when, and why. But my guess is that the VM had another NIC at some point with those settings, and that it was removed in the Hyper-V settings without cleaning up, leaving that entry behind in the routing table and a 0.0.0.0 gateway on the NIC that is only visible via ipconfig /all. Or someone tried to add a gateway manually to address this or another issue.

A final note

When faced with this issue, some folks on the internet will tell you to reset the TCP/IP stack and Winsock with netsh, or to add a new NIC with a new IP (static or via DHCP) and dump the old one. But that is all a bit drastic. Find the root cause and try to fix that first. You can try drastic measures when all else fails.

SMB over QUIC POC

I have had the distinct pleasure of being one of the first people to implement an SMB over QUIC POC. It was a proof of concept I did with Windows Server 2022 Azure Edition while it was in public preview.

That was a fun and educational exercise, and I learned a lot. As a result, I decided to write a lab and test guide, primarily for my own reference, but also to share my experience with others.

So happy I did this POC, and very happy with the results!

You can read the lab guide in a two-part series of articles: SMB over QUIC: How to use it – Part I | StarWind Blog (starwindsoftware.com) and SMB over QUIC Testing Guide – Part II | StarWind Blog (starwindsoftware.com).

I am convinced it will fill a need for people who require remote access to SMB file shares without a VPN. Next to that, the integration with the KDC proxy service makes it a Kerberos-integrated solution. In addition, the KDC proxy service has the added benefit of allowing remote password changes.
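
To give you an idea of how transparent this is for users, mapping a share over QUIC from a Windows 11 client can look like the sketch below; the server FQDN, share name, and drive letter are placeholders:

    # Map a drive to an SMB share over QUIC instead of TCP/445.
    # Server FQDN, share name, and drive letter are placeholders.
    New-SmbMapping -LocalPath "Q:" -RemotePath "\\fs01.demolab.com\data" -TransportType QUIC

    # Verify the mapping.
    Get-SmbMapping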

If you need to get up to speed on what SMB over QUIC is all about, I refer you to my article SMB over QUIC Technology | StarWind Blog (starwindsoftware.com). I’m sure that will do the trick.

Finally, I hope you will find these articles useful. I’m pretty sure they will help you with your own SMB over QUIC POC and testing.

Thank you for reading!

Microsoft and QUIC

If you are interested in Microsoft and QUIC, I have some good news for you. Recently, a new article, SMB over QUIC Technology | StarWind Blog (starwindsoftware.com), was published. It is the first in a series about what Microsoft is working on in regard to QUIC. While not without some controversy, QUIC does a lot to address a number of issues that connectivity over “the internet at large” has been dealing with:

  • It leverages UDP.
  • TLS 1.3 is built into the protocol.
  • Reduces RTT during connection & encryption setup.
  • Handles and optimizes flow control and loss recovery.
Figure: QUIC reduces the round trips during the TLS handshake significantly

Over the internet, with mobile clients, this is a big deal. Since it is secure by default, people really started thinking about where it can be used to improve things for all involved.

I think QUIC is going to become more and more important in the future, and this article positions QUIC in the Microsoft ecosystem. So, head over there, read it, and let me know what you think.

TLS 1.3, QUIC, HTTP/3, and SMB 3.1.1 are shaking things up a bit by challenging TCP. Microsoft dropped QUIC into Windows Server 2022 Azure Edition, which went into public preview last week, and I dove into the lab to figure out what I can do with it.

As a technologist, I am having a lot of fun testing this out in the lab. Last weekend I was busy with SMB over QUIC and QUIC in IIS. I learned a lot and have made up my mind I can use this in the real world to solve requirements. I will share my findings and musing with you in near the future. But today, start with an introduction in SMB over QUIC Technology | StarWind Blog (starwindsoftware.com).