A VM that would not route

A VM that would not route

This blog post will address a troubleshooting exercise with a VM (virtual machine) that would not route. As it turned out it had the default gateway set to 0.0.0.0 next to the actual gateway IP address. The VM did its job as the workload it serves is in the same subnet as the client, as it happens in the same subnet of the DC and DNS. This meant it did not lose its trust with Active Directory.

But the admins could not RDP into that VM, nor would it update, But as it did its job, many months went by until it fell too far behind in updates so they could not ignore it anymore. That’s how things go goes in real life.

Finding & fixing the issue

Superficially the configuration of the VM was totally OK. The gateway for the NIC is correct.

Under Advanced we see no other entries that would cause any issues.

But we could not deny that we had a VM that would not route at hand. Let’s figure this out and fix it.

So what does one do? If you don’t trust the CLI, check the GUI, and if you don’t trust the GUI check the CLI. As in the GUI, everything seemed fine we checked via the CLI. Name resolution worked fine, both internally and externally when checking this with nslookup. But actually getting anywhere not on the same subnet was not successful. Naturally, I did check if any forward proxy was in play but that was not the case and, this was an issue for more use cases than HTTP(S).

When I ran ipconfig /all I quickly saw the culprit. We have a Default Gateway entry pointing to 0.0.0.0 next to one for the actual gateway!

So where does that come from? Not from the GUI settings, that we can see. So I ran route print and that show us the root cause

So we needed to drop the route sending traffic for 0.0.0.0 to its own IP address as the gateway. They missed this as it does not show up in the GUI at all.

I dropped all persistent routes for 0.0.0.0 via route delete 0.0.0.0.0 mask 0.0.0.0. I check if this deleted all persistent routes via route print.

At that moment routing won’t work and we need to add the gateway back to the NIC. YOu can use the GUI or route add -p 0.0.0.0 MASK 0.0.0.0 192.168.2.7 IF 9 Once I did that things lit up. We could download and install updates from the WSUS server, they had remote desktop access again. Routing worked again in other words.

How did it happen? Ah, somewhere, somehow, someone added that route. I am not paid to do archeology or forensics in this case so, I did not try to find out the what, when, and why. But my guess is that VM had another NIC at a given time with those setting and they removed it from the Hyper-V setting without cleaning up, leading to that setting being left behind in the routing table leaving a gateway 0.0.0.0 on the NIC that is only visible in via ipconfig /all. Or they have tried to add a gateway manually to address this or another issue.

A final note

When faced with this issue, some folks on the internet will tell you to reset the TCP/IP stack and Winsock with netsh, or add a new NIC with a new IP (dynamically or via DHCP) and dump the old one. But this is all bit drastic. Check the root cause and try to fix that first. You can try drastic measures when all else fails.