MFA for a highly available RD Gateway
Recently I decided to write a couple of articles on how to set up MFA for a highly available RD Gateway. Why? Because so much of the information on the internet is fragmented and, as such, incomplete. So I wanted a reference document for myself. As I was making that document, I realized I needed to explain the why and not just the how. The “why” is what helps people support and troubleshoot the solution during its life cycle.
The above, in combination with me being a verbose son of *, led to 44 pages of information. So, I decided to publish it as a two-part article series.
You can find the articles here: “Transition a Highly Available RD Gateway to Use the NPS Extension for Azure MFA – Phase I” and “Transition a highly available RD Gateway to use the NPS Extension for Azure MFA – Phase II”.
Why and when should you read them?
If you have RD Gateway running and you have no MFA solution set up for it, I highly recommend you head over to read these two articles. That is especially true when your RD Gateway solution is a high availability (HA) deployment with an RD Gateway farm behind a load balancer. In that case, you want your MFA components to be HA as well! For some reason, so many guides on the internet ignore HA or brush over it very cavalierly. That is one thing I hope these two articles remediate.
Next to that, they contain many details on every aspect of the deployment to make sure you get it up and running successfully and correctly.
Finally, I present you with a collection of troubleshooting information and tools to help you figure out where the problem is so you can find a way to fix it.
That’s it. I really think it can help many of you out there. I hope it does.
Hello Didier, incidentally I was testing a similar scenario to the one you describe in these articles. I’ve followed the MS guide and yours, and while yours is a lot more detailed, I can’t get it to work (even without MFA – disabled in registry). I get audit failures on both RDGW and central NPS saying: “The Remote RADIUS server did not respond” with the ‘REQUESTS TOWARDS RD GATEWAY SERVERS’ CRP mentioned on the central NPS and ‘REQUESTS FROM CENTRAL NPS SERVERS’ mentioned on the RDGW.
I can only connect when I change Authentication in the ‘REQUESTS TOWARDS RD GATEWAY SERVERS’ to ‘Accept users without validating credentials’ (instead of ‘Forward requests …. to RDGWSERVERS’ group).
I monitored the communication with netmon and all I see on both ends (RDGW, NPS) are hundreds of Access-Request messages, but nothing else, no accept/deny.
Any idea what could be wrong there?
Hey there, a couple of pointers:
While testing, disable any network policies (CAPs in RD GW speak) on the RD GW; you really want the one on the NPS to be used.
That many messages indicates a loop of some kind. Try to swap the FROM and TOWARDS on the NPS server. It might be that some rule/condition is catching everything in TOWARDS … One trick is to add the NPS client IP address itself as a condition on the TOWARDS policy. That makes it more specific, so it will not catch everything when you put it first, or you can place the FROM first.
I’ve tried your suggestions, no change in the behavior though. Do you have any other tips? Also, there are no IAS logs being created on either RDGW/NPS, despite logging being enabled and accounting configured.
Hard to say from here. But the hundreds of messages indicate an issue where you never get an accept or deny message and the request keeps getting sent over and over again. So it looks like the TOWARDS from the RDGW reaches the NPS, where it is not caught by the FROM but instead gets sent back again. You could start by disabling the TOWARDS on the NPS server and test until you see the incoming message getting caught by the FROM condition. Play with the conditions: ease them so they match anything, then make them stricter until they only allow and catch the messages from the RDGW server(s). At least then you will know that works and the loop will be halted. When that works, continue with enabling the TOWARDS condition on the NPS server. Again, make it liberal so it catches anything, and from there make it stricter so it only sends the messages from the NPS server itself towards the RDGW servers.
Use all the tools mentioned to correlate data in event viewers and logs … If it is an HA setup, begin with a non-HA flow to make troubleshooting more straightforward.
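When correlating captures, it can help to tally the RADIUS packet types on each side: a loop shows up as many Access-Request packets with no Access-Accept or Access-Reject in return. The snippet below is a minimal sketch that reads the code byte from the RADIUS header (as defined in RFC 2865). It assumes you have already exported the raw UDP payloads from your capture tool; how you do that depends on the tool, and the function names here are made up for illustration.

```python
import struct
from collections import Counter
from typing import Iterable

# RADIUS packet codes from RFC 2865 / RFC 2866.
RADIUS_CODES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    4: "Accounting-Request",
    5: "Accounting-Response",
    11: "Access-Challenge",
}

# Code (1 byte) + Identifier (1) + Length (2) + Authenticator (16).
RADIUS_HEADER_LEN = 20


def radius_code_name(payload: bytes) -> str:
    """Return the RADIUS packet type name for one raw UDP payload."""
    if len(payload) < RADIUS_HEADER_LEN:
        raise ValueError("payload too short to be a RADIUS packet")
    code, _identifier, _length = struct.unpack("!BBH", payload[:4])
    return RADIUS_CODES.get(code, f"Unknown({code})")


def tally_codes(payloads: Iterable[bytes]) -> Counter:
    """Count how many packets of each RADIUS type appear in a capture."""
    return Counter(radius_code_name(p) for p in payloads)


def fake_packet(code: int, identifier: int = 0) -> bytes:
    """Build a header-only RADIUS packet for demonstration purposes."""
    return struct.pack("!BBH", code, identifier, RADIUS_HEADER_LEN) + bytes(16)


if __name__ == "__main__":
    # Three requests and no answer: the pattern described in this thread.
    looping = [fake_packet(1, i) for i in range(3)]
    print(tally_codes(looping))
```

If the tally on the NPS side contains only Access-Request entries, the request is never being answered there, which fits the loop described above.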
Thanks for the responses. I’ve reinstalled the RDGW server, tested that it works with local NPS, recreated the policies and switched to central NPS, and got somewhere: when I disable REQUESTS TOWARDS RD GATEWAY SERVERS on central NPS, or when I move that policy to be processed second, I can connect through the RDGW using RDP fine and I immediately get Access-Accept. When I enable that policy (and it’s first on the list), I’m stuck in the “initiating remote connection” in the RDP client, which keeps generating the Access-Request messages over and over until the RDP client times out (5 minutes), never getting accept/deny.
Why does the connection even work with that policy disabled? Also, while it works with that policy being processed second, you mention in your article that in that case the RD CAP isn’t enforced, so it’s undesirable.
There are a lot of permutations in play and not all situations will apply to everyone. The trick to make the order of FROM/TOWARDS not matter on the NPS server is to add the IP address of the NPS server itself as a condition to the TOWARDS policy, as “Access Client IPv4 Address”. When you do that, the order normally will not mess up the flow. You can try that out; the order then doesn’t matter, it can be first or second.
If it isn’t there, there can be a situation where you land on the RDGW and, if the connection policies there are still active, they can be processed. Or another active policy, third or fourth in the list, could allow entry without any device redirection being enforced, which can lead to redirection settings that differ from what you set on the NPS server. So, while that can happen, the statement is not always the case. It is meant as an example of what can go wrong with different orders and settings in the policies.
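To make the ordering point concrete, here is a toy model of first-match policy processing. It is only a sketch and not how NPS is implemented: the IP addresses are made up, the policy names are shortened from the ones in this thread, and the assumption that the FROM policy handles the request locally while TOWARDS forwards it is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical addresses, for illustration only.
NPS_IP = "10.0.0.10"
RDGW_IP = "10.0.0.20"


@dataclass
class Request:
    client_ip: str  # stands in for the "Access Client IPv4 Address" attribute


@dataclass
class Policy:
    name: str
    matches: Callable[[Request], bool]
    action: str  # "local" (handle here) or "forward" (send to the RDGW group)


def first_match(policies: List[Policy], request: Request) -> Optional[Policy]:
    """Return the first policy, in list order, whose condition matches."""
    for policy in policies:
        if policy.matches(request):
            return policy
    return None


def action_for(policies: List[Policy], request: Request) -> str:
    policy = first_match(policies, request)
    return policy.action if policy else "no match"


from_policy = Policy("FROM", lambda r: r.client_ip == RDGW_IP, "local")
towards_catch_all = Policy("TOWARDS (no IP condition)", lambda r: True, "forward")
towards_specific = Policy(
    "TOWARDS (NPS IP condition)", lambda r: r.client_ip == NPS_IP, "forward"
)

incoming = Request(client_ip=RDGW_IP)  # a request arriving from the RD Gateway

if __name__ == "__main__":
    # Catch-all TOWARDS first: the RDGW request is sent straight back (loop).
    print(action_for([towards_catch_all, from_policy], incoming))  # forward
    # With the NPS IP condition, the order no longer matters.
    print(action_for([towards_specific, from_policy], incoming))   # local
    print(action_for([from_policy, towards_specific], incoming))   # local
```

In this model, the catch-all TOWARDS policy placed first swallows the request coming from the RD Gateway and forwards it back, which mirrors the endless Access-Request pattern. Once the TOWARDS policy only matches the NPS server's own address, either order gives the same result.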
Thanks a lot for your pointers and help, Didier. Adding the NPS IP to the TOWARDS policy on the central NPS as you’ve described seems to have fixed the issue. Now the policy is first in the list and the connection still works. Device redirection limits seem to be working too.
This stuff reminded me why I never liked the NPS/RADIUS stuff in the first place 🙂
I hope I haven’t jinxed it.
Excellent, I am happy to read this works for you now. I will see if I can find the time to improve the article a bit to remove confusion/ambiguity.