The Perfect Storm of Azure DNS resolver, a custom DNS resolver, and DNS configuration ambiguities

TL:DR

The very strict Azure recursive DNS resolver, when combined with a Custom DNS resolver, can cause a timeout-sensitive application to experience service disruption due to ambiguities in third-party DNS NS delegation configurations.

Disclaimer

I am using fantasy FQDNs and made-up IP addresses here. Not the real ones involved in the issue.

Introduction

Services offered by a GIS-driven business noticed a timeout issue. Upon investigation, this was believed to be a DNS issue. That was indeed the case, but not due to a network or DNS infrastructure error, let alone a gross misconfiguration.

The Azure platform DNS resolver (168.63.129.16) is a high-speed and very strict resolver. While it can return the IP information, it does indicate a server error.

nslookup pubdata.coast.be

Server:         127.0.0.11

Address:        127.0.0.11#53

Non-authoritative answer:

pubdata.coast.be canonical name = www.coast.be.

Name:   www.coast.be

Address: 154.152.150.211

Name:   www.coast.be

Address: 185.183.181.211

** server can’t find www.coast.be: SERVFAIL

Azure handles this by responding fast and reporting the issue. The Custom DNS service, which provides DNS name resolution for the service by forwarding recursive queries to the Azure DNS resolver, also reports the same problem. However, it does not do this as fast as Azure. Here, it takes 8 seconds (Recursive Query Timeout value), potentially 4 seconds longer due to the additional timeout value. So, while DNS works, something is wrong, and the extra time before the timeout occurs causes service issues.

When first asked to help out, my first questions were if it had ever worked and if anything had changed. The next question was whether they had any control over the time-out period to adjust it upward, which would enable the service to function correctly. The latter was not possible or easy, so they came to me for troubleshooting and a potential workaround or fix.

So I dove in with the tools of the trade. nslookup, Nameresolver, Dig, https://dnssec-analyzer.verisignlabs.com/, and https://dnsviz.net/. The usual suspects were DNSSEC and zone delegation mismatches.

First, I run:

Nslookup -debug pubdata.coast.be

In the output, we find:

Non-authoritative answer:

Name:    www.coast.be

Addresses:  154.152.150.211

          185.183.181.211

Aliases:  pubdata.coast.be

We learn that pubdata.coast.be is a CNAME for www.coast.be. Let’s see if any CNAME delegation or DNSSEC issues are in play. Run:

dig +trace pubdata.coast.be

;; global options: +cmd

.                       510069  IN      NS      a.root-servers.net.

.                       510069  IN      NS      b.root-servers.net.

..

.

.                       510069  IN      NS      l.root-servers.net.

.                       510069  IN      NS      m.root-servers.net.

.                       510069  IN      RRSIG   NS 8 0 518400 20250807170000 20250725160000 46441 . <RRSIG_DATA_ANONYMIZED>

;; Received 525 bytes from 1.1.1.1#53(1.1.1.1) in 11 ms

be.                     172800  IN      NS      d.nsset.be.

..

.

be.                     172800  IN      NS      y.nsset.be.

be.                     86400   IN      DS      52756 8 2 <DS_HASH_ANONYMIZED>

be.                     86400   IN      RRSIG   DS 8 1 86400 20250808050000 20250726040000 46441 . <RRSIG_DATA_ANONYMIZED>

;; Received 753 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms

coast.be.           86400   IN      NS      ns1.corpinfra.be.

coast.be.           86400   IN      NS      ns2.corpinfra.be.

<hash1>.be.             600     IN      NSEC3   1 1 0 – <next-hash1> NS SOA RRSIG DNSKEY NSEC3PARAM

<hash1>.be.             600     IN      RRSIG   NSEC3 8 2 600 20250813002955 20250722120003 62188 be. <RRSIG_DATA_ANONYMIZED>

<hash2>.be.             600     IN      NSEC3   1 1 0 – <next-hash2> NS DS RRSIG

<hash2>.be.             600     IN      RRSIG   NSEC3 8 2 600 20250816062813 20250724154732 62188 be. <RRSIG_DATA_ANONYMIZED>

;; Received 610 bytes from 194.0.37.1#53(b.nsset.be) in 10 ms

pubdata.coast.be. 3600   IN      CNAME   www.coast.be.

www.coast.be.       3600    IN      NS      dns-lb1.corpinfra.be.

www.coast.be.       3600    IN      NS      dns-lb2.corpinfra.be.

;; Received 151 bytes from 185.183.181.135#53(ns1.corpinfra.be) in 12 ms

The DNSSEC configuration is not the issue, as the signatures and DS records appear to be correct. So, the delegation inconsistency is what causes the SERVFAIL, and the duration of the custom DNS servers’ recursive query timeout causes the service issues.

The real trouble is here:

pubdata.coast.be. 3600 IN CNAME www.coast.be

www.coast.be.        3600 IN NS dns-lb1.corpinfra.be.

This means pubdata.coast.be is a CNAME to www.coast.be. But www.coast.be is served by different nameservers than the parent zone (coast.be uses ns1/ns2.corpinfra.be). This creates a delegation inconsistency:

The resolver must follow the CNAME and query a different set of nameservers. If those nameservers don’t respond authoritatively or quickly enough, or if glue records are missing, resolution may fail.

Strict resolvers (such as Azure DNS) may treat this as a lame delegation or a broken chain, even if DNSSEC is technically valid.

Workarounds

I have already mentioned that fixing the issue in the service configuration setting was not on the table, so what else do we have to work with?

  • A quick workaround is to use the Azure platform DNS resolver (168.63.129.16) directly, which, due to its speed, avoids the additional time required for finalizing the query. However, due to DNS requirements, this workaround is not always an option.
  • The other one is to reduce the recursive query timeouts and additional timeout values on the custom DNS solution. This is what we did. The timeout value is now 2 (default is 8), and the additional timeout value is now 2 (default is 4). That is what I did to resolve the issue as soon as possible. Monitor this to ensure that no other problems arise after taking this action.
  • Third, we could conditionally forward coast.be to the dns-lb1.corpinfra.be and dns-lb2.corpinfra.be NS servers. That works, but it requires maintenance when those name servers change, so we need to keep an eye on that. We already have enough work.
  • A fourth workaround is to provide an IP address from a custom DNS query in the source code to a public DNS server, such as 1.1.1.1 or 8.8.8.8, when accessing the pubdata.coast.be FQDN is involved. This is tedious and not desirable.
  • The most elegant solution would be to address the DNS configuration Azure has an issue with. That is out of our hands, but it can be requested from the responsible parties. For that purpose, you will find the details of our findings.

Issue Summary

The .be zone delegates coast.be to the NS servers:

dns-lb1.corpinfra.be

dns-lb2.corpinfra.be

However, the coast.be zone itself lists different NS servers:

ns1.corpinfra.be

ns2.corpinfra.be

This discrepancy between the delegation NS records in .be and the authoritative NS records inside the coast.be zone is a violation of DNS consistency rules.

Some DNS resolvers, especially those performing strict DNSSEC and delegation consistency checks, such as Azure Native DNS resolver, interpret this as a misconfiguration and return SERVFAIL errors. This happens even when the IP address(es) for pubdata.coast.be can indeed be resolved.

Other resolvers (e.g., Google Public DNS, Cloudflare) may be more tolerant and return valid answers despite the mismatch, without mentioning any issue.

Why could this be a problem?

DNS relies on consistent delegation to ensure:

  • Security
  • Data integrity
  • Reliable resolution

When delegation NS records and authoritative NS records differ, recursive resolvers become uncertain about the actual authoritative servers.

This uncertainty often triggers a SERVFAIL to avoid possibly returning stale or malicious data. When NS records differ between parent and child zones, resolvers may reject responses to prevent the use of stale or spoofed data.

Overview

Zone LevelNS RecordsNotes
.be (parent)dns-lb1.corpinfra.be, dns-lb2.corpinfra.beDelegation NS for coast.be
coast.bens1.corpinfra.be, ns2.corpinfra.beAuthoritative NS for zone

Corpinfra.be (see https://www.dnsbelgium.be/nl/whois/info/corpinfra.be/details) – this is an example, the domain is fictitious – operates all four NS servers that resolve to IPs in the same subnet, but the naming inconsistency causes delegation mismatches.

Recommended Fixes

Option 1: Update coast.be zone NS records to match the delegation NS

Add dns-lb1.corpinfra.be and dns-lb2.corpinfra.be as NS records in the coast.be zone alongside existing ones (ns1 and ns2), so the zone’s NS RRset matches the delegation.

coast.be.   IN  NS  ns1.corpinfra.be.

coast.be.   IN  NS  ns2.corpinfra.be.

coast.be.   IN  NS  dns-lb1.corpinfra.be.

coast.be.   IN  NS  dns-lb2.corpinfra.be.

Option 2: Update .be zone delegation NS records to match the zone’s NS records

Change the delegation NS records in .be zone to use only:

ns1.corpinfra.be

ns2.corpinfra.be

remove dns-lb1.corpinfra.be and dns-lb2.corpinfra.be

Option 3: Align both the .be zone delegation and coast.be NS records to a consistent unified set

Either only use ns1.corpinfra.be abd ns2.corpinfra.be for both the delegation and authoritative zone NS records, or only use dns-lb1.corpinfra.be and dns-lb2.corpinfra.be for both. Or use all of them; three or more geographically dispersed DNS servers are recommended anyway. Depends on who owns and manages the zone.

What to choose?

OptionDescriptionProsCons
1Add dns-lb1 and dns-lb2 to the zone fileQuick fix, minimal disruptionMaybe the zones are managed by <> entities
2Update .be delegation to match zone NS (ns1, ns2)Clean and consistentRequires coordination with DNS Belgium
3Unify both delegation and zone NS recordsMost elegantRequires a full agreement between all parties

All three options are valid, but Option 3 is the most elegant and future-proof. That said, this is a valid configuration as is, and one might argue that Azure’s DNS resolver’s strictness is the cause of the issue. Sure, but in a world where DNSSEC is growing in importance, such strictness might become more common? Additionally, if the service configuration could handle a longer timeout, that would also address this issue. However, that is outside my area of responsibility.

Simulation: Resolver Behavior

ResolverBehavior with MismatchNotes
Azure DNS resolverSERVFAILStrict DNSSEC & delegation checks
Google Public DNSResolves normallyTolerant of NS mismatches
Cloudflare DNSResolves normallyIgnores delegation inconsistencies
Unbound (default)May varyDepends on configuration flags
Bind (strict mode)SERVFAILEnforces delegation consistency

Notes

  • No glue records are needed for coast.be, because the NS records point to a different domain (corpinfra.be), so-called out-of-bailiwick name servers, and .be correctly delegates using standard NS records.
  • After changes, flush DNS caches

Conclusion

When wading through the RFC we can summarize the findings as below

RFC Summary: Parent vs. Child NS Record Consistency

RFCSectionPosition on NS MatchingKey Takeaway
RFC 1034§4.2.2No mandate on matchingDescribes resolver traversal and authoritative zones, not strict delegation consistency
RFC 1034§6.1 & §6.2No strict matching ruleDiscusses glue records and zone cuts, but doesn’t say they must be aligned
RFC 2181§5.4.1Explicit: child may differParents’ NS records are not authoritative for the child; the child can define its own set.
RFC 4035§2.3DNSSEC implicationsMismatched NS sets can cause issues with DNSSEC validation if not carefully managed.
RFC 7719GlossaryReinforces delegation logicClarifies that delegation does not imply complete control or authority over the child zone

In a nutshell, RFC 2181 Section 5.4.1 is explicit: the NS records in a parent zone are authoritative only for that parent, not for the child. That means the child zone can legally publish entirely different NS records, and the RFC allows it. So, why is there an issue with some DNS resolvers, such as Azure?

Azure DNS “Soft” Enforces Parent-Child NS Matching

Azure DNS resolvers implement strict DNS validation behavior, which aligns with principles of security, reliability, and operational best practice, not just the letter of the RFC. This is a soft enforcement; the name resolution does not fail.

Why

1. Defense Against Misconfigurations and Spoofing

Mismatched NS records can indicate stale or hijacked delegations.

Azure treats mismatches as potential risks, especially in DNSSEC-enabled zones, and returns SERVFAIL to warn about potential spoofed responses, but does not fail the name resolution.

2. DNSSEC Integrity

DNSSEC depends on a trusted chain of delegation.

If the parent refers to NS records that don’t align with the signed child zone, validation can’t proceed.

Azure prioritizes integrity over leniency, which is why there is stricter enforcement.

3. Predictable Behavior for Enterprise Networks

In large infrastructures (like hybrid networks or private resolvers), predictable resolution is critical.

Azure’s strict policy ensures that DNS resolution failures are intentional and traceable, not silent or inconsistent like in looser implementations.

4. Internal Resolver Design

Azure resolvers often rely on cached referral points.

When those referrals don’t match authoritative data at the zone apex, Azure assumes the delegation is unreliable or misconfigured and aborts resolution.

Post Mortem summary

Azure DNS resolvers enforce delegation consistency by returning a SERVFAIL error when parent-child NS records mismatch, thereby signaling resolution failure rather than silently continuing or aborting. While RFC 2181 §5.4.1 allows child zones to publish different NS sets than the parent, Azure chooses to explicitly flag inconsistencies to uphold DNSSEC integrity and minimize misconfiguration risks. This deliberate error response enhances reliability in enterprise environments, ensuring resolution failures are visible, traceable, and consistent with secure design principles.

This was a perfect storm. A too-tight timeout setting in the service (which I do not control), combined with the Azure DNS resolvers’ rigorous behavior, which is fronted by a custom DNS solution required to serve all possible DNS needs in the environment, results in longer times for recursive DNS resolution that finally tripped up the calling service.

Connect to an Azure VM via Bastion with native RDP using only Azure PowerShell

Connect to an Azure VM via Bastion with native RDP using only Azure PowerShell

To connect to an Azure VM via Bastion with native RDP using only RDP requires a custom solution. By default, the user must leverage Azure CLI. It also requires the user to know the Bastion subscription and the resource ID of the virtual machine. That’s all fine for an IT Pro or developer, but it is a bit much to handle for a knowledge worker.

That is why I wanted to automate things for those users and hide that complexity away from the users. One requirement was to ensure the solution would work on a Windows Client on which the user has no administrative rights. So that is why, for those use cases, I wrote a PowerShell script that takes care of everything for an end user. Hence, we chose to leverage the Azure PowerShell modules. These can be installed for the current user without administrative rights if needed. Great idea, but that left us with two challenges to deal with. These I will discuss below.

A custom PowerShell Script

The user must have the right to connect to their Virtual Machine in Azure over the (central) bastion deployment. These are listed below. See Connect to a VM using Bastion – Windows native client for more information.

  • Reader role on the virtual machine.
  • Reader role on the NIC with private IP of the virtual machine.
  • Reader role on the Azure Bastion resource.
  • Optionally, the Virtual Machine Administrator Login or Virtual Machine User Login role

When this is OK, this script generates an RDP file for them on the desktop. That script also launches the RDP session for them, to which they need to authenticate via Azure MFA to the Bastion host and via their VM credentials to the virtual machine. The script removes the RDP files after they close the RDP session. The complete sample code can be found here on GitHub.

I don’t want to rely on Azure CLI

Microsoft uses Azure CLI to connect to an Azure VM via Bastion with native RDP. We do not control what gets installed on those clients. If an installation requires administrative rights, that can be an issue. There are tricks with Python to get Azure CLI installed for a user, but again, we are dealing with no technical profiles here.

So, is there a way to get around the requirement to use Azure CLI? Yes, there is! Let’s dive into the AZ CLI code and see what they do there. As it turns out, it is all Python! We need to dive into the extension for Bastion, and after sniffing around and wrapping my brain around it, I conclude that these lines contain the magic needed to create a PowerShell-only solution.

See for yourself overhere: azure-cli-extensions/src/bastion/azext_bastion/custom.py at d3bc6dc03bb8e9d42df8c70334b2d7e9a2e38db0 · Azure/azure-cli-extensions · GitHub

In PowerShell, that translates into the code below. One thing to note is that if this code is to work with PowerShell for Windows, we cannot use “keep-alive” for the connection setting. PowerShell core does support this setting. The latter is not installed by default.

# Connect & authenticate to the correct tenant and to the Bastion subscription
Connect-AzAccount -Tenant $TenantId -Subscription $BastionSubscriptionId | Out-Null

 #Grab the Azure Access token
    $AccessToken = (Get-AzAccessToken).Token
    If (!([string]::IsNullOrEmpty($AccessToken))) {
        #Grab your centralized bastion host
        try {
            $Bastion = Get-AzBastion -ResourceGroupName $BastionResoureGroup -Name $BastionHostName
            if ($Null -ne $Bastion ) {
                write-host -ForegroundColor Cyan "Connected to Bastion $($Bastion.Name)"
                write-host -ForegroundColor yellow "Generating RDP file for you to desktop..."
                $target_resource_id = $VmResourceId
                $enable_mfa = "true" #"true"
                $bastion_endpoint = $Bastion.DnsName
                $resource_port = "3389"

                $url = "https://$($bastion_endpoint)/api/rdpfile?resourceId=$($target_resource_id)&format=rdp&rdpport=$($resource_port)&enablerdsaad=$($enable_mfa)"

                $headers = @{
                    "Authorization"   = "Bearer $($AccessToken)"
                    "Accept"          = "*/*"
                    "Accept-Encoding" = "gzip, deflate, br"
                    #"Connection" = "keep-alive" #keep-alive and close not supported with PoSh 5.1 
                    "Content-Type"    = "application/json"
                }

                $DesktopPath = [Environment]::GetFolderPath("Desktop")
                $DateStamp = Get-Date -Format yyyy-MM-dd
                $TimeStamp = Get-Date -Format HHmmss
                $DateAndTimeStamp = $DateStamp + '@' + $TimeStamp 
                $RdpPathAndFileName = "$DesktopPath\$AzureVmName-$DateAndTimeStamp.rdp"
                $progressPreference = 'SilentlyContinue'
            }
            else {
                write-host -ForegroundColor Red  "We could not connect to the Azure bastion host"
            }
        }
        catch {
            <#Do this if a terminating exception happens#>
        }
        finally {
            <#Do this after the try block regardless of whether an exception occurred or not#>
        }

Finding the resource id for the Azure VM by looping through subscriptions is slow

As I build a solution for a Windows client, I am not considering leveraging a tunnel connection (see Connect to a VM using Bastion – Windows native client). I “merely” want to create a functional RDP file the user can leverage to connect to an Azure VM via Bastion with native RDP.

Therefore, to make life as easy as possible for the user, we want to hide any complexity for them as much as possible. Hence, I can only expect them to know the virtual machine’s name in Azure. And if required, we can even put that in the script for them.

But no matter what, we need to find the virtual machine’s resource ID.

Azure Graph to the rescue! We can leverage the code below, and even when you have to search in hundreds of subscriptions, it is way more performant than Azure PowerShell’s Get-AzureVM, which needs to loop through all subscriptions. This leads to less waiting and a better experience for your users. The Az.ResourceGraph module can also be installed without administrative rights for the current users.

$VMToConnectTo = Search-AzGraph -Query "Resources | where type == 'microsoft.compute/virtualmachines' and name == '$AzureVmName'" -UseTenantScope

Note using -UseTenantScope, which ensures we search the entire tenant even if some filtering occurs.

Creating the RDP file to connect to an Azure Virtual Machine over the bastion host

Next, I create the RDP file via a web request, which writes the result to a file on the desktop from where we launch it, and the user can authenticate to the bastion host (with MFA) and then to the virtual machine with the appropriate credentials.

        try {
            $progressPreference =  'SilentlyContinue'
            Invoke-WebRequest $url -Method Get -Headers $headers -OutFile $RdpPathAndFileName -UseBasicParsing
            $progressPreference =  'Continue'

            if (Test-Path $RdpPathAndFileName -PathType leaf) {
                Start-Process $RdpPathAndFileName -Wait
                write-host -ForegroundColor magenta  "Deleting the RDP file after use."
                Remove-Item $RdpPathAndFileName
                write-host -ForegroundColor magenta  "Deleted $RdpPathAndFileName."
            }
            else {
                write-host -ForegroundColor Red  "The RDP file was not found on your desktop and, hence, could not be deleted."
            }
        }
        catch {
            write-host -ForegroundColor Red  "An error occurred during the creation of the RDP file."
            $Error[0]
        }
        finally {
            $progressPreference = 'Continue'
        }

Finally, when the user is done, the file is deleted. A new one will be created the next time the script is run. This protects against stale tokens and such.

Pretty it up for the user

I create a shortcut and rename it to something sensible for the user. Next, I changed the icon to the provided one, which helps visually identify the shortcut from any other Powershell script shortcut. They can copy that shortcut wherever suits them or pin it to the taskbar.

Connect to an Azure VM via native RDP using only Azure PowerShell

What is AzureArcSysTray.exe doing on my Windows Server?

Introduction to AzureArcSysTray.exe

After installing the October 2023 updates for Windows Server 2022, I noticed a new systray icon, AzureArcSysTray.exe.

What is AzureArcSysTray.exe doing on my Windows Server?

It encourages me to launch Azure Arc Setup.

AzureArcSysTray.exe

Which I hope takes a bit more planning than following a systray link. But that’s just me, an old-school IT Pro.

Get rid of the systray entry

Delete the AzureArcSysTray.exe value from the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run registry key. Well, use GPO or another form of automation to get this done whole sales. I used Computer Configuration GPO Preferences in the lab. See the screenshot below from my home lab. It is self-explanatory.

The added benefit of a GPO is that it will deal with it again if Microsoft pushes it again in the next update cycle.

Uninstall the feature

Using DISM or Server Manager, you can uninstall the feature altogether. Do note that this requires a reboot!

Disable-WindowsOptionalFeature -Online -FeatureName AzureArcSetup

Or

Note: This removes the systray exe and Azure Arc Setup. If someone already set it up and configured it, that is still there and needs more attention as the Azure Connect Machine Agent is up and running. Does anyone really onboard servers like this in Azure Arc?

Bad Timing

Well, this was one thing I could have done without on the day I was deploying the October 2023 updates expeditiously. Why expedited? Well, it was about 104 CVEs, of which 12 are critical Remote Code Execution issues, and 3 are ZeroDays) and a Hyper-V RCT fix that we have been waiting for (for the past five – yes, 5 – years). Needless to say, testing + rollout was swift. That AzureArcSysTray.exe delayed us, as we had to explain and mitigate it.

Is it documented?

Yes, it is, right here: https://learn.microsoft.com/en-us/azure/azure-arc/servers/onboard-windows-server. It was incomplete on Tuesday night, but they added to it quickly.

Judging from some social media, Reddit. Slack channels, not too many people were amused with all this. See https://www.reddit.com/r/sysadmin/comments/174ncwy/patch_tuesday_megathread_20231010/k4cdqsj/

https://www.reddit.com/r/sysadmin/comments/1763a7o/azure_arc/https://www.reddit.com/r/sysadmin/comments/1763a7o/azure_arc/

Conclusion

I had to explain what it was and our options to eliminate it, all while we were asked to deploy the updates as soon as possible. Finding AzureArcSysTray.exe Azure Arc Setup installed was not part of the plan late Tuesday night in the lab.

Please, Microsoft, don’t do this. We all know Azure Arc is high on all of Microsoft’s agenda. It is all the local Microsoft employees have been talking about for weeks now. We get it. But nagging us with systray icons is cheesy at best, very annoying, and, for many customers, nearly unacceptable.

ConvertFrom-Json is not serializable

Introduction

While writing Bicep recently, I was stumped by the fact that my deployment kept failing. I spent a lot of time troubleshooting many possible ideas on what might be causing this. As JSON is involved and I am far from a JSON syntax guru, I first focused on that. Later I moved to how I use JSON in Bicep and PowerShell before finally understanding the problem was due to the fact that ConvertFrom-Json is not serializable.

Parameters with Bicep

When deploying resources in Azure with Bicep, I always need to consider who has to deliver or maintain the code and the parameters. It has to be somewhat structured, readable, and understandable. It can’t be one gigantic listing that confuses people to the point they are lost. Simplicity and ease of use rule my actions here. I know when it comes to IaC, this can be a challenge. So, when it comes to parameters, what are our options here?

  • I avoid hard-coding parameters in Bicep. It’s OK for testing while writing the code, but beyond that, it is a bad idea for maintainability.
  • You can use parameter files. That is a considerable improvement, but it has its limitations.
  • I have chosen the path of leveraging PowerShell to create and maintain parameters and pass those via objects to the main bicep file for deployment. That is a flexible and maintainable approach. Sure, it is not perfect either, but neither am I.

Regarding Bicep and PowerShell, we can also put parameters in separate files and read those to create parameters. Whether this is a good idea depends on the situation. My rule of thumb is that it is worth doing when things become easier to read and maintain while reducing the places where you have to edit your IaC files. In the case of Azure Firewall Policy Rules Collection Groups, Rules collections, and Rules, it can make sense.

Bicep and JSON files

You can read file content in Bicep using. With the json() function, you can tell Bicep that this is JSON. So far, so good. The below is perfectly fine and works. We can loop through that variable in a resource deployment.

var firewallChildRGCs = [

    json(loadTextContent('./AFW/Policies/RGSsAfwChild01.json'))

    json(loadTextContent('./AFW/Policies/RGSsAfwChild02.json'))

    json(loadTextContent('./AFW/Policies/RGSsAfwChild03.json'))

]

However, I am not entirely happy with this. While I like it in some aspects, it conflicts with my desire not needing to edit a working Bicep file once it is in use. So what do I like about it?

It keeps Bicep clean and concise and limits the looping to iterate over the Rules Collection Groups, thus avoiding the nested looping for Rules collections and Rules. Why is that? Because I can do this

@batchSize(1)

resource firewallChildPolicyWEUColGroups 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2022-07-01' = [for (childrcg, index) in firewallChildRGCs: {

  parent: firewallChildPolicyWEU

  name: childrcg.name

  dependsOn: [firewallParentPolicyWEUColGroups]

  properties: childrcg.properties

}]

As you can see, I loop through the variable and pass the JSON into the properties. That way, I create all Rule Collections and Rules without needing to do any nested looping via “helper” modules to get this done.

The drawback, however, is that the loadTextContent function in Bicep cannot use dynamic parameters or variables. As a result, the paths to the files need to be hard coded into the Bicep file. That is something we want to avoid. But until that is possible, it is a hard restriction. That is because parameters are evaluated during runtime (bicep deployment), whereas loadTextContent in Bicep happens while compiling (bicep build). So, in contrast to the early previews of Bicep, where you “transpiled” the Bicep manually, it is now done for you automatically before the deployment. You think this can work, but it does not.

PowerShell and JSON files

As mentioned above, I chose to use PowerShell to create and maintain parameters, and I want to read my JSON files there. However, it prevents me from creating large, long, and complex to maintain PowerShell objects with nested arrays. Editing these is not straightforward for everyone. On top of that, it leads to the need for nested looping in Bicep via “helper” modules. While that works, and I use it, I find it more tedious with deeply nested structures and many parameters to supply. Hence I am splitting it out into easier-to-maintain separate JSON files.

Here is what I do in PowerShell to build my array to pass via an Object parameter. First, I read the JSON filers from my folder.

$ChildFilePath = "../bicep/nested/AfwChildPoliciesAndRules/*"
$Files = Get-ChildItem -File $ChildFilePath -Include '*.json' -exclude 'DONOTUSE*.json'
$Files
$AfwChildCollectionGroupsValidate = @() # We use this with ConvertFrom-Json to validate that the JSON file is OK, but cannot use this to pass as a param to Bicep
    $AfwChildCollectionGroups = @()
    Foreach ($File in $Files) {
    try{
        $AfwChildCollectionGroupsValidate += (Get-Content $File.FullName -Raw) | ConvertFrom-Json
        # DO NOT PUT JSON in here - the PSCustomObject is not serializable and passing this param to Bicep will than be empty!
        $AfwChildCollectionGroups += (Get-Content $File.FullName -Raw) # A string is serializable!
        }
        Catch
        {
        write-host -ForegroundColor Red "ConvertFrom-Json threw and error. Check your JSON in the RCG/RC/R files"
        Exit
        }
    }

I can then use this to roll out the resources, as in the below example.

// Roll out the child Rule Collection Group(s)

var ChildRCGs = [for (rulecol, index) in firewallChildpolicy.RuleCollectionGroups: {

  name: json(rulecol).name

  properties: json(rulecol).properties

}]

Initially, the idea was that by using ConvertFrom-Json I would pass the JSON to Bicep as a parameter directly.

$AfwChildCollectionGroups += (Get-Content $File.FullName -Raw) | ConvertFrom-Json

So not only would I not need to load the files in Bicep with a hard-coded path, I would also not need to use json() function in Bicep.

// Roll out the child Rule Collection Group(s)
var ChildRCGs = [for (rulecol, index) in firewallChildpolicy.RuleCollectionGroups: {
  name: rulecol.name
  properties: rulecol.properties
}]

However, this failed on me time and time again with properties not being found and what not. Below is an example of such an error.

Line |
  30 |          New-AzResourceGroupDeployment @params -DeploymentDebugLogLeve …
     |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | 2:46:20 PM - Error: Code=InvalidTemplate; Message=Deployment template validation failed: 'The template variable 'ChildRCGs' is not valid: The language expression property 'name' doesn't exist, available properties are ''.. Please see    
     | https://aka.ms/arm-functions for usage details.'.

It did not make sense at all. That was until a dev buddy asked if the object was serializable at all. And guess what? ConvertFrom-Json creates a PSCustomObject that is NOT serializable.

You can check this quickly yourself.

((Get-Content $File.FullName -Raw) | ConvertFrom-Json).gettype().IsSerializable

Will print False

While

((Get-Content $File.FullName -Raw)).gettype().IsSerializable

Will print True

With some more testing and the use of outputs, I can even visualize that the parameter remained empty! The array contains three empty {} where I expected the JSON.

ConvertFrom-Json is not serializable

I usually do not have any issues with this in my pure PowerShell scripting. But here, I pass the object from PowerShell to Bicep, and guess what? For that to work, it has to be serializable. Now, when I do this, there are no warnings or errors. It just seems to work until you use the parameter and get errors that, at first, I did not understand. But the root cause is that in Bicep, the parameter remained empty. Needless to say, I wasted many hours trying to fix this before I finally understood the root cause!

As you can see in the code, I still use ConvertFrom-Json to test if my JSON files contain any errors, but I do not pass that JSON to Bicep as that will not work. So instead, I pass the string and still use the json() function in Bicep.

Hence, this blog post is to help others not make the mistake I made. It will also help me remember ConvertFrom-Json is not serializable.