Why You Need an Immutable Backup Repository, and Why Veeam’s Just Enough OS and VBR Appliance Are the Smart Choice

Introduction

Let’s be honest: if you’re still relying on traditional backup strategies without immutability in today’s threat landscape, you’re playing a game of Russian roulette. Ransomware isn’t just a buzzword. It’s a business model. Insider threats aren’t hypothetical; they’re happening. And when the proverbial shit hits the fan, your backups are either your lifeline or your liability. Then there are wipers, which exist to destroy your data and your business, nothing else. They don’t ask for ransom payments, blackmail you to stop them from exposing your confidential files, or threaten to harm your personnel physically. Destruction and mayhem are all they care about. You need protection!

So, how do you prepare for that? You need a hardened repository, providing immutability and protection from deletion! Not just any Windows or generic Linux box with some tweaks, but a purpose-built, security-first solution. And if you want to avoid reinventing the wheel while staying compliant and operationally sane, the Veeam Hardened Repository ISO or its successor, Veeam Just Enough OS (Veeam JEOS), is the recommended approach.

Now, while I focus on the why related to Veeam’s Hardened Repository ISO, it is worth noting that an immutable repository does not exist in isolation. The 3-2-1-1-0 rule, CPU core and memory sizing, redundancy, high availability, IOPS, throughput, and storage and networking capacity all matter! When it comes to the 3-2-1-1-0 rule, I have always stated that I don’t count the production workload as a copy. And that one immutable copy is a requirement I’m gradually tightening into zero non-immutable, deletable copies.

Additionally, hardening any role in your backup fabric is now a must. Everything is a target, including your employees, via social engineering.

Hardened Linux Repository with immutability

Using a Hardened Linux Repository with immutability should be mandatory. None of this is about being paranoid; it’s about being prepared. Sure, you can laugh at me and call it overkill or too expensive. Laughing is healthy, so keep doing that. But hear me out: it is not overkill, it is not more costly, and it is not even more cumbersome, except for the inevitable extra steps in a zero-trust workflow. There is even a bonus: when ransomware strikes, listening to me might keep that smile on your face!

You may have seen my blog post, Revised script for decrypting datacenter credentials from the Veeam Backup & Replication database | Working Hard In IT. That post does not mean Veeam or Windows cryptography implementations are inherently flawed; it highlights the inevitable consequences of having root access to your system. Hence, you can see why every server role in your Veeam backup environment must be hardened as much as possible. Veeam is therefore also providing a Veeam Software Appliance (VeeamSoftwareAppliance_13.0.0.12109.BETA2.iso).

When you build your own Veeam Hardened Linux Repository, you must take technical measures and establish a process flow to service genuine requests and protect against both external and internal malicious actions. All that is taken care of by the Veeam appliance approach. Not too shabby, not too shabby at all!

A hardened Linux repository is a tactical and strategic asset in a backup fabric. It gives you a fighting chance and serves as a Noah’s ark to start over from. Below, we will discuss why it should be a mandatory component in your architecture.

Immutability is essential

If your backups can be deleted, encrypted, or tampered with, you don’t have backups, but “hope”. You have a false sense of security. Immutability ensures that your backup data is locked down and protected from malware or rogue administrators.
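To make the guarantee concrete, here is a toy sketch (my own illustration, not Veeam’s implementation) of what an immutability window enforces: delete requests are simply refused until the retention clock expires, no matter who asks.

```python
# Toy model of an immutability window. In a real hardened repository this is
# enforced at the OS/filesystem level (e.g., the immutable file attribute),
# not by application code like this.
class ImmutableBackup:
    def __init__(self, name: str, immutable_until: float):
        self.name = name
        self.immutable_until = immutable_until  # epoch seconds

    def delete(self, now: float) -> bool:
        """Return True only when deletion is allowed. Ransomware or a rogue
        admin asking before the window expires simply gets refused."""
        return now >= self.immutable_until
```

The point is that, on a properly hardened repository, the refusal does not depend on credentials: within the window, delete and overwrite requests fail regardless of who issues them.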

Pre-Hardened OS

Security isn’t just about firewalls and antivirus. It’s about reducing your attack surface. A pre-hardened OS turns off unnecessary services, enforces strict access controls, and aligns with best practices from the outset. That means a lot of work and worrying that you don’t have to do.

STIG Compliance

Want to sleep better at night? Align with government-grade security standards. STIG compliance keeps your repository aligned with those baselines, and you can reference Veeam to support that claim when needed.

Ransomware Resilience

Ransomware loves backup data. It’s the first thing attackers go after. A hardened repository isolates your backups and enforces immutability, making it a fortress against encryption attempts.

Auditability & Compliance

GDPR, HIPAA, and ISO 27001 compliance isn’t optional. Hardened repositories support forensic analysis, secure logging, and system integrity checks. You’re not just protected; you can prove it to an auditor. Yes, compliance is a thing, and while the actual protection comes before compliance reports, we cannot ignore it.

Operational Stability

Misconfigurations are the silent killers of IT. A hardened repo minimizes that risk. With pre-applied security settings, even teams without deep Linux chops can deploy confidently.

Maintenance without effort

Security updates and patches? Streamlined. Veeam handles the OS and repo updates, so you don’t have to babysit your infrastructure. I still need to determine if the ISO can also handle firmware updates for you.

Insider Threat Mitigation

Not every threat comes from outside. Role-based access, BMC port protection, and restricted shell access help prevent internal sabotage, whether accidental or intentional.

Strategic Value

All the above is not just a technical and operational advantage. It’s a business win. A hardened repo ensures your backups are a reliable recovery point, even when everything else goes sideways. It is your Noah’s ark! And guess what? Have redundant arks! One is none, two is one 😉.

Why Veeam’s Just Enough OS ISO Is a Game-Changer

You could build your own hardened Linux repo. I’ve done it. It works. But it’s not for everyone. Veeam’s Hardened Repository ISO (VeeamHardenedRepository_2.0.0.8_20250117.iso) streamlines the process, automates the hardening, and provides a vendor-backed solution that’s ready for production.

The future Veeam Just Enough OS hardened repository (VeeamJEOS_13.0.0.12109.BETA2.iso) is well locked down, and privileged actions require security officer approval. While that is essential in a zero-trust world, it also means you must have your processes streamlined and communication lines open. When people need to reset a password or require root access for troubleshooting, they cannot wait until the next business day when the security officer is at work, let alone a week, because somebody has to bring it up at the weekly CISO approval board.

Look, you can roll your own hardened repository if you have the skills, time, and appetite for ongoing maintenance. I have done that and might still do so, depending on the environment and requirements. However, if you’re looking for a secure, compliant, and low-maintenance solution that works, the Veeam Hardened Repository ISO or its successor, Veeam Just Enough OS, is the answer. By starting to test these solutions today, you gain insight and experience, and you will be optimally prepared for when Veeam Backup & Replication v13 becomes available. The Veeam Hardened Repository ISO has experimental support, which allows its use in production environments. At the very least, you can store one backup copy on it today. If you are interested in this for future use, consider Veeam Just Enough OS (VeeamJEOS_13.0.0.12109.BETA2.iso) as part of the future v13 release. However, that one is not yet production-ready. But it won’t be long now, judging by Anton Gostev’s post on LinkedIn! At the time of writing, it should be less than a month away.

Conclusion

The above is not paranoia, and it’s not just about ticking boxes for compliance. It’s about building a backup strategy that survives real-world threats. And in that world, immutability isn’t optional. It’s your insurance policy. Look, I have seen the devastation ransomware causes. It is a horrible place to be. I don’t want you to be in that world of hurt. However, we cannot prevent it. You are a target, and you will get hit. It is a question of when, not if. So make sure you have the means to come out on top!

Revised script for decrypting datacenter credentials from the Veeam Backup & Replication database

Introduction

In a previous article (Protecting your Veeam Backup and Replication Server is critical | Working Hard In IT), I discussed my script for decrypting the datacenter credentials from the Veeam Backup & Replication database. Since then, that PowerShell code has been published dozens of times all over the internet in various articles.

However, three relevant things have changed since my original blog post:

  1. Veeam v12.1 introduced a new encryption method.
    Firstly, in Veeam 12.1, the method of encrypting passwords has changed. That means the old script no longer always works, as it only uses the legacy method.
  2. Veeam published its encryption and decryption methods.
    Secondly, Veeam has published the methods used to encrypt and decrypt passwords in the spirit of full disclosure and to preempt anyone who attempts to claim that Veeam is insecure. Those individuals or companies demonstrate only ignorance and malicious intentions. The good news is that the article has all the information we need to write a new script.
  3. Veeam now supports PostgreSQL, in addition to Microsoft SQL Server.
    Finally, Veeam now also supports PostgreSQL as a database, in addition to Microsoft SQL Server. That means we need to ensure that we can retrieve the necessary data from both database types.

Background Info & approach

I based the script on information found in the Veeam KB article “How to Recover Account Credentials From the Veeam Backup & Replication Database” (https://www.veeam.com/kb4349).

Instead of having two scripts, my old one and a newer one, I decided to create one that works on VBR v12 and lower, as well as on VBR 12.1 and higher.

What Changed in Encryption

Until version 12, Veeam used its internal .NET static method:

[Veeam.Backup.Common.ProtectedStorage]::GetLocalString($encryptedPassword)

That method leverages the native Microsoft Data Protection API (DPAPI) under the hood. It was part of Veeam.Backup.Common.dll and worked well up to version 12. In v12.1 and beyond, this method no longer exists. Instead, Veeam now calls DPAPI directly:

[System.Security.Cryptography.ProtectedData]::Unprotect($bytes, $salt, 'LocalMachine')

Since both leverage the native Microsoft Data Protection API, I figured I could also use the [System.Security.Cryptography.ProtectedData]::Unprotect static method to decrypt those legacy passwords as long as I don’t try to leverage the optionalEntropy parameter for them. The good news is that in the KB article, Veeam provides instructions on how to differentiate between the legacy and new types of password encryption. That allows me to write logic to determine the version and execute the corresponding decryption method accordingly.
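The dispatch logic can be sketched roughly as follows. This is my own illustration in Python, not the actual PowerShell script: `is_new_format` is a hypothetical stand-in for the format check that Veeam documents in KB4349.

```python
import base64

def build_unprotect_call(encrypted_b64: str, salt_b64: str, is_new_format: bool):
    """Return the (data, entropy, scope) triple that would be passed to
    [System.Security.Cryptography.ProtectedData]::Unprotect."""
    data = base64.b64decode(encrypted_b64)  # stored passwords are base64 blobs
    if is_new_format:
        # v12.1 and up: the registry encryption salt goes in as optionalEntropy
        return (data, base64.b64decode(salt_b64), "LocalMachine")
    # v12 and lower: legacy DPAPI blob, decrypted without optionalEntropy
    return (data, None, "LocalMachine")
```

Because both formats end up in the same Unprotect call, one script can serve both generations of VBR.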

By the way, once you update a password on v12.1 or up, it will be encrypted with the new method. As time passes and passwords get rotated, legacy encryption phases out.

The new script

I did not want to maintain two separate scripts, one for the legacy password decryption method and one for the newer one. That’s why I’ve consolidated everything into a single, unified PowerShell script. It supports:

  • VBR v10 through v12.3+, decrypting Veeam credentials from the registry and database
    • The Veeam Backup & Replication encryption salt lives in the registry at Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\Data
    • The Veeam database configuration lives in the registry at Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\DatabaseConfigurations\
  • Per-user counters and clean output formatting
  • MSSQL and PostgreSQL configurations
  • Multiple password formats:
    • ‘v12 and lower’
    • ‘v12.1 and up (with encryption salt)’
  • Optional filtering by username
  • Optional export to file (`Veeam_Credentials.txt` on Desktop)
  • Graceful error handling and informative console output

The script runs on Windows only, because DPAPI is a Windows-native feature. With VBR v13 introducing Linux-based deployments, this script won’t work in those environments. That’s a different challenge for another day.

Getting the script

You can find the script on GitHub at https://github.com/WorkingHardInIT/Revised-script-for-decrypting-datacenter-credentials-from-the-Veeam-Backup-Replication-database. You will also find the documentation there.

Why do I need this script?

The IT world, like everywhere else, is not a perfect place, and I need a way to deal with imperfection. It is that simple. If we are honest, we all know that IT environments aren’t always in pristine condition. Whether it’s a lab, a forgotten backup server, or an entire backup fabric for a production environment abandoned by a previous IT partner, credentials are often missing. Documentation is sparse. And when disaster strikes, you need access, fast.

My script has already helped IT teams recover access to critical systems when no one else could. I know because I’ve seen it happen. Before Veeam ever published its KB article, my original script was quietly saving the day in real-world scenarios.

Conclusion

Knowledge is power. And while power inherently allows abuse, hiding knowledge under the guise of “security” is just security theater. Security through obscurity is not security but window dressing.

That’s why I’m glad Veeam documented their credential encryption methods. It empowers administrators to recover access responsibly. And it exposes the charlatans who twist transparency into baseless accusations of insecurity. I just felt compelled to create a handy, functional script around it that I can use when needed.

If someone uses this information to claim Veeam is irresponsible, they could not be more wrong. They prove themselves to be untrustworthy. To me, they’ve lost their reputation and credibility.

This script isn’t about hacking. It’s about recovery, accountability, and clarity. And if it helps you regain control of your environment when all else fails, then it’s done its job.

The rejuvenated push for excellence by Veeam for Hyper-V customers

Introduction

As an observer of the changes in the hypervisor market in 2024 and 2025, you have undoubtedly noted considerable commotion and dissent in the market. I did not have to deal with it as I adopted and specialized in Hyper-V from day one. Even better, I am pleased to see that many more people now have the opportunity to experience Hyper-V and appreciate its benefits.

While the UI management is not as sleek and is more fragmented than that of some competitors, it offers all the necessary features for free. Additionally, PowerShell automation enables you to create any tooling you desire, tailored to your specific needs. Do that well, and you do not need System Center Virtual Machine Manager for added capabilities. Denying the technical capabilities and excellence of Hyper-V only diminishes the credibility and standing of those who do so in the community.

That has been my approach for many years, running mission-critical, real-time data-sensitive workloads on Hyper-V clusters. So yes, Microsoft could have managed the tooling experience a bit better, and that would have put them in an even better position to welcome converting customers. Despite that, adoption has been rising significantly over the last 18 months and not just in the SME market.

Commotion, fear, uncertainty, and doubt

The hypervisor world commotion has led to people looking at other hypervisors to support their business, either partially or wholesale. The moment you run workloads on a hypervisor, you must be able to protect, manage, move, and restore these workloads when the need to do so arises. Trust me, no matter how blessed you are, that moment comes to us all. The extent to which you can handle it, on a scale from minimal impact to severe impact, depends on the nature of the issue and your preparedness to address it.

A more diverse hypervisor landscape among customers means that data protection vendors need to support those hypervisors. I think that most people will recognize that developing high-quality software, managing its lifecycle, and supporting it in the real world requires significant investment. So then comes the question, which ones to support? What percentage of customers will go for hypervisor x versus y or z? I leave that challenge to people like Anton Gostev and his team of experts. What I can say is that Hyper-V has taken a significant leap in adoption, as it is a mature and capable platform built and supported by Microsoft.

The second rise of Hyper-V

Over the past 18 months, I have observed a significant increase in the adoption of Hyper-V. And why not? It is a mature and capable platform, and Microsoft’s backing makes moving to it a less stressful choice, as the ecosystem and community are large and well-established. I believe that Hyper-V is one of the primary beneficiaries of the hypervisor turmoil. Adoption is experiencing a second, significant rise. For Veeam, this was not a problem. They have provided excellent Hyper-V support for a long time, and I have been a pleased customer, building some of the best and most performant backup fabrics on our chosen hardware.

But who are those customers adopting Hyper-V? Are they small and medium businesses (SMEs) or managed service providers? Or is Hyper-V making headway with big corporate enterprises as well? Well, neither Microsoft nor Veeam shares such data with me. So, what do I do? Weak-to-strong signal intelligence! I observe what companies are doing and saying, in combination with what people ask me directly. That has me convinced that some larger corporations have made the move to Hyper-V. Some of the stronger signals came from Veeam.

Current and future Veeam Releases

Let’s look at the more recent releases of Veeam Backup & Replication. With version 12.3, support for Windows Server 2025 arrived very fast after the general availability of that OS. Hyper-V, by the way, is getting all the improvements and new capabilities just as much as Azure Local is. That indicates Microsoft’s interest in making Hyper-V an excellent option for any customer, regardless of how they choose to run it, be it on local storage, with shared storage, on Storage Spaces Direct (S2D), or Azure Local. That is a strong, positive signal compared to previous statements. Naturally, Hyper-V benefits from Veeam’s ongoing efforts to resolve issues, enhance features, and add capabilities, providing the best possible backup fabric for everyone. I will discuss that in later articles.

Now, the strong signal and very positive signal from Veeam regarding Hyper-V came with updates to Veeam Recovery Orchestrator. Firstly, Veeam Recovery Orchestrator 7.2 (released on February 18th, 2025) introduced support for Hyper-V environments. What does that tell me? The nature, size, and number of customers leveraging Hyper-V that need and are willing to pay for Veeam Recovery Orchestrator have grown to a point where Veeam is willing to invest in developing and supporting it. That is new! On the Product Update page, https://community.veeam.com/product-updates/veeam-recovery-orchestrator-7-2-9827, you can find more information. The one requirement that sticks out is the need for System Center Virtual Machine Manager. Look at these key considerations:

  • System Center Virtual Machine Manager (SCVMM) 2022 & CSV storage registered in SCVMM is supported.
  • Direct connections to Hyper-V hosts are not supported.

But not that much later, on July 9th, 2025, in Veeam Recovery Orchestrator 7.2.1 (see https://community.veeam.com/product-updates/veeam-recovery-orchestrator-7-2-1-10876), we find these significant enhancements:

  1. Support for Azure Local recovery target: You can now use Azure Local as a recovery target for both vSphere and Hyper-V workloads, expanding flexibility and cloud recovery options.
  2. Hyper-V direct-connected cluster support: Extended Hyper-V functionality enables support for direct-connected clusters, eliminating the need for SCVMM. This move simplifies deployment and management for Hyper-V environments.
  3. MFA integration for VRO UI: Multi-Factor Authentication (MFA) can now be enabled to secure logins to the VRO user interface, providing enhanced security and compliance. Microsoft Authenticator and Google Authenticator apps are supported.

Points 1 and 2 especially are essential, as they enable Veeam Recovery Orchestrator to support many more Hyper-V customers. Again, this is a strong signal that Hyper-V is making inroads. Enough so for Veeam to invest. Ironically, we have Broadcom to thank for this. Which is why, in November 2024, I nominated Broadcom as the clear and unchallenged winner of the “Top Hyper-V Seller Award 2024” (https://www.linkedin.com/posts/didiervanhoye_broadcom-mvpbuzz-hyperv-activity-7257391073910566912-bTTF/).

Conclusion

Hyper-V and Veeam are a potent combination that continues to evolve as market demands change. Twelve years ago, I was testing out Veeam Backup & Replication, and six months later, I became a Veeam customer. I am still convinced that for my needs and those of the environments I support, I have made a great choice.

The longevity of the technology, which evolves in response to customer and security needs, is a key factor in determining great technology choices. In that respect, Hyper-V and Veeam have performed exceptionally well, striking multiple bullseye shots without missing a beat. And by missing out on the hypervisor drama, we have hit the bullseye once more!

The Perfect Storm of Azure DNS resolver, a custom DNS resolver, and DNS configuration ambiguities

TL;DR

The very strict Azure recursive DNS resolver, when combined with a custom DNS resolver, can cause a timeout-sensitive application to experience service disruption due to ambiguities in third-party DNS NS delegation configurations.

Disclaimer

I am using fantasy FQDNs and made-up IP addresses here. Not the real ones involved in the issue.

Introduction

A GIS-driven business noticed a timeout issue in one of its services. Upon investigation, this was believed to be a DNS issue. That was indeed the case, but not due to a network or DNS infrastructure error, let alone a gross misconfiguration.

The Azure platform DNS resolver (168.63.129.16) is a high-speed and very strict resolver. While it can return the IP information, it also indicates a server error:

nslookup pubdata.coast.be

Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
pubdata.coast.be canonical name = www.coast.be.
Name:   www.coast.be
Address: 154.152.150.211
Name:   www.coast.be
Address: 185.183.181.211

** server can’t find www.coast.be: SERVFAIL

Azure handles this by responding fast and reporting the issue. The custom DNS service, which provides DNS name resolution for the service by forwarding recursive queries to the Azure DNS resolver, also reports the same problem. However, it does not do this as fast as Azure. Here, it takes 8 seconds (the recursive query timeout value), potentially 4 seconds longer due to the additional timeout value. So, while DNS works, something is wrong, and the extra time before the timeout occurs causes service issues.
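The arithmetic is worth spelling out. Here is a minimal model (my assumption of the mechanics, not vendor documentation) of the worst-case delay the client sees:

```python
# Defaults on the custom DNS service, per the text above.
RECURSIVE_QUERY_TIMEOUT = 8  # seconds
ADDITIONAL_TIMEOUT = 4       # seconds

def worst_case_client_delay(recursive_timeout: int, additional_timeout: int) -> int:
    """Upstream Azure answers in milliseconds, but because that answer is a
    SERVFAIL, the forwarder keeps waiting until its own timers expire."""
    return recursive_timeout + additional_timeout

# With the defaults, that is up to 8 + 4 = 12 seconds before the client gets
# an answer; lowering both values to 2 seconds caps it at 4 seconds.
```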

When first asked to help out, my first questions were whether it had ever worked and whether anything had changed. The next question was whether they had any control over the timeout period to adjust it upward, which would enable the service to function correctly. The latter was not possible or easy, so they came to me for troubleshooting and a potential workaround or fix.

So I dove in with the tools of the trade: nslookup, nameresolver, dig, https://dnssec-analyzer.verisignlabs.com/, and https://dnsviz.net/. The usual suspects were DNSSEC and zone delegation mismatches.

First, I ran:

nslookup -debug pubdata.coast.be

In the output, we find:

Non-authoritative answer:
Name:    www.coast.be
Addresses:  154.152.150.211
          185.183.181.211
Aliases:  pubdata.coast.be

We learn that pubdata.coast.be is a CNAME for www.coast.be. Let’s see if any CNAME delegation or DNSSEC issues are in play. Run:

dig +trace pubdata.coast.be

;; global options: +cmd
.                       510069  IN      NS      a.root-servers.net.
.                       510069  IN      NS      b.root-servers.net.
...
.                       510069  IN      NS      l.root-servers.net.
.                       510069  IN      NS      m.root-servers.net.
.                       510069  IN      RRSIG   NS 8 0 518400 20250807170000 20250725160000 46441 . <RRSIG_DATA_ANONYMIZED>
;; Received 525 bytes from 1.1.1.1#53(1.1.1.1) in 11 ms

be.                     172800  IN      NS      d.nsset.be.
...
be.                     172800  IN      NS      y.nsset.be.
be.                     86400   IN      DS      52756 8 2 <DS_HASH_ANONYMIZED>
be.                     86400   IN      RRSIG   DS 8 1 86400 20250808050000 20250726040000 46441 . <RRSIG_DATA_ANONYMIZED>
;; Received 753 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms

coast.be.           86400   IN      NS      ns1.corpinfra.be.
coast.be.           86400   IN      NS      ns2.corpinfra.be.
<hash1>.be.             600     IN      NSEC3   1 1 0 – <next-hash1> NS SOA RRSIG DNSKEY NSEC3PARAM
<hash1>.be.             600     IN      RRSIG   NSEC3 8 2 600 20250813002955 20250722120003 62188 be. <RRSIG_DATA_ANONYMIZED>
<hash2>.be.             600     IN      NSEC3   1 1 0 – <next-hash2> NS DS RRSIG
<hash2>.be.             600     IN      RRSIG   NSEC3 8 2 600 20250816062813 20250724154732 62188 be. <RRSIG_DATA_ANONYMIZED>
;; Received 610 bytes from 194.0.37.1#53(b.nsset.be) in 10 ms

pubdata.coast.be. 3600   IN      CNAME   www.coast.be.
www.coast.be.       3600    IN      NS      dns-lb1.corpinfra.be.
www.coast.be.       3600    IN      NS      dns-lb2.corpinfra.be.
;; Received 151 bytes from 185.183.181.135#53(ns1.corpinfra.be) in 12 ms

The DNSSEC configuration is not the issue, as the signatures and DS records appear to be correct. So, the delegation inconsistency is what causes the SERVFAIL, and the duration of the custom DNS servers’ recursive query timeout causes the service issues.

The real trouble is here:

pubdata.coast.be. 3600 IN CNAME www.coast.be.
www.coast.be.     3600 IN NS    dns-lb1.corpinfra.be.

This means pubdata.coast.be is a CNAME to www.coast.be. But www.coast.be is served by different nameservers than the parent zone (coast.be uses ns1/ns2.corpinfra.be). This creates a delegation inconsistency:

The resolver must follow the CNAME and query a different set of nameservers. If those nameservers don’t respond authoritatively or quickly enough, or if glue records are missing, resolution may fail.

Strict resolvers (such as Azure DNS) may treat this as a lame delegation or a broken chain, even if DNSSEC is technically valid.

Workarounds

I have already mentioned that fixing the issue in the service configuration setting was not on the table, so what else do we have to work with?

  • A quick workaround is to use the Azure platform DNS resolver (168.63.129.16) directly, which, due to its speed, avoids the additional time required for finalizing the query. However, due to DNS requirements, this workaround is not always an option.
  • The other one is to reduce the recursive query timeout and additional timeout values on the custom DNS solution. This is what we did to resolve the issue as soon as possible: the recursive query timeout is now 2 seconds (default is 8), and the additional timeout is now 2 seconds (default is 4). Monitor this to ensure that no other problems arise after taking this action.
  • Third, we could conditionally forward coast.be to the dns-lb1.corpinfra.be and dns-lb2.corpinfra.be NS servers. That works, but it requires maintenance when those name servers change, so we need to keep an eye on that. We already have enough work.
  • A fourth workaround is to provide an IP address from a custom DNS query in the source code to a public DNS server, such as 1.1.1.1 or 8.8.8.8, when accessing the pubdata.coast.be FQDN is involved. This is tedious and not desirable.
  • The most elegant solution would be to address the DNS configuration Azure has an issue with. That is out of our hands, but it can be requested from the responsible parties. For that purpose, the details of our findings follow below.

Issue Summary

The .be zone delegates coast.be to the NS servers:

dns-lb1.corpinfra.be

dns-lb2.corpinfra.be

However, the coast.be zone itself lists different NS servers:

ns1.corpinfra.be

ns2.corpinfra.be

This discrepancy between the delegation NS records in .be and the authoritative NS records inside the coast.be zone is a violation of DNS consistency rules.
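The check a strict resolver performs can be illustrated mechanically. This is my own sketch, not code from any resolver: it compares the delegation NS RRset in the parent with the authoritative NS RRset in the child.

```python
def delegation_is_consistent(parent_ns, child_ns) -> bool:
    """True when both sides advertise the same NS set (names compared
    case-insensitively, ignoring the trailing root dot)."""
    norm = lambda names: {n.lower().rstrip(".") for n in names}
    return norm(parent_ns) == norm(child_ns)

# The (anonymized) coast.be situation:
parent = ["dns-lb1.corpinfra.be.", "dns-lb2.corpinfra.be."]  # delegation in .be
child = ["ns1.corpinfra.be.", "ns2.corpinfra.be."]           # NS inside coast.be
# delegation_is_consistent(parent, child) is False: a strict resolver answers
# SERVFAIL, while a tolerant one shrugs and resolves anyway.
```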

Some DNS resolvers, especially those performing strict DNSSEC and delegation consistency checks, such as Azure Native DNS resolver, interpret this as a misconfiguration and return SERVFAIL errors. This happens even when the IP address(es) for pubdata.coast.be can indeed be resolved.

Other resolvers (e.g., Google Public DNS, Cloudflare) may be more tolerant and return valid answers despite the mismatch, without mentioning any issue.

Why could this be a problem?

DNS relies on consistent delegation to ensure:

  • Security
  • Data integrity
  • Reliable resolution

When delegation NS records and authoritative NS records differ, recursive resolvers become uncertain about the actual authoritative servers.

This uncertainty often triggers a SERVFAIL: when NS records differ between parent and child zones, resolvers may reject responses rather than risk returning stale or spoofed data.

Overview

Zone Level   | NS Records                                 | Notes
.be (parent) | dns-lb1.corpinfra.be, dns-lb2.corpinfra.be | Delegation NS for coast.be
coast.be     | ns1.corpinfra.be, ns2.corpinfra.be         | Authoritative NS for the zone

Corpinfra.be (see https://www.dnsbelgium.be/nl/whois/info/corpinfra.be/details) – this is an example, the domain is fictitious – operates all four NS servers that resolve to IPs in the same subnet, but the naming inconsistency causes delegation mismatches.

Recommended Fixes

Option 1: Update coast.be zone NS records to match the delegation NS

Add dns-lb1.corpinfra.be and dns-lb2.corpinfra.be as NS records in the coast.be zone alongside existing ones (ns1 and ns2), so the zone’s NS RRset matches the delegation.

coast.be.   IN  NS  ns1.corpinfra.be.

coast.be.   IN  NS  ns2.corpinfra.be.

coast.be.   IN  NS  dns-lb1.corpinfra.be.

coast.be.   IN  NS  dns-lb2.corpinfra.be.

Option 2: Update .be zone delegation NS records to match the zone’s NS records

Change the delegation NS records in the .be zone to use only:

ns1.corpinfra.be

ns2.corpinfra.be

and remove dns-lb1.corpinfra.be and dns-lb2.corpinfra.be.

Option 3: Align both the .be zone delegation and coast.be NS records to a consistent unified set

Either use only ns1.corpinfra.be and ns2.corpinfra.be for both the delegation and authoritative zone NS records, or use only dns-lb1.corpinfra.be and dns-lb2.corpinfra.be for both. Or use all of them; three or more geographically dispersed DNS servers are recommended anyway. Which to pick depends on who owns and manages the zones.

What to choose?

Option | Description                                       | Pros                          | Cons
1      | Add dns-lb1 and dns-lb2 to the zone file          | Quick fix, minimal disruption | The zones may be managed by different entities
2      | Update .be delegation to match zone NS (ns1, ns2) | Clean and consistent          | Requires coordination with DNS Belgium
3      | Unify both delegation and zone NS records         | Most elegant                  | Requires a full agreement between all parties

All three options are valid, but Option 3 is the most elegant and future-proof. That said, the configuration is valid as it stands, and one might argue that the strictness of Azure's DNS resolver is the cause of the issue. Sure, but in a world where DNSSEC is growing in importance, such strictness might well become more common. Additionally, if the service configuration could handle a longer timeout, that would also address the issue. However, that is outside my area of responsibility.

Simulation: Resolver Behavior

Resolver              Behavior with Mismatch   Notes
Azure DNS resolver    SERVFAIL                 Strict DNSSEC & delegation checks
Google Public DNS     Resolves normally        Tolerant of NS mismatches
Cloudflare DNS        Resolves normally        Ignores delegation inconsistencies
Unbound (default)     May vary                 Depends on configuration flags
BIND (strict mode)    SERVFAIL                 Enforces delegation consistency
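The matrix above can be modeled as a toy policy lookup. The policy assignments below paraphrase the observed behavior and are illustrative assumptions, not vendor documentation:

```python
# Toy model of the resolver matrix: strict resolvers return SERVFAIL on a
# parent/child NS RRset mismatch, tolerant resolvers resolve anyway.
# Policy assignments are illustrative assumptions, not vendor documentation.

POLICIES = {
    "Azure DNS resolver": "strict",
    "Google Public DNS": "tolerant",
    "Cloudflare DNS": "tolerant",
    "BIND (strict mode)": "strict",
}

def resolve(resolver, parent_ns, child_ns):
    """Return the simulated resolution outcome for a given resolver."""
    if parent_ns != child_ns and POLICIES[resolver] == "strict":
        return "SERVFAIL"
    return "NOERROR"

parent = {"dns-lb1.corpinfra.be", "dns-lb2.corpinfra.be"}
child = {"ns1.corpinfra.be", "ns2.corpinfra.be"}

print(resolve("Azure DNS resolver", parent, child))  # mismatch, strict
print(resolve("Google Public DNS", parent, child))   # mismatch, tolerant
```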

Notes

  • No glue records are needed for coast.be, because the NS records point to a different domain (corpinfra.be), so-called out-of-bailiwick name servers, and .be correctly delegates using standard NS records.
  • After making changes, flush DNS caches so resolvers pick up the new NS records quickly.

Conclusion

Wading through the RFCs, we can summarize the findings as follows.

RFC Summary: Parent vs. Child NS Record Consistency

RFC       Section       Position on NS Matching       Key Takeaway
RFC 1034  §4.2.2        No mandate on matching        Describes resolver traversal and authoritative zones, not strict delegation consistency
RFC 1034  §6.1 & §6.2   No strict matching rule       Discusses glue records and zone cuts, but doesn't say they must be aligned
RFC 2181  §5.4.1        Explicit: child may differ    The parent's NS records are not authoritative for the child; the child can define its own set
RFC 4035  §2.3          DNSSEC implications           Mismatched NS sets can cause DNSSEC validation issues if not carefully managed
RFC 7719  Glossary      Reinforces delegation logic   Clarifies that delegation does not imply complete control or authority over the child zone

In a nutshell, RFC 2181 Section 5.4.1 is explicit: the NS records in a parent zone are authoritative only for that parent, not for the child. That means the child zone may legally publish entirely different NS records. So why do some DNS resolvers, such as Azure's, have an issue with this?

Azure DNS “Soft” Enforces Parent-Child NS Matching

Azure DNS resolvers implement strict DNS validation behavior, which aligns with the principles of security, reliability, and operational best practice, not just the letter of the RFC. This is a soft enforcement: the name resolution does not ultimately fail.

Why

1. Defense Against Misconfigurations and Spoofing

Mismatched NS records can indicate stale or hijacked delegations.

Azure treats mismatches as potential risks, especially in DNSSEC-enabled zones, and returns SERVFAIL to warn about potentially spoofed responses, without ultimately failing the name resolution.

2. DNSSEC Integrity

DNSSEC depends on a trusted chain of delegation.

If the parent refers to NS records that don't align with the signed child zone, validation can't proceed.

Azure prioritizes integrity over leniency, which is why there is stricter enforcement.

3. Predictable Behavior for Enterprise Networks

In large infrastructures (like hybrid networks or private resolvers), predictable resolution is critical.

Azure's strict policy ensures that DNS resolution failures are intentional and traceable, not silent or inconsistent like in looser implementations.

4. Internal Resolver Design

Azure resolvers often rely on cached referral points.

When those referrals donโ€™t match authoritative data at the zone apex, Azure assumes the delegation is unreliable or misconfigured and aborts resolution.

Post-mortem summary

Azure DNS resolvers enforce delegation consistency by returning a SERVFAIL error when parent and child NS records mismatch, thereby signaling resolution failure rather than silently continuing or aborting. While RFC 2181 §5.4.1 allows child zones to publish different NS sets than the parent, Azure chooses to explicitly flag inconsistencies to uphold DNSSEC integrity and minimize misconfiguration risks. This deliberate error response enhances reliability in enterprise environments, ensuring resolution failures are visible, traceable, and consistent with secure design principles.

This was a perfect storm: a too-tight timeout setting in the service (which I do not control), combined with the rigorous behavior of the Azure DNS resolvers, which are fronted by a custom DNS solution required to serve all possible DNS needs in the environment. The result was longer recursive DNS resolution times that finally tripped up the calling service.