BitLooker In Veeam Backup and Replication v9

When your backup size is bigger than the amount of disk space used in the virtual machine you might wonder why that is. Well it’s deleted data who’s blocks have not been released for reuse by the OS yet. BitLooker in Veeam Backup and Replication v9 as announced at VeeamOn 2015 offers a solution for this situation. BitLooker analyses the NFTS MFT to identify deleted data. It uses this information to reduce the size of an imaged based backup file and helps reduce bandwidth needed for replication. It just makes sense!

I really like these additions that help out to optimize the consumption of backup storage. Now I immediately wondered f this would make any difference on the recent versions of Hyper-V that support UNMAP. Well, probably not. My take on this is that the Hyper-V virtual Machine is aware of the deleted blocks via UNMAP this way so they will not get backed up. This is one of the examples of the excellent storage optimization capabilities of Hyper-V.


It’s a great new addition to Veeam Backup & Replication v9. Especially when you’re running legacy hypervisors like like Windows 2 or older, or VMware. When you’ve been rocking Windows Server 212 R2 for the last three years Hyper-V already had your back with truly excellent UNMAP support in the virtual layer.

Production Checkpoints in Windows Server 2016

We’ve  had snapshots, or better checkpoints as we call them now for consistency amongst products, for a longest time in Hyper-V. I have always liked and used them to my benefit. That’s what they are intended for. But you have to use them correctly, in a supported and smart manner. Some (or perhaps not an insignificant number of) people did not read the manual and/or do not test their assumptions before trying something in production. Some times that leads to a lesson, sometimes it leads to tears.

We now have the choice between two type of checkpoints: Production Checkpoints and Standard Checkpoints.

A standard virtual machine checkpoint all the memory state of running applications get stored and when you apply the checkpoint it’s back magically. Doing this to a production SQL or Exchange Server for example causes (huge) problems. With some applications these problems are minor or transient but it’s not a healthy consistent state to be in, and recovery has to happen. Which could  happen automatically or require disaster recovery depending on the situation at hand.

Production checkpoints are made in application consistent manner. For this the leverage Volume Shadow Copy Services (or File System Freeze on Linux) which puts the virtual machines into a safe state to create a checkpoint that can be restored like a VSS based, application consistent backup or SAN snapshot. This does mean that applying a production checkpoint requires the restored virtual machine to boot from an off line state, just like with a restored backup.

The choice for what type of checkpoint can be made on a per virtual machine basis which make it’s flexible as you can pick the best option for a particular virtual machine for a specific purpose. As you might have guessed, that still requires some insight, reading the manual and testing your assumptions. But you now can have the behavior you want and way to many assumed to have.


We also have the option of allowing or disallowing for a standard checkpoint to me made if for any reason (which results in the VSS snapshot in Windows or file systems freeze in Linux  in the guest might not work or are not available) a production checkpoint cannot be made. Here’s the table of what type of checkpoint can be used when from MSDN. I conclude that the chosen default is the best fitting one for most scenarios.


You also have the option of choosing the standard checkpoints for a virtual machine. That gives you the exact behavior as with all previous versions of Hyper-V.

I love the GUI for ad hoc work but when I need to do this on dozens or hundreds of virtual machines or potentially tens of thousands when running a larger private cloud this is not the way to go. PowerShell is you long time trusted friend here!


As you can see just the “-CheckpointType” parameter you control the check pointing behavior. And as it is very easy to grab all virtual machines on a host or cluster setting this for all or a selection of your virtual machines is easy and fast. Let’s set it to “ProductionOnly” and grab the setting for that VM via PowerShell


When you create a checkpoint of a Windows Server 2016 Hyper-V host you’ll even get a nice notification (you can turn it off) by default that the Production checkpoint used backup technology in the guest operating system.


It’s also important to realize that this capability is the basis of the new checkpoint based way of making backups in Windows Server 2016 as well. But that’s a subject for another blog post. Thank you for reading!

High performance live migration done right means using SMB Direct

I  saw people team two 10GBps NICs for live migration and use TCP/IP. They leveraged LACP for this as per my blog Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members . That was a nice post but not a commercial to use it. It was to prove a point that LACP/Static switch dependent teaming did allow for multiple VMs to be live migrated in the same direction between two node. But for speed, max throughput & low CPU usage teaming is not the way to go. This is not needed as you can achieve bandwidth aggregation and redundancy with SMB via Multichannel. This doesn’t require any LACP configuration at all and allows for switch independent aggregation and redundancy. Which is great, as it avoids stacking with switches that don’t do  VLT, MLAG,  …

Even when your team your NICs your better off using SMB. The bandwidth aggregation is often better. But again, you can have that without LACP NIC teaming so why bother? Perhaps one reason, with LACP failover is faster, but that’s of no big concern with live migration.

We’ll do some simple examples to show you why these choices matter. We’ll also demonstrate the importance of an optimize RSS configuration. Do not that the configuration we use here is not a production environment, it’s just a demo to show case results.

But there is yet another benefit to SMB.  SMB Direct.  That provides for maximum throughput, low latency and low CPU usage.

LACP NIC TEAM with 2*10Gbps with TCP

With RSS setting on the inbox default we have problems reaching the best possible throughput (17Gbps). But that’s not all. Look at the CPU at the time of live migration. As you can see it’s pretty taxing on the system at 22%.


If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC on a different CPU (dual socket, 8 core system) we sometimes get better CPU overhead at +/- 12% but the throughput does not improve much and it’s not very consistent. It can get worse and look more like the above.


LACP NIC TEAM with 2*10Gbps with SMB (Multichannel)

With the default RSS Settings we still have problems reaching the best possible throughput but it’s better (19Gbps). CPU wise, it’s pretty taxing on the system at 24%.


If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC on a different CPU (dual socket, 8 core system) we get better over CPU overhead at +/- 8% but the throughput actually declined (17.5 %). When we run the test again we were back to the results we saw with default RSS settings.


Is there any value in using SMB over TCP with LACP for live migration?

Yes there is. Below you see two VMs live migrate, RSS is optimized. One core per VM is used and the throughput isn’t great, is it. Depending on the speed of your CPU you get at best 4.5 to 5Gbps throughput per VM as that 1 core per VM is the limiting factor. Hence see about 9Gbps here, as there’s 2 VMs, each leveraging 1 core.


Now look at only one VM with RSS is optimized with SMB over an LACP NIC team. Even 1 large memory VM leverages 8 cores and achieves 19Gbps.


What about Switch Independent Teaming?

Ah well that consumes a lot less CPU cycles but it comes at the price of speed. It has less CPU overhead to deal with in regards to LACP. It can only receive on one team member. The good news is that even a single VM can achieve 10Gbps (better than LACP) at lower CPU overhead. With SMB you get better CPU distribution results but as the one member is a bottle neck, not faster. But … why bother when we have …better options!? Read on Smile!

No Teaming – 2*10Gbps with SMB Multichannel, RSS Optimized

We are reaching very good throughput but it’s better (20Gbps) with 8 RSS queues assigned to 8 physical cores. The CPU at the time of live migration is pretty good at 6%-7%.


Important: This is what you want to use if you don’t have 10Gbps but you do have 4* 1Gbps NICs for live migration. You can test with compression and LACP teaming if you want/can to see if you get better results. Your mirage may vary Smile. If you have only one 1Gbps NIC => Compression is your sole & only savior.

2*10Gbps with SMB Direct

We’re using perfmon here to see the used bandwidth as RDMA traffic does not show up in Task Manager.


We have no problems reaching the best possible throughput but it’s better (20Gbps, line speed). But now look at the CPU during live migration. How do you like them numbers?

Do not buy non RDMA capable NICs or Switches without DCB support!

These are real numbers, the only thing is that the type and quality of the NICs, firmware and drivers used also play a role an can skew the results a bit. The onboard LOM run of the mill NICs aren’t always the best choice. Do note that configuration matters as you have seen. But SMB Direct eats them all for breakfast, no matter what.

Convinced yet? People, one of my core highly valuable skillsets is getting commodity hardware to perform and I tend to give solid advice. You can read all my tips for fast live migrations here in Live Migration Speed Check List – Take It Easy To Speed It Up

Does all of this matter to you? I say yes , it does. It depends on your environment and usage patterns. Maybe you’re totally over provisioned and run only very small workloads in your virtual machines. But it’s save to say that if you want to use your hardware to its full potential under most circumstances you really want to leverage SMB Direct for live migrations. What about that Hyper-V cluster with compute and storage heavy applications, what about SQL Server virtualization? Would you not like to see this picture with SMB RDMA? The Mellanox  RDMA cards are very good value for money. Great 10Gbps switches that support DCB (for PFC/ETS) can be bought a decent prices. You’re missing out and potentially making a huge mistake not leveraging SMB Direct for live migrations and many other workloads. Invest and design your solutions wisely!

Microsoft & Bromium Make Windows 10 Most Secure Endpoint Available

There was some very interesting news last week at the Microsoft World Partner Conference (WPC). Bromium and Microsoft announced a strategic partnership, Microsoft is now endorsing Bromium micro-virtualization and is aligning with Bromium in adopting a security architecture based on isolating critical information on the endpoint in Windows 10. The combination of Bromium and Windows 10 results in the most secure PC available today. You can read all about it here Bromium Partners to Bring Micro-virtualization to Windows 10

Bromium has been around for a while and I have always like the concept. Instead of trying to aim for a 100 percent secure system they acknowledge this is impossible. This means they realize that systems will get malware, zero day exploits, etc. Trying to provide complete protection is impossible. Try and you will fail. This means that we can play with a popular saying and state that “failure is not It’s a certainty”.

Just like any secured system, like a ship for example, the idea is to accept that there will be unavoidable breaches. To mitigate the risk you need to minimize the impact of these breaches. That’s what the water tight doors, the compartmentalization and isolation in ships are for. Banking on a 100 % success rate in avoiding breaches is just unrealistic. Bromium uses this same concept.

When breached It will limit the damage to as small and isolated environment. A temporary environment for that matter, something ships can’t do. Bromium runs every process on the machine in a hardware isolated micro VM, which is based on hardware virtualization technology (minimally VT-x or AMD-V).


Figure courtesy of Bromium

This goes pretty far. Not the internet browser level or e-mail client but every tab and every e-mail you open is isolated this way. If your browser tab gets compromised by a zero day exploit the infection and damage is limited to that browser tab. Or your e-mail message or you word document. All your other documents, browser tabs and word documents are protected. You get the idea. Even better when you close that word document or browser tab, the isolated micro VM in which it existed disappears together with the infection.

Figure courtesy of Bromium

This fits in well with Microsoft its own initiatives. Windows 10 leverages hardware security features such as UEFI secure boot, a Trusted Platform Module (TPM) and virtualization to provide a more secure computing environment already. Windows Server 2016 leverages the combination of hard ware technologies and the hypervisor to create a “Virtual Secure Mode” (VSM) to deliver shielded virtual machines.

While nothing is perfect it is an interesting approach as it protects against the unknown, isolates, minimizes impact and discards malware infections. It buys time to react and respond more long term to threats once they’re known while providing protection even when still unknown. Whereas anti malware only protects against known threats and is very reactive in nature.

Read more here and have a look here How does Bromium protect you?