BitLooker In Veeam Backup and Replication v9

When your backup size is bigger than the amount of disk space used inside the virtual machine you might wonder why that is. Well, it’s deleted data whose blocks have not been released for reuse by the OS yet. BitLooker in Veeam Backup and Replication v9, as announced at VeeamON 2015, offers a solution for this situation. BitLooker analyses the NTFS MFT to identify deleted data. It uses this information to reduce the size of an image-based backup file and helps reduce the bandwidth needed for replication. It just makes sense!


I really like these additions that help optimize the consumption of backup storage. Now I immediately wondered if this would make any difference on the recent versions of Hyper-V that support UNMAP. Well, probably not. My take on this is that the Hyper-V virtual machine is already aware of the deleted blocks via UNMAP, so they will not get backed up. This is one of the examples of the excellent storage optimization capabilities of Hyper-V.

UNMAP

It’s a great new addition to Veeam Backup & Replication v9, especially when you’re running legacy hypervisors like Windows Server 2008 R2 or older, or (at the time of writing) VMware. If you’ve been rocking Windows Server 2012 R2 for the last three years, Hyper-V already had your back with truly excellent UNMAP support in the virtual layer.
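If you want to check or trigger UNMAP yourself inside a guest or on a host, a minimal sketch looks like this; the drive letter is just an example, pick the volume that matters to you.

# DisableDeleteNotify = 0 means delete notifications (TRIM/UNMAP) are enabled
fsutil behavior query DisableDeleteNotify

# Send UNMAP for blocks that are already freed on a thinly provisioned volume (D: is an example)
Optimize-Volume -DriveLetter D -ReTrim -Verbose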

Is the cloud failing or are you?

The cloud is not failing. That’s the good news. Now for the bad.

Many people complain about the mess their cloud usage has become and how the cloud sales people did not tell them to read the small print. As a business, whether for profit or non-profit, you need people in charge with a reasonable amount of intelligence and a drive to push the organization forward, not just themselves. You cannot take the easy way out, pocket your pay check and leave the “details and annoying technicalities” to your employees. Basically you’re saying “screw you” to them, so don’t be surprised when that works both ways. If your cloud projects are failing, it is due to the same reason your other IT projects were failing. You’re doing it wrong.

In a world of political correctness, this is going to sound harsh. But that’s not the problem. The problem is that you as a business, a manager, a “leader” are failing. You are failing and you’re incapable of dealing with that fact. Because it hurts your sensitivities. Well you are hurting your employees, your customers, your future.

Way too many cloud (private/hybrid/public) projects are done as “self service” or minimal effort projects. There is no design. There is no expertise, experience, knowledge, context or deeper understanding of the systems, their interactions, capabilities and needs. In this commodity world it just has to work. Nothing just works. Deal with it. If you don’t put value on the above, that’s how things end up.

Cloud projects in many environments look way too much like a classic house onto which newfangled extensions were bolted by people without a clue about what they were doing. By doing so they ruined the roof, the wiring, the insulation, the functionality and the livability. It’s leaking, it’s rotting the house and fungi rule the realm.

You did not get what you paid for but you get exactly what you value: nothing.

It’s not that you don’t spend ridiculous amounts of money. You outsourced all your in-house capabilities and expertise and on top of that you’re paying 3 to 5 times too much for services and “consultants” that have been on your payroll for a decade. You don’t even have the capabilities in house to realize the above anymore. If you do, they have probably gone into hiding. You buy overpriced shit on a daily basis and are told it’s great and what the industry’s best practices dictate.

The fallacy that IT, which is the cloud and nothing but the cloud for many today, is nothing but a commodity that has to work out of the box at the cheapest possible price is making you fail. But how could that be? After all, it’s just computers in the cloud, so you don’t even have to hook up the power and a cable any more. No? The almost absurd simplifications at play here totally push aside knowledge, experience, skills and a continuous educational effort. The end result, excellent service to your business and/or customers, dies a thousand small deaths in collateral damage.

You’re deploying cloud solutions without planning, coordination, design, governance, responsibilities, skills and whatnot. You’ve lost control over your (cloud) IT. You’ve lost control over the data, the access, the backups, disaster recovery, the accounts of the service subscription, everything. These are the essential parts of a functional, maintainable, cost effective and supportable IT environment. This will bite you hard and deep, and will perhaps bleed you to death.

This is not the cloud failing. It’s you. If you go about “old school” on-premises IT the same way, the failures are there as well. So you hate the solutions you pay way too much for, you hate the lousy service and the lack of results. You get shafted every day.

The easy fix you come up with is just more of the same. More consulting, more avoiding of work and responsibility, more meetings, task forces, more multi-year, oversized super projects that are doomed to fail, because there are more than enough people out there ready to take money from idiots.

How is this possible? Because in way too many places criticism has been banned and has died. Meanwhile, in that politically correct, always peaceful and quiet environment, real damage is done to people as talent, motivation, money and value are destroyed along with a better future. No one in those places has any skin in the game, as you risk more by doing your job than by watching the place go to hell. Good luck!

To anyone else: there are real experts out there that can really help you. All you have to do is value results, your business and your clients.

Musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V

When you have been reading up on what’s new in Windows Server 2016 Hyper-V networking you have probably read about Switch Embedded Teaming (SET). Basically this takes the concept of teaming and has it done by the vSwitch, which means you don’t have to team at the host level. The big benefit this opens up is that RDMA can be leveraged on vNICs. With host based teaming the RDMA capabilities of your NICs are no longer exposed, i.e. you can’t leverage RDMA. Now this has become possible and that’s pretty big.
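To put that in context, here’s a minimal sketch of creating a SET vSwitch with an RDMA capable host vNIC; the switch, NIC and vNIC names are example values I made up, not anything prescribed by the docs.

# Create a vSwitch with Switch Embedded Teaming over two physical RDMA capable NICs (names are examples)
New-VMSwitch -Name "SETSwitch" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Add a host vNIC for SMB traffic and enable RDMA on it
Add-VMNetworkAdapter -ManagementOS -SwitchName "SETSwitch" -Name "SMB1"
Enable-NetAdapterRdma -Name "vEthernet (SMB1)"

# Verify RDMA is exposed on the vNIC
Get-NetAdapterRdma -Name "vEthernet (SMB1)"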


With the rise of 10, 25, 40, 50 and 100Gbps NICs and switches the lure to go fully converged becomes ever stronger. Given the fact that we now don’t lose RDMA capabilities on the vNICs exposed to the host, that call sounds only louder to many. But wait, there’s even more to lure us to a fully converged solution: we now no longer lose RSS on those vNICs either! All good news.

I have written an entire whitepaper on convergence and its benefits, drawbacks, risks & rewards. I will not repeat all that here. One point I need to make is that lossless traffic and QoS are paramount to the success of fully converged networking. After all, we don’t want lossy storage traffic and we need to assure adequate bandwidth for all our types of traffic. For now, in Technical Preview 3, we have support for Software Defined Networking (SDN) QoS.

What does that mean in regards to what we already use today? There is no support for native QoS and vSwitch QoS in Windows Server 2016 TPv3. There is however a mention of DCB (PFC/ETS), which is hardware QoS, in the TechNet docs on Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET). Cool!

But wait a minute. When we look at all kinds of traffic in a converged Hyper-V environment we see CSV (storage) traffic, live migration (all variations) and backups over SMB3, all potentially leveraging SMB Direct. Due to the features and capabilities in SMB3 I like that. Don’t get me wrong about that. But it also worries me a bit when it comes to handling QoS on the hardware side of things.

In DCB, Priority Flow Control (PFC) is the lossless part and Enhanced Transmission Selection (ETS) is the minimum bandwidth QoS part. But how do we leverage ETS when all types of traffic use SMB Direct? On the host it all gets tagged with the same priority. ETS works by tagging different priorities to different workloads and assuring minimal bandwidths out of a total of 100% without reserving it for a workload that doesn’t need it. Here’s a blog post on ETS with a demo video: DCB ETS Demo with SMB Direct over RoCE (RDMA).
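For reference, this is roughly how that tagging is done with the DCB cmdlets today, and it illustrates the issue: CSV, live migration and backup traffic over SMB Direct all land in the same priority. The priority value and bandwidth percentage below are just example numbers.

# Tag all SMB Direct traffic (port 445 / NetworkDirect) with priority 3 (example value)
New-NetQosPolicy -Name "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Make priority 3 lossless with PFC and give it a 50% minimum bandwidth share via ETS (example value)
Enable-NetQosFlowControl -Priority 3
New-NetQosTrafficClass -Name "SMBDirect" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS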

Does this mean an SDN-QoS-only approach to deal with the various types of SMB Direct traffic, or do they have some aces up their sleeves?

This isn’t a new “concern” of mine, but with SET and the sustained push for convergence it does have the potential to become an issue. We already have the SMB bandwidth limitation feature for live migration. That is what is used to prevent live migration from starving CSV traffic when needed. See Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit.
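As a quick reminder of what that looks like, a minimal sketch; the 2GB value is just an example and the FS-SMBBW feature needs to be installed first.

# Install the SMB bandwidth limit feature (one time, per host)
Add-WindowsFeature FS-SMBBW

# Cap live migration traffic over SMB at 2GB per second (example value; a hard limit, not a percentage)
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 2GB

# Check what limits are in place
Get-SmbBandwidthLimit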

Now in real life I have rarely, if ever, seen a hard need for this. But it’s there to make sure you have something when needed. It hasn’t caused me issues yet, but I’m performance & scale first, in a “non-economies of scale” world compared to hosters. As such, convergence is a tool I use with moderation. My testing shows that when traffic competes without ETS they all get part of the cake, but not in a very predictable or consistent way. The SMB bandwidth limitation is a bit of a “bolted on” solution: you can see the perf counters push down the bandwidth in an epic struggle to contain it, but as said it’s a struggle, not a nice flat line.

Also, Set-SmbBandwidthLimit is not a percentage but a hard maximum bandwidth limit, so when you lose a SET member the math is off and you could be in trouble fast. Perhaps it’s these categories that could or will be used, but it doesn’t seem like the most elegant solution/approach. That, with ever more traffic leveraging SMB Direct, makes me ever more curious. Some switches offer up to 4 lossless queues now, so perhaps that’s the way to go, leveraging more priorities … Interesting stuff! My preferred and easiest QoS tool, getting even bigger pipes, is an approach that convergence and the evolution of network needs keep pushing over. Anyway, I’ll be very interested to see how this is dealt with. For now I’ll conclude my musings on Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V.

Remove Lingering Backup Checkpoints from a Hyper-V Virtual Machine

Sometimes things go wrong with a backup job. There are many reasons for this, but that’s not the focus of this blog post. We’re going to show you how to remove lingering backup checkpoints from a Hyper-V virtual machine that were not properly removed after a backup on Windows Server 2012 R2 Hyper-V.

In Hyper-V Manager you see a checkpoint with an “odd” icon.


You can right click on the checkpoint but you will not find an option to apply or delete it. This is a recovery snapshot and you’re not supposed to manipulate it manually; it’s there as part of the backup process. You can read more about that in my blog posts Some Insights Into How Windows 2012 R2 Hyper-V Backups Work and What Is AutoRecovery.avhdx all about?.

When we look at the files we see the traces of a Windows Server 2012 R2 Hyper-V backup.


Some people turn to manually merging these checkpoints. I have discussed this process in the blog posts 3 Ways To Deal With Lingering Hyper-V Checkpoints Formerly Known as Snapshots and Manually Merging Hyper-V Checkpoints. But you don’t have to do this! As a matter of fact, you should avoid this if at all possible. The good news is that this can be done via PowerShell.

I’m not convinced that the fact you can’t do it in the GUI is to be considered a bug, as some would suggest. The fact is you’re not supposed to have to do this, so the functionality is not there. I guess this is to protect people from deleting one by mistake when they see one during a backup.

Anyway, PowerShell to the rescue!

So we run:

Get-VMSnapshot -ComputerName "MyHyperVHost" -VMName "VMWithLingeringBackupCheckpoint"

This shows us the checkpoint information (note that it does not show the XXX-AutoRecovery.avhdx as a checkpoint).


And then we simply run Remove-VMSnapshot

#Just remove the lingering backup checkpoint like you would a normal one via PoSh
Get-VMSnapshot -ComputerName "MyHyperVHost" -VMName "VMWithLingeringBackupCheckpoint" | Remove-VMSnapshot

You can watch the merging process happen in the GUI.


If you inspect the remaining MyVM-AutoRecovery.avhdx file you’ll see that it points to the parent vhdx. Normally your VM is already running from the vhdx anyway. If not, you’ll need to deal with that.


During a normal backup that file is deleted when the merge of the redirected AVHDX is done. Here you’ll need to delete it manually and be done with it.
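A hedged sketch of that last cleanup step: verify the VM’s disks point at the vhdx files before you touch anything. The host and VM names are the examples used above and the folder path is made up for illustration.

# Confirm the VM is running from the vhdx files, not from an avhdx (checkpoint) file
Get-VMHardDiskDrive -ComputerName "MyHyperVHost" -VMName "VMWithLingeringBackupCheckpoint" | Select-Object VMName, Path

# Only then delete the leftover auto-recovery differencing disk (example path)
Remove-Item "D:\VMs\MyVM\Virtual Hard Disks\MyVM-AutoRecovery.avhdx"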

Conclusion: there is no need to dive in and start merging the lingering backup checkpoints manually. Leave that for those scenarios where that’s the only option you have left.