NVM Express over Fabrics

Any technologist who’s read about, let alone used, NVM Express (NVMe) is pretty enthusiastic about its capabilities, and if it were not for availability and financial restrictions we’d all have at least a couple in our home systems and labs. Bring on Windows Server 2016 nested virtualization, woohoo!

image: Intel P3700 PCB, angled top view

NVMe seems to succeed very well in making sure the host can keep up with the performance (low latencies, high throughput) delivered by SSD drives, far better than our current interfaces do. Today you can drop them into your workstation or server and get going. They’ll give your home lab stellar storage performance right now, and Microsoft talked publicly at Ignite 2015 about them being supported in Storage Spaces Direct in Windows Server 2016. But there is more to come.

Many of us are quite happy with the vision of PCIe dislodging SAS/SATA as the preferred SSD interface. This seems feasible for local storage right now, and it can be leveraged for caching or with local PCIe RAID controllers which, when shared, enable Cluster in a Box (CiB) scenarios. But how do we deal with this in an actual storage array, and what if we want to size this to a larger scale? There are no “PCIe JBODs”. So what does one do? Well, how did we do it in the past with FC? We created a fabric. Below we see several local & remote NVMe architectures, even hybrid ones with traditional SAS.

image

That’s exactly what NVM Express Inc. is doing: creating the specs for a fabric. This holds the promise of superior results, as delivering NVMe end to end eliminates SCSI translation and thereby reduces latency significantly. Not only that, but we also see the following efforts in the NVM Express Specification 1.2 to give it enterprise-grade capabilities beyond pure performance.

  • Enhanced status reporting
  • Expanded capabilities including live firmware updates

There have been some early demos of NVMe over Fabrics, mainly focusing on the “remote” performance. While local NVMe SSDs have the edge on absolute IOPS, the difference with NVMe over a fabric is not significant. The added latency is measured at less than 10 µs, so that’s good news. The fabric leverages RDMA (yes, yet another reason that the time I have spent with this technology has been a useful investment). This can be InfiniBand, RoCE or iWARP. There’s also the new kid on the block, “Intel Omni Scale” (even if their early demo used iWARP). There’s also a Mellanox RoCE demo.

image

Now with NVMe SSD speeds, the writing is on the wall that ever better fabric performance will be needed to support the tremendous throughput this evolution of storage can deliver. RDMA seems poised for success in this regard. Strictly speaking, the NVMe traffic does not require RDMA, but let’s just say I don’t see anyone building it without. I also think this means even iWARP fabrics will use DCB (PFC) to make sure we have a lossless network. The amount of traffic will be immense, so why not optimize for the best possible performance? I hold the opinion this is beneficial for east-west traffic today in larger environments, especially in converged networks. Unless the Intel® Omni-Path Architecture blows everyone else away, that is. Too early to tell.
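For those wondering what that looks like in practice on the Windows side: PFC is configured with the DCB/QoS cmdlets. Below is a minimal sketch for RDMA traffic carried by SMB Direct; the priority value, port match and interface alias are assumptions that must line up with your switch configuration.

    # Install the Data Center Bridging feature (needs DCB-capable NICs & switches)
    Install-WindowsFeature Data-Center-Bridging

    # Tag SMB Direct (RDMA) traffic with 802.1p priority 3 - an assumed, commonly
    # used value that must match the PFC settings on the switches
    New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

    # Make only that priority lossless, leave the rest as-is
    Enable-NetQosFlowControl -Priority 3
    Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

    # Apply DCB/QoS on the converged NIC carrying the storage traffic (alias assumed)
    Enable-NetAdapterQos -InterfaceAlias "SLOT 3 Port 1"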

Now does this dictate the total and absolute obsolescence of iSCSI and FC? No. There is no reason why an NVMe over Fabrics storage solution cannot offer storage to hosts via FC, iSCSI, SMB 3, NFS, FCoE, … They could potentially even offer all RDMA flavors like iWARP, RoCE or InfiniBand to the hosts, so you won’t lose your prior investments or get locked into one flavor of RDMA. I have no crystal ball, so I cannot tell you if this will happen. What I do know is that when it comes to MPIO versus multichannel for load balancing, and even failover and recovery, multichannel does a (far?) superior job in my honest opinion, even when the hypervisor uses separate sessions per virtual machine to achieve better load balancing over iSCSI or the like. So perhaps storage vendors will finally deliver full SMB 3 support in their stacks … if not, well, we’ll just abstract your storage away with SOFS. Your loss. Anyway, I digress. One thing I do know is that I’ll keep a keen eye on what Microsoft is doing in this space, especially in regard to Windows Server 2016 capabilities & scalability. It’s time to up the level of scalability & support for newer state-of-the-art technologies once again. It will ensure we get to run our stack on the very best hardware for years to come.
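As an aside, you can watch SMB Multichannel do that load balancing for yourself from the SMB client; a quick sketch (the server name is made up):

    # The client NICs SMB Multichannel can use (RSS/RDMA capable ones shine here)
    Get-SmbClientNetworkInterface

    # The active Multichannel connections and the interfaces they landed on
    Get-SmbMultichannelConnection

    # Or scoped to a single file server, e.g. a SOFS node (hypothetical name)
    Get-SmbMultichannelConnection -ServerName "SOFS-01"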

Windows Deduplication And Mysterious Folder & File Sizes

There was a brief moment of “this can’t be good” when the sysadmin looked at the size of the backup folders and compared it to the size reported for the files. Sure, I had told him that Windows inbox deduplication rocked, but this had to be too good to be true, or deduplication had just eaten all the backup files and he was “toast”. It was neither, but that requires some explanation. The good news is that Windows Data Deduplication, combined with a backup product that supports it like VEEAM, will save you a ton of money on the deduplication licenses some vendors charge for, as well as on storage costs.
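For context, getting to this point is little more than enabling deduplication on the backup repository volume and letting an optimization job run. A minimal sketch, with the drive letter as an assumption; backup files don’t change once written, so we don’t make the optimizer wait days for them:

    # Enable Data Deduplication on the backup repository volume (E: is hypothetical)
    Enable-DedupVolume -Volume "E:"

    # Backup files sit still once written, so optimize them right away
    Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0

    # Kick off an optimization job manually instead of waiting for the schedule
    Start-DedupJob -Volume "E:" -Type Optimization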

This is what he saw, and what caused the raised eyebrow: 12.4 TB reduced to 285 GB.

image

Deduplication can’t be that great, right? Did something go wrong? Checking the properties of ALL the selected files did not report anything different, but compared to the volume info for used space something seemed very wrong. That was supposed to be 5.34 TB.

image

The volume properties report the effective space consumed on the volume, so that reflects the true deduplication results. You can confirm this with PowerShell.
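Something along these lines will do, with the drive letter assumed:

    # Volume-level deduplication results: savings rate, saved space, used vs. unoptimized size
    Get-DedupVolume -Volume "E:" |
        Format-List Volume, SavingsRate, SavedSpace, UsedSpace, UnoptimizedSize

    # Engine status for that volume: optimized files, in-policy files, last job results
    Get-DedupStatus -Volume "E:" | Format-List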

image
A savings rate of 57%, 5.34 TB of actually consumed space (5,880,575,557,632 bytes) and an unoptimized size of 12.4 TB. Just as Server Manager reports.

image

So what is Explorer up to at the folder and file level? Nothing, it just can’t show you the complete picture. Windows Data Deduplication stores the deduplicated chunks in the System Volume Information folder. Windows Explorer runs under your account, has no access to that folder, and doesn’t report the size of all the chunks in there. The only thing it does report is the non-deduplicated data left behind in the source folder, in our case where the backups reside. The result is, as said, raised eyebrows.
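You can see the sleight of hand at the file level too: an optimized file is turned into a reparse point, and its logical size no longer tells you anything about the space it occupies. A quick, hedged sketch (paths are hypothetical):

    # An optimized file carries the ReparsePoint (and typically SparseFile) attribute
    Get-Item "E:\Backups\Job01.vbk" | Select-Object Name, Length, Attributes

    # Logical size of everything in the folder vs. what the volume says is really used
    (Get-ChildItem "E:\Backups" -Recurse -File | Measure-Object Length -Sum).Sum / 1TB
    Get-Volume -DriveLetter E | Select-Object Size, SizeRemaining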

The same is actually true for any other tool, like WinDirStat in the below screenshot.

image

When we run these tools as SYSTEM we get a different picture, and we can navigate to the actual ChunkStore to learn more about the internals.
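Running a tool as SYSTEM is easy enough with Sysinternals PsExec; a sketch (volume and tool are up to you):

    # Launch an interactive PowerShell session as LocalSystem (Sysinternals PsExec)
    .\PsExec.exe -s -i powershell.exe

    # From that SYSTEM session the chunk store is accessible
    Get-ChildItem "E:\System Volume Information\Dedup\ChunkStore" -Recurse |
        Measure-Object Length -Sum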

image

Help! Active Directory cannot read the forest and domain functional level anymore, or much ado about nothing

The other day I got a very worried request for support. Apparently the forest and domain functional level of an Active Directory deployment could no longer be read. Nothing else was wrong, everything was working just fine. Could I have a quick look? So they shared their screen with me and this is what I saw.

image

And this …

image

Well, that didn’t surprise me; they are supposed to be on domain and forest functional level 2012 R2 already, as all their domain controllers had been on Windows 2012 R2 for over a year. That error message is not right! After being puzzled for a moment, it hit me: this was a Windows 2008 R2 host without updated tools!

Once they used the correct version of RSAT, or checked on a Windows 2012 R2 host, all was shown to be well and the scare was over.
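By the way, PowerShell with a current ActiveDirectory module sidesteps such GUI version quirks; checking the functional levels is trivial:

    # Requires the ActiveDirectory module from an up-to-date RSAT
    Import-Module ActiveDirectory

    # Forest and domain functional levels straight from AD, no GUI involved
    (Get-ADForest).ForestMode
    (Get-ADDomain).DomainMode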

image

Nothing to see here, move along.