My favorite deployment for VMs with Discrete Device Assignment for GPU

Introduction

Recently I had an interesting discussion on how to leverage Discrete Device Assignment (DDA) for GPU needs when it's only needed for a certain number of virtual machines. Someone had read my blogs on leveraging DDA, which made her optimistic and enthusiastic. But she noticed in the lab that she could not leverage DDA on a VM running on a cluster, and that she could not use storage QoS policies on a stand-alone Hyper-V host with local storage. So, what could she do?

Well for one, her findings are correct. Microsoft did not enable DDA on clustered virtual machines. It doesn't make sense, as the GPU hardware is tied to the virtual machine, so any high availability, whether planned (live migration) or unplanned (failover), isn't possible or available anyway. It just cannot be done. I hear you when you say "but they pulled it off for SR-IOV for networking". Sure, but please keep in mind that network cards with Ethernet and TCP/IP allow for different approaches than high-end video.

My favorite deployment for VMs with Discrete Device Assignment for GPU

My favorite deployment for VMs with Discrete Device Assignment (DDA) for GPU leverages SMB3 SOFS shares for the virtual hard disks and stand-alone Hyper-V hosts that are member servers in the domain. Let me explain why.

Based on what we discussed above we have some options. One workaround is running the DDA virtual machines as non-highly available VMs on local storage on a cluster node. But that would mean you would have a few such VMs on all the nodes and that all those nodes must have a DDA capable GPU. Or, if you limit the number of nodes that have such a GPU, you'll have a few odd balls in your cluster. You'll need to manage some extra complexity and must safeguard against assigning a GPU via DDA that is already in use for RemoteFX. That causes all kinds of unpleasantness, nothing too deadly, but not something you want to do on your production VDI clusters for fun. It's a bit like not running a domain controller on a CSV and not making it highly available. If that's the only option you have you can do that, and I do when needed, as Microsoft has improved a lot of things to make this a better and less risky experience. But I prefer to have either a physical one or to host it on a separate non-clustered Hyper-V host if that's an option, because not all storage solutions and environments have all the capabilities needed to make that foolproof.
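For reference, on a stand-alone Windows Server 2016 Hyper-V host the actual DDA assignment boils down to something like the sketch below. This is a minimal, illustrative outline only: the NVIDIA filter and the MMIO space values are assumptions for the example (the exact values depend on your GPU and the vendor's guidance), and you want to verify the card is not configured for RemoteFX before you dismount it. See my DDA blog posts for the full procedure.

#Minimal DDA assignment sketch on a stand-alone Windows Server 2016 Hyper-V host (illustrative values)

$VMName = 'DDAVMSOFSStorage'

#The VM must be off and must not save state when the host shuts down
Set-VM -Name $VMName -AutomaticStopAction TurnOff

#Reserve MMIO space for the GPU (example values, check your card's requirements)
Set-VM -Name $VMName -GuestControlledCacheTypes $true -LowMemoryMappedIoSpace 3GB -HighMemoryMappedIoSpace 33280MB

#Find the GPU and grab its location path (adjust the filter so it matches exactly one card)
$GPU = Get-PnpDevice -Class Display | Where-Object FriendlyName -like '*NVIDIA*'
$LocationPath = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $GPU.InstanceId).Data[0]

#Disable the device on the host, dismount it and hand it to the virtual machine
Disable-PnpDevice -InstanceId $GPU.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $LocationPath -Force
Add-VMAssignableDevice -LocationPath $LocationPath -VMName $VMName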

Also note that running other storage on an S2D node isn't supported. You have your OS on the boot disks and the disks used in Storage Spaces Direct. Odd ones out aren't supposed to be there, as S2D will try to recruit them. You can get away with it when using traditional shared storage.

What I also don't like about that is that if the cluster storage is not SMB3 SOFS, the virtual machine on local storage doesn't get the benefit of storage QoS policies in Windows Server 2016, as those only work with CSV or SOFS shares. So optionally you could leave the non-clustered VM on a CSV. But that's perhaps a bit confusing, and some people might think someone forgot to make the machine highly available, etc.

My preferred setup to get highly available storage for virtual machines with DDA needs, while also benefiting from what storage QoS policies have to offer for VDI, is to use stand-alone Hyper-V hosts that have DDA capable GPUs and leverage SMB3 SOFS shares for the virtual machines.
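From the stand-alone host's point of view there is nothing special about this; you simply point the virtual machine's storage at the SOFS share. As a quick, purely illustrative sketch (the UNC path and the sizing below are made-up examples):

#Sketch: create a VM on the stand-alone DDA host with its virtual hard disk on an SMB3 SOFS share
#The share path and sizing are fictitious examples
New-VM -Name DDAVMSOFSStorage -Generation 2 -MemoryStartupBytes 8GB `
    -Path '\\SOFS\VMStorage\DDAVMSOFSStorage' `
    -NewVHDPath '\\SOFS\VMStorage\DDAVMSOFSStorage\DDAVMSOFSStorage.vhdx' `
    -NewVHDSizeBytes 127GB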


The virtual machines cannot be highly available anyway, so you lose nothing there. The beauty is that in this case, as you leverage a Windows Server 2016 SOFS cluster for Hyper-V storage over SMB3 shares, you do get Storage QoS policies.
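For context, a dedicated policy like the one queried below would have been created on the SOFS cluster and then applied to the virtual machine's disks on the Hyper-V host, roughly along these lines. The IOPS values and the SOFS node name are arbitrary examples, not a prescription:

#Create the policy on a SOFS cluster node (example IOPS values)
New-StorageQosPolicy -Name DedicatedTier1Policy -PolicyType Dedicated -MinimumIops 500 -MaximumIops 5000

#Apply it to the VM's virtual hard disks on the Hyper-V host (SOFSNODE01 is a fictitious node name)
$Policy = Get-StorageQosPolicy -Name DedicatedTier1Policy -CimSession SOFSNODE01
Get-VM -Name DDAVMSOFSStorage | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $Policy.PolicyId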

#On a SOFS node

Get-StorageQosPolicy -Name DedicatedTier1Policy | Get-StorageQosFlow | ft InitiatorName, *IOPS, Status, PolicyID, filePath -AutoSize

#Query for the VM disks on the Hyper-V node

Get-VM -Name DDAVMSOFSStorage -ComputerName RemoteFXHost | Get-VMHardDiskDrive | fl *


#We generate some IO and get some stats on a SOFS node

Get-StorageQosFlow

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\

Get-StorageQosVolume -Mountpoint C:\ClusterStorage\SOFSDEMO\ | fl


You can start out with one Hyper-V node and add more when needed; that's your scale-out. Depending on the needs of the virtual machines, the specs of the servers (memory, CPU cores) and the capability and number of GPUs in the video cards, you get some scale-up as well.

To learn more about DDA go here:  https://blog.workinghardinit.work/?s=DDA&submit=Search

To learn more about storage QoS policies go here:

Some more considerations

By going disaggregated you can leverage a SOFS share both for virtual machines running on a Hyper-V cluster and for those on stand-alone (non-clustered) Hyper-V hosts that are domain members. The SOFS cluster can be leveraging S2D, traditional Storage Spaces with shared SAS (JBODs), or even an FC, iSCSI or shared SAS SAN if that's the only option you have. That's all OK as long as it's SOFS running on Windows Server 2016 and the Hyper-V hosts (stand-alone or clustered) are running Windows Server 2016 as well (needed for Storage QoS policies and DDA). There is no need for the Hyper-V host to be part of a cluster to get the results you need. If I use SOFS for both scenarios I can use the same storage array, but I don't need to; I could also use separate storage arrays. If the Hyper-V cluster is leveraging CSV instead of SOFS, I will need a separate one for SOFS, as it's ill advised to mix Hyper-V workloads with the SOFS role. Keep things easy, clear and supportable. I'll borrow a picture I got from a Microsoft PM recently, do seek out the bad ideas.


Veeam Vanguard nominations are now open for 2018!

Just a quick blog post on the Veeam Vanguard program. The nominations for 2018 are open! That means that if you know people who would make a good Veeam Vanguard, you can nominate them. You can even nominate yourself; that's perfectly fine. It's not frowned upon, but it also doesn't change anything in terms of evaluation for the program.


Rick blogged on this yesterday on the Veeam blog in "Veeam Vanguard nominations are now open for 2018!" and gave some more insight into what the program is, what it tries to achieve and what it does. He also discusses the selection. The key take-away is that you cannot study for this and that it is not some kind of certification. Some of the current Vanguards were quoted on how they look at the program, and one thing is constant in that: the people in these programs are contributors to the global tech community, and it's about sharing and helping others get the best out of their environment and their investment in Veeam. It also helps Veeam, as they get a very communicative group of people to give them feedback on their offerings, both products and services. It's just one more tool that helps them get things right or fix things when they got it wrong. Likewise, understanding Veeam and their products better helps us make better decisions on the design, implementation and operation of them.

You can have a look at the current lineup of Veeam Vanguards over here.


You'll find a short video on the program on that page as well. So go meet the Vanguards, find their blogs and communities, and follow @VeeamVanguard and the hashtag #VeeamVanguard to see what's going on.


So, people, this is the moment if you want to nominate someone, or yourself, to join the Veeam Vanguards in 2018. You have until December 29th 2017 to do so. I have always felt honored to be selected, I have fond memories of the events I was able to attend, and to this day I'm happy to be active in the Veeam Vanguard ecosystem. It's a fine group of professionals in a program of a great company.

Software-Defined Data Infrastructure Essentials

The last few months I spent some of my downtime and commute time reading a book. A paper one, actually. It's Greg Schulz's "Software-Defined Data Infrastructure Essentials". It is, as the subtitle states, about cloud, converged and virtual fundamental server storage I/O tradecraft.

It is not a book you'll read to learn about a particular technology, product or vendor. It is a more holistic approach to educating people in today's IT landscape: that vast area of expertise in which all the considerations around storage in a modern IT environment come together, where old and new, established and emerging ways of handling storage I/O for a variety of use cases meet and mix.


Reading the book helps you become more well versed in the subject and takes us out of our product or problem specific cocoons. That's the main reason I'd recommend anyone to read it. I'm impressed by how well Greg managed to write a book on such a diverse subject that is accessible to all levels of expertise. The depth and the breadth of this subject make this quite a feat. On top of that, this book is usable and valuable to both novice and experienced professionals. I have said it before (on Twitter), but if I was teaching IT classes and needed to bring students up to date with regard to software-defined cloud data center data considerations, this would be the textbook. It acknowledges the diversity of solutions and architectures in the real world and doesn't make bold marketing statements. Instead it focuses on what you need to know and consider when discussing and designing solutions. I wish many an IT manager, consultant and analyst would attend my fictional class, but I'd settle for them reading this book and learning about a big part of what they need to manage. It would serve them well and help them understand concerns other involved parties might want to see addressed.

For me an extra benefit was that I enjoy talking shop with Greg, but I only get those opportunities on rare occasions during conferences. As such, this book gave me some more time to read his views and insights. That's the next best thing.

Webinar: 3 Emerging Technologies that will change the way you use Hyper-V

I have the pleasure of doing a webinar on October 24th 2017 with two fellow MVPs. They are Andy Syrewicze from Altaro, who organizes the webinar, and Thomas Maurer, a well-known expert in the tech community and beyond on cloud and data center technologies.

The subject of the webinar is 3 Emerging Technologies that will change the way you use Hyper-V. It's a panel-style discussion amongst the 3 of us on technologies and trends that affect everyone in this business:

  • Public cloud computing platforms such as Azure and AWS
  • Azure Stack and the complete abstraction of Hyper-V
  • Containers and microservices: why they are game changers


There will be time for questions and discussion, as we expect the subject to be of great interest to all. For my part, I'll try to look at the bigger picture of the technologies, both from a product and service perspective as well as from a strategic point of view, as part of a doctrine to achieve an organization's goals.