DELL Enterprise Forum EMEA 2014 in Frankfurt

As you might have noticed on Twitter, I was in Frankfurt last week to attend DELL Enterprise Forum EMEA 2014. It was a great conference and very worthwhile going to. It was a week of multi-way communication between vendor, marketing, engineering, partners and customers. I learned a lot. And I gave a lot of feedback. As a Dell TechCenter Rockstar and a Microsoft MVP in Hyper-V, I can build bridges to make sure both worlds understand each other better and we, the customers, get our needs served better.

Dell Enterprise Forum EMEA 2014 - Frankfurt

I’m happy I managed to go, and I have some people to thank for being able to grab this opportunity:

  • I cleared the time with my employer. This is great: it’s a win-win situation, and I invested weekend time & extra hours for both my employer and myself.
  • I got an invite for the customer storage council, where we learned a lot and got ample opportunity to give honest and constructive feedback directly to the people that need to hear it! Awesome.
  • The DELL TechCenter Rockstar program very generously invited me to come over at zero cost for the Enterprise Forum, which is great and helped my employer and myself out. So, thank you so much for helping me attend. Does this color my judgment? 100% pure objectivity does not exist, but the ones who know me also know I communicate openly and directly. Look, I’ve never written positive reviews for money or kickbacks. I do not have sponsoring on my blog, even if that could help pay for conferences, travel expenses or lab equipment. Some say I should, but for now I don’t. I speak my mind, and I have been a long-term DELL customer for some very good reasons: they deliver the best value for money, with great support, in a better way and model than others out there. I was sharing this info way before I became a Rockstar, and they know that I tell the good, the bad and the ugly. They can handle it and know how to leverage feedback better than many out there.
  • Stijn Depril (@sdepril, http://www.stijnsthoughts.be/), Technical Datacenter Sales at RealDolmen, gave me a ride to Frankfurt and back home. Very nice of him, and a big thank you for doing so. He didn’t have to, and I’m not a customer of theirs. Thanks buddy, I appreciate it, and it was interesting to learn the partner’s view on things during the drive there and back. Techies will always be checking out gear …

Dell Enterprise Forum EMEA 2014 - Frankfurt

What did all this result in? Loads of discussion, learning and sharing about storage, networking, compute, cloud, futures and community in IT. It was an 18-hour-per-day technology fest in a very nice and well-arranged fashion.

I was able to meet up with community members, Twitter buddies, DELL employees and peers from all over EMEA to share experiences, learn together, talk shop and provide feedback, and I left with a better understanding of the complexities and realities they deal with on their side.

Dell Enterprise Forum EMEA 2014 - Frankfurt

It has been time very well spent. I applaud DELL for making their engineers and product managers available for this event, and I thank them for allowing us this amount of access to their brains from breakfast till the moment we said goodnight after a nightcap. Well done, thank you for listening, and I hope to continue the discussion. It’s great to be a DELL TechCenter Rockstar and work in this industry during these interesting times. To all the people I met again or for the first time: it was a great week of many interesting conversations!

For some more pictures and movies, visit the Dell Enterprise Forum EMEA 2014 from Germany photo album on Flickr.

Copy Cluster Roles Hyper-V Cluster Migration Fails at Final Step with error Virtual Machine Configuration ‘VM01’ failed to register the virtual machine with the virtual machine service

I was working on a migration of a nice two-node Windows Server 2012 Hyper-V cluster to Windows Server 2012 R2. The cluster consists of two DELL R610 servers and a DELL MD3200 shared SAS disk array for the shared storage. It runs all the virtual machines with infrastructure roles etc. It’s a cluster-in-a-box-like setup. This has been doing just fine for 18 months, but the need for features in Windows Server 2012 R2 became too much to resist. As the hardware needs to be recuperated and we had a maintenance window, we used the Copy Cluster Roles scenario that we have used so many times before with great success. It’s the “Perform an in-place migration involving only two servers” scenario documented on TechNet and described, for your convenience, in one of my previous blogs, Migrating a Hyper-V Cluster to Windows 2012 R2.

Virtual Machine Configuration ‘VM01’ failed to register the virtual machine with the virtual machine service

As the source host was running Windows Server 2012 we could have done the live migration scenario, but the downtime would be minimal and there was a maintenance window, so we chose this path.

So we performed a good health check of the source cluster and made sure we had no snapshots left hanging around. Yes, snapshots are supported now for this migration scenario, but I like to have as few moving parts as possible during a migration.
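
As a quick sanity check you can sweep for leftover checkpoints from PowerShell. A minimal sketch; the node names are hypothetical placeholders:

# List any remaining snapshots (checkpoints) on both source cluster nodes.
# 'SourceNode1' and 'SourceNode2' are placeholders for your own hosts.
foreach ($node in 'SourceNode1', 'SourceNode2') {
    Get-VM -ComputerName $node | Get-VMSnapshot |
        Select-Object VMName, Name, CreationTime
}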

It all went smooth as silk. After shutting down the VMs on the source cluster node and bringing the CSV offline (and un-presenting the LUN from the source node for good measure), we presented that LUN to the target host. We brought the CSV online and, when that completed successfully, we were ready to bring the virtual machines online. And that failed …

Log Name:      Microsoft-Windows-Hyper-V-High-Availability-Admin
Source:        Microsoft-Windows-Hyper-V-High-Availability
Date:          4/02/2014 19:26:41
Event ID:      21102
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      VM01.domain.be
Description:
‘Virtual Machine Configuration VM01’ failed to register the virtual machine with the virtual machine management service.


Let’s dive into the other event logs. On the host, the application, security and system event logs are squeaky clean. The Hyper-V event logs are pretty empty and clean too, except for these events in the Hyper-V-VMMS Admin log.

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          4/02/2014 19:26:40
Event ID:      13000
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      VM01.domain.be
Description:
User ‘NT AUTHORITY\SYSTEM’ failed to create external configuration store at ‘C:\ClusterStorage\HyperVStorage\VM01’: The trust relationship between this workstation and the primary domain failed. (0x800706FD)

 

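If you would rather pull these events out with PowerShell than click through Event Viewer, a filter along these lines does the trick (a sketch; adjust -MaxEvents to taste):

# Pull recent errors from the Hyper-V VMMS Admin log on the host.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
    Level   = 2   # 2 = Error
} -MaxEvents 20 | Select-Object TimeCreated, Id, Message | Format-List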

Bingo. It must be the fact that no domain controller is available. It’s a completely self-contained cluster, and both domain controller virtual machines are highly available and reside on the CSV. Now, the CSV does come online without a DC since Windows Server 2012, so that’s not the issue. It’s the process of registering the VMs that fails without a DC in an Active Directory environment.
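
You can confirm that diagnosis from the host before changing anything. A minimal sketch; the domain name is a placeholder:

# Errors out or returns False when no domain controller can be reached.
Test-ComputerSecureChannel -Verbose

# Try to locate a DC at all; 'domain.be' is a placeholder for your domain.
nltest /dsgetdc:domain.be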

Getting past this issue

There are multiple ways to resolve this and move ahead with our cluster migration. As the environment was still fully functional on the source cluster, I just removed a DC virtual machine from high availability on the cluster. I shut it down and exported it. I then copied it over to the node of the new cluster (we’re going to nuke the source host afterwards and install W2K12R2, so we moved it to the new host, where it could stay), put it on local storage and imported it using the “Register the virtual machine in-place” option. I did not make it highly available. The PowerShell equivalent is sketched below.
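
Roughly, that detour looks like this in PowerShell. A sketch only, with hypothetical VM names and paths; note that Import-VM without -Copy registers the VM in place, which is exactly what we want here:

# Un-cluster the DC VM; this removes the role from the cluster
# without deleting the virtual machine itself.
Remove-ClusterGroup -Name 'DC01' -RemoveResources

# Shut it down cleanly and export it ('DC01' and the paths are placeholders).
Stop-VM -Name 'DC01'
Export-VM -Name 'DC01' -Path 'D:\Export'

# After copying the export to local storage on the new node, register it
# in place by pointing Import-VM at the VM's configuration file.
Import-VM -Path 'D:\VMs\DC01\Virtual Machines\<GUID>.xml'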


After verifying that we could ping the DC and that it was up and running well, we tried the final phase of the migration again. It went as smoothly as we have come to expect!

Other options would have been to host the DC virtual machine on a laptop or another server. If you can no longer get to the DC for an export & import, heck, even a shared-nothing migration can, depending on your environment, help you out of this pickle. A restore from backup would also work. But here, in this two-node all-in-one cluster, our approach was fast and efficient.

So there you go, a tip to remember: virtualizing domain controllers is fully supported, no worries there, but you need to make sure that whatever depends on a DC doesn’t have that DC depending on it in turn. It’s a chicken-and-egg thing.

Use Cases For Fluid Cache For SAN With DELL Compellent In High Performance Virtualization With Windows 2012 R2

Fluid Cache For SAN

At Dell World 2013 in Austin, Texas I spent some time talking to engineers & managers about Fluid Cache for SAN. The demo in the keynote was enough to grab my distinct attention, especially as a Compellent customer.

What is it?

Dell already has Fluid Cache for DAS available in its PowerEdge servers. Now it’s time to bring this to their best SAN offering, the Compellent, and make Fluid Cache suitable for shared storage clustering. The way to do that cost-effectively and with high performance is to build on the success of onboard (local to the server) high-performance storage and make it shared through software, in a physical shared-nothing replication/sync model. To make this happen they use a 10/40Gbps Ethernet solution leveraging RoCE (RDMA over Converged Ethernet). Yes, the very technology I have been investing time & effort in for SMB Direct, and which we leverage for CSV & Live Migration traffic and with SOFS in Windows Server 2012 R2.
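
Since this is the same RDMA plumbing SMB Direct rides on, it’s worth noting how you check on the Windows side that your NICs actually expose RDMA. A minimal sketch, nothing Fluid Cache specific:

# Which adapters expose RDMA to Windows, and is it enabled?
Get-NetAdapterRdma | Select-Object Name, InterfaceDescription, Enabled

# Does SMB Direct see RDMA-capable interfaces?
Get-SmbClientNetworkInterface | Where-Object RdmaCapable |
    Select-Object FriendlyName, IpAddresses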

Basically, the super low latency and high throughput enable the memory to be synced across all nodes in a cluster, and as such each node sees all the cluster memory. For redundancy you will need at least 3 nodes in a cluster. Dell will scale Fluid Cache for SAN to 128 nodes. Windows Server 2012 R2 can handle 64 nodes, which some think is ridiculously high, but then again, Dell aims even higher, so it’s not as weird as you might think. Some people have really huge computing needs. Just remember that 10 years ago you probably found 16GB of RAM extravagant.

Why this architecture?

Dell uses server-based “shared nothing flash storage” & high-speed, low-latency synchronization to create a logical, cluster-wide shared pool of flash memory. This means they achieve stellar low latency, as the flash storage is inside the servers, close to the processors, and as such delivers excellent performance for the workloads. Way better than a “just” flash-only SAN can. For data integrity they commit the data only when it is written to the Express Flash drive(s) of one server and then also to another and verified. This needs to happen very fast, and that’s where the RoCE network comes into play. Later, at less speed-critical times, the data is pushed out to the Compellent SAN for storage. If that SAN is a flash-based setup, think about the capabilities this gives you in performance. Likewise, highly active data is read from the Compellent SAN and cached (also in multiple copies) on the Express Flash modules. While two servers, each with a copy of the data on Express Flash modules, would suffice, DELL requires at least three. This is just a plain common-sense N+1 redundancy design to keep high availability even when a node fails. A cool thing to note is that you can build larger clusters with 3 nodes each having one or more Express Flash modules; additional nodes don’t need them as long as they can read the cache of those 3. So the cost of this can be managed. The drawback is that you don’t read & write to a local Express Flash module on those extra nodes. If you want that, you’ll need to put more $ on the table.

[Diagram: cluster nodes with Express Flash modules and the Compellent SAN connected over a RoCE/RDMA network]

The thing to note here is that the servers and SAN are connected over RoCE/RDMA. Well, this looks familiar. What other technology leverages RDMA? SMB Direct in Windows Server 2012 R2! And where do we use that, amongst other things? Storage IO in Scale-Out File Server, CSV traffic, Live Migration …

The big benefit of this design is not just that it takes your SAN to the next level, but also that, if DELL does this right, they won’t break any of the good stuff like VSS-aware snapshots with Replay Manager, Automatic Data Tiering, Live Volumes, Live Migration etc. A lot of the high-IOPS/low-latency solutions out there based on fast local flash break a lot of the good stuff and reduce centralized storage management. What if you could have your cookie and eat it too?

Demo Time at Dell World

Dell demonstrated an Oracle database load on an eight-node cluster of PowerEdge R720 servers with Intel Xeon E5 processors, running Linux (no Windows Server 2012 R2 support yet, sadly). These servers each used 350GB PCI-Express flash cards (“only” PCI-Express 2.0 capable, by the way). This cluster, using a Compellent SAN, managed to get a result of more than 5 million IOPS at 6-millisecond response times, delivering 12,000 tps for 14,000 client connections. This was read-only. Without Fluid Cache for SAN they could “only” achieve 2,000 clients (six times fewer clients, due to four times fewer transactions and 99% slower responses). See this movie for more info: http://www.youtube.com/watch?v=uw7UHWWAtig and watch the keynote from Dell World 2013 here.


Where would I use this?

Cost will determine the use cases, and cost is unknown for now. We can only look at what Fluid Cache for DAS costs right now and speculate. I, for one, hope/bet that DELL won’t price itself out of the market (they have a lot of competition from big & small players in a “good enough is good enough” world with a cloud mindset all around). Make it too expensive and we might be happy with “just” 500,000 IOPS at much lower cost. It’s a fine line. Price it right, support it well, and you might win the bulk of sales in the storage wars. Based on the DAS solution we’re looking at at least $8,000 per server: the license is $3,500 for DAS (see http://www.theregister.co.uk/2013/03/05/dell_fluid_cache_server_acceleration/), plus the cost of a PCI-Express flash module (> $5,000, see http://en.community.dell.com/techcenter/extras/chats/w/wiki/4480.3-5-2013-techchat-fluid-cache-for-das.aspx) & a yearly maintenance fee. Then we need to factor in the cost of the RDMA/RoCE-capable NICs & the (dedicated) Force10 switches (two for redundancy) that are at least 10Gbps (S4810?) or probably 40Gbps (S6000?), & cabling. So this is not a cheap solution, and you won’t just “throw it in” on a quiet afternoon to see what it does for you. Not that there will be a DIY “throw it in” kit, I think; it’s a step above plug and play. If they keep it affordable and do some other things for Windows Server 2012 R2 / Hyper-V, they can be the absolute number one SAN vendor for any Microsoft customer. But that’s another blog topic.
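
Purely as a back-of-envelope sketch of that reasoning (every figure below is a guess derived from the DAS links above, not a quote):

# Speculative per-deployment math for the three-node minimum.
$license = 3500      # Fluid Cache for DAS license (The Register link above)
$flash   = 5000      # low-end guess for one Express Flash PCIe module
$nodes   = 3         # minimum node count for redundancy
$servers = $nodes * ($license + $flash)   # 25,500 USD for the nodes alone
# RoCE NICs, two dedicated Force10 switches, cabling & yearly maintenance
# come on top; those prices are unknown here, so the real total is higher.
"At least {0:N0} USD before networking and maintenance" -f $servers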

Cost is indeed something that might make it a show stopper for us; I just can’t tell yet. One of the key factors is that, if affordable, it could give the point solutions we now see pop up more and more in storage a run for their money. While those are cheap and workable in good-enough-is-good-enough scenarios, they take some of the centrally shared storage advantages away. But if we ever do a stateful VDI project in an environment with high-end physical desktops (500GB or more local storage, SSD disks, 8-core CPU, 8-32GB DDR3, dual or more screens) that run ArcGIS, AutoCAD, Visual Studio, SQL Server, Outlook with 5GB mailboxes, large documents & huge files (images), this might be the enabler we need to make VDI happen & work as desired with current all-purpose Compellent SANs. If the price is right, it could enable VDI in what are now “NO GO” scenarios. And those are plentiful … Another use case I see is a virtualized SQL Server environment on Hyper-V with general-purpose shared storage. We’re doing very well, but the day might arrive that we need those IOPS to take it even further. Don’t laugh, but realize how many IOPS an SSD delivers to a workstation today; that’s what your users expect & demand. Want to fail at VDI? Have it outperformed by a four-year-old physical PC with an SSD slapped into it.

Could it help keep excessive IOPS away from the SAN, making the SAN capable of doing more over a longer lifetime? In other words, can it play a part in the Storage QoS issue across servers/clusters/storage systems for non-workload-aware storage solutions?

So I might have some homework to do. For our next SQL Server cluster we’ll look at the next generation of servers & start counting our PCI-Express slots. We already consume four PCI-Express slots (for 2*FC & 2*dual-port 10Gbps) in our Hyper-V design. That’s another discussion, but those hosts are built purposely for performance under any condition & to be highly redundant. A health check / improvement track by Microsoft for our SQL Server environment has proven this to be an outstanding setup (a nice e-mail to see your bosses get, by the way). I digress: free PCI-Express slots should not be an issue, as we also don’t need the FC cards in the Fluid Cache nodes. The storage IO uses the RoCE network, to which the Compellent SAN attaches.

Cost is very important in determining if we’ll ever get to deploy it. The cloud is here, and while it is far from cheap either, it’s a lot easier to sell than internal IT for various reasons. That’s just how the powers that be roll right now & how things are.

What we’ll get in our hands

There was a lot of love between Dell & Samsung at Dell World. Talking to Dell at the server/storage/networking booths, I understood that Samsung is going to produce flash modules for this that support PCI-Express 3.0 and the industry-backed NVM Express host interface for solid state drives, which will reduce latency by a third compared to now. It also seems they will produce higher-capacity cards than what was used in the demos (800 GB and 1.6 TB). So capacity will increase & latency will drop even more. They leverage the Force10 10Gbps or 40Gbps switches for the RoCE network. As Dell & Mellanox are cooperating heavily (Mellanox Collaborates with Dell to Deliver 10/40GbE Solution for Mainstream Servers and Networking Solutions), my bet is on Mellanox for the cards. Broadcom is not there yet for it to happen in time, and Intel has no RoCE cards AFAIK. They seem to be playing the waiting game before they jump in.

Magic Ball Time, Speculation & Questions.

I’m not a DELL server/storage designer or architect, and those who are don’t tell me things to plaster all over the internet, so this really is magic ball time …


I’ll show my ignorance of what Samsung does under the hood here: when I hear that the next generation of DELL servers can have 6TB of RAM, I can only speculate that, with the advent of DDR4 in servers & its ever-dropping cost, the path is open to leverage NV-RAM disks for the read/write cache in Fluid Cache for SAN as well, a bit like what IDT did (http://us.generation-nt.com/idt-announces-world-first-pci-express-gen-3-nvme-nv-dram-press-3732872.html). The persistence comes from writing the DRAM content to NAND at shutdown. Can we do that fast enough at 1.6 TB cache sizes? Can we fit enough of those modules on a card? What would that do for IOPS & latency? Does that even make sense at this moment in time?

What if we could leverage the DDR4 DIMMs in the server itself? This would perhaps cut costs and also save us some valuable PCI-Express 3.0 slots for our 10Gbps-or-better addiction. Sure, there is no persistence then, but the content is distributed redundantly over the cluster anyway; is that safe enough to make it feasible? What if we need to shut down the whole cluster? I guess it’s not that easy, and perhaps we just need to make sure future motherboards have 8 or more PCI-Express 3.0 slots & not worry about it. Or move to 40/100Gbps & have less need for NICs. Yeah, that’s what was said about 10Gbps in the early days …

Support for Windows?

While it’s not there yet, I have absolutely no doubt that they will bring it to Windows Server 2012 R2 and higher. Windows is a huge on-premises market for native workloads like SQL Server, VDI and Hyper-V. The number of sales opportunities in the Microsoft ecosystem is growing (despite the cloud) while others are stagnant or dropping. On top of that, the low cost of Hyper-V leaves money to be spent on Fluid Cache for SAN. As Dell is in business to make money, they will not leave that big a chunk of cash on the table.

When can we get our hands on this technology?

Timing-wise that will be early to late Q2 2014, which is my best guesstimate. Interesting times, people, interesting times.

DELL World 2013 – Tour Of the Acoustic & Storage Testing Labs & Presenting at the Dell TechCenter User Group

While at Dell World 2013, a group of us had the opportunity to visit the Dell offices as part of the Trends in Data Center Technology Think Tank. We saw advancements in fresh-air cooling, a hot house, the storage lab and, new to me, the acoustic labs. We also met Chris Peterson, the acoustic architect (he was involved in the design of the DELL VRTX, which is a unique solution and achievement in the industry). Likewise, they also have thermal engineers, and both areas of expertise are closely related.

I will never look at acoustic/thermal engineering for servers & storage the way I used to, and I have way more respect for, and a better understanding of, the effort that goes into this research and why.

For some more information on the acoustic lab, read the white paper Dell Enterprise Acoustics and watch these videos:

Dell thermal & acoustic engineers discussing the VRTX
Chris Peterson on Dell PowerEdge Generation 12 Server acoustics

Next to all that, I attended briefings and had one-to-one conversations with network, storage & server managers & engineers. I had a lot of information, questions & requests to share from our Microsoft MVP community in regard to our needs & wishes for the best possible support for Windows Server 2012 R2, Hyper-V, ODX, UNMAP, SMB Direct, SOFS, management & cloud. I even jumped into an open-source breakfast discussion on cloud computing. Last but not least, we joined fellow Rockstars Jonathan Copeland (@VirtSecurity) and Rasmus Haslund (@haslund) & Dell TechCenter’s community manager Jeff Sullivan (@JeffSullivan) to discuss what community & social media mean to us.


I also shared our experiences with Windows Server 2012 R2, Hyper-V, DVMQ, vRSS & ODX at the Dell TechCenter User Group during Dell World 2013.


Want to talk about and demo DVMQ & vRSS? Start with the basics: RSS.
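
If you want to follow along on your own hosts, the RSS and VMQ basics are easy to inspect from PowerShell. A minimal sketch; adapt adapter names to your environment:

# Is RSS enabled per adapter, and how are queues & processors assigned?
Get-NetAdapterRss | Format-List Name, Enabled, NumberOfReceiveQueues,
    BaseProcessorNumber, MaxProcessorNumber

# The (D)VMQ side of the story on a Hyper-V host.
Get-NetAdapterVmq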

To all my community buddies: a very festive end of the year and a great 2014! If you want to know even more about how rewarding being part of a community can be, check out the blog post Mindset of the community by Marc van Eijk (@_marcvaneijk).