A highly redundant Application Delivery Controller Setup with KempTechnologies

Introduction

The goal was to make sure the KempTechnologies LoadMaster Application Delivery Controller was capable of handling the traffic to all load balanced virtual machines in a high volume data and compute environment. Needless to say, the solution had to be highly available.


The environment uses racks and rows as failure units for power, networking and compute. Hyper-V cluster nodes are spread across racks in different rows. Networking ranges from highly to continuously available, allowing for planned and unplanned maintenance as well as the failure of switches. All racks have redundant PDUs that are remotely managed over Ethernet. There is a separate out-of-band network with remote access.

The 2 Kemp LoadMasters are mounted in different rows and different racks to spread the risk and maintain high availability. Eth0 & eth2 are in an active-passive bond for a redundant management interface; eth1 is used to provide a secondary backup link for HA. These connect to the switch independent redundant switches of the rack, which also uplink (VLT) to the Force10 switches (themselves spread across racks and rows). The two 10 Gbps ports are in an active-passive bond to trunked ports on the two redundant, switch independent 10 Gbps switches in the rack, so we are also protected against port or cable failures.
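To make the active-passive bond behavior concrete, here is a toy Python model of an active-backup bond such as the eth0/eth2 management pair. It is a sketch assuming Linux bonding-style active-backup semantics with no preferred primary (the current member keeps the role until its link fails). It is not how the LoadMaster implements bonding, and the class name is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ActiveBackupBond:
    """Toy model of an active-backup (active-passive) bond.

    Only one member carries traffic at a time. When its link goes down,
    the first other member (in configured order) with a working link takes over.
    Assumption: no preferred primary, so a recovered member does not preempt.
    """
    members: list[str]
    link_up: dict[str, bool] = field(default_factory=dict)
    active: str | None = None

    def __post_init__(self) -> None:
        # Members start with their link up unless stated otherwise.
        for m in self.members:
            self.link_up.setdefault(m, True)
        self._reselect()

    def _reselect(self) -> None:
        # Keep the current active member while its link is still up.
        if self.active is not None and self.link_up[self.active]:
            return
        # Otherwise pick the first member with a working link (None if all are down).
        self.active = next((m for m in self.members if self.link_up[m]), None)

    def set_link(self, member: str, up: bool) -> None:
        """Record a link state change (port, cable or switch) and re-evaluate."""
        self.link_up[member] = up
        self._reselect()
```

Because there is no preemption in this model, traffic stays on eth2 after eth0 recovers, which avoids a second, unnecessary switchover.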

One tip: with DELL switches, use TRUNK as the port mode, not general.

This design gives us a lot of capabilities. We have redundant networking for all networks. We have active-passive LoadMasters, which means:

  • Failover when the active one fails
  • Non-service-interrupting firmware upgrades
  • The rack is the failure domain. As each rack is in a different row, we also mitigate “localized” issues (power, maintenance affecting the rack, …)

Combined with the fact that these are bare metal LoadMasters (DELL R320 with iDRAC; see Remote Access to the KEMP R320 LoadMaster (DELL) via DRAC Adds Value), this gives us out-of-band management even when we have network issues. The racks are provisioned with PDUs that are managed over Ethernet, so we can even cut the power remotely if needed to resolve issues.

Conclusion

The results are very good and we get “zero ping loss” failover between the LoadMaster Nodes during testing.
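“Zero ping loss” can be checked with a simple probe loop while you trigger a failover. The sketch below swaps ICMP ping for TCP connects to the virtual service (raw ICMP usually needs elevated privileges); the target address, port, timeout and probe interval are assumptions to adapt. It reports the number of failed probes and the longest gap between a first failure and the next success.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probes(host: str, port: int, count: int, interval: float = 0.2):
    """Probe `count` times, returning a list of (monotonic timestamp, ok) samples."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def summarize(samples):
    """Return (failed_probes, longest_outage_seconds) for (timestamp, ok) samples.

    An outage runs from the first failed probe to the next successful one.
    """
    failed = sum(1 for _, ok in samples if not ok)
    longest = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None
    if outage_start is not None:  # still down when the run ended
        longest = max(longest, samples[-1][0] - outage_start)
    return failed, longest
```

For example, `summarize(run_probes("192.0.2.10", 443, count=300))` (a placeholder documentation address) covers roughly a minute of probing; start it, fail over the active LoadMaster, and a result of `(0, 0.0)` means no probe was lost.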

We have a solid, redundant Application Delivery Controller deployment that does not break the switch independent TOR setup that exists in all racks/rows. It’s active-passive at the controller level and active-passive at the network (bonding) level. If that is an issue, the TOR switches would need to be configured as MLAGs. That would enable LACP for the bonded interfaces. At the LoadMaster level these could then be configured as a cluster to get an active-active setup, if the restrictions this imposes are not a concern in your environment.

Important Note:

Some high end switches, such as the Force10 series with VLT, support attaching single-homed devices (devices not attached to both members of a VLT). While VLT and MLAG are very similar, MLAGs come with their own needs & restrictions. Not all switches that support MLAG can handle single-homed devices. The obvious solution is not to attach single-homed devices, but that is not always possible with certain devices. That means other solutions are needed. These could lead to a significant rise in the number of switches needed, defeating the economics of affordable redundant TOR networking (cost of switches, power, rack space, operations, …), or to leveraging MSTP and configuring a dedicated MSTP network for a VLAN, which might also not always be possible or feasible to solve the issue. Those single-homed devices might very well need to be in the same VLANs as the dual-homed ones. Stacking would also solve the above issue, as the MLAG restrictions do not apply. I do not like stacking, however, as it breaks the switch independent redundant network design: even planned maintenance hurts, as a firmware upgrade brings down the entire stack.

One thing that is missing is the ability to fail over when the network fails. There is no concept of a “protected” network. This could help mitigate issues where a virtual service is down due to network problems: the LoadMaster could fail over to see whether the other node has more success. In certain scenarios this could prevent long periods of downtime.
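The “protected network” idea can be sketched as a small decision function. This is purely hypothetical: the LoadMaster has no such feature, and the class name, the hold-down and the cool-down defaults are made up for illustration. The timers make the node fail over only on a sustained outage that the peer does not share, and keep it from ping-ponging between nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Hypothetical 'protected network' failover decision (not a LoadMaster feature).

    Fail over only when the local node has been unable to reach the protected
    network for at least `hold_down` seconds, the peer reports that it *can*
    reach it, and no failover happened within the last `cooldown` seconds.
    """
    hold_down: float = 10.0
    cooldown: float = 300.0
    _down_since: float | None = None
    _last_failover: float | None = None

    def should_fail_over(self, now: float, local_ok: bool, peer_ok: bool) -> bool:
        if local_ok:
            self._down_since = None  # network recovered; reset the timer
            return False
        if self._down_since is None:
            self._down_since = now  # start timing this outage
        if now - self._down_since < self.hold_down:
            return False  # give transient blips a chance to clear
        if not peer_ok:
            return False  # the other node is no better off; stay put
        if self._last_failover is not None and now - self._last_failover < self.cooldown:
            return False  # avoid flapping back and forth between nodes
        self._last_failover = now
        self._down_since = None
        return True
```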

CryptoWall 3.0 Strikes Too Close for Comfort

Instead of testing Windows Server 2016 TPv4 a bit more during “slow” hours, we got distracted: CryptoWall 3.0 struck too close for comfort. Last week my team and I had the distinct displeasure of having to tackle a “ransomware” infection inside a business network. Talk about petting a burning dog.

We were lucky on a few fronts. The anti-malware tools caught the infection in the act and shut it down. We went from zero to 100 miles per hour and had the infected or suspect client systems ripped off the network and confiscated. We issue a brand new imaged PC in such incidents. No risks are taken there.

Then there was a pause … anything to be seen on the anti-malware tools? Any issues being reported? Tick tock … tick tock … while we were looking at the logs to see what we were dealing with. Wait Out …

Contact! The first reports came in about issues with opening files on the shares, and soon the service desk found the dreaded images in subfolders on those shares.

Pucker time as we moved to prevent further damage and started a scan & search for more encrypted files and evidence of damage. I’m not going to go into detail about what, why, when and how. As in all fights, you have to fight as you are. No good wishing for better defenses, tools, skills or training. At that moment you do what you think you need to do to contain the situation, clean up, restore data and hope for the best.
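For the scan & search part, here is a minimal Python sketch that walks a share looking for ransom-note files and for files whose first bytes look random (high Shannon entropy, as encrypted content tends to). The assumptions are the HELP_DECRYPT note prefix (widely reported for CryptoWall 3.0, but check what you actually find), the 4 KB sample size and the 7.5 bits/byte threshold. Compressed formats such as ZIP, Office documents or JPEGs also score high, so treat hits as candidates to check against backups, not proof.

```python
import math
import os
from collections import Counter
from collections.abc import Iterator

# Assumption: ransom-note file names start with this prefix (case-insensitive).
NOTE_PREFIXES = ("help_decrypt",)


def shannon_entropy(data: bytes) -> float:
    """Bits per byte; values close to 8.0 suggest encrypted (or compressed) data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def scan_share(root: str, sample_size: int = 4096,
               threshold: float = 7.5) -> Iterator[tuple[str, str]]:
    """Yield ('note', folder) for folders holding a ransom note and
    ('suspect', path) for files whose first bytes look random."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(f.lower().startswith(NOTE_PREFIXES) for f in filenames):
            yield ("note", dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(sample_size)  # only sample the start of each file
            except OSError:
                continue  # locked or vanished file; skip it
            if shannon_entropy(head) >= threshold:
                yield ("suspect", path)
```

Sampling only the first few kilobytes keeps a scan over terabytes of shares feasible; the folder list from the note hits also gives a first feel for the blast radius.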

What can I say? We got lucky. We did our best. I’d rather not have to do that again. We have multiple types of backup & restore capabilities and that was good. But you do not want to declare all data beyond a point lost and start restoring dozens of terabytes of corporate data to a last known good state without any insight into the blast radius and fallout of the incident.

The good thing was that our boss was on board to do what needed to be and could be done, and let us work. We tried to protect our data while we started the cleanup and restores where needed. It could have been a lot uglier, costlier and potentially deadly. This time our data protection measures saved the day, and at least 2 copies of our data were safe from infection. Early detection and response were key. The rest was luck.

CryptoWall moves fast. It immediately attempts to find active command and control infrastructure. As soon as it gets its public key from the command and control server, it starts using it to encrypt files. The private key is securely hidden behind “a pay wall” somewhere in a part of the internet you don’t want to know about. All that happens in seconds. Stopping that is hard. Being fast limits damage. Data recovery options are key. Every day, people are being trapped by phishing e-mails with malicious attachments, drive-by downloads on infected websites or even advertising networks.

Read more on CryptoWall 3.0 here: https://www.sentinelone.com/blog/anatomy-of-cryptowall-3-0-a-look-inside-ransomwares-tactics/ Details on how to protect against and detect it depend on your anti-malware solution. It’s very sobering, to say the least.

It makes me hate corporate apps that require outdated browsers even more, especially since we’ve been able to avoid that until now. But I know all too well that forces are at work to introduce those downgraded browsers with “new” software. Insanity at its best.

KB3063283 Updates the Hyper-V Integration Components for Windows Server 2012 R2 to 6.3.9600.17831

While investigating a backup issue with some VMs I noticed an entry in the VEEAM Backup & Replication logs that the Hyper-V integration components were out of date.

This was the case on all the guests on that particular cluster actually. A quick look at the IC version on the host showed them to be at 6.3.9600.17831.

Comparing that to the guests quickly made clear that theirs were at 6.3.9600.16384, so older.
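Spotting outdated guests is just a version comparison, but it is safer to compare the parts as integers rather than as strings, so that, for example, 9600 correctly sorts below 10000. A sketch, assuming you have already collected the guest versions (for instance from the IntegrationServicesVersion that Hyper-V’s Get-VM reports); the function names and VM names are hypothetical.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """'6.3.9600.17831' -> (6, 3, 9600, 17831) so versions compare numerically."""
    return tuple(int(part) for part in v.strip().split("."))


def outdated_guests(host_ic: str, guest_ic: dict[str, str]) -> list[str]:
    """Return the names of VMs whose integration components are older than the host's."""
    host = parse_version(host_ic)
    return sorted(vm for vm, ver in guest_ic.items() if parse_version(ver) < host)
```

With the versions from this cluster, a guest at 6.3.9600.16384 is reported and one already at 6.3.9600.17831 is not.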

A web search for the Hyper-V integration components led us to KB3063283, “Update to improve the backup of Hyper-V Integration components in Hyper-V Server 2012 R2”, which had been installed on their Hyper-V hosts. They run a tight ship, but due to regulations they are normally 3 to 4 months behind on patches and updates, so in their case they had only recently installed that update. KB3063283 updates the Hyper-V Integration Components for Windows Server 2012 R2 to 6.3.9600.17831.

So a little word of warning: while you are keeping your Hyper-V environment up to date (you should), don’t forget to update the integration components of your virtual machines. A good backup product like Veeam Backup & Replication will log this during backups. It might not make the backups fail per se, but the components have been updated for a good reason. This update was even specifically for backup related issues, so it’s wise to upgrade the virtual machines to this version a.s.a.p.

Kemp LoadMaster OEM Servers and Dell Firmware Updates with Lifecycle Controller

When you buy a DELL OEM based Kemp Technologies LoadMaster, you might wonder who will handle the hardware updates to the server. Well, Dell handles all OEM hardware updates via its usual options and, as with all LoadMasters, Kemp Technologies handles the firmware update of the LoadMaster image.

Hardware-wise, DELL and Kemp are both companies that excel in support. If you can find the solution that meets your needs, it’s a great choice. Combine them and it makes for a great experience. Let me share a small issue I ran into when updating Kemp LoadMaster OEM servers and Dell firmware with the Lifecycle Controller.

For a set of DELL R320 LoadMasters in HA that I was upgrading, I not only wanted to move to 7.1-Patch28b-BARE-METAL.bin but also to take the opportunity to bring the firmware of those servers up to the latest versions, as it had been a while (since they had been delivered on site).

There is no OS running on those servers, as they are OEM hardware based appliances for the LoadMaster image. No worries: these DELL servers come with DRAC & Lifecycle Controllers, so you can leverage those to do the firmware updates from a Server Update Utility ISO locally, via virtual media, or over the network via FTP or a network share. FTP can be either the DELL FTP site or an internal one.

Now, as I had just downloaded the latest SUU at the time (SUU-32_15.09.200.74.ISO; for now you need to use the 32 bit installers with the Lifecycle Controller), I decided to just mount it via virtual media, boot to the Lifecycle Controller and update using local media.

But I got stuck …

It doesn’t throw an error; it just returns to the starting point, and nothing fixes it. Not even adding “/repository” to the file path. You can type the name of an individual DUP (32 bit!) and that works. Scanning the entire repository, however, wouldn’t move beyond step 2, “Enter Access Details”.

Scanning for an individual DUP seemed to work, but leaving the file path blank to find all eligible updates did not return any results, so I could not advance. The way I was able to solve this was by leveraging the DRAC’s ability to update its own firmware, using the firmware image file of the most recent version. I got mine by extracting the DUP and taking the image file from the payload subfolder.

You can read how to upgrade the DRAC / Lifecycle Controller via the DRAC here.

When you’ve done that, give the system a reboot for good measure and try again. In all my cases, this fixed the issue. My take on this is that older firmware can’t handle more recent SUU repositories. So give it a try if you run into this and you’ll be well on your way to getting your firmware updated. If you need help with this process, DELL has excellent documentation here in “Lifecycle Controller Platform Update/Firmware Update in Dell PowerEdge 12th Generation Servers”.

The end result is a fully updated DELL server / Kemp LoadMaster. Mission accomplished. All this can be done from the comfort of your home office. A win-win for both you and your customer/employer. Think about it: it would be a shame to miss out on all the benefits you get from working in the cloud when the on-premises part of your hybrid infrastructure forces you to get in a car and drive to a data center 70 km away. Especially at 21:21 at night.