Upgrade the firmware on a Brocade Fibre Channel Switch

NOTE: content available as pdf download here.

Upgrade the firmware on a Brocade Fibre Channel Switch

In order to maintain a secure, well-functioning fibre channel fabric over the years you’ll need to perform a firmware upgrade now and again. Brocade fibre channel switches are expensive but they do deliver a very solid experience. This experience is also obvious in the firmware upgrade process. We’ll walk through this as a guide on how to upgrade the firmware on a Brocade fibre channel switch environment.

Have a FTP/SFTP/SCP server in place

If you have some switches in your environment you’re probably already running a TFTP or FTP server for upgrading those. For TFTP I use the free but simple and good one provided by Solarwinds. They also offer a free SCP/SFTP solution. For FTP it depends either we have IIS with FTP (and FTPS) set up or we use FileZilla FTP Server which also offers SFTP and FTPS. In any case this is not a blog about these solutions. If you’re responsible for keeping network gear in tip top shape you should this little piece of infrastructure set up for both downloads and uploads of configurations (backup/restore), firmware and boot code. If you don’t have this, it’s about time you set one up sport! A virtual machine will do just fine and we back it up as well as we store our firmware and backups on that VM as well. For mobile scenarios I just keep TFTP & FilleZilla Server installed and ready to go on my laptop in a stopped state until I need ‘m.

Getting the correct Fabric OS firmware

It’s up to your SAN & switch vendors to inform you about support for firmware releases. Some OEMs will publish those on their own support sites some will coordinate with Brocade to deliver them as download for specific models sold and supported by them. Dell does this. To get it select your switch version on the dell support site and under downloads you’ll find a link.

clip_image002

That link takes you to the Brocade download page for DELL customers.

clip_image004

Make sure you download the correct firmware for your switch. Read the release notes and make sure you’re the hardware you use is supported. Do your homework, go through the Brocade Fabric OS (FOS) 7.x Compatibility Matrix. There is no reason to shoot yourself in the foot when this can be avoided. I always contact DELL Compellent CoPilot support to verify the version is support with the Compellent Storage Center firmware.

When you have downloaded the firmware for your operating system (I’m on Windows) unzip it and place the content of the resulting folder in your FTP root or desired folder. I tend to put the active firmware under the root and archive older one as they get replaced. So that root looks like this. You can copy it there over RDP or via a FTP client. If the FTP server is running your laptop, it’s just a local copy.

clip_image005

The upgrade process

A word on upgrading the firmware

I you move from a single major level/version to the next or upgrade within a single major level/version you can do non-disruptive upgrades with a High Availability (HA) reboot meaning that while the switch reloads it will not impact the data flow, the FC ports stay online. Everything keeps running, bar that you lose connectivity to the switch console for a short time.

Some non-disruptive upgrade examples:

V6.3.2e to V6.4.3g

V7.4.0a to v7.4.0b

V7.3.0c to v7.4.0b

Note that this way you can step from and old version to a new one step by step without ever needing downtime. I have always found this a really cool capability.

You can find Brocades recommendations on what the desired version of a major release is in https://www.brocade.com/content/dam/common/documents/content-types/target-path-selection-guide/brocade-fos-target-path.pdf

I tend to way a bit with the latest as the newer ones need some wrinkles taken care of as we can see now switch 7.4.1 which is susceptible to memory leaks.

Some disruptive upgrade examples (FC ports go down):

7.1.2b to 7.4.0a

6.4.3.h to 7.4.0b

Our upgrade here from 7.4.0a to 7.4.0b is non-disruptive as was the upgrade from to 7.3.0c to 7.4.0a. You can jump between version more than one version but it will require a reboot that takes the switch out of action. Not a huge issue if you have (and you should) to redundant fabrics but it can be avoided by moving between versions one at the time. IT takes longer but it’s totally non-disruptive which I consider a good thing in production. I reserve disruptive upgrades for green field scenarios or new switches that will be added to the fabric after I’m done upgrading.

Prior to the upgrade

There is no need to run a copy run or write memory on a brocade FC switch. It persists what you do and you have to save and activate your zoning configuration anyway when you configure those (cfgsave). All other changes are persisted automatically. So in that regards you should be all good to go.

Make a backup copy of your configuration as is. This gives you a way out if the shit hits the fan and you need to restore to a switch you had to reset or so. Don’t forget to do this for the switches in both fabrics, which normally you have in production!

You log on switch with your username and password over telnet or ssh (I use putty or kitty)

MySwitchName:admin> configupload

Hit ENTER

Select the protocol of the backup target server you are using

Protocol (scp, ftp, sftp, local) [ftp]: ftp

Hit ENTER

Server Name or IP Address [host]: 10.1.1.12

HIT ENTER

Enter the user, here I’m using anonymous

User Name [user]: anonymous

Hit ENTER

Give the backup file a clear and identifying name

Path/Filename [<home dir>/config.txt]: MySwitchNameConfig20151208.txt

Hit ENTER

Select all (default)

Section (all|chassis|switch [all]): all

configUpload complete: All selected config parameters are uploaded

That’s it. You can verify you have a readable backup file on your FTP server now.

clip_image007

The Upgrade

A production environment normally has 2 fabrics for redundancy. Each fabric exists out of 1 or more switches. It’s wise to start with one fabric and complete the upgrade there. Only after all is proven well there should you move on to the second fabric. To avoid any impact on production I tend plan these early or late in the day also avoiding any backup activity. Depending on your environment you could see some connectivity drops on any FC-IP links (remote SAN replication FC to IP ó IP to FC) but when you work one fabric at the time you can mitigate this during production hours via redundancy.

Log on to first brocade fabric switch with your username and password over telnet or ssh (I use putty or kitty). At the console prompt type

firmwaredownload

This is the command for the non-disruptive upgrade. If you need or want to do a disruptive one, you’ll need to use firmwaredownload –s.

Hit Enter

Enter the IP address of the FTP server (of the name if you have name resolution set up and working)

Server Name or IP address: 10.1.1.12

User name: I fill out anonymous as this gives me the best results. Leaving it blank doesn’t always work depending on your FTP server.

User Name: anonymous

Enter the path to the firmware, I placed the firmware folder in the root of the FTP server so that is

Path: /v7.4.0b

Hit enter

At the password prompt leave the password empty. Anonymous FTP doesn’t need one.

Password:

Hit enter, the upgrade process preparation starts. After the checks have passed you’ll be asked if you want to continue. We enter Y for yes and hit Enter. The firmware download starts and you’ll see lost of packages being downloaded. Just let it run.

clip_image009

This goes on for a while. At one point you’ll see the prom update happening.clip_image011

When it’s done it starts removing unneeded files and when done it will inform you that the download is done and the HA rebooting starts. HA stands for high availability. Basically it fails over to the next CP (Control Processor, see http://www.brocade.com/content/html/en/software-upgrade-guide/FOS_740_UPGRADE/GUID-20EC78ED-FA91-4CA6-9044-E6700F4A5DA1.html) while the other one reboots and loads the new firmware. All this happens while data traffic keeps flowing through the switch. Pretty neat.

When you keep a continuous ping to the FC switch running during the HA reboot you’ll see a short drop in connectivity.

image

But do realize that since this is a HA reboot the data traffic is not interrupted at all. When you get connectivity back you SSH to switch and verify the reported version, which here is now 7.4.0b.

clip_image014

That’s it. Move on to the switch in the same fabric until you’re done. But stop there before you move on to your second fabric (failure domain). It pays to go slow with firmware upgrades in an existing environment.

This doesn’t just mean waiting a while before installing the very latest firmware to see whether any issues pop up in the forums. It also means you should upgrade one fabric at the time and evaluate the effects. If no problems arise, you can move on with the second fabric. By doing so you will always have a functional fabric even if you need to bring down the other one in order to resolve an issue.

On the other hand, don’t leave fabrics unattended for years. Even if you have no functional issues, bugs are getting fixed and perhaps more importantly security issues are addressed as well as browser and Java issues for GUI management. I do wish that the 6.4.x series of the firmware got an update in order for it to work well with Java 8.x.

Creating a bootable VHD or VHDX from an existing one

Creating a bootable VHD or VHDX from an existing one is a great capability to have.There are a couple of reasons why one might need or want to do this. In windows 2012 (R2) this is even a part of normal live migration operations. Storage live migration for example is nothing but the live streaming of the data of your live virtual hard disk into a new VDH/VHDX. You have multiple options when it comes to creating a bootable VHD/VHDX from an existing one and they all serve their specific purposes,which might or might not overlap.

This is great stuff to do migrations, reorganize storage, defrag your internal dynamic VHDX structure etc.  But you’re not limited to those options. When you want to convert from VHD to VHDX you’ll leverage Convert-VHDX. You can also create a new VHDX with an old one as the source with New-VHDX. Great for all kind of operations including off line migration, updates, testing on exact copies of the original disk etc. You might think it’s better to just copy the disk but for a conversion that will not work, that won’t deal with internal fragmentation which can be important for performance testing when your migrating to new storage, a new cluster & Hyper-V version and such.

Recently people asked me if this would work with their OS disk. The virtual disk that the boot from. Yes that will work. Both New-VHD and Convert-VHD will create a fully bootable new virtual disk if the source virtual disk was bootable to begin with. No problem, They have to, if you think about it. Using Convert-VHD to move from VHD to VHDX and even change the cluster sizes of the disk would be no good if the VM doesn’t boot anymore. Like wise with New-VHD.

The only thing that need some real tender loving care is when you convert a VM from generation to generation 2. The script provided to to that by John Howard (MSFT) use fully supported technologies. The script itself is not a supported product, but you’re not doing anything unsupported with it.

So all people needing to convert, defrag or move  VMs to new virtual hard disks. Do a few test to verify your assumptions and go forward. Step into that bright new future you’ve been missing out on for the past 3 years.

DELL Microsoft Storage Spaces Offerings

Dell was the 1st OEM to actively support and deliver Microsoft Storage Spaces solutions to its customers.

image

They recognized the changing landscape of storage and saw that this was one of the option customers are interested in. When DELL adds their logistical prowess and support infrastructure into the equation it helps deliver Storage Spaces to more customers. It removes barriers.

In June 2015 DELL launched their newest offering based on generation 13 hardware.

image

Recently DELL has published it’s docs and manuals for Storage Spaces with the MD1420 JBOD Storage Spaces with the MD1420 JBOD

You can find some more information on DELL storage spaces here and here.

I’m looking forward to what they’ll offer in 2016 in regards to Storage Spaces Direct (S2D) and networking (10/25/40/50/100Gbps). I’m expecting that to be a results of some years experience combined with the most recent networking stack and storage components. 12gbps SAS controller, NVMe options in Storage Spaces Direct. Dell has the economies of scale & knowledge to be one of the best an major players in this area. Let’s hope they leverage this to all our advantage. They could (and should) be the first to market with the most recent & most modern hardware to make these solutions shine when Windows Server 2016 RTM somewhere next year.

Hyper-V Storage QoS in Windows Server 2016 Works on SOFS and on LUNs/CSVs

Introduction

I addressed storage QoS in Windows Server 2012 R2 at length in a coupe of blog posts quite a while ago:

I love the capability and I use it in real life. I also discussed where we were still lacking features and capabilities. I address the fact that there is no multiple host QoS, there is no cluster wide QoS and there is no storage wide QoS in Windows. On top of that, if there is QoS in the storage array (not many have that) most of the time this has no knowledge of Hyper-V, the cluster and the virtual machines. There is one well know exception and that GridStore, possible the only storage vendor that doesn’t treat Hyper-V as a second class citizen.

Any decent storage QoS that not only provides maximums but also minimums, does this via policies and is cluster and hypervisor even virtual machine aware. It needs to be easy to implement and mange. This is not a very common feature. And if it’s exists it’s is tied to the storage vendor, most of the time a startup or challenger.

Windows Server 2016

In Windows Server 2016 they are taking a giant step for all mankind in addressing these issues. At least in my humble opinion. You can read more here:

Basically  Microsoft enables us to define IOPs management policies for virtual machines based on virtual hard disks and  IOPs reserves and limits. These are shared by a group of virtual machines / virtual hard disks.  We can have better resource allocation between VMs, or groups of VMs. These could be high priority VMs or VMs belonging to an platinum customer /tenant. Storage QoS enhances what we already have since Windows Server 2012 R2.  It enables us to monitor and enforce performance thresholds via policies on groups of VMs or individual VMs.

Great for SLA’s but also to make sure a run away VM that’s doing way to much IO doesn’t negatively impact the other VMs and customers on the cluster. They did this via via a Centralized Policy Controller. Microsoft Research really delivered here I would dare say. A a public cloud provider they must have invested a lot in this capability.

At Ignite 2015 there was a great session by Senthil Rajaram and Jose Barreto on this subject. Watch it for some more details.

What caught my eye after  attending and watching sessions, talking to MSFT at the boot was the following marked in red.

image

So not enabled by default on non SOFS storage but can you enable it on your block level CSV Hyper-V cluster? There is a lot of focus on Microsoft providing Storage QoS for SOFS. Which ties into the “common knowledge” that virtualization and LUNs are a bad idea, you need file share and insights into the files of the virtual machines to put intelligence into the hypervisor or storage system right? Well perhaps no! I Windows Server 2016 there is now also the ability to provide it to any block level storage you use for Hyper-V. Yes your low end iSCSI SAN or your high End 16Gbps FC SAN … as long as it’s leveraging CVS (and you should!). Yes, this is what they state in an awesome interview with my Fellow Hyper-V MVP Carsten Rachfahl at Ignite 2015.

Videointerview with Jose and Senthil Storage QoS Thumb2

Senthil and Jose look happy and proud. They should be.  I’m happy and proud of them actually as to me this is huge. This information is also in the TechNet guide Storage Quality of Service in Windows Server Technical Preview

Storage QoS supports two deployment scenarios:

Hyper-V using a Scale-Out File Server This scenario requires both of the following:

  • Storage cluster that is a Scale-Out File Server clusterCompute cluster that has least one server with the Hyper-V role enabled.
  • For Storage QoS, the Failover Cluster is required on Storage, but optional on Compute. All servers (used for both Storage and Compute) must be running Windows Server Technical Preview.

Hyper-V using Cluster Shared Volumes. This scenario requires both of the following:

  • Compute cluster with the Hyper-V role enabled
  • Hyper-V using Cluster Shared Volumes (CSV) for storage

Failover Cluster is required. All servers must be running Windows Server Technical Preview.

So let’s have a quick go following the TechNet guide on a lab cluster leveraging CSV over FC with a Dell Compellent.image

Which give me running Storage QoS Resource

image

And I can play with my new PoSh Commands … Get-StorageFlowQos, Get-StorageQosPolicy and Get-StorageQosVolume …

image

The guide is full of commands, examples and tips. Go play with it. It’s great stuff Smile. I’ll blog more as I experiment.

Here’s my test VMs doing absolutely nothing, bar one on which I’m generating traffic. Even without a policy set it shows the IOPS the VM is responsible for on the storage node.image

Yu can dive into this command and get details about what virtual disk on what volume are contributing to the this per storage node.

image

More later no doubt but here I just wanted to share this as to me this is very important! You can have the cookie of your choice and eat it to! So the storage can be:

  1. SOFS provided (with PCI RAID, Shared SAS, FC, FCoE, ISCI storage as backend storage) that doesn’t matter. In this case Hyper-V nodes can be clustered or stand alone
  2. The storage can be any other block level storage: iSCSI/FC/FCoE it doesn’t matter as long as you use CSVs. So yes this is clustered only. That Storage QoS Resource has to run somewhere.

You know that saying that you can’t do storage QoS on a LUN as they can’t be tweaked to the individual VM and virtual hard disks? Well, that’s been busted as myth it seems.

What’s left? Well if you have SOFS against a SAN or block level storage you cannot know if the storage is being used for other workloads that are not Hyper-V, policies are not cross cluster and stand alone hosts are a no go without SOFS. The cluster is a requirement for this to work with non SOFS Hyper-V deployments.  Also this has no deep knowledge or what’s happening inside of your storage array. So it knows how much IOPS you get, but it’s actually unaware of the total IOPS capability of the entire storage system or controller congestion etc. Is that a big show stopper? No. The focus here is on QoS for virtualization. The storage arrays storage behavior is always in flux anyway. It’s unpredictable by nature. Storage QoS is dynamic and it looks pretty darn promising to me! People this is just great. Really great and it’s very unique as far as I can say. Microsoft, you guys rock.