You know the drill: The server team upgrades to a new operating system version and, in doing so, introduces connectivity problems into your storage environment. You discover the only way to fix the problem is to upgrade the firmware. You dig a little deeper only to realize the upgrade has dependencies that will affect your entire environment, launching you into a firmware- and driver-upgrade frenzy filled with lots of planning, lost weekends and late nights.
Many storage administrators don't manage drivers or firmware until something happens, such as an upgrade or new hardware implementation. Of course, the amount of firmware you need to manage is in direct proportion to the complexity of your environment and the systems you've implemented. Some vendors don't stipulate firmware and driver versions as long as the other systems in the environment support the installed versions. Other vendors have charts, matrices, spreadsheets and executables for every proprietary product in the environment, all of which seem to overlap and have cross-dependencies, making a firmware upgrade a heavily researched and carefully planned event. Most environments fall somewhere between these two extremes.
|Firmware upgrade matrix|
Get it down on paper
When evaluating and possibly upgrading your environment, the first step is to document everything in the storage area network (SAN), including:
- Make and model numbers of servers and their firmware versions (motherboard, etc.)
- Operating system versions, service packs and patch levels
- Host bus adapter (HBA) firmware, drivers, failover or redundancy software
- HBA load-balancing software
- Infrastructure firmware/software versions (switches, directors, SAN extenders)
- Storage resource management (SRM) software package and current revision levels
- Disk subsystems and proprietary software
Some SRM packages and software from disk subsystem vendors can gather much of this information. For example, EMC Corp.'s Grab utility corrals all of the server, EMC software and HBA data, and EMC will convert this information into a Host Environment Analysis Tool (HEAT) report, which is a cleaned-up version of the Grab output. This report also compares the server's configuration with EMC's support matrix and highlights anything out of compliance. Only EMC can convert Grab output to HEAT reports.
However you gather this information, make sure you create a document that's easy to access and update by anyone on your team. Some administrators try to keep all of this information in a spreadsheet, some write scripts that pull the data from their SRM tool and stick it in a flat file, while others create and store individual server configuration files tucked away in a shared folder.
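As one way to picture the flat-file approach, here is a minimal sketch that keeps one record per server and writes the whole inventory to a single JSON file that anyone on the team can read or update. The record fields, the function names and the file layout are all assumptions made for illustration; they don't come from any SRM tool or vendor utility.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class HostRecord:
    """One server's SAN-relevant configuration (field names are illustrative)."""
    hostname: str
    server_model: str
    os_version: str
    hba_model: str
    hba_firmware: str
    hba_driver: str
    failover_software: str = ""
    notes: str = ""


def save_inventory(records, path):
    """Write all records to one JSON file, sorted by hostname so diffs stay small."""
    data = [asdict(r) for r in sorted(records, key=lambda r: r.hostname)]
    Path(path).write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


def load_inventory(path):
    """Read the JSON file back into HostRecord objects."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return [HostRecord(**item) for item in raw]
```

A plain-text file like this can sit in a shared folder or under version control, which makes it easier to see who changed what and when.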
If you have an accurate record of your firmware and driver revisions, you're light-years ahead of most of your peers. However, many documentation efforts start with great intentions only to be abandoned within a few months because of the effort required to maintain this information. It's a time-consuming chore, but having all of this information readily available is worth the effort.
|Get a grip on your environment|
Once you have all the firmware and driver information in a format you like, it's time to determine what needs to be upgraded. Most firmware and driver requirements come from the disk subsystem vendors, rather than the operating system, server, HBA or infrastructure manufacturers. Start by understanding your disk vendor's requirements and specification limits; then do the same for the rest of the environment. Make sure to include any SAN-related software packages, such as HBA failover and SRM software. Certain software packages may require minimum firmware levels on some of your hardware. Understanding all of these requirements will help you to formulate an upgrade plan.
Consult with your team as well as vendors to determine the versions of firmware you need for your upgrade. Consider issues such as bug fixes, feature releases and compatibility concerns. Once you have an idea of what you want the environment to look like, it's time to start documenting again.
For this round of documentation, put the firmware and driver requirements in a matrix organized so that the rows and columns have the same titles. There should be an entry for everything in your environment. You may also want to include the most current firmware, driver or software versions from each of the hardware and software components in your environment. The matrix should reflect where you want your environment to be, not what it is today.
When building the matrix, many cells may be blank or require information that's not available; try to make the entries as detailed as possible, as overlooked items might introduce problems. Don't be surprised if many of your vendors' firmware requirements have related firmware or driver requirements. The more complex the environment, the more likely it is that you'll need to make revisions to your matrix.
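A matrix whose rows and columns carry the same titles maps naturally onto a nested dictionary. The sketch below is only an illustration: the component names, the helper functions and the convention that a row states what that component requires of each column are all assumptions, not a format any vendor prescribes.

```python
# Illustrative component list; substitute everything in your own environment.
COMPONENTS = [
    "Operating system",
    "HBA firmware",
    "HBA driver",
    "HBA failover software",
    "Switch firmware",
    "Disk subsystem",
]


def empty_matrix(components):
    """Build a square matrix with one blank cell for every (row, column) pair
    where the row and column are different components."""
    return {
        row: {col: None for col in components if col != row}
        for row in components
    }


def set_requirement(matrix, requiring, required, text):
    """Record what the 'requiring' component demands of the 'required' one."""
    matrix[requiring][required] = text


def blank_cells(matrix):
    """List the cells that still need research."""
    return [
        (row, col)
        for row, cols in matrix.items()
        for col, value in cols.items()
        if value is None
    ]
```

Listing the blank cells after each round of vendor research gives you a running to-do list of what still has to be confirmed.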
Once you figure out and document where you need the environment to be, look for those products that have multiple firmware requirements and work with those first. Determine which version(s) of firmware satisfy all of the requirements.
For example, in our sample firmware matrix (see Firmware upgrade matrix), if we choose an HBA firmware version between 1.8 and 1.9, we satisfy the HBA requirements for every element in our environment. If we choose the most current version available, 2.1, we may introduce problems with HBA failover and our disk subsystem. If you decide to go with the latest version, your disk vendor may ask you to roll back to an earlier revision if new problems arise. Pay particular attention, therefore, to your disk vendor's requirements. Typically, a disk vendor will get the initial call for help regardless of the actual source of the problem you're experiencing. This is probably why disk vendors' requirements are the most restrictive.
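Finding a version that satisfies every requirement amounts to intersecting the supported ranges. Here is a minimal sketch of that idea. The per-component ranges in the usage note are hypothetical values chosen only to reproduce the 1.8-to-1.9 outcome described above, and the code assumes purely numeric, dot-separated version strings, which real firmware labels don't always follow.

```python
def parse_version(text):
    """Turn '1.10' into (1, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def common_range(requirements):
    """requirements maps a component name to (minimum, maximum) version strings.
    Returns the overlapping (low, high) range as tuples, or None if none exists."""
    low = max(parse_version(minimum) for minimum, _ in requirements.values())
    high = min(parse_version(maximum) for _, maximum in requirements.values())
    return (low, high) if low <= high else None


def candidates(available, requirements):
    """Filter the available versions down to those every component accepts."""
    overlap = common_range(requirements)
    if overlap is None:
        return []
    low, high = overlap
    return [v for v in available if low <= parse_version(v) <= high]


def rejected_by(version, requirements):
    """Name the components whose supported range excludes the given version."""
    v = parse_version(version)
    return [
        name
        for name, (minimum, maximum) in requirements.items()
        if not parse_version(minimum) <= v <= parse_version(maximum)
    ]
```

With hypothetical HBA firmware ranges of 1.7 to 2.1 for the operating system, 1.8 to 1.9 for HBA failover software, 1.6 to 1.9 for the disk subsystem and 1.8 to 2.1 for switch firmware, `candidates` keeps only 1.8 and 1.9 from the available releases, and `rejected_by("2.1", ...)` names the failover software and the disk subsystem.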
Take steps to upgrade
If you haven't performed firmware and driver upgrades in some time, it may be necessary to take a phased approach to upgrading. You might have to upgrade your HBAs to a newer version to upgrade another product and, once that second product is upgraded, you may have to upgrade the HBAs again to satisfy other requirements in your environment. Whatever the case, best practice calls for upgrading only one component at a time and then verifying that the environment is still stable before moving on.
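The one-change-at-a-time rule can be sketched as a simple loop that applies a step, checks the environment and stops at the first sign of trouble. The function name and the two callables passed in are placeholders: how you actually apply an upgrade and how you verify stability depend entirely on your own tools and procedures.

```python
def run_phased_upgrade(steps, apply_step, verify_environment):
    """Apply upgrade steps in order, verifying stability after each one.

    steps              -- ordered list of step descriptions
    apply_step         -- callable that performs one step (placeholder)
    verify_environment -- callable returning True if the SAN is still healthy

    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    completed = []
    for step in steps:
        apply_step(step)
        if not verify_environment():
            # Stop immediately so the last change is the only suspect.
            return completed, step
        completed.append(step)
    return completed, None
```

Because the loop halts on the first failed check, the most recent step is the only candidate for rollback, which is the practical payoff of changing one thing at a time.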
Once you've made the necessary firmware upgrades, update the environment document with the new firmware information. You may want to retain a copy of the original to use as a historical reference in the event you need to troubleshoot or roll back firmware versions.
Make it easier next time
How can you eliminate some of this work in the future? If you answered "standardize," give yourself a pat on the back. Try not to let the purchasing agent buy whatever HBA is on sale, and make sure the server team doesn't install whatever firmware or driver version happens to be at hand for that HBA. Create unambiguous standards with sufficient detail to indicate the specific firmware and drivers required for each component make and model in your system. You should supply all the firmware, drivers and software you expect everyone to use. An intranet or network share can be used to publish all software revisions and related information.
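Once a standard exists, checking the environment against it can be largely mechanical. The following sketch compares each host's HBA firmware and driver with a published standard keyed by HBA model. The model name, version numbers and dictionary keys are invented for illustration and don't refer to any real product.

```python
# Hypothetical published standard: approved HBA model -> required versions.
STANDARDS = {
    "ExampleHBA-100": {"hba_firmware": "1.9", "hba_driver": "5.2"},
}


def find_deviations(hosts, standards):
    """Return (hostname, field, found, expected) for every item off-standard.

    hosts is a list of dicts with 'hostname', 'hba_model', 'hba_firmware'
    and 'hba_driver' keys (an assumed layout for this example).
    """
    deviations = []
    for host in hosts:
        standard = standards.get(host["hba_model"])
        if standard is None:
            # The HBA itself is not on the approved list.
            deviations.append(
                (host["hostname"], "hba_model", host["hba_model"],
                 "not an approved model")
            )
            continue
        for key in ("hba_firmware", "hba_driver"):
            if host[key] != standard[key]:
                deviations.append(
                    (host["hostname"], key, host[key], standard[key])
                )
    return deviations
```

Running a check like this on a schedule, or before any new server joins the SAN, can catch drift from the standard before it turns into an unplanned outage.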
Job No. 1 for storage administrators and managers is to keep the system up and running. Ignoring firmware and driver upgrades may help you to avoid the pain associated with upgrades, but it may cause many more problems in the long run with unplanned outages, intermittent problems or unnecessarily complicated upgrades when adding new hardware. Setting up a good system will pay off now and later because just when you think you're caught up with firmware upgrades, it'll probably be time to start all over again.