Almost every company I deal with is working on standardizing storage (and backup) management under a single umbrella. But for various reasons, most haven't gotten far. If your organization wants to move to a storage management group, but you're not sure how to get it there, consider this step-by-step approach.
Step 1: Evolve and specialize
Let's get this out of the way up front: You're understaffed. Every IT group I've seen has too few people expected to fill too many roles. But another organizational problem is even worse: Expecting a staff of generalists to emerge as IT heroes.
Technology is too complicated for anyone to understand it all. And storage is too different from servers and applications to be a part-time focus. Most generalist administrators choose what they want to work on, or they're deflected by the inevitable crises that plague IT infrastructure. In the worst cases, entire IT staffs run about like Keystone Kops chasing after the same problems.
The biggest loser, as always, is backup. Failing to protect data can go unnoticed or ignored for months at a time, as long as there isn't an incident requiring a restore. So assessing the coverage and success of backups often ends up at the bottom of the to-do list.
The time is ripe to build an organization of specialists. Your existing staff probably already has the skills to manage the current environment, if only they were allowed to focus on it. You probably already have a guy who knows all about EMC Corp.'s Symmetrix or a gal who works with Veritas Software Corp.'s NetBackup. Let them be the seed employees for your new storage management group.
Step 2: Playing a part
The next step is defining the demarcation of the storage management group, and the roles each member will play. This will give you a better idea of how many people you'll need than any "terabytes per admin" metric. Here are some basic roles and responsibilities:
- Group leader. Someone has to be responsible for the success of the organization. The lead has to be able to work with other groups, both within and outside IT, to determine how to map your technical capabilities to a business strategy.
- Storage engineer. In small environments, a single storage engineer can design, implement and debug all of the disk storage, but larger groups may opt to distribute these responsibilities.
- Backup engineer. This position is responsible for making sure your backup system runs according to plan and that new storage is protected appropriately.
- Business analyst. The business analyst is charged with harmonizing your technical capabilities with the demands of the greater organization.
- Operators. Someone has to watch the big board and call for help when the indicators go red. The IT operations group that's already in place is best suited for these tasks and for handling other duties, such as the daily labeling and shipping of tape cartridges.
These are the basic roles, and understudies will also be needed in case the leads are unavailable. And some environments are just too large or complex to rely on one or two people. So how many of each type do you need? I like to have at least one engineer per 10 systems or two locations. Complexity and time zones can more than double that ratio.
I caution against combining these roles, even in small organizations. For example, backup is more likely to be overlooked if it's not someone's primary job responsibility. So assume that at minimum, you'll need four or more people in your storage management group.
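The engineer ratio above lends itself to a quick back-of-the-envelope calculation. The sketch below is only one reading of that rule of thumb: it assumes that "10 systems or two locations" means whichever count demands more engineers, and it models complexity and time zones as a single multiplier. The function name and the `complexity_factor` parameter are illustrative, not part of any standard sizing method.

```python
import math


def estimate_engineers(systems: int, locations: int,
                       complexity_factor: float = 1.0) -> int:
    """Estimate engineer headcount from the rule of thumb:
    at least one engineer per 10 systems or per two locations.

    Assumption: the larger of the two counts wins.
    Assumption: complexity and time zones are folded into one
    multiplier (2.0 would double the baseline).
    """
    if systems < 0 or locations < 0:
        raise ValueError("systems and locations must be non-negative")
    if complexity_factor < 1.0:
        raise ValueError("complexity_factor must be at least 1.0")

    by_systems = math.ceil(systems / 10)
    by_locations = math.ceil(locations / 2)

    # Never plan for fewer than one engineer.
    baseline = max(by_systems, by_locations, 1)
    return math.ceil(baseline * complexity_factor)


if __name__ == "__main__":
    # 25 systems in 3 locations, simple environment
    print(estimate_engineers(25, 3))
    # Same environment, but complex enough to double the ratio
    print(estimate_engineers(25, 3, complexity_factor=2.0))
```

This covers engineers only. The group leader, business analyst and understudies come on top of whatever number it produces.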
Step 3: What's your policy?
Now that you've got the band together, make sure everyone's playing the same tune. Decide exactly what the group--and each individual--is expected to do. People tend to hear what they want to hear, rather than what's really being said, so commit your policies to paper.
First, define the team's responsibilities. The question of demarcation usually boils down to these three questions:
- Do you configure volume managers and file systems on servers? If so, you'll need root/administrator access, and you can expect to be part of the planning and debugging process for servers.
- Does the storage realm end before or after the host bus adapter? Once you get inside the server's case, you have to be prepared to take on much more work as system changes happen.
- How about managing a Fibre Channel (FC) or Ethernet storage area network (SAN)? Cisco Systems Inc. SANs and iSCSI blur the once-clear line separating network and storage teams.
Once everyone agrees on team responsibilities, it's time to determine the specific tasks that each member will take on. I've outlined five general roles, but individual job descriptions are also needed to make sure everyone understands expectations.
A service framework like the IT Infrastructure Library (ITIL) can offer some ideas for the tasks that will need to be performed. But ITIL predates the storage specialty, so some creativity is needed to develop a complete list. At GlassHouse Technologies, I was part of the team that defined our storage-specific service framework (the GlassHouse storage management lifecycle), so I can vouch for the amount of effort that's required to put a good list together.
Step 4: Protect and serve
The time has come to make it happen. For a service organization, service level agreements (SLAs) are the key to working with customers. The trick is to be proactive: Decide on the levels of service that you intend to offer and then get buy-in from the end users. Here's what to do:
- Translate your technical capabilities into service level definitions. Think about things in terms that your users will understand--talk about availability and performance, rather than RAID levels and FC.
- Create "value meals" of typical service level choices. These are the tiers of service you will offer your customers. Avoid using loaded terms such as "Class A" or "Tier 3" that may make business users or applications seem less important than others. Instead, cite specifics of service levels with phrases such as "fully redundant" and "copied off site."
- Share the financial implications of SLA choices with your users. Not everyone wants a chargeback scheme, but most people will understand that shipping every byte written to a remote data center can be an expensive proposition. So be prepared to discuss the costs associated with the different options.
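A small service catalog can make the three points above concrete. The sketch below is a minimal illustration, not a recommended product: the offering names follow the advice to use descriptive phrases rather than "Class A" or "Tier 3," and every cost figure is a made-up placeholder to be replaced with your own numbers. The class, catalog and function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceLevel:
    name: str                   # descriptive, not "Tier 3"
    fully_redundant: bool
    copied_off_site: bool
    monthly_cost_per_tb: float  # placeholder figure, not real pricing


# Hypothetical "value meals"; the costs are illustrative only.
CATALOG = [
    ServiceLevel("Single copy, backed up on site", False, False, 100.0),
    ServiceLevel("Fully redundant", True, False, 250.0),
    ServiceLevel("Fully redundant, copied off site", True, True, 400.0),
]


def cheapest_match(catalog: List[ServiceLevel], need_redundancy: bool,
                   need_off_site: bool) -> Optional[ServiceLevel]:
    """Return the lowest-cost offering that meets the stated needs."""
    candidates = [
        level for level in catalog
        if (level.fully_redundant or not need_redundancy)
        and (level.copied_off_site or not need_off_site)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda level: level.monthly_cost_per_tb)


def monthly_cost(level: ServiceLevel, terabytes: float) -> float:
    """Show users the financial implication of their choice."""
    return level.monthly_cost_per_tb * terabytes


if __name__ == "__main__":
    choice = cheapest_match(CATALOG, need_redundancy=True, need_off_site=True)
    print(choice.name, monthly_cost(choice, terabytes=5))
```

Putting the cost next to each offering gives users the financial picture without requiring a full chargeback scheme.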
Step 5: Turn process into procedure
I've mentioned process maturity at least twice here, but a third time won't hurt. The predictability and repeatability of a process are directly related to its level of definition. If you want to keep your customers happy, you have to take your ad hoc processes and turn them into concrete standard operating procedure (SOP) documents. Then you have to ensure that the procedures are followed every time.
First, look at the tasks that you defined in step 3. Anything that's standardized and frequently repeated should have an SOP. Come to a consensus about how those tasks should be performed and then write it down. That's how a process is transformed into a procedure.
Not everything has to be documented in an SOP. But even if only the 10 most important tasks are documented--and the procedures followed--it will go a long way to stabilizing the environment. My top picks for SOPs are storage and backup provisioning, change control, monitoring and escalation of alerts and service level reporting.
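One way to shortlist SOP candidates is to filter your task list for work that is both standardized and frequently repeated, and then keep the 10 most important items. The sketch below assumes a hypothetical task inventory: the `Task` fields, the importance scale, the run counts and the `min_runs_per_month` threshold are all illustrative choices, not measurements from a real environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    standardized: bool
    runs_per_month: int  # illustrative estimate
    importance: int      # hypothetical scale: 1 (low) to 5 (high)


def sop_candidates(tasks: List[Task], min_runs_per_month: int = 4,
                   limit: int = 10) -> List[str]:
    """Pick standardized, frequently repeated tasks, most important first."""
    eligible = [
        t for t in tasks
        if t.standardized and t.runs_per_month >= min_runs_per_month
    ]
    # Sort by importance, then by frequency, both descending.
    eligible.sort(key=lambda t: (-t.importance, -t.runs_per_month))
    return [t.name for t in eligible[:limit]]


# Hypothetical inventory built from the top picks named above.
TASKS = [
    Task("Storage provisioning", True, 20, 5),
    Task("Backup provisioning", True, 12, 5),
    Task("Change control", True, 8, 4),
    Task("Alert monitoring and escalation", True, 30, 4),
    Task("Service level reporting", True, 4, 3),
    Task("One-off array migration", False, 1, 5),
]

if __name__ == "__main__":
    for name in sop_candidates(TASKS):
        print(name)
```

The one-off migration drops out despite its importance because it is neither standardized nor repeated, which matches the point that not everything needs an SOP.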
Remember: Anything your new storage management group isn't doing internally should be as defined as possible. Insist on written SLAs and SOPs, especially if you plan to implement a "virtual team" or in-sourced management operation. Mature processes can help ensure success.
Even if you already have a central management team, you may not have completed the whole process as laid out here. If you have been through all this, I'd love to hear from you. Drop me a line and we can all benefit from your experience!