But now the phone is ringing off the hook. Users want to know what they should do first, while vendors are assessing how much security is needed in their products and when. This is a huge and very encouraging change, but storage security is a subset of a bigger topic: storage risk management.
I recently had the opportunity to discuss risk management with three storage security services firms: Computer Associates, GlassHouse Technologies and Kasten Chase. Each of these firms offers risk and gap analysis assessments that define problems; measure performance against industry standards like IT Infrastructure Library/IT Service Management (ITIL/ITSM), Committee of Sponsoring Organizations/Control Objectives for Information and related Technology (COSO/COBIT), or ISO 17799; and recommend remediation activities to address deficiencies and decrease overall storage risk.
These meetings were eye-openers to say the least. I heard horror stories, as well as a number of common mistakes worth detailing here. Before I do that, however, let me clarify what I mean by risk management as it relates to storage activities and infrastructure.
There's probably an official classification of risk, but to me storage risks can be defined as the following:
- An event or process that can lead to or extend storage downtime
- An event or process that can lead to data corruption or theft
Subtle DR issues lurk around every corner
To meet recovery point objectives and recovery time objectives, strong DR practices depend on meticulous attention to detail. Services professionals I spoke with pointed to the following common areas where this discipline is lacking:
- DR sites are often too close together. I heard several stories of large companies with DR facilities a few miles from production data centers. This model usually has historical roots in an era when a "disaster" meant a hard disk crash.
Back in the good old days, you could put your storage professionals in a car and drive them across town to grab some backup tapes. Unfortunately, today's disasters aren't as quaint. Storage professionals need to think in terms of events like Sept. 11 and Hurricane Katrina. Geographic separation is important when an entire area is out of commission and employees are too busy protecting their families to show up for work. This is such an important detail that the SEC once suggested (and nearly mandated) that DR facilities should be located hundreds of miles apart. If your DR facility is a short drive away, you'd better find a more distant location, pronto.
- DR is built around systems, not business applications. Storage professionals are trained to ensure that data is available, but raw data isn't very useful without the applications that turn ones and zeros into business processes. To obviate this problem, storage professionals need to map data, files and LUNs to applications, and then work with IT to establish a map of application interdependencies. Together, these maps help to ascertain the data's value and set the order of recovery.
- Testing is too high level. This results from an IT- and storage-centric perspective. I heard numerous stories about how storage professionals did regular DR testing of their storage and system infrastructure, but not all the way up to the application layer. Restoring business applications is certainly a harder task, but ask yourself this question: "Does anything else really matter?"
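The mapping exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application names, LUN identifiers and dependency relationships are hypothetical, and a real inventory would come from a configuration database rather than hand-typed dictionaries. The sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to derive a restore order in which each application comes back only after the applications it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: which LUNs back which business application.
app_luns = {
    "order-entry": ["lun-07", "lun-08"],
    "billing": ["lun-03"],
    "customer-db": ["lun-01", "lun-02"],
}

# Hypothetical interdependencies: application -> applications it needs first.
app_depends_on = {
    "order-entry": {"customer-db", "billing"},
    "billing": {"customer-db"},
    "customer-db": set(),
}


def recovery_plan(depends_on, luns):
    """Return (application, LUNs) pairs, dependencies before dependents."""
    order = TopologicalSorter(depends_on).static_order()
    return [(app, luns.get(app, [])) for app in order]


plan = recovery_plan(app_depends_on, app_luns)
for step, (app, luns) in enumerate(plan, start=1):
    print(f"{step}. restore {app}: {', '.join(luns)}")
```

A plan like this is also a natural script for a DR test that goes all the way up to the application layer: each step is something to restore and then verify with the business owner.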
To me, the downside to the ever-lower price of magnetic disk is the challenge of managing data. Service professionals I spoke with reported that most people have a very limited knowledge about what they have stored and where it lives. Common scenarios include:
- Loads of garbage. I heard identical stories from two different service providers after each one performed a storage assessment to determine the actual content being stored on an enterprise-class storage system. One found an IT administrator's complete music library, while the other found a terabyte of porn. To some extent this waste is understandable as it costs more to police disk utilization and content than it does to just let it go. Nevertheless, an employee's private terabyte directory is a bit excessive. This is especially true when it contains offensive material that could lead to legal trouble. Companies should adopt acceptable-use policies with strict penalties, but the storage team is a last line of defense. Audit storage content on a regular basis to avoid legal problems and unnecessary capital spending.
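A regular content audit need not be elaborate. The following is a minimal sketch, assuming the storage in question is reachable as a mounted file system; the list of flagged extensions is a hypothetical starting point, not a policy recommendation. It totals bytes by file extension so that an unexpected terabyte of media files stands out.

```python
import os
from collections import Counter

# Hypothetical set of extensions worth a second look; adjust to local policy.
FLAGGED_EXTENSIONS = {".mp3", ".m4a", ".flac", ".avi", ".mp4", ".mkv"}


def bytes_by_extension(root):
    """Walk a directory tree and total file sizes per lowercase extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            extension = os.path.splitext(name)[1].lower()
            totals[extension] += size
    return totals


def flagged_usage(totals, flagged=FLAGGED_EXTENSIONS):
    """Keep only the extensions on the flagged list."""
    return {ext: size for ext, size in totals.items() if ext in flagged}
```

Calling `flagged_usage(bytes_by_extension("/some/mount/point"))` on a schedule (the path here is a placeholder) gives the storage team a simple trend line to review before the next capacity purchase.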
Processes remain informal
Government regulations like Sarbanes-Oxley and industry mandates like the Payment Card Industry Data Security Standard demand defined and auditable controls, but many storage groups just aren't there yet. My meetings revealed that many organizations suffer from the following:
- Willy-nilly processes that depend on individuals (the IT hero syndrome). One or two people on the storage team are considered gurus and they're the firefighters who put out the flames. The problem is that critical data availability depends upon individuals, not documented and repeatable processes. This places the whole organization at risk when the storage superhero is out sick or on vacation. Storage executives should institute formal standard operating procedures and document everything.
- Lax day-to-day operations. When it's time to do a big storage consolidation project, storage professionals tend to be extremely meticulous about managing the details. But this care goes out the window during the daily grind. Storage administrators often make undocumented changes on the fly, behavior that can be the equivalent of a ticking time bomb. When something finally breaks, there's no information documenting technical changes or administrative access. Without this audit trail, a relatively minor hiccup can become a major headache.
- Fuzzy lines of delineation. The handoff between storage groups and other IT teams is a frequent grey area. The lack of clearly defined responsibilities leads to two untenable consequences: either too many people are involved in redundant activities or critical tasks remain undone as each group assumes that it's the other group's job. Formal processes can help, but strong management, cooperation and communication are also needed.
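The audit trail missing from day-to-day operations can start small. As a rough sketch, assuming no change-management system is already in place, an append-only log with one JSON record per change captures who did what, to which device, and when. The function names, field names and log location here are hypothetical.

```python
import getpass
import json
from datetime import datetime, timezone


def record_change(log_path, target, description, operator=None):
    """Append one change record to a JSON-lines log and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator or getpass.getuser(),
        "target": target,            # e.g. an array, switch or LUN name
        "description": description,  # what was changed and why
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def changes_for(log_path, target):
    """Return every logged change that touched the given target."""
    with open(log_path, encoding="utf-8") as log:
        return [e for e in map(json.loads, log) if e["target"] == target]
```

When something breaks, `changes_for` answers the first question anyone asks: what changed on this device recently, and who changed it? A log like this is no substitute for formal procedures, but it does make on-the-fly changes visible.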
To paraphrase an old security saying, "The risk management chain is only as strong as its weakest link." You can spend gobs of money, create volumes of documents and hire the best minds in the business, but if you haven't tested your business recovery at the business application layer, you may already be in trouble.
Storage executives need to look at risk from a business perspective and include people, processes and technologies in their assessments and action plans. Testing and auditing are also perennial requirements. If you're lucky, you'll never need this preparation, but there sure seems to be a lot of bad luck around these days.