Many organizations have a good handle on external risk. They've implemented disaster recovery (DR), business continuance and security measures to protect their data and applications. On the internal security front, companies have instituted systems that limit physical and digital access to critical systems to reduce the likelihood of a disgruntled or unauthorized employee purposely or accidentally damaging/absconding with crucial data. But while focusing on these obvious perils, firms may overlook the seemingly mundane--but potentially more damaging--dangers that can arise due to lax administration and procedures.
Inadequacies in storage governance and weaknesses in data management are often subtle and may pose far less-visible risks to a company's data. To mitigate these threats, you must be aware of the impact and probability of these risks so you can take pre-emptive action to reduce or eliminate them.
Internal risks stem from two broad exposure areas:
- Governance exposures: weaknesses in management practices (policy, procedure and control infrastructure)
- Data exposures: weaknesses and inadequacies in data protection
By consciously evaluating and addressing these areas, you can substantially reduce threats to your data, lower costs and improve business-unit relations.
Alignment: When IT and business units have common goals, a partnership of enablement (and even appreciation) supplants the old view of IT as a necessary evil or even an impediment. Lack of alignment can result in inadequate or poorly communicated policies that can cause data to be inappropriately handled and exposed to undue risk. You can test for alignment using soft or hard measures. Soft measures include an assessment of your relationship with the CIO, as well as an assessment by managers and key business analysts with their counterparts in the business community. Some issues to consider include how often to meet, whether to converse on an ad-hoc basis or only at scheduled meetings, and so forth.
More empirical measures include defining policies for interaction between IT and business units. For example, company policy might require IT to provide services in tiered offerings with the business units responsible solely for choosing (and paying for) those services. In such a case, you may consider tracking the following:
- The percentage of the IT budget related directly to business unit-initiated projects
- The percentage of the IT budget spent on maintenance vs. development
- Time delivery of commitments, service levels, problems and projects
- Business unit satisfaction
- A defined process to regularly ensure continuous alignment
Cost management: Data management costs may be another indicator, as cost overruns reflect badly on how efficiently storage is organized and managed. Gartner Inc. and other analyst firms say that 70% of a storage organization's costs are for administration, not hardware. Besides knowing where your budget dollars go, you should consider:
- How costs are tracked
- If a formal cost model identifies realistic costs to provide specific services to business units
- The ability to correlate operational metrics to costs
- Whether staffing levels are built on an empirical basis of a known transaction handling capability (e.g., number of alerts or number of tape movements)
Asset inventory: It's difficult to manage something you don't know about. If storage assets at the component, connection and dependency level aren't documented, inadequate change management can open the door to risk. The interdependencies of all hardware and software components in the environment must also be documented, or unwelcome consequences can occur. For example, connecting another server to an available port can impact interswitch links and increase latency to the point where a key database application is disabled, perhaps losing data until the problem is fixed.
Roles and responsibilities: Well-defined roles and responsibilities are essential, but lines of demarcation between job functions must be clearly drawn. In many organizations, these lines are often vague and responsibilities seem to overlap. For example, who's responsible for host bus adapter installations--the server group or storage staff? Another example may involve interaction between backup architects and operations staff regarding ownership of backup servers and the backup LAN. Poorly defined responsibilities could result in important activities being overlooked, leaving data insufficiently protected.
In most organizations, IT services are requested in a variety of ways and in a timeframe that inevitably requires an understanding of cross-functional workflow, inputs, outputs, handoffs and control points. A classic example of such a cross-functional requirement is the process governing change control and provisioning. The interactions, roles, responsibilities and cross-functional handoffs involved in these processes must be documented and understood, and buy-in by all those participating in the process is required. If roles, responsibilities and demarcation lines aren't clearly understood, there will be no accountability.
Staffing and organization: Staffing levels ensure that defined responsibilities can be met, but they're often a cause of friction between IT and those who control the budget. This contention may be exacerbated because IT often has difficulty making an empirical case for staffing (see "Building a staffing model," below). The old adage of "X number of storage administrators per a certain number of terabytes" is too broad to be useful. A method to calculate workload for each task based on tangible entities such as alerts, provisioning requests and changes is essential to make accurate staffing decisions. This ensures that workloads can be staffed without compromising service levels or opening the firm to risk by using underskilled people for critical storage tasks.
Procedures: Standard operating procedures (SOPs) are a key element to mitigating risk. Procedures need to be in place to ensure data consistency and quality. SOPs provide a baseline, demonstrate to auditors that a defined process has been executed, and show that compliance, completion and quality metrics have been produced. SOPs allow a consistently repeatable process with lower-level skills. Without SOPs, consistent results can't be guaranteed.
Value of data: Another key risk is being unable to ascertain the value of data under your management. If data hasn't been valued, it's unlikely to be managed appropriately and may not be available when needed. Inevitably, the value of data is equivalent to the value of the business application accessing it.
In many organizations, once an application has been implemented, its associated data is identified and placed into complex command lines of various backup engines. Over time, all trace of ownership becomes lost and no one knows what application owns a particular file. It's critical to be able to tie data to an application and to tie that application to the business unit it services.
The key identifier of data--its application and related business owner--must be maintained in a manner similar to that of an asset register. This register should include data interdependencies, both parent and sibling. Without an understanding of where data comes from and where it goes, application interdependency can't be determined. Without application interdependency, recovery of a logically verifiable point becomes extremely difficult, if not impossible. This means that while applications and data may be recoverable, the combined application functionality may fail because dependencies weren't synchronized and managed during the recovery process.
Archiving awareness: With the emphasis on information lifecycle management and compliance, many organizations are migrating data from high-cost primary storage tiers to lower-cost tiers or even to offline storage. If the migration involves moving rather than copying data, applications may need to be archive-aware. If data is moved from the application into a structure the application can't access, retrieving data archived for compliance activities in a timely manner may be difficult.
Data retention: Retention policies are often created without close consultation with business units or are based on rudimentary compliance requirements. Organizations often institute an across-the-board retention policy. In years past, data volumes weren't high enough to warrant particular differentiation and there were few compliance requirements beyond IRS retention rules. Today, data is growing at an annual rate of 50%-plus and company lawyers are increasingly tempted to mandate keeping everything forever to avoid the consequences of non-compliance. In this environment, it's critical to develop a retention class of service with attributes identifying retention periods for various legislative initiatives, as well as required immutability, rendering, integrity and security attributes. Addressing these complex issues outside the framework of a class of service can lead to significant complexity, which could impact administration costs and lead to potential legal exposure in retention, retrieval and security compliance.
Recovery objectives: Unrealistic recovery point objectives (RPOs) and recovery time objectives (RTOs) are a major risk exposure. In an attempt to respond to business needs, RPO and RTO commitments may be made that don't adequately consider the realities imposed by logistics and technologies. From a logistics perspective, any RTO of less than 12 hours will probably require an automated failover. It's impractical to expect people to evaluate and declare a disaster, initiate DR at alternate points, notify the DR team and sequence the recovery, resynchronization and restart of applications in less than 12 hours.
On the technical side, the infrastructure needed to support a one-hour RTO is the same as for a four-hour or eight-hour RTO. It's only when the delta hits 24 hours that significant differentiation in the support infrastructure is required, except perhaps in very small organizations with a limited number of servers. Getting the DR team to the alternate site is one major challenge; sequencing the multitude of servers that need to be brought back is another. Once physical recovery has been completed, additional effort is required for logical synchronization that can blow out the most practical recovery objectives. DR tests that use a Dev/Test/QA infrastructure to bring applications into actual production mode (i.e., operated by users) for a 24-hour test period will reveal any exposures in this area.
Data integrity: At some point in any compliance investigation, you must prove that your data hasn't been changed by unauthorized people/functions and hasn't been corrupted by intent/malfunction. The policies and SOPs supporting the protection of data integrity must include audit-agreed checkpoints and controls. It's not enough to have data archived on WORM media. Your SOPs and related controls must demonstrate a chain of custody that protected data from change the moment it reached a status that demanded immutability. A great example of this need is when to capture immutable copies of received and sent e-mails. Certainly, it must precede any capability for deletion or modification; the SOP must demonstrate--through completion, compliance and quality metrics--the consistent accomplishment of stated goals.
Data rendering: When data is archived, retrieval requirements may prevent the data from being rendered. If data requires rendering to information, risk may be incurred if the original platform application is unavailable. For example, if invoice data is archived, can the application be used to subsequently render that data back into information, i.e., the invoice? This is a critical issue that an organization's legal team needs to address. If data can't be rendered, it must be stored as information using an interchangeable format such as XML.
Data Security: Security is a major issue in every organization, but most of the focus has been on access control, intrusion detection and containment. While controlling access to servers limits access to data, there are many other paths to this data. Any management device in the Fibre Channel or Ethernet fabric provides a potential entry point for an intruder. Data in production can be at risk if these exposures aren't carefully managed. Data at rest is also significantly exposed; this has been dramatically demonstrated by recent reports of lost backup tapes containing sensitive data. Encryption techniques are touted as risk mitigation, but encryption raises its own risks related to retaining, securing and accessing the encryption key when needed. In storage environments, attention should be paid to securing data moving over the desktop LAN, the WAN, backup-based networks and specialized high-speed, point-to-point networks. The obvious issue is whether the data can be read as it travels over the link. Additional exposure comes from allowing development and test staff to have free access to live data that may include sensitive information.
Awareness is the first step in reducing data risk. By considering the internal risks outlined here, you can develop an appropriate risk profile and mitigation plan (see "Getting started," above). Sharing your risk analysis and mitigation plans (including business impact issues) spreads the responsibility around. It will also provide an empirical basis for CFO and CEO support for any necessary investments.