Many organizations have a good handle on external risk, having implemented strong architectures, disaster recovery (DR), business continuance and security mitigation. The hidden areas of risk in internal operations are far less visible and can arise unbidden at any moment from inadequacies in storage governance as well as from weaknesses in data management and protection. If not consciously addressed and mitigated, these internal risks can result in outcomes similar to those of your external risks. Here is what you can do to help protect yourself:
Get closer to your users. Everyone talks about alignment but no one seems to know how to measure it. When IT and business units have common goals, a partnership of enablement (and even appreciation) supplants the old view of IT as a necessary evil or even an impediment. You can get a better understanding of alignment by examining the following factors:
- Percentage of budget on new projects vs. maintenance
- Percentage of projects directly aligned with business projects
- Percentage of users satisfied (survey)
- Ratio of business analysts to business units
Know the real cost of storage. Production storage often triggers a requirement for secondary storage of 10 to 20 times the production capacity. This is needed for backup, archiving, DR and even development, test and QA copies. It's prudent to include the following items in your storage cost model:
- Cost of backup (including regular testing for recovery compliance)
- Cost of archiving (and retrieval, particularly if beyond media life or platform life)
- Cost of maintaining DR copies of the data
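To see how the cost items above add up, here is a minimal sketch of a storage cost model in Python. The `Copy` class, the `annual_storage_cost` function, the multipliers and the per-terabyte rates are all hypothetical values chosen for illustration; substitute figures from your own environment.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One class of secondary storage (all figures are illustrative assumptions)."""
    name: str
    multiplier: int    # capacity as a multiple of production capacity
    cost_per_tb: int   # assumed annual cost per TB, in dollars


def annual_storage_cost(production_tb, production_cost_per_tb, copies):
    """Return (total annual cost, total cost per production TB)."""
    total = production_tb * production_cost_per_tb
    for copy in copies:
        total += production_tb * copy.multiplier * copy.cost_per_tb
    return total, total / production_tb


# Assumed example: 100 TB of production storage at $3,000 per TB per year.
copies = [
    Copy("backup (incl. recovery testing)", 6, 400),
    Copy("archive (incl. retrieval)", 3, 200),
    Copy("DR copies", 1, 3000),
    Copy("dev/test/QA copies", 2, 1500),
]
total, per_tb = annual_storage_cost(100, 3000, copies)
print(f"Secondary capacity: {sum(c.multiplier for c in copies)}x production")
print(f"Total annual cost: ${total:,}; per production TB: ${per_tb:,.0f}")
```

With these assumed numbers, secondary capacity is 12 times production, and each production terabyte really costs $12,000 a year rather than the $3,000 sticker price.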
Track your environment. It's difficult to manage something you don't know about. If the asset inventory and its interdependencies are not up to date and visible, the change management process loses its integrity because downstream risk goes unidentified. Connecting another server to an available port in an undocumented environment can impact inter-switch links and increase latency to the point where a key database application is disabled, perhaps losing data until the problem is fixed.
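One way to make downstream risk visible is to record interdependencies as a simple graph and walk it before approving a change. The sketch below assumes a hypothetical inventory (`DEPENDENTS`) in which each component lists the components that depend on it; the names are invented for illustration.

```python
from collections import deque

# Hypothetical inventory: each key feeds the components listed as its values.
DEPENDENTS = {
    "switch-a:port-12": ["isl-a-b"],
    "isl-a-b": ["array-1", "db-server-1"],
    "array-1": ["db-server-1"],
    "db-server-1": ["orders-db"],
    "orders-db": [],
}


def downstream(component, dependents):
    """Return every component reachable from `component`, breadth-first."""
    seen = []
    queue = deque(dependents.get(component, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already recorded via another path
        seen.append(node)
        queue.extend(dependents.get(node, []))
    return seen


impacted = downstream("switch-a:port-12", DEPENDENTS)
print("Change to switch-a:port-12 may affect:", ", ".join(impacted))
```

In this assumed inventory, plugging a server into port 12 touches the inter-switch link, the array, the database server and ultimately the orders database, which is exactly the chain a change review needs to see.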
Define demarcation. Well-defined roles and responsibilities are essential, but lines of demarcation between job functions must also be clearly drawn. Just a few of the most contentious demarcation issues are:
- What are the handoffs between operations and engineering?
- What are the handoffs between engineering and architecture?
- Who's responsible for host bus adapter installations -- the server group or storage staff?
- Who owns the backup servers?
- Who owns the backup LAN?
- Who decides safety factors in storage allocation for the database?
Sell realistic staffing levels. We all know the risk we run through inadequate staffing levels. An empirical method to calculate workload based on tangible entities is the key to removing the subjectivity from the equation and a prerequisite to selling the request to the CFO. Develop a staffing model based on actual events and activities such as average time to handle alerts, provisioning requests, restores and changes. The resulting model provides an empirical foundation for staffing levels and one that can be dynamically modified to reflect changes in volumes, competencies and "what if" situations.
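A staffing model of this kind can start as a few lines of arithmetic. The sketch below is one possible shape, not a prescribed method: the monthly event counts, the average handling times and the 120 productive hours per full-time equivalent (FTE) are assumptions to be replaced with your own measurements.

```python
PRODUCTIVE_HOURS_PER_FTE = 120  # assumed productive hours per person per month


def required_fte(activities, productive_hours=PRODUCTIVE_HOURS_PER_FTE):
    """activities maps a name to (events per month, average minutes per event)."""
    total_hours = sum(count * minutes / 60 for count, minutes in activities.values())
    return total_hours / productive_hours


# Assumed monthly volumes and handling times.
baseline = {
    "alerts": (600, 15),
    "provisioning requests": (40, 90),
    "restores": (30, 60),
    "changes": (20, 180),
}
print(f"Baseline: {required_fte(baseline):.1f} FTE")

# "What if" provisioning requests double?
what_if = dict(baseline, **{"provisioning requests": (80, 90)})
print(f"Provisioning doubled: {required_fte(what_if):.1f} FTE")
```

With these assumed inputs the baseline workload comes to 2.5 FTEs, and doubling provisioning requests raises it to 3.0. Because every input is a counted event or a measured duration, the conversation with the CFO shifts from opinion to volumes.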
Insulate against audit. Standard operating procedures (SOPs) mitigate risk by supporting documented and measured repeatability. SOPs demonstrate to auditors that a defined process has been executed, and executed in accordance with the procedure, and they identify the compliance, completion and quality artifacts. Without SOPs, consistent results can't be guaranteed and, worse, you can't demonstrate to auditors and to your boss just how good you really are.
Understand the value of the data. If you are unable to ascertain the value of the data under your management, it's unlikely to be managed appropriately. More attention needs to be given to the care and feeding of high-value data. Value is often determined in a business impact analysis (BIA) process but can also be determined through a review of the organization's finances to see which business units contribute the most. Tiering your efforts, architecture and SOPs based on the value of the data will help ensure that the organization's most valued assets are treated accordingly.
Architect for data protection. Backup and archiving requirements are exploding. In the current compliance-sensitive climate, it's critical to develop a formal class of service for data protection. The attributes for each tier can then be used to drive the appropriate architecture and SOPs. Attributes for this architecture should include at least:
- Retention periods for various legislative initiatives
- Required immutability
- Rendering constraints
- Integrity artifacts
- Security requirements
- Chain of custody requirements
- Indexing needs
- Retrieval time objectives
- How will the recovery team get to the DR site, and will they want to?
- Does the recovery clock start at the time of the actual disaster or at the time of declaration? The evaluation and declaration process can take anywhere from 1 to 12 hours. Then the team needs to be notified and brought together before recovery can even start.
- In a multiple-machine environment, the infrastructure needed to support a one-hour recovery time objective (RTO) is often the same as for a four-hour or eight-hour RTO. It's only when the delta hits 24 hours that significant differentiation in the support infrastructure becomes possible.
- DR tests that bring applications into actual production mode (i.e., operated by users) for a 24-hour test period are the only real test of DR capability.
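The recovery-clock question above is easier to argue with numbers in hand. The following sketch uses Python's standard `datetime` module; the function name `recovery_timeline`, the timestamps and every duration except the article's 1-to-12-hour declaration window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def recovery_timeline(disaster_at, declaration_delay, assembly_delay, restore_duration):
    """Return (declared_at, recovered_at) for a simple sequential recovery."""
    declared_at = disaster_at + declaration_delay
    recovered_at = declared_at + assembly_delay + restore_duration
    return declared_at, recovered_at


disaster_at = datetime(2006, 7, 1, 2, 0)       # assumed time of the disaster
declared_at, recovered_at = recovery_timeline(
    disaster_at,
    declaration_delay=timedelta(hours=6),      # within the 1-to-12-hour range
    assembly_delay=timedelta(hours=2),         # assumed time to gather the team
    restore_duration=timedelta(hours=1.5),     # assumed time to restore service
)

rto = timedelta(hours=4)                       # assumed recovery time objective
from_disaster = recovered_at - disaster_at
from_declaration = recovered_at - declared_at
print(f"From disaster:    {from_disaster}  meets RTO: {from_disaster <= rto}")
print(f"From declaration: {from_declaration}  meets RTO: {from_declaration <= rto}")
```

Under these assumptions the same recovery takes 3.5 hours measured from declaration, which meets a four-hour RTO, but 9.5 hours measured from the disaster itself, which misses it badly. Agreeing on where the clock starts matters as much as the RTO figure.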
Secure the storage. Beyond host access, the storage team needs to think about risk mitigation in the following areas:
- Role-based access to any management device in the storage environment
- Standards to prevent spoofing of world wide names or misplaced HBAs
- Dedicated management LANs, air-gapped from production LANs
- Pros and cons of encryption for data at rest
- Where the encryption should take place: inline, offline, file-based, column-based
- Can data be trapped or monitored as it moves over LAN segments?
- Can data be trapped or monitored as it moves over WAN segments?
Awareness is the first step in reducing data risk. By considering the internal risks outlined here, you can develop an appropriate risk profile and mitigation plan. Sharing your risk analysis and mitigation plans (including business impact issues) spreads the responsibility around. It can also provide an empirical basis for CFO and CEO support for any necessary investments.
Read "Hidden threats to your data" in its entirety.
About the author: Dick Benton is a principal consultant at GlassHouse Technologies, Inc.
This was first published in July 2006