Questions concerning automated data centers, Part 1
I read your predictions for 2004 regarding the coming automated solutions for storage and have some reservations, especially around the concept of the "completely automated data center."
The observations of yours that I'd like to comment on are shown below in quotation marks.
1. "...be able to create a policy that automatically eliminates stale, useless, and non-essential data."
This concept is a bit problematic from a records and information management (RIM) perspective. Unless the RIM organization is involved in developing this policy, it is nearly impossible to ensure that the policy accounts for the need to manage specific data in accordance with specified retention periods. Most of these periods are decided once and remain constant, but some change periodically. In addition, some records are called into litigation and must be held by the organization once a legal action begins. The system must be able to accommodate these periodic "destruction moratoriums," which can cross over the lines established by the data management policy. This is especially true if the policy is designed (as you suggest) along the lines of applications.
You also provided a list of questions that need to be asked when investigating solutions, which included the following:
2. "What data-types should be stored on expensive high-end storage and which on tape or cheap disks."
This can be a rather dynamic situation that changes with business needs over time, and the decisions made need to be flexible enough to accommodate those changes. Data that is commonplace one day may become critical to the organization at a later date, and decisions made to move it to "tape and/or cheap disks" may regrettably be irreversible.
3. "What are the performance metrics for which data types?"
Again, making policy decisions based on "data types" can be flawed when the value assigned to a data type may change over time. The system designed needs enough flexibility to modify decisions once made, based on changes to retention requirements or other changes.
Thank you for your thoughtful reply. You make some very good observations on points where I need to delve a bit deeper than I did in my initial remarks. See below for my response to each.
1. You are absolutely correct. In the space I had in the initial column, I did not have room to cover the details of each point. Regulatory compliance requirements for SEC Rule 17a-4, HIPAA, the Sarbanes-Oxley Act, etc. include retention policies for data that may have to be readily accessible for years. The data also must be kept secure (HIPAA) and stored in a format that is non-modifiable (SEC 17a-4). The old method here was to use WORM-based storage solutions for the retention period. This is changing now with the update to the current rules. Object-based storage arrays can now be used in place of WORM, as long as they can guarantee that data is not modified and that archived data can be recovered at the whim of any auditor.
The records information management group of the company, the legal department, and executives up to and including the CFO and CEO need to be involved in the policy process for this type of data, which can include e-mail, scanned images, instant messages, or any other mechanism used between client and broker. That is why I mentioned in my predictions that the hard part of creating policy-based solutions to automate the way IT handles data, especially for financial institutions, is the up-front planning that needs to be done by all the business groups involved.
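To make the retention point concrete, here is a minimal sketch of the check an automated deletion policy would need before it removes anything. It is an illustration only: the record classes, the retention periods, and the names `Record`, `RETENTION_DAYS`, and `may_delete` are all hypothetical, and the periods shown are placeholders rather than regulatory guidance. The real schedule would come from the RIM and legal groups.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    record_class: str         # hypothetical label assigned by the RIM group
    created: date
    legal_hold: bool = False  # set when the record is called into litigation


# Hypothetical retention schedule in days. Placeholder values only,
# not the periods any regulation actually requires.
RETENTION_DAYS = {
    "broker_email": 6 * 365,
    "web_cache": 30,
}


def may_delete(record: Record, today: date) -> bool:
    """Return True only if automated deletion is allowed for this record."""
    # A legal hold (destruction moratorium) overrides every other rule.
    if record.legal_hold:
        return False
    days = RETENTION_DAYS.get(record.record_class)
    # Unclassified data is never deleted automatically.
    if days is None:
        return False
    return today >= record.created + timedelta(days=days)
```

Keeping the schedule in a table separate from the check means a changed retention period is a data update, not a redesign, and the legal hold flag can be set or cleared without touching the schedule at all.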
2. You are again correct. This is why the solution used to manage data movement between different pools of storage requires intelligence. The initiative will hopefully give SRM/SAM management solutions the ability to dynamically query the devices on the storage network and determine their capabilities. This needs to be done across heterogeneous platforms. Each storage device will be able to advertise its properties, functions, and methods via UDDI and its associated CIMOM (CIM Object Manager). The software will then be able to classify each device on the storage network, which will allow it to implement your data policies and dynamically move data between devices as required. Storage becomes one huge pool, rather than discrete buckets where data is placed manually.
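As a rough illustration of the classification step, the sketch below groups devices into pools based on properties they advertise and then picks a pool for a given kind of data. Everything in it is assumed for the example: the `Device` fields, the tier names, the 5 ms threshold, and the `POLICY` table are hypothetical, and a plain Python object stands in for whatever a real device would report through its CIMOM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    media: str         # "disk" or "tape" (hypothetical advertised property)
    latency_ms: float  # hypothetical advertised access latency


def classify(device: Device) -> str:
    """Map advertised properties to a storage tier (thresholds are made up)."""
    if device.media == "tape":
        return "archive"
    if device.latency_ms <= 5.0:
        return "high_end"
    return "low_cost"


def build_pools(devices):
    """Group device names by tier so storage is handled as pools."""
    pools = {}
    for device in devices:
        pools.setdefault(classify(device), []).append(device.name)
    return pools


# Hypothetical policy: which tier each kind of data should live on.
POLICY = {"database": "high_end", "web_cache": "low_cost", "old_email": "archive"}


def place(kind: str, pools) -> str:
    """Pick the first device in the pool that the policy names for this data."""
    candidates = pools.get(POLICY[kind], [])
    if not candidates:
        raise LookupError(f"no device available for {kind!r}")
    return candidates[0]
```

Because placement goes through the policy table and the pools, a new device only has to advertise its properties to become usable; nobody assigns data to it by hand.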
3. I guess I should have used "types of data" rather than "data types" in the column. Data type seems to imply things like .XLS or .DOC or .MPG, etc. What I meant to imply was data characteristics for every part of a tiered application. In other words, storage of cached Web pages would be different from storage of SQL or Oracle databases. The management software would need the ability to analyze all the data traffic in real time and move that data dynamically between different classifications of storage based on heuristics and policy. This can already be accomplished internally with some high-end storage arrays. The goal for this year is to make that capability available at the fabric level.
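One very simple heuristic of this kind can be sketched as follows. It is an assumption-laden example, not how any particular array or fabric product works: the tier names, the access-rate thresholds, and the function `next_tier` are all hypothetical. The point it illustrates is that movement goes in both directions, so a placement decision can be revised when the value of the data changes.

```python
# Hypothetical tiers, ordered from coldest to hottest.
TIERS = ["archive", "low_cost", "high_end"]


def next_tier(current: str, accesses_per_day: float,
              promote_at: float = 100.0, demote_at: float = 1.0) -> str:
    """Suggest the tier data should move to, one step at a time.

    The thresholds are illustrative defaults; a real policy would set them
    per application and take retention requirements into account as well.
    """
    index = TIERS.index(current)
    # Busy data moves up one tier, if there is a tier above it.
    if accesses_per_day >= promote_at and index < len(TIERS) - 1:
        return TIERS[index + 1]
    # Idle data moves down one tier, if there is a tier below it.
    if accesses_per_day <= demote_at and index > 0:
        return TIERS[index - 1]
    return current
```

Moving one tier per evaluation keeps a short burst of traffic from bouncing data across the whole hierarchy.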
This was first published in February 2004