Published: 10 Apr 2006
Turn data into intelligent information
To manage stored data effectively, you need to understand its content so you know how to handle it and where to put it.
Most companies blindly manage their databases, files and e-mails. Applications create the data and it's saved on storage devices. The data is later copied or moved for business-continuity or data protection purposes. While this worked in the past, new business requirements such as record-retention regulations and increasingly frequent e-discovery activities have forced organizations to rethink data management. IT departments are also struggling with growing capacity requirements and trying to cope with the constant flow of new requirements that stress their capital and operating budgets.
Organizations must improve their understanding of data and then use that understanding for more efficient and effective data management. New technology solutions that leverage the context of data--the attributes and index that describe the data--can help organizations turn data into information. Attributes such as the data's creator, or a numeric pattern like a Social Security number, add a level of intelligence to the data, making it more usable and manageable.
Intelligent Information Management (IIM) comprises the processes and technology products that enable organizations to understand and organize data, and then take appropriate actions with it. While information lifecycle management (ILM) is a popular catchphrase, it hasn't gained much traction because the underlying intelligence has been missing.
The two major processes within IIM are information preparation and information management. Information preparation transforms data into information. This process identifies the data an organization creates (or has created), develops information categories, analyzes the data for specific criteria and automatically classifies it into pre-defined categories. Once the data is properly categorized, it can be acted upon intelligently and automatically. Information management is the process of doing something with the data, such as archiving, encrypting or deleting duplicate copies.
Before buying technologies to improve data management, organizations should determine what data they create and formulate general information groups. Criteria should then be developed to identify which data belongs to which groups. Finally, organizations should establish policies (and the associated enforcement) and management actions related to the data groups. These groups are information categories that take into account external influences such as retention and privacy regulations, information security risks and accessibility requirements. For example, the criteria for a "confidential financial information" category may be any Excel spreadsheet created by a senior executive or a finance department staff member. A policy for the category may establish a retention period of three years, with an associated action that says the file should be archived to immutable storage. Determining information categories, criteria, rules and actions is largely a manual process that should involve IT and the internal groups that establish corporate policies regarding information access and privacy, as well as regulatory compliance.
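To make the relationship between a category, its criteria and its policy concrete, here is a minimal Python sketch of the "confidential financial information" example. All of the names (FileInfo, Category, Policy, the "finance" department label and the action string) are hypothetical and chosen for illustration; they don't come from any particular classification product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Policy:
    retention_years: int
    action: str  # hypothetical action label handed to a management application


@dataclass(frozen=True)
class FileInfo:
    name: str
    creator_department: str
    creator_is_senior_executive: bool = False


@dataclass(frozen=True)
class Category:
    name: str
    criteria: Callable[[FileInfo], bool]  # returns True when a file belongs here
    policy: Policy


def is_confidential_financial(f: FileInfo) -> bool:
    # Criteria from the example: an Excel spreadsheet created by a senior
    # executive or by a finance department staff member.
    is_excel = f.name.lower().endswith((".xls", ".xlsx"))
    return is_excel and (
        f.creator_is_senior_executive or f.creator_department == "finance"
    )


CONFIDENTIAL_FINANCIAL = Category(
    name="confidential financial information",
    criteria=is_confidential_financial,
    policy=Policy(retention_years=3, action="archive_to_immutable_storage"),
)


def classify(f: FileInfo, categories: List[Category]) -> List[Category]:
    """Return every category whose criteria the file satisfies."""
    return [c for c in categories if c.criteria(f)]
```

Keeping the criteria as a separate function mirrors the point that defining categories is a manual, policy-driven exercise, while applying them to files can be automated.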
The second step in the information preparation process is data analysis. Data is scanned for criteria established during categorization. All sources of data, as well as new and historical data, can be scanned. Because this involves a vast amount of data, data analysis needs to be automated. This is the most crucial part of the information preparation process because it's when context and associated attributes are indexed. Data is scanned and analyzed to extract attributes such as creator, creation date and the file type--attributes typically referred to as metadata. The analysis process should also index the contents of the data. The value of the index and attributes increases as more details are identified.
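As a rough illustration of what an automated analysis pass might extract from a single file, the Python sketch below gathers a few attributes and checks the contents for a numeric pattern. It rests on several assumptions: the creator is supplied by the caller (file systems don't expose it portably), the modification time stands in for the creation date, the file is readable as text, and the regular expression only matches the common NNN-NN-NNNN layout of a Social Security number without validating it.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Matches the NNN-NN-NNNN layout only; it does not prove the number is valid.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def analyze(path: Path, creator: str) -> dict:
    """Extract a small set of attributes (metadata) from one text file."""
    stat = path.stat()
    text = path.read_text(encoding="utf-8", errors="replace")
    return {
        "name": path.name,
        "file_type": path.suffix.lower().lstrip("."),
        "creator": creator,  # assumed to come from a directory or app lookup
        # Modification time is used as a portable stand-in for creation date.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "size_bytes": stat.st_size,
        "contains_ssn_pattern": bool(SSN_PATTERN.search(text)),
    }
```

A real product would handle binary formats such as spreadsheets and e-mail stores; the sketch only shows how attributes and a content-derived flag end up side by side in one record.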
The culmination of the categorization and data analysis is information classification, which Enterprise Storage Group considers a defined market segment. Information classification products, such as those from Abrevity, Fast Search & Transfer, Kazeon Systems, Scentric and StoredIQ, perform the analysis and automatically add attributes to data when analysis results identify criteria matches with the information categories. The new attributes include the policies and associated management actions that should be taken with the data. For example, all of the file servers are scanned and analyzed, and all of the Excel files created by finance employees are tagged with a retention period and destined for immutable storage. The classification solution passes the data and its attributes to an information management application that's responsible for enforcing the retention and storage policies.
The information management process handles the actual manipulation and movement of data. Information management solutions, segmented by the actions they perform, receive data from information classification software and use the attributes to determine what actions to perform.
The various information management segments include data filtering, encryption, archiving, de-duplication, data movement/migration, copy, quarantine and inspection, and limiting access to data. Using our confidential financial information example, a file-system archiving app uses the retention period and storage requirement attributes and retains the data for three years on optical media.
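The retention attribute in this example boils down to simple date arithmetic that an archiving application could apply before allowing a deletion. The Python sketch below is one way to express it; the function names are hypothetical, and the leap-day handling (falling back to Feb. 28) is an assumption rather than a rule taken from any regulation or product.

```python
from datetime import date


def retention_expiry(archived_on: date, retention_years: int) -> date:
    """Return the first date on which the retention period has elapsed."""
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # archived_on is Feb. 29 and the target year is not a leap year.
        return archived_on.replace(year=target_year, day=28)


def may_delete(archived_on: date, retention_years: int, today: date) -> bool:
    """A file may be deleted only once its retention period has expired."""
    return today >= retention_expiry(archived_on, retention_years)
```

For a file archived on 10 Apr 2006 with a three-year retention period, retention_expiry returns 10 Apr 2009, and may_delete stays False for any earlier date.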
Data filters--like e-mail spam blockers--help keep non-business-related data out of the system. For security and privacy, the data might be encrypted before it's stored or sent to external recipients. Organizations may also choose to further secure the data using products that add attributes to restrict access to certain employees or departments. De-duplication eliminates multiple copies of the same data, leaving stubs so applications can find it when needed. Data migration can free up primary storage capacity and consolidate resources. To protect the data, it may be copied, or moved to a second or third system. For regulatory compliance or legal review, data may be set aside or quarantined for inspection by appropriate departments. Lastly, digital archiving is the long-term retention and management of historical data that's retained to satisfy regulatory compliance, corporate governance, litigation support, records management or data management requirements.
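De-duplication is often implemented by fingerprinting content, and the idea can be sketched in a few lines of Python. This version assumes whole-file comparison using a SHA-256 digest and keeps everything in memory; the deduplicate function and its "stub" dictionary are illustrative names, not the design of any specific product.

```python
import hashlib
from typing import Dict, Tuple


def deduplicate(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct content once and leave a stub for every file name.

    Returns (store, stubs), where store maps a content digest to the single
    retained copy and stubs maps each original file name to that digest.
    """
    store: Dict[str, bytes] = {}
    stubs: Dict[str, str] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy seen
        stubs[name] = digest               # the stub points at the kept copy
    return store, stubs
```

An application that opens a de-duplicated file follows the stub to the digest and reads the single stored copy, which is how the data can still be found when needed.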
As data is classified and managed, the attributes and the contents of the data--along with the actions performed--create an inventory of intelligence about the data. This inventory is a rich information index that can be searched by attributes or keywords within the data's contents. Building a comprehensive index is a crucial activity that spans both the information preparation and information management processes.
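One simple way to picture such an index is as a pair of lookup tables, one keyed by attribute and one keyed by keyword. The Python sketch below is a toy, in-memory version; the InformationIndex class, its method names and the sample attribute keys are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set


class InformationIndex:
    """Toy index searchable by attribute values and by content keywords."""

    def __init__(self) -> None:
        self._by_keyword = defaultdict(set)    # word -> document ids
        self._by_attribute = defaultdict(set)  # (key, value) -> document ids

    def add(self, doc_id: str, attributes: Dict[str, str], text: str) -> None:
        for key, value in attributes.items():
            self._by_attribute[(key, value)].add(doc_id)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._by_keyword[word].add(doc_id)

    def search(self, keyword: Optional[str] = None, **attributes: str) -> Set[str]:
        """Return ids matching the keyword AND every attribute given."""
        result_sets = []
        if keyword is not None:
            result_sets.append(self._by_keyword.get(keyword.lower(), set()))
        for key, value in attributes.items():
            result_sets.append(self._by_attribute.get((key, value), set()))
        if not result_sets:
            return set()
        return set.intersection(*result_sets)
```

Recording a management action as just another attribute (for example, action="archived") lets the same search answer questions about both what the data contains and what has been done with it.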
In some cases, products may perform both the information preparation and management processes. For example, e-mail content-filtering software scans and analyzes all messages entering an organization, and produces a spam score that's compared to spam categories. If the spam score matches the criteria, the e-mails are filtered and not delivered. Similar to anti-spam software, MessageGate and Orchestria offer products that classify and act on messages. Actions include setting retention periods, blocking messages that violate regulatory policies from being sent and establishing legal hold groups if messages need to be reviewed by counsel.
IIM is possible because products are now available that can quickly analyze, classify and take actions with data. Other parts of the IT infrastructure, especially data protection software and storage systems, can use the attributes to further increase the availability and accessibility of data. New storage systems can store data attributes in the same device as the data. In addition, data protection software can increase the frequency of backups or serve as management software that moves the data to an online secondary storage system that facilitates quicker restores. Data protection software can also preserve the attributes along with the data, ensuring that all information and indexes can be restored.
When archiving files, information preparation is handled by the file archiving software, which integrates with an intelligent storage system to establish and enforce a retention period.
It's all context
By deploying IIM applications, organizations can improve resource management by eliminating the storage of duplicate data, reduce risk by quickly responding to discovery requests, comply with record-retention and privacy regulations, and restore the right data faster.
IIM provides the context to manage data efficiently. Data without context leads to unmanageable risks and significantly strains IT resources, both human and capital.