Storage efficiency is the product of good infrastructure decisions, good storage resource management and good storage service management, but it also has a lot to do with data management. Data makes storage infrastructure and services purposeful, yet it tends to be managed poorly or not at all.
Data stored by companies is predominantly in the form of files. Files represent a significant challenge to effective data management because they are typically anonymous: there may be nothing in the file name, extension or metadata from which business context, criticality or governance requirements can be derived. Without that information, it's difficult to discern where a file should reside, or what resources and services should be provisioned for it.
Failure to classify data, or at least to understand its relative importance and its protection and preservation requirements, is what turns storage into a veritable junk drawer. It's possible to implement very basic hierarchical storage management (HSM) or rudimentary archiving based on metadata describing the last time a file was accessed or modified. However, without any granular understanding of business context, we cannot truly align data with business processes.
This is not to underestimate the value of either basic HSM or archiving. The former is a time-tested strategy for moving data -- whether file or block data -- to less expensive tiers of storage based on metadata that describes access and update frequency. HSM typically migrates data from memory to Tier 1 fast disk, then to Tier 2 slower, higher-capacity disk, and finally to Tier 3 tape or optical storage, with each tier matched to the data's usage characteristics. A well-groomed HSM practice can cut the cost of storage by as much as 60% compared with non-tiered storage.
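The arithmetic behind a figure like that is easy to sketch. The per-gigabyte costs and the placement mix below are illustrative assumptions, not vendor quotes, but they show how a blended tiered cost compares with keeping everything on Tier 1:

```python
# Hypothetical per-GB monthly costs for each storage tier (assumptions only):
# Tier 1 fast disk, Tier 2 capacity disk, Tier 3 tape.
tier_cost = {"tier1": 0.50, "tier2": 0.15, "tier3": 0.02}

# Assumed distribution after HSM grooming: only hot data stays on Tier 1.
placement = {"tier1": 0.20, "tier2": 0.40, "tier3": 0.40}

total_gb = 100_000  # a 100 TB estate

# Cost of keeping everything on Tier 1 vs. the tiered blend.
untiered = total_gb * tier_cost["tier1"]
tiered = sum(total_gb * frac * tier_cost[t] for t, frac in placement.items())

savings = 1 - tiered / untiered
print(f"Untiered: ${untiered:,.0f}/mo  Tiered: ${tiered:,.0f}/mo  Savings: {savings:.0%}")
```

With these assumed numbers the blended cost comes in roughly two-thirds below the all-Tier-1 figure, which is in the neighborhood of the 60% savings cited above.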
Implementing simple HSM requires you to first classify storage by performance and cost per gigabyte; that establishes the targets to which data will be moved. You then need to implement a data mover -- hand-tooled scripts or prepackaged software that physically moves data from one storage tier to another based on an automated trigger (for example, reaching an inactivity threshold that signifies a file is ready to be moved to a less responsive, less expensive tier). Prepackaged wares include QStar Archive Manager, everStor Hiarc HSM and a variety of others. If you want to archive files to Linear Tape File System-enabled tape, Crossroads Systems' StrongBox appliance might be a good fit.
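A hand-tooled data mover of the kind described above can be sketched in a few lines of Python. The tier paths and the 90-day inactivity threshold are illustrative assumptions; a production script would also need logging, error handling and a stub or link left behind so users can still find demoted files:

```python
import shutil
import time
from pathlib import Path

# Illustrative tier locations and inactivity threshold -- adjust for your site.
TIER1 = Path("/mnt/tier1")   # fast disk (source)
TIER2 = Path("/mnt/tier2")   # slower, cheaper disk (target)
THRESHOLD_DAYS = 90          # files untouched this long get demoted

def demote_inactive_files(src: Path = TIER1, dst: Path = TIER2,
                          threshold_days: int = THRESHOLD_DAYS) -> list[Path]:
    """Move files whose last access time exceeds the inactivity threshold."""
    cutoff = time.time() - threshold_days * 86400
    moved = []
    for f in list(src.rglob("*")):          # snapshot the tree before moving
        if f.is_file() and f.stat().st_atime < cutoff:
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))
            moved.append(target)
    return moved
```

Run from cron or a scheduled task, this implements the automated trigger described above: files that cross the inactivity threshold migrate to the cheaper tier without operator involvement.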
Using archive technologies for effective data management
Simple archiving can also be accomplished on both file and block data using numerous off-the-shelf products from CommVault, Crossroads Systems, Metalogix, QStar, Symantec and most three-letter storage vendors.
Two offerings stand out from the pack at present for their application of content object technology to file and block archiving: software from vendor Tarmin and Spectra Logic's Black Pearl archive server appliance. Instead of storing data in a basic file system, content object technology stores binary information in a database-like structure that enables more sophisticated data sorts, service assignments, routing and lifecycle management. Both are comparatively easy to implement in most environments.
Greater granularity in data management is always preferable. Having more information about a group of data blocks, files or objects lets us fine-tune how we manage them across infrastructure, and it contributes automation that can drive up storage efficiency significantly. To enable more granular control of data, you may want to look into Microsoft's File Classification Infrastructure (FCI), which lets administrators configure end-user workstations so that each file saved is earmarked with a reference to the business process or organizational unit to which it pertains (for example, accounting data or human resources data). That way, policies governing the data's preservation and protection can be automated.
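The mechanism can be sketched generically: each file carries a business-context tag, and a policy table maps tags to retention and protection requirements. The tag names and policies below are hypothetical illustrations, not Microsoft's FCI schema:

```python
# A minimal sketch of classification-driven policy in the spirit of FCI.
# Tag names and policy values are hypothetical.
from dataclasses import dataclass

@dataclass
class Policy:
    retention_years: int
    replicate_offsite: bool

POLICIES = {
    "accounting": Policy(retention_years=7, replicate_offsite=True),
    "human_resources": Policy(retention_years=5, replicate_offsite=True),
    "untagged": Policy(retention_years=1, replicate_offsite=False),
}

def policy_for(tags: dict[str, str]) -> Policy:
    """Look up the governing policy from a file's classification tag."""
    return POLICIES.get(tags.get("business_unit", "untagged"),
                        POLICIES["untagged"])

# Example: a spreadsheet tagged at save time by the classification agent.
file_tags = {"business_unit": "accounting"}
print(policy_for(file_tags))  # accounting policy: long retention, offsite copy
```

Once every file resolves to a policy this way, preservation and protection become lookups rather than judgment calls, which is what makes the automation described above possible.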
Other products that allow for more granular handling of archival data include SGI Trusted Edge and Novell Storage Manager, both of which classify files based on the Active Directory profile of the user who creates them. These third-party products, like Microsoft FCI, provide a means for granular data management, but be aware that they do not discriminate between important files and files a user creates that have nothing to do with his or her job. Irrelevant user-created data may therefore find its way into the formal data management process unless care is taken to purge it.
Effective data management is one way to boost overall storage efficiency. The savings on storage hardware acquisitions that effective data management yields may be sufficient to pay for your entire storage efficiency strategy, so you have nothing to lose by exploring better ways to manage your data.