What's reference data? The Milford, MA-based Enterprise Storage Group (ESG) defines reference data as: "digital assets retained for active reference and value. It includes, but is not limited to: electronic documents such as contracts, e-mail and e-mail attachments, presentations, CAD/CAM designs, source code and Web content; certain digitized information such as check images, blueprints, historical documents, medical images, geophysical, satellite, and surveillance information, computer-generated images (CGI), genomics, bioinformatics, video, photographs and voice data."
Reference data differs from traditional data in several ways. For one, it's growing faster. According to ESG, reference information is growing at a 92% compound annual growth rate (CAGR) through 2005, compared to 61% for traditional data. Part of this explosive growth stems from the large files that make up reference data, which can be several megabytes apiece. Reference data also has different usage patterns from other corporate information.
Transactional data, databases and PowerPoint files are often accessed on a daily basis and maintained only as long as necessary. Reference data is usually accessed infrequently, but maintained for years or decades. Finally, reference data can be industry-specific and as such may have a regulatory element to it. Think financial services (customer information and the Gramm-Leach-Bliley Act) or healthcare (patient information and HIPAA).
Companies trying to manage their reference data within a traditional data infrastructure could run into numerous issues. Storing reference data on high-priced enterprise storage equipment creates an unnecessary expense, but opting for tape may not satisfy business or performance needs. In addition, reference data growth will place burgeoning demands on storage operations teams tasked with configuring systems, responding to outages and backing up and restoring data.
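To get a feel for what those growth rates mean for capacity planning, the compounding can be worked out in a few lines. This is only a rough sketch: the 92% and 61% rates are the ESG figures cited earlier, while the 10 TB starting point, the three-year horizon and the function name project_capacity are assumptions chosen for illustration.

```python
def project_capacity(start_tb: float, cagr: float, years: int) -> list[float]:
    """Return projected capacity for year 0 through `years`, compounding annually."""
    return [start_tb * (1 + cagr) ** year for year in range(years + 1)]


# Hypothetical starting point: 10 TB of each kind of data today.
reference = project_capacity(10.0, 0.92, 3)    # 92% CAGR (ESG figure)
traditional = project_capacity(10.0, 0.61, 3)  # 61% CAGR (ESG figure)

for year, (ref_tb, trad_tb) in enumerate(zip(reference, traditional)):
    print(f"year {year}: reference {ref_tb:.1f} TB, traditional {trad_tb:.1f} TB")
```

Under these assumptions, three years of compounding turns 10 TB of reference data into roughly 71 TB, against roughly 42 TB for the same amount of traditional data.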
To properly address business, financial and IT requirements, companies must develop a reference data strategy. Here are three steps for developing a successful reference data strategy.
Step 1: Assess the business need
Any reference data project should start with a full understanding of current and future business requirements. CIOs should assign this task to a business-savvy project manager who can match business and IT strategy and work with the storage team.
A good place to start is to sort existing data into two buckets, traditional and reference. You'll probably find that much of your storage capacity holds information that qualifies as reference data, but is managed like traditional data. You can probably improve this situation with more cost-effective storage, archival tools and appropriate backup management. That exercise alone is worthwhile, as it will expose inefficiencies and lead to operational improvements.
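The two-bucket sort can be roughed out in code once you have an inventory of data sets. The sketch below is one possible approach, not a prescribed method: the DataSet fields, the 90-day idle threshold, the three-year retention threshold and the sample inventory are all hypothetical and would need to be tuned to your own environment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    name: str
    last_accessed: date
    retention_years: int  # how long the business must keep the data


# Hypothetical thresholds; adjust to your own access patterns and policies.
IDLE_DAYS = 90
LONG_RETENTION_YEARS = 3


def classify(ds: DataSet, today: date) -> str:
    """Label a data set 'reference' if it is rarely accessed but kept long term."""
    idle_days = (today - ds.last_accessed).days
    if idle_days >= IDLE_DAYS and ds.retention_years >= LONG_RETENTION_YEARS:
        return "reference"
    return "traditional"


def bucket(datasets: list[DataSet], today: date) -> dict[str, list[str]]:
    """Sort data set names into the traditional and reference buckets."""
    buckets: dict[str, list[str]] = {"traditional": [], "reference": []}
    for ds in datasets:
        buckets[classify(ds, today)].append(ds.name)
    return buckets


# Made-up inventory for illustration only.
today = date(2003, 6, 1)
inventory = [
    DataSet("order database", date(2003, 5, 31), 1),
    DataSet("check images", date(2002, 11, 1), 7),
    DataSet("medical images", date(2002, 6, 15), 10),
]
result = bucket(inventory, today)
print(result)
```

Anything that lands in the reference bucket while sitting on high-priced enterprise storage is a candidate for the cheaper storage and archival tools discussed above.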
You should also explore future business initiatives. Are there plans to digitize data or are there impending regulatory changes that will mandate these types of activities? Are there new business opportunities that will require this type of information? You'll need to know who will access the data, how often, where they will be located and what type of performance is expected. These are difficult questions, but a thorough exploratory process will yield an appropriate, affordable business solution.
This was first published in June 2003