Database application data has a way of proliferating; as it does, storage and security concerns also grow.
Thirty-four states have them, eight states are evaluating them and eight more states have no imminent plans to have them. I'm not talking about gun control laws or other headline-grabbing regulations; I'm referring to information privacy breach laws. And it's not just happening at the state level--Washington is also getting into the act. The Enterprise Strategy Group (ESG) is following at least 30 information privacy laws being debated by the U.S. Congress. And if you do business internationally, you can look forward to a dozen or so country-specific regulations in Europe.
Why should storage architects, managers and administrators care about this? Right now, most of the focus is on breach laws that define remediation processes and penalties once personal and confidential information is accessed by unauthorized individuals. Regulations currently being debated, both domestically and abroad, center more on preventive measures to thwart data breaches. If these laws gain momentum, IT may feel the pain because the threats are both external (hackers and other black hats) and internal (user errors and disgruntled employees).
Because storage and tape systems are the final resting places for most corporate information, many organizations have deployed encryption appliances that secure data at rest. But that may not be enough. The way applications and databases interact with the storage infrastructure poses additional risk for data breaches or, at the very least, presents the opportunity for unauthorized access to sensitive information. For example, database administrators often create replicas of tables and instances to test new application features. These copies--containing bank account information, credit card numbers, employee addresses and other confidential data--are stored on vulnerable secondary storage systems while the work is being done.
It might be easy to point fingers and call this a "database admin's problem," but storage teams, along with their database counterparts, can improve the security of their test and development environments and get some benefits in return.
Batten down that database
ESG research indicates that, on average, organizations classify 54% of their database content as confidential, and a large portion of this data is retained on a centralized storage infrastructure. That makes security the database and storage groups' problem.
Securing that data becomes complicated as organizations create multiple copies of nonproduction data for development purposes. Tracking the copies becomes unwieldy, especially if engineering resources are geographically separated from product teams or if the work is outsourced to contractors. Database admins make copies to test new application upgrades or features, but without processes and controls over the extra data sets, the downstream risk may show up as the theft of a laptop that holds a copy of the information.
The trend to outsource application and database development, coupled with the retention of more information to meet record-retention regulations such as HIPAA, exacerbates the security risk. There are also significant costs associated with constantly copying data. More storage is required to keep the information, and all copies of a database are typically backed up nightly. Storage and tape costs can get out of hand if there is an excessive number of database copies to be kept and protected.
The benefits of database archiving
One collaborative opportunity for the storage and database groups is database archiving, which can manage and store database information more efficiently and securely. Database archiving products can create a subset of a database; the subsetting process truncates, inserts or deletes data relative to the original database. By creating a smaller database, organizations can reduce the storage capacity they need while maintaining enough data to complete application development and testing.
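As a rough illustration of subsetting, the sketch below copies only recent rows from a table into a smaller table that could be handed to a test environment. It uses Python's built-in sqlite3 module with an in-memory database; the table names, columns, sample rows and cutoff date are all hypothetical, and a commercial archiving product would also need to handle related tables and referential integrity, which this sketch ignores.

```python
import sqlite3

# Hypothetical "production" table held in an in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, placed_on TEXT)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, placed_on) VALUES (?, ?, ?)",
    [
        (1, "Customer A", "2004-03-15"),
        (2, "Customer B", "2005-11-02"),
        (3, "Customer C", "2006-02-20"),
        (4, "Customer D", "2006-05-09"),
    ],
)

# The subset has the same shape as the source table but keeps only rows
# on or after an assumed cutoff date.
cutoff = "2006-01-01"
conn.execute(
    "CREATE TABLE orders_subset (id INTEGER PRIMARY KEY, customer TEXT, placed_on TEXT)"
)
conn.execute(
    "INSERT INTO orders_subset SELECT id, customer, placed_on "
    "FROM orders WHERE placed_on >= ?",
    (cutoff,),
)
conn.commit()

full_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
subset_count = conn.execute("SELECT COUNT(*) FROM orders_subset").fetchone()[0]
print(f"production rows: {full_count}, subset rows: {subset_count}")
```

The selection rule here is a simple date filter; in practice the rule would be chosen so the subset still exercises the application features under test.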
During the subsetting process, sensitive information can be secured through a variety of techniques. Masking data ensures that the database subset doesn't copy the original values from the primary database. For example, production databases store valid Social Security and credit card numbers, but the masked subset may keep only the last four digits of the original values. The masked data remains usable for development and quality assurance processes, but is useless for criminal purposes. If the data does end up in the wrong hands due to a security breach, the company remains protected.
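The last-four-digits approach described above can be sketched in a few lines of Python. The function name, the mask character and the sample values are assumptions made for illustration; they aren't taken from any particular archiving product.

```python
def mask_keep_last4(value: str, mask_char: str = "X") -> str:
    """Replace every digit except the last four with mask_char.

    Separators such as dashes and spaces are left in place so the
    masked value keeps the format the application expects.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - 4, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


# Made-up sample values, not real account data.
masked_ssn = mask_keep_last4("123-45-6789")
masked_card = mask_keep_last4("4111 1111 1111 1111")
print(masked_ssn)
print(masked_card)
```

Keeping the separators and the field length means screens and validation routines that only check format should continue to work against the masked subset.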
Depending on the implementation, data can be scrambled so that an entirely new set of values is generated or the values may be scrambled within the subset. Scrambling moves the values around so that an address, for example, is associated with a different employee in the subset, making it easy to test an application with valid data, but very difficult for someone with access to the test data to identify where a person lives. Organizations should understand what type of testing they'll be performing and which database security method protects the information without impeding the development process.
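Scrambling values within a subset can be approximated by shuffling one column across the rows, as in the Python sketch below. The function name, the record layout and the sample employees are hypothetical. A plain shuffle can also leave some values in their original rows, so a production tool would likely apply a stricter rule.

```python
import random


def scramble_column(rows, column, seed=None):
    """Return new rows in which the values of one column are shuffled.

    The set of values is unchanged; only their association with the
    other fields in each row is altered. The input rows are not modified.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: new_value} for row, new_value in zip(rows, values)]


# Made-up employee records for illustration only.
employees = [
    {"name": "Employee A", "address": "1 First St"},
    {"name": "Employee B", "address": "2 Second Ave"},
    {"name": "Employee C", "address": "3 Third Blvd"},
    {"name": "Employee D", "address": "4 Fourth Rd"},
]

scrambled = scramble_column(employees, "address", seed=7)
for row in scrambled:
    print(row["name"], "->", row["address"])
```

Because every address in the scrambled set is still a well-formed address, the application can be tested with realistic values even though the link between a person and an address is no longer reliable.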
Retaining large amounts of data exclusively on enterprise-class storage arrays is cost-prohibitive. Unfortunately, most storage systems have historically been designed for "production" data--transactional data that demands very high speed, very high reliability and very robust availability features--which means they're typically very expensive. Most data, however, is transactional for only a brief portion of its overall life and then becomes fixed content (nonchanging, reference data, persistent data, etc.). When subsets are created, archiving software can move them to different storage systems while maintaining integrity with the source database. Archiving data to disk-based storage systems is a rapidly emerging market trend enabled by lower cost ATA-based devices that allow information to be retained relatively inexpensively.
Control through collaboration
The combination of more information privacy laws, record-retention regulations and data growth poses challenges to both database and storage groups. Buying more disk capacity or database licenses isn't a cost-effective way to solve security risks, as the proliferation of database copies increases the chance that an individual can access confidential information.
Because database archiving offers inherent benefits for both storage and database management, these solutions should be the bridge that connects the two groups. There may be dissension over who should pay for this software and who should manage it, but there are far too many reasons why those disputes should be resolved sooner rather than later. If these issues are a sticking point, invite the head of corporate communications to the next meeting. After all, that's the person who will have to handle the press calls and public scrutiny when a breach occurs.