
Controlling unstructured file data storage growth: Five storage reduction tips

By its nature, unstructured file data storage is uncontrolled and quite unruly. Here are five tips on how to reduce your file data storage growth.

By its nature, unstructured file data storage is uncontrolled and quite unruly. Unlike the more civilized nature of structured databases, the world of file servers is a free-for-all land grab. In file-server land, individual users can eat through storage space from the inside out without regard for the business value of the information they are storing or its cost. Here are five techniques to help you control unstructured file data storage growth:

  • Implement quotas. Most file servers have user, group and tree quota functionality. Network-attached storage (NAS) appliances made by EMC Corp. and NetApp Inc. support quotas, as do Windows Storage Server and Windows 2003 R2 file servers. It's usually best to implement user or group quotas for home directories, and tree quotas, which limit the size of a directory, for organization shares like the "HR" document storage location. Implementing quotas seems like an easy and obvious solution, but be prepared: taking disk space from your users is often politically charged and can be a difficult task. Some users have special requirements and will demand an exception to a one-size-fits-all policy. The best bet is to have senior-level management sponsorship for your quota policy and require management endorsements on any deviations from that policy.

  • Implement file-type restrictions. File-server administrators regularly find themselves in a space crunch at the most inconvenient times. After investigation, it is common to learn that users are storing vast quantities of non-business-related data on the corporate file servers. Most file servers can block file types in specific folders. For example, you may disallow .mp3 files in home directories. It is a lot easier to prevent files from being stored than it is to get permission to delete them. Your NAS appliance or Windows 2003 R2 file servers can be configured to block file types that don't belong.

  • Compress your data. Even in the early days of the PC, our desire to retain data outpaced our ability to afford the storage. DOS-based disk compression tools like DriveSpace, Stacker and SuperStor were deployed on our old PCs to compress files. The compromise was that because compression algorithms are CPU intensive, they usually slowed down access to the files. CPUs today offer significantly more computational power, and accessing compressed files is only marginally slower than accessing uncompressed files. Depending on the data type, modern Windows NTFS file system compression can significantly increase free space in a file system by removing redundant data within each file, shrinking each one individually.

    More sophisticated filer operating systems like EMC Celerra DartOS and NetApp Ontap have free data deduplication features that can dramatically reduce disk utilization. File-system deduplication has all the advantages of file-level compression, but extends the capability to all of the files in the file system at once. This means that if a big PowerPoint file is stored five times, it consumes only the disk space that a single compressed copy would take on a compressed Windows NTFS file system.

  • Archive data. I've been involved in many user file-server utilization studies. We consistently find that 80% of the data on file servers is older than 90 days. Additionally, once a file reaches that age, it is rarely accessed again. To save space on the filer and move less-frequently accessed files to cheaper and slower storage, administrators can deploy a stub-based file archive or hierarchical storage management (HSM) product like Symantec Corp. Enterprise Vault to transparently relocate those files to lower-cost archive storage such as content-addressed storage (CAS). Archiving in this fashion is a form of information lifecycle management (ILM). Moving files off one low-cost storage platform (NAS) and onto another (CAS) of nearly equal cost may not appear to save money on the surface, but remember that archived data needs significantly less (or no) backup protection and administration.

  • Classify files and move or delete expired data. Storage administrators commonly understand that a file server contains old or expired files. File classification and relocation tools like those from Abrevity can be used to scan the file system and delete or relocate files based on attributes such as file age, last access time, file type, file name or file contents. Data classification tools work differently from the archive/stub tools described above because they physically delete or move targeted files to a new location. For example, the tool may move aged files to another directory or file server, leaving no trace of the file behind in the old location. It takes human intelligence to know where to find the moved files, so this approach usually works best with user file shares rather than application data. File classification allows users to align the cost of storage to the business value of the data.
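Before choosing among these techniques, it can help to measure what is actually on a share. The following is a minimal Python sketch, not part of any product named above, that walks a directory tree and reports total bytes, bytes in files not modified within a cutoff (90 days by default, echoing the figure in the archive tip), and bytes per file extension. The function name `summarize_tree` and its parameters are hypothetical, and it assumes modification time is an acceptable stand-in for file age, since last-access times are often disabled or unreliable on file servers.

```python
import os
import time
from collections import defaultdict


def summarize_tree(root, max_age_days=90, now=None):
    """Return (total_bytes, stale_bytes, bytes_by_extension) for files under root.

    A file counts as stale when its modification time is older than
    max_age_days. This is an assumption made for illustration; a real
    classification tool may use last-access time or file contents instead.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    total_bytes = 0
    stale_bytes = 0
    bytes_by_extension = defaultdict(int)

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue

            total_bytes += info.st_size
            # Group by lowercased extension so ".MP3" and ".mp3" count together.
            extension = os.path.splitext(name)[1].lower()
            bytes_by_extension[extension] += info.st_size

            if info.st_mtime < cutoff:
                stale_bytes += info.st_size

    return total_bytes, stale_bytes, dict(bytes_by_extension)
```

Calling `summarize_tree` on a home-directory share (the path is whatever your environment uses) gives a rough sense of whether archiving or file-type restrictions would reclaim the most space: a high stale-to-total ratio points toward archiving, while a few extensions dominating the byte count points toward file-type blocking.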

Most file servers are growing without a lot of control. When allowed to run unchecked, data storage costs spiral out of control. Managing file data growth is challenging but solvable. Take the first step by implementing a few of these storage reduction techniques.

About the author: Brian Peterson is an independent IT infrastructure analyst. He has a deep background in enterprise storage and open-systems computing platforms. He has consulted with hundreds of enterprise customers who struggled with the challenges of disaster recovery, scalability, technology refreshes and controlling costs.
