Unstructured data is everything that doesn't fit into the neatly arranged rows and columns of a database. It can be a file server that has become a dumping ground for all of the Word documents, Excel spreadsheets and PowerPoint presentations your organization uses. It can also include photos, videos, email messages and a whole lot more.
For years, IT pros have struggled to find ways to deal with unstructured data. The three unstructured data storage challenges they face most often are the following:
- The sheer volume of unstructured data continues its unchecked growth.
- New kinds of unstructured data appear all the time. For example, the log files generated by IoT devices must be stored somewhere.
- Without a better way to manage and mine unstructured data, it consumes storage capacity without adding value. Storage systems generally don't make it easy to find unstructured data once it has been stored.
In recent years, IT vendors have tried to help businesses make sense of unstructured data and make it more usable and valuable. We're seeing the results of some of those efforts now. Below are the approaches some vendors are taking to these challenges.
Wrestling with the volume
As mentioned, the sheer volume of data is something enterprises have contended with for, well, decades. This problem won't go away soon, particularly as analytics, machine learning and AI initiatives make organizations ever more data-hungry. The challenge is finding a way to scale continuously to meet that insatiable appetite for more data.
Companies such as Pure Storage have developed purpose-built appliances to deal with this specific problem. Pure's FlashBlade provides a scale-out platform custom-designed for unstructured data. Because it supports common protocols, including NFS, S3 and SMB, FlashBlade can drop into most existing environments without major changes.
Handling the variety
Not knowing what's in unstructured data can also pose a huge security risk. Files containing credit card numbers, Social Security numbers and other personally identifiable information (PII) often sit around unnoticed, and there's rarely any overarching management of what's being stored. At a minimum, anything containing PII should be encrypted, but even that doesn't always happen.
Some vendors have taken steps to solve the PII problem. For instance, Cohesity provides a secondary storage platform that vacuums up all unstructured data, indexes it and identifies problems. Through pattern detection, Cohesity finds and removes files containing sensitive PII, which enables tighter adherence to best practices and, in many cases, to compliance requirements. Cohesity's indexing engine also helps organizations find value in their data.
Cohesity's scale-out architecture and indexing engine address the volume, variety and value challenges of unstructured data storage. Cohesity also offers a clear answer to the question of which platform should hold this data, since it becomes a secondary platform in and of itself.
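Cohesity's detection rules aren't described here, but a small example shows what pattern-based PII detection generally involves. The following Python sketch scans a block of text for U.S. Social Security numbers and payment card numbers. The regular expressions and the `find_pii` and `luhn_valid` functions are hypothetical, written for illustration only; they are not Cohesity's API or rules. Card candidates are filtered with the Luhn checksum, a standard check digit scheme for card numbers, to cut down on false positives.

```python
import re

# Hypothetical, deliberately simple patterns (not any vendor's actual rules):
# U.S. SSNs written as NNN-NN-NNNN, and 13-16 digit card numbers that may be
# separated by spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit and
    # subtracting 9 whenever the doubled value exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict[str, list[str]]:
    """Return suspected SSNs and card numbers found in a block of text."""
    ssns = SSN_RE.findall(text)
    cards = []
    for match in CARD_RE.findall(text):
        digits = re.sub(r"[ -]", "", match)
        # Keep only candidates whose check digit is consistent.
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn": ssns, "card": cards}


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, SSN 123-45-6789"
    print(find_pii(sample))
```

A production scanner would also handle other document formats, more PII types and context (such as nearby keywords), but the basic loop of matching candidate patterns and then validating them is the same idea.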
Gaining insight into unstructured data requires a combination of tools and teams. One person going it alone probably won't be able to solve all of an organization's data challenges. Value can come from mining unstructured data or from placing it into workflows that improve business processes and outcomes.
NetApp is tackling this challenge with StorageGRID, a massively scalable object storage platform that spans the hybrid cloud and enables automated workflows across the organization. For example, a workflow for media files might move some files to lower-cost storage, others to higher-performing storage and still others to a location where a specific team can act on them.
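To make the media-file example concrete, here is a minimal Python sketch of rule-based placement, in which ordered rules decide where each file should go. Everything in it (the `MediaFile` fields, the three destination names, the `needs-edit` tag and the 90-day threshold) is a hypothetical assumption for illustration; it is not StorageGRID's configuration syntax or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical destinations; a real object store would map these to
# storage pools, sites or buckets.
ARCHIVE = "low-cost-archive"
PERFORMANCE = "high-performance"
TEAM_QUEUE = "editing-team"

# Hypothetical threshold: files untouched this long are considered cold.
COLD_AFTER = timedelta(days=90)


@dataclass
class MediaFile:
    name: str
    last_accessed: datetime
    tags: set[str]


def choose_destination(f: MediaFile, now: datetime) -> str:
    """Apply ordered placement rules; the first rule that matches wins."""
    if "needs-edit" in f.tags:
        return TEAM_QUEUE  # route to the team that must act on it
    if now - f.last_accessed > COLD_AFTER:
        return ARCHIVE  # rarely used, so move to cheaper storage
    return PERFORMANCE  # recently used, so keep on fast storage


def plan_moves(files: list[MediaFile], now: datetime) -> dict[str, list[str]]:
    """Group file names by the destination the rules select."""
    plan: dict[str, list[str]] = {}
    for f in files:
        plan.setdefault(choose_destination(f, now), []).append(f.name)
    return plan


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    files = [
        MediaFile("promo.mp4", datetime(2024, 5, 30), {"needs-edit"}),
        MediaFile("old-event.mov", datetime(2023, 1, 1), set()),
        MediaFile("live-feed.mp4", datetime(2024, 5, 31), set()),
    ]
    print(plan_moves(files, now))
```

Ordering the rules so the first match wins keeps the policy easy to reason about: a file tagged for editing goes to the team even if it is also old.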
The bigger picture
The challenge with unstructured data is that there's so much of it, and it offers different kinds of value to different people. The products discussed above are just three of hundreds from vendors working on the critical challenges created by unstructured data storage.