Hitachi Data Systems Corp. has designed a high-level content search and analytics capability for its object-based Hitachi Content Portfolio.
Hitachi Content Intelligence extracts data and metadata from repositories to perform analytics on unstructured data. The software is part of the Hitachi Content Portfolio (HCP) and runs as a clustered architecture on Docker container technology. IT administrators can deploy it on a bare-metal system, in a virtual environment or in a public cloud. Content Intelligence requires a 64-bit version of Linux and Docker 1.10, and is built on the open source cluster manager Apache Mesos.
"Hitachi has had a strong object storage solution on the market for a long time," said Scott Baker, senior director of product marketing for Content Intelligence at HDS, based in Santa Clara, Calif. "What we really needed was a solution in place that understands the data. The idea was built to break the data silos and connect to selected data sources, such as inside Amazon S3 [Simple Storage Service] or Microsoft Azure or file systems."
Baker said the Content Intelligence engine uses source-specific connectors to extract object storage or file system metadata, then runs that information through an extract, transform and load pipeline to determine the characteristics of the data. The data can then be either placed in an index or moved to the HCP repository as part of a data migration.
"There are a number of steps applied to the data so it can be analyzed," Baker said. "Content Intelligence understands the document and then classifies it, such as zip file or PDF or video. We apply different kinds of custom data, such as pattern matching for social security numbers or looking for different data formats."
The idea is to centralize organizational data and transform it into valuable and relevant business information. The tool automates the extraction, classification and categorization of data for different departments or levels in an organization.
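The classification and pattern-matching steps Baker describes can be illustrated with a minimal Python sketch. It is not Hitachi's implementation or API: the `Document` class, the stage functions, the two file signatures and the Social Security number pattern are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical signature table: leading bytes that identify a file type.
# Real content detection would cover far more formats than these two.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}

# Simplified pattern for U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Document:
    name: str
    content: bytes
    metadata: dict = field(default_factory=dict)


def classify(doc):
    """Stage 1: record the file type based on the leading bytes."""
    doc.metadata["type"] = next(
        (label for magic, label in SIGNATURES.items()
         if doc.content.startswith(magic)),
        "unknown",
    )
    return doc


def tag_ssn(doc):
    """Stage 2: flag documents whose text contains an SSN-like string."""
    text = doc.content.decode("utf-8", errors="ignore")
    doc.metadata["contains_ssn"] = bool(SSN_PATTERN.search(text))
    return doc


def run_pipeline(doc, stages):
    """Apply each stage in order, passing the enriched document along."""
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    Document("hr.pdf", b"%PDF-1.4 employee 123-45-6789"),
    [classify, tag_ssn],
)
print(doc.metadata)
```

Each stage only adds metadata, so the enriched record could afterward be indexed or migrated without the stages needing to know which destination applies.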
Steven Hill, senior analyst for storage technologies at 451 Research, said this type of content intelligence technology is an evolution of object storage and its metadata toward more sophisticated and granular data analysis.
"I believe the use of metadata is key to long-term data management," Hill said. "Metadata is more or less database entries containing detailed information about the data itself that stays with the data as part of the storage environment. Those entries can be used to establish policies for the handling of that data in a way that traditional file and block systems cannot.
"Metadata can be used as a tool to mine, manage and move data, regardless of where it resides. And the Hitachi Content Intelligence platform is all about building good metadata and helping customers figure out how metadata can help achieve their business and IT goals."
Hitachi Content Intelligence offers up to 36 methods to analyze the data and create a customized subset of the metadata for pattern matching. Customers can write their own custom stages for specific data sets, or pull information from specific data sources for areas such as compliance or medical records. The HCP search engine lets users run queries to choose which files go through the transform and load process.
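As a rough illustration of a customer-written stage applied only to documents selected by a query, consider the Python sketch below. The record layout, the `MRN-` identifier format, and the `select` and `medical_record_stage` functions are hypothetical and do not reflect the product's actual plug-in interface.

```python
import re

# Hypothetical identifier format for medical record numbers.
MRN_PATTERN = re.compile(r"\bMRN-\d{6}\b")


def medical_record_stage(metadata, text):
    """Custom stage: return a copy of the metadata enriched with any
    medical record numbers found in the text."""
    found = MRN_PATTERN.findall(text)
    enriched = dict(metadata)
    if found:
        enriched["medical_record_numbers"] = sorted(set(found))
        enriched["category"] = "medical"
    return enriched


def select(documents, **criteria):
    """Query step: keep documents whose metadata matches every criterion."""
    return [
        d for d in documents
        if all(d["metadata"].get(k) == v for k, v in criteria.items())
    ]


documents = [
    {"metadata": {"source": "clinic-share", "type": "txt"},
     "text": "Visit notes for MRN-004217."},
    {"metadata": {"source": "marketing", "type": "txt"},
     "text": "Campaign MRN-123456 draft."},
]

# Only documents from the selected source pass through the custom stage.
for d in select(documents, source="clinic-share"):
    d["metadata"] = medical_record_stage(d["metadata"], d["text"])
```

In this sketch the marketing document is left untouched even though it contains a matching string, because the query excluded it before the stage ran.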
"The workflow is designed as a drag-and-drop once you have defined the connector. You can also test to see how the process pipeline affects either what will end up in an index or associated with documents that will be migrated with HCP," Baker said. "This allows you to see the process that was defined to the general results of what users will benefit from."
Baker said the software tool also lets administrators test to ensure the right kind of content is being extracted and loaded based on a specific query.