Small World Big Data
Published: 06 Sep 2016
Cheaper and denser CPUs are driving smarter built-in intelligence into each layer of the data storage infrastructure stack.
Take storage, for example. Excess compute power can be harnessed to deploy agile software-defined storage (e.g., Hewlett Packard Enterprise StoreVirtual), transition to hyper-converged architectures (e.g., HyperGrid, Nutanix, Pivot3, SimpliVity), or optimize I/O by smartly redistributing storage functionality between application servers and disk hosts (e.g., Datrium).
There is a downside to all this built-in intelligence, however. It can diminish the visibility we would otherwise have into how our data storage infrastructure responds to change -- any IT change, really, whether due to intentional patching and upgrades, expanding usage and users, or complex bugs and component failures. Or, to put it another way, native, dynamic optimization enabled by powerful and inexpensive processors is making it increasingly difficult for us humans to figure out what's going on with our infrastructures.
So while it's great when we don't need to know any details and can simply rely on low-level components to always do the right thing, IT may find baked-in intelligence a double-edged sword until there is an absolutely autonomous data center -- and, no, today's public cloud computing doesn't do away with the need for internal experts. Furthermore, while smarter data storage infrastructure helps us with provisioning, optimization, growth plans and troubleshooting, it can also blind or fool us and actively work against our best efforts to bend infrastructure to our "will."
Still, in spite of all these potential negatives, given the choice, I'd rather live in a smarter and more autonomous IT world than not (even if there is some risk of runaway AI). I'll explain.
It's all about the data
Remember when analysis used to be an offline process? Capture some data in a file; open Excel, SAS or another desktop tool; and weeks later receive a recommendation. Today, that kind of analysis latency is entirely too long, and the approach too naïve.
Given the speed and agility of our applications and users nowadays, not to mention bigger data streams and minute-by-minute elastic cloud brokering, we need insight and answers faster than ever. This kind of intelligence starts with plentiful, reliable data, which today's infrastructures are producing more and more of every day (in fact, we'll soon be drowning in new data thanks to the internet of things [IoT]), and a way to process and manage all that information.
Storage arrays, for example, have long produced insightful data, but historically required vendor-specific, complex and expensive storage resource management applications to make good use of it. Fortunately, a series of developments is now helping us become smarter about IT systems management and better (and faster) users of the data generated by our infrastructures:
- Data processing. In line with the IoT growth mentioned above, storage components are producing increasing amounts of detailed, machine-level metrics. This growing flood of data requires big data analysis techniques within IT itself. If you are an IT admin, it might be time to learn some Python and Spark skills.
- Consumable APIs. Modern storage platforms now expose easy-to-consume representational state transfer (REST) APIs that allow anyone -- with permissions -- to access key data directly with almost any type of third-party analytical tool. Standard APIs also enable and power third-party system management by integrating platforms like OpenDataSource.
- Call home support. Most storage vendors today build call home support into their arrays so the arrays can send detailed machine logs back to the vendor every day for processing. The vendor can then aggregate that data with big data tools to provide proactive support, and use insight derived across customers for better product management and marketing. Call home functionality is also available to vendors as a service from the likes of Glassbeam, which can also help provide a client portal as a "value-add" to deliver usage and performance insight directly back to IT end-users.
- Visualization. With IT-oriented big data comes a host of excellent visualization tools normally leveraged by corporate business intelligence folks (e.g., Tableau). So IT itself can now build business-friendly -- and business-familiar -- dashboards and reports. Many vendors, meanwhile, have used visually cool and readily accessible open source visualization libraries (like d3.js) to easily create and offer custom product dashboards and shareable widgets.
- Next-gen smarts. Some vendors are doing really smart things beyond visualization. It's not enough to dump all that detailed data on customers when the vendor can help with advanced product-specific key performance indicators (e.g., VMware vRealize Operations, Tintri, Pernix/Nutanix) to elevate insight into actionable intelligence. As a first step (albeit a big one), vendors today can smartly accumulate low-level data streams into an expert "model" of health, capacity or risk. Some models are used to produce linear predictions based on trending the unique "scores" for each specific platform. Really advanced modeling can take into account future plans for workload growth and data storage infrastructure upgrades, and may make nonlinear performance predictions based on analyzing queuing behavior.
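To make the last item more concrete, here is a minimal Python sketch of the two kinds of prediction it mentions. Everything in it is an assumption for illustration: the function names, the sample numbers, and especially the choice of a textbook M/M/1 queue as the queuing model. It is not any named vendor's actual method, only a way to see why a trend line is linear while queuing behavior is not.

```python
from statistics import mean


def linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples.

    samples[i] is the value observed at time step i (e.g., day i).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def days_until_full(used_pct_by_day, limit_pct=100.0):
    """Linear prediction: days after the last sample until the trend hits limit_pct.

    Returns None when usage is flat or shrinking (the trend never reaches the limit).
    """
    slope, intercept = linear_trend(used_pct_by_day)
    if slope <= 0:
        return None
    last_day = len(used_pct_by_day) - 1
    day_full = (limit_pct - intercept) / slope
    return max(0.0, day_full - last_day)


def mm1_response_ms(service_ms, utilization):
    """Nonlinear prediction: mean response time of an M/M/1 queue.

    Assumed model (not from the article): R = S / (1 - rho), where S is the
    mean service time and rho is the utilization, with 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical daily capacity readings, in percent used.
    capacity = [50, 52, 54, 56, 58]
    print("days until full:", days_until_full(capacity))

    # Same hypothetical 5 ms service time at increasing load.
    for rho in (0.5, 0.8, 0.9, 0.95):
        print(f"utilization {rho:.2f}: {mm1_response_ms(5.0, rho):.1f} ms")
```

With the sample readings above, usage grows two points a day, so the trend reaches 100% 21 days after the last sample. The queuing function behaves very differently: under the assumed model, going from 50% to 90% utilization does not add 40% to response time, it multiplies it roughly fivefold, which is why planned workload growth can't be judged from a straight line alone.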
With advances in big data analysis and IoT trends, there is certainly room for exciting new developments that produce ever more intelligent data storage infrastructure.
For example, I think we've only seen the beginnings of applied machine learning in the systems management space. Be on the lookout for smarter machine learning optimizations emerging as software-as-a-service analytical services, baked into customer consoles for dynamic operations, in dashboards and portals for intelligent strategic planning, and even pushed down into devices to help make them increasingly autonomic.
If cars are soon going to drive themselves, we shouldn't be surprised when our storage arrays start telling us to back off and let them handle things. Someday soon, we might have to give new storage arrays an enterprise IT-oriented IQ test to see if they are ready for our production data center.
About the author:
Mike Matchett is a senior analyst at Taneja Group.