Big file storage scales for large data applications
This article is part of the February 2014 Vol. 12 No. 12 issue of Storage magazine
There are two sides to the big data story: the more familiar one involves analytics using vast numbers of small files, but dealing with big file storage is another issue.
Much of the discussion around big data analytics involves extensive data sets that typically comprise thousands or millions of smaller data objects gleaned from sources such as Web traffic, transactional databases or machine sensor output. But there's another side to the big data discussion: rather than running analytics on huge numbers of smaller files, the processes involved require handling and manipulating much larger files. Use cases include "big data archive" and similar applications, and some of the unique characteristics of big files warrant special consideration in storage system design.
Big file data defined
Typically, big file data involves some kind of images or video, with the most common example being digital content such as movies and television. The production processes used to create ...
Features in this issue
This "Sweet 16" roster of storage products represents the leading technical innovation of the past year.
Don't make your DR planning process harder than it is by trying to do too much or cutting corners. Careful planning is key to a successful recovery.
There are two sides to the big data story: analytics using vast numbers of small files, and dealing with storage for really big files.
Our latest survey charts the storage architecture alternatives readers are using in their storage shops.
Columns in this issue
Cloud closures, flash-in-the-pan solid-state vendors … storage might seem a little more dangerous these days, but it just might be innovation at work.
Filling drives with helium doesn't advance the art of hard disk design, it just makes it possible to stuff more old tech into a new package.
There aren't many reasons not to virtualize your servers, but there are plenty of compelling data protection reasons to virtualize them all.
Using Hadoop to drive big data analytics doesn't necessarily mean building clusters of distributed storage; a good old array might be a better choice.