Big data storage challenges eased by metadata, caching and compression

Systems that provide fast access to metadata, caching and heavy compression help ease media-rich big data storage challenges, Wikibon CTO says.

Systems that provide fast access to metadata and offer caching and heavy compression options will become increasingly important in overcoming big data storage challenges in the media and entertainment industry, according to the chief technology officer at Wikibon.

David Floyer, CTO and co-founder of the Marlborough, Mass.-based Wikibon research and analysis firm, said a total system that can find metadata quickly helps to enable the fine-grained search capabilities consumers have come to expect on a wide range of devices. Caching and heavy compression help in the delivery of media-rich big data, he added.

In this interview with Carol Sliwa, a senior writer for TechTarget's Storage Media Group, Floyer also discussed best practices for storing and managing big data, and the ways in which the storage of large audio and video files will have an impact on the media and entertainment industry in the future.

What are the major challenges associated with storing big data audio and video files?

David Floyer: The major challenge is essentially one of cost and, to some extent, performance when you want to get it off [the server]. It's cost because there's such a huge amount of it, and it's growing all the time. Every year there is an extra times two or times four on the amount of data held in these huge files. So, the major challenge is one of size and increasingly one of holding the metadata.

Metadata traditionally is held within the data object itself. With audio and these rich media files for films, etc., you really want to know externally what's in that metadata. So, you want a copy of that metadata held either very close to the data itself, or held in a central place so people can search more easily and search the data so they can find their favorite pieces or compare two different ways a scene was done. There is so much potential added value the metadata can add to it. And that is a major challenge of having a total system that can find the metadata quickly, find the different copies across different media, etc. Those systems are starting to go in and starting to be designed, and they're going to be an exciting extension to the newer capabilities.
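To make the idea of an externally held metadata copy concrete, here is a minimal Python sketch of a central catalog that can be searched without opening the large media files. It is an illustration only, not a system Floyer describes in detail: the `AssetRecord` and `MetadataCatalog` names, the tag-based schema and the "disk"/"tape" location labels are all assumptions made for this example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Copy of an asset's metadata held outside the media file (hypothetical schema)."""
    asset_id: str
    title: str
    tags: set[str] = field(default_factory=set)
    # Where copies of the asset live; the labels are illustrative only.
    locations: set[str] = field(default_factory=set)


class MetadataCatalog:
    """Central in-memory index, so a search never has to read the large files."""

    def __init__(self) -> None:
        self._records: dict[str, AssetRecord] = {}
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def add(self, record: AssetRecord) -> None:
        """Store the record and index it under each lowercased tag."""
        self._records[record.asset_id] = record
        for tag in record.tags:
            self._by_tag[tag.lower()].add(record.asset_id)

    def search(self, *tags: str) -> list[AssetRecord]:
        """Return the records carrying every requested tag, sorted by asset_id."""
        if not tags:
            return []
        id_sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        matching = set.intersection(*id_sets)
        return [self._records[asset_id] for asset_id in sorted(matching)]


# Example: find two takes of the same scene and see where the copies are held.
catalog = MetadataCatalog()
catalog.add(AssetRecord("scene-042-take-1", "Rooftop scene, take 1",
                        {"rooftop", "night"}, {"disk"}))
catalog.add(AssetRecord("scene-042-take-2", "Rooftop scene, take 2",
                        {"rooftop", "night"}, {"disk", "tape"}))
for record in catalog.search("rooftop", "night"):
    print(record.asset_id, sorted(record.locations))
```

A production catalog would more likely sit in a database or search engine than in memory, but the separation is the same: the index answers the query, and the media file is touched only once the right asset has been found.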

What does an IT organization need to keep in mind when designing a storage environment for media-rich files?

Floyer: Data is heavy. Data has a lot of gravity, and it doesn't like being thrown around. So, minimizing the amount of data that is transferred is of very high importance. If you're looking at transferring huge amounts of data individually to every person over the Internet, your network costs are going to increase dramatically. That's why some technologies such as caching and very, very heavy compression are very important, especially if you're going to devices such as mobile phones or other technologies that can't display the same amount of capability. Holding the data in a way that allows you multiple ways of extracting that data, suitable for the end platform that it's going to be shown on, needs to be built into the design of systems that are going to use this in different ways.
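The combination of caching and per-device extraction can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not a description of any particular product: the `RENDITION_FOR_DEVICE` table, the `RenditionCache` class and the `fetch_from_origin` callback are hypothetical names, and the rendition labels stand in for whatever compressed formats a real system would produce.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable

# Hypothetical mapping from device class to a rendition label; a real system
# would define its own profiles (resolution, codec, bit rate).
RENDITION_FOR_DEVICE = {"phone": "low", "tablet": "medium", "tv": "high"}


class RenditionCache:
    """Small least-recently-used cache keyed by (asset_id, rendition)."""

    def __init__(self, fetch_from_origin: Callable[[str, str], bytes],
                 capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._items = OrderedDict()  # (asset_id, rendition) -> bytes
        self.origin_fetches = 0      # how often the origin had to be contacted

    def get(self, asset_id: str, device: str) -> bytes:
        """Return the rendition suited to the device, fetching only on a miss."""
        # Unknown devices fall back to the smallest rendition.
        rendition = RENDITION_FOR_DEVICE.get(device, "low")
        key = (asset_id, rendition)
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.origin_fetches += 1
        data = self._fetch(asset_id, rendition)
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return data
```

In this sketch, repeated requests for the same film from phones are served from the cache and reach the origin store only once, which is the kind of reduction in data movement the answer above is concerned with.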

What are the top three best practices you recommend for storing and managing media-rich files?

Floyer: Know your audience. Know who you're providing this information for. Is it for the media room inside CBS or ABC or NBC? Is it the end user, you and I, wanting to watch a film on a Friday evening? Is it the researcher wanting to look at the remakes of films and see the comparison between them? What is the experience required for the end user? That's the first best practice ... that you have to design it for the end user and know intimately what the relationship is with that.

The second best practice is you've got to hold things in a native format that can be viewed in many different formats and separate out ... the type of file that's going to be used for these different densities of media. There is going to be such ongoing change in how it's going to be viewed, the richness of different technologies. Who would have understood a few years ago that tablets and phones would be a major source of viewing films in the future? So, there will be many, many more different ways that it's going to be consumed; and on top of that, the amount of data, the richness of content of these films will continue to grow over the next few years.

And the third one is you have to keep a very close eye on cost, and, where you can, tape-based systems are almost certainly going to be much lower cost. Modern LTFS [Linear Tape File System]-based tape systems are going to deliver five to 10 times lower cost than disk-based systems. So, if it's feasible to do it that way, you should try and do it that way because that will make it much more competitive than disk-based systems.
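The arithmetic behind the cost argument can be worked through in a short Python sketch. It combines the two figures quoted in this interview: data volumes growing by a factor of two or four each year, and tape costing five to 10 times less than disk. The function names are hypothetical, and the starting capacity and disk cost per terabyte are inputs the reader must supply; no real prices are assumed here.

```python
from __future__ import annotations


def projected_capacity_tb(start_tb: float, yearly_factor: float, years: int) -> float:
    """Capacity after compounding growth: start_tb * yearly_factor ** years."""
    if years < 0:
        raise ValueError("years must not be negative")
    return start_tb * yearly_factor ** years


def tape_cost_range(disk_cost_per_tb: float, capacity_tb: float) -> tuple[float, float]:
    """Return (low, high) tape cost, assuming tape is 10x to 5x cheaper than disk."""
    disk_total = disk_cost_per_tb * capacity_tb
    return disk_total / 10, disk_total / 5


# Placeholder inputs for illustration only; substitute your own figures.
start_tb = 100.0           # hypothetical archive size today
disk_cost_per_tb = 100.0   # hypothetical cost unit per TB on disk

for factor in (2, 4):
    capacity = projected_capacity_tb(start_tb, factor, years=3)
    low, high = tape_cost_range(disk_cost_per_tb, capacity)
    print(f"growth x{factor}/year: {capacity:.0f} TB after 3 years, "
          f"disk {disk_cost_per_tb * capacity:.0f}, tape {low:.0f} to {high:.0f}")
```

Because the growth compounds, the gap between the disk and tape totals widens every year in absolute terms even though the ratio stays fixed, which is why the choice of medium matters more as the archive grows.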

How will the storage of media-rich files have an impact on the media and entertainment industry in the future?

Floyer: That's a profound question, and it's going to make a very, very big difference. It's interesting when you look at the disruption that's happened within the different industries. So, for example, simple text-based systems -- very small, very quick and easy to send over the Internet -- made a profound change to the print industry. The newspapers and all other sorts of print industries have had to adapt very, very radically to that first change. And that was the first disruption.

The second was the music industry, and that was, again, changed by the ability to rapidly move large numbers of songs. And that has again changed absolutely radically from disk-based media, such as CDs, to "buy the song, buy what you want," consuming it in a completely different way and paying for it in a completely different way.

And the third is going to be the media industry, and it's in the process of happening. What has held it back is how long it takes and how much resources it takes to move these very large files around. That's getting better and faster as the Internet is getting faster and wider in the pipes that it offers. And that again is going to have a profound impact, as is the ability to look for information on films and on all other sorts of media, on surveillance tapes or whatever other types of moving images there are out there. So, people will expect their video, their media to be presented where they want it, on what type of device they want it on. They will expect that in the same way as they get that today with print and with music.

There is an opportunity for providing far greater value and greater additional data about this experience. So, those two drivers for high-functioning-type ways that people will consume this rich media and, driven from the end user, by having to satisfy that the end user will decide when and how they consume it, it's going to radically change the current ways media is distributed. And DVDs will be a thing of the past, if they're not already, and channels on cable will equally have to adapt very, very radically to this new way that media will be distributed.
