Before it began filling theatres two weeks ago, the animated 3D movie Puss in Boots filled hundreds of terabytes of DreamWorks Animation SKG’s storage systems, most of it on a Hewlett-Packard Ibrix-based NAS cluster.
During production, DreamWorks set aside 250 TB on its HP X9720 storage, according to DreamWorks staff engineer Scott Miller. As with all of its movies, the studio will also keep an archive copy of the film. In the case of Puss in Boots, that archive copy takes up 70 TB.
With three or four movies in production a year and all of its released movies archived, it’s no surprise DreamWorks has 1 PB of capacity on its scale-out NAS platform and 6 PB of total storage capacity.
Animation consumes a great deal of capacity, especially with 3D films. Up to 400 rendering artists work on a movie, Miller said, and that requires high-performance storage as well as large capacities.
“Our product is data,” Miller likes to say. And that data translates into money for the studio as it moves across storage tiers.
“Our product has a clear timeline from when production starts,” Miller said. “At release date, the data set is less interesting. It goes from primary storage to nearline storage. Then when the DVD comes out, we can move that to the tier I call farline – that’s our densest, lowest-cost disk. And we put a copy on archive tape for asset preservation, but we also leave it on disk for a franchise or sequel.”
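The lifecycle Miller describes can be read as a simple date-driven placement rule. The following is a minimal sketch of that idea only, not DreamWorks' actual tooling: the function name `placement`, the dates in the usage example, and the assumption that the tape copy is made at the farline stage are all hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def placement(today: date,
              release: date,
              dvd_release: Optional[date] = None) -> Tuple[str, bool]:
    """Return (disk_tier, has_tape_copy) for a film's data set.

    Hypothetical rule based on the quote above:
      - before theatrical release: primary storage
      - from theatrical release:   nearline storage
      - from DVD release:          farline disk, plus a copy on archive tape
    The timing of the tape copy is an assumption; the quote does not
    say exactly when it is made.
    """
    if dvd_release is not None and today >= dvd_release:
        # Data stays on disk for franchise or sequel work even after taping.
        return ("farline", True)
    if today >= release:
        return ("nearline", False)
    return ("primary", False)


# Usage with made-up dates for an illustrative film.
release = date(2011, 11, 1)
dvd = date(2012, 3, 1)
for day in (date(2011, 6, 1), date(2011, 12, 1), date(2012, 6, 1)):
    print(day, placement(day, release, dvd))
```

In this reading, data is never deleted from disk; it only moves to denser, cheaper tiers as its commercial timeline advances.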
Even with all that data, Miller doesn’t consider DreamWorks a big data shop, because most of the data it stores is in archives.
“We have a lot of data, but I don’t consider it big data,” he said. “Most of it is not an active data set for producing films, it’s the outgoing data for our library. We have 31 features today – that’s a lot of data, but not traditional big data where we need to process our data.”
DreamWorks has data centers at its Glendale, Calif., corporate offices, and in studios in Redwood City, Calif., and Bangalore, India. “A big business driver for us is to unify our sites so artists can work on anything from anywhere,” Miller said. “We also have the requirement to keep everything in one file system namespace. We need our data to be available even if it’s on cheaper storage tiers.”
DreamWorks is primarily an HP IT shop, and that’s reflected in its storage. Besides the scale-out NAS, the studio uses HP’s XP2000, EVA6400 and HP 3PAR SANs. DreamWorks also has 10 nodes of NetApp FAS storage to complement the X9720, and an IBM TotalStorage 3494 Tape Library and Tivoli Storage Manager (TSM) software for backup.
“Our biggest challenge is the expectation of forever retention,” Miller said. “We don’t want to remove data. We keep everything, but we keep it organized.”
He said data integrity, reliability and low latency are the most important features he looks for in a storage system.
DreamWorks has shifted much of its backup and archiving from tape to disk, and Miller said he is considering cloud storage to replace the rest of his tape. He said DreamWorks uses the cloud for compute, but the economics aren’t quite there yet for storage.
“We use a large [IBM] tape library for cold data and a disk repository for hot data,” he said. “If I had a cloud connection, I wouldn’t have to deal with tape at all. Our biggest pain point is the logistics of managing tape.”
DreamWorks began using Ibrix clustered NAS in 2005, before Ibrix was part of HP. The studio originally built two eight-node Ibrix clusters on HP hardware. “I was happy with the file system and when HP bought Ibrix I started pushing them to make it all play nicely, and they’re working towards that,” Miller said. “We’re helping HP build a good NAS. NAS is more than just a file system – there’s data protection, a software development ecosystem, and a data protection ecosystem.”
Miller said he also looked at NetApp and Isilon for his animation storage before picking Ibrix. He said NetApp cost 10 times as much as Ibrix, and Isilon’s small-file performance was lacking. He added that EMC hasn’t improved Isilon’s small-file efficiency since acquiring the company. Miller did, however, go back and add NetApp nodes after NetApp began offering cluster mode for its Data ONTAP operating system.
Miller is also looking at object-based storage for even greater scalability. “One of our goals is geographic independence, and it’s hard to do that with NFS,” he said. “We’re re-architecting how we do file storage, and object storage fits in well there.” He is considering OpenStack software as well as NetApp StorageGrid, DataDirect Networks WOS and Caringo CAStor object storage.
“I expect a big part of our front-end assets for the artistic stuff will wind up in object stores,” he said. “File storage will stay around as scratch space, but for high performance we’ll stop doing NFS and CIFS over the WAN.”