Weta Digital, the New Zealand company that created digital effects for the blockbuster movie "Avatar," said the film's detailed animations required more horsepower than one clustered network-attached storage (clustered NAS) system could provide on its own.
To support the project, which included new breakthroughs in animating the faces of 3-D characters, Weta Digital set up a combination of BlueArc Corp.'s Titan clustered NAS arrays and NetApp Inc.'s FlexCache. As the effects grew more advanced, the capacities and performance demands involved began to outstrip even the very large systems Weta Digital had in place to support previous projects such as 2005's "King Kong." "'King Kong' used 100 terabytes of storage," said Paul Ryan, Weta Digital's chief technology officer. "For 'Avatar,' we have 100 terabytes of RAM [in our server farm]."
To support the digital-effects rendering process, Weta Digital has a server farm, which it refers to as a "render wall," containing 35,000 CPU cores. During the rendering process, multiple layers and pieces of an image are put together to form the completed frame of a movie. "[This] creates some interesting problems in storage," Ryan said. "Namely, we can get into a situation where we have 10,000 processes in the render wall all trying to access the same file or group of files, which leads to hot spots in our storage."
BlueArc Titan serves large files
To mitigate this problem, the company first brought in three four-node configurations of BlueArc's Titan 3200 clustered NAS systems, with 200 TB of capacity in each system, to support "Avatar." BlueArc's systems are marketed for serving large numbers of large files, a common workload for media and entertainment companies like Weta Digital. A fully configured 3200 cluster can hold up to 4 PB of capacity, and BlueArc claims the 3200 can support a maximum of 200,000 IOPS or up to 20 Gbps throughput. Ryan said Weta Digital had used one Titan 3200 cluster in the past.
But there was still a problem. "We have one set of texture data which [is] a fairly small data set, between 1 and 5 terabytes in total, but pretty much every single process that's going on in the render wall wants to access that data," Ryan said. Because of the data access patterns, "we found that no matter how much bandwidth we put behind that texture data, the render wall will consume it all."
NetApp FlexCache helps replicate 'hot' data
"We've had a very long relationship with NetApp," Ryan said, estimating the company has been using NetApp filers for at least 10 years. It already had approximately 600 TB of NetApp storage serving user file shares. About nine months ago, Weta Digital brought in a new dual-node high-availability cluster of NetApp's high-end FAS6080 system, along with eight of NetApp's FlexCache appliances, also configured into dual-node high-availability clusters.
NetApp's FlexCache is designed to support applications like Weta Digital's render wall. It adapts to changing usage patterns by automatically replicating "hot" data to local caching volumes.
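The general pattern FlexCache applies can be illustrated with a simple read-through cache: a caching node serves reads from a local copy when it has one and fetches from the origin filer only on a miss, so frequently requested "hot" files end up replicated close to the clients. The sketch below is a generic illustration of that pattern under assumed names (`OriginFiler`, `CachingNode`), not NetApp's actual implementation:

```python
class OriginFiler:
    """Stands in for the origin NAS volume holding the texture data."""
    def __init__(self, files):
        self.files = files
        self.reads = 0  # how many reads actually hit the origin

    def read(self, path):
        self.reads += 1
        return self.files[path]


class CachingNode:
    """Serves reads from a local copy, filling the cache on a miss."""
    def __init__(self, origin):
        self.origin = origin
        self.local = {}

    def read(self, path):
        if path not in self.local:           # cache miss: fetch once from origin
            self.local[path] = self.origin.read(path)
        return self.local[path]              # hot data now served locally


origin = OriginFiler({"/textures/skin.tex": b"texture bytes"})
node = CachingNode(origin)
for _ in range(10_000):                      # many render processes, one hot file
    node.read("/textures/skin.tex")
print(origin.reads)                          # origin served only the first read: 1
```

In this toy model, 10,000 reads of the same texture file cost the origin a single read; the other 9,999 are absorbed by the cache, which is the effect Weta Digital was after for its texture data set.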
Although the NetApp and BlueArc systems don't talk to each other, Weta Digital has found a way to make them co-exist productively. The NetApp cluster feeds the render wall, while the BlueArc system stores the frames as they are produced by the rendering system. "We knew the BlueArcs were good, we knew they were fast and they've lived up to expectations absolutely," Ryan said. "But the new thing that's sort of jumped out at us in the last year is the FlexCache."
Ryan said automated performance management was a big selling point for FlexCache. "We were using regular file servers for these texture files beforehand, but that required us to manually manage the replication. We'd have to have a copy of these texture files on lots of different file servers," he said.
While the current setup is working well, "we're always looking for more fine-grained tools to figure out hot spots and what clients are trying to access," Ryan said. "Adding BlueArc's power can delay the onset of the problem, and FlexCache puts in more bandwidth, but we're still having trouble diagnosing hot spots when they come up."
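The kind of fine-grained diagnosis Ryan describes boils down to counting accesses per file over a window and surfacing the most-requested paths. The sketch below assumes a plain list of hypothetical (client, path) access records as input; it is one simple way to spot hot spots, not a description of Weta Digital's tooling:

```python
from collections import Counter


def find_hot_spots(access_log, top_n=3):
    """Return the most frequently accessed paths from (client, path) records."""
    counts = Counter(path for _client, path in access_log)
    return counts.most_common(top_n)


# Hypothetical access records collected from render-wall clients
log = [
    ("node01", "/textures/skin.tex"),
    ("node02", "/textures/skin.tex"),
    ("node03", "/frames/shot042.exr"),
    ("node04", "/textures/skin.tex"),
]
print(find_hot_spots(log, top_n=2))
# [('/textures/skin.tex', 3), ('/frames/shot042.exr', 1)]
```

In practice the interesting part is collecting the access records at NAS scale without perturbing the workload; the counting step itself stays this simple.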