This article can also be found in the Premium Editorial Download "Storage magazine: The best high-end storage arrays of 2005."
A different mindset
Beyond the collaborative aspects, IT managers need to think differently about their systems and data. Systems, for example, tend to be file- rather than database-oriented, notes IDC's Villars. Instead of huge volumes of data transactions churning through the systems every day or every hour with frequent reads and writes, the design groups tend to load only a few files and work on them for hours at a stretch. And the files may not be that large. Although a few files sometimes hit 100MB or more, most run less than 5MB, not much more than a large PowerPoint file. The designs may ultimately be rendered as rich graphics--a task often done in batch mode--but until then they consist primarily of mathematical equations that don't consume much storage space.
There's also a new vocabulary, as well as a new set of acronyms. Product data management (PDM) refers to the process of capturing and storing all data about a product and its design. "This is meta data, and it's easy to centralize meta data," says Mark Kerr, technical manager for IBM's industrial sector group.
PLM is concerned with managing product data from inception through the end of the product's lifecycle, which could stretch for decades. "We're still supporting equipment that was made in the 1940s and '50s," says Caterpillar's Olson. Kerr recalls RFPs from the aerospace industry that specified 50 years of retrievable data. PLM data for new products today may still be in use in 2050. "IT managers will need a strategy for moving the data to new media every five years," advises Kerr.
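To get a feel for what a five-year media refresh means over a 50-year retention period, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the 2005 start year is an assumption taken from the publication date, and the sketch assumes no migration is needed in the year retention ends.

```python
def migration_years(start_year, retention_years, interval_years):
    """Return the years in which data would be moved to new media.

    Assumption: the final year of retention needs no further migration,
    so the end year itself is excluded.
    """
    end_year = start_year + retention_years
    return list(range(start_year + interval_years, end_year, interval_years))


# Illustrative values: data created in 2005, kept 50 years, moved every 5 years.
schedule = migration_years(2005, 50, 5)
print(len(schedule), "migrations, from", schedule[0], "to", schedule[-1])
```

Under those assumptions, data written in 2005 would be moved nine times before its retention period ends, which is why Kerr frames migration as a strategy rather than a one-time project.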
Collaborative product development (CPD) describes a development process in which teams of product designers and engineers work together on a project. Essentially a groupware challenge, the data must be stored and managed to ensure that each participant is working with the latest copy of the data. CPD involves synchronizing data, tight version control and data caching at widely dispersed sites to reduce the amount of data repeatedly fetched over the network.
"A lot of engineering that was once kept in our facility is now moving around the world," notes Olson. The process can also get complicated when subcontractors and other third parties are involved.
Centralized or distributed?
With the advent of global product development and manufacturing, large and midsized manufacturers have to decide where product data will reside. If it's centralized, it's easier to manage, protect and secure. However, remote users may encounter performance issues if they must continually retrieve and store files across a WAN. When the operation is far-flung, the cost of multiple global links can quickly mount.
"A centralized or distributed [product data] vault is the big debate," says IBM's Kerr. If the organization has a big network in place and the data volumes are comparatively small, then a centralized approach is preferred. "If you're setting up multiple vaults, your challenge will be to synchronize them; [then] you'll have the challenge of backup and recovery," he adds.
Because the data primarily takes the form of files, the latest versions of NFS can simplify some of this. "NFS v.4 has replication, caching and a global namespace," Kerr reports. That allows a company to set up a single file system globally. Organizations can then set up multiple local NAS servers, point them to a centralized SAN-based file system and use NFS to move the files around the network. Such a global file system will enable companies to scale out their NAS storage.
"With a global file system and a global namespace, you can keep adding NAS servers and still have a single storage pool," says Kerr.
Another option, suggests PTC's Gerdes, is to store content and meta data centrally, but to deploy content caching servers at remote sites. Designers and engineers can work off the caching servers while the work is periodically saved at the central location and the cache is refreshed. "The nice thing about a content caching server is that you don't have to back it up since all the data has been stored and backed up centrally," he points out. In addition, content caching servers typically are inexpensive compared to other servers.
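The caching arrangement Gerdes describes can be sketched roughly as follows. This is a simplified illustration, not how Windchill or any particular product works: the class and method names are hypothetical, the periodic save is simplified to an immediate write to the central store, and the cache checks a version number before reusing a local copy.

```python
class CentralVault:
    """Stands in for the central, backed-up store of content and metadata."""

    def __init__(self):
        self._files = {}  # file name -> (version, content)

    def save(self, name, content):
        # Each save bumps the version number kept in the metadata.
        version = self._files.get(name, (0, None))[0] + 1
        self._files[name] = (version, content)
        return version

    def fetch(self, name):
        return self._files[name]

    def version(self, name):
        return self._files[name][0]


class ContentCache:
    """Stands in for a content caching server at a remote site."""

    def __init__(self, vault):
        self._vault = vault
        self._local = {}      # file name -> (version, content)
        self.wan_fetches = 0  # counts full file transfers over the WAN

    def open(self, name):
        cached = self._local.get(name)
        # Only a small version check crosses the WAN when the copy is current.
        if cached is None or cached[0] != self._vault.version(name):
            self._local[name] = self._vault.fetch(name)
            self.wan_fetches += 1
        return self._local[name][1]

    def save(self, name, content):
        # Work is always written to the central vault, so the cache
        # never holds the only copy of anything.
        version = self._vault.save(name, content)
        self._local[name] = (version, content)

    def wipe(self):
        # Simulates losing the caching server entirely.
        self._local.clear()


vault = CentralVault()
cache = ContentCache(vault)
cache.save("bracket.prt", "rev A")
cache.open("bracket.prt")  # served locally, no file transfer
cache.wipe()
print(cache.open("bracket.prt"), cache.wan_fetches)
```

Because every save lands in the central vault, wiping the cache costs only a re-fetch, which is the reason the caching server itself doesn't need to be backed up.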
Santa Clara, CA-based National Semiconductor Corp. (NSC) operates multiple chip design centers in the U.S. and overseas. It prefers to store and manage design data centrally, where it has mirrored Network Appliance (NetApp) Inc. file servers, each with 20TB of usable capacity, and a mirrored data center. "But we still have local sites, too," says Klaus Preussner, director of information services.
To ensure that remote sites can get to the stored design data, NSC maintains a large VPN that uses multiple T1 and T3 links to its Santa Clara headquarters. At remote sites, "we have filers that we use for intelligent data staging," says Preussner, which are the equivalent of Gerdes' content caching servers. Data synchronization is performed in batch mode. The company is also evaluating the use of memory cache at remote sites.
Pitney Bowes also opted to centralize all of its CPD work at its Danbury, CT, data center. It maintains an EMC Corp. SAN with 170TB of raw disk and a Hewlett-Packard (HP) Co. 9000 Superdome server to run the Windchill CPD system for its product design and engineering group. Pitney's design engineers in the U.K. and France access the appropriate design files from Danbury over a T3 link. However, duplicate copies of active files are kept locally, says Patrick Leahy, senior analyst for enterprise business apps at Pitney. Windchill tracks the meta data and automatically synchronizes the local copies with any updates. "Whether the data is in Danbury or local to our users, it all looks like local storage," adds Leahy.
Caterpillar also prefers to centralize design data, which can be accessed by design engineers from as far away as Japan. The master copy is stored at the data center and can be directly accessed by engineers anywhere in the world. Some remote offices save and store their work locally as well as on the main system, but the central IT group isn't involved with that. "We are definitely not replicating data all over the place," says the firm's Olson. Caterpillar invested a considerable amount of money in building its own global T1 and T3 network. "We now own a lot of our fiber so we don't have to lease lines," adds Olson. The upshot: Caterpillar design engineers can get to any data they want over the network from anywhere in the world, and get it quickly.
Caterpillar's engineering data is stored primarily on the central SAN, which consists of multiple arrays from EMC, Hitachi Data Systems and IBM Corp. The documentation groups and some business units prefer NAS storage, for which the company uses EMC Celerra for enterprise NAS as well as a number of NetApp filers deployed by the various business units, Olson explains. Although IT would like to centralize and standardize, in practice the firm ends up mixing centralized and local storage of various types because the business units insist on their storage preferences.
However IT managers deploy the manufacturing infrastructure, they must take performance into consideration. Moving design files over a crowded corporate network, even if the files are only the size of average PowerPoint presentations, raises performance issues.
But when it comes to performance for design tasks, IT managers need to set expectation levels for their users. "We're not talking about business transactions where people expect an instant response," says Kichler Lighting's Sink. The engineers typically download a drawing when they begin work and may work on it for the rest of the day. "If it takes 15 seconds for the drawing to load, they can wait. It all depends on how you set expectations," he says.
To deliver the expected performance, Kichler relies on its Gigabit Ethernet corporate backbone with 100Mb delivered to the desk. "Design and engineering work is predictable traffic, not like OLTP. We don't see a lot of sudden spikes," says Sink.
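To put rough numbers on the file sizes and links mentioned in this article, the following Python sketch estimates best-case transfer times. It assumes decimal megabytes, the nominal T1 and T3 line rates (1.544Mb/sec and 44.736Mb/sec), and a link with no other traffic, protocol overhead or latency, so real transfers will be slower. The function name is hypothetical.

```python
# Nominal line rates in megabits per second.
LINK_MBPS = {"T1": 1.544, "T3": 44.736, "100Mb desktop": 100.0}


def transfer_seconds(file_mb, link_mbps):
    """Best-case seconds to move a file of file_mb megabytes over a link."""
    return file_mb * 8 / link_mbps  # 8 bits per byte


# File sizes taken from the article: a small drawing, a typical design file
# and one of the rare large files.
for size_mb in (1, 5, 100):
    for link, mbps in LINK_MBPS.items():
        print(f"{size_mb:>4}MB over {link:<14}{transfer_seconds(size_mb, mbps):8.1f} sec")
```

Under these assumptions, a 5MB file takes roughly 26 seconds over a single idle T1 but less than a second over a T3 or a 100Mb desktop connection, which helps explain why expectations matter more at remote sites than at headquarters.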
For storage performance, the company relies on its EMC Clariion CX600. Kichler expects a planned upgrade to the CX700, or possibly a DMX, to give it a 30% boost in performance. Although Kichler's most graphic-intensive design files can run 50MB to 60MB, these are the exception. Most of its AutoCAD files come in under 1MB. "These aren't very big, but we do have 10,000 to 15,000 of them," says Sink.
To control cost without sacrificing performance, NSC is implementing tiered storage. Much of its design data is saved on primary storage. However, the company plans to use a second tier of low-cost disk for less-frequently accessed design data, the firm's Preussner explains. This tier will offer lower performance at a lower cost. A third tier also consists of low-cost disk used by designers for scratch space--temporary storage that's not protected and, if lost, won't matter.
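A tiering rule along these lines might be sketched as follows. This is an illustration only, not NSC's actual policy: the function name is hypothetical, and the 90-day threshold for "less-frequently accessed" is an assumption chosen for the example.

```python
from datetime import date, timedelta


def choose_tier(last_accessed, is_scratch, today, active_days=90):
    """Pick a storage tier for a design file.

    active_days is an assumed cutoff; the article doesn't give one.
    """
    if is_scratch:
        # Temporary working space: cheap disk, not protected.
        return "tier 3 (scratch, unprotected)"
    if today - last_accessed <= timedelta(days=active_days):
        return "tier 1 (primary)"
    # Design data that hasn't been touched recently moves to cheaper disk.
    return "tier 2 (low-cost disk)"


today = date(2005, 8, 1)
print(choose_tier(date(2005, 7, 15), False, today))
print(choose_tier(date(2004, 11, 1), False, today))
print(choose_tier(date(2005, 7, 31), True, today))
```

In practice the cutoff would be tuned to how often designers reopen older work, since data moved to the second tier trades performance for cost.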
Manufacturing IT infrastructures are coming under closer scrutiny because of regulatory compliance and litigation concerns, as well as the need to support long product lifecycles and engineers' penchants "to keep things forever," says IDC's Villars. He adds that these pressures will force manufacturing organizations to find ways to cost-effectively store more data for longer periods of time. He expects manufacturers to turn to tiered storage and information lifecycle management in various forms.
"We also expect manufacturing companies to move to grid architectures," says Villars. Grid units (Hewlett-Packard calls them smart cells, while IBM calls them bricks) combine a CPU and storage as a single entity. For example, with 1,000 grid units some of the units can be grouped for NAS and dedicated to the engineering department, others can be reserved for primary block storage for critical applications and a few grid units can be reserved for low-cost archival storage.
An IT manager taking on the responsibility for a company's product design and development infrastructure will find the challenges different, but not terribly difficult. The added storage requirements won't be overwhelming, although network bandwidth will present some problems for organizations that don't have robust networks in place.
Coming from a corporate environment increasingly driven by the demand to maintain 24/7 availability and where every system, including mail and compliance, is becoming critical, the product design and development world may actually be a refreshing change. "Order entry and fulfillment are mission-critical applications. Engineering work isn't usually mission-critical," says Olson. "Should something go down, the business will manage if some engineers don't have their systems for a little while." How often do corporate IT storage managers hear that?
This was first published in August 2005