This article can also be found in the Premium Editorial Download "Storage magazine: Distance: the new mantra for disaster recovery."
File-based approach to storage
A file-based fixed-content solution consists of a shared, network-attached file server with a common repository of ATA storage arrays. The fixed-content application interfaces with the file server and stores information in a standard file system directory structure located on the ATA array. The ATA storage array could be deployed as the primary layer in an overall storage hierarchy, consisting of HSM software and secondary storage for long-term retention as the data access frequency drops off to near zero.
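The tiering idea above, where data moves to secondary storage once it is rarely accessed, can be illustrated with a minimal Python sketch. It is not how any particular HSM product works: the `FileRecord` type, the `select_for_migration` function, the file paths and the 180-day idle threshold are all hypothetical, chosen only to show the selection step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalog entry for a file on the primary ATA tier."""
    path: str
    last_access: datetime  # when the file was last read


def select_for_migration(files, now, idle_threshold):
    """Return the files that have been idle longer than idle_threshold."""
    return [f for f in files if now - f.last_access > idle_threshold]


now = datetime(2003, 5, 1)
files = [
    FileRecord("/archive/xray-0001.dcm", datetime(2003, 4, 28)),
    FileRecord("/archive/xray-0002.dcm", datetime(2002, 1, 15)),
]

# Only the file untouched for more than 180 days is a migration candidate.
candidates = select_for_migration(files, now, timedelta(days=180))
print([f.path for f in candidates])
```

A real HSM policy would typically weigh more than idle time (file size, service level, available capacity), but the principle of selecting by access frequency is the same.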
An example of this type of solution is the StorageTek BladeStore and ASM system. BladeStore is a storage area network (SAN)-attached storage array powered by an LSI Logic RAID controller with a StorageTek-developed ATA disk array. Each BladeStore array contains up to ten 800GB storage blades, for a capacity of up to 8TB per array. A single BladeStore system can scale from a minimum of 4TB up to 160TB of capacity.
ASM is storage management software with a high-performance file system that runs on a Solaris or Windows server, and may be shared across an IP network via NFS, Common Internet File System (CIFS) or Samba. The ASM software integrates with fixed-content applications such as e-mail archive, document management, video surveillance and medical imaging. ASM is certified with applications from AGFA, Siemens, Kodak, Philips and others. ASM can also replicate the information across multiple arrays, perform an automated backup to high-capacity tape or optionally migrate the data to secondary storage. Don Baune, vertical markets manager for StorageTek, says, "ASM is not tied to any specific storage technology. You can choose the technology based on the service levels required for your data."
BladeStore disk is priced between approximately 1.5 cents/MB and 2 cents/MB, while a complete storage solution consisting of a server, ASM, BladeStore and an optional tape library for disaster recovery costs less than 3 cents/MB, depending on configuration and capacity.
Network Appliance's NearStore system is another example of a lower-cost ATA disk-based solution. NearStore is based upon Network Appliance's filer technology, Data ONTAP operating system and WAFL (Write Anywhere File Layout) file system. The NearStore R150 is available in two system module capacities, 12TB or 24TB. Multiple NearStore modules may be configured and managed via Network Appliance's storage management software. Fixed-content applications read and write information directly to the NearStore, which appears as a very large file system shared via NFS or CIFS. No application changes are required. "File-based access is very flexible. Over 95% of applications know how to do it," says John Kim, marketing manager, rich content storage, for Network Appliance. For fixed-content data with regulatory requirements that specify that information cannot be changed or deleted (such as SEC 17a-4), Network Appliance offers an optional WORM (Write Once Read Many) function for NearStore called SnapLock. With SnapLock, either a portion or all of the NearStore capacity may be configured as WORM storage. Files written to a SnapLock volume can be copied, but not altered, moved or deleted. NearStore also provides an optional snapshot point-in-time copy capability. Stored information may also be replicated to another NearStore at a remote location for disaster recovery purposes. NearStore interfaces with fixed-content software applications from third-party vendors including AGFA, Documentum, FileNet, IXOS and KVS.
Priced at approximately 1.2 cents/MB to 1.6 cents/MB, NearStore is attractive to budget-constrained users interested in consolidating their fixed-content information onto a low-cost storage repository. At 12TB, the system scalability isn't as granular as other solutions, so Network Appliance gives users the option to partition the back-end ATA-based disk and share the storage through Fibre Channel (FC) SAN connectivity. Because most of the leading backup software vendors, including Computer Associates, IBM, Legato and Veritas, support NearStore, backup to tape is also an option.
Object-oriented (OO) storage technologies offer a new and different approach to meeting the growing storage demands of fixed-content applications. With OO storage, applications interface with the storage system's API over an IP network to store information as objects, rather than as files or blocks as in traditional network-attached storage (NAS) or SAN storage architectures. While traditional file systems provide the host application with a location-based directory of where a file is stored, they typically don't provide a mechanism for capturing additional attributes, or metadata, about the file. An OO storage system assigns a unique identifier or fingerprint to the stored object for application access and retrieval. Like a fingerprint, the identifier is permanently associated with the object, even if the underlying storage technology changes. Additional metadata about the object, such as retention period or expiration date, may be stored with the object as well.
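To make the identifier-plus-metadata idea concrete, here is a minimal in-memory sketch in Python. It assumes the identifier is a SHA-256 hash of the object's content, which is only one possible way to produce a fingerprint and is not taken from any vendor's documentation. The `ObjectStore` class, its methods and the `retention_years` field are hypothetical.

```python
import hashlib


class ObjectStore:
    """Toy object store: objects are addressed by identifier, not by path."""

    def __init__(self):
        self._objects = {}   # identifier -> content bytes
        self._metadata = {}  # identifier -> metadata dict

    def put(self, data: bytes, **metadata) -> str:
        # Assumption: the fingerprint is a SHA-256 hash of the content.
        object_id = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first stored copy; nothing is overwritten.
        self._objects.setdefault(object_id, data)
        self._metadata.setdefault(object_id, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def metadata(self, object_id: str) -> dict:
        return dict(self._metadata[object_id])


store = ObjectStore()
invoice_id = store.put(b"scan of invoice 1001", retention_years=7)

# The application keeps only the identifier; it never sees a directory path.
print(store.get(invoice_id))
print(store.metadata(invoice_id))
```

Because the identifier in this sketch depends only on the content, it stays the same no matter which disk or node holds the bytes, and any change to the content yields a different identifier.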
OO storage systems essentially present the image of a large storage pool to the application. Because the application only needs to know the storage system's IP address and the object identifier to access the information, it doesn't need to be aware of a file system layout or the physical storage configuration. The OO storage system stores and manages object information transparently to the fixed-content application, simplifying storage management and administration. If additional storage capacity is required, it may be added to the storage pool in a non-disruptive manner.
EMC's Centera is an example of an OO storage system specifically designed for fixed-content applications. Centera is a network-attached device, but it isn't NAS. EMC refers to Centera as content-addressed storage or CAS (see "Pros and cons of content addressed storage"). Centera's hardware is based upon a redundant array of independent nodes (RAIN) architecture consisting of storage and access nodes. Each node consists of a 1GHz Pentium III processor, four 250GB ATA disks and three 10/100 BaseT network connections. The access nodes provide an interface to the client applications, while the storage nodes store application information in object form. Additionally, the nodes may be deployed in a clustered configuration for availability and performance. Centera's entry point is an eight-node configuration that provides either 2.9TB of usable capacity (mirrored protection) or 4.3TB of parity-protected capacity. Each Centera cabinet can contain up to 32 nodes, and 16 cabinets can be configured as a single cluster. Centera can also be managed as a domain, which scales up to seven clusters, holding more than a petabyte of storage.
Unlike file-based storage solutions, fixed-content applications must support the Centera API for object storage and retrieval. An example of an application that's fully integrated with the Centera API is Xact Enterprise Content Integration Software, from Systemware in Dallas, TX. According to Systemware, Xact provides users with the ability to automatically set and enforce data retention policies, enable Web-based access to information, and repurpose existing content to drive new business opportunities. With Xact, users with fixed-content information on Unix, Windows, Linux and even mainframe platforms can centralize it all on Centera. According to EMC, there are over 50 Centera-integrated applications now available.
"The value of the solution is in the software," says Roy Sanford, Vice President Marketing & Alliance Development for EMC's Centera Division. Sanford is referring to CentraStar, the CAS software that powers Centera, which is responsible for storing, retrieving, verifying, and replicating objects. According to Sanford, Centera was built for fixed-content applications with regulatory requirements such as data immutability (proof that the data hasn't changed), and long-term retention periods. Objects are stored in a WORM format, meaning they cannot be updated or changed in place. If an object is read and changed in any way, a new Content Address will be generated. Centera will automatically prevent identical objects from being stored twice to minimize wasted space. For example, Centera will store only one copy of identical email attachments and return multiple Content Addresses or pointers to the stored object back to the email archive application.
With the Compliance Edition option, Centera prevents deletion of the object until after the retention period has expired. EMC is targeting this capability at stockbrokers that must conform to Securities and Exchange Commission (SEC) regulatory requirements for records that must be unaltered and kept in a non-erasable and non-rewritable format. Centera with Compliance Edition Plus stores these records as objects that can't be deleted or erased. "In this case, the only way to dispose of the record is to take the disk drive out and destroy it," says Sanford.
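The retention rule described here, where an object can't be deleted until its retention period has expired, can be sketched in a few lines of Python. This illustrates the policy check only and is not the Centera API: `RetainedStore`, `RetentionError`, the object ID and the dates are all hypothetical.

```python
from datetime import date


class RetentionError(Exception):
    """Raised when an operation would violate the retention policy."""


class RetainedStore:
    def __init__(self):
        self._records = {}  # object id -> (data, retain_until)

    def put(self, object_id: str, data: bytes, retain_until: date) -> None:
        # Write-once: an existing object can never be replaced.
        if object_id in self._records:
            raise RetentionError(f"{object_id} already exists and cannot be rewritten")
        self._records[object_id] = (data, retain_until)

    def delete(self, object_id: str, today: date) -> None:
        _, retain_until = self._records[object_id]
        # Deletion is allowed only after the retention date has passed.
        if today <= retain_until:
            raise RetentionError(f"{object_id} is retained until {retain_until}")
        del self._records[object_id]

    def __contains__(self, object_id: str) -> bool:
        return object_id in self._records


store = RetainedStore()
store.put("trade-confirmation-42", b"...", retain_until=date(2009, 5, 1))

try:
    store.delete("trade-confirmation-42", today=date(2003, 5, 1))
except RetentionError as err:
    print("refused:", err)

# Once the retention date has passed, the delete succeeds.
store.delete("trade-confirmation-42", today=date(2009, 5, 2))
print("trade-confirmation-42" in store)
```

A stricter mode like the one Sanford describes would drop the `delete` method altogether, leaving physical destruction of the media as the only way to dispose of a record.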
Centera is a good fit for companies interested in implementing a scalable disk-based solution to reduce data access time, while ensuring regulatory compliance for fixed-content information. Centera's content addressing scheme and its ability to generate a unique ID for each stored object provide a safeguard to ensure the information can't be modified or erased. However, be sure that your fixed-content application vendor has certified its software with the Centera API, and be prepared to pay extra for Centera's enhanced functionality. Centera is priced at approximately 3 cents/MB to 4 cents/MB, depending on capacity and data protection method.
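Per-megabyte prices are easier to compare at the capacities these systems ship in. The short Python sketch below converts the approximate price ranges quoted in this article to dollars per terabyte. It assumes decimal units (1TB = 1,000,000MB), which is an assumption on our part; under it, 1 cent/MB works out to $10,000 per TB. The function name is hypothetical.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, 1TB = 1,000,000MB


def dollars_per_tb(cents_per_mb: float) -> float:
    """Convert a price in cents per MB to dollars per TB."""
    return cents_per_mb * MB_PER_TB / 100


# Approximate (low, high) price ranges in cents/MB, as quoted in the article.
prices = {
    "BladeStore disk": (1.5, 2.0),
    "NearStore": (1.2, 1.6),
    "Centera": (3.0, 4.0),
}

for name, (low, high) in prices.items():
    print(f"{name}: ${dollars_per_tb(low):,.0f} to ${dollars_per_tb(high):,.0f} per TB")
```

These figures cover different things (raw disk in one case, a complete system in another), so treat the comparison as a rough guide rather than a like-for-like quote.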
This was first published in May 2003