Grid computing will change storage


This article can also be found in the Premium Editorial Download "Storage magazine: How storage managers can survive e-mail archiving."

A clash of revolutions
Grid computing requires storage I/O that's typically orders of magnitude greater than most storage systems can provide today. This means that aggregate storage throughput will be measured in hundreds of gigabytes to terabytes per second instead of megabytes per second. Aggregate storage I/O will be measured in hundreds of millions to many billions of IOPS. Additionally, the storage must be capable of being geographically distributed while maintaining a single image. Finally, that storage must be capable of scaling both capacity and performance linearly from tens of petabytes up to exabytes and even yottabytes in a single image. (A yottabyte is a billion petabytes.) Many in the grid computing community have postulated that the storage systems themselves must be able to work together in a parallel storage grid. These are by no means trivial tasks, and they appear to require a paradigm shift in storage thinking.
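To make the scale concrete, the unit arithmetic can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal (SI) units, and the helper names `petabytes_in` and `seconds_to_scan` are made up for illustration.

```python
# Decimal (SI) byte units. Binary units (PiB, YiB) would give
# slightly different ratios; SI is an assumption of this sketch.
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18
YOTTABYTE = 10**24


def petabytes_in(n_bytes: int) -> int:
    """Return how many whole petabytes fit in n_bytes."""
    return n_bytes // PETABYTE


def seconds_to_scan(capacity_bytes: int, throughput_bytes_per_s: int) -> float:
    """Time to read the full capacity once at a given aggregate throughput."""
    return capacity_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    # One yottabyte expressed in petabytes (a billion).
    print(petabytes_in(YOTTABYTE))
    # Reading 10 PB once at 1 TB/s of aggregate throughput, in seconds.
    print(seconds_to_scan(10 * PETABYTE, TERABYTE))
```

Even at a terabyte per second, a single pass over tens of petabytes takes hours, which is why throughput has to scale along with capacity.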

Storage systems and SANs have gone through a different and well-documented revolution over the past five years. There has been an accelerating move away from direct-attached storage (DAS) toward storage area networks (SANs). Storage systems have become increasingly sophisticated, providing additional functionality such as replication, snapshot, mirroring, SRM and even policy-based management. Storage applications and intelligence are starting to show up in the SAN fabrics in the form of appliances and intelligent switches.

So what happens when the storage revolution meets a diametrically opposed grid computing revolution? The answer is that something changes. The revolution that comes out on top is the one that provides the greatest overall value to IT organizations.

Since the mid-'90s, storage has become the tail that wags the IT budget dog. The premise behind this is that data is the critical asset, the family jewels, of the organization. Therefore, the proper storing and protection of that data is mission critical. Over the last decade, there has been an outpouring of products that more efficiently and reliably store, protect, access, service and secure data. And now the focus of most new storage products is to help better manage storage at a lower cost.

Grid computing will reduce capital and operating costs for all aspects of the computing environment in the IT organization. It will concurrently increase the IT organization's capabilities and flexibility while making these systems a living, adaptive entity. Storage will have to adapt to grid computing.

There are two logical paths along which storage systems will probably evolve as a result of grid computing's requirements. The first path is to deploy a more sophisticated and more capable storage system than any of today's familiar high-end names. This storage system will be massively parallel and capable of linear scalability in both capacity and performance. Multiple storage subsystem controllers must have the ability to work together as peers, locally or geographically distributed on high-speed networks, just like the grid's compute resources.

The second path is the complete opposite: the storage subsystem becomes simpler and completely slaved to specific compute resources. The compute resources are already massively parallel and can provide most, if not all, of the storage applications. A closer examination will show how each approach meets the storage requirements of grid computing.

Storage subsystem controllers are in effect purpose-built servers. This means they are servers that are hardware- and software-optimized for storage. It's not a big stretch to imagine these purpose-built servers moving from being slaves (as they are today) to becoming peers in the grid. This would allow the grid to allocate storage resources identically to compute resources. When an application on the grid requires local block or file storage access, the storage subsystem controller that best fits this need can be located and allocated dynamically. As far as the grid is concerned, the storage subsystem is a compute resource specifically for storage.
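One way to picture this kind of dynamic allocation is a scheduler that filters controllers by access type and free capacity, then prefers a local controller with the tightest fit. The Python sketch below is purely illustrative: the `StorageController` and `StorageRequest` types, their fields, and the "local first, then tightest fit" rule are all assumptions, not a description of any real grid middleware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageController:
    # Hypothetical description of a storage peer registered with the grid.
    name: str
    site: str
    access: frozenset  # supported access types, e.g. {"block", "file"}
    free_tb: float


@dataclass
class StorageRequest:
    # Hypothetical request issued by an application on the grid.
    site: str
    access: str  # "block" or "file"
    size_tb: float


def allocate(controllers: List[StorageController],
             request: StorageRequest) -> Optional[StorageController]:
    """Pick the controller that best fits the request, or None if none can."""
    candidates = [
        c for c in controllers
        if request.access in c.access and c.free_tb >= request.size_tb
    ]
    if not candidates:
        return None
    # Assumed policy: prefer a controller at the requesting site,
    # then the one that leaves the least capacity stranded.
    best = min(
        candidates,
        key=lambda c: (c.site != request.site, c.free_tb - request.size_tb),
    )
    best.free_tb -= request.size_tb  # reserve the capacity
    return best


if __name__ == "__main__":
    pool = [
        StorageController("ctl-a", "nyc", frozenset({"block"}), 50.0),
        StorageController("ctl-b", "nyc", frozenset({"block", "file"}), 20.0),
        StorageController("ctl-c", "lon", frozenset({"file"}), 100.0),
    ]
    chosen = allocate(pool, StorageRequest("nyc", "file", 10.0))
    print(chosen.name if chosen else "no controller available")
```

A real grid scheduler would weigh many more factors (latency, load, policy), but the shape is the same: storage controllers are matched to requests the way compute nodes are.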

One example comes from YottaYotta's NetStorager storage system. This storage system can scale linearly in both capacity and performance to potentially yottabytes of storage in a single image. The storage can be either local or distributed, across the room or across the world.

YottaYotta accomplishes this scale through its unique cache coherency across multiple controllers. The cache in one controller knows what's in the cache of all the other controllers. "We started with a clean sheet design for the NetStorager to specifically address the requirements of HPC and grid computing," says Wayne Karpoff, CTO of YottaYotta. "We knew that many of the grid's storage requirements such as replication, throughput, and real-time geographic distribution needed to be resolved."
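The general idea of controllers knowing what their peers have cached can be illustrated with a toy directory-based scheme. The Python sketch below is a generic textbook-style illustration, not YottaYotta's actual design: the `Cluster` class, its shared `holders` directory, and the write-through, invalidate-on-write policy are all assumptions made for the example.

```python
class Cluster:
    """Toy model of controllers that share a directory of cached blocks."""

    def __init__(self, names):
        self.backing = {}                      # block id -> data on "disk"
        self.caches = {n: {} for n in names}   # controller -> {block id: data}
        self.holders = {}                      # block id -> controllers caching it

    def read(self, controller, block):
        cache = self.caches[controller]
        if block in cache:
            return cache[block]                # local cache hit
        # Miss: the directory says which peer, if any, already caches the block.
        for peer in self.holders.get(block, ()):
            data = self.caches[peer][block]    # fetch from the peer's cache
            break
        else:
            data = self.backing.get(block)     # no peer has it; go to disk
        cache[block] = data
        self.holders.setdefault(block, set()).add(controller)
        return data

    def write(self, controller, block, data):
        # Invalidate stale copies on every other controller first.
        for peer in self.holders.get(block, set()) - {controller}:
            del self.caches[peer][block]
        self.caches[controller][block] = data
        self.holders[block] = {controller}
        self.backing[block] = data             # write-through to disk


if __name__ == "__main__":
    cluster = Cluster(["a", "b"])
    cluster.write("a", 7, b"v1")
    print(cluster.read("b", 7))   # served from controller a's cache
    cluster.write("b", 7, b"v2")  # invalidates the copy held by a
    print(cluster.read("a", 7))
```

Because every controller consults the same directory, no controller ever serves a stale block, whichever peer an application happens to talk to. A geographically distributed system would also have to deal with network latency and failures, which this sketch ignores.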

This type of storage system is ideal for grids that span multiple geographic locations. If an application requires more compute resource and more storage resource, the peer-to-peer storage system will be able to efficiently tie the additional resources together transparently. This makes the grid faster and more efficient.

This was first published in August 2003
