Published: 04 Aug 2003
Grid computing--also called computing on demand--has the potential to change the computing model we've used for more than half a century. Computers have essentially been self-contained systems with local memory, I/O and storage. This model has been expanded and refined over the years while remaining essentially unchanged.
Of course, there have been numerous computer revolutions in the last five decades: networked computers on LANs; two-tier or client-server computing, which separates the application from the data; and three-tier computing, which separates the user interface from the applications and the data, as with Java. The short list of giant leaps in computing would certainly also need to include the Internet, the WWW, SANs and high-performance or massively parallel computers.
Yet in spite of these admittedly significant advances, the basic model remained the same. Grid computing is the first real change to the model that leverages these advances. For the first time, users will be able to get the computing they need, on demand, without having to know anything about computing. Computing on demand will have a huge impact on the way data is used, shared and stored.
Internet on steroids
Some have called grid computing the Internet on steroids. In reality, it's the virtualization of computing to the user. Grid computing allows IT organizations to forgo providing individual computers in order to deliver computational capabilities.
The Global Grid Forum and the IETF for Grid Computing define the grid as a type of parallel and distributed system that enables the sharing, selection and aggregation of resources distributed across multiple administrative domains based on their availability, capability, performance, cost and users' QoS requirements.
Grid computing is often confused with cluster computing. If distributed resources happen to be managed by a single, global centralized scheduling system, then it's a cluster. In a cluster, all nodes work cooperatively with a common goal and objective as a centralized, global resource manager performs the resource allocation. In a grid, each node has its own resource manager and allocation policy.
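The distinction between a cluster and a grid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Job`, `Node`, `cluster_schedule` and `grid_schedule` are hypothetical and do not come from any grid toolkit. In the cluster function a single central scheduler places the job wherever capacity exists; in the grid function the broker can only ask, because each node applies its own allocation policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    owner: str
    cpus: int


def accept_any(job: Job) -> bool:
    """Default local policy (hypothetical): accept every job."""
    return True


@dataclass
class Node:
    name: str
    free_cpus: int
    # Used only in the grid model: each node owns its allocation policy.
    policy: Callable[[Job], bool] = accept_any


def cluster_schedule(nodes: List[Node], job: Job) -> Optional[str]:
    """Cluster: one global, centralized scheduler decides for all nodes."""
    for node in nodes:
        if node.free_cpus >= job.cpus:
            node.free_cpus -= job.cpus
            return node.name
    return None


def grid_schedule(nodes: List[Node], job: Job) -> Optional[str]:
    """Grid: the broker asks, and each node's own policy may refuse."""
    for node in nodes:
        if node.free_cpus >= job.cpus and node.policy(job):
            node.free_cpus -= job.cpus
            return node.name
    return None
```

For example, a node whose policy is `lambda job: job.owner == "physics"` would turn away a biology job in the grid model even when it has idle CPUs, whereas the cluster scheduler would never consult it.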
What this means is that applications will be able to dynamically draw processing cycles, memory, I/O and storage from anywhere on the grid based on their moment-by-moment requirements and resource availability. The grid continuously identifies demand for processing, I/O, memory or storage and assigns the appropriate computer resources to fulfill that demand.
The compute resources on the grid must be interconnected across a high-speed, high-performance network. These resources can be local or geographically distributed. The grid resource allocation software takes into account the physical location of the resources before assigning them. Bandwidth and performance to each compute resource have an impact on the decision algorithm.
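As a rough illustration of how location, bandwidth and availability might feed an allocation decision, the sketch below picks the resource with the lowest estimated data-delivery time among those with enough free CPUs. The cost formula, the `Resource` fields and the function names are assumptions made for this example; real grid allocation software would weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    bandwidth_mbps: float  # assumed link speed from the requester, megabits/s
    latency_ms: float      # assumed one-way network latency


def estimated_cost_s(res: Resource, data_mb: float) -> float:
    """Seconds to ship the input data to the resource, plus latency."""
    transfer_s = data_mb * 8 / res.bandwidth_mbps  # megabytes -> megabits
    return transfer_s + res.latency_ms / 1000


def pick_resource(resources: List[Resource], cpus_needed: int,
                  data_mb: float) -> Optional[Resource]:
    """Choose the cheapest resource that can satisfy the CPU demand."""
    candidates = [r for r in resources if r.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda r: estimated_cost_s(r, data_mb))
```

With a nearby resource on a fast link and a distant one on a slow link, a small job stays local; a job that needs more CPUs than the local resource has free is sent to the distant one despite the higher transfer cost.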
Grid computing has enormous potential and has captured the attention of the computer industry, as well as a fistful of research dollars from companies such as Fujitsu Softek, Hewlett-Packard, IBM, Microsoft, Siemens, Sun Microsystems and others. Grid computing has the potential to reduce research time by orders of magnitude. It could allow cancer researchers to test thousands of drug candidates in minutes rather than years, astronomers to map all near-Earth objects in months rather than decades, or meteorologists to produce an incredibly accurate forecast for any given spot in the world. Grid computing is at the same point that the World Wide Web was in 1991.
That sounds almost too good to be true, but today there are dozens of working grid projects around the world. Of course, there are still many significant challenges ahead to take grid computing beyond the do-it-yourself experiment stage to commercially shrink-wrapped viability. Perhaps the most important challenge for grid computing will be how to manage the storage.
A clash of revolutions
Grid computing requires storage I/O that's typically orders of magnitude greater than most storage systems can provide today. This means that aggregate storage throughput will be measured in hundreds of gigabytes to terabytes per second instead of megabytes per second. Aggregate storage I/O will be measured in the hundreds of millions to many billions of IOPS. Additionally, the storage must be capable of being geographically distributed while maintaining a single image. Finally, that storage must be capable of scaling both capacity and performance linearly up to tens of petabytes, exabytes and even yottabytes in a single image. (A yottabyte is a billion petabytes.) Many in the grid computing community have postulated that the storage systems themselves must be able to work together in a parallel storage grid. These are by no means trivial tasks, and they appear to require a paradigm shift in storage thinking.
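To put those scales in perspective, here is a small back-of-the-envelope calculation. It assumes decimal (SI) prefixes, and the 10-petabyte dataset and the two throughput figures are illustrative values chosen for this example, not measurements.

```python
# Decimal (SI) byte units; an assumption, since the article does not
# say whether decimal or binary prefixes are meant.
MEGABYTE = 10**6
TERABYTE = 10**12
PETABYTE = 10**15
YOTTABYTE = 10**24

# A yottabyte expressed in petabytes: 10**24 / 10**15 = 10**9 (a billion).
petabytes_per_yottabyte = YOTTABYTE // PETABYTE


def seconds_to_read(capacity_bytes: int, throughput_bytes_per_s: int) -> float:
    """Time to stream a dataset once at a given aggregate throughput."""
    return capacity_bytes / throughput_bytes_per_s


# Illustrative only: a 10 PB dataset at 100 MB/s versus 1 TB/s.
slow = seconds_to_read(10 * PETABYTE, 100 * MEGABYTE)  # 1e8 seconds
fast = seconds_to_read(10 * PETABYTE, 1 * TERABYTE)    # 1e4 seconds
```

At 100 megabytes per second, a single pass over 10 petabytes takes about 100 million seconds, or a little over three years; at one terabyte per second, the same pass takes under three hours.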
Storage systems and SANs have gone through a different and well-documented revolution over the past five years. There has been an accelerating move away from direct-attached storage (DAS) toward storage area networks (SANs). Storage systems have become increasingly sophisticated, providing additional functionality such as replication, snapshot, mirroring, SRM and even policy-based management. Storage applications and intelligence are starting to show up in the SAN fabrics in the form of appliances and intelligent switches.
So what happens when the storage revolution meets a diametrically different grid computing revolution? The answer is that something has to change. The revolution that comes out on top will be the one that provides the greatest overall value to IT organizations.
Since the mid-'90s, storage has become the tail that wags the IT budget dog. The premise behind this is that data is the critical asset, or family jewels, of the organization. Therefore, the proper storing and protection of that data is mission critical. Over the last decade, there has been an outpouring of products that more efficiently and reliably store, protect, access, service and secure data. And now the focus of most new storage products is to help better manage storage at a lower cost.
Grid computing will reduce capital and operating costs for all aspects of the computing environment in the IT organization. It will concurrently increase the IT organization's capabilities and flexibility while making these systems a living, adaptive entity. Storage will have to adapt to grid computing.
There are two logical paths along which storage systems will probably evolve as a result of grid computing's requirements. The first path is to deploy a more sophisticated and more capable storage system than any of today's familiar high-end names. This storage system will be massively parallel and capable of linear scalability in both capacity and performance. Multiple storage subsystem controllers must have the ability to work together as peers, locally or geographically distributed on high-speed networks, just like the grid's compute resources.
The second path is the complete opposite: the storage subsystem becomes simpler and completely slaved to specific compute resources. The compute resources are already massively parallel and can provide most, if not all, of the storage applications. A closer examination will show how each approach meets the storage requirements of grid computing.
Storage subsystem controllers are in effect purpose-built servers. This means they are servers that are hardware- and software-optimized for storage. It's not a big stretch to imagine these purpose-built servers moving from being slaves (as they are today) to becoming peers in the grid. This would allow the grid to allocate storage resources identically to compute resources. When an application on the grid requires local block or file storage access, the storage subsystem controller that best fits this need can be located and allocated dynamically. As far as the grid is concerned, the storage subsystem is a compute resource specifically for storage.
One example comes from YottaYotta's NetStorager storage system. This storage system can scale linearly in both capacity and performance to potentially yottabytes of storage in a single image. The storage can either be local or distributed across the room or across the world.
YottaYotta accomplishes this scale through its unique cache coherency across multiple controllers. The cache in one controller knows what's in the cache of all the other controllers. "We started with a clean sheet design for the NetStorager to specifically address the requirements of HPC and grid computing," says Wayne Karpoff, CTO of YottaYotta. "We knew that many of the grid's storage requirements such as replication, throughput, and real-time geographic distribution needed to be resolved."
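The general idea of keeping caches coherent across several controllers can be illustrated with a toy directory-based scheme. This is not a description of how NetStorager works; it is a generic textbook-style sketch, and every name in it (`Directory`, `Controller`, the write-through policy) is an assumption made for the example. A shared directory records which controllers cache each block, and a write invalidates the stale copies held elsewhere.

```python
from typing import Dict, Set


class Directory:
    """Shared record of which controllers hold which blocks."""

    def __init__(self) -> None:
        self.holders: Dict[int, Set[str]] = {}
        self.controllers: Dict[str, "Controller"] = {}


class Controller:
    def __init__(self, name: str, directory: Directory,
                 store: Dict[int, bytes]) -> None:
        self.name = name
        self.directory = directory
        self.store = store                 # shared dict standing in for disks
        self.cache: Dict[int, bytes] = {}
        directory.controllers[name] = self

    def read(self, block: int) -> bytes:
        if block not in self.cache:
            # Cache miss: fetch from backing store and register as a holder.
            self.cache[block] = self.store[block]
            self.directory.holders.setdefault(block, set()).add(self.name)
        return self.cache[block]

    def write(self, block: int, data: bytes) -> None:
        # Invalidate every other controller's cached copy of this block.
        others = self.directory.holders.get(block, set()) - {self.name}
        for other in others:
            self.directory.controllers[other].cache.pop(block, None)
        self.directory.holders[block] = {self.name}
        self.cache[block] = data
        self.store[block] = data           # write-through, for simplicity
```

After controller A writes a block, controller B's next read misses its own cache and picks up the new data, so no controller ever serves a stale copy. A geographically distributed system would also have to deal with latency, failures and concurrent writers, none of which this sketch attempts.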
This type of storage system is ideal for grids that span multiple geographic locations. If an application requires more compute resource and more storage resource, the peer-to-peer storage system will be able to efficiently tie the additional resources together transparently. This makes the grid faster and more efficient.
Compute resource slave
As opposed to the approach YottaYotta took, the compute resource slave paradigm calls for smaller, less expensive storage subsystems. These subsystems would be slaved to specific compute resources and allocated by the grid as a package. This simpler approach could take several guises, including DAS and simple (i.e., non-intelligent) SANs, multi-ported shared RAID, very low-cost RAID, JBOD or SBOD, all tightly coupled with the compute resource. The grid's software can then decide (based on policies) which data is copied, distributed or highly available.
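A policy-driven decision of that kind could look something like the sketch below. The `Dataset` fields, the action names and the 100 GB threshold are all hypothetical, chosen only to show the shape of such a rule set; they are not taken from any grid product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    critical: bool      # does the business require continuous access?
    size_gb: float
    reader_sites: int   # number of grid sites that read this data

# Hypothetical threshold: below this size, full copies are cheap enough.
COPY_LIMIT_GB = 100.0


def placement_actions(ds: Dataset) -> List[str]:
    """Return the storage actions a simple policy would request."""
    actions: List[str] = []
    if ds.critical:
        actions.append("highly_available")
    if ds.reader_sites > 1:
        if ds.size_gb <= COPY_LIMIT_GB:
            actions.append("copy")        # full copy at each reading site
        else:
            actions.append("distribute")  # spread pieces across sites
    return actions or ["local_only"]
```

Because the storage itself is simple, all of this intelligence lives in the grid software, and changing the policy means changing a rule rather than replacing hardware.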
Massively parallel database (MPbase) from Open Sky Technologies uses loosely coupled Linux or Solaris compute resources that are tied together over high-performance TCP/IP networks. MPbase can be local, geographically distributed or both. Each MPbase compute resource is tightly coupled with an appropriate amount of storage (as few as two to four disks). MPbase works better when compute resources use fewer disks.
MPbase organizes the data in a naturalized fashion, as opposed to the normalized fashion of RDBMS systems. Naturalized means that like data aligns with like data instead of having to be placed in tables. In the process of this naturalization, MPbase natively stores the data both compressed and encrypted. It's also capable of searching and processing the data in that mode.
MPbase typically uses less than 1% to 2% of the storage required by current RDBMS systems. This database system automatically replicates, mirrors and encrypts between any of the locations on the grid. It was designed from the ground up to be a grid-based database structuring environment supporting any-to-any and many-to-many data structures. As queries become more complex, MPbase actually works faster.
This data structure system works best if its storage is unsophisticated. Ken Tratar, partner in the Open Sky venture, likes to use this analogy: "The current data structure environment is like having an F-18 tied to the Empire State building. Grid computing complicates the picture. It's like having 500 F-18s tied to the Empire State building. Eventually the building breaks. MPbase frees those F-18s from the building."
Richard Foster, CTO of WestGrid (a consortium of western Canadian universities funded by the Canada Foundation for Innovation), has a strong opinion about how storage technology must evolve to meet the requirements of grid computing. He's an advocate of the sophisticated storage path.
According to Foster, "grid computing is requiring ever-larger and increasing numbers of datasets to be accessed anywhere over large geographical distances...[and] these datasets will need a parallel geographically dispersed storage grid that matches the compute grid. Storage anywhere must be aware of the storage everywhere and appropriately align dataset requests with bandwidth, I/O performance, and location."
Which design wins?
Both the more sophisticated and simpler storage paths have merit and risk. The tendency in storage is to always reduce the risk first and then reduce the cost. This adds weight to the arguments for a simpler approach. The immaturity of grid computing software pushes the arguments the opposite way towards the more sophisticated approach.
Neither model is likely to triumph completely. Both could coexist in the same organization, for that matter. And regardless of which path an organization selects, the impact on SANs will be dramatic.
If storage subsystems become more sophisticated and act as peers in the grid, then the SAN will very likely be relegated to a simple switched interconnect. Complex zoning schemes and security become redundant because of the grid's own built-in security. This should drive toward a lower-cost, invisible SAN--something many end users are asking for.
On the other hand, if the storage subsystems are simpler and slaved to compute resources on the grid, then market pressures will drive that simplicity and lower cost into the SAN. Either way, both storage approaches will put significant commoditization pressure on SANs.
What about the storage applications moving into the fabric itself? The grid will manage many storage applications, which means that any appliances in the fabric would have to become grid-aware peers in the grid. Application platform switches could end up being obsolete.
Grid computing has a high probability of making iSCSI far more important. There are already visions of a "World Wide Grid" along the lines of the "World Wide Web." Grids will be geographically dispersed. This means that the span of the SAN must extend outside of the traditional data center. The iSCSI protocol was designed with that requirement in mind. With iSCSI's ability to go great distances and the coming RDMA on TCP/IP and GigE, it will likely play a role in future grids that Fibre Channel cannot.
Over the past decade, the share of IT budgets devoted to storage and SANs has been increasing while the share devoted to servers has been decreasing. The benefits of grid computing will put pressure in the opposite direction, toward a greater investment in the fundamental computing infrastructure. Just at the dawn of storage networking, we may be at the apogee of storage power right now.