This article can also be found in the Premium Editorial Download "Storage magazine: Better disaster recovery testing techniques."
Current grid computing projects, run mostly by scientific teams, offer some tantalizing prospects for general corporate computing. Imagine making your organization's data accessible throughout the world, or replicating data to multiple, geographically dispersed sites, including sites you don't own or control but with which you collaborate.
If you use traditional access-control methods, the barriers to this scenario are substantial. You could, for example, set up replicated Web or FTP mirror sites and maintain user logins and passwords at every site providing access, or set up VPN access to each site holding the data.
Globus is the progenitor of many of today's grids. The Globus Alliance and the Global Grid Forum (GGF) support the Globus Toolkit, and have developed some of the fundamental services required to implement a grid. The GGF is also charged with popularizing the grid by making it easier for all users to participate in grid work.
There are essentially three models of Globus support software: the API-based model in Globus Toolkit version 2.0 (GT2), the service model in GT3 and the Web services resource framework in Globus Toolkit version 4.0 (GT4), released last May.
There are many compute-data grids in operation around the world, including AstroGrid, the Biomedical Informatics Research Network, the Enabling Grids for E-sciencE, Grid Physics Network and the Particle Physics Data Grid.
With so many storage vendors touting some sort of grid architecture these days, an accurate definition of a grid may be elusive. For the purposes of this article, a grid spans sites, companies and continents with non-proprietary hardware, software and protocols supporting authenticated access, replication and compute services. Clustered file systems don't qualify as data grids because they typically exist at one or two sites and require high-bandwidth connections between nodes. Wide-area file systems come closer to a data grid model, but they don't currently offer continent-spanning or multicompany-hosted data; they also require proprietary hardware, software and internode protocols.
It's possible to use a grid to securely share your data and compute services. To tap into these capabilities, you need to implement standard, compliant grid services on your systems. These services are available from the open-source community; proprietary grid products are also available from some vendors, including IBM Corp., Oracle Corp., Silicon Graphics Inc. (SGI)/YottaYotta Inc. and Sun Microsystems Inc.
Data grids are well suited to organizations that need a collaborative work environment even though their resources are diverse and distributed, with data residing across multiple business and/or organizational domains. Data grid services allow users to access and manipulate data residing at sites around the world. Data can be retrieved from any location on the grid, and can be deposited or replicated to any location with space.
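To make the deposit-and-replicate idea concrete, here is a minimal sketch in Python of a toy replica catalog. It is not the Globus Toolkit API or any vendor's product; the `Site` and `ReplicaCatalog` classes, the site names and the "most free space" placement rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid storage site with a fixed capacity in gigabytes."""
    name: str
    capacity_gb: int
    files: dict = field(default_factory=dict)  # logical file name -> size in GB

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.files.values())


class ReplicaCatalog:
    """Toy catalog that tracks which sites hold a copy of each logical file."""

    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}

    def deposit(self, logical_name, size_gb, site_name):
        """Store a file at a named site, provided the site has space."""
        site = self.sites[site_name]
        if site.free_gb() < size_gb:
            raise ValueError(f"{site_name} has no room for {logical_name}")
        site.files[logical_name] = size_gb

    def locate(self, logical_name):
        """Return the names of all sites holding the file, sorted."""
        return sorted(n for n, s in self.sites.items() if logical_name in s.files)

    def replicate(self, logical_name):
        """Copy the file to the site with the most free space that lacks it.

        The placement rule is an assumption made for this sketch.
        """
        holders = self.locate(logical_name)
        if not holders:
            raise KeyError(logical_name)
        size_gb = self.sites[holders[0]].files[logical_name]
        candidates = [
            s for s in self.sites.values()
            if logical_name not in s.files and s.free_gb() >= size_gb
        ]
        if not candidates:
            raise ValueError("no site has space for another replica")
        target = max(candidates, key=lambda s: s.free_gb())
        target.files[logical_name] = size_gb
        return target.name


# Hypothetical sites and file, for illustration only.
catalog = ReplicaCatalog([Site("geneva", 100), Site("chicago", 500), Site("tokyo", 50)])
catalog.deposit("survey.dat", 40, "geneva")
new_site = catalog.replicate("survey.dat")
print(new_site, catalog.locate("survey.dat"))
```

A real data grid would add authenticated access and an actual transfer between sites; the sketch only models the bookkeeping of where copies live and where another copy can fit.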
A compute grid can schedule computation to occur at one site with the results transmitted to another (see "Open-source grids," above), and a compute grid may exist with or without a data grid. Together, a compute grid and data grid can interoperate to move data residing throughout the grid to where computation can occur and send results wherever required.
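The interplay between the two kinds of grid can be sketched as a small planning function. This is a simplified illustration in Python, not how any particular grid scheduler works; the function `plan_job`, its parameters and the rule of preferring a site that already holds the data are assumptions.

```python
def plan_job(dataset, replicas, idle_cpus, result_site):
    """Pick a compute site for a job and list the transfers it needs.

    dataset:     logical name of the input data
    replicas:    dict mapping dataset name -> set of sites holding a copy
    idle_cpus:   dict mapping site name -> number of idle CPUs
    result_site: site where the output should end up

    Returns (compute_site, transfers), where each transfer is a
    (name, source_site, destination_site) tuple.
    """
    holders = replicas.get(dataset, set())
    if not holders:
        raise KeyError(f"no replica of {dataset} on the grid")

    available = {site: n for site, n in idle_cpus.items() if n > 0}
    if not available:
        raise RuntimeError("no site has idle CPUs")

    transfers = []
    # Assumed policy: prefer a site that already holds the data.
    local = sorted(s for s in holders if s in available)
    if local:
        compute_site = max(local, key=lambda s: available[s])
    else:
        # Otherwise move the data to the site with the most idle CPUs.
        compute_site = max(sorted(available), key=lambda s: available[s])
        transfers.append((dataset, sorted(holders)[0], compute_site))

    # Ship the results to wherever they are required.
    if result_site != compute_site:
        transfers.append((dataset + ".out", compute_site, result_site))
    return compute_site, transfers


# Hypothetical grid state, for illustration only.
site, moves = plan_job(
    "frames",
    replicas={"frames": {"london"}},
    idle_cpus={"london": 0, "austin": 16, "seoul": 4},
    result_site="london",
)
print(site, moves)
```

In this example the only copy of the data sits at a site with no idle CPUs, so the plan moves the input to the least busy site and sends the results back to the original location.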
For example, animators can publish images on a grid and provide access to other artists to supply the background, foreground and other elements. Further processing can be done on any grid-enabled system with available cycles. Results can be transmitted back to the original location or sent elsewhere for further processing. Computations can be handed from one system to another to take advantage of each node's capabilities.
This was first published in October 2005