Data grids for storage


This article can also be found in the Premium Editorial Download "Storage magazine: Better disaster recovery testing techniques."

Current grid computing projects manned mostly by scientific teams offer some tantalizing prospects for general corporate computing. Imagine making your organization's data accessible throughout the world or replicating data to multiple, geographically dispersed sites--even sites you don't own or control, but with which you collaborate.

If you use traditional access-control methods, the barriers to this scenario are substantial. You could, for example, set up replicated Web or FTP mirror sites, with separate user logins and passwords for each site providing access, or set up VPN access to each site holding the data.

Open-source grids
Globus is the progenitor of many of today's grids. The Globus Alliance and the Global Grid Forum (GGF) support the Globus Toolkit, and have developed some of the fundamental services required to implement a grid. The GGF is also charged with popularizing the grid by making it easier for all users to participate in grid work.

There are essentially three different models of Globus support software: the API-based model in Globus Toolkit version 2.0 (GT2), the service model in GT3 and the Web Services Resource Framework (WSRF) model in Globus Toolkit version 4.0 (GT4), released last May.

There are many compute-data grids in operation around the world, including AstroGrid, the Biomedical Informatics Research Network, Enabling Grids for E-sciencE, the Grid Physics Network and the Particle Physics Data Grid.

Both FTP mirrors and VPNs fall short, however. It isn't easy to replicate data to alternate sites with an FTP site, and managing user IDs and passwords becomes a major hassle across multiple sites. VPNs require different passwords and configurations for each data repository site, and users would certainly balk at having to navigate 10 or 100 VPN connections to get one piece of data. A better solution is to use a data grid.

Data grids
With so many storage vendors touting some sort of grid architecture these days, an accurate definition of a grid may be elusive. For the purposes of this article, a grid spans sites, companies and continents with non-proprietary hardware, software and protocols supporting authenticated access, replication and compute services. Clustered file systems don't qualify as data grids because they typically exist at one or two sites and require high-bandwidth connections between nodes. Wide-area file systems come closer to a data grid model, but they don't currently offer continent-spanning or multicompany-hosted data; they also require proprietary hardware, software and internode protocols.

It's possible to use a grid to securely share your data and compute services. To tap into these capabilities, you need to implement standards-compliant grid services on your systems. These services are available from the open-source community; proprietary grid products are also available from some vendors, including IBM Corp., Oracle Corp., Silicon Graphics Inc. (SGI)/YottaYotta Inc. and Sun Microsystems Inc.

Data grids are perfect for organizations that need a collaborative work environment despite having diverse, distributed resources where data resides across multiple business and/or organizational domains. Data grid services allow users to access and manipulate data residing at sites around the world. Data can be retrieved from any location on the grid, and can be deposited or replicated to any location with space.
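The idea of retrieving data from any location and replicating it to any location with space can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Site` and `ReplicaCatalog` classes, the site names and the "most free space" placement rule are all hypothetical, and none of it represents the Globus Toolkit or any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid storage site with a fixed capacity in bytes."""
    name: str
    capacity: int
    files: dict = field(default_factory=dict)  # logical name -> size in bytes

    def free_space(self) -> int:
        return self.capacity - sum(self.files.values())


class ReplicaCatalog:
    """Maps a logical file name to the sites that hold a copy (toy model)."""

    def __init__(self, sites):
        self.sites = {site.name: site for site in sites}

    def deposit(self, logical_name: str, size: int, site_name: str) -> None:
        """Store a file at a named site, refusing if the site is full."""
        site = self.sites[site_name]
        if size > site.free_space():
            raise ValueError(f"{site_name} has no room for {logical_name}")
        site.files[logical_name] = size

    def locate(self, logical_name: str) -> list:
        """Return the names of all sites holding a replica, sorted."""
        return sorted(n for n, s in self.sites.items() if logical_name in s.files)

    def replicate(self, logical_name: str) -> str:
        """Copy the file to the site with the most free space that lacks it."""
        holders = self.locate(logical_name)
        if not holders:
            raise KeyError(logical_name)
        size = self.sites[holders[0]].files[logical_name]
        candidates = [s for s in self.sites.values()
                      if s.name not in holders and s.free_space() >= size]
        if not candidates:
            raise ValueError("no site has space for another replica")
        target = max(candidates, key=lambda s: s.free_space())
        target.files[logical_name] = size
        return target.name


# Assumed example sites and sizes, chosen only for illustration.
catalog = ReplicaCatalog([
    Site("chicago", capacity=100),
    Site("geneva", capacity=500),
    Site("tokyo", capacity=50),
])
catalog.deposit("run42.dat", size=80, site_name="chicago")
new_site = catalog.replicate("run42.dat")
print(new_site, catalog.locate("run42.dat"))
```

A user asks for the logical name `run42.dat` rather than a particular server; the catalog answers with every site that holds a copy. A real data grid would add authentication and wide-area transfer on top of this bookkeeping.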

Compute grids
A compute grid can schedule computation to occur at one site with the results transmitted to another (see "Open-source grids," above), and a compute grid may exist with or without a data grid. Together, a compute grid and data grid can interoperate to move data residing throughout the grid to where computation can occur and send results wherever required.

For example, animators can publish images on a grid and provide access to other artists to supply the background, foreground and other elements. Further processing can be done on any grid-enabled system with available cycles. Results can be transmitted back to the original location or sent elsewhere for further processing. Computations can be handed from one system to another to take advantage of each node's capabilities.
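The interplay between a compute grid and a data grid described above can be sketched as a simple placement decision: run the job where the data already resides if that node has free cycles, otherwise pick another node and move the data to it. The `Node` class, the `place_job` function, the node names and the placement rule are assumptions made for illustration, not how any particular grid scheduler works.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical grid node: free CPU slots plus the datasets it stores."""
    name: str
    free_slots: int
    datasets: frozenset


def place_job(dataset: str, nodes: list) -> tuple:
    """Pick a node for a job and report whether the data must be moved.

    Prefers a node that already holds the dataset; otherwise falls back to
    the node with the most free slots and flags a transfer.
    """
    available = [n for n in nodes if n.free_slots > 0]
    if not available:
        raise RuntimeError("no node has free cycles")
    local = [n for n in available if dataset in n.datasets]
    pool = local or available
    chosen = max(pool, key=lambda n: n.free_slots)
    return chosen.name, dataset not in chosen.datasets


# Assumed example nodes, loosely following the animation scenario.
nodes = [
    Node("render-a", free_slots=0, datasets=frozenset({"scene7"})),
    Node("render-b", free_slots=4, datasets=frozenset()),
    Node("render-c", free_slots=2, datasets=frozenset({"scene7"})),
]
print(place_job("scene7", nodes))  # ('render-c', False)
print(place_job("scene9", nodes))  # ('render-b', True)
```

In the first call the job lands on a node that already stores the scene, so no transfer is needed. In the second, no node holds the data, so the job goes to the node with the most free slots and the data grid would be asked to move the dataset there.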

This was first published in October 2005
