This article can also be found in the Premium Editorial Download "Storage magazine: How storage managers can survive e-mail archiving."
Storage and the future of the data center
Compute resource slave
In contrast to the approach YottaYotta took, the compute resource slave paradigm calls for smaller, less expensive storage subsystems. These subsystems would be slaved to specific compute resources and allocated by the grid as a package. This simpler approach could take several guises, including DAS and simple (i.e., non-intelligent) SANs, multi-ported shared RAID, very low-cost RAID, JBOD or SBOD, all tightly coupled with the compute resource. The grid's software can then decide, based on policies, which data is copied, distributed or made highly available.
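To make the "compute plus slaved storage as a package" idea concrete, here's a minimal sketch of a grid scheduler that allocates a node only when both its compute and its directly attached storage fit a job, then applies a replication policy. All names and the policy itself are hypothetical illustrations, not any real grid middleware's API.

```python
# Hypothetical sketch: a grid allocates compute nodes together with the
# storage slaved to them, then a policy decides which datasets to copy.
# Illustrative only -- not a real grid scheduler.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpus: int
    local_disks_gb: int                       # storage slaved to this node
    datasets: set = field(default_factory=set)

def allocate(nodes, cpus_needed, storage_needed_gb):
    """Pick a node whose compute AND attached storage both fit the job."""
    for node in nodes:
        if node.cpus >= cpus_needed and node.local_disks_gb >= storage_needed_gb:
            return node
    return None

def apply_policy(nodes, dataset, min_copies=2):
    """Example policy: keep at least `min_copies` replicas for availability."""
    holders = [n for n in nodes if dataset in n.datasets]
    for node in nodes:
        if len(holders) >= min_copies:
            break
        if dataset not in node.datasets:
            node.datasets.add(dataset)
            holders.append(node)
    return [n.name for n in holders]
```

The point of the sketch is that the intelligence (where data is copied, distributed or made highly available) lives in the grid software, so the storage underneath it can stay simple and cheap.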
Massively parallel database (MPbase) from Open Sky Technologies uses loosely coupled Linux or Solaris compute resources tied together over high-performance TCP/IP networks. MPbase can be local and/or geographically distributed. Each MPbase compute resource is tightly coupled with an appropriate amount of storage (as few as two to four disks); in fact, MPbase works better when compute resources use fewer disks.
MPbase organizes data in a naturalized fashion rather than the normalized form used by RDBMS systems. Naturalized means that like data aligns with like data instead of being placed in tables. In the process of this naturalization, MPbase natively stores the data both compressed and encrypted, and it can search and process the data in that form.
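MPbase's internals aren't public, but the "like data aligns with like data" idea can be illustrated generically: each distinct value is stored once, aligned with the IDs of the records that reference it, and the whole structure is kept compressed. This is an assumption-laden sketch, not MPbase's actual design, and unlike the product's claim it decompresses before searching.

```python
# Generic illustration of "naturalized" storage: value-aligned rather than
# row/table oriented, held compressed on disk. Not MPbase's real design.

import json
import zlib
from collections import defaultdict

def naturalize(records):
    """Group like data with like data: (field, value) -> list of record IDs."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        for fieldname, value in record.items():
            index[(fieldname, value)].append(rid)
    return dict(index)

def store(index):
    """Serialize and compress the naturalized structure."""
    blob = json.dumps([[f, v, ids] for (f, v), ids in index.items()])
    return zlib.compress(blob.encode())

def search(stored, fieldname, value):
    """Look up a value without scanning normalized tables.
    (This sketch decompresses first; MPbase claims to search compressed.)"""
    entries = json.loads(zlib.decompress(stored).decode())
    for f, v, ids in entries:
        if f == fieldname and v == value:
            return ids
    return []
```

Because repeated values collapse into a single key, this kind of layout is where dramatic space savings could come from, though the specific 1% to 2% figure below is a vendor claim.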
MPbase typically uses just 1% to 2% of the storage required by current RDBMS systems. The database automatically replicates, mirrors and encrypts data between any of the locations on the grid. It was designed from the ground up to be a grid-based database structuring environment supporting any-to-any and many-to-many data structures. And as queries become more complex, MPbase actually works faster.
This data structure system works best if its storage is unsophisticated. Ken Tratar, a partner in the Open Sky venture, likes to use this analogy: "The current data structure environment is like having an F-18 tied to the Empire State Building. Grid computing complicates the picture. It's like having 500 F-18s tied to the Empire State Building. Eventually the building breaks. MPbase frees those F-18s from the building."
Richard Foster, CTO of WestGrid (a consortium of western Canadian universities funded by the Canada Foundation for Innovation), has strong opinions about how storage technology must evolve to meet the requirements of grid computing. He's an advocate of the sophisticated storage path.
According to Foster, "grid computing is requiring ever-larger and increasing numbers of datasets to be accessed anywhere over large geographical distances...[and] these datasets will need a parallel geographically dispersed storage grid that matches the compute grid. Storage anywhere must be aware of the storage everywhere and appropriately align dataset requests with bandwidth, I/O performance, and location."
Which design wins?
Both the more sophisticated and the simpler storage paths have merit and risk. The tendency in storage is always to reduce risk first and cost second, which adds weight to the arguments for the simpler approach. The immaturity of grid computing software, however, pushes the argument the opposite way, toward the more sophisticated approach.
Neither model is likely to triumph completely. Both could coexist in the same organization, for that matter. And regardless of which path an organization selects, the impact on SANs will be dramatic.
If storage subsystems become more sophisticated and peers in the grid, the SAN will very likely be relegated to a simple switched interconnect. Complex zoning schemes and SAN security become redundant because of the grid's own built-in security. This should drive toward a lower-cost, invisible SAN--something many end users are asking for.
On the other hand, if the storage subsystems are simpler and slaves to compute resources on the grid, then market pressures will drive that simplicity and lower cost into the SAN. Either way, storage will be applying significant commoditization pressure on SANs.
What about storage applications moving into the fabric itself? The grid will manage many storage applications, which means any appliances would have to become grid-aware and peers in the grid. Application platform switches could end up obsolete.
Grid computing has a high probability of making iSCSI far more important. There are already visions of a "World Wide Grid" along the lines of the World Wide Web. Grids will be geographically dispersed, which means the span of the SAN must extend beyond the traditional data center. The iSCSI protocol was designed with that requirement in mind. With iSCSI's ability to cover great distances, plus the coming RDMA over TCP/IP and GigE, it will likely play a role in future grids that Fibre Channel cannot.
Over the past decade, storage and SANs have claimed an increasing percentage of IT budgets while servers have claimed a decreasing one. The benefits of grid computing will put pressure in the opposite direction, toward greater investment in the fundamental computing infrastructure. Just at the dawn of storage networking, we may be at the apogee of storage's power right now.
This was first published in August 2003