As Microsoft's collaboration app gains popularity, it will hold more and more corporate data; however, protecting that data isn't so easy.
In my last column (see "Are backups a waste of time?," Storage, March 2007), I suggested that, given the range of data protection options available today, it's necessary to develop a more unified view of how various data protection components are deployed in an environment. To accomplish that, one must adopt a view of data protection from the application perspective. This generally involves sitting down with a team, including members from applications, database, storage and backup, to piece together the various data protection activities each group performs. From there, one might hope to identify duplicate or overlapping activities to improve efficiency or identify/fix holes in the data protection plan. Less common is the opportunity to design a comprehensive data protection strategy from scratch for a new app. Rarer still is the opportunity to do that for a new class of application.
With the release of Microsoft Office SharePoint Server 2007, organizations find themselves in the last situation. The latest version of SharePoint features new functionality and a degree of integration with other Office products that will likely cause it to become a de facto standard for collaboration in many environments--but don't construe that as a recommendation or endorsement. The wisdom of relocating documents from a file system to a database is debatable, but I suspect it will become a fact of life that many storage and data protection professionals will need to address.
SharePoint presents a number of data protection challenges that, in a large enterprise, can result in significant levels of complexity rather quickly. Furthermore, by its very nature, SharePoint can mushroom throughout an organization, leading to high storage consumption levels. It's therefore critical that storage and data protection groups work with application teams to plan an effective data protection strategy. Let's look at the structure of SharePoint and its built-in data protection capabilities, and then discuss some approaches to ensure efficient recoverability.
SharePoint infrastructure basics
SharePoint can be thought of as an enterprise portal, intranet, content management and collaboration tool that provides indexing and search, document versioning and a Web-based development platform. It's a multitiered application consisting primarily of Web front-ends (WFEs), application servers and database servers. While these three components could theoretically reside on a single server, in most cases multiple servers are used. A collection of these servers constitutes a SharePoint farm (see "Sample Microsoft SharePoint farm," below). Each SharePoint farm contains one common configuration database and one or more content databases.
[Figure: Sample Microsoft SharePoint farm]
An enterprise SharePoint implementation may consist of multiple farms, each with numerous WFEs, application servers and content databases. In addition, special-purpose servers for indexing and application support, such as Excel Calculation Services servers or Microsoft Project servers, can reside within a farm. With these varied components, it's easy to see how a SharePoint environment can become a data protection nightmare. Most of the data resides in content databases, but data related to the farm itself and its components resides in the configuration database, and customizations and other special-purpose information for the WFEs, application servers, index servers and so on sit on those systems themselves. As a result, managing all of these diverse elements requires a good deal of planning.
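One way to start that planning is to write down, per farm, every place where recoverable data lives. The following Python sketch is only an illustration of that inventory idea: the Farm class, the protection_targets function and all server and database names are hypothetical, not part of SharePoint or any backup product.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Farm:
    """Hypothetical description of one SharePoint farm."""
    name: str
    config_db: str                                   # one configuration database per farm
    content_dbs: List[str] = field(default_factory=list)
    wfes: List[str] = field(default_factory=list)    # Web front-ends
    app_servers: List[str] = field(default_factory=list)
    index_servers: List[str] = field(default_factory=list)


def protection_targets(farm: Farm) -> List[Tuple[str, str]]:
    """List (kind, name) pairs for everything in the farm that holds data."""
    targets = [("configuration database", farm.config_db)]
    targets += [("content database", db) for db in farm.content_dbs]
    targets += [("WFE customizations", s) for s in farm.wfes]
    targets += [("application server settings", s) for s in farm.app_servers]
    targets += [("index server data", s) for s in farm.index_servers]
    return targets


# Example farm; every name below is made up for illustration.
farm = Farm(
    name="intranet",
    config_db="SharePoint_Config",
    content_dbs=["WSS_Content_HR", "WSS_Content_Eng"],
    wfes=["wfe01", "wfe02"],
    app_servers=["app01"],
    index_servers=["idx01"],
)

for kind, name in protection_targets(farm):
    print(f"{kind}: {name}")
```

Even for this small example farm the list has seven entries across three kinds of storage, which is the point: a plan that covers only the content databases leaves most of the list unprotected.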
Recovering SharePoint content
To effectively plan a SharePoint data protection strategy, you have to consider content recovery and disaster recovery (DR). Widespread adoption of SharePoint within an organization will likely mean that SharePoint servers will eventually replace existing file servers. In other words, users will access documents via SharePoint Web portals (and, ultimately, SQL databases) rather than as files in a shared directory. I don't believe the potentially enormous implications for traditional nightly backup have been adequately considered. Masses of files currently managed as individual entities by a backup app would now be stored in monolithic (from a backup app perspective) SQL databases under SharePoint.
But a more immediate issue might be understanding how SharePoint content is recovered. While this version represents a significant step forward from earlier versions in this regard, it still leaves much room for improvement. Consider the analogy to another wildly successful enterprise Microsoft application--Exchange. Most backup administrators have vivid memories of the difficulties restoring content from early versions of Exchange. Because of API limitations back then, most backup applications only supported recovery of an entire Exchange Information Store--a time-consuming procedure of restoring to an alternate server, searching for the specific messages or mailboxes to be recovered, and then migrating the data into the production Information Store.
Prior to the 2007 version, a similarly arduous process was required for SharePoint. But Microsoft has added an important timesaver--a recycle bin--to ease the pain. Actually, SharePoint now has two recycle bins: a user and a site-level (or administrative) recycle bin. Recycle bin functionality can be disabled, and both size quotas and object expirations can be applied. In addition, SharePoint inherently supports document versioning, so it's possible to revert to an earlier version without necessarily having to do a recovery. While the recycle bin will help eliminate many nuisance-level file recovery issues, if a document must be recovered from backup, the only options are the traditional Exchange-like recovery process or a third-party product--more about these later.
DR for SharePoint environments is even trickier. SharePoint provides GUI and command-line (stsadm.exe) options to back up an entire farm, for example, but Microsoft subtly suggests that these options are most effective for "small to medium" deployments. Also, there's no scheduling mechanism within SharePoint, so automation would need to be performed by scripts executed via the Windows Task Scheduler.
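As a rough illustration of the kind of script the Task Scheduler might run, here is a minimal Python sketch that assembles a farm-level stsadm backup command. It assumes the stsadm "backup" operation with its -directory and -backupmethod parameters and the default SharePoint 2007 install path; the function names and the UNC share are hypothetical, and the exact syntax should be checked against Microsoft's documentation for your build.

```python
import subprocess
from typing import List

# Assumed default location of stsadm.exe for SharePoint 2007; verify on your servers.
STSADM = (r"C:\Program Files\Common Files\Microsoft Shared"
          r"\web server extensions\12\BIN\stsadm.exe")


def build_backup_command(backup_dir: str, method: str = "full",
                         stsadm: str = STSADM) -> List[str]:
    """Build the argument list for a farm backup to backup_dir."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return [stsadm, "-o", "backup",
            "-directory", backup_dir,
            "-backupmethod", method]


def run_backup(backup_dir: str, method: str = "full") -> int:
    """Run the backup and return the exit code (only works on a farm server)."""
    return subprocess.run(build_backup_command(backup_dir, method)).returncode


if __name__ == "__main__":
    # Hypothetical backup share; printing the command is safe on any machine.
    print(" ".join(build_backup_command(r"\\backupsrv\sharepoint", "full")))
```

A scheduled task could call run_backup with "full" once a week and "differential" on the other nights, and alert on a nonzero exit code. That schedule is only one possible arrangement, not a recommendation from Microsoft.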
Of course, because SharePoint largely consists of SQL Server databases, one approach is to simply back up or protect those databases using the tried-and-true approach of your choice. This is certainly a viable option, but one must further consider that, because traditional SQL backups lack knowledge or context of a SharePoint environment, additional time-consuming postprocessing activities will have to be performed to reintegrate the database into SharePoint. Furthermore, SQL backups can't adequately address either the aforementioned content recovery issues or the additional steps needed to recover WFEs and application servers. Even index servers, if they're large, may need a recovery strategy as rebuilding would be very time-consuming.
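For readers who take the plain SQL route, the sketch below generates one full-backup statement per database, using the standard T-SQL BACKUP DATABASE ... TO DISK form. The helper name, the database names and the backup folder are hypothetical, and the script only prints the statements; how they are executed (sqlcmd, a SQL Agent job, a backup product) is left open.

```python
from typing import List


def full_backup_statement(database: str, backup_file: str) -> str:
    """Return a T-SQL full-backup statement for one database."""
    db = database.replace("]", "]]")        # escape for [bracketed] identifiers
    path = backup_file.replace("'", "''")   # escape for the string literal
    return f"BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH INIT, CHECKSUM;"


def farm_backup_script(databases: List[str], backup_dir: str) -> str:
    """Build a script that backs up every listed database to backup_dir."""
    lines = [full_backup_statement(db, backup_dir + "\\" + db + ".bak")
             for db in databases]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical configuration and content database names.
    dbs = ["SharePoint_Config", "WSS_Content_HR", "WSS_Content_Eng"]
    print(farm_backup_script(dbs, r"E:\sqlbackup"))
```

Note what the generated script leaves out: nothing in it records which content database belongs to which Web application or site collection, which is exactly the SharePoint context that has to be rebuilt by hand after a restore.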
To meet recovery time objectives and recovery point objectives for both operational recovery and DR, most organizations will turn to third-party products. The obvious first place to look is at your backup app. Major vendors have vowed support for SharePoint 2007, but to date only CommVault has announced a comprehensive strategy for both content and disaster recovery. Symantec has supported content-level recovery via a Backup Exec 11d agent since SharePoint 2003 and is expected to provide similar capabilities for SharePoint 2007. Enterprise backup products such as Symantec Veritas NetBackup and EMC NetWorker offer limited SharePoint-specific functionality at this time.
Startup vendor AvePoint has carved out an interesting niche as a SharePoint data protection specialist. Its DocAve 4.1 suite offers a comprehensive set of solutions, including the ability to perform live backups without suspending indexing. By the time you read this, AvePoint is expected to announce a relationship with IBM to incorporate its product as Tivoli Storage Manager's SharePoint protection offering.
Beyond backup apps, other technologies that lend themselves particularly well to SharePoint data protection are snapshotting, replication and virtualization. Regarding the last, a number of companies are running SharePoint 2003 servers within virtualized environments with considerable success. A word of warning, however: Microsoft says it doesn't officially support virtual configurations at this time. That's unfortunate because management of all SharePoint elements--WFEs, app and index servers, and databases--could be dramatically simplified using virtualized servers.
It's imperative to define and deploy standard configurations and consistent processes for SharePoint components and to be sure that these are fully documented. As SharePoint 2007 environments become commonplace, more technology options are sure to appear along with a well-defined body of best practices.