Data storage shops often struggle with anticipating new capacity requirements and ensuring that business needs can be met. Ford Motor Company took a unique approach to the problem and made capacity planning as simple as balancing your checkbook.
By Thomas Woods
A funny thing happened to the Ford Motor Company enterprise storage team: Storage Sunday maintenance post-mortem review meetings became boring. The meetings were no longer spiced up with horror stories about outages caused by failed maintenance actions. Thanks to the enterprise virtual storage program, the risk of many high-risk, high-impact maintenance actions had been mitigated by the ability to move critical virtual logical unit numbers (LUNs) off physical storage arrays transparently, without server interruption, before scheduled maintenance took place.
Aside from the woeful exploits of the Detroit Lions, Sundays would have been perfect. For those attending the weekly enterprise data storage infrastructure capacity forecasting meetings, the stress-reducing benefits of virtual storage were less evident.
Ford's enterprise storage architecture progressed in stages: from servers attached directly to storage arrays, to servers attached to physical storage arrays through storage-area networks (SANs), to the current virtualized environment, where SAN-attached servers connect to virtualized physical storage arrays.
This transition occurred even as IT budgets were under intense scrutiny, with the requirement that new infrastructure purchases be directly traceable to end-user requests. But in the virtual world, a server no longer maps directly to a storage array. Adding to the complexity was a requirement that overhead and white space (capacity that's allocated but not used) be tracked and explained.
Virtualized storage raises new issues
After the introduction of virtual storage into the Ford environment, the data storage capacity forecasting meetings at the company reflected the new reality: All stakeholders had a piece of the action, but the teams lacked the tools, methods and approach to assemble a single consolidated view they could effectively communicate to the business side of the house. Some of the missing pieces were:
- The enterprise storage teams, specifically storage operations and storage engineering, didn't have an effective way to communicate with finance and purchasing.
- Capacity planning couldn't translate forecasts into storage capacity requirements.
- Finance and purchasing didn't fully grasp the multidimensional nature of virtual storage environments.
The frustration in the capacity planning meetings soon exacerbated long-standing trust issues among operations, engineering, sales and finance.
Because the communication problem crossed multiple team boundaries, a cross-functional team was created to address the issues, with representatives from all interested parties -- finance, storage operations, business management, capacity planning and storage engineering. The team reached a consensus that a systematic, repeatable and traceable method to track and forecast network-attached storage (NAS), SAN and backup infrastructure was needed. This method would have to facilitate understanding for all storage capacity stakeholders by providing the following capabilities:
- Infrastructure ordering personnel would be able to place customer orders without having to be "storage gurus."
- Storage administrators would be able to use customer orders to create forecasts against the key physical and virtual storage components, while meeting customer requirements and business constraints, in a repeatable fashion.
- Business managers would be able to trace customer demand to proposed infrastructure projects.
- Storage managers would be able to justify non-customer infrastructure capacity increase requirements, such as temporary storage needed for a data migration or to accommodate the organic growth of a specific storage subsystem.
- Storage engineers would be able to verify that storage infrastructure is deployed consistently and adheres to engineering standards.
The storage management solution
The cross-functional team implemented a storage ledger, or storage "checkbook," with multiple "currencies" related to key storage consumables. In an ordinary checkbook, money is the only consumable, but a storage ledger checkbook has many consumables. The team decided to create three separate ledgers to track all consumables related to three major storage activities: NAS, SAN and backup. A single ledger was considered, since it would have reduced the work for the groups that order storage, but it would have been more complicated and considerably larger.
How a storage ledger works
Each ledger consists of two types of spreadsheets: a main ledger sheet and a component aggregation sheet, which is used to calculate the top line (beginning balance) for each storage consumable that needs to be tracked. The main ledger sheet is divided into two sections. The first consists of ordering columns, where customers of storage services can write post-dated storage capacity checks. The information entered in the ordering columns drives the entries placed in the second set of columns, the storage capacity columns, which aren't visible to the customers writing checks. The storage capacity columns hold the information data storage administrators need to map requests to a set of storage consumables associated with the appropriate environment and technology.
Figure: Ford Motor Company's capacity ledger.
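One way to picture a main ledger row is as a record with two groups of fields: the ordering columns the customer fills in and the capacity columns an administrator completes afterward. The minimal Python sketch below is an illustration only, not Ford's spreadsheet layout; the field names (`project`, `capacity_gb`, `forecast_date`, `allocations`) and the column key are hypothetical. It shows how the capacity columns can be kept out of the customer's view.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LedgerEntry:
    # Ordering columns: written by the customer (the "check").
    project: str
    capacity_gb: float
    forecast_date: date
    # Capacity columns: filled in later by a storage administrator, mapping the
    # request onto storage consumables (hypothetical column names).
    allocations: dict[str, float] = field(default_factory=dict)

    def customer_view(self) -> dict:
        """Return only the ordering columns; the capacity columns stay hidden."""
        return {
            "project": self.project,
            "capacity_gb": self.capacity_gb,
            "forecast_date": self.forecast_date,
        }


entry = LedgerEntry("Project A", 500.0, date(2010, 3, 1))
# The administrator maps the request to a consumable in the capacity columns.
entry.allocations["DC1/high-speed/mirrored"] = 500.0
print(entry.customer_view())
```

Keeping both groups on one record mirrors the ledger's design: the customer's check drives the capacity entries, but only administrators see how the request is mapped.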
The ledger process consists of the following steps:
Step 1. Update the main ledger sheet top-line balances that are on the capacity side of the main ledger sheet. The top-line balance is calculated by adding all of the component capacities for the specific subsystems. The specific subsystems are tracked in separate tracking sheets and map to top-line ledger balances as the enterprise architecture dictates. The capacity columns are divided not only by technology, but also by other factors such as location or functional environment.
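The Step 1 roll-up is a grouped sum: every component from the tracking sheets contributes its capacity to the top-line balance of the capacity column it maps to. In this sketch the column keys (combining location and technology, such as `DC1/high-speed`), the array names and the capacities are all hypothetical.

```python
from collections import defaultdict

# Hypothetical component records from the tracking sheets:
# (capacity column, component name, capacity in GB).
components = [
    ("DC1/high-speed", "array-01", 20_000),
    ("DC1/high-speed", "array-02", 15_000),
    ("DC1/SATA", "array-03", 40_000),
    ("DC2/high-speed", "array-04", 25_000),
]


def top_line_balances(components):
    """Sum component capacities into one top-line balance per capacity column."""
    balances = defaultdict(float)
    for column, _name, capacity_gb in components:
        balances[column] += capacity_gb
    return dict(balances)


print(top_line_balances(components))
# {'DC1/high-speed': 35000.0, 'DC1/SATA': 40000.0, 'DC2/high-speed': 25000.0}
```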
Step 2. Normalize the top-line balance for percent overhead. Many storage systems require a certain amount of overhead to run efficiently and can't be managed at 100% capacity. The overhead percentage should be set so that the normalized balance indicates the point at which new infrastructure is needed. The component aggregation sheet allows a business analyst to check the source data if there's a concern that the overhead percentages are too generous.
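Normalizing for overhead amounts to reserving a fixed percentage of each top-line balance. The article doesn't publish Ford's overhead figures, so the 10% and 15% values below are assumptions chosen only to make the arithmetic visible.

```python
def normalized_balance(top_line_gb: float, overhead_pct: float) -> float:
    """Usable balance after reserving overhead_pct percent of capacity."""
    if not 0 <= overhead_pct < 100:
        raise ValueError("overhead_pct must be in [0, 100)")
    return top_line_gb * (100 - overhead_pct) / 100


balances = {"DC1/high-speed": 35_000, "DC1/SATA": 40_000}
overhead = {"DC1/high-speed": 10, "DC1/SATA": 15}  # assumed per-column overheads
normalized = {col: normalized_balance(gb, overhead[col]) for col, gb in balances.items()}
print(normalized)  # {'DC1/high-speed': 31500.0, 'DC1/SATA': 34000.0}
```

Storing the overhead per column, rather than as one global figure, keeps each percentage visible so it can be checked against the component aggregation sheet.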
Step 3. Reconcile the ledger entries. Forecasted storage actions that have been executed or cancelled should be marked as completed and taken off the active forecasting ledger.
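Reconciliation can be thought of as a filter over the forecast entries: anything executed or cancelled moves to a completed list and no longer counts against the balance. The status values and project names in this sketch are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical status values; "executed" and "cancelled" both count as completed.
COMPLETED = {"executed", "cancelled"}


@dataclass
class Entry:
    project: str
    capacity_gb: float
    status: str = "forecast"


def reconcile(entries):
    """Return (active, completed); completed entries come off the active ledger."""
    active, completed = [], []
    for e in entries:
        (completed if e.status in COMPLETED else active).append(e)
    return active, completed


ledger = [
    Entry("Project A", 2_000, "executed"),
    Entry("Project B", 1_000, "cancelled"),
    Entry("Project C", 3_000),
]
active, completed = reconcile(ledger)
print([e.project for e in active])  # ['Project C']
```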
Step 4. Analyze the updated ledger. After the ledger has been updated, a storage infrastructure capacity forecasting meeting is held. Using the ledger's graphical output, the team determines what actions, if any, are required to address forecasted activities. The team works together to consider different scenarios, such as the effects of:
- Extending leases vs. buyouts of storage infrastructure
- Placing new or existing data on different storage performance tiers
- Backup and disaster recovery (DR) scenarios
- NAS vs. SAN scenarios
After the analysis and scenario assessments, the team recommends a best course of action to management. Since the ledger process was introduced, management reviews have become considerably more collaborative, because management now receives a set of high-quality, transparent options.
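The ledger's graphical output answers one recurring question: on what date do the post-dated checks overdraw a balance? The sketch below is a text-only stand-in under assumed numbers (a 31,500 GB normalized balance and three hypothetical checks); it walks the checks in forecast-date order and compares a baseline against a what-if scenario in which one project is moved off this capacity column, for example to a different storage tier.

```python
from datetime import date


def first_shortfall(balance_gb, checks):
    """Return the first forecast date on which the running balance goes negative.

    checks: list of (forecast_date, amount_gb); positive amounts consume capacity.
    Returns None if the balance never drops below zero.
    """
    running = balance_gb
    for when, amount in sorted(checks):
        running -= amount
        if running < 0:
            return when
    return None


checks = [
    (date(2010, 2, 1), 8_000),
    (date(2010, 4, 1), 12_000),
    (date(2010, 6, 1), 15_000),
]
# Baseline scenario.
print(first_shortfall(31_500, checks))  # 2010-06-01
# What-if: the June project is placed on another tier, so it leaves this column.
print(first_shortfall(31_500, checks[:2]))  # None
```

A real scenario comparison would rerun the same walk over several candidate sets of checks and balances (lease extension vs. buyout, NAS vs. SAN placement) and present the resulting dates side by side.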
Step 5. Open ledger for general use. The ledger, like your checkbook, is to be used throughout the week. Steps 1 through 4 are intended for periodic baselining and reconciliation activities.
Step 6. Begin reconciliation. Close ledger for general use and return to Step 1.
SAN storage ledger
The SAN storage ledger ordering columns are as follows:
- Storage performance level
- Type of storage performance
- Type of replication (none, local or remote)
- Capacity requested
- Forecast date
- Project name
The SAN storage ledger storage capacity columns are determined by:
- Environment (data center, virtual storage array environment, major business unit)
- Disk performance and type (high speed, SATA, etc.)
- Storage types (mirrored data center to data center, instant copy using local snapshots of data)
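How an ordering entry turns into capacity-column debits depends on site rules the article doesn't spell out. The sketch below assumes two hypothetical rules: remote (data center to data center) mirroring debits the same amount at both sites, and local instant copies debit a snapshot reserve sized by an assumed `snapshot_reserve_pct` (20% here). The `tier` key stands in for the performance-level ordering columns.

```python
def capacity_debits(order, snapshot_reserve_pct=20):
    """Map one SAN ordering entry onto capacity-column debits in GB.

    Hypothetical rules, not Ford's actual mapping:
    - "remote": mirrored data center to data center, so the same amount is
      debited at the primary and the secondary environment;
    - "local": an instant-copy snapshot reserve of snapshot_reserve_pct percent
      of the requested capacity is debited in the primary environment.
    """
    env = order["environment"]
    tier = order["tier"]
    gb = order["capacity_gb"]
    replication = order["replication"]
    debits = {f"{env}/{tier}/primary": gb}
    if replication == "remote":
        debits[f"{order['secondary_environment']}/{tier}/mirror"] = gb
    elif replication == "local":
        debits[f"{env}/{tier}/snapshot"] = gb * snapshot_reserve_pct / 100
    elif replication != "none":
        raise ValueError(f"unknown replication type: {replication!r}")
    return debits


remote_order = {"environment": "DC1", "secondary_environment": "DC2",
                "tier": "high-speed", "capacity_gb": 1_000, "replication": "remote"}
local_order = {"environment": "DC1", "tier": "high-speed",
               "capacity_gb": 1_000, "replication": "local"}
print(capacity_debits(remote_order))
# {'DC1/high-speed/primary': 1000, 'DC2/high-speed/mirror': 1000}
print(capacity_debits(local_order))
# {'DC1/high-speed/primary': 1000, 'DC1/high-speed/snapshot': 200.0}
```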
Besides storage end users, data storage administrators also place entries into the storage ledger. SAN storage admins may create ledger entries to forecast storage infrastructure needs because of lease expirations, to reserve temporary staging capacity needed for storage maintenance, and to document the impact of future storage realignments such as the effect of transferring data between storage tiers or technologies (e.g., SAN to NAS).
NAS and backup administrators may also write checks against the SAN storage environment. A NAS administrator is required to create a SAN ledger entry to track removal of storage from a NAS gateway that's slated to migrate to NAS Fibre-attached storage (FAS). A backup admin might write a check to request more disk pool storage for online disk backups.
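Because customers and several administrator groups all write checks against the SAN ledger, tagging each entry with its writer makes customer demand traceable separately from non-customer needs such as lease-expiration staging. In this sketch the writer labels and amounts are hypothetical, and a storage removal from a NAS gateway is assumed to be recorded as a negative entry that returns capacity to the SAN.

```python
from collections import defaultdict

# Hypothetical SAN ledger entries: (writer, reason, GB).
# Positive amounts consume SAN capacity; negative amounts return it.
entries = [
    ("customer", "Project A", 2_000),
    ("san_admin", "lease expiration staging", 5_000),
    ("nas_admin", "NAS gateway removal after FAS migration", -3_000),
    ("backup_admin", "online disk pool expansion", 1_500),
]


def net_by_writer(entries):
    """Net GB per entry writer."""
    totals = defaultdict(int)
    for writer, _reason, gb in entries:
        totals[writer] += gb
    return dict(totals)


def customer_vs_internal(entries):
    """Split net demand into customer and non-customer (administrator) totals."""
    customer = sum(gb for writer, _, gb in entries if writer == "customer")
    internal = sum(gb for writer, _, gb in entries if writer != "customer")
    return customer, internal


print(net_by_writer(entries))
# {'customer': 2000, 'san_admin': 5000, 'nas_admin': -3000, 'backup_admin': 1500}
print(customer_vs_internal(entries))  # (2000, 3500)
```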
NAS storage ledger
When the NAS storage ledger process started, the NAS environment at Ford Motor Company was more complicated than it is today. The environment consisted of multiple vendors with both gateway-based NAS (NAS devices connected to external storage-area networks administered by the SAN team) and FAS-based network-attached storage (NAS gateway heads built into OEM vendor-provided Fibre-attached storage arrays).
For gateway-based NAS, the NAS administrators are required to write SAN storage ledger checks to increase or decrease NAS gateway SAN capacity. The old system used tape-based Network Data Management Protocol (NDMP) backups, which meant entries also had to be made in the backup ledger. Ford has been migrating to a pure FAS-based NAS environment with a non-tape backup infrastructure supplied by a single vendor. In addition, only a single performance type of NAS storage is now offered. As a result, the NAS ledger has been simplified to reflect the less-complicated environment.
From a customer perspective, a NAS entry specifies the amount of data needed and the location of primary storage; the capacity columns track the amount of primary and backup mirrored storage requested. When the NAS ledger receives many small entries, they're bundled together into a forecastable amount. NAS team entries consist mainly of major projects, a bundle of small projects for a specific month and organic growth predictions. NAS file systems that aren't protected by file system size quotas are the main cause of organic growth.
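Bundling small NAS requests can be sketched as grouping every request below a size cutoff by primary-storage location and month, then replacing the group with one entry. The 100 GB cutoff, the bundle naming and the requests below are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

SMALL_ENTRY_GB = 100  # hypothetical cutoff for a "small" request


def bundle_small_entries(entries, cutoff_gb=SMALL_ENTRY_GB):
    """Keep large entries as-is; roll small ones into one entry per location and month."""
    kept, bundles = [], defaultdict(float)
    for e in entries:
        if e["capacity_gb"] >= cutoff_gb:
            kept.append(e)
        else:
            key = (e["location"], e["date"].year, e["date"].month)
            bundles[key] += e["capacity_gb"]
    for (location, year, month), gb in sorted(bundles.items()):
        kept.append({
            "project": f"small-request bundle {year}-{month:02d}",
            "location": location,
            "date": date(year, month, 1),
            "capacity_gb": gb,
        })
    return kept


requests = [
    {"project": "Project D", "location": "DC1", "date": date(2010, 3, 15), "capacity_gb": 2_000},
    {"project": "share-1", "location": "DC1", "date": date(2010, 3, 2), "capacity_gb": 40},
    {"project": "share-2", "location": "DC1", "date": date(2010, 3, 20), "capacity_gb": 25},
    {"project": "share-3", "location": "DC1", "date": date(2010, 4, 5), "capacity_gb": 30},
]
for row in bundle_small_entries(requests):
    print(row["project"], row["capacity_gb"])
# Project D 2000
# small-request bundle 2010-03 65.0
# small-request bundle 2010-04 30.0
```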
Backup storage ledger
The backup ledger is the most complicated ledger; Ford Motor Company is currently in the process of rolling out this ledger. Backup capacity isn't as straightforward as NAS or SAN capacity. For NAS and SAN capacity, the actual amount of storage is the dominant key performance indicator (KPI). But for backup, the amount of storage is just one of many criteria that must be tracked. Ford uses a progressive incremental backup approach, meaning that after an initial full backup only changed files are backed up. The team couldn't create a forecasting ledger without first understanding the backup system design criteria.
The first step in creating a backup forecast strategy is to develop criteria to address design tradeoffs; otherwise, the backup systems could inadvertently slip into sub-optimal performance. The team developed the following design criteria:
- Daily backup disk pools are large enough to contain a single day's worth of file system incremental backup.
- Weekly backup disk pools are large enough to contain one week's worth of backup data before it's pushed to tape.
- Database size shouldn't exceed 60 GB (to ensure that DR targets can be met).
- Data migration infrastructure must be able to dump disk pools to tape within four hours.
- Server backups will be co-located to dedicated tapes.
- Workstation backups will be co-located to dedicated tape groups.
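The numeric criteria above can be checked mechanically for each backup server; the co-location rules are tape-policy settings and aren't modeled here. This is a sketch with hypothetical field names and measurements, and it assumes that "dump disk pools to tape within four hours" means dumping the daily and weekly pools together at the server's measured migration rate.

```python
from dataclasses import dataclass

# Thresholds taken from the design criteria.
MAX_DB_GB = 60
MAX_MIGRATION_HOURS = 4


@dataclass
class BackupServer:
    # Hypothetical per-server measurements.
    name: str
    daily_pool_gb: float
    daily_incremental_gb: float
    weekly_pool_gb: float
    weekly_backup_gb: float
    db_gb: float
    tape_migration_gb_per_hour: float


def criteria_violations(s: BackupServer) -> list[str]:
    """List the numeric design criteria that a backup server fails."""
    problems = []
    if s.daily_pool_gb < s.daily_incremental_gb:
        problems.append("daily pool smaller than one day of incrementals")
    if s.weekly_pool_gb < s.weekly_backup_gb:
        problems.append("weekly pool smaller than one week of backup data")
    if s.db_gb > MAX_DB_GB:
        problems.append(f"database exceeds {MAX_DB_GB} GB")
    # Assumption: both pools are dumped to tape in one pass.
    hours = (s.daily_pool_gb + s.weekly_pool_gb) / s.tape_migration_gb_per_hour
    if hours > MAX_MIGRATION_HOURS:
        problems.append(f"pool-to-tape dump takes {hours:.1f} h (> {MAX_MIGRATION_HOURS} h)")
    return problems


ok = BackupServer("bk01", 500, 350, 2_000, 1_800, 45, 800)
bad = BackupServer("bk02", 500, 350, 2_000, 1_800, 72, 500)
print(criteria_violations(ok))   # []
print(criteria_violations(bad))  # ['database exceeds 60 GB', 'pool-to-tape dump takes 5.0 h (> 4 h)']
```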
For the daily incremental disk pool, a 14-day high-water mark of the disk pool average is derived for each server and then aggregated across all of the servers in the same environment. For example, assume there are 10 backup servers at a data center, each with 500 GB of disk pool, for an aggregate disk pool of 5,000 GB. The 14-day average of each of the 10 servers is 350 GB (an aggregate of 3,500 GB). The aggregate high-water mark shouldn't exceed 90% of the aggregate disk pool, so additional disk pool capacity should be provided before that threshold is reached. The available aggregate disk pool is therefore (5,000 GB × 90%) − 3,500 GB = 4,500 GB − 3,500 GB = 1,000 GB.
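The disk pool arithmetic generalizes to any number of servers. This sketch takes each server's pool size and its 14-day usage figure as given (how that figure is derived is left to the tracking sheets) and uses the article's 90% threshold; the function name is hypothetical.

```python
def disk_pool_headroom(pool_sizes_gb, usage_gb, threshold_pct=90):
    """Aggregate disk pool capacity still available below the threshold, in GB.

    pool_sizes_gb: disk pool size of each backup server in one environment.
    usage_gb: each server's 14-day usage figure.
    A negative result means the threshold has already been crossed.
    """
    total_pool = sum(pool_sizes_gb)
    total_usage = sum(usage_gb)
    return total_pool * threshold_pct / 100 - total_usage


# The article's example: 10 servers with 500 GB pools and 350 GB 14-day figures.
print(disk_pool_headroom([500] * 10, [350] * 10))  # 1000.0
```

In ledger terms, the result is the top-line balance for the daily disk pool consumable, and backup administrators' checks for more disk pool are written against it.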
It should be noted that there's a difference between capacity forecasting and day-to-day tuning and monitoring. Capacity forecasting is used primarily to determine future infrastructure capacity requirements; efficiently balancing current infrastructure should be a normal part of a storage admin's tasks.
The ledger's bottom line
Ford Motor Company's data storage forecasting and purchasing meetings are now more effective and collaborative, which helps ensure that the right amount of infrastructure is delivered at the right time. Storage forecasting capacity meetings are now data-driven and leverage dynamic what-if scenarios that can be created instantly. If there are questions on source data, stakeholders can quickly view ledger or spreadsheet data for more information. The ledger process also allows Ford and its IT service providers to work together better to ensure that capacity is tracked and forecasted appropriately. Any organization can implement new technology, but what truly makes a difference is how the organization adapts its actions to maximize the return on the investment.
BIO: Tom Woods is currently global ITIL services transition manager at Ford Motor Company. At Ford, Tom has held storage operations, engineering and architecture positions, and has supervised the backup and NAS teams.