Applications are the tools that produce revenue and boost employee efficiency. As hardware manufacturers approach their goal of 100% uptime, we as IT professionals need to aim for the same with our applications. We also need to recognize that applications are not nearly as reliable as hardware, and that there are strategies we can use to work toward the goal of "maximum application efficiency."
Start with a simple brainstorming session
There are proven techniques you can use to extract insightful observations from a group. Use them to evaluate the circumstances associated with "the system" misbehaving. Include representatives from all disciplines (e.g., administrators, users and developers). Standardize your terminology, because "the system" will often mean something different to each individual. Compile the findings into a prioritized list and research them for validity. The typical observations will be the ones everyone is familiar with, but when you push the envelope you will uncover intuitive observations that may never have been expressed.
Compare your brainstorming results to the experts' results
You will probably find that application problems are associated with three things: an increased number of components, change and human intervention, and recovery time.
Build a written plan that systematically addresses the cause of each problem
Here is what I think you will find: As the number of components in your infrastructure increases, there will be more spot failures. Statistical analysis supports the theory that fewer components produce fewer failures. RAID is a good example. With a five-disk stripe, the mean time between data loss (MTBDL), as opposed to a single disk failure, is about 50 years. That figure reflects the likelihood that two disks will fail at the same time. Increase the stripe size to 10 disks and the MTBDL drops to 12 years. You had better have hot spares available! The same holds true for cable connections, hubs, switches, controllers and even servers. The likelihood of a failure grows faster than the component count: doubling the stripe cut the MTBDL to roughly a quarter.
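The arithmetic behind the RAID example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a common textbook approximation for a single-parity stripe (mean time to data loss = MTTF squared, divided by N x (N - 1) x MTTR), which may not be the model behind the 50-year and 12-year figures above, and the MTTF and MTTR values are invented placeholders. The function names are hypothetical as well. What the sketch does show is that the ratio between a five-disk and a 10-disk stripe does not depend on those placeholders.

```python
def mttdl_hours(disks: int, mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a single-parity stripe.

    Textbook approximation (an assumption, not the article's own model):
    data is lost when a second disk fails while the first is being replaced.
    """
    if disks < 2:
        raise ValueError("a parity stripe needs at least two disks")
    return mttf_hours ** 2 / (disks * (disks - 1) * mttr_hours)


def any_failure_probability(components: int, p_fail: float) -> float:
    """Chance that at least one of several independent components fails."""
    return 1.0 - (1.0 - p_fail) ** components


# Placeholder values chosen for illustration only.
MTTF_HOURS = 500_000.0  # assumed mean time to failure of one disk
MTTR_HOURS = 24.0       # assumed time to replace and rebuild a disk

five_disks = mttdl_hours(5, MTTF_HOURS, MTTR_HOURS)
ten_disks = mttdl_hours(10, MTTF_HOURS, MTTR_HOURS)
ratio = five_disks / ten_disks

print(f"5-disk stripe lasts {ratio:.1f}x longer than a 10-disk stripe")
print(f"5 components:  {any_failure_probability(5, 0.01):.3f}")
print(f"10 components: {any_failure_probability(10, 0.01):.3f}")
```

Under this approximation the five-disk stripe comes out 4.5 times better than the 10-disk stripe whatever MTTF and MTTR you plug in, which is in the same range as the drop from about 50 years to 12 quoted above. The second function applies the same idea to cables, switches or servers: with independent failures, every added component raises the chance that something in the chain is down.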
Take advantage of designs that incorporate backplanes
Mechanical design is an issue. Take advantage of designs that incorporate backplanes instead of loose cables. Blade servers and blade storage are good examples. The components plug directly into shelves that eliminate cables and provide direct high-speed connections to each other. The blades take on the characteristics of a single system, in essence reducing the number of individual components and hence contributing to reliability.
We generally tout flexibility as a feature, but it is undisciplined flexibility that introduces human error and contributes to interruptions. Script and automate administrative tasks. Test on isolated servers and networks. Use management software that standardizes procedures and minimizes the number of interfaces to reduce confusion. Change and manual intervention are the biggest causes of application downtime.
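One way to picture "script and automate" is a small runner that applies the same ordered steps to every server and can rehearse them without touching anything. The Python sketch below is a hypothetical illustration, not a reference to any particular management product: the run_procedure function, the step labels and the host name "test-01" are all made up, and the actions only record what they would do.

```python
from typing import Callable, List, Tuple

# A step is a label plus an action that takes a host name (both hypothetical).
Step = Tuple[str, Callable[[str], None]]


def run_procedure(hosts: List[str], steps: List[Step], dry_run: bool = True) -> List[str]:
    """Apply the same ordered steps to every host and return a log of the run."""
    log: List[str] = []
    for host in hosts:
        for label, action in steps:
            if dry_run:
                # Rehearsal: record the step without executing it.
                log.append(f"DRY-RUN {host}: {label}")
            else:
                action(host)
                log.append(f"DONE {host}: {label}")
    return log


# Stand-in actions that only record what they would have changed.
touched: List[str] = []

steps: List[Step] = [
    ("stop service", lambda host: touched.append(f"{host}:stop")),
    ("apply patch", lambda host: touched.append(f"{host}:patch")),
    ("start service", lambda host: touched.append(f"{host}:start")),
]

rehearsal = run_procedure(["test-01"], steps, dry_run=True)   # nothing is changed
real_run = run_procedure(["test-01"], steps, dry_run=False)   # steps are executed

for line in rehearsal + real_run:
    print(line)
```

The point of the design is that the procedure lives in one place: the same list of steps is rehearsed on an isolated test server and then run unchanged in production, so nobody retypes commands by hand.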
Have the backups ready
The third biggest cause of application downtime is the time it takes to recover from a failure. Design your backup solution to recover critical data first. Develop or purchase applications that can support operations while less critical data is still being restored.
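Recovering critical data first comes down to deciding the restore order before the failure happens. A minimal Python sketch, assuming each data set has been given a priority (1 is the most critical) and a size; the data set names, priorities and sizes below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dataset:
    name: str
    priority: int    # 1 = critical, larger numbers can wait (assumed scale)
    size_gb: float


def restore_order(datasets: List[Dataset]) -> List[Dataset]:
    """Critical data first; within a priority, smallest first so it is usable sooner."""
    return sorted(datasets, key=lambda d: (d.priority, d.size_gb))


# Hypothetical backup catalog.
catalog = [
    Dataset("email archive", priority=3, size_gb=300.0),
    Dataset("order-entry database", priority=1, size_gb=40.0),
    Dataset("reporting warehouse", priority=2, size_gb=120.0),
    Dataset("customer records", priority=1, size_gb=15.0),
]

plan = restore_order(catalog)
for position, dataset in enumerate(plan, start=1):
    print(f"{position}. {dataset.name} ({dataset.size_gb:.0f} GB)")
```

With this catalog the two priority-1 data sets come back first and the email archive comes back last, so operations can resume while the bulk of the restore is still running.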
The focus on application efficiency thus far has been on reliability -- minimizing interruptions. The other side of the efficiency story is performance tuning. Good performance can squeeze productive time out of each day and save money by fully utilizing available resources. But with tuning comes change. Proceed with caution!
For more information, check out Bob Feihel's white paper, "Application efficiency strategy via information systems consolidation."
About the author: Bob Feihel is an independent consultant, currently working with CSM Consulting, with more than 20 years of experience in the computer industry. He has held various engineering, sales, professional services and management positions with EMC, Data General and United Technologies. His background includes technical marketing, specialized software development, business systems analysis and design, and systems integration, with experience in automotive data collection, aviation products, and open-systems servers and storage systems. He is a Boston University certified project manager.