Our countdown, brought to you by SearchStorage.com high-availability expert Evan Marcus, includes some common-sense tips for the everyday storage admin to follow.
#9: Invest in failure isolation
Apps should check for all error conditions
- Act on them when you find them
- Requires developer training
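Here is a minimal sketch of what "check for all error conditions" can mean for a single file write. It assumes a POSIX-style OS and uses only Python's standard `os` module. The function name `write_all` and the file name `ledger.dat` are hypothetical. The sketch checks for short writes, forces the data to disk with `fsync()`, and then acts on any failure instead of ignoring it.

```python
import os
import tempfile


def write_all(path: str, data: bytes) -> None:
    """Write data to path and force it to disk, checking every step."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested. A short write
            # is not an exception, so ignoring the count silently loses data.
            n = os.write(fd, view)
            if n == 0:
                raise OSError(f"write to {path} made no progress")
            view = view[n:]
        # A successful write() may only have reached the OS cache; fsync()
        # reports errors from the device itself.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical target file in a scratch directory.
    target = os.path.join(tempfile.mkdtemp(), "ledger.dat")
    try:
        write_all(target, b"record 1\n")
    except OSError as exc:
        # Act on the error: stop and report instead of carrying on as if
        # the record were safely stored.
        raise SystemExit(f"could not persist record: {exc}")
    print("stored", target)
```

The training point is visible here. A bare `write()` call that is never checked would compile and run just as well, which is why these checks have to be a habit rather than an afterthought.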
Failure in one component shouldn't propagate
- Network failures not seen by router or network management layer
- Disk failure not seen by application after write error
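The disk example can be sketched as a mirroring layer that contains a failure at the level where it occurs. This is only an illustration. The classes `Replica` and `MirroredVolume` are hypothetical in-memory stand-ins for disks and a volume manager, not any real product's API. When one mirror returns a write error, the layer takes that mirror out of service and the write still succeeds. The application sees an error only if no copy could be written.

```python
from typing import Dict, List


class Replica:
    """Hypothetical storage device; set `broken` to simulate a disk failure."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.broken = False
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if self.broken:
            raise OSError(f"{self.name}: write error")
        self.blocks[block] = data


class MirroredVolume:
    """Hypothetical mirroring layer that isolates single-disk failures."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.online: List[Replica] = list(replicas)
        self.offline: List[Replica] = []

    def write(self, block: int, data: bytes) -> None:
        succeeded = 0
        for replica in list(self.online):  # copy: we may remove while iterating
            try:
                replica.write(block, data)
                succeeded += 1
            except OSError:
                # Contain the failure here: retire the bad disk and continue.
                # A real system would also alert an administrator.
                self.online.remove(replica)
                self.offline.append(replica)
        if succeeded == 0:
            # Only when every copy fails does the error propagate upward.
            raise OSError("all mirrors failed")


if __name__ == "__main__":
    disk0, disk1 = Replica("disk0"), Replica("disk1")
    volume = MirroredVolume([disk0, disk1])
    disk1.broken = True
    volume.write(0, b"payload")  # no exception reaches the application
    print([r.name for r in volume.offline])  # ['disk1']
```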
Catching errors late probably means data corruption
- Error has propagated through system
- May leave other unknown side-effects
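One way to catch errors early is to verify data at the boundary where it enters a component, before anything downstream consumes it. The following is a minimal sketch built on an assumed record format: a length and a CRC32 prefix, using the standard `struct` and `zlib` modules. The names `encode_record` and `decode_record` are hypothetical. The reader rejects a damaged record immediately instead of passing it on.

```python
import struct
import zlib

# Hypothetical on-disk format: big-endian (payload length, CRC32) then payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_record(raw: bytes) -> bytes:
    """Return the payload, or raise ValueError if the record is damaged."""
    if len(raw) < HEADER.size:
        raise ValueError("truncated header")
    length, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: record is corrupt")
    return payload


if __name__ == "__main__":
    record = encode_record(b"balance=100")
    print(decode_record(record))  # b'balance=100'
    damaged = bytearray(record)
    damaged[-1] ^= 0x01  # flip one bit in the payload
    try:
        decode_record(bytes(damaged))
    except ValueError as exc:
        print("rejected:", exc)
```

A CRC detects accidental corruption such as bit flips and truncation, but it does not protect against deliberate tampering. The design choice that matters is the placement of the check. Because it runs at the first read, a bad record stops there instead of spreading through the system as described above.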
This material is copyright 1997-2002 by Evan Marcus and Hal L. Stern. It may not be used in whole or part for commercial purposes without the express permission of both authors.