Tips for the everyday admin -- #9

Evan Marcus

Evan Marcus is our expert in high availability. Evan is also a Principal Engineer at Veritas Corp.

Our countdown, brought to you by high availability expert Evan Marcus, includes some common sense tips for the everyday storage admin to follow.

#9: Invest in failure isolation

Apps should check for all error conditions
- Act on errors as soon as you find them
- Requires developer training
A failure in one component shouldn't propagate to others
- Example: a network failure that the router or network management layer does not see
- Example: a disk failure that the application does not see after a write error
Catching errors late probably means data corruption
- The error has already propagated through the system
- It may leave other unknown side effects
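
As a rough illustration of the points above, here is a minimal Python sketch of a write path that checks each step and stops a disk error from spreading into the rest of the application. The names `StorageError`, `durable_write` and `save_record` are hypothetical and chosen only for this example. Writing to a temporary file and renaming it afterward is one common approach, not the only one.

```python
import contextlib
import os


class StorageError(Exception):
    """Hypothetical error type: a write could not be confirmed on disk."""


def durable_write(path, data):
    """Write bytes to path, checking every step; raise StorageError on failure."""
    tmp_path = path + ".tmp"
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the OS to push the data to the device so that a disk
            # error surfaces here, not at some later, unrelated point.
            os.fsync(f.fileno())
        # Replace the old file only after the new data is safely written.
        os.replace(tmp_path, path)
    except OSError as exc:
        # Clean up the partial file so the failure leaves no side effects.
        with contextlib.suppress(OSError):
            os.remove(tmp_path)
        raise StorageError(f"write to {path} failed: {exc}") from exc


def save_record(path, record):
    """Return True if the record was saved, False if the write failed."""
    try:
        durable_write(path, record)
        return True
    except StorageError:
        # The failure stops here: the previous file contents are untouched
        # and the caller gets a clear signal to retry or raise an alert.
        return False
```

A caller can then act on the result immediately, for example `if not save_record("orders.dat", payload): ...`, where the file name and `payload` are placeholders. The design choice is to catch the error at the layer where it occurs and convert it into one well-defined signal, so that later code never works with data that was only assumed to be written.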

Also, visit our bookstore for Evan's book: Blueprints for high availability: Designing resilient distributed systems.

This material is copyright 1997-2002 by Evan Marcus and Hal L. Stern. It may not be used in whole or part for commercial purposes without the express permission of both authors.
