Availability, part 9: Look beyond clustering for high availability

When clustering works, and when you should try something else.

In previous tips in this series on system availability, we introduced the idea that implementing availability requires taking a layered approach. Following that approach, we looked at good system administration practices, backups, disks and storage (and why larger disks aren't always better), networking, the system's local environment, and server-based applications. The eighth level in the Availability Index (introduced in part one) is clustering.

It is very interesting to me that clustering is always used as a synonym for high availability. If you take only one point away from this series of columns, it should be that you cannot achieve high availability simply by implementing clustering and walking away. High availability isn't clustering any more than it's mirroring or replication.

When you cluster two systems together (let's not worry about more than two right now), the second system (call it peppermints) automatically steps in for the first, incense, should incense stop working for some reason. All clustering really does is allow the critical services that were running on incense to recover more quickly than they would if peppermints weren't in the picture.

Since failover always adds some risk (what if the other machine is down, or what if something has changed since the last failover and nobody updated the failover configuration along the way?), it's always better NOT to need to fail over. It's therefore better to build your systems so that they can survive as many types of outages as possible without having to fail over.

I have had users call me and tell me that they had a critical homegrown application that was crashing their server every four hours. Could they use clustering software to make it more highly available? The answer to that question is yes. They *could* use clustering software.

The better question is, should they? And that answer is no.

Actually, the real question they are asking is what they can do to increase the availability of their critical application. And while clustering would increase the availability, there is a much better solution: fix the application.

Fixing the application is more work than implementing the cluster, and may take longer to complete. It's more expensive. It's harder. But it's the right way to increase the application's availability. Attack the root cause; anything else is a band-aid fix.
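To put rough numbers on that argument, here is a minimal sketch that compares the crashing application on a standalone server, the same application behind a cluster, and a fixed application. Only the four-hour crash interval comes from the story above. The recovery times and the 30-day crash interval for the fixed application are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
def availability(minutes_up: float, minutes_to_recover: float) -> float:
    """Fraction of time the service is up: uptime / (uptime + downtime)."""
    return minutes_up / (minutes_up + minutes_to_recover)


def outages_per_day(minutes_up: float, minutes_to_recover: float) -> float:
    """How many interruptions users see in an average 24-hour day."""
    return (24 * 60) / (minutes_up + minutes_to_recover)


CRASH_INTERVAL = 4 * 60          # from the story: a crash every four hours
REBOOT_RECOVERY = 15.0           # assumed: minutes to reboot and restart by hand
FAILOVER_RECOVERY = 2.0          # assumed: minutes for the cluster to fail over
FIXED_CRASH_INTERVAL = 30 * 24 * 60  # assumed: fixed app crashes once in 30 days

scenarios = {
    "standalone": (CRASH_INTERVAL, REBOOT_RECOVERY),
    "clustered": (CRASH_INTERVAL, FAILOVER_RECOVERY),
    "fixed app": (FIXED_CRASH_INTERVAL, REBOOT_RECOVERY),
}

results = {
    name: (availability(up, recover), outages_per_day(up, recover))
    for name, (up, recover) in scenarios.items()
}

for name, (avail, outages) in results.items():
    print(f"{name:>10}: {avail:.3%} available, {outages:.2f} outages per day")
```

Under these assumptions the cluster does raise availability, but users still see roughly six interruptions a day, and every one of them depends on a failover succeeding. The fixed application does better on both measures without any failover at all.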

If clustering were always the right way, or the only way, to increase system availability, then it would not be at the eighth level of the Availability Index. The construction of a critical highly available system is like the construction of a tall building; if you start on the eighth floor, the building simply will not stand. If you try building highly available systems on top of shoddy applications, bad disks, and an unreliable operating system, clustering simply won't help. The system will continue to suffer interruptions.

As we'll discuss in a future column, although it costs money to implement the tools that give your systems their required level of availability, it often costs more money when the systems are down. Whether or not it makes financial sense to implement a particular protective measure depends on the cost of doing so, balanced against the value that the application will deliver by being up a greater percentage of the time. There is no universal right answer.

Life would be so much easier if there were.
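The cost-versus-value comparison described above can be sketched as simple arithmetic. Every figure below is hypothetical, as is the function name; the point is only that the same protective measure can pay off for one application and lose money for another.

```python
def net_benefit(measure_cost: int, downtime_cost_per_hour: int,
                hours_down_before: int, hours_down_after: int) -> int:
    """Value of the downtime avoided in a year, minus what the measure costs."""
    hours_avoided = hours_down_before - hours_down_after
    return hours_avoided * downtime_cost_per_hour - measure_cost


# Hypothetical: a measure costing 50,000 a year that cuts downtime
# from 20 hours a year to 4 hours a year.
MEASURE_COST = 50_000
HOURS_BEFORE, HOURS_AFTER = 20, 4

# Hypothetical applications with very different costs of being down.
order_entry = net_benefit(MEASURE_COST, 10_000, HOURS_BEFORE, HOURS_AFTER)
internal_wiki = net_benefit(MEASURE_COST, 500, HOURS_BEFORE, HOURS_AFTER)

print(f"order entry:   {order_entry:+,}")
print(f"internal wiki: {internal_wiki:+,}")
```

With these made-up numbers the measure is clearly worth buying for the order entry system and clearly not worth it for the wiki, even though the cost and the downtime reduction are identical.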

Evan L. Marcus is the Data Availability Maven at VERITAS Software. You can contact him at evan@veritas.com.

This was first published in February 2003
