
This article can also be found in the Premium Editorial Download "Storage magazine: Top 15 Storage hardware and software Products of the Year 2006."


How long until the ERP app is up?


It's a familiar refrain: The business unit wants to know how long the application will be unavailable if it sustains a failure, and IT's standard answer is "It depends." Why? The recovery time depends on the type of clustering, the operating system, the time needed to return the database to a stable state and other factors. When an active-passive cluster sustains a failure, the passive server must complete several steps before it can activate the application:
  1. The passive server activates access to the storage through the operating system. This action is similar to the action taken during a normal boot, but the only disks activated are those associated with the database application. Journaling techniques reduce the time to activate.
  2. The passive server then starts the database application on its node. Because the disk has now been activated to the operating system, the database application loads in memory and begins to check the status of the database tables on the recently activated disks.
  3. The database application, now running on the passive server, recovers the tablespace from what it perceives as a power failure.
  4. Once the tablespace is back to usable form, the passive server associates the floating ERP IP with its internal NIC and begins to service user requests.
When an active-active cluster has a database node fail, several of the above steps aren't required. Because the database application is already running in memory on multiple nodes and already has access to the disk, users won't perceive any downtime. There's no need to restart the app in memory, or to activate the disk to the operating system and then to the database app. This approach increases application uptime during a failover, but failover over distance isn't possible and upgrades are subject to some restrictions.
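The failover sequence above can be sketched as a simple time budget. This is a minimal illustration, not a measurement: the step durations are hypothetical placeholders, and the names `FailoverStep` and `estimated_outage_seconds` are invented for this example. It only shows why the answer is "It depends": an active-passive outage is roughly the sum of the steps, while an active-active node failure skips them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverStep:
    name: str
    seconds: float  # hypothetical duration; measure in your own environment


# The four active-passive steps described above, with placeholder timings.
ACTIVE_PASSIVE_STEPS = [
    FailoverStep("activate database disks in the OS", 30.0),
    FailoverStep("start database application", 60.0),
    FailoverStep("recover tablespace", 120.0),
    FailoverStep("bind floating ERP IP to NIC", 5.0),
]

# In an active-active cluster the database is already running on the
# surviving nodes with disk access, so none of the steps apply.
ACTIVE_ACTIVE_STEPS = []


def estimated_outage_seconds(steps):
    """Sum the step durations, assuming the steps run one after another."""
    return sum(step.seconds for step in steps)


if __name__ == "__main__":
    print(estimated_outage_seconds(ACTIVE_PASSIVE_STEPS))
    print(estimated_outage_seconds(ACTIVE_ACTIVE_STEPS))
```

Replacing the placeholder timings with figures measured during a failover test would turn "It depends" into a number the business unit can plan around.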

Business function vs. ERP app
In the 1990s, I was part of a team that implemented a clustered SAP environment using the latest techniques available at the time. We were confident that if we sustained a server failure, we'd be able to keep the SAP application, along with the central instance, up and functional. Users might see a pause of the application, but it would be up and running in minutes.

Several months into our implementation we sustained a server failure, and our SAP instance moved as planned from the production server to the failover server. Before the IT team could celebrate its success, however, the business unit reported that the company wasn't able to accept Electronic Data Interchange (EDI) orders. The EDI application wasn't communicating with the active SAP application, a significant problem because 85% of the company's orders were received electronically.

Though we had carefully protected the SAP application, the SAP central instance and the underlying Oracle database, we failed to protect the business function of taking and processing an order. Most business functions rely on several applications that must also be protected to increase the availability of what's important to the business.
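One way to catch this kind of gap is to list the applications each business function depends on and compare that list with what is actually protected. The sketch below assumes a hand-maintained dependency map; the `DEPENDENCIES` table and both function names are hypothetical, and the entries simply mirror the order-taking example above.

```python
# Hypothetical dependency map: business function -> applications it needs.
DEPENDENCIES = {
    "take and process an order": [
        "SAP application",
        "SAP central instance",
        "Oracle database",
        "EDI application",
    ],
}


def unprotected_dependencies(function, protected_apps):
    """Return the applications the function needs that are not protected."""
    return [app for app in DEPENDENCIES[function] if app not in protected_apps]


def function_is_protected(function, protected_apps):
    """A business function is protected only if every dependency is."""
    return not unprotected_dependencies(function, protected_apps)


if __name__ == "__main__":
    protected = {"SAP application", "SAP central instance", "Oracle database"}
    print(unprotected_dependencies("take and process an order", protected))
```

With only the three SAP and Oracle components protected, the check reports the EDI application as the missing piece.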

For an application to provide data to users, three elements must work in harmony: the user request must traverse the network to the correct subnet; an application must be running at the IP address that answers the request; and the application must have access to the underlying data. Clustering software controls these aspects of responding to a user request.
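The three elements can be modeled as a checklist in which a request succeeds only when all three hold. This is a rough sketch with invented names (`RequestPath`, `failed_elements`); it does not represent how any particular clustering product performs its checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPath:
    network_reaches_subnet: bool  # request can traverse to the right subnet
    app_answers_at_ip: bool       # an application is running at that IP
    app_can_access_data: bool     # the application can reach its data

    def failed_elements(self):
        """Return the names of the elements that are currently failing."""
        checks = {
            "network": self.network_reaches_subnet,
            "application": self.app_answers_at_ip,
            "data": self.app_can_access_data,
        }
        return [name for name, ok in checks.items() if not ok]

    def can_serve(self):
        """Users get data only when all three elements are working."""
        return not self.failed_elements()


if __name__ == "__main__":
    path = RequestPath(True, True, False)
    print(path.can_serve(), path.failed_elements())
```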

Many ERP applications build "load balancing" into the application architecture, which gives the ERP application additional scalability. ERP applications such as Oracle and SAP are architected to support multiple "application servers." Any application server can respond to a user request, but because there are multiple servers at this layer and they don't access the data directly, these servers aren't single points of failure and therefore aren't clustered.

Each application server must communicate with a database server. The database server and underlying storage are considered single points of failure. Because clustering is all about availability, ERP clustering activities are focused on the database server (see "How long until the ERP app is up?"). The goal is to eliminate the single points of failure of the database server and its underlying storage.
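The reasoning about where to focus clustering can be expressed as a count of nodes per tier: any tier with fewer than two nodes is a single point of failure. The tier names and counts below are illustrative assumptions, and treating storage as a simple node count is a deliberate simplification.

```python
def single_points_of_failure(tier_node_counts):
    """Return the tiers that have fewer than two nodes able to do the job."""
    return [tier for tier, count in tier_node_counts.items() if count < 2]


if __name__ == "__main__":
    # Hypothetical ERP landscape before clustering the database tier.
    landscape = {
        "application servers": 3,
        "database server": 1,
        "storage": 1,
    }
    print(single_points_of_failure(landscape))
```

In this example the application servers drop out of the list because there are several of them, leaving the database server and its storage as the clustering targets.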

This was first published in February 2007
