Published: 08 Dec 2004
K-Box Kicks Replication Up a Notch
Often, the inclination is to look to the major players when researching replication products, but there are some startup firms offering solid alternatives. One such vendor is Kashya Inc., San Jose, CA. Aramaic in origin, the word Kashya literally means "a difficult, puzzling problem," which is a bit of a paradox because the company's product solves much of the mystery behind storage area networks (SANs) and data replication.
Kashya's KBX5000, an IP-based replication appliance, is a cost-effective alternative to array- and some host-based solutions. It's a second-generation product that comprises software delivered on commodity hardware in the form of IBM xSeries 335 blades. A typical configuration would include a KBX5000 at each site (primary and remote), but Kashya also supports pairs of clustered KBX5000s to yield a total of four KBX5000s between the two sites (see Clustered KBX5000 replication configuration).
Up and running
A standard KBX5000 configuration includes a single QLogic Fibre Channel (FC) host bus adapter (HBA) to connect to the SAN fabric, as well as a Gigabit Ethernet NIC to replicate configured volumes over an IP WAN pipe to a second KBX5000.
Kashya, or one of its professional services partners, handles the KBX5000 installation. The boxes are delivered with the application software installed; the installers perform node discovery and configure the source and target sites and the consistency groups.
An out-of-band, heterogeneous app, the KBX5000 captures data destined for the managed volume by subscribing to the same multicast group as the volume. This is done by registering with the Alias Server at the FFFFF8 address in the fabric. When writes are sent to a managed volume in the multicast group, the same blocks are sent to the local KBX5000 buffer, shipped across the IP WAN to the receiving KBX5000's IP port and written to an FC-attached volume at the remote site.
As an update to the KBX4000, the KBX5000 now certifies support of active-active Windows clusters across hundreds of miles, as well as direct-attached storage (DAS). Last month, the KBX5000 also received a Solaris-ready certification from Sun.
The heterogeneous nature of the KBX5000 and its non-disruptive implementation give users options when deploying remote disk targets. For example, you could have an EMC Clariion with SATA drives acting as a target for a Symmetrix supporting local writes, but could reap greater savings by using a less-expensive box as the target, assuming it adheres to applicable standards.
As long as the QLogic HBA firmware is compatible with your fabric, and both support multicast groups, the KBX5000 should discover every node on the SAN by communicating with the management server in the fabric. The Gig-E port is a standard NIC interface used to deliver enveloped FC frames in IP packets.
The quality of the IP WAN pipe between the KBX5000s is of paramount importance. Performance will degrade on a "noisy" line with faulty connections, and exposure to data loss increases during synchronous replication because write acknowledgements are sent as soon as the data block reaches the buffer of the local KBX5000. The WAN pipe should be a quality connection with scalable performance characteristics.
The KBX5000 management console:
The Kashya Management Console is clean and uncluttered, and offers a configuration view, KBX5000 status monitor and volume details.
The KBX5000 interface
The KBX5000's management graphical user interface (GUI) is intuitive and provides a visual representation of your storage infrastructure (see "The KBX5000 management console" on this page). On the left side (New York) of the pane are hosts, switches and storage with clustered KBX5000s (K-Boxes) connected to the SAN fabric and IP WAN pipe. The same configuration, minus the application hosts, is represented on the right side (Boston) of the pane.
Once the KBX5000 on the left discovers the nodes on the SAN, it's easy to include the nodes in a consistency group and replicate volume data to the remote SAN according to service-level policies. You can change the direction of the replication of discovered volumes to allow for disaster recovery support.
Consistency groups are the heart of Kashya's arrangement. They are logical representations of the hosts, storage and applications that share common business policies governing the level of service volumes experience while replicating.
By marshalling SAN nodes and applications into consistency groups, a storage administrator can apply replication quality-of-service (QoS) policies to each application.
Bandwidth consumption is potentially the largest line item of any replication technology. Kashya tries to control this cost by implementing a patent-pending 7:1 compression algorithm (depending on the application), as well as features found in other replication technologies. For example, when the KBX5000 recognizes that the same block of data is updated within the same snapshot window, only the last version of the changed block is sent across the WAN pipe.
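The repeated-update optimization described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Kashya's implementation: the function name `coalesce_writes` and the representation of a write as a `(block_address, data)` tuple are assumptions made for the example.

```python
from collections import OrderedDict


def coalesce_writes(writes):
    """Keep only the last write per block address within one snapshot window.

    `writes` is an iterable of (block_address, data) tuples in arrival order.
    The result lists each block once, ordered by its most recent write.
    """
    latest = OrderedDict()
    for block_address, data in writes:
        # A later write to the same block supersedes the earlier one,
        # so drop the old entry and re-add the block at the end.
        latest.pop(block_address, None)
        latest[block_address] = data
    return list(latest.items())


# Five writes arrive in one window; block 10 is updated three times.
window = [(10, b"a1"), (11, b"b1"), (10, b"a2"), (12, b"c1"), (10, b"a3")]
to_send = coalesce_writes(window)
print(to_send)  # three blocks cross the WAN instead of five writes
```

In this toy window, only the final version of block 10 is shipped, which is where the bandwidth saving comes from.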
Furthermore, large block sizes (64KB) typically associated with larger database objects or audio/video files can be subdivided by the KBX5000, with only the delta sent to the remote replicated volume, again saving on bandwidth. Additional compression is possible for applications such as Oracle, SQL Server and DB2 because Kashya has tailored its compression algorithms to match the output data characteristics of those applications.
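To make the delta idea concrete, here is a minimal Python sketch that subdivides a 64KB block and keeps only the pieces that changed. The 4KB chunk size, the function name `changed_chunks` and the byte-for-byte comparison are assumptions for illustration; the article does not say how the KBX5000 subdivides blocks or detects changes.

```python
def changed_chunks(old_block, new_block, chunk_size=4096):
    """Return (offset, chunk) pairs for each chunk of new_block that differs."""
    if len(old_block) != len(new_block):
        raise ValueError("blocks must be the same length")
    deltas = []
    for offset in range(0, len(new_block), chunk_size):
        old_chunk = old_block[offset:offset + chunk_size]
        new_chunk = new_block[offset:offset + chunk_size]
        if old_chunk != new_chunk:
            deltas.append((offset, new_chunk))
    return deltas


BLOCK_SIZE = 64 * 1024  # the 64KB block size mentioned in the text

old = bytes(BLOCK_SIZE)   # previous contents of the block (all zeros)
updated = bytearray(old)
updated[5000] = 0xFF      # one byte changes in the second 4KB chunk
updated[60000] = 0x01     # another byte changes near the end of the block
new = bytes(updated)

deltas = changed_chunks(old, new)
bytes_sent = sum(len(chunk) for _, chunk in deltas)
print(len(deltas), "chunks,", bytes_sent, "of", BLOCK_SIZE, "bytes sent")
```

Under these assumptions, two small changes cost two chunks on the wire rather than the full 64KB block.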
Big buffers, superior compression algorithms and delta differentials are no longer a luxury in long-distance data replication. More and more users expect these benefits to be included as part of an overall replication configuration.
Selecting a replication policy is not a matter of choosing synchronous or asynchronous delivery, as in most solutions. In the Kashya scheme, policies manage the bandwidth of the IP WAN pipe by specifying minimum and maximum lag times, or even a minimum bandwidth to provide QoS functionality on the outgoing KBX5000's IP port.
The KBX5000 will toggle between synchronous and asynchronous delivery modes as needed to keep up with demanding applications or even to provide a choke point for non-critical applications consuming too much bandwidth. This is a smarter, more flexible approach because very few applications experience the same traffic behavior all the time. There are times when an application will benefit more from synchronous delivery, and times when asynchronous is ideal. As long as there's engineering to hide the switching between delivery modes from the user and to ensure that synchronous writes receive the same perceived level of service as asynchronous writes, then all is well. Otherwise, an application may respond slowly when the KBX5000 switches from asynchronous to synchronous delivery, making the user wait for the remote commit.
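One plausible way to picture policy-driven mode switching is a simple rule with hysteresis on replication lag. The sketch below is hypothetical: the `LagPolicy` fields, the `next_mode` function and the switching rule itself are assumptions for illustration, not Kashya's documented algorithm.

```python
from dataclasses import dataclass


@dataclass
class LagPolicy:
    """Hypothetical per-consistency-group policy, lag bounds in seconds."""
    min_lag_s: float
    max_lag_s: float


def next_mode(current_mode, lag_s, policy):
    """Pick the delivery mode for the next interval from the measured lag."""
    if lag_s > policy.max_lag_s:
        # The remote copy is too far behind: pace writes synchronously.
        return "sync"
    if lag_s < policy.min_lag_s:
        # The remote copy has caught up: release the application again.
        return "async"
    # Between the two bounds, keep the current mode to avoid flapping.
    return current_mode


policy = LagPolicy(min_lag_s=1.0, max_lag_s=5.0)
mode = "async"
history = []
for lag in [0.5, 3.0, 6.0, 4.0, 2.0, 0.8]:
    mode = next_mode(mode, lag, policy)
    history.append(mode)
print(history)
```

Keeping the current mode between the two bounds is a design choice in this sketch: it prevents rapid toggling when the lag hovers near a single threshold.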
Kashya solves this problem by using a large buffer in the KBX5000. When an application server performs a synchronous write to a managed volume, it does so by sending a second copy of the write to the KBX5000's buffer. At that time, the write returns to the application, the user continues to work and the KBX5000 is responsible for guaranteeing the write of the data at the other end.
This implies that the IP WAN pipe must be high quality and resilient, which in turn implies increased cost. When the application is told that the remote synchronous write has been completed, the highest probability of successful completion is required to ensure the integrity of the replicated data.
Kashya's implementation is more cost efficient than most array-to-array replication technologies, but its implementation of synchronous writes indirectly increases the cost of the IP network portion of the configuration, as well as the chance that data will be lost if the buffer isn't highly available.
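The exposure described above can be illustrated with a toy model of a write buffer that acknowledges before the remote commit. The class `LocalBuffer` and its methods are hypothetical names for this sketch, which models only the ordering of events and none of the appliance's actual internals.

```python
from collections import deque


class LocalBuffer:
    """Toy model: writes are acknowledged once buffered, then shipped later."""

    def __init__(self):
        self.pending = deque()  # acknowledged but not yet at the remote site
        self.remote = {}        # stands in for the remote volume

    def write(self, block_address, data):
        self.pending.append((block_address, data))
        return "ack"  # the application continues before the remote commit

    def drain(self, max_writes):
        """Ship up to max_writes buffered writes across the (simulated) WAN."""
        shipped = 0
        while self.pending and shipped < max_writes:
            block_address, data = self.pending.popleft()
            self.remote[block_address] = data
            shipped += 1
        return shipped


buf = LocalBuffer()
acks = [buf.write(address, b"x") for address in range(5)]
buf.drain(max_writes=3)  # the WAN only keeps up with three of the five writes
at_risk = len(buf.pending)
print(at_risk, "acknowledged writes exist only in the local buffer")
```

Any write still in `pending` has been acknowledged to the application but would be lost if the buffer failed, which is why the buffer's availability and the quality of the WAN pipe matter.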
Clustered KBX5000 replication configuration:
This is a typical configuration for a clustered implementation of Kashya's KBX5000 replication appliances. These out-of-band appliances can dynamically switch between synchronous and asynchronous replication to respond to application demands.
KBX5000 vs. other techniques
When it comes to IP storage replication, there's no shortage of products. There are array-to-array applications such as EMC's SRDF and HDS' TrueCopy, and host-based products like Veritas Volume Replicator (VVR). An array-based solution implies similar equipment at both sites. Additionally, implementation of synchronous writes when replicating across great distances can affect application performance. While VVR offers a more cost-effective approach than array-based replication, the amount of resources "stolen" from other production processes running on the host will constantly need to be budgeted and revised according to the data growth and availability of the server. The KBX5000, on the other hand, scales independently of the amount of storage per server and doesn't require installing agents on the managed hosts in the consistency group.
And because the KBX5000 doesn't "touch" the array or install host-based agents in the local SAN, license management is limited to the KBX5000 itself, with costs dependent on the amount of storage being replicated. Array- and host-based replication products are typically priced on a per-unit basis in addition to the amount of data replicated.
Note, however, that a Kashya driver must be installed on every host whose volumes are replicated. The driver intercepts each write, then sends one copy to the KBX5000 for replication and the other to the source volume.
Better price, performance
All of this adds up to the KBX5000 offering better price performance than array-based, long-distance replication, while being competitive with the more resource-intensive, host-based products.
For example, to replicate 10TB of data using four KBX5000s with the unlimited snapshot and bandwidth-reduction options and a three-year software maintenance agreement (WAN and SAN infrastructure costs are not included), the cost breakdown is as follows:
- Four KBX5000 appliances: $25,500
- Two base licenses (for up to 2TB): $28,800
- Two unlimited snapshots option: $24,000
- Bandwidth-reduction option: $12,000
- Three-year software maintenance: $26,730
- Implementation services: $2,500
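The article does not state a total, so the short calculation below simply sums the listed figures. It assumes each figure is the full price for its line rather than a per-unit price, which the list does not make explicit; the dictionary keys are shorthand labels for this example.

```python
# Line items as listed above, in US dollars.
line_items = {
    "Four KBX5000 appliances": 25_500,
    "Two base licenses": 28_800,
    "Two unlimited snapshots option": 24_000,
    "Bandwidth-reduction option": 12_000,
    "Three-year software maintenance": 26_730,
    "Implementation services": 2_500,
}

REPLICATED_TB = 10  # the 10TB example from the text

total = sum(line_items.values())
cost_per_tb = total / REPLICATED_TB

print(f"Total: ${total:,}")
print(f"Per replicated TB: ${cost_per_tb:,.0f}")
```

On that reading, the listed items come to $119,530, or roughly $11,950 per replicated terabyte, before WAN and SAN infrastructure costs.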
The KBX5000 is a non-disruptive, out-of-band replication appliance that, by dint of its large buffers, doesn't impede the performance of synchronous writes. This improves application response time and user productivity, but there's still a chance that resynching might be required and a small potential for data loss if the buffer is not supported by battery backup.
By allowing the appliance to profile WAN traffic at the application level and to dynamically switch between synchronous and asynchronous replication policies, Kashya lets applications share the IP WAN pipe without one of them squeezing out another whose traffic is more "bursty" in nature.
But perhaps the biggest benefit the KBX5000 provides is its support of heterogeneous hosts, switches and storage. Because KBX5000 integration happens in the SAN fabric, any nodes that have successfully performed fabric login can be discovered and managed in a consistency group. This is almost always a superior approach to array- and host-based solutions.