One of the major issues facing today's corporate data centers is growth -- the sheer volume of data is growing at a rate anywhere from 60% to 100% each year. Moving such substantial volumes of data across a wide area network (WAN) can quickly become impractical. It takes longer to move that data from one point to another, and the cost of adding bandwidth can be prohibitive. Companies are turning to WAN optimization and WAN acceleration technologies that improve bandwidth utilization by reducing the effective amount of data being transferred and alleviating the inefficiencies inherent in WANs. Some of the most common data reduction tactics are examined below.
Reducing redundant data
A surprising portion of corporate data is redundant; this includes redundant files, blocks and even bytes. As a simple example, a typical backup data set may contain numerous copies of the same PowerPoint presentation or .pdf documents. By eliminating redundant data, particularly at the block or byte level, WAN bandwidth consumption can be reduced dramatically. This is generally referred to as data deduplication (aka intelligent compression). WAN optimization appliances can eliminate redundant data from all TCP traffic and improve data reduction even further for performance-sensitive traffic, like file sharing, email, CAD/PDM, ERP, databases, VoIP, video or Citrix. Some WAN optimization vendors claim to reduce WAN bandwidth requirements by as much as 60% to 95%.
Cache is an essential part of WAN optimization because data must be stored on both sides of the WAN link. There are several reasons for this. Caching reduces the burden on data center storage and allows the appliances at both ends of the WAN link to compare their data and find changes or differences. Having content cached at both ends of the WAN link also helps to guard against inevitable WAN disruptions: a remote user can sometimes continue to work on a locally cached copy of data, which is resynchronized later when the WAN is restored. Enterprise-class WAN optimization appliances may provide over 1 terabyte (TB) of local disk storage capacity.
Cache plays a central role in data reduction. In a storage system, data deduplication stores only one copy of the data on disk. The process is similar on WAN appliances, which read and index all WAN traffic and cache the unique pieces of data to disk on the appliance at each end of a WAN link. Pieces of data that have been seen before and are already indexed are not transferred across the WAN -- only a very small reference to the indexed data is sent. One reference can refer to potentially huge amounts of data that have already been transferred over the WAN.
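The index-and-reference scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular appliance works: the fixed 4 KB chunk size, the use of SHA-256 digests as references, and the names DedupSender, DedupReceiver and wire_bytes are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products may chunk differently


class DedupSender:
    """Sending-side appliance: remembers which chunks it has already sent."""

    def __init__(self):
        self.seen = set()  # digests of chunks already transferred

    def encode(self, data: bytes):
        messages = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                # Seen before: send only the 32-byte reference.
                messages.append(("ref", digest))
            else:
                self.seen.add(digest)
                messages.append(("raw", chunk))
        return messages


class DedupReceiver:
    """Receiving-side appliance: caches chunks so references can be resolved."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def decode(self, messages) -> bytes:
        out = bytearray()
        for kind, payload in messages:
            if kind == "raw":
                self.store[hashlib.sha256(payload).digest()] = payload
                out += payload
            else:
                out += self.store[payload]  # look up the cached chunk
        return bytes(out)


def wire_bytes(messages) -> int:
    """Payload bytes that would cross the WAN (framing overhead ignored)."""
    return sum(len(payload) for _, payload in messages)
```

In this sketch, sending the same 16 KB document twice costs 16,384 bytes the first time and only four 32-byte references the second time, because both ends already hold the chunks in cache.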
TCP-based networks generally exchange data in small packets, so a transfer requires many packets, and each packet may require acknowledgement to ensure successful delivery. This packet overhead -- combined with the latency incurred from hubs, switches, routers and other network hardware across the WAN -- can seriously impair application performance. Some forms of WAN acceleration dramatically increase packet sizes. Fewer, larger packets can be handled far more efficiently, accelerating the network traffic. Some vendors claim to reduce packet requirements by 65% to over 90%.
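A quick back-of-the-envelope calculation shows why packet size matters. The payload sizes below are hypothetical figures chosen for illustration, and the 40-byte header assumes a basic IPv4 header plus a basic TCP header with no options; actual numbers depend on the network and the product.

```python
HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(total_bytes: int, payload_bytes: int) -> int:
    """Number of packets required to carry total_bytes (ceiling division)."""
    return -(-total_bytes // payload_bytes)


def header_overhead(total_bytes: int, payload_bytes: int) -> int:
    """Total header bytes sent alongside the data."""
    return packets_needed(total_bytes, payload_bytes) * HEADER_BYTES


transfer = 1_000_000           # hypothetical 1 MB transfer
small, large = 500, 4000       # hypothetical payload sizes in bytes

before = packets_needed(transfer, small)   # 2000 packets
after = packets_needed(transfer, large)    # 250 packets
reduction = 100 * (before - after) / before
print(f"{before} -> {after} packets ({reduction:.1f}% fewer)")
```

With these assumed sizes the packet count falls by 87.5%, which sits inside the range vendors quote, and the header overhead falls in the same proportion.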
Certain application types, like CIFS, MAPI, NFS, HTTP/S and MS-SQL, can also be "chatty," requiring dozens (even hundreds) of handshakes to perform basic operations, like loading a file. Again, WAN latency multiplies the impact of these handshakes and impairs apparent performance. So, another way to accelerate WAN performance is to reduce application handshaking. Some WAN acceleration appliances claim to reduce the handshaking of key application types by over 90%. This in turn speeds up the application across a WAN.
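The effect of latency on a chatty protocol is simple arithmetic: each handshake that must complete before the next one starts costs one round trip. The sketch below uses hypothetical figures (200 sequential round trips, an 80 ms WAN round-trip time) and ignores transfer time and server processing.

```python
def handshake_delay_ms(round_trips: int, rtt_ms: int) -> int:
    """Time spent waiting on sequential handshakes, in milliseconds."""
    return round_trips * rtt_ms


RTT_MS = 80              # hypothetical WAN round-trip time
chatty = 200             # hypothetical round trips to open one file
optimized = chatty // 10  # the same operation with 90% fewer round trips

print(handshake_delay_ms(chatty, RTT_MS))     # 16000 ms, i.e. 16 seconds
print(handshake_delay_ms(optimized, RTT_MS))  # 1600 ms, i.e. 1.6 seconds
```

On a LAN with a round-trip time near 1 ms, the same 200 handshakes would add only about 0.2 seconds, which is why such protocols feel fast locally and slow across a WAN.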
Compression and changes
WAN utilization can also be reduced outside of an appliance, often using backup or replication software with compression or delta differencing features. Compression applies an algorithm to the data, replacing redundant data with small tokens. The tokens are converted back into the original data by reversing the compression algorithm during a read operation. Conventional compression typically achieves a modest data reduction of 2 to 1 -- effectively cutting your data volume in half. However, not all data compresses equally well. For example, text data can achieve high levels of compression, while audio and image data, which are often already stored in compressed formats, will compress little, if at all.
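The difference between compressible and incompressible data is easy to observe with Python's standard zlib module. This is a rough illustration only: the repeated sentence is an artificially redundant stand-in for text, and random bytes are used as an assumed stand-in for media that is already compressed, so the ratios will not match real-world data sets.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


# Highly redundant text compresses very well.
text = b"The quick brown fox jumps over the lazy dog. " * 2000

# Random bytes have no redundancy to remove, much like already-compressed media.
noise = os.urandom(len(text))

print(f"text:  {compression_ratio(text):.1f} to 1")
print(f"noise: {compression_ratio(noise):.2f} to 1")

# Compression is lossless: decompressing restores the original bytes exactly.
assert zlib.decompress(zlib.compress(text)) == text
```

The random input typically comes out marginally larger than it went in, which is why backup software often skips compression for file types that are known to be compressed already.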
Another means of data reduction is to move only new or different data across the WAN -- a technique known as delta differencing. In this scenario, a baseline backup or data set is established and any changes are tracked, then only the changes are periodically transferred across the WAN. For example, rather than making a complete 500 GB remote backup each night, users might only transfer the 30 GB of changes that occurred. Differencing takes far less time, significantly reducing the backup window, and it can allow the same job to run across a slower and less costly WAN link.
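One simple way to find "only the changes" is to compare fixed-size blocks of the current data against hashes recorded for the baseline. The sketch below assumes that approach; the 4 KB block size and the function names block_hashes, make_delta and apply_delta are hypothetical, and commercial products may track changes quite differently (for example, by journaling writes).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def block_hashes(data: bytes):
    """SHA-256 digest of each fixed-size block of the baseline."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def make_delta(baseline_hashes, current: bytes):
    """Return ({block index: new block}, new length) for changed or added blocks."""
    delta = {}
    for index, offset in enumerate(range(0, len(current), BLOCK_SIZE)):
        block = current[offset:offset + BLOCK_SIZE]
        is_new = index >= len(baseline_hashes)
        if is_new or hashlib.sha256(block).digest() != baseline_hashes[index]:
            delta[index] = block
    return delta, len(current)


def apply_delta(baseline: bytes, delta, new_length: int) -> bytes:
    """Rebuild the current data at the remote site from baseline plus delta."""
    old_blocks = [baseline[i:i + BLOCK_SIZE]
                  for i in range(0, len(baseline), BLOCK_SIZE)]
    block_count = -(-new_length // BLOCK_SIZE)  # ceiling division
    rebuilt = [delta[i] if i in delta else old_blocks[i]
               for i in range(block_count)]
    return b"".join(rebuilt)[:new_length]
```

Only the blocks in the delta cross the WAN; the remote site combines them with the baseline it already holds. A one-byte change therefore costs one block, not a full copy of the data set.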