Flash-based storage systems promise to eliminate many of the performance challenges that data centers face today. This allows for the design of denser server and virtual desktop infrastructures and more scalable database environments. These denser, more scalable environments produce a faster and longer-lasting return on investment, but storage efficiency techniques must be used to offset the extra cost of flash-based systems.
One of the least talked about components of flash efficiency is replication. These days, replication is thought of as the ability to synchronously or asynchronously replicate data to a remote disaster recovery site. It can also be used to replicate to a second system locally (as it has been used historically for high availability of applications). This creates a multi-tiered recovery approach that can protect against storage system failure as well as site failure.
Replication should be a requirement for flash-based storage systems, just as it is for any other storage system. By definition, flash systems host the most active data set in the environment. Consequently, ensuring that the data on them is actively protected and available should be a priority. Once-a-day backups go out of date too quickly to be useful for these systems, and recoveries that copy data back to a primary system take too much downtime. Both methods are too inefficient for the data sets that require flash storage.
Interestingly, replication is often the last feature that start-up vendors add to their new flash storage systems. In fact, many vendors have not yet delivered the functionality. As a result, the storage administrator is left to go to third parties to make sure that this critical feature is provided. The problem is that these third-party products create an inefficient replication process that does not leverage the array's storage efficiency features, such as deduplication and compression. That gap can make third-party replication software a bad fit for all-flash systems.
Replicate locally to eliminate backups
Flash-based systems, unlike hard drive-only systems, have the performance attributes to hold many snapshots for an extended period of time. Flash-based or flash-assisted snapshots can, for all intents and purposes, replace much of what we rely on the legacy backup process for: recovery of recently deleted or corrupted data. Leveraging snapshots for this purpose could mean a dramatic shift in the backup process. The classic backup copy routine would only be needed for a monthly or quarterly data archive for retention purposes. Snapshots would fulfill the more immediate needs for recovery.
The one weak spot in this strategy is the storage system itself. If the storage system fails, then all the "backups" stored in snapshots are also lost. Given the reliability of today's storage systems, the chances of a storage system failure for an extended period of time are low. However, the consequences of complete data loss are severe enough that IT professionals will want to take extra measures to protect themselves.
In the past, that involved making a separate copy to tape or backup disk. But what if you simply replicated that data in near real time to another storage array in the data center before you replicated it to the DR site? This approach would allow for completely protected snapshots retained for a significant period of time. The strategy could almost eliminate traditional backup to disk, and tape could be reserved for the long-term archiving of data.
Replication for disaster recovery
Today, the more common use case for a storage system replication feature is disaster recovery. In most cases, this means the near-real-time asynchronous copying of data to a disaster recovery site. This type of technology has been available in hard drive-based storage systems for over a decade, but flash changes the game, and new considerations need to be taken into account.
DR site performance
The first consideration, and one that is often overlooked, is the performance and efficiency of the target storage system located in the DR site. While third-party software replication tools provide the ability to replicate data from any array to any array, there is some risk involved in this flexibility. As flash systems become the norm, infrastructures will become denser and more highly scaled. In other words, the infrastructures and even the applications themselves will come to count on flash performance. Returning to hard drive performance may be more than unacceptable; the application may not be able to run effectively without flash at all.
If the DR site is going to be used as a place to conduct business during a disaster, then the DR storage systems need to have some flash complement. That need not be an all-flash array, but it should at least be a hybrid array; otherwise the performance of the applications while running from the DR site may be so bad that the users consider them "down."
WAN bandwidth utilization
Once the needs of the target storage system in the DR site are understood, the next consideration is WAN bandwidth efficiency. The problem with using independent software replication vendors with all-flash systems is that they can't leverage the data efficiency capabilities on the flash storage system. They have to implement their own data efficiency techniques to optimize WAN traffic. This is typically done by identifying changed blocks, compressing them, and then transferring them over the wire. Few software-based replication tools have the ability to do deduplication.
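To make the "identify changed blocks, compress, transfer" sequence concrete, here is a minimal Python sketch. It is an illustration only, not how any particular replication product works: the 4 KB block size and the function names (`changed_blocks`, `build_payload`, `apply_payload`) are assumptions, and a real tool would track changes through a journal or bitmap rather than comparing two full images.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size in bytes (illustrative)


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Yield (block_index, data) for every block of `new` that differs from `old`."""
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if block != old[offset:offset + block_size]:
            yield offset // block_size, block


def build_payload(old: bytes, new: bytes):
    """Compress each changed block; the result is what would cross the WAN."""
    return [(index, zlib.compress(block)) for index, block in changed_blocks(old, new)]


def apply_payload(target: bytearray, payload, block_size: int = BLOCK_SIZE) -> None:
    """Decompress each block on the DR side and write it at its original offset."""
    for index, compressed in payload:
        data = zlib.decompress(compressed)
        start = index * block_size
        target[start:start + len(data)] = data
```

Note that nothing here checks whether an identical block already exists at the target, which is the gap deduplication would fill.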
But deduplication can play an important role when used in conjunction with block-level copies. Once a new block of data has been created or an old block has been modified, replication software without deduplication capabilities transfers that data immediately. If a third-party replication software product had deduplication capabilities, it could first check whether the data has already been sent to the DR site from either the same server or another one. While there is additional latency involved in checking the data stored on the other side of the WAN connection, if these extra checks could cut the volume of transfers to, say, a fifth of what it would otherwise be, WAN replication would be significantly more efficient.
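The check-before-send idea can be sketched in a few lines of Python. This is a hypothetical model, not a real product's protocol: the `DedupReplicator` class, its `remote_has` method and the in-memory `remote_store` dictionary are all invented here, with the dictionary standing in for the DR site and `remote_has` standing in for the WAN round trip that adds the latency mentioned above.

```python
import hashlib


class DedupReplicator:
    """Toy model of replication that asks the DR site before sending a block."""

    def __init__(self):
        self.remote_store = {}  # fingerprint -> block; stands in for the DR site
        self.bytes_sent = 0     # bytes that would have crossed the WAN

    def remote_has(self, fingerprint: bytes) -> bool:
        # In a real deployment this lookup is a WAN round trip (extra latency).
        return fingerprint in self.remote_store

    def replicate(self, block: bytes) -> bytes:
        fingerprint = hashlib.sha256(block).digest()
        if self.remote_has(fingerprint):
            # Duplicate: send only the 32-byte fingerprint as a reference.
            self.bytes_sent += len(fingerprint)
        else:
            # New data: send the fingerprint plus the block itself.
            self.remote_store[fingerprint] = block
            self.bytes_sent += len(fingerprint) + len(block)
        return fingerprint
```

In this model, replicating the same 4 KB block five times moves the block once plus five 32-byte fingerprints, rather than five full copies.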
Because most flash systems already run efficiency techniques on the primary storage system, they can leverage those techniques as they transmit data across the WAN to the DR site (or to the second system on site described above). The engine that performs the compression and deduplication could also send the data to the remote site. There would be no need to do a "dedupe double-check" with the remote side because only unique data would qualify for replication. The result would be an entirely integrated process adding almost no extra latency to get data to the DR site.
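A rough Python sketch of that integrated path follows. It is an assumption-laden illustration rather than any vendor's design: the `PrimaryArray` class and its fields are hypothetical, and it assumes the DR target mirrors the primary's set of unique blocks, so the primary's own deduplication index is enough to decide what must be sent.

```python
import hashlib
from collections import deque


class PrimaryArray:
    """Toy array whose inline dedupe engine also feeds the replication queue."""

    def __init__(self):
        self.blocks = {}                  # fingerprint -> unique block data
        self.volume_map = {}              # logical block address -> fingerprint
        self.replication_queue = deque()  # messages bound for the DR site

    def write(self, lba: int, data: bytes) -> None:
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blocks:
            # First time the array sees this data: store it and queue it once.
            self.blocks[fingerprint] = data
            self.replication_queue.append(("data", fingerprint, data))
        # Every write, duplicate or not, updates the map and sends a small pointer.
        self.volume_map[lba] = fingerprint
        self.replication_queue.append(("map", lba, fingerprint))
```

The decision to send block data is made from the local index during the write itself, so no query to the remote side is needed before transmitting.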
A third-party software application could be developed to perform deduplication and compression as well as replication. This would mean turning off the deduplication and compression provided by the vendor (some vendors have always-on optimization) or selecting a flash vendor without these capabilities and adding the software. Doing so would also provide the user with greater flexibility in hardware selection. We expect software-defined storage products to eventually fulfill this capability.
For some reason, replication is a final checkmark for many storage system suppliers when it should be one of the first. If they don't have the capability available, the tendency is to push customers to third-party replication options to fulfill the local site copy and DR need. The problem with this approach is that these software applications can't leverage the data efficiency techniques, such as deduplication and compression, that are already built into the array. Integrating replication into the storage system is an important next step for flash array vendors and one that customers should be demanding. Alternatively, customers should begin to research software-defined storage products that can provide both data efficiency and replication.