- George Crump, Storage Switzerland
Public and private clouds are no longer an either-or option for most organizations. Instead, hybrid clouds are considered the best cloud practice today. Businesses also want flexibility in selecting which public cloud is used and the ability to move between clouds. Connecting private and public cloud remains challenging, however.
Latency, bandwidth and in-cloud performance all affect what data is placed in the cloud and how it is accessed. So instead of trying to find the perfect cloud application that can meet all of an organization's needs, IT professionals should look for solutions for specific cloud storage use cases.
Read on to find out the most effective strategies for connecting on-site infrastructure and private clouds with one or more public clouds under the most common cloud storage use cases. Those cases include cloud bursting, cloud as primary compute and storage, cloud as a backup and disaster recovery target, and cloud as a data archive.
Most organizations build their data centers, both in terms of storage and computing, for the worst-case scenario when peak demands are made on their resources. In between these high-demand peaks, a majority of resources go unused. When enough workloads are added or current ones scale close to the limits of a data center's capabilities, organizations typically budget for additional investment in these resources. The goal of cloud bursting is to break this costly cycle of continuously staying ahead of the demand curve.
With a solid cloud bursting strategy, organizations can design their data centers for the norm instead of the peak. When demand goes beyond current data center resources, they can start certain applications or workloads in the cloud.
The quality of the connection for cloud bursting use cases is largely dependent on the amount of pre-planning and how much notification an enterprise has prior to needing to push workloads into the cloud. With the right amount of pre-planning, a relatively standard, business-class internet connection is enough. Pre-planning requires replicating data into the cloud prior to the peak. The replication needs to be continuous so that the cloud copy is no more than a few minutes out of sync with the on-premises copy. The pre-seeding approach also has value for disaster recovery, since critical applications are pre-positioned in the cloud. The downside of pre-seeding the cloud with potential burst candidates is that cloud storage resources are continually consumed, which adds to the cost.
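Whether a standard business-class connection can keep a pre-seeded copy within a few minutes of the on-premises copy comes down to simple arithmetic: the link must carry changed data at least as fast as it is produced. The sketch below illustrates that check. The function names, the decimal unit conversion and the 20% protocol overhead are illustrative assumptions, not figures from any particular replication product.

```python
def required_mbps(changed_gb_per_hour: float, overhead: float = 1.2) -> float:
    """Sustained link speed (megabits/s) needed to ship changes as they occur.

    `overhead` is an assumed allowance for protocol and retransmission cost.
    """
    megabits_per_hour = changed_gb_per_hour * 8 * 1000  # GB -> megabits, decimal units
    return megabits_per_hour / 3600 * overhead


def link_keeps_up(link_mbps: float, changed_gb_per_hour: float,
                  overhead: float = 1.2) -> bool:
    """True if continuous replication will not fall steadily behind."""
    return link_mbps >= required_mbps(changed_gb_per_hour, overhead)


if __name__ == "__main__":
    # Hypothetical workload: 45 GB of changed data per hour at peak.
    print(f"{required_mbps(45):.0f} Mbps needed")
    print("200 Mbps link keeps up:", link_keeps_up(200, 45))
```

If the change rate at peak exceeds what the link can sustain, the cloud copy drifts further out of sync exactly when a burst is most likely, so the estimate should use peak change rates rather than daily averages.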
If an organization wants to move workloads to the cloud more dynamically, it will have to invest in a much faster connection. A more dynamic approach does not consume additional cloud resources and makes it easier to pick and choose which workloads to move to the cloud at the moment peak demand occurs. There are also applications available that move data more efficiently than typical file transfer utilities.
The cloud as primary storage
Potentially, the most interesting of the cloud storage use cases -- but also the most challenging -- is using the cloud as primary storage or primary compute. Using the cloud as primary storage requires resolving any latency issues. Unlike cloud backup and recovery -- where the connection concern is mostly a bandwidth issue -- primary storage is typically more transactional, making latency the primary concern.
A main use for the cloud as primary storage is network-attached storage. Vendors in this space focus on creating a cloud-hosted file system that can automatically ensure a copy of the most active data is on an on-premises appliance or edge device. If an on-premises user modifies or changes that data, the cloud copy is updated. If the user accesses a file that is not on the local edge device, it is then retrieved from the cloud. In most cases, unless the file is large, the time to retrieve it is barely noticeable.
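The edge-appliance behavior described above can be sketched as a small read-through cache with least-recently-used eviction. This is a minimal illustration under stated assumptions, not any vendor's design: the `EdgeCache` class is hypothetical, a plain dictionary stands in for the cloud-hosted file system, capacity is counted in files rather than bytes, and writes update the cloud copy immediately (a real appliance may update it asynchronously).

```python
from collections import OrderedDict


class EdgeCache:
    """Toy model of an on-premises edge device in front of a cloud file system."""

    def __init__(self, cloud: dict, capacity: int):
        self.cloud = cloud            # stand-in for the cloud-hosted file system
        self.capacity = capacity      # max number of files kept locally
        self.local = OrderedDict()    # path -> data, least recently used first

    def read(self, path):
        if path in self.local:        # hit: serve locally, mark as recently used
            self.local.move_to_end(path)
            return self.local[path]
        data = self.cloud[path]       # miss: retrieve from the cloud
        self._store(path, data)
        return data

    def write(self, path, data):
        self.cloud[path] = data       # the cloud copy is updated
        self._store(path, data)

    def _store(self, path, data):
        self.local[path] = data
        self.local.move_to_end(path)
        while len(self.local) > self.capacity:
            # Evict the least recently used file; the cloud still holds it.
            self.local.popitem(last=False)
```

Because the cloud always holds the full data set, eviction never loses data. It only means the next read of that file pays the retrieval delay.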
Enterprises can easily put these edge appliances in all their data centers and remote offices since all storage is effectively in one place: the cloud. A few vendors in this space have also added global file-locking capabilities; if a file is in use in one location, users in all locations see a read-only notification when accessing the same file.
Some of these systems also support multi-cloud use. When a volume is created, the administrator can connect it to a specific cloud account. Moving data between providers requires moving a copy from one volume to another, which means all data routes back through the on-premises appliance.
Primary block storage cloud instances are more challenging than file storage. First, applications are not as patient as users waiting on files. Applications will time out and even crash if data isn't accessed quickly enough. In the past, the only way to ensure application stability was to make the on-premises appliance large enough that the chances of data not being on it were very small. The problem is that this approach doesn't save money.
There are two methods to resolve this issue. First, many cloud providers now have direct-connect options where a standard enterprise storage system is directly connected to cloud computing resources. Vendors will work with hosting providers located within close proximity of the public cloud provider so a high-speed connection is possible. This type of scenario means an organization will use the cloud primarily for its computing resources and a more traditional storage system for storing data. It can also use backup applications to back up this traditional storage system, and store those backups in cloud storage. Again, because the connection is so fast and so close, these backups are done relatively quickly.
Multi-cloud facilitates data transfers
There are several managed hosting facilities that have direct access to multiple public cloud providers. These facilities are physically and geographically so close to the public cloud providers' data centers that the cloud providers' computing resources can access storage with latency similar to storage within the providers' data centers. Since the data is "stationary," no migration efforts are required. If the organization wants to use another provider to access more powerful or cheaper compute functionality, then IT can easily move between providers.
Another option is to tier data to a secondary tier prior to storing data in the cloud. With these multi-tier offerings, organizations can implement relatively small flash-based caches on premises for active data, which is tiered to a geographically close secondary provider to store the warm data. Once the data is very cold, it is stored exclusively in the cloud. The result is an on-premises cache equal in size to the capacity of daily active data, with warm data only a few milliseconds away so as not to interfere with application execution. All data is replicated to the cloud as it is created or modified, but that replication is asynchronous, so it doesn't affect production performance. This cloud copy acts as a disaster recovery copy. It also means that as data ages out of the first and second tiers, it doesn't actually need to be copied. It is simply deleted from those tiers since it is already on the cloud tier.
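The multi-tier placement logic can be sketched in a few lines. The one-day and 30-day windows, the tier names and both function names below are assumptions chosen for illustration. Actual products define their own aging policies.

```python
from datetime import timedelta

HOT_WINDOW = timedelta(days=1)    # assumed: "daily active data" stays on premises
WARM_WINDOW = timedelta(days=30)  # assumed cutoff for the nearby secondary tier


def serving_tier(age: timedelta) -> str:
    """Fastest tier expected to hold data last accessed `age` ago."""
    if age <= HOT_WINDOW:
        return "on_prem_cache"
    if age <= WARM_WINDOW:
        return "secondary"
    return "cloud"


def age_out(tiers: dict, key: str, age: timedelta) -> None:
    """Drop `key` from tiers it has aged out of.

    No copy is made: the cloud tier already holds the data because every
    write was replicated there asynchronously.
    """
    if key not in tiers["cloud"]:
        raise KeyError(f"{key!r} has not been replicated to the cloud yet")
    target = serving_tier(age)
    if target != "on_prem_cache":
        tiers["on_prem_cache"].discard(key)
    if target == "cloud":
        tiers["secondary"].discard(key)
```

The guard in `age_out` reflects the one condition the approach depends on: data may only be deleted from the faster tiers once the asynchronous cloud replication has caught up.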
The multi-tier primary cloud storage strategy typically supports multiple clouds, but since data is eventually stored in a single cloud as a central repository, movement between providers is the same as any other migration effort. The on-premises appliance strategy, by contrast, can point to multiple clouds, but more than likely all data will need to be migrated back on premises before being sent to the new provider.
Cloud backup and recovery
The most popular, and often the initial, use for connecting on-premises infrastructure and public clouds is data backup and recovery. Thanks to technologies like compression, deduplication and block-level incremental backups, the connection between an on-premises backup storage system and public cloud storage doesn't need to be particularly high speed. A basic business-class connection typically suffices.
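To show why block-level incremental backup keeps bandwidth needs low, the sketch below hashes fixed-size blocks and reports only those that changed since the previous backup. It is a simplified illustration: the function names and the 4 KB default block size are assumptions, and real backup products typically add compression, deduplication across files and handling for data that shrinks.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data: bytes, block_size: int = 4096):
    """Indices of blocks that are new or differ from the previous backup."""
    current = block_hashes(data, block_size)
    return [
        index for index, digest in enumerate(current)
        if index >= len(previous_hashes) or digest != previous_hashes[index]
    ]


if __name__ == "__main__":
    # Tiny 4-byte blocks so the effect is visible in a short example.
    baseline = block_hashes(b"aaaabbbbcccc", block_size=4)
    to_send = changed_blocks(baseline, b"aaaaXbbbcccc", block_size=4)
    print("blocks to upload:", to_send)
```

Only the blocks in the returned list need to cross the connection, which is why a small daily change rate translates into a small upload regardless of total data size.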
In terms of on-premises backup storage, each vendor treats it differently. Legacy backup vendors often view on-premises storage as the primary backup copy and the cloud copy as for disaster recovery only. The cloud is viewed as a replacement for tape. Other more modern backup software offerings view public cloud storage as a more tangible asset. An on-premises device serves as a cache or tier, and older backups are moved to the public cloud tier automatically based on access time. The advantage of the cache-tier method is that the on-premises investment is relatively small and seldom needs to be upgraded.
While compression, deduplication and block-level incremental backup have lowered the bandwidth required by the backup process, backup vendors only recently addressed restores by taking advantage of disaster recovery as a service. DRaaS enables the recovery of applications as a cloud virtual machine, temporarily eliminating the concern of connection speed back to the on-premises data center. In the event of a disaster, all data movement is within the cloud data center and does not require an internet connection. Depending on how the software uses cloud resources, applications can return to operation within four hours of a disaster being declared.
When IT decides to move the application back on premises, internet bandwidth will be a concern, unless the cloud provider has the ability to ship data in bulk. While many DRaaS tools can replicate data back on premises in the background, doing so with a low-bandwidth connection will take days, if not weeks. Unfortunately, deduplication and block-level incremental backup technology won't help with recovery speed.
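The "days, if not weeks" estimate follows directly from data size and link speed. The sketch below works through that arithmetic. The function name, the decimal units and the 80% link-efficiency default are assumptions for illustration.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move `size_tb` terabytes over a `link_mbps` connection.

    `efficiency` is the assumed fraction of nominal bandwidth actually usable.
    """
    megabits = size_tb * 8 * 1_000_000           # TB -> megabits, decimal units
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    # Hypothetical failback: 10 TB over a 100 Mbps business-class link.
    print(f"{transfer_days(10, 100):.1f} days")
```

Running the same calculation for the actual recovered data set before a disaster is declared helps decide in advance whether background replication is acceptable or a bulk shipping option is needed.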
What do you do with backup data once it's in the cloud?
In many cases, an organization may want to have the data it has backed up into the cloud just sit there, only to be used when a recovery request comes in. In other cases, it may want to do more with the data by using cloud computing resources. For example, because DRaaS uses cloud computing resources instead of just cloud storage, an organization may want to use the cloud copy of its data for testing, development or running reports and analytics. The challenge is that most backup applications store data in a proprietary format that can't be directly read by cloud computing resources. This means IT needs to first restore the data so it is in a native format. If the time to perform that restore is prohibitive, look for a backup application that stores data in a native format.
Archive old data
A cloud archive may actually be the best use case as it doesn't typically require any changes to network bandwidth and provides a significant ROI. Archive products analyze on-premises production storage for data that has not been accessed in a user-defined period of time, typically more than one year. Those files are then moved to a secondary storage device that is less expensive on a per-terabyte basis.
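The scan that archive products perform can be approximated with a short script that walks a directory tree and reports files whose last-access time is older than a threshold. This is a sketch of the idea only: the function name and the 365-day default are assumptions, and last-access times are unreliable on file systems mounted with options such as `noatime`, so a real product may rely on other signals.

```python
import os
import time


def archive_candidates(root, max_idle_days=365, now=None):
    """Yield (path, size_in_bytes) for files not accessed in `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86_400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_atime < cutoff:
                yield path, info.st_size


if __name__ == "__main__":
    total = sum(size for _path, size in archive_candidates("."))
    print(f"{total / 1e9:.2f} GB eligible for archive")
```

Summing the sizes gives a quick estimate of how much production capacity an archive would free up before committing to any product.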
The problem with traditional archive offerings is that they require a significant upfront investment in a secondary storage system, typically 50 TB or more. Most organizations don't have 50 TB of capacity to archive on the first day. And even if they do, they won't want to archive it all at once. Cloud archiving solves this problem by gradually archiving data, on a per-gigabyte basis as necessary.
The gradual approach enables the use of a more modest bandwidth connection. Archives are, by definition, rarely accessed, so this scenario does not have the same bandwidth-retrieval concerns as other cloud storage use cases.
One area of concern is metadata. Most storage I/O is metadata-related. For example, if a user wants to perform a listing of a cloud-archived directory, then all that metadata needs to be sent over the broadband connection, which takes time. To get around the metadata problem, some vendors store metadata locally and in the cloud so queries access the local metadata copy and the user experiences an instant response.
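A local metadata copy can be as simple as a catalog that is updated whenever a file is archived and consulted for every listing. The sketch below is a hypothetical illustration of that pattern, not a vendor implementation: the `ArchiveCatalog` class is an assumption, it tracks files only (not subdirectories), and it keeps everything in memory.

```python
import posixpath


class ArchiveCatalog:
    """Local index of archived files; listings never touch the cloud."""

    def __init__(self):
        self._entries = {}  # full path -> (size_in_bytes, modified_time)

    def record(self, path: str, size: int, mtime: float) -> None:
        """Call when a file is moved to the cloud archive."""
        self._entries[posixpath.normpath(path)] = (size, mtime)

    def listdir(self, directory: str):
        """Return sorted (name, size) pairs for files directly in `directory`."""
        directory = posixpath.normpath(directory)
        return sorted(
            (posixpath.basename(path), size)
            for path, (size, _mtime) in self._entries.items()
            if posixpath.dirname(path) == directory
        )
```

Only an actual file open needs to reach the cloud. Browsing, searching and size reporting are answered from the local catalog.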
Most cloud archive offerings can send data to multiple clouds, and some can even support multiple clouds simultaneously. The challenge in switching providers is the cost and time required to move data from one cloud location to another, which -- especially in the case of archives -- can mean moving a large amount of information. The same problem appears if an organization wants to move an archive out of the cloud and store its old data on premises because the rental cost of the cloud is more expensive than the "buy" price of an on-premises object store. Moving previously archived cloud data back on premises is time-consuming and expensive due to network and egress fees.
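The rent-versus-buy trade-off can be estimated with a rough break-even calculation. Every rate in the sketch below is a placeholder to be replaced with real quotes, the function names are illustrative, and the model deliberately ignores on-premises operating costs such as power, space and administration.

```python
def egress_cost_usd(size_tb: float, egress_usd_per_gb: float) -> float:
    """One-time fee to pull `size_tb` terabytes out of a cloud provider."""
    return size_tb * 1000 * egress_usd_per_gb


def breakeven_months(size_tb: float, cloud_usd_per_gb_month: float,
                     onprem_purchase_usd: float, egress_usd_per_gb: float) -> float:
    """Months of cloud rent that equal the cost of buying and repatriating."""
    monthly_rent = size_tb * 1000 * cloud_usd_per_gb_month
    one_time = onprem_purchase_usd + egress_cost_usd(size_tb, egress_usd_per_gb)
    return one_time / monthly_rent


if __name__ == "__main__":
    # Placeholder figures only: 100 TB archive, hypothetical rates.
    months = breakeven_months(100, cloud_usd_per_gb_month=0.01,
                              onprem_purchase_usd=19_000, egress_usd_per_gb=0.05)
    print(f"break-even after about {months:.0f} months")
```

Including the egress fee in the one-time cost shows how leaving a provider pushes the break-even point further out, which is the expense the paragraph above warns about.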
Connecting your private cloud to the public cloud is easier than ever. There are a variety of cloud storage use cases where on-premises and public cloud storage work well together. There are others where the public cloud may be a suitable replacement. Organizations need to develop a plan and gradually execute it, possibly in stages. It makes sense to focus on specific use cases and move to others as success and comfort with the cloud take hold.