If you spend any time in or even near a data center, you've undoubtedly heard the word "cloud" at least five times already today. When talking about storage infrastructure, however, hardware and software vendors have largely sidestepped the issue of cloud storage services. But with many companies looking to cloud services to help ease their infrastructure burden, hardware and software vendors can no longer afford to ignore cloud service offerings.
Storage vendors, both hardware and software, now offer on-premises products that enable access to data stored externally. In this emerging market, products range from treating cloud storage as another production storage tier to enabling long-term archival. Regardless of the product, the goal is to offer hybrid cloud benefits that enable an on-premises experience with data stored in the cloud.
Gateway to the cloud
Most organizations are unlikely to move 100% of their data into cloud storage services. However, the relatively low cost of cloud services has encouraged many organizations to test the waters, so to speak, of accessing data externally. The most common implementation of this strategy is a hybrid model that uses a cloud gateway. Cloud gateways are simply software or hardware appliances that aim to treat cloud data as on-premises data. These types of products come in three forms: gateway appliances, array-based tiering software and combinations that work both ways.
A cloud gateway appliance serves to accelerate access to data in the cloud. It looks very much like a traditional storage array, except that it has limited local storage capacity. Unlike a storage array, it ultimately writes data to a cloud storage provider rather than keeping it on local storage.
The appliance uses integrated caching software to keep a local copy of the data most likely to be needed and readily accessible; older data is sent to the cloud service. When data needs to be recalled from the cloud, the appliance leverages the speed of its on-premises flash storage to mask the network latency that slows the retrieval of the data stored in the cloud.
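The caching behavior described above can be illustrated with a minimal sketch. It assumes a least-recently-used eviction policy, which is only one plausible choice; actual gateway products may use different or more elaborate heuristics. The class name `GatewayCache`, its methods and the in-memory dictionary standing in for the cloud service are all hypothetical.

```python
from collections import OrderedDict


class GatewayCache:
    """Toy model of a cloud gateway cache (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity    # max number of objects held locally
        self.local = OrderedDict()  # local flash cache, least recently used first
        self.cloud = {}             # stand-in for the cloud storage service

    def write(self, key, data):
        # Every write ultimately lands in the cloud; a copy stays in the cache.
        self.cloud[key] = data
        self._cache(key, data)

    def read(self, key):
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key], "cache hit"
        data = self.cloud[key]           # slow path: recall over the network
        self._cache(key, data)
        return data, "recalled from cloud"

    def _cache(self, key, data):
        self.local[key] = data
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used object


gw = GatewayCache(capacity=2)
for name in ("a", "b", "c"):
    gw.write(name, name.encode())

print(gw.read("c")[1])  # still in the local cache
print(gw.read("a")[1])  # evicted earlier, so it is recalled from the cloud
```

With a cache of two objects, writing a third pushes the oldest one out of local storage, so a later read of that object has to pay the network round trip. This is the same pressure that forces the cache to grow along with the working set.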
Initial offerings in the space, like Nasuni and Panzura, were limited to supporting NAS protocols, but that is changing. EMC, with its acquisition of TwinStrata, and Microsoft, with its purchase of StorSimple, have extended gateway appliances to block storage. These hybrid cloud benefits can be excellent for a medium-sized organization, which won't have to maintain as much on-site storage. But the appliances do have scalability limitations, so as the working set of data grows, the amount of cache in the appliance will also need to grow.
Most traditional storage arrays include tiering software to move data between classes of storage media within the device. Extending the array's ability to tier data to a cloud storage service is a natural adaptation of existing array capabilities. These hybrid cloud benefits appeal to organizations as they provide a simple and effective way to take advantage of low-cost cloud storage without having to add devices or alter processes significantly.
But this type of cloud tiering design comes with one major challenge: network latency. Applications, and the people who use them, are impatient. The typical answer to latency issues is to ensure that only the data that's unlikely to be accessed will be shipped to the cloud storage service. The existing software stack in a mature storage array helps handle this problem.
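As a rough sketch of the "ship only cold data" idea, the snippet below picks tiering candidates by how long they have gone without being accessed. The 90-day threshold, the `Extent` record and the sample inventory are assumptions made for illustration; real array software typically weighs more signals than last-access age.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical unit of data tracked by the array's tiering software."""
    name: str
    size_gb: float
    days_since_access: int


def pick_cold_extents(extents, min_idle_days=90):
    """Return extents idle for at least min_idle_days, coldest first."""
    cold = [e for e in extents if e.days_since_access >= min_idle_days]
    return sorted(cold, key=lambda e: e.days_since_access, reverse=True)


# Made-up inventory, purely for demonstration.
inventory = [
    Extent("payroll-2014", 120.0, 400),
    Extent("crm-live", 80.0, 0),
    Extent("build-logs", 35.5, 95),
]

to_cloud = pick_cold_extents(inventory)
for extent in to_cloud:
    print(f"tier to cloud: {extent.name} ({extent.size_gb} GB)")
```

Raising `min_idle_days` keeps more data on premises and lowers the chance that an application waits on a cloud recall, at the cost of less capacity being offloaded.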
However, many vendor implementations still require a cloud gateway appliance rather than relying on the array software alone. For example, to create this hybrid cloud environment, NetApp filers require NetApp's AltaVault cloud gateway to handle the cloud integration. Other vendors, such as SolidFire, provide the ability to send snapshots directly to cloud storage.
Having to add an appliance to an array incurs additional costs and can complicate configuration and management of a system. But that situation appears to be changing. EMC, for example, will include software-only virtual versions of its CloudArray cloud gateway with its new VCE VxRail hyper-converged systems.
Appliance vs. software distinction blurs
The line between what is a cloud gateway appliance and what is a storage array with cloud-enabled data tiering is far from distinct. A cloud gateway appliance and storage array are really only different in their design and capacity limitations. A storage array is designed from the ground up to store data locally, whereas cloud gateways are designed to mimic the functions of a storage array, and take advantage of cloud storage. Companies like ClearSky Data are blurring this line even further with a storage array designed to cache data locally, in a private regional location and in the cloud. As architectures mature and hybrid cloud benefits become more apparent, it is likely the line between storage arrays and cloud gateways will eventually vanish.
Infinite cloud storage can cause data sprawl
One of the greatest benefits of cloud storage is that it's so easily consumed on an as-needed basis. However, that same ease of expansion can lead to data sprawl. With an endless supply of storage and no way to manage the ensuing sprawl, costs can mount quickly. When evaluating cloud storage service providers, look for those that provide management features to help cope with data sprawl. Several new companies, such as Komprise, have sprung up to help with data sprawl by using data analytics and automation to manage the massive scale of data growth.
Hybrid cloud eases test and dev
Many IT organizations struggle with the new dynamic nature of application testing and development in a DevOps environment. Development and quality assurance staff need to create temporary but complete clones of production environments, which creates a big problem in the storage world, where copying data can be slow and problematic.
Storage vendors have leveraged snapshots and data deduplication to mitigate the capacity costs, but new cloud integrations are emerging in this space. Using snapshotting technology, some storage vendors can create clones of production data in the cloud for use by development teams. With the rise of DevOps and Agile development, relying solely on on-premises infrastructure can hamper development processes.
Hybrid cloud-enabled data storage can help bridge that gap between infrastructure and data agility. Products such as Amazon's AWS Storage Gateway appliance provide software that allows you to snapshot data and copy it to an Amazon EBS cloud storage volume. The development environment can then run on Amazon EC2 for as long as needed. You should, however, be aware of your company's security policies, as some organizations restrict the types of data that may be cloned between production and cloud-based test environments.
Snared by the net
Another challenge to integrating cloud and on-premises storage is network access. In data storage environments, latency is a key performance indicator and a significant factor in accessing data over the Internet. To secure connectivity over the Internet, most companies use virtual private network (VPN) software, but the Internet connection still may not meet the performance requirements for data access.
Cloud providers such as Amazon address this problem by allowing a company to establish dedicated network connections between the organization's data center and the cloud provider's resources. Setting up these connections is relatively easy, but it does require a commitment to a particular cloud provider and added expense.
Safe data in flight and at rest
Security is paramount to every organization, and particularly important when connecting on-premises storage to a cloud service. Most cloud providers offer data encryption at rest, and most of the services previously mentioned provide encryption in flight. However, not all cloud encryption is the same. It's important to understand who will manage the encryption keys. If the cloud service manages them, it technically has the ability to decrypt your data, which may be a regulatory and compliance issue for some organizations.
Keep cloud data available and secure
When considering security, we tend to think of encryption first, but equally important is the ability to integrate with existing authentication products, especially when user-accessible data is involved. Another key factor is the acceptable access time to reach and use the cloud-stored data. Ultra-low-cost cloud storage, like Amazon Glacier, is designed for data that is accessed so infrequently that retrieval time doesn't matter. Using a local cache of data helps offset data access times, but the trick is ensuring that the required data is actually in the cache. Knowing the value of a piece of data will help you understand where data should reside.
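To see why the cache contents matter so much, average access time can be modeled as a weighted blend of local and cloud latency. The function below is a simplified sketch, and the 1 ms and 100 ms figures are illustrative assumptions, not measurements of any product or provider.

```python
def effective_access_ms(hit_ratio, cache_ms, cloud_ms):
    """Average access time when a fraction hit_ratio of reads is served locally."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * cloud_ms


# Assumed latencies: 1 ms from local flash, 100 ms for a cloud recall.
for ratio in (0.50, 0.90, 0.99):
    avg = effective_access_ms(ratio, cache_ms=1.0, cloud_ms=100.0)
    print(f"hit ratio {ratio:.0%}: about {avg:.1f} ms on average")
```

Under these assumed numbers, moving from a 90% to a 99% hit ratio cuts the average from roughly 11 ms to roughly 2 ms, which is why placing the right data in the cache matters more than the raw size of the cache.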
Data that never dies
Many companies are only beginning to realize how much value they can extract from the massive amounts of data they produce. Big data analysis has highlighted the rediscovered value of data long past its traditional life, which means that traditional data retention policies may need to be adjusted.
The benefits of hybrid cloud deployments can have a significant impact on making data more readily available to big data analysis software such as Hadoop. With direct integration into the application software, data can be made truly portable. Hortonworks has embraced the concept of hybrid Hadoop deployments by partnering with Microsoft to allow Hadoop workloads to burst to Azure's storage cloud. Other cloud providers, such as Google, offer software that allows the Hadoop Distributed File System (HDFS) to run directly on their cloud storage.
Cloud storage becoming a clear alternative
Storage hardware and software vendors are finally embracing cloud services as a valid way to store data and augment their systems. Most major hardware array vendors can treat cloud storage like another tier of data, but most still require additional hardware to act as the middleman, and that hardware may be complex to set up.
The value of having an infinite pool of storage for test/dev, big data and many other application workloads is too great to be ignored. Despite their potential complexity, many of the existing integration products offer rich features and tremendous value. It's clear that as hybrid cloud benefits become more readily apparent, user demand for hybrid deployments will grow, and that the future of on-premises storage systems will include some form of built-in cloud access.