Small World Big Data
Published: 02 Jun 2015
Organizations aren't likely to move 100% of their data into cloud services, but most will want to take advantage of cloud storage benefits for at least some data. The best approaches to using cloud storage in a hybrid fashion create a seamless integration between on-premises storage resources and the cloud. The cloud tiering integration can be accomplished with purpose-built software, cloud-enabled applications or the capabilities built into storage systems or cloud gateway products.
Why chase clouds?
This may be the year that public cloud adoption finally moves beyond development projects and Web 2.0 companies and enters squarely into the mainstream of IT. Cloud service providers can offer tremendous advantages in terms of elasticity, agility, scalable capacity and utility pricing. Of course, there remain some unavoidable concerns about security, competitiveness, long-term costs and performance. Also, not all applications or workloads are cloud-ready, and most organizations are not able to operate fully in a public cloud. In practice, these concerns are steering organizations toward a hybrid cloud approach that attempts to combine the best of both worlds.
Taneja Group research supports that view, determining that only about 10% of enterprise IT organizations are even considering moving wholesale into public clouds. The vast majority of IT shops continue to envision future architectures with cloud and on-premises infrastructure augmented by hyperconverged products, at least for the next three to five years. Yet, in those same shops, increasing storage consolidation, virtualization and building out cloud services are the top IT initiatives planned for the next 18 months. These initiatives lean toward using available public cloud capabilities where they make sense -- supporting Web apps and mobile users, collaboration and sharing, deep archives, off-site backups, DRaaS and even, in some cases, as a primary storage tier.
By many estimates, the amount of data that IT shops will have to store, manage, protect and help process is predicted to double every year for the foreseeable future. Given very real limits on data centers, staffing and budget, it will become increasingly difficult to deal with this data growth completely in-house.
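To make the doubling estimate concrete, here is a minimal arithmetic sketch. The 100 TB starting point and the function name are illustrative assumptions, not figures from the research cited above.

```python
def projected_capacity_tb(current_tb: float, years: int, growth_factor: float = 2.0) -> float:
    """Capacity needed after `years` if data grows by `growth_factor` each year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_tb * growth_factor ** years


# Hypothetical starting point: 100 TB under management today.
for year in range(6):
    print(year, projected_capacity_tb(100, year))
```

Under those assumptions, five years of annual doubling turns 100 TB into 3,200 TB, which is why purely in-house growth becomes hard to sustain.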
Using cloud storage
For a number of compelling reasons (see "Why cloud storage?"), many organizations will adopt cloud storage as their data storage needs increase. This adoption will not necessarily be a wholesale migration into the cloud, but will likely take the form of a hybrid storage architecture. The optimal balance for most organizations (at least for the next couple of years) will include some storage infrastructure on-premises while integrating with the cloud tier where it makes the most sense.
There are several approaches to building hybrid cloud storage services. One is to simply move some workloads, like user file sync-and-share, to the cloud, while keeping other, more performance-sensitive applications on-premises. Some of these applications can use cloud storage services like Amazon Web Services' Elastic Block Store (EBS), its new Elastic File System (EFS), or IT-managed, cloud-resident file storage such as SoftNAS' offering.
The most popular cloud storage is based on objects, and many modern applications (applications that can live in either the cloud or on-premises) now support HTTP-based storage protocols (REST-based APIs) to access cloud-friendly object storage directly.
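As a rough illustration of what REST-style object access looks like, the sketch below builds (but does not send) HTTP requests using Python's standard library. The endpoint `objects.example.com`, the bucket and key names, and the helper functions are hypothetical; real services such as S3 or Swift also require authentication headers that are omitted here.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; real services also require signed authentication headers.
BASE_URL = "https://objects.example.com"


def object_url(bucket: str, key: str) -> str:
    """Objects are addressed by bucket and key rather than by block or file path."""
    return f"{BASE_URL}/{urllib.parse.quote(bucket)}/{urllib.parse.quote(key)}"


def build_put(bucket: str, key: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that would store `data` under bucket/key."""
    return urllib.request.Request(
        url=object_url(bucket, key),
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_get(bucket: str, key: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET that would retrieve the object."""
    return urllib.request.Request(url=object_url(bucket, key), method="GET")


put_req = build_put("backups", "2015/06/02/db.dump", b"example payload")
print(put_req.get_method(), put_req.full_url)
```

The point is that whole objects are written and read with ordinary HTTP verbs, which is why applications can reach this kind of storage from either side of the data center boundary.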
Why cloud storage?
There are many compelling motivations for IT to leverage cloud storage. Among these are:
- Low cost per GB
- Near infinite capacity on-demand
- Elastic subscription cost basis
- Little or no up-front Capex investment
- Distributed accessibility
- Regionally distributed replication
- Low Opex (not managed in-house)
- Potential recovery in the cloud (DRaaS)
Object storage products are used with every cloud platform, whether that cloud is private or public. Many object storage products can be set up to run distributed in a hybrid way, naturally spanning across internal data centers and public clouds. Amazon Web Services S3 is the leading object store API, followed closely by the OpenStack Swift APIs for OpenStack cloud builders.
Close cousins to cloud-side storage and distributed object stores are the software-defined storage (SDS) products that can span cloud and on-premises infrastructure, with appliances and/or virtual machine-based storage nodes. Depending on the kind of hybrid architecture you want to build and the level of storage services you need, storage offerings like those from Maxta, Nexenta, Qumulo or Tarmin might be just the ticket.
Gates to cloud heaven
If your applications aren't going to become cloud-aware as fast as you'd like, and you aren't ready to directly manage cloud infrastructure, a cloud gateway could be a good option. Cloud storage gateways basically look like a traditional array to workloads, but internally function as a large high-performance local cache fronting cloud capacity on the back end. Cloud storage gateways can be based on physical or virtual appliances, and can, in some cases, seamlessly replace traditional block and file storage products.
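The "local cache fronting cloud capacity" idea can be sketched in a few lines. This is a toy model, not any vendor's design: the `CloudGateway` class is hypothetical, a plain dictionary stands in for the cloud object store, eviction is simple least-recently-used, and writes go straight through to the cloud (real gateways often journal and buffer writes instead).

```python
from collections import OrderedDict


class CloudGateway:
    """Toy gateway: a small LRU cache in front of a larger, slower backing store."""

    def __init__(self, cloud: dict, cache_size: int):
        self.cloud = cloud              # stand-in for cloud object storage
        self.cache = OrderedDict()      # local cache, least recently used first
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict the least recently used entry

    def write(self, key, value):
        self.cloud[key] = value         # write-through: the cloud copy is always current
        self._remember(key, value)

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        value = self.cloud[key]         # slow path: fetch from the cloud
        self._remember(key, value)
        return value


gw = CloudGateway(cloud={}, cache_size=2)
for name in ("a", "b", "c"):
    gw.write(name, name.upper())
gw.read("c")   # served from the local cache
gw.read("a")   # evicted earlier, so it is fetched from the cloud
print(gw.hits, gw.misses)
```

The workload only ever calls `read` and `write`; whether the data came from the local cache or the cloud is hidden behind the gateway, which is what lets it stand in for a traditional array.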
Gateways can differ in the way their local caching or tiering algorithms are designed. Some are based on recent user behavior, some are application-aware, and some can even coordinate snapshots back into the cloud. Most have some built-in data transfer optimization like deduplication and compression, although the returns on these can vary and may duplicate network-level WAN optimization. Some gateways go further with advanced IO journaling and buffering, IO prioritization and off-peak transfer scheduling.
These differences come into play depending on how the gateway storage is used. For example, it may be a deep-capacity backup target intended to provide hundreds of TB of object storage cheaply (e.g., NetApp SteelStore). Or it could be a front-line array replacing local primary storage in ROBO deployments (e.g., CTERA Networks). With the latter option, IT can reap multiple benefits as remote primary storage is not only cached locally for performance, but automatically synced back to the cloud, protected and effectively made available anywhere.
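A minimal sketch of the transfer optimization mentioned above, assuming fixed-size chunks, SHA-256 fingerprints and zlib compression. These choices and the `prepare_upload` helper are illustrative assumptions; actual products may use variable-size chunking and different algorithms.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunks; an assumption made for simplicity


def prepare_upload(data: bytes, already_stored: set) -> dict:
    """Return {digest: compressed_chunk} for chunks the cloud does not yet hold."""
    to_send = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in already_stored or digest in to_send:
            continue                      # duplicate chunk: nothing to transfer
        to_send[digest] = zlib.compress(chunk)
    return to_send


stored = set()                            # fingerprints of chunks already in the cloud
payload = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
batch = prepare_upload(payload, stored)
stored.update(batch)
print(len(payload), sum(len(c) for c in batch.values()))
```

Highly repetitive data like this shrinks dramatically, while already-compressed or encrypted data would see little benefit, which is one reason the returns vary so much in practice.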
For larger data centers, long-time gateway vendors like Nasuni and Panzura aim to provide a more cost-effective and tremendously scalable NAS offering to replace islands of traditional filers with a single global namespace. One of the key things to consider with such widely distributed file systems is how they handle regional distribution, caching, versions and/or file locking.
Arrays tier up to the cloud
Another example of a hybrid approach is Microsoft's StorSimple, a primary block storage array with fully integrated cloud tiering.
Real-world cloud storage
Common use cases today for cloud storage include:
- Capacity storage target for backups -- hot and cold
- Active archives where data can be accessed on-demand
- Data stores for cloud-based analytics processes
- Primary data for web and mobile applications
- Distributed file sync-and-share
- Video and image warehousing, often taking advantage of cloud transcoding/processing services
- Cloud storage tier behind on-premises primary storage
Most traditional storage vendors will likely build cloud-storage tiering directly into their traditional arrays in the near future. Recent Taneja Group research shows that the majority of enterprises expect to add cloud as a new tier of storage within the next 3-5 years. Also, EMC and NetApp have acquired established cloud storage gateway products (TwinStrata and Riverbed SteelStore, respectively).
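One simple way to picture array-side cloud tiering is an age-based policy: blocks that have not been touched for a while are marked for the cloud tier. The sketch below is an assumption-laden illustration, not how StorSimple or any other product works; the `Block` type, the `tier_cold_blocks` helper and the 30-day threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    last_access_day: int   # day number of the most recent read or write
    tier: str = "local"


def tier_cold_blocks(blocks, today: int, max_idle_days: int = 30):
    """Mark local blocks idle longer than `max_idle_days` for the cloud tier."""
    moved = []
    for block in blocks:
        idle = today - block.last_access_day
        if block.tier == "local" and idle > max_idle_days:
            block.tier = "cloud"
            moved.append(block.block_id)
    return moved


blocks = [
    Block(1, last_access_day=95),
    Block(2, last_access_day=10),
    Block(3, last_access_day=69),
]
print(tier_cold_blocks(blocks, today=100))
```

Real tiering engines weigh more than idle time (access frequency, application hints, snapshot membership), but the basic shape is the same: hot data stays local and cold data drifts to cheaper cloud capacity.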
Is cloud storage right for you?
Before jumping into cloud storage, IT teams need to first consider what they really want to accomplish. When architecting a hybrid approach, the following are some key areas to consider:
Network connectivity and bandwidth. The network is a key resource between the data center and the cloud that impacts performance, availability and cost. Networks are still inherently unreliable. When considering which data, and how much of it, can live in the cloud, evaluate end-to-end capabilities like the size of the forward cache and WAN optimization, including deduplication and compression. These features can be provided in dedicated network appliances or built into cloud storage gateways directly, but you may not need both. For larger cloud data replication and migration challenges, also consider offerings like Attunity's CloudBeam, which is designed to ease big data flow.
Security. Look for integration with your existing on-premises authentication. Most cloud storage offers solid encryption in flight and at rest in the cloud, but check where and how keys are managed and protected. Are there policies that can automatically enforce security provisions, and regulatory/compliance restrictions?
Data accessibility. Look for the breadth of accessibility. Should your data only be really available for access from within your data centers or do you want to enable cloud-based processing or global file sharing? Where does data get replicated and encrypted/decrypted for mobile and distributed users?
Cost/data sprawl controls. Cloud storage can be awfully easy to use in large volumes, but that means your costs over the long term can shoot sky-high. Look for cost allocation and data sprawl management features.
Performance. Can performance be delivered when and where users and workloads need it? End user file sharing has different needs than your critical point-of-sale database.
Migrating out. Does your hybrid solution need to support multiple public cloud providers, maybe even two at once just in case?
In the end, it is inevitable that cloud storage will be a tier of data centers in the near future. So, the question is probably not if, but how you can best take advantage of it. A cloud storage gateway is a good way to jumpstart use of public cloud storage, which should be followed by a serious consideration of where a global scale-out NAS might layer in strategically.