- Marc Staimer, Dragon Slayer Consulting
Data goes into cold storage when it's infrequently or never accessed. Cold storage is for data that is kept for compliance reasons, that has possible future value or that IT is afraid to delete in case it is needed later. This type of storage generally costs much less than primary and secondary storage and has correspondingly lower performance.
Cold data is frequently conflated with cold data storage, but cold data can reside on any storage media or system. Cold data storage, on the other hand, is a system specifically architected for storing cold data. With cold storage, there's considerable variation in everything from the frequency and performance of data access to media longevity and data resilience and durability. Cold data can become warm or hot again if users suddenly need it. This turn of events complicates the use of the system itself and can add unexpected costs.
Recently, cold data storage has become a hot topic for several reasons, including the following:
Exponential data growth. Analysts from IDC expect the amount of data created annually to exceed a mind-boggling 44 zettabytes by 2020 and continue to accelerate from there. Most of that data isn't active or frequently accessed, with approximately 80% or more of it unstructured data and much of it machine generated by the likes of security videos and log files.
Primary storage consumption. Storage is the only technology in the data center that's consumed. Most data will stay on the first storage it lands on for its lifecycle, which is essentially forever. Even when primary storage is refreshed, cold data moves to the new system and continues to consume expensive primary storage and NAND flash SSD media.
Using those assets for active data makes perfect sense, but not for cold data that's rarely, if ever, accessed. When primary storage is consumed by cold data, more of it has to be purchased and implemented for active data. Cold data doesn't need the high performance, low latency and functionality of primary storage systems.
Unfortunately, cold and cool data accounts for the vast majority of the data on primary storage, consuming 75% to 90% of data center storage. Heat maps tracking data over time show that data is hottest in the first 72 hours after creation. It cools rapidly from there, becomes quite cool after 30 days and is effectively cold after 90 days.
Pack rat syndrome. IT organizations have no appetite for tossing out data. There's an underlying anxiety that any data thrown away will suddenly be needed. This goes hand-in-hand with the perception, right or wrong, that all data has value.
Regulatory compliance. New standards and regulations that require data-related compliance are on the rise. These include the European Union's General Data Protection Regulation; New York State banking and cybersecurity regulations for financial institutions; Health Insurance Portability and Accountability Act; HITECH Act; Basel I, II and III; Sarbanes-Oxley; and OSHA. Many of these regulations require keeping certain types of data for decades and even centuries.
Unstructured data analytics. With as much as 80% of all new data unstructured, it makes sense to find a way to mine it for actionable insights. This has led to an explosion of unstructured data analytics, with revenue from those products growing to more than $125 billion in 2015, according to IDC. Storing this unstructured data for future analysis must be cost-effective.
Cost-effectiveness. Cold storage is practical because the cost of storing cold data is commensurate with its low value. There are several cold data storage systems and media options available, as well as numerous cloud service options. Although each has its pros and cons, all drive down the cost of cold storage, making it quite affordable.
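The heat-map thresholds cited above (72 hours, 30 days and 90 days after creation) can be turned into a simple classifier. The following is a minimal sketch, not a description of any product: the function name and the tier labels "hot," "cooling," "cool" and "cold" are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Thresholds come from the heat-map figures quoted in the article.
# The tier labels and this function are illustrative assumptions.
HOT_WINDOW = timedelta(hours=72)
COOL_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=90)


def data_temperature(created: datetime, now: datetime) -> str:
    """Classify data by the time elapsed since it was created."""
    age = now - created
    if age <= HOT_WINDOW:
        return "hot"       # first 72 hours after creation
    if age < COOL_AFTER:
        return "cooling"   # cooling rapidly, not yet 30 days old
    if age < COLD_AFTER:
        return "cool"      # quite cool after 30 days
    return "cold"          # effectively cold after 90 days


if __name__ == "__main__":
    now = datetime(2017, 6, 1)
    for days in (1, 10, 45, 120):
        print(days, "days old:", data_temperature(now - timedelta(days=days), now))
```

A real deployment would likely also weigh access frequency, since cold data can become warm or hot again if users suddenly need it.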
Cold data storage systems
Cold data storage systems have been around for decades, initially as automated tape libraries and optical jukeboxes with removable media used by larger organizations. Just as unstructured data has grown exponentially, so, too, have cold storage systems evolved to meet the challenge. New cold systems based on Linear Tape File System (LTFS) and object storage have emerged. Facebook and the Open Compute Project, the open source hardware design organization it founded, have been big drivers of those new systems (see "Cold storage pioneer"). These and other developments have led to four types of cold data storage systems:
- LTFS front-end automated tape library (ATL). The LTFS or object store front end is a small, relatively scalable local cache to an ATL that looks and feels like a disk storage system to applications and users. It speeds up writes and, in some cases, reads and provides similar performance to HDD-based NAS or object stores. Vendors include Dell EMC, Fujifilm-StrongBox Data Solutions, Fujitsu, Hewlett Packard Enterprise (HPE), IBM, Oracle, Quantum, Siemens and Spectra Logic.
- Skinny object storage or scale-out NAS HDD systems. Traditional object storage comes with unlimited scalability and is historically used for inexpensive, large capacity active archives. The slimmed-down version, with fewer storage server nodes, has been used for cold data storage. It provides exceptional data durability -- often as high as 99.999999999% -- through the use of sophisticated erasure codes, and it consumes far less overhead than multicopy mirroring. For example, with the triple-copy mirroring used in Hadoop storage, each extra copy consumes 100% more storage. Protecting against three concurrent failures that way takes three extra copies, or 300% more storage. Protecting against three concurrent failures with erasure codes consumes at most 33% more storage, usually less. Erasure coding also provides exceedingly high data durability regardless of the underlying media hardware.
Vendors include Caringo, Cloudian, Concurrent with its Aquari product, DataDirect Networks, Dell EMC, Elastifile, Hitachi Data Systems, HPE, IBM Cleversafe, NooBaa Inc., OpenIO, Quantum, Qumulo, Red Hat Ceph Storage, Rozo Systems, Scality, SwiftStack and Western Digital HGST.
- Skinny object storage or scale-out NAS 3D quad-level cell (QLC) flash systems. These nascent cold data storage systems are due to come online by the first half of 2018. They operate similarly to skinny object storage HDD systems, but with key differences. The 3D QLC SSDs are considerably faster and denser -- 10 to 20 times denser -- than HDDs, and, more importantly, they store data quite differently.
The smallest writeable unit on an SSD is the program/erase (PE) block, ranging from 512 bytes to 256 KB. Data can't be altered once it's written to a PE block; the PE block must be erased first, and only a limited number of erasures can occur. The number of writes, meanwhile, is determined by the number of bits per cell. QLC flash is limited to 100 to 1,000 writes per PE block. And individual PE blocks are more likely to fail than an entire SSD. Erasure coding can treat PE blocks the same way it treats drives, but doing so requires modifications at the flash translation layer. This approach makes object storage or scale-out NAS with 3D QLC SSDs highly practical and cost-effective for cold storage.
Tachyum is the only vendor working on this type of 3D QLC flash system at this time.
- Highly scalable optical archive cold storage systems. Optical storage systems, aka optical jukeboxes, have kept pace with the cold data explosion. Historically, these used small capacity media with slow streaming performance. That's not true anymore.
Optical disk capacity has increased from 100 GB to 300 GB, with 500 GB and 1 TB optical platters expected within a few years. Twelve optical disks are bundled in tape-like cartridges, with each cartridge addressable as a single storage drive. A jukebox can use dozens to hundreds of these cartridges, addressing them in parallel. This approach improves transfer or throughput performance to rival disk, tape and SSDs at as much as 360 MBps.
Highly scalable optical archive system vendors include Panasonic and Sony.
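The overhead comparison between mirroring and erasure coding in the skinny object storage entry above comes down to simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the 9+3 and 12+3 fragment layouts are assumed configurations, not ones named by any vendor listed here.

```python
def mirroring_overhead(total_copies: int) -> float:
    """Extra capacity, as a fraction of the data size, for N total copies."""
    return float(total_copies - 1)


def mirroring_failures_tolerated(total_copies: int) -> int:
    """Copies that can be lost while one readable copy survives."""
    return total_copies - 1


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Extra capacity for an erasure code with k data and m parity fragments.

    Such a code survives the loss of any m fragments.
    """
    return parity_fragments / data_fragments


if __name__ == "__main__":
    for copies in (3, 4):
        print(f"{copies} copies: {mirroring_overhead(copies):.0%} extra, "
              f"survives {mirroring_failures_tolerated(copies)} failures")
    # Assumed layouts; both carry three parity fragments.
    for k, m in ((9, 3), (12, 3)):
        print(f"{k}+{m} erasure code: {erasure_overhead(k, m):.0%} extra, "
              f"survives {m} failures")
```

With three parity fragments, the overhead stays at or below 33% only when there are at least nine data fragments, which is why wider layouts are "usually less."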
Cold storage cost-effectiveness
Cost-effective cold storage systems require cost-effective cold storage media. It comes down to capacity density -- the amount of raw capacity in an HDD or SSD drive or a tape or optical cartridge -- and total cost of ownership. TCO includes acquisition expenses and supporting infrastructure costs, such as power, cooling, maintenance and operations.
Removable media such as tape and optical cartridges require less power and cooling than HDDs and SSDs, and none at all when removed. HDDs can be spun down to reduce power and cooling, while high-density 3D QLC flash SSDs use a small fraction of both, compared with HDDs. The HDDs used for cold storage run primarily at 7,200 rpm in a 3.5-inch form factor. Colloquially called fat drives, they range from 4 TB to 12 TB raw today.
Cold data storage media options
HDDs are relatively inexpensive per gigabyte, quite effective for search or analytics and have high data durability when paired with erasure coding. But they're electromechanical devices that use a lot of power, generate too much heat and require a correspondingly large amount of cooling. They have a relatively short wear life, and unpowered drives cannot maintain the data for more than approximately four years. High-capacity fat HDDs are available from Seagate, Toshiba and Western Digital.
3D QLC flash SSDs are incredibly dense in raw capacity, requiring fewer drives, racks, power, cooling and personnel support. They work well with data reduction and erasure coding. Relatively low 3D yields from fabrication plants, combined with high demand, are keeping flash SSD prices higher than expected, which in turn reduces the 3D QLC value proposition. Yields and supplies should increase by 2018, causing prices to fall more in line with cold storage requirements. 3D QLC flash drives will be available from SK Hynix, Micron-Intel, Samsung and Western Digital, with the first ones shipping late in 2017.
LTO tape cartridges are the lowest-priced cold storage media. Performance keeps ratcheting up with every release, with the tape technology now specified through LTO-10. Recently announced tape-density advancements by Fujifilm and IBM will let LTO increase capacities to as much as 330 TB raw and 825 TB compressed per cartridge sometime within the next 10 years, making tape even more cost-effective. With tape, however, searchability and interactive performance are limited and slow, and large robotic tape libraries are required for any significant amount of data. And when tapes are removed from a library, searches and analytics become more difficult and excruciatingly slow. LTO-7 tape cartridges are available from Fujifilm, IBM, Sony and OEMs of these vendors.
Optical media cartridges are the most immutable media available. They have the longest life without data loss, ranging from 50 to 1,000 years. Throughput performance has been catching up to HDDs and LTO tape cartridge levels. On the downside, suppliers are limited to MDISC, Panasonic and Sony, with the latter only halfheartedly committed. Interactive performance is still slow and all optical data is effectively permanent.
LTO tape, currently at LTO-7, tops out at 6 TB raw, 15 TB compressed. The 3D QLC (4 bits per cell) SSDs aren't shipping yet, but promise capacities of 128 TB raw in a 2.5-inch form factor. Optical media -- Blu-ray, archival disc and MDISC -- have made significant capacity gains.
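To put the capacity-density figures side by side, the sketch below counts how many units of each medium it would take to hold 1 PB of raw data. The per-unit capacities are the ones quoted in this article; the optical cartridge figure is derived here by assuming twelve 300 GB discs per cartridge, and the function and dictionary names are illustrative.

```python
# Raw capacity per unit in decimal GB, using figures quoted in the article.
# The optical value is derived (12 discs x 300 GB) and is an assumption.
MEDIA_CAPACITY_GB = {
    "fat HDD, 3.5-inch": 12_000,
    "LTO-7 cartridge, raw": 6_000,
    "3D QLC SSD, promised": 128_000,
    "optical cartridge, 12 x 300 GB": 12 * 300,
}

ONE_PB_GB = 1_000_000  # 1 PB in decimal GB


def units_needed(target_gb: int, unit_gb: int) -> int:
    """Whole units required to hold target_gb (ceiling division)."""
    return -(-target_gb // unit_gb)


if __name__ == "__main__":
    for media, capacity_gb in MEDIA_CAPACITY_GB.items():
        print(f"{media}: {units_needed(ONE_PB_GB, capacity_gb)} units per PB")
```

The count ignores compression, data reduction and the extra capacity consumed by erasure coding or mirroring, so it compares density only, not TCO.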
Cold storage cloud services
The cold storage renaissance is often attributed to Facebook, but Amazon Web Services (AWS) may be even more of a driver. When AWS first came out with Glacier cold cloud storage at the low price of 1 cent per gigabyte per month, now 0.4 cents per gigabyte per month, it sparked enormous competition among cloud storage service providers.
Cold storage pioneer
Facebook, as a giant hyperscaler, has experienced extraordinary growth in cold data. It has pioneered highly scalable optical, skinny object storage on HDDs and 3D quad-level cell (QLC) flash SSDs. Facebook continues to try to improve its ability to transparently handle and manage petabytes (PB) to exabytes of cold data storage and releases everything it develops to OpenCompute.org.
The company uses massively scalable optical jukebox and skinny object storage on top of HDDs. Skinny object storage minimizes the number of storage server nodes and maximizes the number of drives per node while leveraging erasure codes. Facebook is still perfecting skinny object storage with 3D QLC SSDs.
Facebook's cold storage system designs have been commercialized. The high-capacity optical jukebox is available from partner Panasonic as freeze-ray, currently scalable to 1.9 PB per 19-inch rack and slated to increase to more than 6 PB by 2020. Partner Tachyum plans to commercialize Facebook's in-development skinny object storage with 3D QLC SSDs.
Today, dozens of companies offer various types of cold storage services, including variations of skinny object storage on HDDs. Others use LTFS tape systems. All of them are inexpensive, ranging from 1 cent per gigabyte per month to as little as 0.1 cent per gigabyte per month. However, fees can double, triple, even quadruple, depending on how fast a system reads and retrieves data. This is cold data storage, though, so the assumption is any data retrievals are trivial and rare. Cold cloud storage is available from all the major cloud providers, including AWS, Google, IBM, Microsoft, Oracle and many more.
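The per-gigabyte prices above translate into monthly bills as follows. This is a rough sketch under stated assumptions: the function name is hypothetical, and the way faster retrieval multiplies fees is modeled as a single multiplier on the storage rate, which is a simplification rather than any provider's actual pricing formula.

```python
from decimal import Decimal


def monthly_cost_usd(capacity_gb, cents_per_gb_month, fee_multiplier=1):
    """Monthly charge in US dollars.

    cents_per_gb_month is passed as a string (for example "0.4") so the
    Decimal arithmetic stays exact. fee_multiplier is an assumed stand-in
    for fees that double, triple or quadruple with faster retrieval.
    """
    rate_usd = Decimal(cents_per_gb_month) / 100
    return rate_usd * capacity_gb * fee_multiplier


if __name__ == "__main__":
    one_pb_gb = 1_000_000  # 1 PB in decimal GB
    for cents in ("1", "0.4", "0.1"):
        cost = monthly_cost_usd(one_pb_gb, cents)
        print(f"{cents} cents/GB/month: ${cost:,.2f} per PB per month")
    quadrupled = monthly_cost_usd(one_pb_gb, "0.1", fee_multiplier=4)
    print(f"0.1 cents/GB/month with fees quadrupled: ${quadrupled:,.2f}")
```

At the low end of the price range, quadrupled fees land at the same monthly figure as the 0.4-cent rate, which is why retrieval patterns matter when comparing services.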
Unstructured data management and movement
Unstructured data must be moved from the storage it initially landed on to cold storage. This tends to be an ad hoc challenge of labor-intensive manual data migrations. Each data movement becomes a major project requiring a lot of personnel or professional services or both. Projects like these can cost more than the savings of moving data to cold storage, which explains why cold storage was a cold market for so long.
Software for managing unstructured data has changed the cold data landscape, however. It transparently moves data from primary storage to cold data storage based on policies, such as amount and frequency of access, age of the data and time since last access. Files and objects are copied, moved and deleted from the original storage, freeing up that storage for active hot data. Users and applications are relinked to their data automatically.
The software can create as many copies of files and objects as required, pushing them to cloud cold data storage, LTFS front-ended ATLs, optical jukeboxes and skinny object stores, regardless of media. Vendors providing this software include Actifio, Catalogic, ClarityNow, Cohesity, Commvault, Enmotus, Komprise, Moonwalk Universal, NTP Software, Primary Data, Rubrik, Starfish, StrongBox Data Solutions and Veritas.
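A policy of the kind described above, based on the age of the data, time since last access and frequency of access, might look something like the following. This is a minimal sketch: the class names, field names and thresholds are all hypothetical and don't represent how any of the listed vendors' products work. It only selects candidates and doesn't copy, delete or relink anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata a data management tool might track per file."""
    path: str
    created: datetime
    last_access: datetime
    accesses_last_90_days: int


@dataclass
class ColdPolicy:
    """Hypothetical policy: all three conditions must hold."""
    min_age: timedelta
    min_idle: timedelta
    max_recent_accesses: int


def is_cold(record: FileRecord, policy: ColdPolicy, now: datetime) -> bool:
    """True when the record meets every condition of the policy."""
    return (
        now - record.created >= policy.min_age
        and now - record.last_access >= policy.min_idle
        and record.accesses_last_90_days <= policy.max_recent_accesses
    )


def plan_moves(records, policy, now):
    """Return the paths that qualify for a move to cold storage."""
    return [r.path for r in records if is_cold(r, policy, now)]


if __name__ == "__main__":
    now = datetime(2017, 6, 1)
    policy = ColdPolicy(timedelta(days=90), timedelta(days=90), 0)
    records = [
        FileRecord("/data/logs/2016.tar", now - timedelta(days=400),
                   now - timedelta(days=380), 0),
        FileRecord("/data/reports/q2.xlsx", now - timedelta(days=20),
                   now - timedelta(days=1), 14),
    ]
    print(plan_moves(records, policy, now))
```

Keeping the policy as plain data makes it easy to apply different rules to different shares or projects without changing the selection logic.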
Unstructured data management and movement software, in particular, combined with new cost-effective cold storage systems and media, has made cold storage practical. It has turned up the heat on what had been a frozen market.