Automated storage tiering moves data between tiers of storage with different performance levels, based on activity, capacity and other criteria. Depending on the implementation and specific requirements, these policies can be set by administrators, or the software can be relied on to determine the best use of storage for specific pieces of data.
How does automated storage tiering manage the process of migrating older data to lower-performing tiers of storage?
Greg Schulz: [It] depends on whether the solution is doing intersystem (e.g., between different systems or sites) or intrasystem (within a system) tiering. Intrasystem would, with most implementations, be handled and under the control of that system or solution, [which would decide] what to move and when, based on learning or history, manual configuration or some [other] combination.
Intersystem approaches may work cooperatively in a federated manner, or be under the control of a separate policy management tool that watches over and tells the different solutions what to do -- and when, where and how -- to meet objectives. Some are more automated than others; some are vendor- or product-specific; while others are independent of vendors, systems and access methods or protocols.
So some storage controllers monitor how data is used and migrate it to the appropriate storage tier and others rely on time-based policies?
Schulz: Different solutions will be event- or activity-based, [while] others will be time or schedule driven [and] tied to policies and defined rules …. This is where the connection to caching comes into play: [Some] solutions can do an intelligent job of tracking what to keep in cache, what to pre-fetch, what to de-stage, etc.
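The two styles of policy Schulz describes can be sketched side by side. This is a minimal illustration, not any vendor's implementation; the `Extent` class, tier names and thresholds are all assumptions made for the example.

```python
from time import time

class Extent:
    """A hypothetical unit of tiered data (could be a volume, file or block range)."""
    def __init__(self, extent_id):
        self.extent_id = extent_id
        self.access_count = 0     # accesses seen in the current window
        self.last_access = 0.0    # timestamp of the most recent access

def activity_based_tier(extent, hot_threshold=100):
    """Event/activity driven: keep frequently accessed extents on fast storage."""
    return "ssd" if extent.access_count >= hot_threshold else "hdd"

def time_based_tier(extent, max_idle_days=30):
    """Time/schedule driven: demote extents untouched past a defined age."""
    idle_days = (time() - extent.last_access) / 86400
    return "hdd" if idle_days > max_idle_days else "ssd"
```

A real controller would combine signals like these with its caching logic (what to keep in cache, pre-fetch or de-stage) rather than apply one rule in isolation.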
Some approaches simply move things in and out on a coarser basis, requiring more cache space. That might produce what appears to be efficient cache utilization -- but how effective is it?
Is it possible to move entire volumes of data at the block level or does it happen at the file level?
Schulz: It depends on the implementation, solution, product and approach. Some [products] will move an object, file, volume or blocks, while others make a copy [and leave] the original intact unless there is an update, [and] then a change is made.
There are lots of variations given the different implementation approaches that address different preferences or ways of thinking. They include move or copy; leaving a stub or move everything; schedule- or time-based versus dynamic on the fly. Some are more transparent than others.
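The "move or copy; leave a stub or move everything" variations can be illustrated with a short sketch. The file paths, stub format and function names here are hypothetical, chosen only to show the two behaviors.

```python
import os
import shutil
import tempfile

def migrate_move_with_stub(src, archive_dir):
    """Move the file to the lower tier and leave a small stub in its place."""
    dest = os.path.join(archive_dir, os.path.basename(src))
    shutil.move(src, dest)
    with open(src, "w") as stub:          # stub replaces the original file
        stub.write(f"STUB -> {dest}\n")
    return dest

def migrate_copy(src, archive_dir):
    """Copy to the lower tier, leaving the original intact; an update to the
    original would later invalidate (or refresh) this copy."""
    dest = os.path.join(archive_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    return dest

# Demo with temporary directories standing in for two tiers
tier1 = tempfile.mkdtemp()
tier2 = tempfile.mkdtemp()
src = os.path.join(tier1, "report.dat")
with open(src, "w") as f:
    f.write("payload")

dest = migrate_move_with_stub(src, tier2)
```

The stub approach keeps the original namespace transparent to applications; the copy approach avoids a stub but must track which copy is current after an update.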
How does tiering compare to caching?
Schulz: From a broad or pragmatic perspective, tiering (manual, semi-automated and automated) is a form of caching. Some implementations are more intelligent than others and, of course, work at different granularities (e.g., volumes, files, blocks, objects) versus bytes, blocks and sectors.
The goal and objectives should be the same: Improve effectiveness and productivity versus simply driving up perceived efficiency with a higher utilization that may actually be introducing bottlenecks elsewhere. With an effective solution, utilization improves; however, there should also be better productivity and a positive impact for those using the solution. [It shouldn't be] simply moving things around or masking issues.
One of the biggest differences is where the caching or tiering is done; the level of automation and associated intelligence; at what granularity -- from objects to files to blocks; [and whether something is moved] or simply a copy made for cache purposes.
Does automated tiering increase wear and tear on flash storage due to write and erase cycles caused by the tiering process?
Schulz: If you are moving, copying, updating or writing data, those NAND cells in flash have a duty cycle that in normal use should last for many years. If they are constantly being updated -- re-written -- they will wear out faster, but that should still be in the several-years range.
With tiering, or any type of data movement and migration where data gets re-written, that's going to consume some of those program/erase (P/E) cycles faster if there are lots of updates.
Can vendors or even users do something to mitigate this?
Schulz: A good solution will also do smart or intelligent writes -- grouping them together, using nonflash buffers and caches to minimize the number of flash writes -- to help increase the duty cycle or prolong the life of a device.
An example is grouping updates together and doing fewer yet larger updates versus many small ones. NAND flash cells, or the underlying technology, work in large allocation sizes that, depending on the implementation, can be 10x to 100x larger than a regular 512-byte disk sector.
Your system and application may see a 512-byte sector presented by the NAND flash SSD, but behind the scenes, that 512-byte sector or block is being organized into much larger pages that can be measured in tens of thousands of bytes versus hundreds of bytes.
This means that solutions that group writes into I/Os of hundreds of kilobytes or megabytes will reduce the wear and tear [that can be caused by] thousands of smaller updates.
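The write-grouping idea can be sketched as a small coalescing buffer: small 512-byte sector writes accumulate in RAM and are flushed to flash one full page at a time. The 16 KB page size and the flush interface are assumptions for the example; real controllers and flash translation layers are far more involved.

```python
SECTOR = 512
PAGE = 16 * 1024   # assume one NAND page holds 32 sectors

class CoalescingBuffer:
    """Buffer sector-sized writes and issue them as full-page programs."""
    def __init__(self, flush_fn):
        self.pending = bytearray()
        self.flush_fn = flush_fn   # called once per full page written to flash
        self.page_writes = 0

    def write_sector(self, data: bytes):
        assert len(data) == SECTOR
        self.pending += data
        while len(self.pending) >= PAGE:
            self.flush_fn(bytes(self.pending[:PAGE]))
            del self.pending[:PAGE]
            self.page_writes += 1

buf = CoalescingBuffer(flush_fn=lambda page: None)
for _ in range(64):                # 64 small sector writes...
    buf.write_sector(b"\x00" * SECTOR)
print(buf.page_writes)             # ...become just 2 page programs
```

Sixty-four 512-byte writes fit in two 16 KB pages, so the flash sees 2 program operations instead of 64, consuming far fewer P/E cycles.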