A cache -- pronounced CASH -- is hardware or software that is used to store something, usually data, temporarily in a computing environment.
A cache is a small amount of faster, more expensive memory that holds recently or frequently accessed data so it can be retrieved quickly. The data is stored temporarily on a rapidly accessible storage medium that's local to the cache client and separate from bulk storage. Cache is frequently used by cache clients, such as the CPU, applications, web browsers or operating systems (OSes).
Cache is used because bulk, or main, storage can't keep up with the demands of the cache clients. Cache shortens data access times, reduces latency and improves input/output (I/O). Because almost all application workloads depend on I/O operations, caching improves application performance.
How cache works
When a cache client needs to access data, it first checks the cache. When the requested data is found in the cache, it's called a cache hit. The percentage of access attempts that result in cache hits is known as the cache hit rate or ratio.
If the requested data isn't found in the cache -- a situation known as a cache miss -- it is pulled from main memory and copied into the cache. How this is done, and what data is ejected from the cache to make room for the new data, depends on the caching algorithm or policies the system uses.
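The hit, miss and hit rate bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `CountingCache` class, the `BACKING_STORE` dictionary and the access pattern are all hypothetical names and values chosen for the example, and the cache is unbounded, so nothing is ever ejected.

```python
# Hypothetical stand-in for slow bulk storage (names are illustrative).
BACKING_STORE = {"a": 1, "b": 2, "c": 3}


class CountingCache:
    """A tiny unbounded read cache that records hits and misses."""

    def __init__(self, backing_store):
        self.backing_store = backing_store
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:           # cache hit: serve the local copy
            self.hits += 1
            return self.entries[key]
        self.misses += 1                  # cache miss: fetch and keep a copy
        value = self.backing_store[key]
        self.entries[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache(BACKING_STORE)
for key in ["a", "b", "a", "a", "c", "b"]:
    cache.get(key)

# The first access to each key misses; the three repeat accesses hit.
print(f"hits={cache.hits} misses={cache.misses} hit rate={cache.hit_rate():.0%}")
# prints: hits=3 misses=3 hit rate=50%
```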
Web browsers, such as Internet Explorer, Firefox, Safari and Chrome, use a browser cache to improve performance of frequently accessed webpages. When you visit a webpage, the requested files are stored on your computer in the browser's cache.
Clicking back and returning to a previous page enables your browser to retrieve most of the files it needs from the cache instead of having them all resent from the web server. This approach is called read cache. The browser can read data from the browser cache much faster than it can download the files again from the web server.
Cache is important for a number of reasons.
- The use of cache reduces latency for active data. This results in higher performance for a system or application.
- It also diverts I/O to cache, reducing I/O operations to external storage and lowering levels of storage area network (SAN) traffic.
- Data can stay permanently on traditional storage or external storage arrays. This maintains the consistency and integrity of the data using features provided by the array, such as snapshots or replication.
- Flash is used only for the part of the workload that will benefit from lower latency. This results in the cost-effective use of more expensive storage.
Cache memory is either included on the CPU or embedded in a chip on the system board. In newer machines, the only way to increase cache memory is to upgrade the system board and CPU to a newer generation. Older system boards may have empty slots that can be used to increase the cache memory, but most newer system boards don't have that option.
Instructions for how the cache should be maintained are provided by cache algorithms. Some examples of cache algorithms include:
- Least Frequently Used (LFU) keeps track of how often an entry is accessed. The item that has the lowest count gets removed first.
- Least Recently Used (LRU) puts recently accessed items near the top of the cache. When the cache reaches its limit, the least recently accessed items are removed.
- Most Recently Used (MRU) removes the most recently accessed items first. This approach is best when older items are more likely to be used.
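To make the difference between these three policies concrete, here is a minimal Python sketch in which the same access pattern leads each policy to eject a different entry. The `EvictingCache` class, its method names and the tie-breaking rule for LFU (ties go to the least recently used entry) are assumptions made for this example, not part of any particular product.

```python
from collections import OrderedDict


class EvictingCache:
    """Fixed-size cache whose eviction policy is LRU, MRU or LFU."""

    def __init__(self, capacity, policy="LRU"):
        if policy not in ("LRU", "MRU", "LFU"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()   # ordered from least to most recently used
        self.counts = {}               # access counts, consulted by LFU

    def get(self, key):
        if key not in self.entries:
            return None
        self._touch(key)
        return self.entries[key]

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = value
        self._touch(key)

    def _touch(self, key):
        self.entries.move_to_end(key)                    # now most recently used
        self.counts[key] = self.counts.get(key, 0) + 1   # one more access

    def _evict(self):
        if self.policy == "LRU":
            victim = next(iter(self.entries))            # oldest access
        elif self.policy == "MRU":
            victim = next(reversed(self.entries))        # newest access
        else:                                            # LFU: lowest count wins
            victim = min(self.entries, key=lambda k: self.counts[k])
        del self.entries[victim]
        del self.counts[victim]


def survivors(policy):
    """Run one fixed access pattern and report which keys remain cached."""
    cache = EvictingCache(capacity=3, policy=policy)
    for key in ("a", "b", "c"):
        cache.put(key, key.upper())
    for key in ("a", "a", "b", "c"):   # "a" is most frequent, "c" most recent
        cache.get(key)
    cache.put("d", "D")                # cache is full, so one entry is ejected
    return sorted(cache.entries)


for policy in ("LRU", "MRU", "LFU"):
    print(policy, survivors(policy))
# prints:
# LRU ['b', 'c', 'd']
# MRU ['a', 'b', 'd']
# LFU ['a', 'c', 'd']
```

In this run, LRU ejects "a" because it was touched longest ago, MRU ejects "c" because it was touched last, and LFU ejects "b", which ties with "c" on access count but was used less recently.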
Write-around cache directs write operations to storage, skipping the cache altogether. This prevents the cache from being flooded when there are large amounts of write I/O. The disadvantage to this approach is that data isn't cached unless it's read from storage. That means the first read of newly written data will be relatively slow because the data hasn't been cached yet.
Write-through cache writes data to cache and storage. The advantage here is that because newly written data is always cached, it can be read quickly. A drawback is that write operations aren't considered complete until the data is written to both the cache and primary storage. This can cause write-through caching to introduce latency into write operations.
Write-back cache is similar to write-through caching in that all the write operations are directed to the cache. However, with write-back cache, the write operation is considered complete after the data is cached. Later on, the data is copied from the cache to storage.
With this approach, both read and write operations have low latency. The downside is that, depending on what caching mechanism is used, the data remains vulnerable to loss until it's committed to storage.
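The three write policies differ mainly in where a write lands before it is considered complete. The Python sketch below models that under simplifying assumptions: `WritePolicyCache` is a hypothetical class, plain dictionaries stand in for the cache and for primary storage, and the write-around branch also drops any stale cached copy, which is a design choice of this example rather than something stated above.

```python
class WritePolicyCache:
    """Toy model of write-around, write-through and write-back caching."""

    POLICIES = ("write-around", "write-through", "write-back")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache = {}
        self.storage = {}    # hypothetical stand-in for slower primary storage
        self.dirty = set()   # cached keys not yet committed (write-back only)

    def write(self, key, value):
        if self.policy == "write-around":
            self.storage[key] = value      # skip the cache entirely
            self.cache.pop(key, None)      # assumption: drop any stale copy
        elif self.policy == "write-through":
            self.cache[key] = value        # complete only after both writes
            self.storage[key] = value
        else:                              # write-back
            self.cache[key] = value        # complete once cached
            self.dirty.add(key)

    def read(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.storage[key]          # slow path: go to storage
        self.cache[key] = value
        return value, "miss"

    def flush(self):
        """Copy pending write-back data from the cache to storage."""
        for key in self.dirty:
            self.storage[key] = self.cache[key]
        self.dirty.clear()


for policy in WritePolicyCache.POLICIES:
    c = WritePolicyCache(policy)
    c.write("k", "v1")
    print(f"{policy}: cached={'k' in c.cache} in_storage={'k' in c.storage}")
# prints:
# write-around: cached=False in_storage=True
# write-through: cached=True in_storage=True
# write-back: cached=True in_storage=False
```

The last line of output is the window in which write-back data is vulnerable: until `flush()` runs, the only copy of "k" lives in the cache.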
Popular uses for cache
Cache server: A dedicated network server, or a service acting as a server, that saves webpages or other internet content locally. A cache server is sometimes called a proxy cache.
Disk cache: Holds recently read data and perhaps adjacent data areas that are likely to be accessed soon. Some disk caches cache data based on how frequently it's read. Frequently read storage blocks are referred to as hot blocks and are automatically sent to the cache.
Cache memory: Random access memory, or RAM, that a microprocessor can access faster than it can access regular RAM. Cache memory is often tied directly to the CPU and is used to cache instructions that are frequently accessed. A RAM cache is much faster than a disk-based cache, but cache memory is much faster than a RAM cache because it's so close to the CPU.
Flash cache: Temporary storage of data on NAND flash memory chips -- often using solid-state drives (SSDs) -- to fulfill data requests faster than would be possible if the cache were on a traditional hard disk drive (HDD) or part of the backing store.
Persistent cache: Considered actual storage capacity where data isn't lost in the case of a system reboot or crash. A battery backup is used to protect data, or data is flushed to battery-backed dynamic RAM (DRAM), as additional protection against data loss.
Types of hardware cache
With CPU caching, recent or frequently requested data is temporarily stored in a place that's easily accessible. This data can be accessed quickly, avoiding the delay involved with reading it from RAM.
Cache is helpful because a computer's CPU typically has a much higher clock speed than the system bus used to connect it to RAM. As a result, the clock speed of the system bus limits the CPU's ability to read data from RAM. In addition to the slow speed when reading data from RAM, the same data is often read multiple times when the CPU executes a program.
With a CPU cache, a small amount of memory is placed directly on the CPU. This memory operates at the speed of the CPU rather than at the system bus speed and is much faster than RAM. The underlying premise of cache is that data that has been requested once is likely to be requested again.
CPU caches have two or more layers or levels. The use of two small caches has been found to increase performance more effectively than one large cache.
The most recently requested data is typically the data that will be needed again. Therefore, the CPU checks the level 1 (L1) cache first. If the requested data is found, the CPU doesn't check the level 2 (L2) cache. This saves time because the CPU doesn't have to search through the full cache memory.
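The lookup order can be sketched as a short Python function. This is a simplified model under stated assumptions: the `lookup` function and the dictionaries standing in for L1, L2 and RAM are hypothetical, cache sizes and eviction are ignored, and data fetched from a lower level is copied into every level above it, which is only one of several possible designs.

```python
def lookup(address, l1, l2, memory):
    """Return (value, where_found), checking L1, then L2, then main memory."""
    if address in l1:
        return l1[address], "L1"      # fastest path: L2 is never consulted
    if address in l2:
        value = l2[address]
        l1[address] = value           # assumption: promote into L1
        return value, "L2"
    value = memory[address]           # slowest path: go out to RAM
    l2[address] = value               # assumption: fill both cache levels
    l1[address] = value
    return value, "RAM"


l1, l2 = {}, {}
memory = {0x10: "load", 0x20: "store"}   # hypothetical memory contents

print(lookup(0x10, l1, l2, memory))   # prints: ('load', 'RAM')
print(lookup(0x10, l1, l2, memory))   # prints: ('load', 'L1')

del l1[0x10]                          # pretend L1 had to eject the entry
print(lookup(0x10, l1, l2, memory))   # prints: ('load', 'L2')
```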
L1 cache is usually built on the microprocessor chip. L2 cache is embedded on the CPU or is on a separate chip or coprocessor and may have a high-speed alternative system bus connecting the cache and CPU. Level 3 (L3) cache is specialized memory developed to improve L1 and L2 performance. L4 cache can be accessed and shared by the CPU and the graphics processing unit (GPU).
L1, L2 and L3 caches have historically been created using combined processor and motherboard components. Recently, the trend has been to consolidate the three levels on the CPU itself. Because of this change, the main method to increase cache size has shifted to buying a CPU with the right amount of integrated L1, L2 and L3 cache.
A translation lookaside buffer (TLB) is a memory cache that stores recent translations of virtual memory addresses to physical addresses and speeds up virtual memory operations.
When a program refers to a virtual address, the first place the system looks is the CPU's own caches. If the required memory address isn't found there, the system looks up the memory's physical address, first checking the TLB. If the translation isn't found in the TLB, the page table held in physical memory is searched.
As virtual memory addresses are translated, they're added to the TLB. They can be retrieved faster from the TLB because it's on the processor, reducing latency. The TLB can also take advantage of the high-running frequencies of modern CPUs.
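A rough Python model of this translation path is shown below. It is a sketch only: the 4,096-byte page size, the `page_table` contents and the `translate` function are assumptions made for illustration, a real TLB is a small fixed-size hardware structure, and page faults are not modeled.

```python
PAGE_SIZE = 4096   # assumed page size in bytes

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 9}
tlb = {}           # recent translations; unbounded here for simplicity


def translate(virtual_address):
    """Return (physical_address, source) for a virtual address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page in tlb:
        frame, source = tlb[page], "TLB hit"
    else:
        frame = page_table[page]      # slow path: consult the page table
        tlb[page] = frame             # remember the translation
        source = "TLB miss"
    return frame * PAGE_SIZE + offset, source


# Virtual address 4100 is page 1, offset 4; page 1 maps to frame 3.
print(translate(4100))   # prints: (12292, 'TLB miss')
print(translate(4100))   # prints: (12292, 'TLB hit')
```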
TLBs support multiuser computers with a user and a supervisor mode, and they use permissions on read and write bits to enable sharing. However, multitasking and code errors can cause performance issues. This performance degradation, known as cache thrash, is caused by computer activity that fails to progress because of excessive resource use or caching system conflicts.
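One simple way to see thrashing is to run a cyclic access pattern against a least-recently-used cache that is one entry too small. The sketch below is a generic illustration rather than a model of any real TLB: the `lru_hit_rate` function, the capacities and the access pattern are all hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay accesses against an LRU cache and return the hit rate."""
    cache = OrderedDict()                  # ordered from least to most recent
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # eject the least recently used
            cache[key] = True
    return hits / len(accesses) if accesses else 0.0


cyclic = [0, 1, 2, 3] * 25                 # four entries visited in a loop

print(lru_hit_rate(cyclic, capacity=3))    # prints: 0.0
print(lru_hit_rate(cyclic, capacity=4))    # prints: 0.96
```

With room for only three entries, each new access ejects exactly the entry that will be needed next, so the cache does constant work and never produces a hit. One additional slot removes the conflict.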
Cache vs. RAM
Cache memory and RAM both place data closer to the processor to reduce response time latency. Cache memory is usually part of the CPU or part of a complex that includes the CPU and an adjacent chipset where memory is used to hold frequently accessed data and instructions.
A RAM cache, on the other hand, usually includes permanent memory embedded on the motherboard and memory modules that can be installed in dedicated slots or attachment locations. The mainboard bus provides access to these memories.
CPU cache memory is 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to a CPU request. RAM cache, however, is faster in its response time than magnetic media, which delivers I/O at rates in milliseconds.
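The effect of that speed gap on overall performance depends on the hit rate. A common back-of-the-envelope estimate is that the average access time equals the cache access time plus the miss rate multiplied by the RAM access time. The Python sketch below applies that estimate; the 2 ns and 100 ns latencies are hypothetical round numbers picked to fall within the range above, not measurements.

```python
def effective_access_ns(hit_rate, cache_ns, ram_ns):
    """Average time per access: every access checks the cache,
    and only misses pay the additional RAM access."""
    return cache_ns + (1.0 - hit_rate) * ram_ns


CACHE_NS = 2.0     # hypothetical cache latency
RAM_NS = 100.0     # hypothetical RAM latency (50 times slower)

for hit_rate in (0.50, 0.95):
    avg = effective_access_ns(hit_rate, CACHE_NS, RAM_NS)
    print(f"hit rate {hit_rate:.0%}: about {avg:.1f} ns per access")
# prints:
# hit rate 50%: about 52.0 ns per access
# hit rate 95%: about 7.0 ns per access
```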
Cache vs. buffer
A buffer is a shared area where hardware devices or program processes that operate at different speeds or with different priorities can temporarily store data. The buffer enables each device or process to operate without being delayed by the others.
Buffers and cache both offer a temporary holding place for data. They also both use algorithms to control the movement of data in and out of the data holding area.
However, buffers and cache differ in their reasons for temporarily holding data. Cache does so to speed up processes and operations. A buffer aims to let devices and processes operate separately from one another.