Q

# How does RAID5/parity really work?

How does RAID5/parity really work? Does the parity block act as a temporary buffer? What exactly is going on here, XOR-wise? Also, could you point me to a reference?

RAID5 works by combining three or more physical disk drives into a single logical entity. A set usually holds from 3 to 32 drives, although 15 is the maximum you will see in most RAID arrays.

RAID5 uses "distributed parity." This means that parity information is generated for each write command to the logical disk, and that the parity data is "striped" across all the disks in the RAID5 "set" rather than kept on a single dedicated parity disk.
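As a rough illustration of how parity can rotate across the disks in a set, here is a minimal sketch. The rotation rule (parity starts on the last disk and moves one disk to the left per stripe) and the function names `parity_disk` and `layout` are assumptions made for this example; real controllers may place parity differently.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Return the index of the disk that holds parity for a given stripe.

    Assumption: parity starts on the last disk and moves one disk to the
    left on each successive stripe. Actual controllers may rotate differently.
    """
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int) -> list:
    """Build one row per stripe, marking each disk as 'P' (parity) or 'D' (data)."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


if __name__ == "__main__":
    for stripe, row in enumerate(layout(4, 4)):
        print(f"stripe {stripe}: {' '.join(row)}")
```

Every stripe has exactly one parity block, and no single disk holds all of them, so parity writes are spread over the whole set.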

Parity for a RAID5 set is generated, usually in hardware by the RAID controller, through a process known as "read, modify, write." On a RAID5 set, a write operation is not complete until the following steps have occurred:

1. Read the old data from the disks
2. Read the old parity from the disks, then calculate the new parity by XORing the old parity with the difference (also an XOR) between the old data and the new data to be written
3. Write the new data to the disks
4. Write the new parity to the disks
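The four steps can be sketched in a few lines of Python. This is only an illustration operating on in-memory byte strings, not on real disks; the helper names `xor_blocks` and `read_modify_write` are made up for this example.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Return (new_data, new_parity) for a single-block update.

    old_data and old_parity stand in for the two reads (steps 1 and 2);
    the returned pair stands in for the two writes (steps 3 and 4).
    """
    delta = xor_blocks(old_data, new_data)      # which bits changed
    new_parity = xor_blocks(old_parity, delta)  # flip those bits in the parity
    return new_data, new_parity


if __name__ == "__main__":
    d0, d1, d2 = b"\x0f\xa0", b"\x33\x55", b"\xf0\x0c"
    parity = xor_blocks(xor_blocks(d0, d1), d2)

    # Overwrite d1 without touching d0 or d2.
    new_d1, new_parity = read_modify_write(d1, parity, b"\xff\x00")

    # The shortcut gives the same parity as recomputing over the whole stripe.
    assert new_parity == xor_blocks(xor_blocks(d0, new_d1), d2)
    print(new_parity.hex())
```

Note that the other data blocks in the stripe are never read: only the block being changed and the parity block are involved.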

This means that every write operation to the logical disk actually causes four disk operations: two reads and two writes. This is known as the "RAID5 write penalty." RAID5 is a tradeoff between speed, reliability and disk space. RAID5 has the advantage of using less disk space for parity data than RAID1 mirror sets use for mirror copies, but there is a performance hit due to the parity overhead.
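To put a number on the disk space side of the tradeoff, here is a small back-of-the-envelope sketch. It assumes equally sized disks, one disk's worth of parity per RAID5 set, and a two-way RAID1 mirror; the function names are made up for this example.

```python
def raid5_usable_fraction(num_disks: int) -> float:
    """Fraction of raw capacity left for data in a RAID5 set.

    Assumption: equally sized disks, with one disk's worth of space
    consumed by parity across the set.
    """
    if num_disks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (num_disks - 1) / num_disks


def raid1_usable_fraction() -> float:
    """Fraction of raw capacity left for data in a two-way mirror."""
    return 0.5


if __name__ == "__main__":
    for n in (3, 5, 15):
        print(f"RAID5 with {n} disks: {raid5_usable_fraction(n):.0%} usable")
    print(f"RAID1 mirror: {raid1_usable_fraction():.0%} usable")
```

Under these assumptions, the larger the RAID5 set, the smaller the share of capacity given up to parity, while a mirror always gives up half.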

The actual mathematical calculation of the parity data is done as follows:

The parity is generated by grouping together the corresponding bits (0s and 1s) from each data block in a stripe and combining them with an XOR operation. XOR yields a 1 when a group contains an odd number of 1s and a 0 when it contains an even number, so once the parity bit is added, each group has an even number of 1s. When the stripe is read back, any group of bits that arrives with an odd number of 1s must be in error. If a disk fails, its data can be regenerated by XORing the parity information with the data from the other disks.
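Here is a minimal sketch of XOR parity and of rebuilding a lost block from the survivors. It works on small in-memory byte strings, and the names `xor_parity` and `rebuild_missing` are hypothetical helpers written for this example.

```python
from functools import reduce


def xor_parity(blocks: list) -> bytes:
    """XOR equally sized blocks together, byte position by byte position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: list, parity: bytes) -> bytes:
    """Regenerate the one missing data block from the survivors plus parity."""
    return xor_parity(surviving_blocks + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0c"]
    parity = xor_parity(data)
    print(parity.hex())  # cc f9

    # Pretend the disk holding data[1] has failed.
    rebuilt = rebuild_missing([data[0], data[2]], parity)
    assert rebuilt == data[1]
```

The rebuild works because XORing a value with itself cancels it out: combining the parity with every surviving block leaves only the missing block.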

Some storage array controllers can eliminate most of the RAID5 write penalty by gathering writes in mirrored cache, calculating the parity only once for all the data gathered, and then writing the data and parity to the disks together. This reduces the number of reads and writes needed for multiple write streams, which is very efficient for streaming write data such as database transaction logs.
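A simplified I/O count shows why gathering writes helps. This sketch assumes the cache has collected one complete, aligned stripe, so parity can be computed from the cached data without reading anything from disk, and it compares that against updating each block separately with read, modify, write. The function names are made up for this example, and real controllers will behave in more complicated ways.

```python
def rmw_ios(num_blocks: int) -> int:
    """Disk I/Os when each block is updated on its own with read, modify, write."""
    return 4 * num_blocks  # 2 reads + 2 writes per block


def full_stripe_ios(num_disks: int) -> int:
    """Disk I/Os when a whole stripe is written at once from cache.

    Assumption: the cache holds every data block of the stripe, so no
    reads are needed; there is one write per data disk plus one parity write.
    """
    return num_disks


if __name__ == "__main__":
    disks = 5
    data_blocks = disks - 1  # one block per stripe is parity
    print("separate updates:", rmw_ios(data_blocks), "I/Os")
    print("gathered stripe: ", full_stripe_ios(disks), "I/Os")
```

For a five-disk set under these assumptions, updating the four data blocks one at a time costs 16 disk operations, while writing the gathered stripe costs 5.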

The best reference I have found for this material is a book called "The RAID Book," which was published by Digital Press a few years back.

Chris

Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.

This was last published in November 2002
