Guide to RAID For Dummies

by admin

We’ll start at the very beginning and cover all the terms you might encounter when discussing RAID. There are some terrific concepts here, so stick with it and remember that you will ultimately benefit from faster access or safer data storage or both, and you will understand the trade-offs and how to describe your requirements.

We hope you enjoy this guide. Our day job is recovering data from RAID systems and drives – if you need our assistance, call us on 0800 999 4447 or get in touch using the contact form on the left.

What is RAID?

RAID is an acronym that stands for Redundant Array of Independent Disks. So with “redundancy” built-in you might assume you will never need Data Recovery or back-ups – wrong!

RAID refers to a storage volume composed of multiple discrete hard drives and defines the manner in which the collection is presented to the outside world (typically your PC or Mac). The hard drives used are usually standard off the shelf S-ATA, IDE, SAS or SCSI drives.

Typically the hard drives will plug into some form of controller which will implement the configuration.

The controller will most commonly be one of the following:

  • A card within a PC which the hard drives plug into.
  • Within an external hard drive (where the external houses more than one internal hard drive). For example the Lacie BigDisk is a 1TB capacity external drive comprising two internal 500GB drives.
  • A stand-alone enclosure, usually within an industrial/commercial environment.
  • A stand-alone RAID controller unit

Figure: Stand-alone RAID controller unit

Why would I want RAID?

There are essentially two reasons for having your hard drives set in a RAID configuration:

  1. For speed of operation, you want to minimize the access time.
  2. For data redundancy, if any one drive fails you want to be able to continue operating without loss of data.

These reasons are unrelated and yet the term RAID is applied to both. The most common configurations for domestic users are referred to as RAID0 and RAID1.

  • RAID 0 provides speed benefits but (crucially) no redundancy, while
  • RAID 1 provides redundancy but has the penalty of slower writes, since everything is written twice.

For commercial applications (and even in some domestic situations too), RAID5 is a commonly used compromise that provides both speed and redundancy.

Many externals sold now house 2 hard drives internally, and on the first power-up you as the user will be asked how you want to configure them. A typical scenario is a 1TB external drive composed of 2 x 500GB drives. The setup software will ask if you want to configure these as RAID0 (striped) or RAID1 (mirrored). The implications of this choice must be understood.

Note For Non-Technical Managers

You may now already have enough to nod knowledgeably in meetings where RAID is discussed! Be aware that we get more technical from this point, so grab a cup of coffee and bookmark this page if you get interrupted.

What happens within a RAID controller?

The answer to this depends of course upon the configuration. Let’s go through the 3 implementations that you are most likely to meet:

RAID 0 For Dummies:

This is designed for speed. It does not provide any redundancy: if one of the hard drives in the array fails then you have lost your data.

Here’s why:

Take the example of a RAID 0 array composed of 2 hard drives. Data that is written to this storage volume will be split between the two. The reason, as stated, is for speed. In a RAID0 array the data is presented simultaneously from the 2 drives through the controller to your PC or Mac.

The controller will read/write a certain amount of data from/to the first drive and the same amount from/to the second, then back to the first and so on. The amount of data written or read each time is constant for any given array and is referred to as the stripe size. Typical stripe sizes are in the order of tens to hundreds of KB. If (for example) your RAID0 array has a stripe size of 64KB and you write a 100KB document to it then it must exist partly on one drive and partly on the second. From this it can be seen that if one of the drives in a RAID0 array fails every file which was in part stored on the failed drive has been at best severely damaged. You will still find some files that are both smaller than 64KB and are fortunate enough not to have been stored across a stripe boundary but these are typically small in number and tough to find because the operating system will certainly be in a mess.

The illustration below demonstrates how this might work in practice for a 320KB document stored on a 2 drive RAID0 storage volume which uses a 64KB stripe size:

Figure: 2 Drive RAID0
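
The 320KB example can also be sketched in a few lines of Python. This is only an illustration of the round-robin idea described above, not how any particular controller is implemented: the function name `stripe_layout` is made up for this guide, and the sketch assumes the document happens to start at the beginning of a stripe on the first drive.

```python
def stripe_layout(file_size_kb, stripe_kb=64, drives=2):
    """Return, for each drive, the (stripe_number, size_kb) chunks it holds.

    Assumption for illustration: the file starts exactly on a stripe
    boundary of the first drive.
    """
    layout = [[] for _ in range(drives)]
    offset = 0
    stripe_no = 0
    while offset < file_size_kb:
        # The last chunk may be smaller than a full stripe.
        size = min(stripe_kb, file_size_kb - offset)
        # Round-robin: drive 1, drive 2, drive 1, ...
        layout[stripe_no % drives].append((stripe_no + 1, size))
        offset += size
        stripe_no += 1
    return layout


for drive, chunks in enumerate(stripe_layout(320), start=1):
    print(f"Drive {drive} holds stripes {[number for number, _ in chunks]}")
```

With a 64KB stripe size the 320KB document becomes 5 stripes, so the first drive ends up holding three of them and the second drive holds two. Lose either drive and the document has gaps in it.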

From a recovery point of view, with a RAID0 you must have access one way or another to all members of the array. If this is not the case then the loss of data will be very high if not complete.

RAID 1 For Dummies:

Also referred to as a “mirror”, this is designed purely for redundancy. The contents of one half of the storage volume are identical to the other. Typically a RAID 1 will be composed of 2 hard drives of identical capacity. Each time data is written it is written to both hard drives. There is no striping; the data is simply written twice. This of course takes longer, but should one half fail there is no loss of data.

RAID 5 For Dummies:

This is designed primarily for redundancy rather than speed and requires a minimum of three hard drives.

Just as with RAID0, the data is striped across the hard drives which make up the RAID array. However, with a RAID 5 configuration, if any single hard drive within the array fails then the volume will continue to operate (in all probability users would not notice that anything had happened). This is possible because a portion of the storage volume is dedicated to parity.

Parity is information additional to the user data. It is created by the RAID controller and allows it to reconstruct the user data if one of the hard drives fails.

Imagine that you want to store 4 numerical values:

3, 5, 2 and 7

Now imagine that you need to recover these values even if one of them becomes lost. You could store a 5th parity value. Let’s imagine that the parity value will be the sum of the 4 numbers, which is 17. So now you store:

The four values that you want (analogous to the user data) and

The sum of the values (analogous to the RAID parity value):

3,5,2,7 and 17

If you lose any one of the user values you know what it was because you know what all 4 values added up to. This is essentially how parity within RAID5 operates.
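
The same arithmetic, written as a small Python sketch. The helper name `recover_missing` is invented for this example, and the sum is only the analogy used above rather than the calculation a real controller performs.

```python
values = [3, 5, 2, 7]   # the "user data"
parity = sum(values)    # the extra value we store alongside: 17


def recover_missing(surviving_values, parity):
    """Work out the one lost value from the survivors and the stored sum."""
    return parity - sum(surviving_values)


# Pretend the third value (the 2) has been lost.
lost_index = 2
surviving = values[:lost_index] + values[lost_index + 1:]

print(recover_missing(surviving, parity))  # 17 - (3 + 5 + 7) = 2
```

If two values go missing, the same subtraction only tells you what the two lost values added up to, not what each one was.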

An example illustration is shown below:

Figure: RAID5

In this RAID5 example the stripe size is 64KB and the volume is spread across 5 individual hard drives.

The row#1 parity information allows the controller to reconstruct the data if any one of the 1st, 2nd, 3rd or 4th 64KB stripe of the document file is lost. Similarly the row#2 parity information allows the controller to reconstruct the data if any one of the 5th stripe of the document or the 1st, 2nd or 3rd 64KB stripe of the photograph is lost.

A RAID5 need not have 5 hard drives: it has a minimum of 3 but it can have many more. RAIDs composed of 12 drives are common (in that case each row would have 11 stripes of data and 1 stripe of parity). If any one of the 12 stripes in a row is lost then the controller can reconstruct the missing stripe.

Hopefully, from this a few points become clear:

  1. If a single drive is lost within the array you can keep working with the data because the controller can calculate what the missing data was.
  2. If more than one hard drive within the array fails then it is no longer possible to recover the user data (imagine 2 of the numerical values in the previous example being lost: you know the sum of all 4 values, but now that 2 have been lost you cannot know the original values stored).
  3. Since a proportion of the RAID volume is given over to storage of parity rather than user data then there must be some loss of storage space.

To address this last point, the parity used in a RAID5 volume is not in fact a simple summing of user data; it is much more efficient in terms of storage space required. For each row of stripes across all of the members of the RAID5 array, one of those stripes must be parity. Therefore, in terms of space, one drive’s worth will be given over to parity while all the rest will be used for user data.
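
For the curious, RAID 5 parity is conventionally calculated with a bitwise exclusive-OR (XOR) rather than a sum, which is why the parity stripe is never bigger than a single data stripe. The Python sketch below shows the idea on four tiny 4-byte "stripes". The function name `xor_parity` and the sample data are made up for illustration, and a real controller works on much larger blocks.

```python
from functools import reduce


def xor_parity(stripes):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)
        for column in zip(*stripes)
    )


# Four data stripes of 4 bytes each (stand-ins for 64KB stripes).
data = [b"RAID", b"five", b"demo", b"test"]
parity = xor_parity(data)

# The parity stripe is the same size as ONE data stripe.
print(len(parity))

# Pretend the drive holding the second stripe has failed.
survivors = data[:1] + data[2:]

# XOR-ing the survivors with the parity gives back the missing stripe.
rebuilt = xor_parity(survivors + [parity])
print(rebuilt == data[1])
```

The trick works because XOR-ing a value with itself cancels it out, so everything except the missing stripe disappears from the calculation.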

Therefore a 4 drive RAID 5 composed of four 250GB hard disks will provide a user storage space of 750GB (3 quarters for user data and 1 quarter, i.e. 1 drive’s worth, for parity).

Similarly, a 12 drive RAID5 composed of twelve 1TB drives will provide a user storage area of 11TB (again, 1 drive’s worth of space will be used for parity).

Usable storage space = (n-1) x (individual hard drive capacity)

Where n = the number of hard drives in the RAID 5 array. This of course assumes that they are all the same size, which in all but exceptional cases they are. Where drives of differing sizes are used the controller will assume that each is the size of the smallest.
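
As a quick check of the formula, here is a minimal Python sketch. The function name `raid5_usable_capacity` is invented for this guide, and sizes are in GB purely for convenience.

```python
def raid5_usable_capacity(drive_sizes_gb):
    """Usable space of a RAID 5 array: (n - 1) x smallest drive size."""
    n = len(drive_sizes_gb)
    if n < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    # Every drive is treated as if it were the size of the smallest one.
    return (n - 1) * min(drive_sizes_gb)


print(raid5_usable_capacity([250] * 4))        # four 250GB drives
print(raid5_usable_capacity([1000] * 12))      # twelve 1TB drives
print(raid5_usable_capacity([250, 250, 500]))  # mixed sizes
```

The first two calls give 750 and 11000 (i.e. 750GB and 11TB), matching the examples above. The third gives 500, because the extra half of the 500GB drive goes unused.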

It is also worth bearing in mind that while the RAID 5 diagram shown earlier shows all the parity stripes on the same drive, in fact manufacturers tend to distribute the parity stripes across all the members of the array. Different RAID controllers and manufacturers will distribute the parity in their own idiosyncratic ways, but one rule always holds: for each row of stripes across the hard drives in the array, one will be parity and all the others in the row will be user data.
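
To picture distributed parity, the Python sketch below prints one possible layout in which the parity stripe moves one drive to the left on each row. This particular rotation is just an assumption for illustration (as noted above, real controllers each do it their own way), and the function name `raid5_rows` is made up.

```python
def raid5_rows(n_drives, n_rows):
    """Build a table of stripe labels with the parity rotating across drives.

    Assumed rotation: parity starts on the last drive and moves one
    drive to the left on each new row, wrapping around.
    """
    rows = []
    data_no = 1
    for row in range(n_rows):
        parity_drive = (n_drives - 1 - row) % n_drives
        cells = []
        for drive in range(n_drives):
            if drive == parity_drive:
                cells.append(f"Parity{row + 1}")
            else:
                cells.append(f"Data{data_no}")
                data_no += 1
        rows.append(cells)
    return rows


for cells in raid5_rows(n_drives=5, n_rows=5):
    print("  ".join(f"{cell:<8}" for cell in cells))
```

Whatever the rotation, every row printed contains exactly one parity stripe, and after five rows each of the five drives has held parity once.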

What about other forms of Implementation?

The majority of controllers are set up for level 0, 1 or 5. But many are set up with other levels, or with a mixture of these in order to combine the speed and redundancy benefits of each. The more commonly encountered ones are:


RAID 4

Similar to RAID 5 but the parity stripe for each row is always on the same drive. In other words, all of the parity information is stored on the same physical disk (in fact the simplified illustration shown earlier to demonstrate RAID 5 could be the start of a RAID4 array).


RAID 6

Each row has 2 parity stripes (i.e. user data storage space is n-2 drives’ worth). This means that 2 drives can fail and the data can still be reconstructed.

RAID 0+1

2 pairs of drives: each pair has data striped across it (RAID0), and the second pair is a mirror (RAID 1) of the first. This does benefit access speeds and provides excellent redundancy, but in practice it is complicated to configure and maintain.

Want More?

We hope you found benefit in our starter guide – if there are any areas you’d like us to expand on, just drop us a comment.

{ 17 comments… read them below or add one }

Charles Dean January 26, 2011 at 5:00 pm

Nice article, thanks. I would note that while a 12-drive RAID 5 configuration has more useable storage space and a lower percentage of space for parity, having that many drives increases the risk of having more than one drive failure at a time. RAID 6 can protect you from data loss even if there are two drive failures, so it is more fault tolerant than RAID 5.

C. Campus August 13, 2011 at 12:13 pm

This is a great article. Great job in explaining this complicated storage setup in nontechnical terms. Now I have a Web site I can direct others to in order to get a better understanding of each RAID function.

Ron Patten December 14, 2011 at 12:00 am

I was under the impression that I could store my programs on one drive and my work on the other. I did not know the “can of worms” that I opened when I purchased my new computer. I do a lot of graphics for my small business and thought that keeping the programs separate from the work would act like a backup, but I’m finding it is much more.

falcon09 February 29, 2012 at 7:06 am

is it possible to use a 1TB partitioned into 4 in implementing RAID5?

Tom March 1, 2012 at 7:55 am

Yes it is, but you are missing the point. If your hard drive fails, your data is beyond your reach. There is no hardware redundancy. RAID5 is not a back-up methodology, so you are gaining none of the RAID5 benefits but taking on all of the disadvantages (low write speeds, loss of usable drive space, processing overhead). Man up, buy some more 1TB drives and do it properly.

Nick July 28, 2012 at 6:44 pm

What RAID configuration offers the best speed and reliability? Also what should be the back up methodology?


PlatterSwapper July 28, 2012 at 10:25 pm


The reason there are so many RAID levels and permutations is that RAID is used to address many problems in many situations.

In general, I would point you towards RAID5 and using father-grandfathering as a back up methodology. Then, based on your budget and attitude to risk, you can flesh these out or even switch to alternatives once you understand the trade-off for your own situation.


Michael Muntus August 16, 2012 at 11:27 am

Dear PlatterSwapper,

That was a great help in understanding where the school I am erstwhile IT Technician at is at. We have RAID 5 but I always said it was overkill for such a small organisation (10 workstations – 190GB data) and now it’s gone wrong (2 drives failed during the school holidays) and even with Symantec BESR 2012 nobody seems able to understand the entire configuration enough to sort it out. I would have preferred RAID 1 + 0 because of the redundancy.

Well done anyway



Ray August 17, 2012 at 7:14 pm

I was under the impression that it was a bad idea to put your OS on a RAID. It has been many years since I used RAID arrays, so I could be very confused. What is the relationship between arrays and the OS?

PlatterSwapper August 30, 2012 at 9:42 am

It depends on what type of RAID it is and your business objectives. If it’s a RAID0, optimised for speed at the risk of data safety, you would not put your OS on the RAID. If it was RAID5 and the system was properly administered, you might choose to have the OS on there (many smaller servers do).

The general rule of thumb though is to keep the OS separate from the data and to use RAID for the data. Nowadays, that data can even include virtual servers. Obviously, a strict backup regime is needed as well.

PlatterSwapper August 30, 2012 at 10:08 am

Hi Michael,
pleased this helped you. I am rather partial to RAID5 myself because of the nice balance between redundancy and overhead cost. RAID5 failure is rare (needing two drives to fail in the array as you have experienced) but the failure rate does depend on the number of drives in the array.

It is extremely unusual for two drives in a RAID5 to fail at the same instant. More typically, one drive fails but is not noticed (as the RAID5 serves data and accepts write requests with a small drop in performance). Then some time later a second drive fails and the RAID controller shuts the array down. That gets all eyes on the problem but of course by then it’s too late and you are reliant on back-ups (which always lag the real-time data).

The missing part here is the protocol that was implemented on your RAID controller – that first failure should have automatically notified responsible person(s) even if it was a holiday. They should then have replaced the failed drive before a subsequent failure.

If your school was indeed the victim of an instantaneous double drive failure, you have been exceptionally unlucky.

When our clients experience RAID5 failure there tends to be a post-mortem on the back-up regime. Often the back-up/test process has been allowed to degrade as budgets are cut and departments are squeezed. Such incidents can help persuade the purse-holders to restore appropriate funds.

Best of luck.

Guest September 15, 2012 at 11:31 am

This was awesome. I was reading this on wikipedia and was wondering what value of n was. I thank God for the Dummies books. They never assume we know what they are talking about

Walter Person September 15, 2012 at 11:34 am

I was wondering what the n stood for in 1-1/n. thanks

Gerald March 9, 2013 at 10:25 pm

Please fix your definition for 0+1, as what you have described is just one of two opposing definitions and is therefore ambiguous. RAID 0+1 and RAID 1+0 have both been described as Striped Mirrors and conversely they have both been described as mirrored stripes.
We now use the term RAID10 which has a single definition of Striped Mirrors ie pairs of mirrored disks that are then striped (although someone has to tell Dell that, as last time I looked their support site had an incorrect definition)

Richard October 20, 2013 at 2:16 pm

I have had a raid configuration for a couple of years. A 1 TB drive mirrored to another 1 TB drive. I have about 55 GB of storage space remaining. I want to add a pair of 4 TB drives to the array. Not sure how to configure the drives. Are the 2 new drives added on to the existing drives. Or do I pair one of the new drives with each of the older drives? Thanks

PlatterSwapper November 10, 2013 at 9:23 am

Best approach differs depending on the capabilities of your RAID controller and the host device for your drives. Let’s assume your existing RAID controller can handle 4TB hard drives and they are not used as a boot device (2TB BIOS boot device size limit).

You would pair the 4TB drives with each other, not with the 1TB drives. Otherwise you miss out on 3TB of redundancy per pair.

My own approach would be to replace the 1TB drives rather than augment. I’d replace one of the 1TB drives with a 4TB drive, wait for RAID to rebuild, replace the other 1TB with the second 4TB drive, wait for RAID to rebuild, then use partition software to expand size of logical volume from 1TB to 4TB (during all this, your first 1TB drive behaves as an insurance policy).

Your 1TB drives are old and slower than your 4TB drives so this is a good opportunity to upgrade. If you are determined to make use of the old 1TB drives, I’d do the above upgrade first, check data was good, wipe the 1TB drives then add the 1TB drives as a second logical volume RAID1 pair.

burak May 28, 2014 at 2:09 pm

The only problem is with the RAID 5 diagram. Parity information is also distributed across all the disks. If RAID 5 were like this, then when Hard Disk #5 got corrupted or broken all the parity would be lost, and there would be no point in making a RAID 5. So the parity should be distributed, something like:

Data1 Data2 Data3 Data4 Parity1
Data5 Data6 Data7 Parity2 Data8
Data9 Data10 Parity3 Data11 Data12
Data13 Parity4 Data14 Data15 Data16
Parity5 Data17 Data18 Data19 Data20
