What is RAID and why would you use it?
RAID stands for “redundant array of inexpensive (or independent) disks”. The name alone doesn’t explain what RAID truly is or why it would be used. In short, RAID groups several physical drives so that they behave as a single logical drive.
RAID offers the following benefits:
- Performance. By grouping disks together in various configurations, RAID can provide performance increases that range from mild to wild.
- Security. By striping data across drives with parity (RAID 5), no single disk holds a complete copy of the data. The entire array is needed to recover complete files. If a drive fails in the array, it can be replaced with a new one, and the fraction of data on the discarded drive is of limited use on its own. Fragments of data can still be readable on a single drive, though, so a drive that held sensitive data should be wiped or destroyed rather than simply thrown away.
- Capacity. While the capacity of drives has increased over the years, the amount of data stored still vastly overshadows the amount of storage that any single hard drive can provide. By grouping the drives together, you create a much larger pool (in most circumstances) that offers a capacity of almost all the drives added together.
- Availability (safety). If data is important enough to be stored, it should be stored in a manner that prevents a hardware failure from destroying the data. RAID offers several configurations that (in MOST cases) spread the data across multiple drives in a way that any single drive can fail and data is still available.
Different levels for different functions
RAID 0 (zero, or STRIPING). With this level, you spread the data across two (or more) drives equally. This can result in very fast reads and writes but comes at a cost. This configuration actually DECREASES your level of safety. Without RAID, if you store data on a single drive, only the failure of that one drive can wipe out the data. With RAID 0, the failure of any one drive in the array (whether you use 2, 4, or 10) wipes out all of the data, because every file is spread across all of the drives. The likelihood of one of many drives failing is much higher than the likelihood of a single drive failing. You get great performance, but this is the most dangerous way of storing data. If this is so dangerous, why does this level exist? You will find out in a moment!
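To put a rough number on that added risk, here is a minimal sketch. It assumes every drive fails independently with the same probability over some period; the 3% figure is a made-up illustration, not a measured failure rate, and the function name is hypothetical.

```python
def array_failure_probability(drive_failure_probability: float, drive_count: int) -> float:
    """Chance that at least one of `drive_count` independent drives fails.

    For RAID 0 this is also the chance of losing the whole array,
    because one failed drive is enough to destroy the data.
    """
    if not 0.0 <= drive_failure_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if drive_count < 1:
        raise ValueError("an array needs at least one drive")
    # The array survives only if every drive survives.
    return 1.0 - (1.0 - drive_failure_probability) ** drive_count


if __name__ == "__main__":
    assumed_failure_probability = 0.03  # hypothetical value, for illustration only
    for drive_count in (1, 2, 4, 10):
        risk = array_failure_probability(assumed_failure_probability, drive_count)
        print(f"{drive_count:>2} drive(s): {risk:.1%} chance of losing the data")
```

Under these assumptions the risk grows with every drive added to the stripe, which is why RAID 0 is rarely used on its own for data that matters.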
RAID 1 (MIRRORING). With this configuration, you take a single drive and keep an exact copy of it on another. The RAID controller (used in every level of RAID) simply presents the two physical drives as a single logical drive. You write data to that logical drive and the controller handles the copying between drives without you knowing about it. You get great data safety, but little or no performance increase: every write goes to both drives, although some controllers can speed up reads by using both. We are making progress, but the downside is that you are limited to the capacity of a single drive. If you choose a pair of 10TB drives, you only have a capacity of 10TB. We haven’t gained much.
RAID 5. Now we are entering the realm of big data and big safety. With RAID 5, we start with at least 3 drives and use parity to make sure that any one of the drives can fail and the data is still available. You get performance benefits from having data spread across at least 3 drives (or as many as your server can accommodate) along with the safety of being able to lose a drive. The downside is that your array will be slower when a drive has failed, because the RAID controller must now reconstruct the missing data on the fly. That reconstruction is the parity piece of this puzzle. Without going into great detail, the controller reads the data and parity from the remaining online drives, combines them (with an XOR operation) to work out what the failed drive held, and sends that back to the server as if nothing happened. The array is slower but still available.
You can replace the failed drive (often while the server is still running) and the RAID controller will see the new drive and rebuild the array in the background. You will suddenly see all the lights flashing in the array, and array speeds will deteriorate a bit more while the array rebuilds. This process can take anywhere from minutes to days, depending on the controller, the size of the drives, and how much data has to be rebuilt onto the replacement for the (now in the trash) failed drive.
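The parity idea can be sketched in a few lines. This is a simplified illustration, not how any particular controller is implemented: it assumes three data blocks and one parity block in a single stripe, whereas real RAID 5 rotates parity across all drives. The block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position in the block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each living on a different drive.
data_blocks = [b"RAID", b"five", b"demo"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second block.
surviving = [data_blocks[0], data_blocks[2], parity]

# XOR-ing everything that is left recreates the missing block.
rebuilt = xor_blocks(surviving)
print(rebuilt)
```

Because XOR is its own inverse, the same operation both creates the parity and recovers a lost block. It only works while exactly one block per stripe is missing, which is why a second failure in RAID 5 is fatal.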
Side note: there is also a RAID 6, which allows up to TWO drives in the array to fail for even greater data safety. RAID 5/6 is good, but it should be noted that the concept of “N-1” in RAID 5 or “N-2” in RAID 6 applies. In a RAID 5 array, you lose the capacity of one drive. Let’s say you purchase 5 hard drives and each drive has a 10TB capacity. You don’t end up with 50TB of storage capacity, you end up with 40TB. The capacity of that last drive is used for parity. Since RAID 6 allows for the failure of up to two hard drives, you lose the capacity of 2 drives. Using the same 5 drives with 10TB each, you only end up with 30TB.
From a practical standpoint, many administrators choose many drives of a smaller capacity rather than only a few drives with a higher capacity. Higher-capacity drives are more expensive than smaller drives, so this can reduce the money needed for the array. If a smaller drive fails, the time needed to rebuild can also shrink quite a bit. Don’t forget your array is in a pretty bad state once a single drive has failed! In RAID 5, another failed drive before the rebuild finishes means you have lost all your data. Once a drive fails, get it replaced ASAP. Most network admins will keep a hot or cold spare to minimize the amount of time spent with a degraded array.
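The “N-1” and “N-2” capacity rule is simple arithmetic. Here is a minimal sketch, assuming all drives in the array are the same size; the function and level names are made up for the example.

```python
# Number of drives' worth of capacity given up to parity at each level.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}
# Smallest number of drives each level needs.
MINIMUM_DRIVES = {"raid5": 3, "raid6": 4}


def usable_capacity_tb(drive_count: int, drive_size_tb: float, level: str) -> float:
    """Usable capacity of a RAID 5 or RAID 6 array of identical drives."""
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    if drive_count < MINIMUM_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MINIMUM_DRIVES[level]} drives")
    return (drive_count - PARITY_DRIVES[level]) * drive_size_tb


if __name__ == "__main__":
    # The example from the text: five 10TB drives.
    print("RAID 5:", usable_capacity_tb(5, 10, "raid5"), "TB")
    print("RAID 6:", usable_capacity_tb(5, 10, "raid6"), "TB")
```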
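RAID 10. Remember when I mentioned RAID 0 being dangerous? OK, imagine this: you have this really fast (but really dangerous) array. What if we combined the striping of RAID 0 with the mirroring of RAID 1 for safety? One way is to take that RAID 0 array, make another RAID 0 array, and mirror the two; strictly speaking, that layout is called RAID 0+1. RAID 10 does it the other way around: the drives are first mirrored in pairs (RAID 1), and the data is then striped across those pairs (RAID 0). Either way, the result is very fast and very safe, at the cost of half the raw capacity. Any single drive can fail and the array will be fine. Yes, you could even lose TWO drives and continue as if nothing happened, as long as they are the right two: in RAID 10 they must come from different mirrored pairs, while in RAID 0+1 they must come from the same RAID 0 array.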
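Which two-drive failures an array survives depends on how the striping and mirroring are combined. The sketch below assumes a four-drive array and compares two layouts: a mirror of two stripe sets (usually called RAID 0+1) and a stripe across two mirrored pairs (usually called RAID 10). The drive names and helper functions are made up for the example.

```python
from itertools import combinations

DRIVES = ("D0", "D1", "D2", "D3")
# The same two groups of drives, interpreted differently by each layout.
GROUPS = [{"D0", "D1"}, {"D2", "D3"}]


def survives_mirror_of_stripes(failed):
    """Each group is a RAID 0 stripe set; the two sets mirror each other.

    The data survives while at least one stripe set has no failed drive.
    """
    return any(group.isdisjoint(failed) for group in GROUPS)


def survives_stripe_of_mirrors(failed):
    """Each group is a mirrored pair; data is striped across the pairs.

    The data survives while every pair still has one working drive.
    """
    return all(group - failed for group in GROUPS)


def count_survivable(survives, failures=2):
    """Count how many combinations of failed drives the layout survives."""
    return sum(
        1 for failed in combinations(DRIVES, failures) if survives(set(failed))
    )


if __name__ == "__main__":
    total = len(list(combinations(DRIVES, 2)))
    print("mirror of stripes:", count_survivable(survives_mirror_of_stripes), "of", total)
    print("stripe of mirrors:", count_survivable(survives_stripe_of_mirrors), "of", total)
```

With four drives there are six possible pairs of failed drives. Counting them this way shows that the striped-mirrors layout tolerates more of those pairs than the mirrored-stripes layout, which is one reason it is the more common choice.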
Pros and cons of RAID vs. SSDs
You might be asking how RAID compares to SSDs. The two are not really competitors: RAID is simply a way of grouping drives together, and those drives can be mechanical (often called “spinning rust”) or solid state. Drives can be combined for a higher storage capacity, and this is even more important when dealing with the smaller capacities of solid state devices. Solid state drives can also be grouped for better performance, but so far we have only talked about throughput: the amount of data that can flow per second, minute, or whatever metric you use.
The real benefit of using SSDs in RAID shows up when you measure IOPS, or “Input/Output Operations Per Second”. Because a hard drive is a spinning mechanical device, it is slow in terms of IOPS. Spinning drives typically manage around 100 IOPS because the mechanical heads need to move. SSDs have no moving parts and are typically rated anywhere from tens of thousands of IOPS (consumer SATA drives) to hundreds of thousands or more (NVMe drives). This drastic improvement is only amplified when SSDs are used in an array of any configuration.
Hard drives still have a place in consumer or enterprise storage for two reasons. First, they typically offer a much larger capacity than an SSD at a given price. Second, SSDs have a known lifespan. Their storage chips wear out at a predictable rate, and when they get to zero percent remaining, they fail rather quickly. A hard drive can last years regardless of how much is written, while SSD endurance is measured in bytes written. Writing constantly to an SSD, for example when it is used for data mining, can kill it within weeks. That may not be bad, though. The performance of SSDs is so great that replacing them after they burn out may be worth the higher cost and shorter maintenance intervals. Also, enterprise SSDs usually have double, or even triple, the endurance of consumer SSDs.
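A rough way to estimate how long an SSD will last is to divide its rated write endurance by the amount written per day. This is only a back-of-the-envelope sketch: the endurance rating and workloads below are hypothetical numbers chosen for illustration, and a real drive may wear out sooner or later than its rating suggests.

```python
def estimated_lifetime_days(rated_endurance_tb: float, writes_per_day_tb: float) -> float:
    """Days until the rated write endurance is used up at a steady write rate."""
    if rated_endurance_tb <= 0 or writes_per_day_tb <= 0:
        raise ValueError("endurance and daily writes must be positive")
    return rated_endurance_tb / writes_per_day_tb


if __name__ == "__main__":
    assumed_endurance_tb = 600  # hypothetical rating, in terabytes written
    workloads = {
        "light office use (0.05 TB/day)": 0.05,  # hypothetical workload
        "constant heavy writes (10 TB/day)": 10,  # hypothetical workload
    }
    for label, writes_per_day_tb in workloads.items():
        days = estimated_lifetime_days(assumed_endurance_tb, writes_per_day_tb)
        print(f"{label}: about {days:,.0f} days ({days / 365:.1f} years)")
```

With these made-up numbers, the lightly used drive would outlive the machine it sits in, while the heavily written drive would reach its rating in a couple of months.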
What would benefit from RAID? Workstation, server, or laptop?
RAID is a great technology, but the machine needs a RAID controller (or equivalent RAID support) before an array can be created. Servers often have this feature built in due to their need to always be online and available. Their controllers are often used in conjunction with “hot swap” bays that allow the drives to be removed from the front of the server while it is running. A new drive can be plugged in, and the server will start rebuilding the array automatically. Many high-end workstations also have limited versions of RAID, but without the hot-swap feature. It only takes a minute or two to replace a failed drive in a workstation, and this downtime only affects a single user. Most laptops do not have RAID controllers built in, although some high-end machines do. Generally speaking, the need for a server to remain online means that most will have some type of controller. Workstations and laptops (should) typically store data on a server or in the cloud, so a drive failure just means the operating system needs to be reloaded. Even this drawback can be mitigated through the use of imaging software or the use of PXE to load an image on boot.
I hope this article has been informative for you. Interon has now set up many “open source offices” and the economic benefits for our clients have been immense. If you would like to learn more or hire us to implement any or all of the technologies discussed feel free to call us or use our CONTACT US page to request more information!