Dear RAID, It’s not me…. It’s you.

RAID, you have been around for my entire career, and you did a great job for a long time. But now that hard drives are approaching 10 TB, our relationship is over… and it has been for several years. Sure, you still work well enough with small drives, and even for some workloads: OLTP and high-performance databases still rely on RAID 1 and 10. But everything else? Well, it’s time to break up.

Sure, protection against drive failure is still important and always will be, but focusing on drive-level protection is history. Protection at the object, file, or block level is the future, and it is a future without you. Now that I can deploy object storage that is fast enough for the majority of my workloads, why do I need your old-style, controller-based protection? I can protect my data by applying erasure coding and replication at the object level, making my storage a commodity by leveraging software. Sure, software needs to run on hardware to be useful, but I can use common hardware instead of your specialized proprietary stuff that costs too much and lacks flexibility.

Now, don’t think that I am leaving you completely. You can still run in my hardware, but only on the boot drives. My software-defined storage still needs protection against a boot drive failing, and you are really good at that. For the hundreds of terabytes of data in the system, sorry, but you can’t help.

Yes, you have RAID 5 and 6, but erasure coding gets me more benefit in a large system with a lot of data. I can rebuild objects, not drives. I also have the flexibility to apply different levels of protection using the same drives. I’m not stuck with your antiquated RAID groups any longer. You know how you put additional load on drives when one fails and the set is rebuilding? Well, erasure coding lets me distribute the work across more drives in the system, while focusing on objects and not drives, which means that the probability of another drive failure due to excessive activity goes down. More drives participating means that each one is doing less work. I know, I know, you think you can solve this with larger RAID sets, but it just isn’t true. What’s that old saying about lipstick on a pig?
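In case you want numbers, RAID, here is a rough back-of-the-envelope sketch of that rebuild argument. It is not how any particular product works. The function names are made up for illustration, the RAID side assumes a classic RAID 5 rebuild that reads every surviving drive in the group end to end, and the erasure coding side assumes the failed drive was full and that rebuild reads spread perfectly evenly across every other drive in the system.

```python
# Back-of-the-envelope rebuild math. All names and figures are illustrative
# assumptions, not measurements from a real system.

def raid_rebuild_read_per_drive_tb(drive_tb: float) -> float:
    """Classic RAID 5 rebuild: each surviving drive in the group is read in full,
    so the per-drive read load equals the drive capacity, whatever the group size."""
    return drive_tb


def ec_rebuild_read_per_drive_tb(
    drive_tb: float, data_fragments: int, total_drives: int
) -> float:
    """Distributed erasure coding: rebuilding each lost fragment reads
    `data_fragments` surviving fragments, and those reads are assumed to be
    spread evenly over every other drive in the system."""
    if total_drives < 2:
        raise ValueError("need at least two drives")
    total_read_tb = data_fragments * drive_tb
    return total_read_tb / (total_drives - 1)


if __name__ == "__main__":
    print(raid_rebuild_read_per_drive_tb(10.0))
    print(ec_rebuild_read_per_drive_tb(10.0, data_fragments=8, total_drives=101))
```

With a 10 TB drive, an assumed 8 data fragments per object, and 101 drives in the system, each surviving drive reads about 0.8 TB during the rebuild instead of the full 10 TB. A bigger RAID set does not change your side of that comparison; it only adds more drives that each get read in full.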

I even have local replication for high-performance, low-latency requirements. You thought that RAID 1 and 10 were the only way to get high performance, but you were wrong. Local replication lets me keep multiple copies of each object, on different drives and even different nodes. I can even store more than just 1 replica. With local replication, I can store 9 replicas, which means 10 copies of each object. Sure, that is a lot of copies, but just try to do that with RAID 1. It doesn’t mean I have to store 10 copies, but I like to keep my options open.
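Keeping my options open has a price, of course, and the arithmetic is simple enough to sketch. The helper names below are hypothetical, and the 8+2 erasure coding layout is only an assumed example for comparison, not a setting from any specific product.

```python
# Raw capacity needed for a given amount of usable data.
# Helper names and the 8+2 layout are assumptions for illustration.

def copies_from_replicas(replicas: int) -> int:
    """N replicas plus the original object means N + 1 copies."""
    return replicas + 1


def raw_tb_replication(usable_tb: float, replicas: int) -> float:
    """Every copy is stored in full."""
    return usable_tb * copies_from_replicas(replicas)


def raw_tb_erasure_coding(
    usable_tb: float, data_fragments: int, parity_fragments: int
) -> float:
    """Each object is split into data fragments plus extra parity fragments."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    usable = 100.0  # TB of actual data
    print(raw_tb_replication(usable, replicas=1))     # two copies, RAID 1 style
    print(raw_tb_replication(usable, replicas=9))     # ten copies
    print(raw_tb_erasure_coding(usable, 8, 2))        # assumed 8+2 layout
```

For 100 TB of data, that works out to 200 TB raw with one replica, 1,000 TB with nine replicas, and 125 TB with the assumed 8+2 layout. That spread is why I get to pick the protection level per object instead of per RAID group.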

So, RAID, we can still be friends. You get the small data, like boot drives. The big data, important data, and the data I really care about are all going to be stored as objects with local replication or erasure coding.

Sincerely, Everybody