About 18 months ago, I had a disk failure in a Drobo RAID 5 array. It took two weeks to rebuild.
I considered that unacceptable, and got rid of all my Drobos, replacing them with Synology boxes. On Tuesday, I had a disk failure in a larger Synology RAID 6 array. I replaced the failed 3TB drive with a larger one, and started a rebuild.
Today, about 46 hours into the rebuild, I checked the status:
It looks like the whole process will take about two days. That’s OK, but I am configuring my newer arrays for RAID 1 — mirroring — to cut down on rebuild times.
[Addition] The rebuild finished right on schedule, and the array is doing fine.
Frank says
two weeks is crazy. lol
I think RAID 10 is better, but usable space will be cut in half. Orz
Walter Monat says
Bear in mind that bit errors are the biggest problem with RAID technology today.
In the event of a disk failure in a RAID set, every remaining disk must be read perfectly from start to finish. Any read error during a RAID-5 rebuild will cause data loss, since every sector of all the other disks has to be read during the rebuild.
http://www.smbitjournal.com/2012/05/when-no-redundancy-is-more-reliable/
On SATA disks with a bit error rate of 1 in 10^14 (one read failure for every 12.5 TB read), that means if the data on the surviving disks totals 12.5 TB, the probability of the array failing rebuild is close to 100%. If a read failure happens during a RAID-6 rebuild, you will be driving without a spare tire, vulnerable to a second disk failure, which is what RAID-6 is supposed to protect against in the first place. Enterprise-class disks are safer by a factor of ten, but still risky. Here is the math:
http://www.lucidti.com/zfs-checksums-add-reliability-to-nas-storage
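For anyone who wants to check the arithmetic themselves, here is a rough Python sketch. It assumes every bit read fails independently with the quoted error rate, which is a simplification (real drives tend to fail in clusters), and the function name and the sizes in the loop are made up for illustration. Under that model, reading 12.5 TB at 1 in 10^14 comes out to roughly a 63% chance of hitting at least one unreadable bit rather than a near-certainty, and the chance climbs as the amount of data read grows.

```python
import math

TB = 1e12  # decimal terabyte, in bytes


def p_at_least_one_read_error(bytes_read: float, ber: float = 1e-14) -> float:
    """Chance of one or more unrecoverable bit errors while reading bytes_read.

    Assumes independent per-bit errors at rate `ber` (a simplifying assumption).
    Computes 1 - (1 - ber) ** bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


# Illustrative sizes only; compare consumer (1e-14) and enterprise (1e-15) ratings.
for tb in (3, 12.5, 25, 50):
    consumer = p_at_least_one_read_error(tb * TB, ber=1e-14)
    enterprise = p_at_least_one_read_error(tb * TB, ber=1e-15)
    print(f"{tb:>5} TB read: consumer {consumer:.1%}, enterprise {enterprise:.1%}")
```

The same function with `ber=1e-15` shows how much an order of magnitude in the quoted error rate changes the picture.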
jim says
Walter,
Thanks. Here’s something from the last link in your comment that makes no sense to me: “Every remaining disk must be read perfectly from start to finish or else rebuild will fail, leading to a total data loss.”
I worked on RAID storage a bit in the early 1990s. Back then, if you couldn’t read a block during a rebuild, you couldn’t reconstruct the missing data for that block from parity, and you thus lost all or part of the data in that block. But not any more data than that.
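To make that concrete, here is a toy Python sketch of single-parity (XOR) reconstruction. It is only an illustration of the idea, not how any particular controller or NAS behaves: the disk contents are invented, parity sits on one dedicated "disk" instead of rotating as it does in RAID 5, and an unreadable block is modeled as `None`.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_disk(survivors):
    """Rebuild the failed disk stripe by stripe from the surviving disks.

    survivors: list of disks, each a list of blocks (bytes), with None
    standing in for a block that could not be read.
    Returns the rebuilt blocks, with None for any stripe that was lost.
    """
    rebuilt = []
    for stripe in zip(*survivors):
        if any(block is None for block in stripe):
            rebuilt.append(None)  # only this stripe is unrecoverable
        else:
            rebuilt.append(xor_blocks(stripe))
    return rebuilt


# Made-up data: three data disks, three stripes, plus XOR parity.
d0 = [b"aaaa", b"bbbb", b"cccc"]
d1 = [b"dddd", b"eeee", b"ffff"]
d2 = [b"gggg", b"hhhh", b"iiii"]
parity = [xor_blocks(stripe) for stripe in zip(d0, d1, d2)]

# d2 fails; during the rebuild, one block on d0 turns out to be unreadable.
d0_damaged = [d0[0], None, d0[2]]
rebuilt = rebuild_missing_disk([d0_damaged, d1, parity])
print(rebuilt)
```

In this toy model the stripe with the bad block comes back as `None` and the other two stripes are recovered intact.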
Also, the BER numbers that you are quoting are an order of magnitude higher than the ones I’ve been seeing.
Jim