[Discuss] On Btrfs raid and odd-count disks

Thu Apr 11 09:28:34 EDT 2013

Richard Pieri <richard.pieri at gmail.com> writes:

> In retrospect, if you're looking at file systems as a means to prevent
> write holes with RAID 5/6 then you're going about it wrong. Write holes
> happen with every RAID level. They happen with RAID 5 and 6. They happen
> with RAID 1 and RAID 10. Do not believe anyone who says that write holes
> are unique to RAID 5/6 and their derivatives. They are mistaken. Any two
> or more storage devices in a RAID set that are not atomically locked
> together can suffer write holes. They can even happen with ZFS.

The reason I'm looking at a filesystem here is that the WAY writes occur
can affect the write holes you get in RAID5 and RAID6.  For example, ZFS
does not overwrite the existing block; it writes to a new block and
only changes the block pointer after that write succeeds.
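
To make that concrete, here is a rough Python sketch of the difference
between updating a parity stripe in place and writing a new stripe
before flipping a pointer.  It is a toy model, not how ZFS or any real
RAID layer is implemented: the dict-based "disks", the two-data-block
stripe, and every name in it (overwrite_in_place, copy_on_write,
CrashDuringWrite, demo) are made up for illustration.

```python
class CrashDuringWrite(Exception):
    """Simulated power loss partway through a stripe update."""


def xor_parity(a: bytes, b: bytes) -> bytes:
    """Parity of two equal-length data blocks, as in a 3-disk RAID5 stripe."""
    return bytes(x ^ y for x, y in zip(a, b))


def is_consistent(stripe: dict) -> bool:
    """True if the stored parity matches the stored data blocks."""
    return stripe["parity"] == xor_parity(stripe["d0"], stripe["d1"])


def overwrite_in_place(stripe: dict, new_d0: bytes, crash: bool) -> None:
    """In-place update: data and parity are rewritten where they live."""
    stripe["d0"] = new_d0
    if crash:
        # Data block is new, parity block is still old: a write hole.
        raise CrashDuringWrite
    stripe["parity"] = xor_parity(stripe["d0"], stripe["d1"])


def copy_on_write(store: dict, new_d0: bytes, crash: bool) -> None:
    """Build the new stripe in free space, then flip the live pointer last."""
    old = store["stripes"][store["live"]]
    new_id = max(store["stripes"]) + 1
    store["stripes"][new_id] = {"d0": new_d0, "d1": old["d1"], "parity": b""}
    if crash:
        # The half-written stripe is never referenced; the old one is intact.
        raise CrashDuringWrite
    new = store["stripes"][new_id]
    new["parity"] = xor_parity(new["d0"], new["d1"])
    store["live"] = new_id  # the single step that makes the update visible


def demo() -> tuple:
    """Crash both schemes mid-update; report whether the live stripe is sane."""
    def fresh() -> dict:
        return {"d0": b"AAAA", "d1": b"BBBB",
                "parity": xor_parity(b"AAAA", b"BBBB")}

    stripe = fresh()
    try:
        overwrite_in_place(stripe, b"CCCC", crash=True)
    except CrashDuringWrite:
        pass

    store = {"live": 0, "stripes": {0: fresh()}}
    try:
        copy_on_write(store, b"CCCC", crash=True)
    except CrashDuringWrite:
        pass

    return is_consistent(stripe), is_consistent(store["stripes"][store["live"]])
```

Calling demo() crashes both schemes at the same point.  The in-place
stripe is left with new data and stale parity, while the copy-on-write
store still points at the old, self-consistent stripe.  The sketch
assumes the pointer flip itself is atomic and that the disks really
have written what they claim to have written.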

> This is not a RAID issue. RAID is about making the hardware tolerant to
> faults. RAID does not care about the integrity of your data.

And *THAT* is the problem.  I want fault tolerance *AND* data integrity,
which is why I'm looking towards ZFS and Btrfs as potential solutions
that provide both.
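
As a rough illustration of why checksums matter here, the Python sketch
below compares a plain mirror read with a checksummed one.  It is only a
toy: SHA-256 stands in for whatever checksum the filesystem actually
uses, the list of byte strings stands in for mirrored disks, and the
function names (read_plain_mirror, read_checksummed_mirror) are
hypothetical.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Stand-in block checksum; real filesystems use their own algorithms."""
    return hashlib.sha256(data).digest()


def read_plain_mirror(copies: list) -> bytes:
    """Plain mirror: return the first copy, with no way to tell if it is right."""
    return copies[0]


def read_checksummed_mirror(copies: list, expected: bytes) -> bytes:
    """Return the first copy matching the stored checksum and repair the rest."""
    for candidate in copies:
        if checksum(candidate) == expected:
            # Self-heal: rewrite any copy that fails verification.
            for i in range(len(copies)):
                if checksum(copies[i]) != expected:
                    copies[i] = candidate
            return candidate
    raise IOError("every copy failed its checksum; restore from backup")
```

For example, with copies = [b"g0od data", b"good data"] and expected =
checksum(b"good data"), the plain read hands back the corrupted first
copy, while the checksummed read returns the good one and rewrites the
bad copy.  This assumes the checksum is stored separately from the data
it covers, so one bad write cannot damage both.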

> Write holes happen when power to the storage devices is lost during
> write operations. UPS and redundant power are the primary ways of
> preventing write holes. If the server doesn't lose power, or it has time
> to perform a graceful shutdown when mains fail, then no holes appear in
> the data it holds.

Or when the CPU loses power (assuming software RAID) in the middle of a
write.  See above for how ZFS works around this problem.  Note, however,
that ZFS assumes that *MEMORY* is not corrupted, so you definitely need
to use ECC RAM.

> Battery-backed cache is the second line of defense against write holes.
> The battery prevents cache loss if redundant and backup power fail.
> Non-volatile cache (SSD) is becoming a popular alternative to
> battery-backed cache, although flash has its own set of power-related
> problems.
>
> The last line of defense against corruption is a good backup history.
>
> ZFS and Btrfs will detect and if possible correct single-bit errors.
> They may be able to prevent write holes if they can reliably control
> every piece of I/O cache in the data stream. This includes the write
> acceleration cache found on most modern disks' on-board controllers. Not
> all of these reliably honor cache flush instructions from the host and
> because of this they cannot be relied upon to maintain data integrity
> under power fault conditions.

When the drives lie to you it's hard to work around that, sure.

I *do* have a UPS with a good deal of runtime available, and I plan to
get a secondary power backup (which I will probably have installed
before I even get to build my spiffy new NAS), so power shouldn't be a
problem, just potential hardware faults.

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available