[Discuss] On Btrfs raid and odd-count disks

Edward Ned Harvey (blu) blu at nedharvey.com
Sat Apr 13 10:21:05 EDT 2013


> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-bounces+blu=nedharvey.com at blu.org] On Behalf Of Derek Atkins
> 
> > ZFS prevents write holes by enforcing atomicity of all writes to
> > storage. It does this by controlling all of the I/O caching involved in
> > the write process from system RAM down to the write acceleration cache
> > on the disks themselves. ZFS updates the file system only after all
> > cache points have confirmed being flushed.
> >
> > If any of these points lie about their status then write holes can
> > appear under power fault conditions. 

True, but at least with ZFS and BTRFS, any subsequent read of the corrupt data will be detected by a checksum mismatch.

Also, since we're talking about redundant storage, ZFS (and presumably BTRFS, since it's the obvious thing to do) will attempt to correct the error.  If a single disk (or any number of disks up to your redundancy level) wrote corrupt data (or no data), the checksum fails.  The FS then tries the possible combinations of excluding devices and reconstructing from the rest, to identify which device(s) hold the corrupt data.  If it finds a combination that produces a good checksum, it rewrites the data to whichever disk(s) failed.
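
To make the "try combinations" idea concrete, here is a minimal sketch in Python.  It is not how ZFS is actually implemented; it assumes a toy single-parity (XOR) stripe with one checksum over the data blocks, and the names xor_blocks, checksum and read_with_self_heal are made up for illustration.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(data_blocks):
    """Checksum over the concatenated data blocks (stored apart from the data)."""
    return hashlib.sha256(b"".join(data_blocks)).hexdigest()


def read_with_self_heal(devices, expected):
    """Read one stripe and repair it if possible.

    devices:  list of blocks, one per device; the LAST entry is XOR parity
              (a hypothetical layout chosen for this sketch).
    expected: checksum recorded when the stripe was written.
    Returns (data_blocks, index_of_repaired_device_or_None).
    """
    data = list(devices[:-1])
    if checksum(data) == expected:
        return data, None  # normal case: nothing to fix

    # Checksum failed: assume each data device in turn is the bad one,
    # rebuild its block from all the other devices (including parity),
    # and keep the first candidate whose checksum matches.
    for bad in range(len(data)):
        others = [blk for i, blk in enumerate(devices) if i != bad]
        rebuilt = xor_blocks(others)
        candidate = data[:bad] + [rebuilt] + data[bad + 1:]
        if checksum(candidate) == expected:
            devices[bad] = rebuilt  # write the good data back to the bad device
            return candidate, bad

    raise IOError("more devices are corrupt than single parity can repair")


if __name__ == "__main__":
    good = [b"aaaa", b"bbbb", b"cccc"]
    stripe = good + [xor_blocks(good)]
    expected = checksum(good)
    stripe[1] = b"bXbb"  # simulate a silent bit error on device 1
    print(read_with_self_heal(stripe, expected))
```

With single parity only one device is ever excluded at a time; with higher redundancy the same loop would run over combinations of devices.  Note that this sketch only notices a bad parity block when it is needed for a rebuild, which is one reason a periodic scrub is still useful.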


> Fair enough...  I don't know if standard (e.g. DM-level) RAID5 or RAID6
> provide for said "scrubbing"?  

Nope.
Scrubbing that can pinpoint and repair the bad data is only possible thanks to checksumming at the raid level.  Without that, your raid depends on the underlying devices to report errors correctly.  If an error isn't noticed by the hardware and escalated to the OS, it passes through standard raid undetected.
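
As a rough illustration of why the checksum matters, here is a small Python sketch of scrubbing one block on a two-way mirror.  The function scrub_block and its return strings are hypothetical, and CRC32 just stands in for whatever checksum the filesystem really uses.

```python
import zlib


def scrub_block(copy_a, copy_b, stored_crc=None):
    """Decide what a scrub can do for one mirrored block.

    stored_crc is None for plain raid (no checksum kept anywhere);
    otherwise it is the CRC32 recorded when the block was written.
    """
    if stored_crc is None:
        # Without a checksum, all a scrub can do is compare the copies.
        if copy_a == copy_b:
            return "ok"
        return "mismatch: cannot tell which copy is bad"

    a_ok = zlib.crc32(copy_a) == stored_crc
    b_ok = zlib.crc32(copy_b) == stored_crc
    if a_ok and b_ok:
        return "ok"
    if a_ok:
        return "repair copy_b from copy_a"
    if b_ok:
        return "repair copy_a from copy_b"
    return "unrecoverable: both copies bad"


if __name__ == "__main__":
    good = b"important data"
    flipped = b"important dat\x00"  # silent corruption the drive did not report
    print(scrub_block(good, flipped))                    # plain mirror
    print(scrub_block(good, flipped, zlib.crc32(good)))  # checksummed mirror
```

The plain mirror can at best see that the two copies differ; only the checksummed case knows which copy to trust.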

How often does that happen?  In my experience, heavy usage on several TB of enterprise SATA hardware produces a bit error about once every 1-2 years, identified by the ZFS checksum error counter incrementing without the hard drive's own error counter incrementing.  That means the error passed the drive undetected, and was identified and corrected by ZFS.



