
BLU Discuss list archive



[Discuss] On Btrfs raid and odd-count disks



> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-
> bounces+blu=nedharvey.com at blu.org] On Behalf Of Derek Atkins
> 
> Disk write errors are RARELY reported by the disk interface, because the
> write error can happen due to multiple causes, few of which the
> interface can report.  Disk READ errors generally are reported, however,
> but by then it can be too late to save your data.

Disks have forward error correction (FEC) algorithms because they actually make errors frequently.  Agree with Derek:  if an error occurs during a write, the disk won't know until later, when it re-reads the data and the FEC identifies it as bad.  The disk silently repeats the read, and may reset the head or take other corrective action.  (You may hear it *click*.)  If the data comes up good, the disk increments its soft error counter and continues functioning.  Otherwise, it reports a hard error to the OS.  Either way, ZFS and BTRFS apply a stronger checksum and, depending on your config, have redundant information to recover from any hard errors or undetected corruption returned by the disk.  The FEC algorithm in disk hardware is not nearly as powerful as the data integrity algorithms in ZFS or BTRFS.
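To make that concrete, here is a minimal Python sketch of the filesystem-level checksum idea -- not the actual ZFS/BTRFS code (ZFS defaults to fletcher4, BTRFS to crc32c; sha256 and the function names here are illustrative assumptions):

```python
import hashlib

def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Store a block together with a strong checksum computed at write time."""
    return data, hashlib.sha256(data).digest()

def read_block(stored: bytes, checksum: bytes) -> bytes:
    """Verify the block against its checksum on every read."""
    if hashlib.sha256(stored).digest() != checksum:
        raise IOError("checksum mismatch: disk returned corrupt data")
    return stored

data, csum = write_block(b"important data")

# A clean read passes verification.
assert read_block(data, csum) == b"important data"

# Simulate silent corruption that slipped past the drive's FEC:
# the disk reports success, but the filesystem checksum catches it.
try:
    read_block(b"importBnt data", csum)
except IOError as e:
    print(e)
```

The point is that the checksum lives above the disk, so corruption the drive never reported is still detected at read time.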


> > mdadm has a sort of scrub facility available, in which it reads
> > all the bits -- see /sys/block/$array/md/sync_action
> 
> "reading" all the bits is not necessarily sufficient.  I'd like
> something that can actually correct on-disk write errors via parity and
> checksum.
> 
> A raw mirror isn't sufficient because you don't know which mirror has
> the "good" data.

If using a hardware mirror, you don't know which side has the good data.  But if using a zfs or btrfs mirror (or any other kind of redundancy, including raidz, the "copies" property, etc.), then the filesystem has a checksum and is able to identify which copy is good.  As long as you have sufficient redundant copies, at least one of which is not corrupt, you detect & correct the problem.  ZFS & BTRFS attempt to re-write the data to the disk that failed, and then verify.  If the failure is persistent, the disk is marked bad.  Likewise, if the error counter on a specific device increases too rapidly (even if every error was successfully corrected), the disk is marked bad.
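That detect-and-repair loop can be sketched in a few lines of Python -- again illustrative only, not the real filesystem code; the two "disks" are just byte arrays and sha256 stands in for the filesystem's checksum:

```python
import hashlib

def scrub_mirror(copies: list[bytearray], checksum: bytes) -> bytes:
    """Find a copy matching the checksum, then heal any copies that don't."""
    good = None
    for c in copies:
        if hashlib.sha256(c).digest() == checksum:
            good = bytes(c)
            break
    if good is None:
        raise IOError("all copies corrupt: unrecoverable")
    for c in copies:
        if hashlib.sha256(c).digest() != checksum:
            c[:] = good  # re-write the bad side from the good one...
            assert hashlib.sha256(c).digest() == checksum  # ...then verify
    return good

data = b"critical block"
csum = hashlib.sha256(data).digest()
side_a = bytearray(data)
side_b = bytearray(b"critical blocj")  # simulated corruption on one mirror

assert scrub_mirror([side_a, side_b], csum) == data
assert bytes(side_b) == data           # the bad copy was healed
```

With a raw hardware mirror there is no `csum` to consult, which is exactly why it can't tell `side_a` from `side_b` when they disagree.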


> ZFS (and possibly BTRFS) seem to have enough metadata to correct small
> errors.

All errors, provided sufficient redundancy.




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org