BLU Discuss list archive

Subject: [Discuss] On Btrfs raid and odd-count disks
From: blu at nedharvey.com (Edward Ned Harvey (blu))
Date: Tue, 16 Apr 2013 00:44:54 +0000
In-reply-to: <sjmk3o3q27c.fsf@mocana.ihtfp.org>
References: <e511dc18cf96ddb091411f8f6463f405.squirrel@mail.mohawksoft.com> <D1B1A95FBDCF7341AC8EB0A97FCCC4773BBC8927@SN2PRD0410MB372.namprd04.prod.outlook.com> <sjmwqsatoqi.fsf@mocana.ihtfp.org> <51658526.2060608@gmail.com> <sjm7gkatlyp.fsf@mocana.ihtfp.org> <51659281.6060409@gmail.com> <5165B3FD.7070605@gmail.com> <sjmk3o9rybh.fsf@mocana.ihtfp.org> <5166D866.3080402@gmail.com> <sjmehefredy.fsf@mocana.ihtfp.org> <20130412145847.GI27670@randomstring.org> <sjmk3o3q27c.fsf@mocana.ihtfp.org>

> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-bounces+blu=nedharvey.com at blu.org] On Behalf Of Derek Atkins
> 
> Disk write errors are RARELY reported by the disk interface, because the
> write error can happen due to multiple causes, few of which the
> interface can report.  Disk READ errors generally are reported, however,
> but by then it can be too late to save your data.

Disks use forward error correction (FEC) because they actually make errors frequently.  I agree with Derek:  if an error occurs during a write, the disk won't know until later, when it re-reads the data and the FEC flags it as bad.  At that point the disk silently retries the read, and may reset the head or take other corrective action.  (You may hear it *click*.)  If the data eventually comes back good, the disk increments its soft error counter and carries on.  Otherwise, it reports a hard error to the OS.

Either way, ZFS and BTRFS apply their own, stronger checksum on top of that and, depending on your config, keep redundant information to recover from hard errors or from corruption the disk returns without noticing.  The FEC in disk hardware is not nearly as powerful as the end-to-end data integrity checking in ZFS or BTRFS.
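To make the layering concrete, here is a toy Python model; it is not how any real drive or filesystem is implemented. It assumes a single XOR parity byte as a deliberately weak stand-in for the drive's FEC (real drives use much stronger codes), and `zlib.crc32` as a stand-in for the filesystem checksum (BTRFS defaults to crc32c and ZFS to fletcher4, neither of which is what `zlib.crc32` computes). All class and function names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorCounters:
    soft: int = 0  # errors the drive recovered from by retrying
    hard: int = 0  # errors the drive gave up on and reported to the OS


def toy_fec_ok(data: bytes, stored_parity: int) -> bool:
    """Deliberately weak stand-in for drive FEC: one XOR parity byte."""
    parity = 0
    for b in data:
        parity ^= b
    return parity == stored_parity


def drive_read(attempts: List[bytes], stored_parity: int,
               counters: ErrorCounters) -> Optional[bytes]:
    """Model the drive's read path.

    `attempts` lists what each successive physical read returns; the drive
    silently retries until the FEC check passes or it runs out of attempts.
    """
    for i, raw in enumerate(attempts):
        if toy_fec_ok(raw, stored_parity):
            if i > 0:
                counters.soft += 1  # recovered after a silent retry
            return raw
    counters.hard += 1  # every attempt failed: report a hard error
    return None


def fs_verify(data: bytes, stored_crc: int) -> bool:
    """Filesystem-level checksum, kept separately from the data itself."""
    return zlib.crc32(data) == stored_crc
```

For example, flipping the same bit in two adjacent bytes leaves the XOR parity unchanged, so the toy drive hands the damaged sector back as good, while `fs_verify` rejects it.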


> > mdadm has a sort of scrub facility available, in which it reads
> > all the bits -- see /sys/block/$array/md/sync_action
> 
> "reading" all the bits is not necessarily sufficient.  I'd like
> something that can actually correct on-disk write errors via parity and
> checksum.
> 
> A raw mirror isn't sufficient because you don't know which mirror has
> the "good" data.
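For the mdadm side, a small sketch of driving the md scrub through sysfs. The `sync_action` and `mismatch_cnt` attributes come from the kernel's md documentation; the helper names, polling interval, and `root` parameter (there only so the code can be pointed at a fake directory) are illustrative assumptions. It needs root on a real system. Note what it can and cannot tell you: a nonzero `mismatch_cnt` says the copies disagree, but md has no checksum to say which copy is right.

```python
import time
from pathlib import Path
from typing import Optional

SYSFS_BLOCK = Path("/sys/block")


def md_attr(array: str, name: str, root: Path = SYSFS_BLOCK) -> Path:
    """Path to an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return root / array / "md" / name


def start_check(array: str, root: Path = SYSFS_BLOCK) -> None:
    """Start a read-only scrub: md reads every stripe and counts mismatches."""
    md_attr(array, "sync_action", root).write_text("check\n")


def wait_until_idle(array: str, root: Path = SYSFS_BLOCK,
                    poll_seconds: float = 5.0,
                    timeout: Optional[float] = None) -> bool:
    """Poll sync_action until it reads 'idle'; False if the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while md_attr(array, "sync_action", root).read_text().strip() != "idle":
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def mismatch_count(array: str, root: Path = SYSFS_BLOCK) -> int:
    """Sectors the last check found inconsistent (or that a repair rewrote)."""
    return int(md_attr(array, "mismatch_cnt", root).read_text().strip())
```

On a live system that would be `start_check("md0")`, then `wait_until_idle("md0")`, then `mismatch_count("md0")`.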

With a hardware mirror, you don't know which side has the good data.  But with a ZFS or BTRFS mirror (or any other kind of redundancy, including raidz or the "copies" property), the filesystem has a checksum, so it can identify which copy is good.  As long as you have enough redundant copies, at least one of which is not corrupt, the filesystem detects and corrects the problem.  ZFS and BTRFS attempt to rewrite the good data to the disk that failed, and then verify it.  If the failure persists, or if the error counter on a specific device climbs too rapidly (even when every error is successfully corrected), the device can be marked bad.
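A minimal sketch of the difference, assuming a two-way mirror whose expected checksum lives in metadata rather than next to the data (ZFS keeps it in the parent block pointer; BTRFS keeps data checksums in a separate checksum tree). The class name, the `zlib.crc32` checksum, and the fault threshold are hypothetical illustration choices, not ZFS or BTRFS internals.

```python
import zlib
from typing import List, Optional

FAULT_THRESHOLD = 3  # hypothetical: corrected errors before a device is flagged


class ChecksummedMirror:
    def __init__(self, copies: List[bytearray], expected_crc: int):
        self.copies = copies              # one buffer per side of the mirror
        self.expected_crc = expected_crc  # stored in metadata, not beside the data
        self.errors = [0] * len(copies)   # per-device error counters
        self.faulted = set()              # indices of devices flagged as bad

    def _ok(self, block: bytes) -> bool:
        return zlib.crc32(block) == self.expected_crc

    def read(self) -> Optional[bytes]:
        # Use the checksum to find a copy we can trust.
        good = next((bytes(c) for c in self.copies if self._ok(bytes(c))), None)
        if good is None:
            return None  # no intact copy left: the error is unrecoverable
        for i, copy in enumerate(self.copies):
            if not self._ok(bytes(copy)):
                self.errors[i] += 1
                copy[:] = good  # self-heal: rewrite the bad side from the good one
                # Re-verify; on real hardware a persistent failure shows up here.
                if not self._ok(bytes(copy)) or self.errors[i] >= FAULT_THRESHOLD:
                    self.faulted.add(i)
        return good


def plain_mirror_disagrees(copies: List[bytes]) -> bool:
    """Without checksums, all you can tell is that the sides differ."""
    return len(set(copies)) > 1
```

The plain mirror can only report that the two sides differ; the checksummed one knows which side to keep and which to rewrite.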


> ZFS (and possibly BTRFS) seem to have enough metadata to correct small
> errors.

Any detected error, not just small ones, provided there is sufficient redundancy.
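Redundancy doesn't have to be a full mirror. A rough sketch with single XOR parity across three data blocks (the idea behind raidz1 and RAID-5) shows both halves of the claim: when checksums point out the bad block, parity rebuilds it; with two bad blocks, single parity is no longer sufficient redundancy. The per-block CRCs and helper names are simplifying assumptions; real RAID-Z checksums the whole logical block and, on a mismatch, tries reconstructing different columns until the checksum matches.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(data: List[bytes], parity: bytes,
                crcs: List[int]) -> Optional[List[bytes]]:
    """Return the stripe's data blocks, repairing at most one corrupt block."""
    bad = [i for i, block in enumerate(data) if zlib.crc32(block) != crcs[i]]
    if not bad:
        return list(data)
    if len(bad) > 1:
        return None  # single parity can rebuild only one corrupt block
    i = bad[0]
    survivors = [block for j, block in enumerate(data) if j != i]
    rebuilt = xor_blocks(survivors + [parity])  # A ^ C ^ (A ^ B ^ C) == B
    if zlib.crc32(rebuilt) != crcs[i]:
        return None  # the parity block is damaged as well
    repaired = list(data)
    repaired[i] = rebuilt
    return repaired
```

Adding a second, independent parity (as raidz2 does) is what buys recovery from two bad blocks; this sketch deliberately stops at one.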



Boston Linux & Unix / webmaster@blu.org