BLU Discuss list archive


[Discuss] On Btrfs raid and odd-count disks

Subject: [Discuss] On Btrfs raid and odd-count disks
From: warlord at MIT.EDU (Derek Atkins)
Date: Thu, 11 Apr 2013 09:28:34 -0400
In-reply-to: <5165B3FD.7070605@gmail.com> (Richard Pieri's message of "Wed, 10 Apr 2013 14:48:29 -0400")
References: <5164C9F3.4040802@gmail.com> <e511dc18cf96ddb091411f8f6463f405.squirrel@mail.mohawksoft.com> <D1B1A95FBDCF7341AC8EB0A97FCCC4773BBC8927@SN2PRD0410MB372.namprd04.prod.outlook.com> <sjmwqsatoqi.fsf@mocana.ihtfp.org> <51658526.2060608@gmail.com> <sjm7gkatlyp.fsf@mocana.ihtfp.org> <51659281.6060409@gmail.com> <5165B3FD.7070605@gmail.com>

Richard Pieri <richard.pieri at gmail.com> writes:

> In retrospect, if you're looking at file systems as a means to prevent
> write holes with RAID 5/6 then you're going about it wrong. Write holes
> happen with every RAID level. They happen with RAID 5 and 6. They happen
> with RAID 1 and RAID 10. Do not believe anyone who says that write holes
> are unique to RAID 5/6 and their derivatives. They are mistaken. Any two
> or more storage devices in a RAID set that are not atomically locked
> together can suffer write holes. They can even happen with ZFS.

The reason I'm looking at a filesystem here is that the WAY writes occur
can affect the write holes you get in RAID5 and RAID6.  For example, ZFS
does not overwrite the existing block; it writes to a new block, and
only after that write succeeds does it change the block pointer.
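
To make the ordering concrete, here is a toy sketch of the
write-new-block-then-switch-pointer idea.  It is only an illustration:
the CowStore class and its crash_before_commit flag are made up for
this example and are not how ZFS is actually implemented.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.blocks = {0: data}   # block id -> contents
        self.pointer = 0          # id of the block that is currently live
        self.next_id = 1

    def read(self):
        # Readers always follow the block pointer.
        return self.blocks[self.pointer]

    def write(self, data, crash_before_commit=False):
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = data     # step 1: write to a NEW block
        if crash_before_commit:
            return                     # simulated power loss: pointer untouched
        self.pointer = new_id          # step 2: switch the block pointer


store = CowStore(b"old")

# A write interrupted before the pointer switch leaves the old data intact.
store.write(b"new", crash_before_commit=True)
after_crash = store.read()

# A write that completes makes the new block live.
store.write(b"new")
after_commit = store.read()
```

The point is that the live block is never modified in place, so an
interrupted write leaves you with the complete old version rather than
something half-written.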

> This is not a RAID issue. RAID is about making the hardware tolerant to
> faults. RAID does not care about the integrity of your data.

And *THAT* is the problem.  I want fault-tolerance *AND* data integrity.
Which is why I'm looking towards ZFS and BTRFS as potential solutions
that provide both.

> Write holes happen when power to the storage devices is lost during
> write operations. UPS and redundant power are the primary ways of
> preventing write holes. If the server doesn't lose power, or it has time
> to perform a graceful shutdown when mains fail, then no holes appear in
> the data it holds.

Or when power to the CPU is lost (assuming software RAID) in the middle
of a write.  See above as to how ZFS works around this problem.  Note,
however, that ZFS assumes that *MEMORY* is not corrupted, so you
definitely need to use ECC RAM.
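
For reference, here is a small sketch of the failure being discussed,
assuming a simplified two-data-disk stripe with XOR parity.  The byte
values and variable names are invented for the example; a real array
has more disks and much larger stripes.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# A consistent stripe: two data chunks and their parity.
d0, d1 = b"\x0f\x0f", b"\xf0\x01"
parity = xor(d0, d1)

# In-place update of d0, interrupted after the data write landed
# but before the parity write did.
d0_new = b"\xaa\xaa"
disk_d0 = d0_new        # new data reached the disk
disk_parity = parity    # parity is stale

# Later the disk holding d1 fails and is rebuilt from d0 and parity.
rebuilt_d1 = xor(disk_d0, disk_parity)
hole = rebuilt_d1 != d1          # stale parity yields wrong data for d1

# Had the parity write completed, the rebuild would be correct.
good_parity = xor(d0_new, d1)
rebuilt_ok = xor(d0_new, good_parity)
```

Note that the damaged chunk is d1, which was never being written at
all.  The stripe is only inconsistent, not visibly broken, until a disk
fails and the stale parity gets used.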

> Battery-backed cache is the second line of defense against write holes.
> The battery prevents cache loss if redundant and backup power fail.
> Non-volatile cache (SSD) is becoming a popular alternative to
> battery-backed cache, although flash has its own set of power-related
> problems.
>
> The last line of defense against corruption is a good backup history.
>
> ZFS and Btrfs will detect and if possible correct single-bit errors.
> They may be able to prevent write holes if they can reliably control
> every piece of I/O cache in the data stream. This includes the write
> acceleration cache found on most modern disks' on-board controllers. Not
> all of these reliably honor cache flush instructions from the host and
> because of this they cannot be relied upon to maintain data integrity
> under power fault conditions.

When the drives lie to you it's hard to work around that, sure.

I *do* have a UPS with a good deal of uptime available, and I plan to
get a secondary power backup (which I will probably have installed
before I even get to build my new spiffy NAS), so power shouldn't be a
problem, just potential hardware faults.

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available



Boston Linux & Unix / webmaster@blu.org