[Discuss] On Btrfs raid and odd-count disks

Derek Atkins warlord at MIT.EDU
Fri Apr 12 10:51:21 EDT 2013


Richard Pieri <richard.pieri at gmail.com> writes:

> On 4/11/2013 9:28 AM, Derek Atkins wrote:
>> The reason I'm looking at a filesystem here is that the WAY writes occur
>> can affect the write-holes you get in RAID5 and RAID6.  For example, ZFS
>> does not overwrite the existing block; it writes to a new block and
>> only after the write succeeds does it change the block pointer.
>
> COW does not prevent write holes.
>
> ZFS prevents write holes by enforcing atomicity of all writes to
> storage. It does this by controlling all of the I/O caching involved in
> the write process from system RAM down to the write acceleration cache
> on the disks themselves. ZFS updates the file system only after all
> cache points have confirmed being flushed.
>
> If any of these points lie about their status then write holes can
> appear under power fault conditions. The RAID level does not matter. If
> the hardware does not provide for the required write atomicity then you
> can suffer write holes under power fault conditions.
>
> Both ZFS and Btrfs provide facilities for automatically "erasing" write
> holes. The process is called "scrubbing". The scrubbing process walks
> through the entire file system tree, recalculates all file and metadata
> checksums, and compares them to the stored checksums. Errors are
> repaired using replica data. Oracle's documentation recommends a weekly
> scrubbing schedule for consumer-grade disks and a monthly scrubbing
> schedule for server-grade disks.
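
To make sure I follow the scrub description, here is a rough Python
sketch of the idea: walk every block, recompute its checksum, compare it
with the stored one, and repair a bad copy from a replica that still
verifies. This is only an illustration under my own assumptions, not how
ZFS or Btrfs actually implement it: the names (checksum, scrub), the
list-of-mirrors layout, and the use of SHA-256 as a stand-in checksum
are all hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum (assumption); a real filesystem uses its own.
    return hashlib.sha256(data).hexdigest()


def scrub(replicas, stored_sums):
    """Verify every block of every replica against the stored checksums.

    replicas    -- list of mutable lists of bytes, one list per mirror copy
    stored_sums -- expected checksum for each block index
    Returns (repaired, unrecoverable): repaired is a list of
    (block_index, replica_index) pairs that were rewritten from a good
    copy; unrecoverable lists block indices where no copy verified.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(stored_sums):
        # Find any copy of block i whose checksum still matches.
        good = next((r[i] for r in replicas if checksum(r[i]) == expected), None)
        if good is None:
            unrecoverable.append(i)
            continue
        # Overwrite every copy that fails verification with the good one.
        for j, r in enumerate(replicas):
            if checksum(r[i]) != expected:
                r[i] = good
                repaired.append((i, j))
    return repaired, unrecoverable
```

The point of the sketch is that the repair step depends on the stored
checksum: it is what tells you which copy is the good one.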

Fair enough...  I don't know whether standard (e.g. DM-level) RAID5 or
RAID6 provides for said "scrubbing", or for detecting/handling disk read
(or worse, disk write) failures.
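
For what it's worth, here is a small Python sketch of why I worry about
this with plain parity RAID. It models a two-data-disk RAID5 stripe where
parity is the XOR of the data blocks, then simulates an interrupted write
in which the data block was updated but the parity was not. The helper
names (xor_blocks, parity_ok, rebuild) are made up for illustration and
do not correspond to any real RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    # Byte-wise XOR of equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_blocks, parity: bytes) -> bool:
    # A parity check can say the stripe is inconsistent,
    # but not which block in it is the wrong one.
    return xor_blocks(*data_blocks) == parity


def rebuild(surviving_blocks, parity: bytes) -> bytes:
    # Reconstruct the one missing data block from the survivors plus parity.
    return xor_blocks(*surviving_blocks, parity)


# A consistent stripe: two data blocks and their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Interrupted write: d0 reaches the disk, the parity update does not.
d0_new = b"CCCC"
stripe_consistent = parity_ok([d0_new, d1], parity)

# If the disk holding d1 then fails, the rebuild uses the stale parity.
d1_rebuilt = rebuild([d0_new], parity)
```

In this model the stale parity makes the stripe check fail, and a rebuild
of d1 from d0_new and the old parity does not give back the original d1.
Without per-block checksums there is nothing in the stripe itself that
says whether the data or the parity is the stale part.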

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available


