BLU Discuss list archive

[Discuss] ZFS vs. Btrfs

Subject: [Discuss] ZFS vs. Btrfs
From: gaf at blu.org (Jerry Feldman)
Date: Tue, 08 Jan 2013 07:39:05 -0500
In-reply-to: <D1B1A95FBDCF7341AC8EB0A97FCCC4771D76272B@SN2PRD0410MB372.namprd04.prod.outlook.com>
References: <50E3E953.9060404@gmail.com> <20130107151247.00004e20@unknown> <50EB452B.1000401@blu.org> <D1B1A95FBDCF7341AC8EB0A97FCCC4771D76272B@SN2PRD0410MB372.namprd04.prod.outlook.com>

On 01/07/2013 07:39 PM, Edward Ned Harvey (blu) wrote:
>> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-
>> bounces+blu=nedharvey.com at blu.org] On Behalf Of Jerry Feldman
>>
>> In my mind the important issue is resistance to drive failure. What
>> happens in both ZFS and Btrfs in the case of a power failure?
> In ZFS, data is written to disk in transaction groups (TXGs).  A set of reserved blocks is used as a ring buffer to store the uberblock.  When a TXG is written to the pool, the uberblock is updated.  When the pool is mounted, the system looks in the reserved uberblock storage area, finds the entry with the highest transaction number and a matching checksum, and uses that entry as the latest fully flushed uberblock/TXG.  All transactions are therefore atomic, and the filesystem cannot be left in an inconsistent state on disk (unless a failing CPU or memory or something like that is calculating incorrect checksums).  So after a power outage or kernel panic, your filesystem is definitely consistent, and you may lose only up to the latest 5 seconds of async buffered writes that had not yet been flushed to disk before the crash.
>
> It's slightly more complicated when you consider sync-mode writes.  Sync writes are immediately written to NV storage, which means ZIL blocks in-pool if you don't have a dedicated log device.  After sync writes hit the ZIL, they are treated as async writes and get buffered with all the other async writes.  At pool mount time, the system checks the ZIL for any unflushed transactions and, if necessary, flushes them to the pool.  This guarantees both filesystem consistency and POSIX compliance: sync writes are preserved in NV storage and survive such a crash.
>
> It is therefore possible, as you might expect, that sync writes will find their way into the filesystem consistently, while a few seconds worth of unflushed async buffered writes might be lost.  But filesystem inconsistency isn't one of the possible end results.
>
> In btrfs ... I have less detail ... but I know they write in transactions, and they do journaling (logging).  So the filesystem will be consistent.  They honor the POSIX behavior of sync writes to NV storage, so sync writes are guaranteed to be preserved.  And of course, async buffered writes are vulnerable to a crash.  So qualitatively you'll have similar reliability / crash guarantees.  But I know in ZFS (any modern version), the maximum length of time between TXG flushes is 5 sec...  I don't know if btrfs has any similar time limit, and if it does, I don't know what its value is.
>
>
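
To make the uberblock selection quoted above concrete, here is a rough
sketch in Python. It is only an illustration of "highest transaction
number with a matching checksum", not the real ZFS on-disk format: the
Uberblock fields, the use of SHA-256, and the function names are all
assumptions made up for this example.

```python
import hashlib
import struct
from typing import NamedTuple, Optional, Sequence


class Uberblock(NamedTuple):
    """Hypothetical, simplified uberblock: not the real ZFS layout."""
    txg: int          # transaction group number
    root: bytes       # stand-in for the pointer to the root of the tree
    checksum: bytes   # checksum stored alongside the entry


def compute_checksum(txg: int, root: bytes) -> bytes:
    # SHA-256 is an assumption for illustration only.
    return hashlib.sha256(struct.pack("<Q", txg) + root).digest()


def make_uberblock(txg: int, root: bytes) -> Uberblock:
    """Build an entry whose stored checksum matches its contents."""
    return Uberblock(txg, root, compute_checksum(txg, root))


def is_valid(ub: Uberblock) -> bool:
    """An entry counts only if its stored checksum matches its contents."""
    return ub.checksum == compute_checksum(ub.txg, ub.root)


def pick_active(ring: Sequence[Optional[Uberblock]]) -> Optional[Uberblock]:
    """Return the valid entry with the highest txg, or None if there is none.

    Empty slots (None) and torn or corrupt entries are skipped, so a write
    interrupted by a power failure simply falls back to the previous TXG.
    """
    valid = [ub for ub in ring if ub is not None and is_valid(ub)]
    return max(valid, key=lambda ub: ub.txg, default=None)


if __name__ == "__main__":
    ring = [
        make_uberblock(10, b"root-10"),
        make_uberblock(11, b"root-11"),
        Uberblock(12, b"root-12", b"\x00" * 32),  # torn write: bad checksum
        None,                                     # unused slot
    ]
    active = pick_active(ring)
    print(active.txg if active else None)
```

In this sketch the entry for TXG 12 was only partly written when power
was lost, so its checksum does not match and the pool would come up at
TXG 11, which is the "consistent, but you may lose the last few seconds"
behavior described above.
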
In my experience in the workplace there have been many power failures. At
Riverside we even had a bus hit a pole, knocking out power to both us and
the T. At IBM during Sandy the UPS failed. Essentially, failures do
occur. Fortunately our NAS system (Netgear ReadyNAS 3100) has been very
clean. At home, I have not experienced any corruption on my ext4
filesystems, but I don't beat them up that much.

In any case, I have always been a fan of B-trees. I used ReiserFS years
ago, and many years ago I used IBM's VSAM, which was essentially a
B-tree-based system.

Another thing I like about btrfs is that you do not have to partition
the physical drives.

-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90 
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66  C0AF 7CEA 30FC 3BC1 EB90


BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org