On 01/07/2013 07:39 PM, Edward Ned Harvey (blu) wrote:
>> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-bounces+blu=nedharvey.com at blu.org] On Behalf Of Jerry Feldman
>>
>> In my mind the important issue is resistance to drive failure. What
>> happens in both ZFS and Btrfs in the case of a power failure?
>
> In zfs, data is written to disk in transaction groups (TXG's). There are some reserved blocks that are used as a ring buffer, to store the uberblock. When a TXG is written to pool, the uberblock is updated. When the pool is mounted, the system looks in the reserved uberblock storage area, finds the entry with the highest transaction number and matching checksum, and that entry is used as the latest fully flushed uberblock/TXG. So all transactions are atomic, and the filesystem cannot be written in an inconsistent state (unless you have a failing cpu or memory or something like that calculating incorrect checksums.) So after a power outage or kernel panic, your filesystem is definitely consistent, and you may only lose up to the latest 5 seconds of async buffered writes prior to the crash, which may not yet have been flushed to disk.
>
> It's slightly more complicated when you consider sync-mode writes. Sync writes are immediately written to NV storage, which means ZIL blocks in-pool if you don't have a dedicated device, but after sync writes hit the ZIL, they become async writes and get buffered with all the other async writes. At pool mount time, the system checks the ZIL for any unflushed transactions, and if necessary, flushes them to pool. This guarantees both filesystem consistency and posix behavior compliance: sync writes are preserved in NV storage and survive such a crash.
>
> It is therefore possible, as you might expect, that sync writes will find their way into the filesystem consistently, while a few seconds' worth of unflushed async buffered writes might be lost. But filesystem inconsistency isn't one of the possible end results.
>
> In btrfs ... I have less detail ... but I know they write in transactions, and they do journaling (logging). So the filesystem will be consistent. They honor the posix behavior of sync writes to NV storage, so sync writes are guaranteed to be preserved. And of course, async buffered writes are bound to be vulnerable to the crash. So qualitatively you'll have similar reliability / crash guarantees. But I know in ZFS (any modern version), the maximum length of time between TXG flushes is 5 sec... I don't know if they have any similar time limits on btrfs, and if they do, I don't know what their values are.
>
>
In my experience in the workplace there have been many power failures. At Riverside we even had a bus hit a pole, knocking out power to both us and the T. At IBM during Sandy the UPS failed. Essentially, failures do occur. Fortunately our NAS system (Netgear ReadyNAS 3100) has been very clean. At home, I have not experienced any corruption on my ext4 filesystems, but I don't beat them up that much.

In any case, I have always been a fan of btrees. I used reiserFS years ago, and many years ago I used IBM's VSAM, which was essentially a btree-based system. Another thing I like about btrfs is that you do not have to partition the physical drives.

--
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66 C0AF 7CEA 30FC 3BC1 EB90
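
For readers who want to see the mount-time uberblock lookup from the quoted ZFS explanation in concrete form, here is a rough Python sketch. It is only an illustration of the idea (a ring of slots, each carrying a transaction number and a checksum, with the highest valid transaction number winning). The slot layout, the 4-slot ring, the use of SHA-256, and all function names are assumptions made up for this example; the real ZFS on-disk format and checksum are different.

```python
import hashlib
import struct
from typing import List, Optional, Tuple

DIGEST_LEN = 32  # SHA-256 digest size; an assumption, not the ZFS checksum


def make_slot(txg: int, root: bytes) -> bytes:
    """Build a hypothetical uberblock slot: txg + root pointer + checksum."""
    payload = struct.pack(">Q", txg) + root
    return payload + hashlib.sha256(payload).digest()


def parse_slot(slot: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (txg, root) if the slot's checksum matches, else None."""
    if len(slot) < 8 + DIGEST_LEN:
        return None  # empty or truncated slot
    payload, digest = slot[:-DIGEST_LEN], slot[-DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        return None  # torn or corrupted write
    (txg,) = struct.unpack(">Q", payload[:8])
    return txg, payload[8:]


def write_uberblock(ring: List[bytes], txg: int, root: bytes) -> None:
    """Store the uberblock for this TXG in its ring-buffer slot."""
    ring[txg % len(ring)] = make_slot(txg, root)


def find_active_uberblock(ring: List[bytes]) -> Optional[Tuple[int, bytes]]:
    """Pick the valid slot with the highest transaction number."""
    best = None
    for slot in ring:
        parsed = parse_slot(slot)
        if parsed is not None and (best is None or parsed[0] > best[0]):
            best = parsed
    return best


# Six TXGs complete normally, then power fails while TXG 7 is being written.
ring = [b""] * 4
for txg in range(1, 7):
    write_uberblock(ring, txg, b"root%d" % txg)

torn = bytearray(make_slot(7, b"root7"))
torn[-1] ^= 0xFF  # simulate a partially written slot
ring[7 % len(ring)] = bytes(torn)

active = find_active_uberblock(ring)
```

In this toy run the slot for TXG 7 fails its checksum, so the lookup falls back to TXG 6, which is the "lose the last few seconds but stay consistent" behavior described in the quoted text.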
BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.