BLU Discuss list archive


best practices using LVM and e2fsck

Subject: best practices using LVM and e2fsck
From: blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org (Edward Ned Harvey)
Date: Thu, 1 Jul 2010 10:47:44 -0400
In-reply-to: <4C2CA59F.10400-GqRSzq0LZOzYtjvyW6yDsg@public.gmane.org>
References: <10B70046B6BA4EEABFE45C40142FA2E0@dnanet.mit.edu> <001601cb172b$5d83ea70$188bbf50$@com> <20100629213222.GM7359@dragontoe.org> <000e01cb1872$4a545410$defcfc30$@com> <EEBBF0238C7146B6855CCE5CC9568715@dnanet.mit.edu> <20100630173628.GE2447@collab.or8.net> <4C2BC1E0.1030700@vl.com> <000101cb1924$027a5600$076f0200$@com> <4C2CA59F.10400@wayga.org>

> From: discuss-bounces-mNDKBlG2WHs at public.gmane.org [mailto:discuss-bounces-mNDKBlG2WHs at public.gmane.org] On
> Behalf Of Mark Komarinski
> 
> After this, the per-mount and day-elapsed checks are disabled, and the
> amount of disk set aside for root has been reduced from 5% to 1%.
> (Let's see that start a new argumen..I mean thread)

;-)


> I'm of the opinion that if a filesystem has enough of a hardware problem
> that it gets corrupted, you should just wipe it and restore from backup

This comes up all the time in arguments for ZFS, so I've got a really good
answer to this one.  ;-)

Suppose a typical hard drive averages 25,000 hours between undetected bit
errors.  That sounds very good ... one undetected bit error in about 3
years ...  Until you start running raid.  If you've got a modest raid set,
let's say 10 disks, the array as a whole now averages only 2,500 hours
between errors, or one undetected bit error every 3-4 months.
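The arithmetic above can be checked with a few lines of Python.  This is a
sketch that assumes each disk hits errors independently at a constant
average rate, so the rates add across disks and the array-wide mean time
between events is the per-disk figure divided by the number of disks.  The
730-hour average month is also an assumption.

```python
# Sketch: how a per-disk mean time between events scales with array size.
# Assumes independent disks with constant (exponential) event rates, so
# rates add and an N-disk array sees events N times as often as one disk.

HOURS_PER_YEAR = 24 * 365              # 8,760 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours (average month)


def array_mean_hours(per_disk_hours: float, n_disks: int) -> float:
    """Mean hours between events anywhere in an array of n_disks identical disks."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return per_disk_hours / n_disks


single = array_mean_hours(25_000, 1)
array10 = array_mean_hours(25_000, 10)
print(f"1 disk:   {single:,.0f} h = {single / HOURS_PER_YEAR:.1f} years")
print(f"10 disks: {array10:,.0f} h = {array10 / HOURS_PER_MONTH:.1f} months")
```

Running it prints 2.9 years for one disk and 3.4 months for ten, matching
the figures in the text.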

ZFS automatically checksums all the data it writes to disk, and verifies the
checksums behind the scenes during reads.  They've done it efficiently enough
that the computational overhead is undetectable.  (That is to say, even more
unnoticeable than unnoticeable.)  If you have no redundancy, a checksum
mismatch is just that: a detected error that would otherwise have gone
undetected.  But with redundancy (mirror or raidz), ZFS automatically reads
the block from one of the other disks (or reconstructs it from parity),
returns the correct data, and rewrites the bad copy.  If too many errors are
detected on a single disk, the disk is marked bad.
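To make the read path described above concrete, here is a toy Python model
of checksum-on-write, verify-on-read, and repair from a redundant copy.  It
is a sketch under stated assumptions, not how ZFS is implemented: SHA-256
stands in for the ZFS checksum (ZFS defaults to a Fletcher checksum, with
SHA-256 available as an option), a plain dict stands in for the block
pointers that hold each checksum, and ERROR_THRESHOLD, Disk, and
ChecksummedMirror are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field

# Toy model of end-to-end checksumming, NOT actual ZFS code.
# Assumptions: SHA-256 as the checksum, a dict in place of block pointers,
# and a hypothetical threshold for marking a disk bad.
ERROR_THRESHOLD = 3


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Disk:
    name: str
    blocks: dict = field(default_factory=dict)
    checksum_errors: int = 0
    faulted: bool = False


class ChecksummedMirror:
    """N-way mirror: checksum on write, verify on read, repair from a good copy."""

    def __init__(self, *disks: Disk):
        self.disks = list(disks)
        self.sums = {}  # block id -> expected checksum, stored apart from the data

    def write(self, block_id: int, data: bytes) -> None:
        self.sums[block_id] = checksum(data)
        for disk in self.disks:
            if not disk.faulted:
                disk.blocks[block_id] = data

    def read(self, block_id: int) -> bytes:
        expected = self.sums[block_id]
        bad = []
        for disk in self.disks:
            if disk.faulted:
                continue
            data = disk.blocks.get(block_id)
            if data is not None and checksum(data) == expected:
                # Self-heal: overwrite copies on non-faulted disks that failed verification.
                for other in bad:
                    if not other.faulted:
                        other.blocks[block_id] = data
                return data
            # Missing or mismatched copy: count it against this disk.
            disk.checksum_errors += 1
            if disk.checksum_errors >= ERROR_THRESHOLD:
                disk.faulted = True  # "marked bad"
            bad.append(disk)
        # No good copy anywhere: the error is detected but cannot be corrected.
        raise OSError(f"block {block_id}: no copy matches its checksum")


if __name__ == "__main__":
    pool = ChecksummedMirror(Disk("disk0"), Disk("disk1"))
    pool.write(0, b"important data")
    pool.disks[0].blocks[0] = b"importent data"  # silent corruption on disk0
    print(pool.read(0))             # good copy, served from disk1
    print(pool.disks[0].blocks[0])  # disk0's copy has been rewritten
```

Keeping the checksum apart from the data it describes is what lets a bad
copy be recognized even when the drive reports a successful read.  With a
single disk, read() can only raise, which corresponds to the no-redundancy
case in the text.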

You can't do anything like this in extX (ext2/ext3/ext4).  While you can run
fsck and so forth, you're only verifying the integrity of the filesystem
metadata, not the actual file contents.  It won't detect or correct silent
corruption of your data caused by hardware errors.
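The difference between checking the filesystem and checking the data can be
illustrated with a toy example.  Both functions below are hypothetical
simplifications (real e2fsck checks much more, such as bitmaps, link counts,
and directory structure), but they show why a flipped bit inside a data
block passes a metadata-only check while a content checksum catches it.

```python
import hashlib

# Toy contrast between a metadata-only check (the kind of thing fsck does)
# and a content checksum. The inode layout, BLOCK_SIZE, and both functions
# are hypothetical simplifications, not e2fsck's real logic.
BLOCK_SIZE = 4


def structural_check(inode: dict, blocks: dict) -> bool:
    """Metadata consistency only: block pointers exist and size fits. Never reads contents."""
    return (all(b in blocks for b in inode["blocks"])
            and inode["size"] <= len(inode["blocks"]) * BLOCK_SIZE)


def content_check(inode: dict, blocks: dict, expected_sha256: str) -> bool:
    """Reassemble the file's bytes and compare them against a stored checksum."""
    data = b"".join(blocks[b] for b in inode["blocks"])[: inode["size"]]
    return hashlib.sha256(data).hexdigest() == expected_sha256


blocks = {7: b"hell", 9: b"o wo", 12: b"rld!"}
inode = {"blocks": [7, 9, 12], "size": 12}
expected = hashlib.sha256(b"hello world!").hexdigest()

blocks[9] = b"o wx"  # simulate a silent bit flip inside a data block
print(structural_check(inode, blocks))         # True: metadata still consistent
print(content_check(inode, blocks, expected))  # False: contents no longer match
```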

Recent personal experience:  One of my servers detected too many checksum
mismatches on a drive and marked it bad, although the hardware never
detected anything.  So the checksums saved my data.  ;-)  That happened last
week.

Follow-Ups:
- best practices using LVM and e2fsck
  - From: richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org (Richard Pieri)

References:
- best practices using LVM and e2fsck
  - From: blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org (Edward Ned Harvey)
- best practices using LVM and e2fsck
  - From: mkomarinski-GqRSzq0LZOzYtjvyW6yDsg at public.gmane.org (Mark Komarinski)


Boston Linux & Unix / webmaster@blu.org