
BLU Discuss list archive



best practices using LVM and e2fsck



> From: discuss-bounces-mNDKBlG2WHs at public.gmane.org [mailto:discuss-bounces-mNDKBlG2WHs at public.gmane.org] On
> Behalf Of Mark Komarinski
> 
> After this, the per-mount and day-elapsed checks are disabled, and the
> amount of disk set aside for root has been reduced from 5% to 1%.
> (Let's see that start a new argumen..I mean thread)

;-)


> I'm of the opinion that if a filesystem has enough of a hardware problem
> that it gets corrupted, you should just wipe it and restore from backup

This is a really common argument in favor of ZFS, so I've got a really good
answer to this one.  ;-)

A typical MTBF for hard drives might be 25,000 hours, which sounds very good
(roughly one undetected bit error in 3 years) until you start running RAID.
If you've got a modest RAID set, let's say 10 disks, your overall MTBF is
now 2,500 hours, or one undetected bit error every 3-4 months.
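To make the arithmetic above concrete, here is a rough sketch. It assumes the disks fail independently at the same constant rate, so the error rate of the whole set simply scales with the number of disks. The function name and the 25,000-hour figure are only illustrative, taken from the example above.

```python
# Back-of-the-envelope check of the MTBF figures quoted above.
# Assumption: independent disks, identical constant error rates,
# so the combined MTBF is the per-disk MTBF divided by the disk count.

HOURS_PER_YEAR = 24 * 365
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def combined_mtbf_hours(per_disk_mtbf_hours, disk_count):
    """Return the expected hours between errors across the whole set."""
    return per_disk_mtbf_hours / disk_count


single_disk_hours = 25_000
raid_hours = combined_mtbf_hours(single_disk_hours, 10)

single_disk_years = single_disk_hours / HOURS_PER_YEAR
raid_months = raid_hours / HOURS_PER_MONTH

print(f"single disk: {single_disk_years:.2f} years between errors")
print(f"10-disk set: {raid_hours:.0f} hours, about {raid_months:.1f} months")
```

The single-disk figure comes out a little under 3 years, and the 10-disk figure lands between 3 and 4 months, which matches the rough numbers above.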

ZFS automatically checksums all the data written to disk, and verifies it
behind the scenes during reads.  They've done it efficiently enough that the
computation overhead is undetectable.  (That is to say, even more
unnoticeable than unnoticeable.)  If you have no redundancy, a detected
checksum mismatch is just that - a detected error that would otherwise have
gone undetected.  But with redundancy, ZFS will automatically read one of
the other disks (or parity) to correct the error.  If too many errors are
detected on a single disk, the disk is marked bad.
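The read path described above can be sketched as a toy mirror in Python. This is only an illustration of the idea, not how ZFS is implemented: the ToyMirror class, its error_limit threshold, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is used here for convenience; it is an assumption of this sketch.
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Hypothetical N-way mirror that verifies a checksum on every read."""

    def __init__(self, copies=2, error_limit=3):
        self.disks = [dict() for _ in range(copies)]  # block_id -> bytes
        self.checksums = {}                           # kept apart from the data
        self.errors = [0] * copies                    # mismatches seen per disk
        self.error_limit = error_limit                # hypothetical threshold
        self.bad = set()                              # disks marked bad

    def write(self, block_id, data):
        self.checksums[block_id] = checksum(data)
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        corrupt = []
        for i, disk in enumerate(self.disks):
            data = disk[block_id]
            if checksum(data) == expected:
                # Good copy found: repair any copies that failed verification.
                for j in corrupt:
                    self.disks[j][block_id] = data
                return data
            # Mismatch: count it against this disk and try the next copy.
            self.errors[i] += 1
            if self.errors[i] >= self.error_limit:
                self.bad.add(i)
            corrupt.append(i)
        # No redundancy left: the error is detected but cannot be corrected.
        raise IOError("checksum mismatch on every copy of block %r" % block_id)
```

With two copies, silently corrupting a block on one disk (for example `m.disks[0]["b0"] = b"hellp"`) still yields the original data from `m.read("b0")`, and the damaged copy is rewritten. With a single copy, the same corruption raises an error instead of returning bad data.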

You can't do anything like this in extX.  While you can fsck and so forth,
you're only verifying the integrity of the filesystem, not the actual data.
It won't detect or correct hardware errors.

Recent personal experience:  One of my servers detected too many checksum
mismatches on a drive and marked it bad, although the hardware never
detected anything.  So the checksums saved my data.  ;-)  That happened last
week.







BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org