BLU Discuss list archive
admins worst nightmare...
- Subject: admins worst nightmare...
- From: bogstad-e+AXbWqSrlAAvxtiuMwx3w at public.gmane.org (Bill Bogstad)
- Date: Wed, 10 Mar 2010 12:10:38 -0500
- In-reply-to: <4B96642E.6020807-wRvlPVLobi1/31tCrMuHxg@public.gmane.org>
- References: <4B956177.2010908@stephenadler.com> <000201cabf3b$0b52c570$21f85050$@com> <4B96642E.6020807@stephenadler.com>
On Tue, Mar 9, 2010 at 10:07 AM, Stephen Adler <adler-wRvlPVLobi1/31tCrMuHxg at public.gmane.org> wrote:
>...
>
> The backups are made by copying the original data off the sun blade
> system to backup system #1. I then run a nightly cron job which rsyncs
> the data from backup system #1 to backup systems #2 and #3. The problem
> is that backup system #1 has the corrupted file system, so once the
> files were placed on backup system #1 and corrupted, I ended up copying
> corrupted data off on to backup systems #2 and #3.
>
>...
> tera byte drives running a software raid 5 raid array. Also, I have the
> smartd tools running doing nightly and weekly checks. With all that in
> place, there were no warning of errors on the file system. Which makes
> me think there is a bug in ext3/md raid5 or the PCI esata controller
> card is mucked up. I still have to very the memory, which is supposed to
> be ECC memory.

Don't forget that ECC only reduces the probability that bad data will be read from memory; it does not eliminate it. Also, you are assuming that the problem here is in system #1. It could very well be that the error occurred during the network transfer when the data was copied from the original system to system #1. I can't remember who it was now, but someone from AT&T gave a talk at BBLISA a few years back about always checksumming files every time he moved or copied them between networked systems. As I recall, he found plenty of errors on 'working' systems which were handling massive amounts of data.

>
> I tried to do my homework in setting up this backup system, and with all
> the redundancy I put in, I thought I didn't need the md5 check sum.
> Well... I've learned my lesson the hard way.
>
> So... the lesson learned...
>
> ALWAYS DO MD5 CHECK SUMS ON CRITICAL DATA DURING BACKUPS NO MATTER HOW
> LONG IT TAKES, BEFORE YOU DELETE THE ORIGINAL DATA.

And do it every time you transfer that data to a new location.
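For what it's worth, checksumming on every transfer can be sketched in a few lines of Python. This is only an illustration, not what anyone in this thread actually runs: the function names `file_digest` and `copy_and_verify` are made up for the example, MD5 is used only because the thread mentions it (any algorithm `hashlib` supports would do), and a local `shutil.copyfile` stands in for whatever really moves the data (rsync, scp, NFS).

```python
import hashlib
import shutil


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst, algorithm="md5"):
    """Copy src to dst, then re-read the copy and compare digests.

    Raises IOError on a mismatch so the caller never deletes the
    original on the strength of a bad copy.
    """
    expected = file_digest(src, algorithm)
    shutil.copyfile(src, dst)  # stand-in for the real transfer step
    actual = file_digest(dst, algorithm)
    if actual != expected:
        raise IOError(f"checksum mismatch copying {src} -> {dst}")
    return expected
```

One caveat with this sketch: re-reading the destination right after writing it may be served from the operating system's cache rather than from the disk, so for real backups the digest should be recomputed on the receiving system, ideally after the data has been flushed.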
And if you are really paranoid, modify your applications to checksum the data as they read it off of the disk for processing. Of course, this will require either modifying your data file format or adding an auxiliary file in which you store the block(?) level checksums. This is kind of like what ZFS does, but it extends the protection to transfers across networks or even to storage on backup media.

Bill Bogstad
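The auxiliary-file idea above could look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the `.sums` suffix, the JSON layout, the 64 KiB block size, the choice of SHA-256, and the function names. It is not how ZFS stores its checksums.

```python
import hashlib
import json

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for the example


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def write_sidecar(path, block_size=BLOCK_SIZE):
    """Store the block digests in an auxiliary file next to the data file."""
    sidecar = str(path) + ".sums"  # hypothetical naming convention
    meta = {"block_size": block_size, "digests": block_digests(path, block_size)}
    with open(sidecar, "w") as f:
        json.dump(meta, f)
    return sidecar


def read_verified(path, sidecar):
    """Yield the file's blocks in order, raising IOError on the first bad one."""
    with open(sidecar) as f:
        meta = json.load(f)
    with open(path, "rb") as f:
        for index, expected in enumerate(meta["digests"]):
            block = f.read(meta["block_size"])
            if hashlib.sha256(block).hexdigest() != expected:
                raise IOError(f"block {index} of {path} failed its checksum")
            yield block
        if f.read(1):
            raise IOError(f"{path} is longer than its checksum file describes")
```

Because the checksum file travels with the data, the same check works after a network copy or a restore from backup media, and a failure points at a specific block instead of the whole file.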