On Wed, Jan 7, 2009 at 12:57 PM, Doug <dougsweetser-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
> [concerns about backing up corrupted files...]

I've been thinking about this as well. What the tool should do depends on what you are trying to accomplish. Do you want to detect file corruption, or do you want to be able to correct corruption as well? The tool you mention, par2, does both at the expense of additional disk space.

To my mind, if you already have a good backup system in place (i.e. you have multiple copies of all your data), all you really need is a way to detect corruption. Catching corruption of the primary copy is paramount, as errors there will eventually filter into your backups. Any tool that does file checksums and comparisons can potentially be used for this purpose. Fortunately, there are already a number of such tools out there; they are normally used for intrusion detection after security problems.

I ended up picking the aide package for this purpose. I have it set up to run every night and email me the list of files whose checksums have changed since the previous run. This took a little tweaking of its configuration file, but it is doable (rough sketches in the P.S. below). To save your sanity, you really want it to ignore file deletion and new file creation. Even then, there is still too much noise as a result of browser cache directory index files, etc. Because of this, I have aide totally ignore certain directories where files change frequently and I don't care. I still get a certain amount of noise from .gconf directories and the like, but it isn't bad. I could have set it up to only check certain directories, but I felt that scanning everything and ignoring the exceptions was better than having to remember to add new entries to the configuration file every time I created a new directory.

I'm only doing this on my primary (active) copy at the moment. Currently I have over 435,000 files (about 90 Gbytes) being monitored in my home directory, and each copy of the aide database takes up about 70 Mbytes. I generally keep a week or two of databases around in case I notice something odd; so far that hasn't happened. Oh, and the cron job takes about two hours every night to run the checksum/comparison.

You could say I'm trading the CPU time for the nightly comparisons against the additional disk storage that par2 would require. OTOH, par2 would not tell me my data had been corrupted until I actually went and ran it anyway.

Anyway, I hope this gives you some ideas for one possible way to deal with these concerns. For permanent backups, you could generate a static aide database and then protect that with par2. Again, this would be a detection-only setup, but still worth it in my opinion.

Bill Bogstad
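P.S. In case it helps, here is roughly what the pieces look like. Everything below is an illustrative sketch, not copied from my actual setup: paths, rule names, and addresses are placeholders, and you should check the aide.conf(5) man page for what your version supports. A minimal aide.conf excerpt:

    # where the baseline database lives, and where a fresh one gets written
    database=file:/var/lib/aide/aide.db
    database_out=file:/var/lib/aide/aide.db.new

    # what to record per file: perms, owner, group, size, mtime, checksum
    NORMAL = p+u+g+s+m+sha1

    # scan everything under /home...
    /home NORMAL

    # ...but completely ignore directories that churn constantly
    # (these are regular expressions matched against the full path)
    !/home/[^/]+/\.mozilla
    !/home/[^/]+/\.cache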
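The nightly run is just a shell script fired from cron. Again a sketch under assumptions: the exact report format (and hence how you filter out the added/removed sections to cut the noise) varies between aide versions.

    #!/bin/sh
    # nightly-aide.sh: compare the filesystem against last night's database,
    # mail the report, then rotate databases so tonight's scan becomes the
    # new baseline.
    CONF=/etc/aide/aide.conf
    REPORT=/var/tmp/aide-report

    # --update does the comparison and writes database_out in one pass
    aide --update --config=$CONF > $REPORT 2>&1

    # mail the result; filtering added/removed entries out of the report
    # here is what keeps the noise down (adjust to your aide's output)
    mail -s "nightly aide report" you@example.invalid < $REPORT

    # keep dated copies of old databases around for a week or two
    cp /var/lib/aide/aide.db /var/lib/aide/aide.db.$(date +%Y%m%d)
    mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db

with a crontab entry along the lines of:

    # 3am start; takes about two hours on my ~435,000 files
    0 3 * * * /usr/local/sbin/nightly-aide.sh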
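And for the detection-only "static database protected with par2" idea at the end, something like this (the 10% redundancy figure is an arbitrary choice):

    # freeze a database for a permanent backup set, then add recovery data
    cd /var/lib/aide
    aide --init --config=/etc/aide/aide.conf
    par2 create -r10 aide.db.new.par2 aide.db.new

    # later: verify the database, and repair it from the recovery
    # blocks if it has been damaged
    par2 verify aide.db.new.par2
    par2 repair aide.db.new.par2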