Jerry asked:
> I'm not sure what you really mean by sync and rotate in rsnapshot
> context.

Some of the documentation recommends using a directive called
"sync_first". If you set that to 1, it overrides the default behavior,
and it's then necessary to invoke "rsnapshot sync" prior to "rsnapshot
hourly". The reason for doing this is so you can catch any errors and
either handle them in your script, or re-run the sync manually after
fixing the problem, prior to rotation. It's an improvement, but not
well explained.

> You could keep [checksums] in a flat file at the same level where
> your archive is, so after hourly.0 is complete, set up an
> hourly.0.checksum.

Rich Pieri <richard.pieri at gmail.com> echoed:
> This is a better idea than storing the checksums in a database. It
> ensures that any given version of a file is always associated with
> the correct checksum list.

I think there are a couple of advantages to keeping backup metadata in
a database table: it's reachable from everywhere, which makes it easier
to write integrity-checker scripts (especially against offline
backups), and you can optimize the checksum process more easily (only
generate checksums for new files that aren't yet in the metadata
storage).

I'm also thinking long-term: if I keep a really long, say 20-year,
retention of this data, I can know for certain that I still have the
same file I started with. Checksums kept in the same place as the
backups can't protect against short-term accidental rollover of the
backups themselves. I can then create scripts which block me from such
accidental rollovers.

For what it's worth: creating the database schema and insertion script
was about 3 hours of work, which I've already done. I'm amazed at how
many times the backup wheel has been reinvented, but without some of
these fundamental capabilities. If I get ambitious, I'll package up my
efforts and post to GitHub Yet Another Backup Utility Whose Name I'll
Have to Dream Up.

Tom Metro wrote:
> So the scenario you are trying to protect from is one in which your
> source files are good, but your snapshot files get corrupt, while
> maintaining original size and timestamp, and thus are not overwritten
> by rsync?

As you indicated, I don't think I want rsnapshot to verify checksums on
every incremental pass, and I also don't want to burden the central
backup server. So my thought is that I can write scripts that separate
the "checksum my archive" and "verify archive against saved checksums"
processes from the "compare source with archive checksums" process.

This is modeled after my understanding of CrashPlan's efforts to
"protect" against corruption. Their feature for "verify archive against
saved checksums" is called Compact; by default it runs every 28 days,
goes through the archive, and /deletes/ any files which fail -- they
call this "self-healing" (by which they mean the software stops
trusting a corrupted file and enables it to be saved again, but
meanwhile that file is exposed to loss).

With 8 TB of data and counting, including both near-line and off-line
storage, I need to come up with efficient mechanisms for doing these
three different things at varying intervals. That's probably the main
reason I'm going with home-brew rather than anything I could find off
the shelf.

Someone mentioned git-annex, which looks interesting. I wonder how well
it would scale to an archive of a million files and 50+ savesets, which
is what I'll likely be seeing at future employers?

-rich
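
For anyone wanting to try the sync_first workflow described above, here
is a minimal sketch. The paths, retention counts, and wrapper logic are
illustrative assumptions, not anyone's actual configuration; only the
sync_first directive and the sync-then-rotate command order come from
the discussion.

  # /etc/rsnapshot.conf excerpt -- fields must be separated by tabs
  sync_first      1
  snapshot_root   /backup/snapshots/
  retain  hourly  6
  retain  daily   7

  # backup wrapper: sync first, rotate only if the sync succeeded
  rsnapshot sync || { echo "rsnapshot sync failed; not rotating" >&2; exit 1; }
  rsnapshot hourly

With sync_first enabled, "rsnapshot hourly" only rotates the snapshot
directories, so a failed or partial sync never clobbers the last good
copy; you can re-run "rsnapshot sync" after fixing the problem and then
rotate.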
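
Jerry's flat-file variant would amount to something like running
"find . -type f -exec sha256sum {} + > ../hourly.0.checksum" from
inside hourly.0 once the snapshot is complete. The database variant
argued for above might look roughly like the sketch below; the SQLite
file, table, and column names are invented for illustration (the actual
schema isn't shown in the thread), and the only optimization shown is
the one mentioned above: hash only files that aren't already recorded.

  #!/usr/bin/env python3
  # Sketch only: record checksums for archive files not yet in the
  # metadata database.  DB path, table layout, and archive path are
  # assumptions made up for this example.
  import hashlib
  import os
  import sqlite3

  DB = "/backup/meta/checksums.db"        # assumed metadata database
  ARCHIVE = "/backup/snapshots/hourly.0"  # assumed snapshot to index

  def sha256(path, bufsize=1 << 20):
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(bufsize), b""):
              h.update(chunk)
      return h.hexdigest()

  conn = sqlite3.connect(DB)
  conn.execute("""CREATE TABLE IF NOT EXISTS file_checksum (
                    path   TEXT,
                    size   INTEGER,
                    mtime  INTEGER,
                    sha256 TEXT,
                    PRIMARY KEY (path, size, mtime))""")

  for root, _, files in os.walk(ARCHIVE):
      for name in files:
          p = os.path.join(root, name)
          if os.path.islink(p) or not os.path.isfile(p):
              continue
          st = os.stat(p)
          rel = os.path.relpath(p, ARCHIVE)   # store snapshot-relative paths
          key = (rel, st.st_size, int(st.st_mtime))
          # Hash only files whose (path, size, mtime) is not yet recorded.
          if conn.execute("SELECT 1 FROM file_checksum WHERE path=? "
                          "AND size=? AND mtime=?", key).fetchone():
              continue
          conn.execute("INSERT INTO file_checksum VALUES (?, ?, ?, ?)",
                       key + (sha256(p),))
  conn.commit()
  conn.close()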
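
The matching "verify archive against saved checksums" pass could then
be a separate script run on its own schedule, reporting failures
instead of deleting them the way CrashPlan's Compact does. Same
invented schema and paths as the previous sketch; how paths map across
snapshot rotations is deliberately left out.

  #!/usr/bin/env python3
  # Sketch only: re-hash archived files and compare against the saved
  # checksums, printing anything missing or corrupted.
  import hashlib
  import os
  import sqlite3
  import sys

  DB = "/backup/meta/checksums.db"        # assumed metadata database
  ARCHIVE = "/backup/snapshots/hourly.0"  # assumed snapshot to verify

  def sha256(path, bufsize=1 << 20):
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(bufsize), b""):
              h.update(chunk)
      return h.hexdigest()

  conn = sqlite3.connect(DB)
  bad = 0
  for rel, expected in conn.execute("SELECT path, sha256 FROM file_checksum"):
      p = os.path.join(ARCHIVE, rel)
      if not os.path.isfile(p):
          print("MISSING", rel)
          bad += 1
      elif sha256(p) != expected:
          print("CORRUPT", rel)
          bad += 1
  conn.close()
  sys.exit(1 if bad else 0)

The third process, "compare source with archive checksums", would be
the same loop pointed at the live source tree instead of the snapshot.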