> On Dec 11, 2011, at 2:54 PM, markw at mohawksoft.com wrote:
>>
>> How?
>
> Let's posit a 7-day backup cycle. On day 1 you do your full backup (all
> volume blocks). On days 2 through 7 you do incremental backups (changed
> volume blocks).
>
> Let's say that the volume contains your company's code repository. Users
> are working on this volume almost constantly. Regardless of what they do,
> the file system blocks are going to be changing just as frequently, and
> what you see on day 5 of your cycle may not look at all like what you saw
> on day 1.
>
> As a specific example, I create a file on day 1 that spans 3 volume
> blocks. On day 2 I change the first block but the remaining two blocks
> remain unchanged (I'm doing random I/O for performance reasons, just as
> one would with a database). On day 3 I make changes to the second block.
> Your block-level backups for this file would contain 3 blocks on day 1,
> 1 block on day 2, and 1 block on day 3.
>
> On day 6 I accidentally delete the file and I contact you to restore it.
> You go to your day 1 backup and do the full restore, which gets the
> original version of the file. Then you go to restore the day 2
> incremental and find that it is unusable. You can certainly restore the
> day 3 incremental, but I'm still missing the block of data backed up on
> day 2, a block of data that cannot be recovered because the backup of it
> is gone. Much of my work is lost and I have to do it all over again.
>
> Now, consider the effect of missing blocks on directory data and inode
> tables.

Your whole scenario hinges on a big misunderstanding of how block-level
backup works.

Let's say our block size is 8K and the volume you want to back up is 512G.
That means your first backup contains 62.5 million blocks. Let's also
assume you've zeroed out the empty space of the volume and are using
block-level deduplication, a la something like ZFS. You only really back
up 30 million blocks of data and the rest are zero. You are left with two
components: the "data," addressed by a hash code (SHA-2), and the
"structure" of your disk, which is a linear list of blocks.

Now, you make a HUGE (5%) change to your disk, so you back up 3 million
changed blocks. You overlay those new blocks onto the old block list and
create a new list for the backup. Even though you've done an incremental
backup, you still have a "whole" representation of the volume. These lists
can be used to calculate the deltas between any two historical points, or
even an arbitrary snapshot.

Here's the other scenario: you can create a snapshot of any volume. Using
the change log, or by scanning the volume directly, you can use the
block-level data and the block list to re-create a previous point-in-time
snapshot even if you lost the original snapshot volumes.
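To make the "data + structure" idea concrete, here is a minimal sketch in Python of content-addressed, deduplicated block backup. It is not any particular product's implementation; the block size and the three-block example come from the thread, and the in-memory dicts, function names, and demo values are my own illustrative assumptions.

    # Sketch of block-level, deduplicated backup as described above.
    # Assumptions: in-memory dicts stand in for the backup store; block size
    # and the 3-block file example are taken from the thread's numbers.

    import hashlib

    BLOCK_SIZE = 8 * 1024  # 8K blocks, as in the example

    # "Data": content-addressed block store, hash -> block bytes.
    # Identical blocks (including all-zero blocks) are stored exactly once.
    block_store = {}

    # "Structure": one linear list of block hashes per backup.
    block_lists = {}


    def backup(name, volume_blocks):
        """Record a whole-volume block list; store only blocks not seen before.

        An incremental backup therefore uploads just the changed blocks, yet
        its block list still describes the complete volume at that point in time.
        """
        hashes = []
        for block in volume_blocks:
            h = hashlib.sha256(block).hexdigest()
            if h not in block_store:          # dedup: each unique block stored once
                block_store[h] = block
            hashes.append(h)
        block_lists[name] = hashes
        return hashes


    def restore(name):
        """Rebuild the complete volume image for any retained backup."""
        return [block_store[h] for h in block_lists[name]]


    def delta(name_a, name_b):
        """Indices of blocks that differ between any two historical backups."""
        a, b = block_lists[name_a], block_lists[name_b]
        return [i for i, (ha, hb) in enumerate(zip(a, b)) if ha != hb]


    # --- the "file spanning 3 blocks" example from the quoted message ---
    zero = b"\x00" * BLOCK_SIZE
    day1 = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE, zero]
    backup("day1", day1)

    day2 = list(day1)
    day2[0] = b"2" * BLOCK_SIZE           # day 2: change only the first block
    backup("day2", day2)                  # only 1 new block enters the store

    day3 = list(day2)
    day3[1] = b"3" * BLOCK_SIZE           # day 3: change only the second block
    backup("day3", day3)

    # Every backup restores in full on its own; nothing depends on replaying
    # a chain of incrementals, and deltas between any two days are cheap.
    assert restore("day2") == day2
    assert restore("day3") == day3
    print("blocks changed day1 -> day3:", delta("day1", "day3"))

The point the sketch illustrates: because each backup's block list covers the whole volume, losing or skipping the "day 2" backup media does not make later restores depend on it, and any two points in time can be diffed by comparing their lists.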