
BLU Discuss list archive



ZFS and block deduplication



On 04/22/2011 11:41 AM, Mark Woodward wrote:
> I have been trying to convince myself that the SHA2/256 hash is
> sufficient to identify blocks on a file system. Is anyone familiar with
> this?
>
> The theory is that you take a hash of a block on disk, and the hash,
> which is smaller than the actual block, is unique enough that the
> probability of any two distinct blocks producing the same hash is
> actually less than the probability of hardware failure.

> Given a small enough block size and a small enough data set, I can
> almost see it as safe enough for backups, but I certainly wouldn't put
> mission-critical data on it. Would you? Tell me how I'm flat-out wrong.
> I need to hear it.
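
For a rough sense of the numbers: the birthday bound puts the chance of
any collision among n random 256-bit hashes at roughly n^2 / 2^257.
Here's a back-of-the-envelope sketch in Python (the one-zettabyte pool
and the 4 KiB block size are illustrative assumptions, not anything
ZFS-specific):

from math import log2

def collision_exponent(n_blocks, hash_bits=256):
    # Birthday-bound approximation: p ~= n^2 / 2^(bits + 1).
    # Only valid while p is very small, which it is here.
    return 2 * log2(n_blocks) - (hash_bits + 1)

# Illustrative assumption: a one-zettabyte pool of 4 KiB blocks,
# i.e. 2**70 bytes / 2**12 bytes per block = 2**58 blocks.
print(collision_exponent(2 ** 58))  # -141.0, i.e. p ~= 2**-141

2**-141 is on the order of 10**-42, while quoted unrecoverable read
error rates for disks are on the order of 10**-15 per bit, so the
collision probability really is buried far below the hardware-failure
noise floor.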

If you read up on the rsync algorithm
(http://cs.anu.edu.au/techreports/1996/TR-CS-96-05.html), you'll see it
uses a combination of two different checksums, a cheap rolling checksum
plus a stronger hash, to decide whether two blocks match. And, IIRC,
even then it still does an additional whole-file check at the end to
make sure that the copied data is correct (and copies again if not).
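
For anyone curious what that two-checksum scheme looks like, here's a
minimal Python sketch of the idea (zlib.adler32 stands in for the weak
rolling checksum and MD5 for the paper's MD4; the block size and names
are mine, not rsync's actual code):

import hashlib
import zlib

BLOCK = 4096  # illustrative block size, not rsync's default

def make_signatures(data):
    # Per-block (weak, strong) signatures, indexed by the weak checksum.
    sigs = {}
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        weak = zlib.adler32(block)            # cheap first-pass filter
        strong = hashlib.md5(block).digest()  # confirms a weak-checksum hit
        sigs.setdefault(weak, []).append((strong, off))
    return sigs

def block_matches(block, sigs):
    # Stage 1: weak checksum only; a false positive here is harmless.
    candidates = sigs.get(zlib.adler32(block))
    if not candidates:
        return False
    # Stage 2: strong hash, computed only when stage 1 matches.
    strong = hashlib.md5(block).digest()
    return any(s == strong for s, _ in candidates)

The final whole-file verification is the backstop: even if a block
collided on both checksums at once, the mismatch would be caught at the
end and the file transferred again.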

DR




