BLU Discuss list archive

ZFS and block deduplication

Subject: ZFS and block deduplication
From: darose-prQxUZoa2zOsTnJN9+BGXg at public.gmane.org (David Rosenstrauch)
Date: Fri, 22 Apr 2011 11:53:23 -0400
In-reply-to: <4DB1A1B7.2040304-FJ05HQ0HCKaWd6l5hS35sQ@public.gmane.org>
References: <4DB1A1B7.2040304@mohawksoft.com>

On 04/22/2011 11:41 AM, Mark Woodward wrote:
> I have been trying to convince myself that the SHA2/256 hash is
> sufficient to identify blocks on a file system. Is anyone familiar with
> this?
>
> The theory is that you take a hash value of a block on a disk, and the
> hash, which is smaller than the actual block, is unique enough that the
> probability of any two blocks creating the same hash, is actually less
> than the probability of hardware failure.

> Given a small enough block size with a small enough set size, I can
> almost see it as safe enough for backups, but I certainly wouldn't put
> mission critical data on it. Would you? Tell me how I'm flat out wrong.
> I need to hear it.
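
For a rough feel for the numbers, the collision odds can be estimated with
the usual birthday-bound approximation. This is only a sketch: it assumes
the hash behaves like a uniformly random function, and the function name
and the example pool size are made up for illustration.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """Approximate chance that any two of num_blocks distinct blocks
    share a hash, using p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniformly distributed (an idealization).
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Hypothetical pool: 2**35 blocks (4 PiB of data at 128 KiB per block).
p = collision_probability(2 ** 35)
print(f"approximate collision probability: {p:.3e}")
```

With 2^35 blocks and a 256-bit hash this works out to roughly 2^-187.
That is the figure to set against whatever unrecoverable-error rate your
hardware vendor quotes.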

If you read up on the rsync algorithm
(http://cs.anu.edu.au/techreports/1996/TR-CS-96-05.html), it uses a
combination of 2 different checksums (a cheap rolling one and a stronger
hash) to decide whether two blocks match. And, IIRC, even then it still
does an additional final check to make sure that the copied data is
correct (and copies again if not).
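
The same layered idea, applied to deciding whether two blocks are
duplicates, might look something like the sketch below. It is not rsync's
code. Adler-32 stands in for rsync's rolling checksum, SHA-256 stands in
for its strong checksum, and the function names are made up for
illustration.

```python
import hashlib
import zlib


def weak_sum(block: bytes) -> int:
    # Cheap 32-bit checksum, used only to rule out mismatches quickly.
    # (Stand-in for rsync's rolling checksum, not the same function.)
    return zlib.adler32(block)


def strong_sum(block: bytes) -> bytes:
    # Strong hash, consulted only when the weak checksums agree.
    return hashlib.sha256(block).digest()


def is_duplicate(candidate: bytes, stored: bytes, verify: bool = True) -> bool:
    """Treat two blocks as duplicates only if both checksums agree and,
    when verify is set, the bytes themselves compare equal."""
    if weak_sum(candidate) != weak_sum(stored):
        return False
    if strong_sum(candidate) != strong_sum(stored):
        return False
    # Final check: with verify on, a hash collision cannot cause a
    # false match, at the cost of reading the stored block back.
    return candidate == stored if verify else True


print(is_duplicate(b"same block", b"same block"))
print(is_duplicate(b"same block", b"other block"))
```

The byte-for-byte comparison at the end is what takes the hash out of the
trust path. Without it you are back to relying on the collision odds
alone. IIRC ZFS has a "verify" setting on its dedup property that does
the same kind of thing.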

DR

Boston Linux & Unix / webmaster@blu.org