ZFS and block deduplication

Edward Ned Harvey blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org
Mon Apr 25 09:45:33 EDT 2011


> From: Mark Woodward [mailto:markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org]
> Sent: Monday, April 25, 2011 9:23 AM
> 
> This is one of those things that make my brain hurt. If I am
> representing more data with a fixed size number, i.e. a 4K block vs a
> 16K block, that does, in fact, increase the probability of collision 4X,

Nope.  Remember ... if you calculate 256-bit, ideally distributed hashes of
any two different input streams that are each 256 bits or larger, then the
probability of collision is 2^-256, regardless of the input block sizes.

When you create a 256-bit hash of any input >= 256 bits, you are essentially
picking a random (but repeatable) number from 0 to 2^256 - 1.  So the
probability of collision depends only on the number of hashes you draw (the
number of blocks in the pool), not on the size of each input block.
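To make that concrete, here's a quick sketch of the birthday bound behind the
argument.  The pool size (128 TiB) and block sizes are made-up example numbers,
and the hash is assumed to be ideal; the point is that smaller blocks mean more
blocks, which raises the *pool-wide* collision probability, while the pairwise
probability stays 2^-256 either way:

```python
import math

def birthday_collision_prob(n_blocks: int, hash_bits: int = 256) -> float:
    """Approximate probability that any two of n_blocks distinct blocks
    share the same ideal hash_bits-bit hash (birthday bound):
    p ~= n * (n - 1) / 2^(hash_bits + 1)."""
    return n_blocks * (n_blocks - 1) / 2.0 ** (hash_bits + 1)

# Hypothetical pool: 128 TiB of unique data.  Storing it as 4K blocks
# gives 4x as many blocks as 16K blocks, hence ~16x the pool-wide
# collision probability -- yet both numbers are absurdly small.
pool_bytes = 128 * 2**40
for block_size in (4 * 2**10, 16 * 2**10):
    n = pool_bytes // block_size
    print(f"{block_size // 1024:2d}K blocks: n = {n:.3e},  "
          f"p ~= {birthday_collision_prob(n):.3e}")
```

So halving the block size does raise the overall collision odds (quadratically
in the block count), but starting from 2^-256 per pair, the result is still
negligible next to, say, undetected disk-read errors.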
