Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
> From: discuss-bounces-mNDKBlG2WHs at public.gmane.org [mailto:discuss-bounces-mNDKBlG2WHs at public.gmane.org] On Behalf > Of Mark Woodward > > I have been trying to convince myself that the SHA2/256 hash is > sufficient to identify blocks on a file system. Is anyone familiar with > this? I am intimately familiar with this. And on planet Earth, yes it is sufficient. However, the cost of explaining your decision to anyone who is skeptical outweighs the cost of enabling verification. So as long as there's even a possibility that anybody might challenge your decision, you should enable verification. But if you're the boss and nobody will second-guess you, then you can safely rely on just the hash. If you'd like to read more about it, try this search... Google for zfs-discuss collision http://www.google.com/search?q=zfs-discuss+collision&ie=utf-8&oe=utf-8&aq=t& rls=org.mozilla:en-US:official&client=firefox-a Mostly in the extremely long thread, "(Fletcher+Verification) versus (Sha256+No Verification)" PS. "Safely" is a relative term. The probability of collision is non-zero, but the probability is essentially zero relative to Human Extinction Events, which means you can (relatively) safely rely on just the sha256 hash. If you want to calculate the actual probability of a collision, assume a 4k (2^12 bytes) block size (worst case) and every single block is precisely the same size (which isn't realistic, but is worst case) and every single block is unique (in which case why have you enabled dedup. So again, unrealistically evil-clown scenario worst case) and if your data pool size is the largest in the world (again worst case) say ... 2PB (2^41 bytes)... that would be physically impossible to hold the dedup tables using hardware currently available in the world, but again... Evil clown worst case for accidental collision. Then the number of blocks is 2^41 / 2^12 = 2^29 unique blocks. The formula on wikipedia for the birthday problem is: p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) ) In this case, n=2^29 d=2^256 Using bc to calculate the answer: bc -l n=2^29 d=2^256 scale=1024 1-e( ( 0.5*n*(n-1)*l((d-1)/d) ) ) .0000000000000000000000000000000000000000000000000000000000012446030 I manually truncated here (precision goes out 1024 places). This is 1.24E-60 Notice: There are estimated 1E50 atoms in Earth. So in the evil clown worst case for sha256 collision, the probability for collision is about the same as randomly selecting the same atom twice consecutively from 1000 Earths. Note: I had to repeat the calculation many times in bc, setting a larger and larger scale. The default scale of 20, and even 64 and 70 and 80 were not precise enough to produce a convergent answer around the -57th decimal place. So I just kept going larger, and in retrospect, anything over 100 would have been fine. I wrote 1024 above, so who cares.
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |