I have been trying to convince myself that the SHA-256 (SHA-2/256) hash is sufficient to identify blocks on a file system. Is anyone familiar with this?

The theory is that you take a hash of a block on a disk, and the hash, which is smaller than the actual block, is unique enough that the probability of any two blocks producing the same hash is actually less than the probability of hardware failure.

Now, I know basic statistics well enough not to play the lottery, but I'm not sure I can get my head around this. On a purely logical level, assume a block size of 32K and a hash size of 32 bytes: there are 1000 (1024 if we are talking binary 32K) potential duplicate blocks per single hash. Right? For every unique block (by hash), we have a potential of 1000 collisions.

Also, looking at the "birthday paradox": since every block is as likely as every other block (in reality we know this is not 100% true), aren't the creator's stated probability calculations much weaker than assumed?

I come from the old school where "God does not play dice," especially with storage. Given a small enough block size and a small enough set size, I can almost see it as safe enough for backups, but I certainly wouldn't put mission-critical data on it. Would you?

Tell me how I'm flat-out wrong. I need to hear it.
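For anyone who wants to check the arithmetic in the question, here is a minimal Python sketch. It assumes SHA-256 behaves like an ideal, uniformly random function, which is a modelling idealization, not a proof. It shows two separate quantities: the pigeonhole count (how many possible 32 KiB block contents share one 256-bit digest, expressed as a power of two) and the birthday (union) bound on the chance that any two of n distinct stored blocks collide. The names `log2_contents_per_digest`, `birthday_bound`, and `DedupStore` are hypothetical, for illustration only, and are not any file system's actual API.

```python
import hashlib

HASH_BITS = 256          # SHA-256 digest length in bits (32 bytes)
BLOCK_BYTES = 32 * 1024  # 32 KiB block, the size used in the question


def log2_contents_per_digest(block_bytes: int, hash_bits: int = HASH_BITS) -> int:
    """Pigeonhole argument: there are 2**(8*block_bytes) possible block
    contents but only 2**hash_bits digests, so on average
    2**(8*block_bytes - hash_bits) contents map to each digest.
    Returns that exponent."""
    return 8 * block_bytes - hash_bits


def birthday_bound(n_blocks: int, hash_bits: int = HASH_BITS) -> float:
    """Union (birthday) bound on the chance that any two of n_blocks distinct
    blocks share a digest, modelling the hash as uniformly random:
    P(collision) <= n*(n-1) / 2**(hash_bits + 1)."""
    return n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)


class DedupStore:
    """Toy content-addressed block store keyed by SHA-256 digest (hypothetical).
    With verify=True a digest match is confirmed byte-for-byte before the
    block is treated as a duplicate, so a collision cannot silently alias data."""

    def __init__(self, verify: bool = False) -> None:
        self.verify = verify
        self.blocks: dict[bytes, bytes] = {}

    def put(self, block: bytes) -> bytes:
        digest = hashlib.sha256(block).digest()
        stored = self.blocks.get(digest)
        if stored is None:
            self.blocks[digest] = block
        elif self.verify and stored != block:
            # A genuine collision; a real store would need a fallback path here.
            raise ValueError("SHA-256 collision detected")
        return digest


if __name__ == "__main__":
    print("log2(contents per digest):", log2_contents_per_digest(BLOCK_BYTES))
    n = 2 ** 40  # 2**40 blocks of 32 KiB is 2**55 bytes, i.e. 32 PiB
    print("collision bound for 2**40 blocks:", birthday_bound(n))
    store = DedupStore(verify=True)
    a = store.put(b"x" * BLOCK_BYTES)
    b = store.put(b"x" * BLOCK_BYTES)
    c = store.put(b"y" * BLOCK_BYTES)
    print("duplicates share a digest:", a == b, "distinct differ:", a != c)
    print("unique blocks stored:", len(store.blocks))
```

Under these assumptions the pigeonhole exponent for a 32 KiB block is 8 × 32768 − 256 = 261888, so the count of contents per digest is not a simple size ratio; yet the birthday bound for even 2**40 stored blocks comes out astronomically small (roughly 2**-177). The `verify` flag sketches the belt-and-braces option of comparing bytes whenever digests match.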
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.