On Mon, 25 Apr 2011, Mark Woodward wrote:

> On 04/24/2011 10:52 PM, Edward Ned Harvey wrote:
>>> From: Mark Woodward [mailto:markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org]
>>>
>>> You know, I've read the same math and I've worked it out myself. I agree
>>> it sounds so astronomical as to be unrealistic to even imagine it, but no
>>> matter how astronomical the odds, someone usually wins the lottery.
>>>
>>> I'm just trying to assure myself that there isn't some probability
>>> calculation missing. I guess my gut is telling me this is too easy.
>>> We're missing something.
>>
>> See - you're overlooking my first point. The cost of enabling verification
>> is so darn near zero that you should simply enable verification for the
>> sake of not having to justify your decision to anybody (including yourself,
>> if you're not feeling comfortable).
>
> Actually, I'm using ZFS as an example. I'm doing something different, but
> the theory is the same, and yes, I'm still using SHA-256.
>
>> Actually, there are two assumptions being made:
>>
>> (1) We're assuming SHA-256 is an ideally distributed hash function. Nobody
>> can prove that it's not - so we assume it is - but nobody can prove that it
>> is, either. If the hash distribution turns out to be imbalanced, for example
>> if certain hashes are more probable than others, then that would increase
>> the probability of hash collision.
>
> True.
>
>> (2) We're assuming the data in question is not being maliciously formed for
>> the purpose of causing a hash collision. I think this is a safe
>> assumption, because in the event of a collision, you would have two
>> different pieces of data that are assumed to be identical, and therefore
>> one of them is thrown away... And personally, I can accept the consequence
>> of discarding data if someone is intentionally trying to break my
>> filesystem maliciously.
>
> I'm not sure this point is important. I trust that it is pretty darn
> hard to create a SHA-256 collision. I would almost believe that it would
> be more likely that blocks collided by random chance than malice.
>
>>> Besides, personally, I'm looking at 16K blocks, which increases the
>>> probability a bit.
>>
>> You seem to have that backward. First of all, the default block size is
>> (up to) 128K... and the smaller the block size of the filesystem, the
>> higher the number of blocks, and therefore the higher the probability of
>> collision.
>
> This is one of those things that makes my brain hurt. If I am
> representing more data with a fixed-size number, i.e. a 4K block vs. a
> 16K block, that does, in fact, increase the probability of collision 4X,

Only for very small blocks. Once the block is larger than the hash, the
probability of a collision is independent of the block size.

Daniel Feenberg

> however, it does decrease the total number of blocks by about 4X as well.
>
>> If, for example, you had 1TB of data broken up into 1M blocks, then you
>> would have a total of 2^20 blocks. But if you broke it up into 1K blocks,
>> then your block count would be 2^30. With a higher number of blocks being
>> hashed, you get a higher probability of hash collision.
>
> It comes down to absolute trust that the hashing algorithm works as
> expected and that the data is as randomly distributed as expected.
>
> I'm sort of old school, I guess. The mindset is not about probability;
> it is about absolutes. In data storage, it has always been about
> verifiability, and we conveniently address probability of failure as a
> different problem and address it differently. This methodology seems to
> merge the two. Statistically speaking, I think I'm looking for 100%
> assurance, and no such assurance has ever really existed.
>
> It's cool stuff. It is a completely different way of looking at storage.
>
> _______________________________________________
> Discuss mailing list
> Discuss-mNDKBlG2WHs at public.gmane.org
> http://lists.blu.org/mailman/listinfo/discuss
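The block-count argument above can be made concrete with the standard birthday-problem upper bound: for n uniformly random k-bit hashes, the chance that any two collide is at most n(n-1)/2^(k+1). A minimal sketch (the function name is mine, not from the thread), plugging in the 1TB / 1K-block figure of 2^30 blocks from the discussion:

```python
from math import log2

def collision_bound(n_blocks: int, hash_bits: int = 256) -> float:
    """Birthday-problem upper bound on the probability that any two of
    n_blocks uniformly random hash_bits-bit hashes collide:
    p <= n*(n-1) / 2^(k+1)."""
    return n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)

# 1TB in 1K blocks -> 2^30 blocks, as in the thread
p = collision_bound(2 ** 30)
print(f"upper bound ~ 2^{log2(p):.0f}")  # on the order of 2^-197
```

This also shows why Harvey says smaller blocks raise the risk: going from 1M blocks (2^20 of them) to 1K blocks (2^30 of them) multiplies the bound by about 2^20, yet even then it stays around 2^-197, astronomically small unless the ideal-distribution assumption fails.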