Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive


[Discuss] Crashplan is discontinued

On Thu, Aug 31, 2017 at 10:02 PM, Mike Small <smallm at> wrote:
> John Abreau <abreauj at> writes:
>> I've heard of tools using MD5 or SHA1 hashes to identify duplicates, and
>> potential issues with hash collisions causing false positives.
> By accident or maliciously? The numbers seem off for accidental
> collisions. An md5 sum is a 32 digit hex number (128 bits). That gives
> 340282366920938463463374607431768211456 potential hash sums (or does the
> algorithm offer only a smaller subset?). I'm not going to bother to
> compute the probability of a collision. It's a very remote possibility,
> yes? For the malicious case, if someone's able to mess with the hashes
> used by deduplication code in your file system or in your hopefully
> almost as good userland equivalent (which of course must use git in some
> way or another for reasons that are not clear to me) you have unsolvable
> problems.
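
For a rough sense of the accidental case, the usual birthday-bound approximation can be sketched as below. This is only an estimate, assuming hash values are uniformly distributed over all 2^128 outputs (which also answers the "smaller subset" question optimistically). The function name and the file counts are made up for illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items hashes collide.

    Uses the birthday-bound estimate p ~= 1 - exp(-n(n-1) / (2 * 2**bits)),
    which assumes the hash behaves like a uniform random function.
    """
    space = 2 ** hash_bits
    x = n_items * (n_items - 1) / (2 * space)
    # expm1 keeps precision when x is tiny, where 1 - exp(-x) would round to 0.
    return -math.expm1(-x)


if __name__ == "__main__":
    # Hypothetical file counts, chosen only to show the scale.
    for n in (10**6, 10**9, 10**12):
        p = collision_probability(n, 128)
        print(f"{n:>17,} files, 128-bit hash: p ~= {p:.3e}")
```

Under that assumption the chance of any accidental collision stays far below one in a trillion even at a billion files, and it only approaches even odds around 2^64 items. None of this covers deliberately constructed collisions, which are practical for MD5.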

Does git compare only the checksum, or does it also look at file size?
I would think that comparing file size might make it even harder to
get a collision.
The only duplicate checksum that I've ever seen in practice was on
zero-length files.
Zero length files are, of course, all perfect duplicates of each other... :-)
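
On the size question: as far as I understand git's object format, it does not compare sizes as a separate step, but the content length is part of what gets hashed. A blob's ID is the SHA-1 of a header "blob <size>" plus a NUL byte, followed by the file contents. A minimal sketch, assuming a default SHA-1 repository (the helper name is made up):

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID git would assign to `data` stored as a blob.

    Assumes a SHA-1 repository; the size is folded into the hashed header.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Every zero-length file hashes to the same well-known ID.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    # A plain MD5 of the same empty input, for comparison.
    print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```

The result should match `git hash-object` on the same bytes. It also shows why all empty files share one checksum: the hashed input is identical for each of them.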

Bill Bogstad

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

