[Discuss] Crashplan is discontinued

Bill Bogstad bogstad at pobox.com
Fri Sep 1 01:05:31 EDT 2017


On Thu, Aug 31, 2017 at 10:02 PM, Mike Small <smallm at sdf.org> wrote:
> John Abreau <abreauj at gmail.com> writes:
>
>> I've heard of tools using MD5 or SHA1 hashes to identify duplicates, and
>> potential issues with hash collisions causing false positives.
>
> By accident or maliciously? The numbers seem off for accidental
> collisions. An md5 sum is a 16-byte (32 hex digit) number. That gives
> 340282366920938463463374607431768211456 potential hash sums (or does the
> algorithm offer only a smaller subset?). I'm not going to bother to
> compute the probability of a collision. It's a very remote possibility,
> yes? For the malicious case, if someone's able to mess with the hashes
> used by deduplication code in your file system or in your hopefully
> almost as good userland equivalent (which of course must use git in some
> way or another for reasons that are not clear to me) you have unsolvable
> problems.
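
For a rough sense of the accidental-collision numbers discussed above,
here is a minimal sketch using the standard birthday-bound approximation.
It assumes MD5 outputs are uniformly distributed over all 2**128 values
(true only for non-malicious inputs); the function name and the file
counts are illustrative, not taken from any particular tool.

```python
import math

MD5_BITS = 128
HASH_SPACE = 2 ** MD5_BITS  # 340282366920938463463374607431768211456


def collision_probability(num_files: int, space: int = HASH_SPACE) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2N)).

    Assumes hash values are uniformly random, i.e. no attacker is
    crafting inputs.
    """
    exponent = num_files * (num_files - 1) / (2 * space)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>16,} files -> p ~ {collision_probability(n):.3e}")
```

Even at a billion distinct files the estimate stays far below anything
that matters in practice, which matches the "very remote" intuition.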

Does git compare only the checksum, or does it look at file size as well?
I would think that also comparing file size would make a collision even
harder to get. The only duplicate checksums that I've ever seen in
practice were on zero-length files, which are, of course, all perfect
duplicates of each other... :-)
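
On the file-size question: git computes a blob's ID as SHA-1 over a
short header that contains the byte length, followed by the content, so
the size is already folded into the hash. The sketch below shows that
computation next to a hypothetical (size, MD5) dedup key. The name
dedup_key is made up for illustration and is not how git or CrashPlan
identify duplicates.

```python
import hashlib


def dedup_key(data: bytes) -> tuple[int, str]:
    """Hypothetical dedup key: file size paired with the MD5 digest.

    Two files count as candidates for being duplicates only if both
    the size and the digest match.
    """
    return len(data), hashlib.md5(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Every zero-length file yields the same key and the same blob ID.
    print(dedup_key(b""))
    print(git_blob_id(b""))
```

All zero-length files map to the same key, which is the one "collision"
that is also a genuine duplicate.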

Bill Bogstad


