[Discuss] Crashplan is discontinued

Mike Small smallm at sdf.org
Fri Sep 1 12:35:20 EDT 2017


Bill Bogstad <bogstad at pobox.com> writes:

> On Thu, Aug 31, 2017 at 10:02 PM, Mike Small <smallm at sdf.org> wrote:
> Does git only compare the checksum or does it also look at file size as well?
> I would think that comparing file size might make it even harder to
> get a collision.
> The only duplicate checksum that I've ever seen in practice was on 0
> length files.
> Zero length files are, of course, all perfect duplicates of each other... :-)

Ah, git plumbing. Not really my specialty, but I think the answer is
implied by some of the docs, kind of. I'll add some guess work and if
someone knows better he or she can correct me.

Zero length file collisions are not an issue in git because the stuff in
its store (.git/objects/{first two hex digits of SHA1 hash}/{rest of SHA1
hash}) includes both the file contents themselves (blobs - check me in
gitglossary(7)) and tree objects, which capture file and directory
names and reference the content blobs. Here's some of my
.emacs.d/.git/objects contents (not a great use of git, I'm finding,
btw. I should have done it down at the level where I only have files I
treat as my source code, as opposed to stuff emacs changes behind my
back):

8613r0:.git$ du -a objects/ | head
4       objects/af/2ef3b97a02a0cdc859c59e4d39d6a7aa01116c
4       objects/af/ef5e0daed0ecdf0d51dcc347149ae2e1f0e998
12      objects/af
4       objects/d7/2834524cad924ea210b41920293a6fcc5d72ff
8       objects/d7
4       objects/17/dc6f4f501ce4ee0f3488d246b825d0c3ad63fe
4       objects/17/62e9cf542a661f15351b3bb2c50e1a1d26a1cd
12      objects/17
4       objects/pack
4       objects/8e/709560a5a09f69f8be7665ad66e3c394620123
...
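
The layout in that listing can be sketched in a few lines of Python.
This assumes a 40 character hex SHA1 and loose (unpacked) objects only,
and loose_object_path is a made-up helper name for illustration, not
anything git itself provides.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str) -> PurePosixPath:
    """Map a 40-hex-digit SHA1 to its path below .git/ (hypothetical helper)."""
    if len(object_id) != 40 or any(c not in "0123456789abcdef" for c in object_id):
        raise ValueError("expected 40 lowercase hex digits")
    # The first two hex digits name the directory, the remaining 38 the file.
    return PurePosixPath("objects") / object_id[:2] / object_id[2:]


# The id here is reassembled from the first entry in the listing above.
print(loose_object_path("af2ef3b97a02a0cdc859c59e4d39d6a7aa01116c"))
```

Objects that have been moved into packfiles (the objects/pack entry in
the listing) are not covered by this sketch.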

So if I'm understanding rightly you could have 10 zero length files in
git with different names and that's not a problem. The names live in
tree objects, one tree per directory with one entry per file, and all 10
entries would reference the same single blob object holding the
contents (or lack thereof).
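
On the file size question, here is a small Python sketch of how a blob's
name is derived. The blob id is the SHA1 of a short header ("blob", a
space, the byte count, a NUL) followed by the contents, so the length is
already part of what gets hashed rather than being compared separately.
The function name git_blob_id is my own, not a git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA1 object id git assigns to a blob with these contents."""
    # Header: object type, a space, the decimal byte count, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The file name is not an input, so every empty file gets the same blob id.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))
```

The first value should match what "git hash-object /dev/null" prints,
which is one way to check the sketch against a real git.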

So far I don't see an actual compare happening, necessarily. It just
creates these tree objects and creates the blob object. Maybe it
overwrites the blob object for each file or maybe it sees it already
exists and just references it, I don't know. That kind of doesn't matter
except for performance or whatever. Or does it?

Let's take the malicious case. You want to get a file into the store
that has the same hash as an existing blob file, so that existing
references now have your contents instead of the original stuff. So
you'd be creating whatever tree object in the store, no hash collision
on that, but you'd want your file blob object to overwrite an existing
one. Unless my guesswork here is totally off I'm going to say git must
simply overwrite a blob file if you succeed in getting a hash
collision. If it did a compare to see if a path with the sha1 number was
already under .git/objects and didn't bother to write the new contents
then a hash collision couldn't be a real vulnerability and there
shouldn't have been a thread discussing it.
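
To make the two possibilities concrete, here is a toy content-addressed
store in Python. Everything in it is an assumption for illustration:
weak_hash is a deliberately bad stand-in for SHA1 so that a collision is
trivial to produce, and ToyStore is not git's code. It only shows why
the write policy decides whether a collision can replace existing
contents. Which policy git really follows is the part I haven't
verified.

```python
def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in for SHA1 so collisions are easy to make."""
    return format(sum(data) % 256, "02x")


class ToyStore:
    """Toy content-addressed store (hypothetical, not git's implementation)."""

    def __init__(self, overwrite: bool) -> None:
        self.overwrite = overwrite
        self.objects = {}  # maps hash string -> stored bytes

    def put(self, data: bytes) -> str:
        key = weak_hash(data)
        # Policy choice: replace an existing object, or keep the first one.
        if self.overwrite or key not in self.objects:
            self.objects[key] = data
        return key


original = b"ab"
colliding = b"ba"  # same bytes in a different order, so the same weak hash

for overwrite in (True, False):
    store = ToyStore(overwrite)
    key = store.put(original)
    store.put(colliding)
    print("overwrite" if overwrite else "keep first", store.objects[key])
```

With the overwrite policy the colliding contents win, and with the
keep-first policy the original contents survive, which is the
distinction the paragraph above is guessing about.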

But I could be way off here. If you really want to know, you probably
want to start by reading gitcore-tutorial(7), gitrepository-layout(5),
and maybe the source of git-hash-object or some other plumbing
command. Oh wait, git-hash-object I see now is a link to git, so you'd
have to read the top of the source, which looks at what the exec'ed
filename was, assuming I have indeed picked the right command here. The
plumbing man pages are pretty thin. Maybe higher level commands are
relevant here too.

-- 
Mike Small
smallm at sdf.org


