[Discuss] Crashplan is discontinued

John Abreau abreauj at gmail.com
Thu Aug 31 20:36:22 EDT 2017


I've heard of tools that use MD5 or SHA1 hashes to identify duplicates, and
of potential issues with hash collisions causing false positives.

Has anyone published research into using multiple hashes to address this,
to determine if two files with different contents could have both identical
MD5 hashes and identical SHA1 or SHA256 hashes?
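
To make the question concrete, here is a minimal sketch of what I mean by
"multiple hashes". It is not taken from any existing tool; the function
names digest_pair and find_duplicates are made up for illustration. It
treats two files as duplicates only if their sizes, MD5 digests and SHA256
digests all match, using only Python's standard hashlib:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def digest_pair(path, chunk_size=1 << 20):
    """Return the (MD5, SHA-256) hex digests of a file, read in one pass."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def find_duplicates(root):
    """Group files under root by (size, MD5, SHA-256).

    Hypothetical helper: returns only the groups with two or more files.
    File names and directories are ignored, so renamed copies still match.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = (path.stat().st_size,) + digest_pair(path)
            groups[key].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Even with both digests matching, a byte-for-byte comparison of the
candidate files (for example with filecmp.cmp(a, b, shallow=False)) is the
only check I would fully trust; the sketch just narrows down the
candidates.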



On Thu, Aug 31, 2017 at 3:31 PM, Dan Ritter <dsr at randomstring.org> wrote:

> On Thu, Aug 31, 2017 at 11:10:57AM -0700, Rich Braun wrote:
> > Dale Worley's approach:
> > > I have a cron job which commits my home directory into a Git repository
> >
> > Sounds interesting; one of my use-cases is dealing with a couple hundred
> > gigs of photos, with new ones arriving (via Nextcloud's sync capability,
> > which I've set up recently as part of my Docker infra) at a rate of a
> > thousand or so a month.
> >
> > One of the issues with pics is deduplication, as they're renamed across
> > folders. My current rsnapshot approach doesn't cope well with that. Could
> > git do this automatically without complex scripting?
> >
>
> That sounds like maybe a job for bup:
>
> https://github.com/bup/bup/blob/master/README.md
>
> -dsr-
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://lists.blu.org/mailman/listinfo/discuss
>



-- 
John Abreau / Executive Director, Boston Linux & Unix
Email: abreauj at gmail.com / WWW http://www.abreau.net / PGP-Key-ID 0x920063C6
PGP-Key-Fingerprint A5AD 6BE1 FEFE 8E4F 5C23  C2D0 E885 E17C 9200 63C6


