BLU Discuss list archive
[Discuss] Deduplication
- Subject: [Discuss] Deduplication
- From: dsr at randomstring.org (Dan Ritter)
- Date: Thu, 5 Sep 2024 11:22:56 -0400
- In-reply-to: <5468c999-2d53-4d28-ba66-054c1efdd1d9@borg.org>
- References: <5468c999-2d53-4d28-ba66-054c1efdd1d9@borg.org>
Kent Borg wrote:
> So today I ran "duperemove" on a couple volumes, and it scared up some
> non-trivial space. I decided to run it on a third volume.
>
> Nope! It works by telling the kernel to make files that match to share the
> same extents, but that only works for some file systems.
>
> - XFS. yes, I have used that a long time, it is clever enough to CoW any
> changes that are later made, so files that match can later diverge.
>
> - btrfs, which I have been using recently, because god knows it is heavy in
> the CoW-ing world
>
>
> But it doesn't work on any of the extN filesystems. I have used XFS on my
> running volumes for a long time, but for backups I guess I stuck longer with
> ext4 and I maybe even earlier ext-s on some disks -- but they aren't active, so
> that's okay.

rdfind, however, will:

DESCRIPTION
       rdfind finds duplicate files across and/or within several directories.
       It calculates checksum only if necessary. rdfind runs in O(Nlog(N))
       time with N being the number of files.

       If two (or more) equal files are found, the program decides which of
       them is the original and the rest are considered duplicates. This is
       done by ranking the files to each other and deciding which has the
       highest rank. See section RANKING for details.

       By default, no action is taken besides creating a file with the
       detected files and showing the possible amount of saved space.

... but it can create symlinks or hardlinks as desired.

The one situation in which I find it useful is a compliance
requirement at work to make a daily copy of the visible portion
of a website -- we need to be able to show what we were showing
the world on any given day. So we crawl the site from the
outside, save it to a local directory, and then run rdfind
because on most days, nothing has changed at all.

-dsr-
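
For anyone wondering what "calculates checksum only if necessary" can
look like in practice, here is a rough Python sketch of the general
idea. It is not rdfind's actual code or its ranking logic: the function
names (file_digest, find_duplicates, replace_with_hardlinks) are made up
for this illustration, and "keep the first path after sorting" is an
assumed stand-in for rdfind's ranking. Files are grouped by size first,
only files that share a size get hashed, and duplicates can then be
swapped for hardlinks.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return a list of groups (sorted lists of paths) with equal content."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        # A file with a unique size cannot have a duplicate, so it is
        # never hashed. Empty files are ignored in this sketch.
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def replace_with_hardlinks(group):
    """Keep the first path and hardlink the others to it.

    Returns the total size of the files that were replaced, which is the
    space saved as long as nothing else still links to the old copies.
    Hardlinks only work within a single filesystem.
    """
    original, *dupes = group
    replaced = 0
    for dupe in dupes:
        if os.path.samefile(original, dupe):
            continue  # already the same inode
        size = os.path.getsize(dupe)
        tmp = dupe + ".dedup-tmp"  # hypothetical temporary name
        os.link(original, tmp)     # create the new link first
        os.replace(tmp, dupe)      # then swap it in atomically
        replaced += size
    return replaced
```

For the daily-crawl case, something like
find_duplicates(["crawl/day1", "crawl/day2"]) followed by
replace_with_hardlinks() on each group would be the rough equivalent
(the directory names are placeholders). Unlike the extent sharing that
duperemove asks the kernel for, hardlinked files do not diverge on
write: changing one name changes them all, which is fine for archived
snapshots that are never edited.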