Backing up sparse files ... VMs and TrueCrypt ... etc

Edward Ned Harvey bblisa3-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org
Thu Feb 18 20:18:36 EST 2010


> Richard Pieri wrote:
> >> tar -cf - /path/to/sparse | gzip --rsyncable > sparse.tgz
> >
> > For optimal performance with this trick you should zero out freed
> > sectors on the sparse image and then compact the image.  Very
> > efficient usage of the storage, but it's slow and tedious.
> 
> That would be true if tar read the sparse file as an ordinary file, but
> I'm assuming it knows how to properly skip over the holes. I see there
> is an option to handle sparse files:
> 
>         -S, --sparse
>                handle sparse files efficiently
> 
> I would expect with that option, you'll never see the junk that happens
> to be occupying the unused sectors.
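
(As an aside, the zero-then-compact step quoted above usually looks
something like the following.  This is just a sketch for a Linux guest on
VirtualBox; the paths are made up, and other hypervisors have their own
compact tools.)

    # inside the guest: fill all free space with zeros, then delete
    dd if=/dev/zero of=/zerofill bs=1M ; rm /zerofill ; sync
    # on the host: compact the image so the zeroed blocks are dropped
    VBoxManage modifyhd /path/to/guest.vdi --compact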

The --sparse option of tar only seems to have any effect when you're
extracting the tarball: as tar extracts, it looks for files containing long
runs of zeros and writes those files out sparse.  So ...

Tar does in fact do all right at backing up such files, and restoring them.
But only by doing a full backup every time, not an incremental.
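
A quick way to see the behavior for yourself, as a sketch (GNU tar and
coreutils assumed; the filenames are made up):

    # make a 1 GB sparse file: apparent size 1G, almost nothing on disk
    truncate -s 1G sparse.img
    du -h --apparent-size sparse.img    # 1.0G
    du -h sparse.img                    # ~0

    # full backup with the sparse option, then restore
    tar -cSf backup.tar sparse.img
    mkdir -p restore
    tar -xSf backup.tar -C restore
    du -h restore/sparse.img            # still ~0; the holes came back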

One thing I would add here:  When creating a tar backup of a sparse file,
it appears that tar tries to read the whole file, and during the zero
sections the system just generates zeros as fast as the CPU can.  This is
surprisingly slow compared to what you'd expect, but still much faster than
actually reading zeros from disk.  I forget the exact numbers, but I think
my disk can read around 500 Mbit/s, while reading an all-zero sparse file
went at something like 2 Gbit/s.  That's pretty much all I know.  There are
tons of loose ends, such as whether some other build of tar would perform
better, and so on.  I just don't know; I didn't explore it further, given
the lack of incremental support.
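
For anyone who wants to reproduce the rough numbers, something like this
would do it (GNU/Linux assumed; the size and names are made up):

    # reading the holes never touches the disk; the kernel just hands
    # back zero-filled pages as fast as it can generate them
    truncate -s 10G allzero.img
    time dd if=allzero.img of=/dev/null bs=1M
    # compare against a real, non-sparse file of similar size to see
    # the disk-bound rate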
