Backing up sparse files ... VMs and TrueCrypt ... etc

Edward Ned Harvey blu-Z8efaSeK1ezqlBn2x/YWAg at public.gmane.org
Sun Feb 21 09:40:50 EST 2010


> The prior info might also explain why rsync is slow in this situation.
> With your use case of a sparse file that's only about 10% used, and
> your
> point that it still takes time to process the zeros produced by the OS,
> which rsync then has to calculate an MD5 hash of, it can take a while.

Here's a benchmark.
These are empty TrueCrypt volumes, so the nonsparse file takes 5G on disk,
while the sparse one takes 256K on disk, and is "apparently" 5G in length.

$ time cat truecrypt-5G-sparsefile.tc > /dev/null
real    0m6.854s
$ time cat truecrypt-5G-nonsparsefile.tc > /dev/null
real    1m33.533s

$ time md5sum truecrypt-5G-sparsefile.tc > /dev/null
real    0m18.398s
$ time md5sum truecrypt-5G-nonsparsefile.tc > /dev/null
real    1m25.641s

$ time gzip --fast -c truecrypt-5G-sparsefile.tc > /dev/null
real    0m37.922s
$ time gzip --fast -c truecrypt-5G-nonsparsefile.tc > /dev/null
real    4m35.956s


> What you really need is a hypothetical sparse_cat that is file system
> aware and can efficiently skip over the unused sectors. Or better yet,
> the equivalent functionality built-in to your archiving tool.

I agree, that would be nice.  However, as the benchmark above shows, you may
be overestimating the time it takes to read or md5sum all the zeros in the
holes of a sparse file.  The hypothetical sparse_cat would improve
performance, but only marginally.
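Some of that "built into the archiving tool" behavior already exists on the write side, even if reading still walks the zeros. For example, GNU tar's `--sparse` (`-S`) flag detects holes and stores only the data regions. A sketch, assuming GNU tar and a made-up filename:

```shell
# A 100 MB fully sparse file to archive.
truncate -s 100M volume.img

# With --sparse, GNU tar records the hole map and stores only the
# nonzero regions, so the archive stays tiny instead of growing
# to the full 100 MB apparent size.
tar -cSf volume.tar volume.img
ls -l volume.tar
```

On the transfer side, rsync's `--sparse` (`-S`) flag similarly recreates holes on the receiving end, so the destination copy doesn't allocate blocks for the zeros either.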


> Basically they use a VMware tool to backup the VM image, and then rsync
> that backup file.

Oh la la.  That might be OK for them, having already bought the license for
other purposes, but it costs $995 or more, as far as I can tell.


More information about the Discuss mailing list