BLU Discuss list archive


Backing up sparse files ... VM's and TrueCrypt ... etc

Subject: Backing up sparse files ... VM's and TrueCrypt ... etc
From: tmetro-blu-5a1Jt6qxUNc at public.gmane.org (Tom Metro)
Date: Sun, 21 Feb 2010 15:26:37 -0500
In-reply-to: <000001cab2fc$eed8d190$cc8a74b0$@com>
References: <000001caaf83$31cb15d0$95614170$@com> <4B7B8BC6.4020709@vl.com> <000101caafd8$56834280$0389c780$@com> <4B7C4263.7050907@vl.com> <7C2EBBD9-C37C-4647-AC4B-B4EC0C0A056B@gmail.com> <4B7DB4E9.2030805@vl.com> <000101cab101$250552f0$6f0ff8d0$@com> <4B8074D6.3070407@vl.com> <000001cab2fc$eed8d190$cc8a74b0$@com>

Edward Ned Harvey wrote:
> *  Never use --sparse when creating an archive that is 
> compressed.  It's pointless, and doubles the time to create archive.
> 
> *  Yes, use --sparse during extraction, if the contents contain a 
> lot of serial 0's and you want the files restored to a sparse state.
> 
> The man page saying "using '--sparse' is not needed on extraction" is 
> misleading.  It's technically true (you don't need it), but it's 
> misleading: yes, you need it if you want the files to be extracted sparsely.

Have you confirmed that through code inspection or experimentation?

I haven't tested it, but as I dug deeper and saw that they had a special 
tar file header for sparse files, it made perfect sense that the 
'--sparse' option was superfluous on extraction, because tar can see 
from the header that the file is flagged as being sparse. It's logical 
that they'd hard-wire the "sparse writing" magic to be activated by that 
flag and ignore command-line options.

Also consider that the code to detect strings of zeros seems to be on 
the read side (based on the man page description). On extraction, it 
wouldn't make sense to expand the unused portions to strings of zeros, 
then follow that by code that detects the zeros and seeks past them to 
write a sparse file.
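To make the read-side/write-side split concrete, here is a minimal Python sketch, assuming 512-byte blocks as the zero-detection granularity. The reader scans for all-zero blocks and keeps only the non-zero runs with their offsets (roughly the hole map a sparse header records); the writer seeks past the gaps instead of writing zeros. The function names are hypothetical and this is not GNU tar's actual code.

```python
from __future__ import annotations

import os
import tempfile

BLOCK = 512  # zero-detection granularity; an assumption for this sketch


def zero_block_map(data: bytes, block: int = BLOCK) -> list[tuple[int, bytes]]:
    """Read side: return (offset, bytes) runs of blocks that are not all zeros."""
    zero = bytes(block)
    chunks: list[tuple[int, bytes]] = []
    for off in range(0, len(data), block):
        piece = data[off:off + block]
        if piece == zero[:len(piece)]:
            continue  # all-zero block: record nothing, it will become a hole
        if chunks and chunks[-1][0] + len(chunks[-1][1]) == off:
            # Contiguous with the previous data run: extend that run.
            prev_off, prev = chunks[-1]
            chunks[-1] = (prev_off, prev + piece)
        else:
            chunks.append((off, piece))
    return chunks


def write_sparse(path: str, chunks: list[tuple[int, bytes]], size: int) -> None:
    """Extraction side: seek past holes instead of writing zeros."""
    with open(path, "wb") as f:
        for off, piece in chunks:
            f.seek(off)   # seeking beyond EOF and then writing leaves a hole
            f.write(piece)
        f.truncate(size)  # restore the full length if the file ends in a hole


def roundtrip(data: bytes) -> bytes:
    """Detect holes in `data`, write it sparsely to a temp file, read it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "restored.img")
        write_sparse(path, zero_block_map(data), len(data))
        with open(path, "rb") as f:
            return f.read()
```

Note that the zero scan happens only once, on the read side; the writer never sees the zeros at all, which is the point being made above.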

You can test this by tarring, without the '--sparse' option, a file 
containing several blocks of zeros followed by a few bytes of data. Then 
extract it with the '--sparse' option and see whether it comes out as a 
sparse file.
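A minimal Python sketch of that experiment, assuming GNU tar is on the PATH and a Linux filesystem that supports holes (on Linux, st_blocks counts 512-byte units). The file names and block counts are arbitrary choices for illustration.

```python
import os
import subprocess
import tempfile


def make_test_file(path: str, zero_blocks: int = 64, block: int = 4096,
                   tail: bytes = b"data") -> None:
    """Write literal zero blocks (not holes) followed by a few bytes of data."""
    with open(path, "wb") as f:
        f.write(bytes(zero_blocks * block))
        f.write(tail)


def allocated_bytes(path: str) -> int:
    """Disk space actually used; st_blocks is in 512-byte units on Linux."""
    return os.stat(path).st_blocks * 512


def run_experiment() -> tuple:
    """Archive WITHOUT --sparse, then extract WITH --sparse (GNU tar assumed)."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "zeros.img")
        make_test_file(src)
        archive = os.path.join(work, "test.tar")
        out = os.path.join(work, "out")
        os.mkdir(out)
        subprocess.run(["tar", "-cf", archive, "-C", work, "zeros.img"], check=True)
        subprocess.run(["tar", "--sparse", "-xf", archive, "-C", out], check=True)
        restored = os.path.join(out, "zeros.img")
        # (apparent size, blocks used by the original, blocks used by the copy)
        return (os.path.getsize(restored),
                allocated_bytes(src),
                allocated_bytes(restored))
```

Calling run_experiment() returns the apparent size and the allocated bytes of the original and the extracted copy. If the last number comes out well below the middle one, extraction with '--sparse' turned literal zeros into holes; if the two are about equal, it did not.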

> ...you may be overestimating the time to read or md5sum all the 0's
> in the hole of sparse files.

Perhaps, but...

> The hypothetical sparse_cat would improve performance, but just
> marginally.

...it would eliminate the need for a two-pass read with tar. And if 
summing zeros is fast, why is rsync so slow in your experiments?
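One way to put a number on the "summing zeros is fast" side of the argument is to time MD5 over zero-filled memory. This Python sketch measures only the hashing; it leaves out reading the hole from disk and everything else rsync does, so it bounds one cost rather than explaining rsync's speed. The 256 MiB total and 1 MiB chunk size are arbitrary defaults.

```python
from __future__ import annotations

import hashlib
import time


def md5_zero_throughput(total: int = 256 * 1024 * 1024,
                        chunk: int = 1024 * 1024) -> tuple[str, float]:
    """Hash `total` bytes of in-memory zeros; return (hex digest, MB/s)."""
    zeros = memoryview(bytes(chunk))
    h = hashlib.md5()
    done = 0
    start = time.perf_counter()
    while done < total:
        n = min(chunk, total - done)
        h.update(zeros[:n])  # memoryview slice avoids copying the buffer
        done += n
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return h.hexdigest(), done / elapsed / 1e6
```

For example, `digest, mbps = md5_zero_throughput()` followed by `print(f"{mbps:.0f} MB/s")` gives a rough per-core hashing rate to compare against the elapsed time of an rsync run.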

(A literal sparse_cat, a drop-in replacement for cat, wouldn't actually be 
that useful, as you need to tell the process receiving the stream the 
byte offset of each chunk of data, assuming you want to be able to 
reconstruct the sparse file later with the same holes. So practically 
speaking, this is something you'd have to integrate into tar, gzip, 
rsync, or whatever archiver you're using.

It sounds like it would be a small project to patch tar to use the 
fcntl, as it already has a data structure figured out for recording the 
holes. But you'd still need additional hacks to do incremental 
transfers. So the bigger win would be patching rsync.)
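A sketch of what "communicating the byte offset for each chunk" could look like. The message doesn't name the fcntl in question, so as a stand-in this Python sketch enumerates data regions with lseek and SEEK_DATA/SEEK_HOLE (os.SEEK_DATA/os.SEEK_HOLE, available on Linux 3.1+ and some other systems), which is not necessarily the interface discussed earlier in the thread. The stream format (an 8-byte total size, then an offset/length header before each chunk) is invented for illustration and is not tar's or rsync's format. It also assumes the file isn't modified while being packed.

```python
import errno
import io
import os
import struct
import tempfile

SIZE = struct.Struct("<Q")    # hypothetical stream header: total file size
CHUNK = struct.Struct("<QQ")  # hypothetical per-chunk header: offset, length


def data_extents(path: str) -> list:
    """Return (offset, length) for each data region, skipping holes."""
    size = os.path.getsize(path)
    if not hasattr(os, "SEEK_DATA"):
        return [(0, size)] if size else []  # no hole API: treat all as data
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data after pos: trailing hole
                    break
                if e.errno == errno.EINVAL and pos == 0:  # unsupported here
                    return [(0, size)]
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents


def sparse_pack(path: str, out) -> None:
    """Sender: emit the file size, then (offset, length, bytes) per region."""
    out.write(SIZE.pack(os.path.getsize(path)))
    with open(path, "rb") as f:
        for offset, length in data_extents(path):
            f.seek(offset)
            out.write(CHUNK.pack(offset, length))
            out.write(f.read(length))


def sparse_unpack(inp, path: str) -> None:
    """Receiver: seek to each recorded offset so the gaps stay holes."""
    (size,) = SIZE.unpack(inp.read(SIZE.size))
    with open(path, "wb") as f:
        while True:
            header = inp.read(CHUNK.size)
            if len(header) < CHUNK.size:
                break
            offset, length = CHUNK.unpack(header)
            f.seek(offset)
            f.write(inp.read(length))
        f.truncate(size)


def demo() -> bool:
    """Round-trip a file with a 1 MiB hole through an in-memory stream."""
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.img"), os.path.join(d, "dst.img")
        with open(src, "wb") as f:
            f.write(b"head")
            f.seek(1 << 20)  # leave a hole
            f.write(b"tail")
        buf = io.BytesIO()
        sparse_pack(src, buf)
        buf.seek(0)
        sparse_unpack(buf, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            return a.read() == b.read()
```

Because the holes are located with a single seek per region rather than by reading them, the zeros are never read or transferred, which is the saving a patched tar or rsync would be after.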

  -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


Boston Linux & Unix / webmaster@blu.org