Edward Ned Harvey wrote:
> • Never use --sparse when creating an archive that is
> compressed. It's pointless, and doubles the time to create the archive.
>
> • Yes, use --sparse during extraction, if the contents contain a
> lot of serial 0's and you want the files restored to a sparse state.
>
> The man page saying "using '--sparse' is not needed on extraction" is
> misleading. It's technically true - you don't need it - but it's
> misleading - yes you need it if you want the files to be extracted sparsely.

Have you confirmed that through code inspection or experimentation?

I haven't tested it, but as I dug deeper and saw that tar has a special header for sparse files, it made perfect sense that the '--sparse' option is superfluous on extraction: tar can see from the header that the file is flagged as sparse. It's logical that they'd hard-wire the "sparse writing" magic to be triggered by that flag and ignore the command-line option.

Also consider that the code to detect runs of zeros seems to be on the read side (based on the man page description). On extraction, it wouldn't make sense to expand the holes into runs of zeros, only to follow that with code that detects those zeros and seeks past them to write a sparse file.

You can test this by tarring a file containing several blocks of zeros followed by a few bytes of data, without the '--sparse' option. Then extract it with the '--sparse' option and see whether it gets turned into a sparse file.

> ...you may be overestimating the time to read or md5sum all the 0's
> in the hole of sparse files.

Perhaps, but...

> The hypothetical sparse_cat would improve performance, but just
> marginally.

...it would eliminate the need for tar's two-pass read. And if summing zeros is fast, why is rsync so slow in your experiments?
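The read-side/write-side argument above, and the suggested experiment, can be sketched in Python. This is a minimal illustration under stated assumptions, not GNU tar's code: the 512-byte scan granularity and the function names (`find_data_regions`, `write_sparse`, `allocated_bytes`, `make_test_file`) are hypothetical. The read side scans for all-zero blocks and records where the data is; the write side never sees the zeros at all, it just seeks over the gaps.

```python
import os

# Hypothetical scan granularity chosen for illustration; GNU tar's own
# zero-detection logic may use a different block size.
BLOCK = 512


def find_data_regions(path, block=BLOCK):
    """Read side: scan for all-zero blocks and return (regions, size),
    where regions is a list of (offset, length) pairs of non-zero data."""
    zero = bytes(block)
    regions = []
    start = None  # offset where the current run of non-zero blocks began
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            is_zero = chunk == zero[: len(chunk)]
            if not is_zero and start is None:
                start = offset  # a data run begins
            elif is_zero and start is not None:
                regions.append((start, offset - start))  # a data run ends
                start = None
            offset += len(chunk)
    if start is not None:
        regions.append((start, offset - start))
    return regions, offset  # offset now equals the apparent file size


def write_sparse(src, dst, regions, size):
    """Write side: copy only the data regions, seeking over the holes
    instead of writing zeros, then extend to the full apparent size."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for off, length in regions:
            fin.seek(off)
            fout.seek(off)
            fout.write(fin.read(length))
        fout.truncate(size)  # preserves a trailing hole, if any


def allocated_bytes(path):
    """Disk space actually allocated. st_blocks counts 512-byte units on
    Linux (POSIX leaves the unit unspecified); it is absent on Windows."""
    return os.stat(path).st_blocks * 512


def make_test_file(path, zero_blocks=2048, tail=b"data"):
    """Dense (non-sparse) test file: zero_blocks * 512 zero bytes written
    explicitly, followed by a few bytes of data."""
    with open(path, "wb") as f:
        f.write(bytes(zero_blocks * 512))
        f.write(tail)
```

For the experiment itself (GNU tar assumed), one could run `make_test_file("testfile")`, then `tar -cf test.tar testfile` and `mkdir out && tar -xf test.tar --sparse -C out`, and compare `allocated_bytes("out/testfile")` with `os.path.getsize("out/testfile")`. An allocation well below the apparent size suggests extraction produced a sparse file; results depend on the filesystem.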
(A literal sparse_cat, a drop-in replacement for cat, wouldn't actually be that useful, because you need to communicate to the process receiving the stream the byte offset of each chunk of data if you want to be able to reconstruct the sparse file later with the same holes. So practically speaking, this is something you'd have to integrate into tar, gzip, rsync, or whatever archiver you're using. It sounds like it would be a small project to patch tar to use the fcntl, as tar already has a data structure for recording the holes. But you'd still need additional hacks to do incremental transfers, so the bigger win would be patching rsync.)

-Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
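The aside about sparse_cat needing to send the byte offset of each chunk can be made concrete with a small framing sketch in Python. The wire format below is entirely hypothetical (it is not what tar, gzip, or rsync use): a header carrying the apparent file size, then (offset, length, data) records, then a zero-length terminator. The receiver seeks to each offset, so anything never sent can stay a hole. Incremental transfers are not addressed.

```python
import struct

# Hypothetical framing, not the format of tar, gzip, or rsync:
#   header:  apparent file size, unsigned 64-bit big-endian
#   records: offset (u64), length (u64), then `length` data bytes
#   end:     a record with length 0
HEADER = struct.Struct(">Q")
RECORD = struct.Struct(">QQ")


def encode(chunks, size, out):
    """Serialize (offset, data) chunks of a sparse file onto a binary stream."""
    out.write(HEADER.pack(size))
    for offset, data in chunks:
        if not data:
            continue  # a zero-length record is reserved as the terminator
        out.write(RECORD.pack(offset, len(data)))
        out.write(data)
    out.write(RECORD.pack(0, 0))


def _read_exact(stream, n):
    # Assumes a blocking buffered stream (file, BytesIO, sys.stdin.buffer)
    # whose read(n) returns fewer than n bytes only at end of stream.
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("truncated sparse stream")
    return buf


def decode(stream, target):
    """Rebuild the file on a seekable binary `target`, seeking over the
    gaps between records so the receiver can leave them as holes."""
    (size,) = HEADER.unpack(_read_exact(stream, HEADER.size))
    while True:
        offset, length = RECORD.unpack(_read_exact(stream, RECORD.size))
        if length == 0:
            break
        target.seek(offset)
        target.write(_read_exact(stream, length))
    target.truncate(size)  # extend (or trim) to the original apparent size
    return size
```

`target` should be a real file: `truncate()` on a regular file can grow it, whereas `io.BytesIO.truncate()` only shrinks, so a trailing hole would be lost there.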
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.