[Discuss] rsync v. cp in data migration

Richard Pieri richard.pieri at gmail.com
Fri May 24 18:46:48 EDT 2013


Derek,

I suspect that your timings are off because the test isn't quite apples
to apples. GNU tar's -S reads every file in full twice: once to detect
sparseness and again to add it to the archive. GNU cp's documentation[1]
suggests that its --sparse=auto detection may be smarter than that.
I suggest retesting with sparse file handling disabled: no -S with tar,
--sparse=never with cp.
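A minimal sketch of the two retest commands, assuming bash, GNU tar and
GNU cp from coreutils (--sparse=never is a GNU cp option). copy_tar and
copy_cp are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the two copy methods with sparse-file detection disabled.
# Assumes GNU tar and GNU cp (coreutils); --sparse=never is a GNU cp option.
# copy_tar and copy_cp are hypothetical helper names for illustration.
set -eu -o pipefail

# tar pipe without -S: the first tar archives the tree to stdout,
# the second extracts it into the destination.
copy_tar() {
    local src=$1 dst=$2
    mkdir -p "$dst"
    tar -C "$src" -cf - . | tar -C "$dst" -xf -
}

# Recursive cp with sparse detection turned off; "$src"/. copies the
# contents of src into dst rather than creating dst/src.
copy_cp() {
    local src=$1 dst=$2
    mkdir -p "$dst"
    cp -r --sparse=never "$src"/. "$dst"/
}
```

Time each one into an empty destination, e.g. "time copy_tar SRC DST1"
versus "time copy_cp SRC DST2", where SRC, DST1 and DST2 are placeholders.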

Try this (as root) instead of the dd trick:

  sync && echo 3 > /proc/sys/vm/drop_caches

This forces the kernel to flush dirty buffers to disk (sync), then drop
the page cache, dentries and inodes and free the associated RAM. The
caches fill up again as soon as files are read, so you'll need to do
this before each test to start from a clean state.
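A minimal sketch of wrapping that step around each timed run, assuming
Linux and bash. cold_run is a hypothetical helper name, not an existing
tool. If the cache drop fails (for example when not running as root) it
warns on stderr and still times the command, so you know the result is a
warm-cache number.

```shell
#!/usr/bin/env bash
# Sketch: run a command "cold" by flushing dirty buffers and dropping the
# page cache, dentries and inodes first. Linux only; writing drop_caches
# needs root. cold_run is a hypothetical helper name.
cold_run() {
    # drop_caches only frees clean pages, so write dirty data out first.
    sync
    # 3 = free the page cache plus dentries and inodes.
    if ! { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        echo "cold_run: could not drop caches (not root?); timing a warm run" >&2
    fi
    # bash's time keyword reports real, user and sys time on stderr.
    time "$@"
}
```

For example, "cold_run cp -r SRC DST" before every test, with SRC and DST
as placeholders.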

Here's another thing that might be relevant: tar is a bit of a CPU hog,
and the tar pipe trick runs two separate tar processes, which can make
it more CPU-bound than cp -r.

One last thing: test with lots of small files. Thousands of them. Try
using /usr as a source.
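If /usr isn't convenient, a synthetic tree works too. A minimal sketch,
assuming bash; make_small_files is a hypothetical helper and its default
of 100 directories with 100 files each (10,000 files) is an arbitrary
choice.

```shell
#!/usr/bin/env bash
# Sketch: build a tree of many tiny files so that per-file metadata work,
# not bulk data transfer, dominates the copy time.
# make_small_files is a hypothetical helper; default counts are arbitrary.
make_small_files() {
    local root=$1 dirs=${2:-100} per_dir=${3:-100}
    local d f
    for ((d = 0; d < dirs; d++)); do
        mkdir -p "$root/d$d"
        for ((f = 0; f < per_dir; f++)); do
            # Each file holds only a few bytes.
            printf '%d %d\n' "$d" "$f" > "$root/d$d/f$f"
        done
    done
}
```

For example, "make_small_files /tmp/smallsrc" creates the default 10,000
files under /tmp/smallsrc (a placeholder path).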


[1] I must apologize for my statements about cp not handling sparse
files. GNU cp, at least in the current coreutils, defaults to
--sparse=auto. YMMV with older versions and non-GNU versions of cp.

-- 
Rich P.



