[Discuss] rsync v. cp in data migration

Richard Pieri richard.pieri at gmail.com
Fri May 24 23:06:45 EDT 2013


Steve Harris wrote:
> 1) Using a tar pipeline will (should) always be slower than a single
> process (e.g., cp, cpio -p, rsync), because of the overhead of the two
> processes and the system buffering for the pipe.

This is turning into one of those "it depends" things. Standard cp is
limited to a single 256K internal buffer. It's excruciatingly slow to
copy lots of small files compared to using tar or cpio. Derek's tests
suggest that GNU cp may not be as constrained as standard cp from BSD or
SysV.
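
To make that concrete, here's the sort of comparison I have in mind,
with placeholder paths and a tree full of small files (assumes a tar
that takes -C, which GNU and BSD tar both do):

    # plain cp: one process, one copy buffer
    time cp -Rp /src/. /dst1/

    # tar pipeline: two processes feeding each other through the pipe
    time sh -c 'tar -C /src -cf - . | tar -C /dst2 -xpf -'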

rsync does not run as a single process. It uses multiple reader and
(IIRC) writer processes to maximize throughput, so it can be a heavier
burden on the system than tar for an initial bulk copy. If anyone was
wondering why I'd use tar instead of rsync for the initial copy, this
is why.

> 2) Copying to an NFS-mounted filesystem is likely to be less efficient than
> alternatives (e.g., rsync) because of the NFS overhead -- it looks like a
> local filesystem but in fact there is a lot of network processing happening
> behind the scene.

Yup. NFS is not a light protocol. I would rather run rsync over SSH
with the arcfour cipher than have rsync write onto an NFS mount.
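
Something along these lines, with user, host, and paths as placeholders:

    # -a preserves most metadata, -H keeps hard links, and arcfour keeps
    # the SSH cipher overhead down (assumes the server still allows it)
    rsync -aH -e 'ssh -c arcfour' /src/ user@host:/dst/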

On the other hand, running rsync across the network has its own
overhead in whatever transport or tunneling protocol is used. Still,
rsync is more careful than most other tools about what it actually
transmits, so it tends to be the better choice for copies over the
network.


> 4) AFAIK, cp will not preserve hard links.  rsync will (though not by
> default).  cpio and tar will by default.

GNU cp might. It has a plethora of non-standard options. Standard cp as
found on BSD and SysV does not. In fact, the FreeBSD cp(1) man page
specifically suggests using tar, cpio or pax if preserving hard links is
desired.
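
Roughly, with /src and /dst as placeholder paths:

    # rsync only preserves hard links when asked
    rsync -aH /src/ /dst/

    # cpio pass-through mode keeps them by default (run from inside /src)
    find . -depth -print | cpio -pdm /dst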

-- 
Rich P.


