
BLU Discuss list archive



[Discuss] rsync v. cp in data migration



> From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-
> bounces+blu=nedharvey.com at blu.org] On Behalf Of Steve Harris
> 
> 1) Using a tar pipeline will (should) always be slower than a single
> process (e.g., cp, cpio -p, rsync), because of the overhead of the two
> processes and the system buffering for the pipe.

It depends on the bottleneck.  If your IO subsystem is extremely fast, then the overhead of the extra RAM-to-RAM copy through the pipe buffer before the data hits the IO could actually affect your performance.  But that's a rather unusual situation.  Usually RAM is so much faster than the IO that the pipe overhead doesn't matter.

Which is slower: being stuck behind a granny using a walker in your Toyota, or being stuck behind the same granny in your Maserati?
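To make the comparison concrete, here's a minimal sketch of the two approaches on a throwaway directory tree (the paths are temp dirs created just for the demo).  The tar pipeline runs two processes with data buffered through the pipe; cp is a single process:

```shell
set -e
src=$(mktemp -d); dst1=$(mktemp -d); dst2=$(mktemp -d)
echo "hello" > "$src/a.txt"
mkdir -p "$src/sub"; echo "world" > "$src/sub/b.txt"

# Two processes plus a pipe: data makes an extra RAM-to-RAM hop.
tar -C "$src" -cf - . | tar -C "$dst1" -xf -

# One process reading and writing directly.
cp -a "$src/." "$dst2/"

# Both methods should produce identical trees.
diff -r "$dst1" "$dst2"
```

Both end up with the same result; the difference is only in how many times the bytes move through memory on the way to the disk.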


> 2) Copying to an NFS-mounted filesystem is likely to be less efficient than
> alternatives (e.g., rsync) because of the NFS overhead -- it looks like a
> local filesystem but in fact there is a lot of network processing happening
> behind the scene.

Ah - agreed - but there's a distinction to make here: "rsync" the application versus "rsync" the protocol.

You can run the rsync application from local disk to an NFS mount, and you obviously incur the NFS overhead.  That's the situation the OP has been discussing.  It will work fine, but if you want to optimize for performance, you're right: you can run rsync in daemon mode on the receiving system and use the rsync:// protocol.  In that mode, the rsync client and server can significantly reduce the amount of data that needs to cross the wire - each system checks file statistics locally and sends only the smallest relevant data - whereas rsync from a local fs to a locally mounted remote fs has to perform all of those operations itself, *across* the wire.  I honestly *do* believe that using the rsync protocol, with the rsync daemon enabled on the receiving system, will be faster than the NFS option, assuming the network is your rate-limiting factor.
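For anyone who hasn't set this up before, here's a minimal sketch of the daemon arrangement.  The module name, path, and hostname are hypothetical, and a real /etc/rsyncd.conf would want auth, uid/gid, and hosts-allow settings appropriate to the site:

```
# /etc/rsyncd.conf on the receiving system (hypothetical module)
[migration]
    path = /data/incoming
    read only = false

# start the daemon on the receiver
rsync --daemon

# from the sender: rsync protocol end to end, no NFS in the path
rsync -a /local/data/ rsync://receiver.example.com/migration/
```

The double-colon form (receiver.example.com::migration/) is equivalent to the rsync:// URL; either way, both ends run rsync code and the delta comparisons happen locally on each side.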


> 3) I'm not an expert on rsync, but wasn't it (initially) written in a
> client-server mode to achieve very high efficiency copying files over a
> network?  Especially when updating (large) files which may have changed
> slightly.

"high efficiency" is a relative term.  It does a good job of skipping over files that don't need to be sent, and only sending chunks of files that have changed, but in order to do that, it needs to crawl the local and remote filesystems (if using daemon mode, the remote daemon still needs to crawl the whole filesystem locally), do a bunch of comparisons between local & remote, search for and calculate all those differences.  This is NOT comparable to the performance of things like ZFS and BTRFS incremental updates.  But in the absence of a COW (or equivalent) underlying filesystem, rsync is the fastest thing that I know.



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.



