
BLU Discuss list archive



[Discuss] rsync v. cp in data migration



On Thu, May 23, 2013 at 8:11 PM, Richard Pieri <richard.pieri at gmail.com> wrote:
> Tom Metro wrote:
>> even be possible that the pauses cp takes to refill its buffers results
>> in it saturating your I/O bandwidth less, which could be desirable if
>> you are running this job while the disks are in use.)
>
> Generic cp(1) tries to mmap a complete file into RAM (with a hard-coded
> segment size limit). It then writes out the whole file (or segment) in
> one go. This leads to massive memory thrashing when lots of small files
> are being copied in sequence.
>
> tar and cpio use fixed buffers (at least they should) so they can avoid
> the mmap create/destroy cycles that a recursive cp provokes.

Possibly better (on a modern Linux system) would be for these programs
to use the

sendfile(int out_fd, int in_fd, off_t *offset, size_t count)

system call, which avoids even having to temporarily map the
file's data into the program's memory space. You could still do
the equivalent of fixed-size copies by passing an appropriate value for
the count argument. I would think that rewriting "cpio -p" to use
sendfile() would be straightforward.
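
For illustration, here is a minimal sketch (not from the original post) of
copying a single file that way. It assumes a kernel recent enough (2.6.33
or later) to accept a regular file as out_fd, and it copies in fixed-size
chunks much like the fixed buffers mentioned above for tar and cpio, except
that the data never passes through the process's address space:

/* copy one file with sendfile(2); illustrative sketch only */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

static int copy_file(const char *src, const char *dst)
{
    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0) { perror("open src"); return -1; }

    struct stat st;
    if (fstat(in_fd, &st) < 0) { perror("fstat"); close(in_fd); return -1; }

    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out_fd < 0) { perror("open dst"); close(in_fd); return -1; }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* fixed 1 MiB chunks; sendfile advances offset for us */
        ssize_t n = sendfile(out_fd, in_fd, &offset, 1 << 20);
        if (n < 0) { perror("sendfile"); close(in_fd); close(out_fd); return -1; }
        if (n == 0) break;  /* source shrank underneath us */
    }

    close(in_fd);
    close(out_fd);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
        return EXIT_FAILURE;
    }
    return copy_file(argv[1], argv[2]) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

A real cpio-style tool would also need to recreate ownership, timestamps,
and special files, but the data-moving loop itself stays this small.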

Bill Bogstad





