On Fri, May 24, 2013 at 08:47:37PM -0400, Steve Harris wrote:
> 1) Using a tar pipeline will (should) always be slower than a single
> process (e.g., cp, cpio -p, rsync), because of the overhead of the two
> processes and the system buffering for the pipe.

Not necessarily.

Earlier in this thread, someone mentioned the sendfile(2) system call in Linux. sendfile is largely limited to sending data out via a socket. The more versatile solution to the problem of throwing data is splice(2). If I am reading the tea leaves correctly, two splice flows with a pipe in the middle Do The Right Thing (the kernel ends up copying data directly from the fd_in of the first splice call to the fd_out of the second).

Let's say we modify tar to use splice extensively. Recall that a tar archive is a stream of headers and file data; the header for each archive member specifies the length of that member.

Our intrepid sysadmin does this to move a lot of files around:

  $ tar -c | (cd /newdir; tar -x)

* The reading tar process gets the size of the next archive member (via stat) and writes a header to standard output (the pipe's write end, file descriptor 1).
* The reading tar process calls splice, with fd_in set to the file's file descriptor, fd_out set to 1, and len set to the file size.
* The reading tar process writes out enough '\0' bytes to round up the tar output to the next multiple of 512 bytes and repeats.
* The writing tar process reads the header from its standard input (the pipe's read end, file descriptor 0) and learns the size of the incoming archive member.
* The writing tar process calls splice, with fd_in set to 0, fd_out set to the file's file descriptor, and len set to the file size.
* The writing tar process reads the filler '\0' bytes and discards them.

In this way, there is no userspace copying of file data at all.

The big drawback to splice(2) is that one of its ends must be a pipe.
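
To make the two splice steps above concrete, here is a minimal C sketch, not code from GNU tar or any existing cp. It assumes Linux with glibc, and the helper name splice_copy is made up for illustration. It does both halves in one process: file to pipe (what the reading tar would do) and pipe to file (what the writing tar would do), alternating in a single thread rather than using two processes or threads.

```c
#define _GNU_SOURCE          /* needed for the splice(2) declaration in glibc */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move len bytes from in_fd to out_fd through a
 * private pipe, using splice(2) on both sides so the file data is never
 * read into a userspace buffer.
 * Returns 0 on success, -1 on error or if in_fd hits EOF early.
 */
static int splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    int rc = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0 && rc == 0) {
        /* File -> pipe: at most one pipe's worth is moved per call. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, 0);
        if (n <= 0) {
            rc = -1;
            break;
        }
        len -= (size_t)n;

        /* Pipe -> file: drain everything just queued before refilling. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, 0);
            if (m <= 0) {
                rc = -1;
                break;
            }
            n -= m;
        }
    }

    close(p[0]);
    close(p[1]);
    return rc;
}
```

A caller would open the source and destination, stat the source for its size, and call splice_copy(in_fd, out_fd, size). Because the pipe is drained completely after each fill, the first splice never finds the pipe full and so does not block on it.
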
Our modified tar will have to take care to employ splice only when it's dealing with a pipe (on the other hand, GNU tar already does an fstat on its output to check whether it is going to /dev/null).

It remains to be seen if the Linux kernel will ever offer a splice-like system call that handles the general case. In the meantime, user-space processes desiring a general approach could employ two threads, each making splice calls, joined by an intra-process pipe. One wonders why no one has come up with a cp that does just that.

-- 
Alex Pennace, alex at pennace.org, http://osiris.978.org/~alex/