On Tue, May 18, 2010 at 8:42 PM, Richard Pieri <richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:

> On May 18, 2010, at 2:06 PM, Bill Bogstad wrote:
>>
>> Please re-read the end of my last message. Take a look at pread()
>> (POSIX) and readahead() (Linux only).
>> It turns out you do not need separate file handles. Threads may
>> still be required to make it non-blocking.
>
> I did look at pread(). While it doesn't reset the file handle's seek
> pointer, it still effectively does an lseek() to the offset. So that's
> not really a performance win.

I didn't say pread() was faster. However, I believe it could safely allow concurrent, independent I/O requests through the same underlying file descriptor from multiple threads. Since the offset is explicit with pread(), there should be no reason for conflicts on the single shared file offset, as there would be with read() from multiple threads. Whether it actually works that way in practice, I don't know.

>> Not necessarily. I would like random chunks of data from this file
>> (perhaps NEED it at some specific computation point in the future),
>> but I have some other computation I can do in the meantime. Please
>> start the disk IO now. Don't make me create multiple FDs for a single
>> file. At my option, I would like you to:
>
> This seems a little silly to me. I mean, if your processing takes more
> time than the reads then the hand-carved optimizations aren't a win.
> On the other hand, if they don't, then the reads will block. If you
> aren't caching anything at that point then I think it's time to
> reconsider your storage format, because this is a problem that's been
> solved before.

Caching won't help me if I only look at each chunk once. If the data were laid out sequentially in the file, the kernel's built-in readahead would help. If the file format is fixed and I want to process the data in some order other than sequential, the simplistic kernel readahead isn't going to help (and may even make things slower).
Let's say physically reading each data chunk takes R time and processing each chunk takes P time. If I simply do "read(); compute()" in a loop, each chunk takes R + P wall-clock time. If instead I can do something like "nonblocking-readahead(); compute(); read()" in the loop (with a single read() before the loop to prime the pump), and if the overhead of calling nonblocking-readahead is small (no reason it shouldn't be, since all it does is put some I/O requests on the disk queue), then each chunk takes approximately R' + max(P, R) wall-clock time. Here R' is how long it takes to move the data from kernel buffers into application buffer space, rather than the total time to read from disk. Compared to R, R' is likely to be small, so with P == R that could give me close to a 2x performance increase.

Whether the additional complexity is worth it depends on any number of variables, but it will definitely matter in some cases. The larger the data chunk, the more it will probably matter.

In some sense, the existence of the readahead() Linux system call and the ureadahead daemon under Ubuntu (which speeds up system booting) is an existence proof that doing simultaneous computation and I/O can yield significant performance improvements. Or at least someone managed to convince both the Linux kernel developers and the Ubuntu developers of this.

Another way to look at it: this is why multi-processing OSes were originally designed years ago. You can get higher overall system throughput if you can interleave multiple single-threaded applications on a single system; while one application is waiting for I/O, another can be doing computation. Back when computers were big $$, this was a big deal. Modifying an inherently single-threaded program so it can interleave its own computation with its own I/O will sometimes help (and sometimes not). But I don't see it as silly.

Bill Bogstad
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.