On Wed, May 19, 2010 at 2:52 PM, Richard Pieri <richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
> On May 19, 2010, at 1:09 PM, Bill Bogstad wrote:
>>
>> lseek() is cheap since all the kernel has to do is change its
>> internal offset counter for the file descriptor associated with a disk
>> file.
>
> My understanding is that this largely depends on the actual implementation. The behavior may vary from one Unix to the next. That, and the results may be unexpected with sparse files. I'm not sure how pread() and sparse files interact.

Not sure how sparse files matter for pread() as compared to read(). Either way, you will eventually need to access whatever on-disk structures tell you whether that block is real or not, regardless of whether the file is sparse. In general, I don't think it's useful to think of files as "sparse" so much as that they may or may not have holes in them (which can change over the life of the file).

Now, a lousy OS implementation of pread() could refuse to run multiple pread()s concurrently when they use the same FD in multiple threads (or child processes). However, I can present the same concurrent requests to the filesystem via multiple open()s of the same underlying file, so I would hope that this is not a common implementation.

>> It's only when you do the subsequent read() that any real cost
>> is incurred. Assuming uncached disk files, that is likely to require
>> disk head seeks which is where the time cost comes into play and I see
>> no way around that.
>
> Pre-load the cache before you need to actually read() data.

If I understand correctly, you are suggesting that the kernel should automatically read data at the new offset after an lseek() in case it's followed by a read(). I can see how that is a good idea for many access patterns. If the file is opened O_RDWR or O_WRONLY rather than O_RDONLY, that could be a mistake. Even if it is O_RDONLY, having the OS/filesystem get this right (when to pre-cache, how much, etc.) is hard.
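To make the lseek()/read() versus pread() distinction concrete, here is a minimal sketch in C, assuming a Linux-like POSIX system. The helper names read_at, seek_then_read and prefetch_hint are made up for illustration. The last helper shows one way an application, rather than the kernel, might ask for data to be pre-loaded, using posix_fadvise(); the hint is advisory and the kernel is free to ignore it.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes at absolute offset off. pread() does not change
 * the descriptor's file offset, so threads sharing one fd do not race
 * on it. (Hypothetical helper name.) */
ssize_t read_at(int fd, void *buf, size_t len, off_t off)
{
    return pread(fd, buf, len, off);
}

/* The two-step equivalent. lseek() only updates the offset; the read()
 * that follows is where any real I/O happens. Because the offset is
 * shared state, this is not safe on an fd used by several threads. */
ssize_t seek_then_read(int fd, void *buf, size_t len, off_t off)
{
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
}

/* Application-driven "pre-load the cache": tell the kernel that the
 * range [off, off + len) will be needed soon. Advisory only. Returns 0
 * on success or an error number on failure. */
int prefetch_hint(int fd, off_t off, off_t len)
{
    return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}
```

A caller could issue prefetch_hint() for the next region it expects to need, continue working, and later fetch the data with read_at(). Note that posix_fadvise() reports failure through its return value rather than through errno.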
Most applications are happy to let the system software do the best that it can, and many applications fit into access patterns where this is good enough. My interest in this thread is "What mechanisms are available to applications that don't have access patterns that fit well with the heuristics that developers are willing to put into the kernel?" pread(), readahead(), and AIO all seem relevant to this question.

As for Linux kernel readahead, this LWN article (2007) seems a good place to start:

http://lwn.net/Articles/235164/

Bill Bogstad