Asynchronous File I/O on Linux

Bill Bogstad bogstad-e+AXbWqSrlAAvxtiuMwx3w at public.gmane.org
Wed May 19 13:09:44 EDT 2010


On Wed, May 19, 2010 at 10:32 AM, Richard Pieri <richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:

>> Caching won't help me if I only want to look at each chunk once.  If
>> the data was in the file sequentially then
>> the built-in kernel readahead would help.  If the file format is fixed
>> and I want to process the data in some other order
>> than sequential, then the simplistic kernel readahead isn't going to
>> help (and may make things slower).
>
> Yeah... see... the problem now is the file storage format.  What you really want now is an index into the actual data: find what you want from the index and use that pointer to jump immediately to the data you want instead of having to seek across Ghu knows how much file.  As I said, this has been solved before.

Err, how do you "jump immediately to the data" without "having to
seek"?  The only way I know to "jump immediately" via Linux/POSIX
APIs is explicitly with lseek() (or implicitly with pread()).
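
To make the two routes concrete, here is a minimal sketch using
Python's os module, which wraps the same system calls.  The helper
names, the scratch file, and the offset of 500 are made up for
illustration; this is not code from the thread.

```python
import os
import tempfile

def read_at_with_lseek(fd, offset, length):
    # Explicit jump: move the descriptor's file offset, then read from there.
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)

def read_at_with_pread(fd, offset, length):
    # Implicit jump: pread() takes the offset as an argument and leaves
    # the descriptor's own file offset untouched.
    return os.pread(fd, length, offset)

def demo():
    # Hypothetical scratch file holding a repeating 10-byte pattern.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789" * 100)
        a = read_at_with_lseek(fd, 500, 10)
        b = read_at_with_pread(fd, 500, 10)
        return a, b
    finally:
        os.close(fd)
        os.unlink(path)
```

Both helpers should return the same bytes.  Either way an offset has
to be supplied, which is the "seek" in question; an index only tells
you which offset to use.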

lseek() is cheap since all the kernel has to do is change its
internal offset counter for the file descriptor associated with a disk
file.  It's only when you do the subsequent read() that any real cost
is incurred.  Assuming uncached disk files, that read is likely to
require disk head seeks.  That is where the time cost comes from, and
I see no way around it.
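
A small sketch of why lseek() by itself costs next to nothing, again
in Python via the os module.  The function name, the 4096-byte scratch
file, and the 1 GiB target offset are assumptions chosen for
illustration.  Seeking far past the end of the file succeeds without
touching any data; only the read() that follows has to go looking for
bytes (and here finds none, since the offset is past end-of-file).

```python
import os
import tempfile

def seek_is_bookkeeping():
    # Hypothetical scratch file with 4096 bytes of data in it.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 4096)
        far = 1 << 30                          # 1 GiB, well past end-of-file
        pos = os.lseek(fd, far, os.SEEK_SET)   # only the offset counter changes
        size = os.fstat(fd).st_size            # file size is unaffected by the seek
        data = os.read(fd, 16)                 # the read is what consults the file
        return pos, size, data
    finally:
        os.close(fd)
        os.unlink(path)
```

This does not measure the cost of an uncached read; it only shows that
the seek itself is bookkeeping on the file descriptor.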

Bill Bogstad





