BLU Discuss list archive

Subject: Asynchronous File I/O on Linux
From: markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org (Mark Woodward)
Date: Sun, 16 May 2010 10:06:40 -0400
In-reply-to: <000101caf4fb$b15448e0$13fcdaa0$@com>
References: <4BEEF4AC.3060206@mohawksoft.com> <000101caf4fb$b15448e0$13fcdaa0$@com>

Edward Ned Harvey wrote:
>> From: discuss-bounces-mNDKBlG2WHs at public.gmane.org [mailto:discuss-bounces-mNDKBlG2WHs at public.gmane.org] On
>>
>> Does anyone know of a standard asynchronous file I/O system for Linux?
>>     
>
> I don't think you're using the term "async IO" correctly.  Unless I'm
> somehow missing something, which I don't think I am...
>   
Well, it is "async" in the sense that I want to issue multiple read 
requests simultaneously from the same execution context (thread or 
process). What I want to test is *if* I can issue multiple read requests 
simultaneously to different portions of a file and have the file system 
driver, disk block driver, and/or the physical hard disk firmware sort 
out the disk I/O to read the blocks from the drive more efficiently.
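
One way to approximate this from a single thread, sketched here in Python, is to hint the kernel about every range before reading any of them. This is only an illustration under assumptions: `read_blocks` and `make_demo_file` are hypothetical helper names, the 4096-byte block size is arbitrary, and `posix_fadvise` with `POSIX_FADV_WILLNEED` is an advisory hint that the kernel may ignore, so it is a stand-in for a real asynchronous I/O interface rather than a replacement for one.

```python
import os
import tempfile

def read_blocks(path, offsets, size=4096):
    """Hint the kernel about every range first, then collect the data."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is missing on some platforms, so guard the call.
        if hasattr(os, "posix_fadvise"):
            for off in offsets:
                try:
                    # WILLNEED asks the kernel to start reading this range
                    # in the background; the call itself does not wait.
                    os.posix_fadvise(fd, off, size, os.POSIX_FADV_WILLNEED)
                except OSError:
                    pass  # the hint is optional; fall back to plain reads
        # By the time we read, some or all ranges may already be cached.
        return [os.pread(fd, size, off) for off in offsets]
    finally:
        os.close(fd)

def make_demo_file(blocks=16, size=4096):
    """Create a scratch file whose block i is filled with byte value i."""
    f = tempfile.NamedTemporaryFile(delete=False)
    with f:
        for i in range(blocks):
            f.write(bytes([i]) * size)
    return f.name
```

Calling `read_blocks(path, [12 * 4096, 3 * 4096, 7 * 4096])` on the scratch file returns the three blocks in the order requested. Whether the hints actually let the drive reorder the reads is exactly the open question being tested.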

OK, think of it this way....

A disk rotating at 7200 RPM has an average rotational latency of about 
4.2ms, half the rotational period: the time between where the disk head 
is and where a block begins on the platter. Call it 5ms in round numbers. 
For the sake of this discussion, we'll ignore seek time.  If I want to 
access 4 random blocks in a file on the drive, and I must do it 
sequentially, I can't get those blocks in any less than about 20ms on 
average. If we carry this over to network file I/O, where each request 
also costs a round trip, issuing the requests concurrently should 
improve performance even more.
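
The arithmetic behind these figures can be checked with a short Python sketch. The function names are made up for illustration, and the model counts rotational latency only, as the paragraph above does.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a block: half of one revolution, in milliseconds."""
    ms_per_rev = 60_000.0 / rpm
    return ms_per_rev / 2.0

def serial_wait_ms(rpm, n_blocks):
    """Total average wait when each read starts after the previous one ends."""
    return n_blocks * avg_rotational_latency_ms(rpm)
```

For 7200 RPM this gives roughly 4.17ms per block and roughly 16.7ms for four blocks read one after another, which round up to the 5ms and 20ms used in this message.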

Now, if we believe that tagged queuing (NCQ on SATA) works on "good" 
SATA drives, it should be possible to issue multiple read requests 
simultaneously and have the disk drive plan the block acquisition in 
order of physical location on the disk. If this is true, it should be 
possible to reduce the access time for those four blocks to 10ms~15ms. 
That will save 5ms~10ms per batch of four reads. When you are doing a 
million operations, this adds up to real wall clock time.
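
A toy simulation gives a feel for how much reordering could help. It is a sketch under strong assumptions, not a model of any real drive: block positions are uniformly random angles on a single track, seek and transfer time are zero, and the names `fifo_wait`, `sorted_wait`, and `simulate` are hypothetical.

```python
import random

REV_MS = 60_000.0 / 7200  # one revolution at 7200 RPM, about 8.33 ms

def fifo_wait(angles):
    """Serve blocks in arrival order; return total wait in revolutions."""
    head, total = 0.0, 0.0
    for a in angles:
        total += (a - head) % 1.0  # rotate forward until block a arrives
        head = a
    return total

def sorted_wait(angles):
    """Serve blocks in angular order; all are picked up in one pass."""
    return max(angles)  # the head starts at angle 0.0

def simulate(n_blocks=4, trials=100_000, seed=1):
    """Return (mean FIFO wait, mean reordered wait) in milliseconds."""
    rng = random.Random(seed)
    fifo = reordered = 0.0
    for _ in range(trials):
        angles = [rng.random() for _ in range(n_blocks)]
        fifo += fifo_wait(angles)
        reordered += sorted_wait(angles)
    return fifo / trials * REV_MS, reordered / trials * REV_MS
```

In this model the in-order case averages about two revolutions (roughly 16.7ms) and the reordered case about four fifths of one revolution (roughly 6.7ms). That is more optimistic than the 10ms~15ms estimate above, since the model ignores seek time, transfer time, and limits on queue depth.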

I haven't actually crawled through the Linux code to verify that this 
functionality is even present, but a quick and dirty test seemed like a 
fun thing to do.
> sync/async is a term that only makes sense for writes.  If you are doing
> sync writes, then your application will block until the write has been
> committed to nonvolatile storage.  If you're doing async writes, the kernel
> is allowed to buffer many such writes, and your application will unblock
> much sooner (typically instantly) thus accelerating your application
> performance.
>   
See note above.
> I second Richard Pieri:  You're talking about random access.
>
> If you are just performing random reads on a file, I don't see why you need
> to clone filehandles.  Just go ahead and open many file handles separately.
> More than one application can read a file at the same time, no problem.
>
> If you need to do random reads and writes ... I don't know if you can open a
> file for reading while it's also open for writing.  So I have nothing to add
> here.
>   
You can open a file for reading and again open it for writing. Nothing 
prevents you from doing that unless you use file locking.
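
A minimal Python sketch of that claim follows. `write_then_read` is a hypothetical name, and the scratch file is created only for the demonstration.

```python
import os
import tempfile

def write_then_read(payload=b"hello"):
    """Open one file twice, write via one descriptor, read via the other."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    reader = os.open(path, os.O_RDONLY)  # first open: read-only
    writer = os.open(path, os.O_WRONLY)  # second open: write-only
    try:
        os.write(writer, payload)
        # Both descriptors refer to the same file, so the reader sees the
        # new data without reopening the file.
        return os.read(reader, len(payload))
    finally:
        os.close(reader)
        os.close(writer)
        os.remove(path)
```

Both opens succeed and the read returns what was just written. No locking is involved, so coordinating readers and writers is left to the application.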
