| There is an amusing description of UNIX defragging at:
| http://www.netaction.org/msoft/cybersnare.html
|
| It contains a good analogy about how UNIX defrag works, embedded in an
| anti-MS rant, so if you don't like that kind of thing, you are warned.

An interesting and amusing tirade. But I'd quibble with one thing: the claim that Unix's supposed "automatic defragging" actually helps improve performance.

Over the years, a number of Unices have had the ability to declare a file "contiguous" so that the disk routines will keep its blocks together. I've worked on several projects that used these, and done tests to measure their effectiveness. The bottom line in every case was: don't bother. We were never able to find a single application that ran faster with contiguous files. In many cases, things ran slower (though the most common result was no measurable difference).

We could generally construct an artificial test case that ran faster with contiguous files: a program that simply read a file and ignored its contents. But if a test program tried to actually use the data, there was never any advantage, and sometimes it ran slower.

Digging deeper turned up a simple explanation: if a process spends any time using the data after read() returns, it's highly likely that the read head will move past the start of the next block. When this happens, the next read() must wait an entire disk rotation to read that block. If the file's blocks are laid out randomly within a track, however, then on average you only have to wait half a disk rotation for the next block.

Unix kernels do have "disk strategy" routines that do read-ahead. This somewhat alleviates the above problem, of course, but not entirely. It's still likely that, for some fraction of the read-aheads, the read() call (and thus the decision to fetch the next block) will come too late. Whenever this happens, a contiguous file pays a full rotational delay, while a random layout pays only half a rotational delay on average.

In some Unix systems, this was understood by the people who did the port, and they arranged for "consecutive" blocks to be separated by some angle. This interlacing is semi-effective, in that it allows for some amount of computation during that angle. But it is a fixed value for the entire disk drive, so it is appropriate for only a portion of applications. For the rest, it is either too large (and thus slows down the read-ahead) or too small (and results in just-missed-it cases that wait a full rotation).

In general, the way that Unix file systems work, it is useful for the blocks of a file to be in adjacent tracks (to minimize arm motion). But it usually turns out that a random arrangement of the blocks within a track gives higher performance for real application mixes than contiguous blocks do. The usual Unix disk strategy routines tend to produce this behavior most of the time. So measures to produce contiguous files are ineffective at best, and sometimes even hurt performance.
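
To make the rotational-delay argument above concrete, here is a rough back-of-the-envelope simulation. It is my own sketch, not one of the tests described above: it assumes a hypothetical 7200 RPM drive with 64 sectors per track, one block per read(), a small random amount of "think time" after each read, and no kernel read-ahead; seek time is ignored since all the blocks stay on one track. The constants (ROTATION_MS, SECTORS_PER_TRACK, the think-time range) are illustrative, not measurements.

    #!/usr/bin/env python3
    # Back-of-the-envelope simulation of the rotational-delay argument.
    # All parameters are illustrative assumptions, not measurements.

    import random

    ROTATION_MS = 8.33        # one rotation at a hypothetical 7200 RPM
    SECTORS_PER_TRACK = 64    # hypothetical track geometry
    READS = 100_000

    def wait_for(sector, head):
        """Rotational wait until 'sector' comes back under the head."""
        frac = (sector - head) % SECTORS_PER_TRACK / SECTORS_PER_TRACK
        return frac * ROTATION_MS

    def simulate(layout):
        head = 0.0            # angular position of the head, in sectors
        total = 0.0
        nxt = 0               # next block number for the contiguous case
        for _ in range(READS):
            if layout == "contiguous":
                target = nxt % SECTORS_PER_TRACK
                nxt += 1
            else:             # blocks scattered randomly within the track
                target = random.randrange(SECTORS_PER_TRACK)
            total += wait_for(target, head)
            head = (target + 1) % SECTORS_PER_TRACK   # just past the block read
            # the process "uses the data" briefly, so the head drifts a little
            head = (head + random.uniform(0.1, 2.0)) % SECTORS_PER_TRACK
        return total / READS

    for layout in ("contiguous", "random"):
        print("%-10s avg rotational wait: %.2f ms" % (layout, simulate(layout)))

Under these assumptions the contiguous layout waits close to a full rotation per read, while the random layout averages about half a rotation; adding read-ahead would shrink the gap, but the just-missed-it cases remain.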