defragmenting UNIX
John Chambers
jc at trillian.mit.edu
Thu Mar 11 10:42:20 EST 1999
| There is an amusing description of UNIX defragging at:
| http://www.netaction.org/msoft/cybersnare.html
|
| It contains a good analogy about how UNIX defrag works, embedded in an
| anti-MS rant, so if you don't like that kind of thing, you are warned.
An interesting and amusing tirade. But I'd quibble with one thing:
The claim that Unix's supposed "automatic defragging" actually helps
improve performance.
Over the years, there have been a number of Unices that have had the
ability to declare a file "contiguous" so that the disk routines will
keep its blocks together. I've worked on several projects that used
these, and done tests to measure their effectiveness. The bottom line
in every case was: Don't bother. We were never able to find a single
application that ran faster with contiguous files. In many cases,
things ran slower (though the most common result was no measurable
difference).
We could generally construct an artificial test case that ran faster
with contiguous files: A program that simply read a file and ignored
its contents. But if a test program tried to actually use the data,
then there was never any advantage, and sometimes it ran slower.
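
For the curious, here is a sketch of the kind of test loop involved (not
the original test code, which is long gone): read a file sequentially,
and optionally touch every byte of each buffer.  The file name, buffer
size, and the busy-work are placeholders, not anything we actually ran.

    /* Hedged sketch of the test shape described above: read a file
       sequentially, optionally doing some work on each buffer. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char buf[8192];
        ssize_t n;
        long sum = 0;
        int use_data = (argc > 2);   /* any second argument: "use" the data */
        int fd;

        if (argc < 2) {
            fprintf(stderr, "usage: %s file [use]\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            if (use_data) {
                ssize_t i;
                for (i = 0; i < n; i++)   /* touch every byte, like a real app */
                    sum += buf[i];
            }
        }
        close(fd);
        printf("checksum %ld\n", sum);    /* keeps the compiler honest */
        return 0;
    }
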
Digging deeper turned up a simple explanation: If a process spends
any time using the data after read() returns, it's highly likely that
the read head will move past the start of the next block. When this
happens, the next read() must wait an entire disk rotation to read
that block. If the file's blocks are laid out randomly within a
track, however, then on the average you only have to wait half a disk
rotation for the next block.
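
The numbers are easy to sketch.  Take a hypothetical 3600 RPM drive
(about 16.7 ms per rotation) and ignore read-ahead for the moment: the
contiguous layout pays out the remainder of a rotation once the head
has passed the next block, while the random layout pays an expected
half rotation regardless of think time.  (All figures below are
invented for illustration, not taken from our measurements.)

    /* Back-of-envelope sketch of the rotational-delay argument above.
       Rotation period and think times are made-up numbers; read-ahead
       is ignored here. */
    #include <stdio.h>

    int main(void)
    {
        double T = 16.7;   /* ms per rotation -- hypothetical 3600 RPM drive */
        double think;      /* ms of computation after read() returns */

        for (think = 0.0; think <= 5.0; think += 1.0) {
            /* Contiguous: once the head has passed the next block's start,
               you wait out the rest of the rotation (the text above rounds
               this to "an entire rotation").  Assumes think < T. */
            double contig = (think == 0.0) ? 0.0 : T - think;
            /* Random placement within the track: expected wait is T/2,
               no matter how long the process computes. */
            double random_wait = T / 2.0;

            printf("think %3.1f ms: contiguous waits %4.1f ms, random ~%4.1f ms\n",
                   think, contig, random_wait);
        }
        return 0;
    }
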
Unix kernels do have "disk strategy" routines that perform read-ahead.
This somewhat alleviates the above problem, of course. But not
entirely. It's still likely that, for some fraction of the
read-aheads, the read() call and thus the decision to fetch the next
block will come too late. Whenever this happens, a contiguous file
has a full rotational delay, while a random layout has only half a
rotational delay.
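
To put a rough number on it: if some fraction p of the read-aheads are
issued too late, the expected rotational penalty per read works out to
roughly p*T for a contiguous layout versus p*T/2 for a random one,
where T is the rotation period.  Here is a toy calculation, with p
purely invented:

    /* Toy expected-value calculation for the read-ahead case above.
       p is the (invented) fraction of reads whose read-ahead came too
       late; on those, a contiguous layout waits about a full rotation
       and a random layout about half a rotation on average. */
    #include <stdio.h>

    int main(void)
    {
        double T = 16.7;    /* ms per rotation -- hypothetical */
        int tenths;

        for (tenths = 1; tenths <= 5; tenths += 2) {
            double p = tenths / 10.0;
            printf("miss fraction %.1f: contiguous ~%4.1f ms/read, random ~%4.1f ms/read\n",
                   p, p * T, p * T / 2.0);
        }
        return 0;
    }
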
In some Unix systems, this was understood by the people who did the
port, and they arranged for "consecutive" blocks to be separated by
some angle. This interlacing is semi-effective, in that it allows for
some amount of computation during that angle. But it is a fixed value
for the entire disk drive, so it is appropriate for only a portion of
applications. For the rest, it is either too large (and thus slows
down the read-ahead) or too small (and results in just-missed-it cases
that wait a full rotation).
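
Here is one way to see why a single fixed interleave can't suit
everybody: for a given geometry and per-read think time, the wait for
the next logical block is either dead time (interleave too large) or
jumps to nearly a full rotation (too small).  A rough model, with all
numbers hypothetical:

    /* Rough model of a fixed interleave factor: with k-way interleave,
       k-1 foreign slots sit between consecutive logical blocks, buying
       (k-1)*slot_time of compute time.  Geometry and think times are
       hypothetical; the point is only the "too large or too small"
       effect. */
    #include <stdio.h>

    int main(void)
    {
        double T = 16.7;            /* ms per rotation (hypothetical) */
        int sectors = 32;           /* block slots per track (hypothetical) */
        double slot = T / sectors;  /* time for one slot to pass the head */
        int k;

        for (k = 1; k <= 4; k++) {
            double think;
            printf("interleave %d:", k);
            for (think = 0.0; think <= 2.0; think += 0.5) {
                /* The next block's start arrives (k-1)*slot after the
                   current block ends. */
                double delay = (k - 1) * slot - think;
                while (delay < 0.0)
                    delay += T;     /* just missed it: wait for the next pass */
                printf("  t=%.1f wait %4.1f", think, delay);
            }
            printf("\n");
        }
        return 0;
    }
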
In general, given the way that Unix file systems work, it is useful for
the blocks of a file to be in adjacent tracks (to minimize arm
motion). But it usually turns out that a random arrangement of the
blocks within a track gives higher performance for real application
mixes than contiguous blocks do. The usual Unix disk strategy
routines tend to produce this behavior most of the time. So measures
to produce contiguous files are ineffective at best, and sometimes
even hurt performance.