| There is an amusing description of UNIX defragging at:
| http://www.netaction.org/msoft/cybersnare.html
|
| It contains a good analogy about how UNIX defrag works, embedded in an
| anti-MS rant, so if you don't like that kind of thing, you are warned.

An interesting and amusing tirade. But I'd quibble with one thing: the claim that Unix's supposed "automatic defragging" actually helps improve performance.

Over the years, a number of Unices have had the ability to declare a file "contiguous" so that the disk routines will keep its blocks together. I've worked on several projects that used these, and done tests to measure their effectiveness. The bottom line in every case was: don't bother. We were never able to find a single application that ran faster with contiguous files. In many cases, things ran slower (though the most common result was no measurable difference).

We could generally construct an artificial test case that ran faster with contiguous files: a program that simply read a file and ignored its contents. But if a test program tried to actually use the data, there was never any advantage, and sometimes it ran slower.

Digging deeper turned up a simple explanation: if a process spends any time using the data after read() returns, it's highly likely that the read head will move past the start of the next block. When this happens, the next read() must wait an entire disk rotation to read that block. If the file's blocks are laid out randomly within a track, however, then on average you only have to wait half a disk rotation for the next block.

Unix kernels do have "disk strategy" routines that do read-ahead. This somewhat alleviates the above problem, of course, but not entirely. It's still likely that, for some fraction of the read-aheads, the read() call (and thus the decision to fetch the next block) will come too late. Whenever this happens, a contiguous file pays a full rotational delay, while a random layout pays only half a rotational delay on average.

In some Unix systems, this was understood by the people who did the port, and they arranged for "consecutive" blocks to be separated by some angle. This interlacing is semi-effective, in that it allows for some amount of computation during that angle. But it is a fixed value for the entire disk drive, so it is appropriate for only a portion of applications. For the rest, it is either too large (and thus slows down the read-ahead) or too small (and results in just-missed-it cases that wait a full rotation).

In general, the way that Unix file systems work, it is useful for the blocks of a file to be in adjacent tracks (to minimize arm motion). But it usually turns out that a random arrangement of the blocks within a track gives higher performance for real application mixes than contiguous blocks do. The usual Unix disk strategy routines tend to produce this behavior most of the time. So measures to produce contiguous files are ineffective at best, and sometimes even hurt performance.
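
To make the rotational-delay argument above concrete, here is a rough back-of-the-envelope simulation. It is my own sketch, not one of the tests described above: it assumes a hypothetical 7200 RPM drive with 64 sectors per track, one block per read(), a small random amount of "think time" after each read, and no kernel read-ahead; seek time is ignored since all the blocks stay on one track. The constants (ROTATION_MS, SECTORS_PER_TRACK, the think-time range) are illustrative, not measurements.

    #!/usr/bin/env python3
    # Back-of-the-envelope simulation of the rotational-delay argument.
    # All parameters are illustrative assumptions, not measurements.

    import random

    ROTATION_MS = 8.33        # one rotation at a hypothetical 7200 RPM
    SECTORS_PER_TRACK = 64    # hypothetical track geometry
    READS = 100_000

    def wait_for(sector, head):
        """Rotational wait until 'sector' comes back under the head."""
        frac = (sector - head) % SECTORS_PER_TRACK / SECTORS_PER_TRACK
        return frac * ROTATION_MS

    def simulate(layout):
        head = 0.0            # angular position of the head, in sectors
        total = 0.0
        nxt = 0               # next block number for the contiguous case
        for _ in range(READS):
            if layout == "contiguous":
                target = nxt % SECTORS_PER_TRACK
                nxt += 1
            else:             # blocks scattered randomly within the track
                target = random.randrange(SECTORS_PER_TRACK)
            total += wait_for(target, head)
            head = (target + 1) % SECTORS_PER_TRACK   # just past the block read
            # the process "uses the data" briefly, so the head drifts a little
            head = (head + random.uniform(0.1, 2.0)) % SECTORS_PER_TRACK
        return total / READS

    for layout in ("contiguous", "random"):
        print("%-10s avg rotational wait: %.2f ms" % (layout, simulate(layout)))

Under these assumptions the contiguous layout waits close to a full rotation per read, while the random layout averages about half a rotation; adding read-ahead would shrink the gap, but the just-missed-it cases remain.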