[BLU] the philosophy of fork/exec

Mon Jan 22 17:56:45 EST 2001

David Kramer wrote:
| On Fri, 19 Jan 2001, Seth Gordon wrote:
| > > ...The recent
| > > description of unix's file-linking scheme as "strange" is an example
| > > of  how  even  experienced  unix  users  and programmers don't always
| > > understand the reasons behind the design.  The unix fork+exec
| > > scheme is another.
| >
| > Can you expand on this?  What do other OSs do for spawning a process
| > that don't fit the fork+exec model, what are the consequences of those
| > alternative techniques, and what problems does fork+exec solve?
|
| Nobody bit on this one, so I will attempt to answer the question, though
| it is outside any areas of expertise I pretend to have.

Hmmm ..  I seem to have been neglectful here, so maybe  I'll  add  to
David's comments.

One of the significant illustrations of the use of the unix fork is
in the way that the apache server handles incoming requests.  One of
the settings in httpd.conf is the number of children to create.
Apache forks N times, and these are all copies of the same program.
There's an immediate efficiency gain here over the "spawn" paradigm
implemented by most other systems.  When apache forks, the children
don't need to do any initialization at all.  The parent did that, and
the children inherit all the parent's data unchanged.  So the
children know everything the parent knew.  Startup cost for programs
like this can be significant.  Paying it only once is a major
performance improvement.
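
A minimal sketch of that initialize-once-then-fork pattern might look
like the following.  This is not apache's code; prefork_demo(),
expensive_init() and config_table are names made up for illustration,
and the "expensive" setup is just a loop standing in for real startup
work.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TABLE_SIZE 1024

/* Hypothetical stand-in for whatever a server builds at startup. */
static int config_table[TABLE_SIZE];

/* Pretend this is slow: parsing config files, loading modules, etc. */
static void expensive_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        config_table[i] = i * i;
}

/* Initialize once, fork nchildren workers, and return how many of
 * them found the parent's data already in place (-1 if fork fails). */
int prefork_demo(int nchildren)
{
    assert(nchildren > 0);
    expensive_init();               /* done once, before any fork */

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: no initialization here; the table is inherited. */
            int last = TABLE_SIZE - 1;
            _exit(config_table[last] == last * last ? 0 : 1);
        }
    }

    /* Parent: collect the children and count the successful ones. */
    int good = 0;
    for (int i = 0; i < nchildren; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            good++;
    }
    return good;
}
```

Calling prefork_demo(4) from a main() should return 4: every child
sees the table the parent filled in, without running expensive_init()
itself.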

Also, the children all inherit the parent's open files.  In this
case, the significant open file is the socket that the parent is
listening on.  After the forks, all the children are also listening
on that same socket.  It isn't replicated; it is a single open socket
shared by all the forked processes.  When a connection comes in, it
goes to the first of these apaches that does an accept() call.  Since
HTTP requests are all independent and web servers don't maintain
state between them, this works perfectly.  If there's an idle server,
the client gets an instant connection.  If there is no idle server,
the first server to complete its current task will do an accept() and
get the connection.  This sort of sharing of incoming requests is
difficult to implement with anywhere near this response time on
systems with a different process model.
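
Here is a rough sketch of the shared listening socket, again not
taken from apache.  The function name shared_accept_demo(), the
one-byte "k" reply and the MAX_WORKERS limit are all assumptions made
for the example.  The parent opens one listening socket on the
loopback address, forks the workers, and then plays the client itself
so the whole thing fits in one function.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Fork nworkers children that all accept() on one inherited socket.
 * Returns how many connections were answered, or -1 on setup failure. */
int shared_accept_demo(int nworkers)
{
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);

    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        return -1;
    if (bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lsock, nworkers) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &len) < 0) {
        close(lsock);
        return -1;
    }

    pid_t pids[MAX_WORKERS];
    int started = 0;
    while (started < nworkers) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: lsock is the very same listening socket. */
            int c = accept(lsock, NULL, NULL);
            if (c < 0)
                _exit(1);
            ssize_t w = write(c, "k", 1);
            close(c);
            _exit(w == 1 ? 0 : 1);
        }
        pids[started++] = pid;
    }

    /* Parent plays the client: one connection per worker. */
    int served = 0;
    for (int i = 0; i < started; i++) {
        char reply = 0;
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0)
            continue;
        if (connect(c, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            read(c, &reply, 1) == 1 && reply == 'k')
            served++;
        close(c);
    }

    close(lsock);
    for (int i = 0; i < started; i++) {
        kill(pids[i], SIGTERM);     /* harmless if it already exited */
        waitpid(pids[i], NULL, 0);
    }
    return served;
}
```

Nothing in the children says which of them gets which connection.
Whichever one is sitting in accept() picks it up, which is the point.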

There is a major memory saving possible here, too.  On most hardware,
linux and other unix-like systems now implement "copy on write" for
the data of forked processes.  So when a process forks, not only the
code but all the global data is shared.  If one process modifies a
global datum, the memory page holding it is copied for that process.
But the global data set up before the fork doesn't need to be copied
until it is modified.  This is easy to implement (if the hardware
supports it) with the unix fork mechanism.  It is very difficult to
implement with a "spawn" approach, because it's difficult for the
system to discover that memory in separately started processes is
identical and can be shared.
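
The page sharing itself is invisible to a program, but the semantics
it has to preserve are easy to show.  In this small sketch (cow_demo()
and counter are invented names), the child writes to a global after
the fork and the parent's copy is untouched.  Whether the kernel
really shared the page until that write is an implementation detail
this code cannot observe.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Global data set up before the fork. */
static int counter = 42;

/* Fork a child that modifies counter.  Stores the child's view of
 * counter in *child_saw and returns the parent's view (-1 on error). */
int cow_demo(int *child_saw)
{
    assert(child_saw != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        counter = 99;       /* the write makes this page private */
        _exit(counter);     /* report the child's value as exit status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_saw = WEXITSTATUS(status);
    return counter;         /* the parent's copy was never touched */
}
```

After the call, the return value is 42 and *child_saw is 99.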

Of course, the primary example of the fork+exec scheme is its use  to
implement  file  redirection and pipelines within the various shells.
This only takes a few lines of C on a unix-like system.  On most other
systems, it is much more difficult.  It typically entails having the
command interpreter pass a  whole  lot  of  extra  information  to  a
spawned  program,  and  then the startup code for that program has to
understand what was passed and implement it correctly.   It  is  very
difficult   to   get   the  implementers  of  various  compilers  and
interpreters to go along with this and do it in a consistent fashion.
With the fork+exec approach, the code is in the command interpreter.
It runs in the child after the fork and before the exec, and the new
program never sees it, so it works with all programs no matter what
language they are written in.
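
For the record, here is roughly what those few lines look like for
"cmd args > outfile".  This is a sketch, not any particular shell's
source; run_redirected() is a name made up for the example, and the
126/127 exit codes just follow the common shell convention.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its standard output sent to outfile.  Returns the
 * command's exit status, or -1 if it could not be run or waited for. */
int run_redirected(char *const argv[], const char *outfile)
{
    assert(argv != NULL && argv[0] != NULL && outfile != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: rearrange file descriptors between fork and exec. */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
            perror(outfile);
            _exit(126);
        }
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);  /* the new program just sees stdout */
        perror(argv[0]);        /* reached only if the exec failed */
        _exit(127);
    }

    /* Parent (the "shell"): its own stdout is unaffected. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The program being run needs no startup code to cooperate.  It writes
to file descriptor 1 as always, and that descriptor happens to be the
file.  A pipeline is the same idea with pipe() and a dup2() on each
side.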

This is a lot of the explanation for the way that wildcard handling
varied so wildly in DOS.  The command interpreter passed wildcards
through untouched, so each program had to expand them for itself.
With Windows, the command-line interface was pretty much abandoned,
of course, and wildcard expansion, when it is implemented at all, is
done differently for nearly every program.  We see a bit of this in
unix GUI tools, too, though the glob(3) C library routine and the
perl glob() function are there to encourage consistency.
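
For a program that does have to expand wildcards itself, a minimal
use of glob(3) could look like this.  The wrapper print_matches() is
a made-up name; glob(), globfree() and the glob_t fields are the
standard POSIX interface.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/* Expand a shell-style pattern, print each match on its own line,
 * and return the number of matches (0 if none, -1 on error). */
int print_matches(const char *pattern)
{
    assert(pattern != NULL);

    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    for (size_t i = 0; i < g.gl_pathc; i++)
        puts(g.gl_pathv[i]);

    int n = (int)g.gl_pathc;
    globfree(&g);           /* release the list glob() allocated */
    return n;
}
```

Something like print_matches("*.c") then gives the same expansion
rules the shell would have applied.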
