[BLU] the philosophy of fork/exec
John Chambers
jc at trillian.mit.edu
Mon Jan 22 17:56:45 EST 2001
David Kramer wrote:
| On Fri, 19 Jan 2001, Seth Gordon wrote:
| > > ...The recent
| > > description of unix's file-linking scheme as "strange" is an example
| > > of how even experienced unix users and programmers don't always
| > > understand the reasons behind the design. The unix fork+exec
| > > scheme is another.
| >
| > Can you expand on this? What do other OSs do for spawning a process
| > that don't fit the fork+exec model, what are the consequences of those
| > alternative techniques, and what problems does fork+exec solve?
|
| Nobody bit on this one, so I will attempt to answer the question, though
| it is outside any areas of expertise I pretend to have.
Hmmm .. I seem to have been neglectful here, so maybe I'll add to
David's comments.
A good illustration of the use of the unix fork is the way that the
apache server handles incoming requests. One setting in the
httpd.conf file is the number of children to create. Apache forks
N times, and these are all copies of the same program. There's an
immediate efficiency gain here over the "spawn" paradigm implemented
by most other systems. When apache forks, the children don't need to
do any initialization at all. The parent did that, and the children
inherit all the parent's data unchanged. So the children know
everything the parent knew. Startup cost for programs like this can
be significant. Paying it only once is a major performance improvement.
Also, the children all inherit the parent's open files. In this case,
the significant open file is the socket that the parent is listening
on. After the forks, all the children are also listening on the same
shared file. This file isn't replicated; it is a single file that is
open in all the forked processes. When a connection comes in, it goes
to the first of these apaches that does an accept() call. Since HTTP
requests are all independent and web servers don't maintain state,
this works perfectly. If there's an idle server, the client gets an
instant connection. If there is no idle server, the first server to
complete its current task will do an accept() and get the connection.
This sort of sharing of incoming requests is difficult to implement
with anywhere near this response time on systems with a different
process model.
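
To make the shared listening socket concrete, here is a minimal
sketch in C. It is not apache's code; the names prefork_demo,
serve_one and MAX_CHILDREN are made up for illustration, and the
parent plays the client itself on the loopback interface so the
example is self-contained. The parent does the setup once, forks the
children, and each child blocks in accept() on the one inherited
socket.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16   /* arbitrary limit for this sketch */

/* Child: block in accept() on the inherited listening socket, answer
 * one connection with our pid, then exit. */
static void serve_one(int listen_fd)
{
    pid_t me = getpid();
    int conn = accept(listen_fd, NULL, NULL);

    if (conn < 0 || write(conn, &me, sizeof me) != (ssize_t)sizeof me)
        _exit(1);
    close(conn);
    _exit(0);
}

/* Set up one listening socket, fork nchildren workers that share it,
 * then connect once per worker. Returns the number of connections
 * that were answered, or -1 if the socket setup failed. */
int prefork_demo(int nchildren)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    pid_t pids[MAX_CHILDREN];
    int i, forked = 0, answered = 0, listen_fd;

    assert(nchildren > 0 && nchildren <= MAX_CHILDREN);
    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a port */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listen_fd, nchildren) < 0 ||
        getsockname(listen_fd, (struct sockaddr *)&addr, &len) < 0) {
        close(listen_fd);
        return -1;
    }

    /* All initialization is done; the children inherit it as is. */
    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0)
            serve_one(listen_fd);      /* never returns */
        if (pid > 0)
            pids[forked++] = pid;
    }

    /* The parent acts as the client: whichever child is sitting in
     * accept() gets each connection. */
    for (i = 0; i < forked; i++) {
        pid_t who;
        int c = socket(AF_INET, SOCK_STREAM, 0);

        if (c < 0)
            continue;
        if (connect(c, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            read(c, &who, sizeof who) == (ssize_t)sizeof who)
            answered++;
        close(c);
    }

    /* Clean up any child still waiting, then reap them all. */
    for (i = 0; i < forked; i++) {
        kill(pids[i], SIGTERM);
        waitpid(pids[i], NULL, 0);
    }
    close(listen_fd);
    return answered;
}
```

Calling prefork_demo(3) forks three workers and should report three
answered connections. A real server would loop around accept() in
each child rather than exit after one request.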
There is a major memory saving possible here, too. On most hardware,
linux and other unix-like systems now implement "copy on write" for
the data of forked processes. So when a process forks, not only the
code but all the global data is shared. If one process modifies a
global datum, that memory block is copied for that process. But the global
data set up before the fork doesn't need to be copied until it is
modified. This is easy to implement (if the hardware supports it)
with the unix fork mechanism. It is very difficult to implement with
a "spawn" approach, because it's difficult to discover that memory is
identical and can be shared.
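
The visible half of this behavior is easy to show in C. The sketch
below (cow_demo and table are hypothetical names) sets up a global
before the fork, lets the child overwrite it, and checks that the
parent's copy is untouched. It only demonstrates the semantics; the
fact that unmodified pages stay physically shared happens inside the
kernel and can't be observed from portable code like this.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for a large table built during startup. */
static int table[4096];

/* Returns the parent's table[0] after the child has modified its own
 * copy, and stores what the child saw in *child_saw.
 * Returns -1 if fork() or waitpid() fails. */
int cow_demo(int *child_saw)
{
    int status;
    pid_t pid;

    assert(child_saw != NULL);
    table[0] = 1;            /* "initialization", done once */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* First write after the fork: the child gets a private copy
         * of the page holding table[0]; the parent's is untouched. */
        table[0] = 42;
        _exit(table[0]);     /* report the child's view via exit status */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return table[0];         /* still 1 in the parent */
}
```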
Of course, the primary example of the fork+exec scheme is its use to
implement file redirection and pipelines within the various shells.
This only takes a few lines of C on a unix-like system. On most other
systems, it is much more difficult. It typically entails having the
command interpreter pass a whole lot of extra information to a
spawned program, and then the startup code for that program has to
understand what was passed and implement it correctly. It is very
difficult to get the implementers of various compilers and
interpreters to go along with this and do it in a consistent fashion.
With the fork+exec approach, the code is in the command interpreter,
and the new processes don't see it, so it works with all programs no
matter what language they are written in.
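
Those few lines of C look roughly like the sketch below, which does
what a shell does for "cmd > outfile". The function name
run_redirected and the exit codes 126 and 127 are choices made for
this example, not taken from any particular shell. The redirection
happens in the child between fork() and exec(), so the program being
run just finds its standard output already pointing at the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and its standard output
 * redirected to outfile. Returns the command's exit status, or -1 if
 * fork()/waitpid() fails or the command is killed by a signal. */
int run_redirected(char *const argv[], const char *outfile)
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && outfile != NULL);
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: rearrange file descriptors, then become the command.
         * The command itself never sees any of this code. */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
            perror(outfile);
            _exit(126);
        }
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);
        perror(argv[0]);     /* only reached if exec failed */
        _exit(127);
    }

    /* Parent (the "shell"): wait for the command to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A pipeline follows the same pattern: call pipe() before forking, and
dup2() one end onto standard output in the first child and the other
end onto standard input in the second.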
This explains much of why wildcard handling varied so wildly in DOS:
expansion was left to each program instead of being done once in the
command interpreter. With Windows, the command-line interface was
pretty much abandoned, of course, and wildcard expansion, when it is
implemented at all, is done differently for nearly every program. We
see a bit of this in unix GUI tools, too, though the glob(3) C
library routine and the perl glob() function are there to encourage
consistency.
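
For a program that does have to expand a pattern itself, a minimal
sketch of calling glob(3) from C might look like this. The function
name expand_pattern is made up for the example, and no special flags
are passed, so the matching rules are whatever the library's defaults
are.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/* Expand a shell-style wildcard pattern with glob(3). If out is not
 * NULL, print each matching path to it, one per line. Returns the
 * number of matches, 0 if nothing matched, or -1 on any other error. */
int expand_pattern(const char *pattern, FILE *out)
{
    glob_t g;
    size_t i;
    int rc, count;

    assert(pattern != NULL);
    rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    if (out != NULL)
        for (i = 0; i < g.gl_pathc; i++)
            fprintf(out, "%s\n", g.gl_pathv[i]);

    count = (int)g.gl_pathc;
    globfree(&g);            /* release the path list glob() built */
    return count;
}
```

For example, expand_pattern("*.c", stdout) lists the C source files
in the current directory using the same library routine any other
program can call.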