Souper Computer
Scott Lipcon
slipcon at cs.jhu.edu
Thu Feb 25 01:39:31 EST 1999
Hi,
I've been reading this list for a while, but never posted. I live in
the Boston area, but I'm at school at JHU in Baltimore. The local
chapter of the ACM, of which I'm an active member and officer, got a
grant last fall from a professor to build a Beowulf, so I'd be happy to
answer any questions. We recently got all the hardware issues worked
out, and in fact performed our first parallel computations last week.
The basic steps are:
- get a bunch of computers. They don't have to be the same speed, but
it helps in terms of load balancing, and in ease of installing Linux.
We have 7 Pentium II 350s, with 128MB of RAM and 6.4GB IDE drives, and
1 PII 350, with 384MB of RAM and 2 9GB Ultra Wide SCSI drives, to act
as a file server.
- network them together. This is pretty important. "Real" parallel
computers have very high speed interconnect buses... I'd say that is
the single biggest drawback of a Beowulf-type system - your parallel
workload had better be very coarse grained, otherwise your
communications latency and limited throughput will kill you. Our
network is 100Mb switched Ethernet. I've heard of people using normal
10Mb Ethernet for smaller clusters that they're setting up just for
fun, but it probably doesn't work too well.
- install Linux on them all. The Extreme Linux distribution from
RedHat is good, but it's based on RedHat 5.0. We chose to install
RedHat 5.2 straight.
- Now you've got a network of workstations. We found the next logical
step was to figure out the logistics: NFS mounting /home from the
"master" node, cross mounting every drive on every machine via NFS,
exporting the password file from the master, getting rlogin to work
without any passwords, even for root, etc... There are a lot of things
you have to worry about, especially if the system is going to be
attached to the internet.
- Finally, you can install the tools that make it a Beowulf instead of
a network of workstations - there are RPMs available on the web that
make this a snap. PVM (Parallel Virtual Machine) is one such library
that you can use to write parallel programs. MPI (Message Passing
Interface) is the other popular one.
As well as playing with this Beowulf, I'm taking a very interesting
course in Parallel Processing... the biggest thing I can stress is not
to expect a lot. Many people think "Hey, let's tie 8 computers
together, and we'll go 8 times faster." Not so - I could go on forever
about this, just from what I've learned in the first month of class,
and from our limited experiences with our beowulf. I'll keep it short
though. The way you talk about speed of a parallel system is to
compare it to a uniprocessor system... "Speedup" is defined as the run
time on a uniprocessor divided by the run time on the parallel machine.
For an N processor machine, the maximum possible speedup is N. You
won't get that in real life. A simple model is Amdahl's Law, which
says the following:
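Assume that for a given program, a fraction x of the work can be run
only on one processor. Also assume that the rest of the job is
entirely parallelizable, so the remaining (1 - x) can be spread over
all N processors. If the whole job takes time 1 on one processor, it
takes x + (1 - x)/N on N processors, so the speedup in this case is: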
N / ( 1 + (N - 1)x)
If you plot that for small values of x, you'll get some surprising
results:
For a 64 processor machine:
x = 1% (99% is fully parallelizable): Speedup = 39
x = 5% (95% is parallelizable): Speedup = 15.4
One more example - a large supercomputer, 1024 processors:
x = 1%: speedup = 91.2
x = 2%: 47.7
x = 8%: 12.4!!!
And that doesn't even take into account the communications latencies
which would be common on a Beowulf-class system.
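
If you'd rather check those numbers than take my word for it, here is
a minimal Python sketch of the formula. The function name
amdahl_speedup is just something made up for this example, and x is
passed as a fraction (0.01 for 1%), not a percentage. It models only
the serial fraction; communication cost is not included.

```python
def amdahl_speedup(n, x):
    """Amdahl's Law: speedup on n processors when a fraction x
    (0.0 to 1.0) of the work can only run on one processor."""
    return n / (1 + (n - 1) * x)


# Same processor counts and serial fractions as the examples in the text.
for n, fractions in ((64, (0.01, 0.05)), (1024, (0.01, 0.02, 0.08))):
    for x in fractions:
        s = amdahl_speedup(n, x)
        print(f"N = {n:4d}, x = {x:.0%}: speedup = {s:.1f}")
```

Note that as N grows the formula approaches 1/x, so with x = 1% no
number of processors will ever get you past a speedup of 100.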
To give a real-world example: POV-Ray. There exists a version of
POV-Ray that includes PVM support. At the time we ran it, only 7 of the
8 systems were up and running, so our theoretical max speedup was 7. We
rendered the standard benchmark image, at 640x480, on one computer, in
1:42 = 102 seconds. We then re-did the image, running on 7 computers -
it took 22 seconds. 102/22 = a speedup of 4.6. Certainly not bad, but
probably not what you'd expect either, if you go in with the "8
computers = 8 times faster" attitude.
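
For anyone who wants to run the same arithmetic on their own timings,
here is a small Python sketch. The function names are invented for
this example. The "implied serial fraction" is simply Amdahl's formula
solved for x; treating it that way is an assumption on my part, since
on a real cluster that number also soaks up network latency and load
imbalance, not just code that is truly serial.

```python
def measured_speedup(t_serial, t_parallel):
    """Speedup = run time on one machine / run time on the cluster."""
    return t_serial / t_parallel


def implied_serial_fraction(speedup, n):
    """Amdahl's formula S = n / (1 + (n - 1) * x), solved for x."""
    return (n / speedup - 1) / (n - 1)


# Timings from the POV-Ray run: 102 s on 1 machine, 22 s on 7 machines.
t1, t7, n = 102.0, 22.0, 7
s = measured_speedup(t1, t7)
print(f"speedup    = {s:.2f}")                              # 4.64
print(f"efficiency = {s / n:.0%}")                          # 66%
print(f"implied x  = {implied_serial_fraction(s, n):.1%}")  # 8.5%
```

In other words, the cluster behaved as if roughly 8.5% of the render
could not be parallelized, which is plenty to pull 7 machines down to
a speedup of about 4.6.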
Anyway, I've rambled on enough... I'd be happy to answer any questions
about our experiences, or whatever... either on the list or via email.
Web sites to check out:
http://www.beowulf.org
http://www.beowulf-underground.org
http://galaxy.acm.jhu.edu/ (shameless plug :)
Scott Lipcon