
BLU Discuss list archive


[Discuss] SVN server - What hardware do I need?

> From: ... On Behalf Of Greg Rundlett
> I'm working on replacing an existing Subversion server.  Does anyone have
> specific guidelines for the best hardware configuration for such a service?
> says that a fast disk and plenty of RAM for Apache are the chief concerns.

Definitely not accurate.  At least, not in general.

There are several things to say here:  

First of all, svn does a lot of compression (gzip / zlib level 6) and differencing.  So it's extremely efficient at disk utilization, but it hammers the CPU and does a lot of random IO.  For these reasons, you'll benefit enormously from either using an SSD or having so much RAM that the whole repository eventually gets cached and stays in cache.

Only one commit can happen at a time, which means commits benefit from the fastest single-core CPU performance you can get.  Any additional cores are wasted during a commit, as the job is not parallelizable.

Many checkouts (or other read operations) can happen concurrently, which means you definitely benefit from multiple cores and hyperthreading.  But assuming you're limited by 1Gbit ethernet ...  How many cores performing gzip-6 decompression does it take to max out a 1Gb link?  Somewhere around 4-8.  So it would be wasteful to go higher than that.
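The 4-8 figure is an estimate, and it is easy to check on your own hardware.  Here is a minimal sketch, assuming that plain zlib decompression of a level-6 stream is a fair stand-in for what the server does per checkout (real svn work also includes delta processing, so treat the result as an optimistic bound).  The function names and the sample data are made up for illustration.

```python
import math
import time
import zlib


def zlib_throughput(sample: bytes, level: int = 6, repeats: int = 5) -> float:
    """Return single-core decompression throughput in uncompressed bytes/second."""
    packed = zlib.compress(sample, level)
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    return len(sample) * repeats / elapsed


def cores_to_saturate(link_bits_per_s: float, per_core_bytes_per_s: float) -> int:
    """Cores needed for the combined throughput to reach the link's byte rate."""
    link_bytes_per_s = link_bits_per_s / 8
    return math.ceil(link_bytes_per_s / per_core_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical sample: highly repetitive text, which compresses far
    # better than a real source tree.  Use your own repository data instead.
    sample = b"int main(void) { return 0; }\n" * 200_000
    rate = zlib_throughput(sample)
    print(f"one core: {rate / 1e6:.0f} MB/s")
    print(f"cores for 1 Gbit/s: {cores_to_saturate(1e9, rate)}")
```

Feed it a sample that looks like your actual repository contents; the answer depends heavily on how compressible the data is.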

Only use apache if you need it for some reason.  It gives you capabilities beyond what svnserve can provide, but svnserve uses *way* less memory and is much faster.  Apache is a notorious memory hog, and it adds a lot of protocol overhead.

> file:/// access method, changing that to Apache over SSL

Yup.  HTTP by itself will slow you down bigtime compared to svnserve, just due to TCP/HTTP overhead.  Making it HTTPS is going to be a big hit too, because frequent setup and teardown of SSL connections means a lot of handshakes.  In SSL, the key exchange to establish a connection is the expensive part ... Once a connection is established, the actual encryption has barely any performance impact for sustained transfers.
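To see why connection churn matters more than the encryption itself, here is a rough cost model.  It is only a sketch: the function name is made up, and the handshake time and throughput in the example are illustrative guesses, not measurements.

```python
def handshake_fraction(connections, handshake_s, payload_bytes, throughput_bytes_per_s):
    """Fraction of total time spent in handshakes.

    Each of `connections` connections pays one handshake of `handshake_s`
    seconds and then transfers `payload_bytes` at the given throughput.
    """
    handshake_total = connections * handshake_s
    transfer_total = connections * payload_bytes / throughput_bytes_per_s
    return handshake_total / (handshake_total + transfer_total)


if __name__ == "__main__":
    # Assumed numbers: 5 ms per handshake, 100 MB/s sustained transfer.
    many_small = handshake_fraction(1000, 0.005, 10_000, 100e6)
    one_large = handshake_fraction(1, 0.005, 6_000_000_000, 100e6)
    print(f"1000 small requests: {many_small:.0%} of time in handshakes")
    print(f"one 6GB transfer:    {one_large:.4%} of time in handshakes")
```

With lots of short-lived connections the handshakes dominate; with one long transfer they all but vanish.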

> Many of the developers are remote, and use VNC to get onto the box, and do
> their checkouts and builds locally on that one machine.

Definitely *don't* allow users to access the repository via file:/// while either apache or svnserve is also serving up the same directory.  Lock down the file permissions to make it inaccessible to users (accessible only by the account that apache or svnserve runs as) and force users to go through the daemon.
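One way to do the lockdown, as a minimal sketch: strip every group and other permission bit under the repository root, so only the owning account can get in.  This assumes the tree is already owned by the account the daemon runs as (changing ownership needs root and is not shown), and the function name is made up for illustration.

```python
import os
import stat


def strip_group_other(repo_root: str) -> int:
    """Remove all group/other permission bits under repo_root.

    Returns the number of paths whose mode was changed.  Symlinks are
    skipped, because os.chmod would follow them out of the tree.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            locked = mode & 0o700  # keep only the owner's bits
            if locked != mode:
                os.chmod(path, locked)
                changed += 1
    return changed
```

Run it as the repository owner, then confirm that an ordinary user account can no longer list the directory.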

> One "performance" improvement may come from using NX instead of VNC, as
> I've heard the former is even more responsive than the latter.

NX GUI traffic is definitely much faster than VNC GUI traffic for WAN access, but the behavior is also significantly different in terms of how you go about suspending/resuming sessions, utilizing multiple monitors, resizing desktop sessions, etc.  If you're going from VNC to NX, there will be some adjustment period, and it won't be problem-free.  In the past, I've opted to support all of VNC, NX, and SSH/XTunnel, just because each one of them is better than the others for specific situations.

> A checkout of
> the source to do a build is non-trivial:  it's about 6GB of files.  But,
> assuming that a developer has those files, the real performance drag is
> waiting for that code to compile for 15+ minutes.  I'm not familiar with
> performance optimization in building C code, so I'm all ears for those
> tips. I'm even wondering if gcc can tell me what code is unused in the
> project (and therefore could be removed).

gcc is resource intensive all around.  Beef up CPU, memory, and disk speed (particularly random access time).  The memory is mostly for caching.

In most default configurations, I believe builds run single-threaded, but make has a -j option to run jobs in parallel (it's an option to make, not to gcc).  This comes with some implications, so the developers have to read the man pages to figure out what's necessary to parallelize their builds.
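As a starting point, here is a small sketch of launching a parallel build with one job per CPU.  The helper name and the source path are placeholders, and it assumes the project builds with plain make and that its Makefile declares dependencies correctly (otherwise parallel jobs can race).

```python
import os
import subprocess


def make_command(jobs=None, target=None):
    """Build the argv for a parallel make run; jobs defaults to the CPU count."""
    if jobs is None:
        jobs = os.cpu_count() or 1
    argv = ["make", f"-j{jobs}"]
    if target is not None:
        argv.append(target)
    return argv


if __name__ == "__main__":
    # "/path/to/source" is a placeholder for the checked-out tree.
    subprocess.run(make_command(), cwd="/path/to/source", check=True)
```

If the build breaks only when run with -j, that usually points at a missing dependency in the Makefile rather than at make itself.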

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
