
BLU Discuss list archive



How to best do zillions of little files?



I have a job that entails on the order of 50 million web pages (they
say this week ;-), a few kilobytes each. Now, Unix file systems are
generally known not to perform well with millions of files in a
single directory, and the usual workaround of splitting them across a
directory tree is well known. But I haven't seen any good information
specific to Linux file systems, and the obvious Google keywords turn
up lots of interesting but irrelevant material.

Does anyone know of good information on this topic for the various
file systems? Is there a generally useful directory scheme that works
well (or at least not too poorly) on all Linux file systems?
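
For concreteness, the kind of scheme I mean looks something like the
Python sketch below. The two-level fan-out and the choice of MD5 are
arbitrary, just for illustration: hashing each URL and taking two hex
digits per level gives 256 * 256 = 65536 leaf directories, so 50
million files averages out to under ~800 per leaf.

    import hashlib
    import os

    def shard_path(root, key):
        # Hash the key (e.g. a URL) and fan out two directory levels
        # deep using the first four hex digits of the digest.
        h = hashlib.md5(key.encode("utf-8")).hexdigest()
        return os.path.join(root, h[:2], h[2:4], h)

    def store(root, key, data):
        # Create the leaf directory on demand and write the page.
        path = shard_path(root, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

The nice property is that lookups never need a directory scan: the
path is computed directly from the key, and no single directory ever
holds more than a few hundred entries.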

There's also the possibility of trying a database instead, but it
would be a bit disappointing to spend months on that and find that
the best case is an order of magnitude slower than the dumb
nested-directory approach. (I've seen that happen so many times that
I consider it the most likely outcome of storing files as records in
a DB. ;-)
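
For reference, by "files as records in a DB" I mean something like
the following sketch, using Python's stdlib sqlite3 just to make the
alternative concrete (the table name and schema are made up):

    import sqlite3

    conn = sqlite3.connect("pages.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body BLOB)"
    )

    def put(url, data):
        # One row per page, keyed by URL.
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
            (url, data),
        )
        conn.commit()

    def get(url):
        row = conn.execute(
            "SELECT body FROM pages WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else None

That buys you a single file and indexed lookups, at the cost of
funneling every read and write through the database engine, which is
exactly where I'd expect the slowdown to show up.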




