
BLU Discuss list archive



How to best do zillions of little files?



In addition, it depends on which Unix system you are using. Many of the
commercial Unix systems give you the choice of the old UFS file system or
a newer, more advanced file system.
On Linux, ReiserFS is optimized for large numbers of small files. But
journaling file systems are generally slower than non-journaling file
systems for this kind of work, because metadata updates are written to
the journal as well.
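
Since the answer depends on the system and the file system, it may be
easiest to just measure it. Here is a minimal sketch in Python that times
creating empty files in one flat directory versus the same number split
across sub-directories. The function names, the default counts and the
`base` parameter are my own choices for illustration, not anything from
this thread. Point `base` at a directory on the file system you care
about, since the default temporary directory may live somewhere else
(a RAM-backed /tmp, for example).

```python
import os
import tempfile
import time


def create_files(root, count, per_dir=None):
    """Create `count` empty files named V0, V1, ... under `root`.

    With per_dir=None everything goes into `root` itself; otherwise the
    files are grouped into sub-directories D0, D1, ... holding `per_dir`
    files each. Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    directory = root
    for k in range(count):
        if per_dir is not None and k % per_dir == 0:
            # Start a new sub-directory every `per_dir` files.
            directory = os.path.join(root, "D%d" % (k // per_dir))
            os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "V%d" % k), "w"):
            pass  # an empty file is enough to exercise the directory
    return time.perf_counter() - start


def compare(count=2000, per_dir=200, base=None):
    """Time the flat and the split layout in fresh temporary directories.

    `base` is the parent directory for the temporary trees; None means
    wherever the tempfile module puts things by default.
    """
    with tempfile.TemporaryDirectory(dir=base) as flat_root:
        flat = create_files(flat_root, count)
    with tempfile.TemporaryDirectory(dir=base) as split_root:
        split = create_files(split_root, count, per_dir)
    return flat, split


if __name__ == "__main__":
    flat_seconds, split_seconds = compare()
    print("flat: %.3fs  split: %.3fs" % (flat_seconds, split_seconds))
```

The default of 2000 files is far too small to show the effect described
in the quoted message; raise `count` toward the sizes you actually expect,
and treat the numbers as a rough indication only, since caching and disk
hardware will affect them.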
On 2 Oct 2002 at 10:49, Derek Atkins wrote:

> I had a scenario where I was trying to create 300,000 files in
> one directory.  The files were named V[number] where [number]
> was monotonically increasing from 0-299999.  I killed the process
> after waiting a couple hours.
> 
> Breaking it up into directories of about 2000 files each REALLY
> helped.  In my case I used a one-level-deep sub-directory tree,
> where the directories were named D[number] and D[i] contained
> files V[i*n] to V[(i+1)*n-1].  Creating the tree of 300,000 files
> using this method took about 5 minutes, and lookups are also fast.
> There are only 150 directories involved with 2000 files each,
> and I only need to know the end filename in order to know where to
> find it.
> 
> -derek
> 
> John Chambers <jc at trillian.mit.edu> writes:
> 
> > I have a job that entails on the order of 50 million web pages  (they
> > say  this  week  ;-),  a few Kbytes each.  Now, unix file systems are
> > generally known to not work that well when you have millions of files
> > in  a  single  directory, and the general approach of splitting it up
> > into a tree is well known.  But I haven't seen any  good  info  about
> > linux  file systems, and the obvious google keywords seem to get lots
> > of interesting but irrelevant stuff.
> > 
> > Anyone know of some good info on this topic for various file systems?
> > Is there a generally useful directory scheme that makes it work well
> > (or at least not too poorly) on all linux file systems?
> > 
> > There's also the possibility of trying the DB systems, but it'd be  a
> > bit  disappointing  to spend months doing this and find that the best
> > case is an order of magnitude slower than the  dumb  nested-directory
> > approach.   (I've  seen this already so many times that I consider it
> > the most likely outcome of storing files as records in a DB.  ;-)
> > 
> > _______________________________________________
> > Discuss mailing list
> > Discuss at blu.org
> > http://www.blu.org/mailman/listinfo/discuss
> 
> -- 
>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>        Member, MIT Student Information Processing Board  (SIPB)
>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>        warlord at MIT.EDU                        PGP key available
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://www.blu.org/mailman/listinfo/discuss
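
For what it's worth, the one-level scheme Derek describes in the quoted
message comes down to a single integer division. A minimal sketch in
Python, reading his range as D[i] holding V[i*n] through V[(i+1)*n-1]
with n = 2000, which is what 150 directories for 300,000 files implies.
The function names, the `root` parameter and the unpadded D<number> and
V<number> spellings are my assumptions, not his actual code.

```python
import os

FILES_PER_DIR = 2000  # "about 2000 files each" in the quoted message


def path_for(number, root=".", per_dir=FILES_PER_DIR):
    """Return the path of file V<number> inside its D<i> directory."""
    if number < 0:
        raise ValueError("file number must be non-negative")
    # D[i] holds V[i*n] .. V[(i+1)*n - 1], so i is number // n.
    index = number // per_dir
    return os.path.join(root, "D%d" % index, "V%d" % number)


def path_for_name(filename, root=".", per_dir=FILES_PER_DIR):
    """Locate a file from its bare name alone, e.g. 'V4321'."""
    if not filename.startswith("V") or not filename[1:].isdecimal():
        raise ValueError("expected a name of the form V<number>")
    return path_for(int(filename[1:]), root, per_dir)
```

With these defaults V0 through V1999 land in D0 and V299999 lands in
D149, so knowing the end filename really is enough to find the file. For
names that are not sequential numbers (URLs, say), the same idea would
need some other way of picking the directory, which this sketch does not
cover.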


-- 
Jerry Feldman <gaf at heli-vets.net>
ORWAC 67-16 Blue Hats
61st AHC '67-'68 Lucky Stars
Heli-vets net administrator.
VHPA L05750
http://www.heli-vets.net





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org