How to best do zillions of little files?
Derek Atkins
warlord at MIT.EDU
Wed Oct 2 10:49:07 EDT 2002
I had a scenario where I was trying to create 300,000 files in
one directory. The files were named V[number] where [number]
was monotonically increasing from 0-299999. I killed the process
after waiting a couple hours.
Breaking it up into directories of about 2000 files each REALLY
helped. In my case I used a one-level-deep sub-directory tree,
where the directories were named D[number] and D[i] contained
files V[i*n] to V[(i+1)*n-1], with n = 2000. Creating the tree of
300,000 files using this method took about 5 minutes, and lookups
are also fast.
There are only 150 directories involved with 2000 files each,
and I only need to know the end filename in order to know where to
find it.
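
A minimal sketch of the layout described above, not the original
script. It assumes n = 2000 files per directory; the names path_for,
create_tree and FILES_PER_DIR are made up for illustration. The point
is that the directory index is just the file number divided by n, so
no lookup table is needed.

```python
import os

FILES_PER_DIR = 2000  # "n" in the description above (assumed value)


def path_for(number, root=".", per_dir=FILES_PER_DIR):
    """Return the path of file V<number>, stored under root/D<i>/
    where i = number // per_dir."""
    if number < 0:
        raise ValueError("file number must be non-negative")
    return os.path.join(root, f"D{number // per_dir}", f"V{number}")


def create_tree(total, root, per_dir=FILES_PER_DIR):
    """Create empty files V0 .. V<total-1>, spread across D0, D1, ..."""
    for number in range(total):
        path = path_for(number, root, per_dir)
        # Each directory is created once, when its first file comes up.
        if number % per_dir == 0:
            os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass
```

With these defaults V4321 lands in D2 and V299999 in D149, so
create_tree(300000, root) produces 150 directories of 2000 files each.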
-derek
John Chambers <jc at trillian.mit.edu> writes:
> I have a job that entails on the order of 50 million web pages (they
> say this week ;-), a few Kbytes each. Now, unix file systems are
> generally known to not work that well when you have millions of files
> in a single directory, and the general approach of splitting it up
> into a tree is well known. But I haven't seen any good info about
> linux file systems, and the obvious google keywords seem to get lots
> of interesting but irrelevant stuff.
>
> Anyone know of some good info on this topic for various file systems?
> Is there a generally-useful directory scheme that makes it work well
> (or at least not too poorly) on all linux file systems?
>
> There's also the possibility of trying the DB systems, but it'd be a
> bit disappointing to spend months doing this and find that the best
> case is an order of magnitude slower than the dumb nested-directory
> approach. (I've seen this already so many times that I consider it
> the most likely outcome of storing files as records in a DB. ;-)
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available