I had a scenario where I was trying to create 300,000 files in one directory. The files were named V[number], where [number] increased monotonically from 0 to 299999. I killed the process after waiting a couple of hours.

Breaking it up into directories of about 2000 files each REALLY helped. In my case I used a one-level-deep subdirectory tree, where the directories were named D[number] and, with n files per directory, D[i] contained files V[i*n] through V[(i+1)*n - 1]. Creating the tree of 300,000 files this way took about 5 minutes, and lookups are also fast. There are only 150 directories involved, with 2000 files each, and the filename alone tells me which directory a file is in.

-derek

John Chambers <jc at trillian.mit.edu> writes:

> I have a job that entails on the order of 50 million web pages (they
> say this week ;-), a few Kbytes each. Now, unix file systems are
> generally known to not work that well when you have millions of files
> in a single directory, and the general approach of splitting it up
> into a tree is well known. But I haven't seen any good info about
> linux file systems, and the obvious google keywords seem to get lots
> of interesting but irrelevant stuff.
>
> Anyone know of some good info on this topic for various file systems?
> Is there a generally useful directory scheme that makes it work well
> (or at least not too poorly) on all linux file systems?
>
> There's also the possibility of trying the DB systems, but it'd be a
> bit disappointing to spend months doing this and find that the best
> case is an order of magnitude slower than the dumb nested-directory
> approach. (I've seen this already so many times that I consider it
> the most likely outcome of storing files as records in a DB. ;-)

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available
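A minimal sketch of the bucketing idea in the reply above, assuming the same naming (D for directories, V for files) and a fixed n files per directory. The function names `bucket_path` and `create_tree` and the `depth` parameter are hypothetical, not from the post. With `depth=1` it reproduces the one-level scheme (file V[k] goes in D[k // n]). Larger depths are an extension not described in the post, which may suit something like the 50-million-page case: the outermost level is unbounded and each inner level holds at most n entries.

```python
import os


def bucket_path(root: str, k: int, n: int = 2000, depth: int = 1) -> str:
    """Return the path of file V[k] in a bucketed directory tree.

    depth=1: root/D[k // n]/V[k]  (D[i] holds V[i*n] .. V[(i+1)*n - 1]).
    depth=2 (hypothetical extension): root/D[k // n**2]/D[(k // n) % n]/V[k].
    """
    if k < 0 or n <= 0 or depth < 1:
        raise ValueError("need k >= 0, n > 0, depth >= 1")
    # Outermost directory index is unbounded; it grows as k grows.
    dirs = [f"D{k // n**depth}"]
    # Each inner level holds at most n subdirectories or files.
    for d in range(depth - 1, 0, -1):
        dirs.append(f"D{(k // n**d) % n}")
    return os.path.join(root, *dirs, f"V{k}")


def create_tree(root: str, total: int, n: int = 2000, depth: int = 1) -> None:
    """Create empty files V0 .. V[total-1] spread across bucket directories."""
    for k in range(total):
        path = bucket_path(root, k, n, depth)
        # The leaf directory only changes at multiples of n, so create it there.
        if k % n == 0:
            os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass  # an empty file is enough for this sketch
```

With n = 2000 and 300,000 files this yields D0 through D149, matching the 150 directories in the post. For example, `bucket_path("data", 299999)` gives `data/D149/V299999` on Linux. Finding a file needs only arithmetic on its number, with no directory scan.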
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.