On Wed, Oct 02, 2002 at 10:49:07AM -0400, Derek Atkins wrote:
> I had a scenario where I was trying to create 300,000 files in
> one directory. The files were named V[number] where [number]
> was monotonically increasing from 0-299999. I killed the process
> after waiting a couple hours.

I once did something similar on a Macintosh. First, given a directory with thousands of files, as long as one didn't try to look at it in Finder, the Macintosh's HFS was very fast. Second, when I created too many files (was it 32K?), it trashed the disk, which then had to be reformatted. (Also, I think HFS had a limit on the total number of files per partition.)

Moral One: only some file systems can efficiently handle lots of files in a single directory.

Moral Two: be cautious when doing things that might be perverse. (I can hear it now: "No one will ever create that many files in a single directory; there is no need to slow down the normal case to check for that unlikely case. Hell, Finder would have no hope of opening a folder with so many files in it.")

> Breaking it up into directories of about 2000 files each REALLY
> helped.

How are all these files related to each other? What natural organization do they have? Is there a natural mapping from file to directory path that can guarantee no directory will balloon to many thousands of files? If so, maybe use that organization. Will the data change in a way that would change a file's location? Will you be able to efficiently ripple that change through all dependent places that reference it? You might want to use a database to organize your files...

You might have to come up with a hash that distributes files fairly randomly, in which case I suggest you use enough levels to handle a much greater number of files.

Another consideration is how these files will be used. Will they just be put there, or will people also be accessing them? Are they fairly static, or are there lots of changes to be tracked?
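The two layouts discussed above (fixed-size buckets for sequentially numbered files, and a randomizing hash with enough levels to keep every directory small) might be sketched roughly like this in Python; the function names and directory naming are illustrative, not from the original post:

```python
import hashlib
import os


def bucket_path(root, number, per_dir=2000):
    # Group monotonically increasing file numbers into directories of
    # about per_dir files each: V0..V1999 -> d0000, V2000..V3999 -> d0001.
    bucket = number // per_dir
    return os.path.join(root, "d%04d" % bucket, "V%d" % number)


def hashed_path(root, name, levels=2, width=2):
    # Spread arbitrary names across a fixed tree by hashing the name.
    # Two levels of two hex digits give 256 * 256 = 65,536 leaf
    # directories, so 300,000 files average under five per directory;
    # add levels if the file count may grow much larger.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)
```

The numeric scheme only works because the names carry a natural ordering; the hash scheme trades that readability for an even spread regardless of how the names are distributed.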
Are these by chance MIT's webification of course materials? If so, you might be spreading complete copies across multiple machines to handle bandwidth needs.

Sounds like a cool problem...

-kb