I have a job that entails storing on the order of 50 million web pages (they say this week ;-), a few Kbytes each. Now, Unix file systems are generally known not to work well when you have millions of files in a single directory, and the general approach of splitting the files up into a directory tree is well known. But I haven't seen any good info about how Linux file systems behave here, and the obvious Google keywords return lots of interesting but irrelevant stuff.

Does anyone know of good info on this topic for the various file systems? Is there a generally useful directory scheme that works well (or at least not too poorly) on all Linux file systems?

There's also the possibility of trying a DB system, but it would be a bit disappointing to spend months on this and find that the best case is an order of magnitude slower than the dumb nested-directory approach. (I've seen this so many times already that I consider it the most likely outcome of storing files as records in a DB. ;-)
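For concreteness, here is a minimal sketch of the "dumb nested-directory approach" I have in mind. Everything in it is an assumption made for illustration, not a tested recommendation: the names `shard_path` and `store_page` are hypothetical, the key is assumed to be the page URL, and SHA-1 is used only to spread names evenly across directories. With two levels of two hex characters each there are 256 * 256 = 65,536 leaf directories, so 50 million pages would average roughly 760 files per directory.

```python
import hashlib
from pathlib import Path


def shard_path(root, key, levels=2, width=2):
    """Map a key (assumed here to be a URL) to a nested path under root.

    The leading hex characters of the key's SHA-1 digest name the
    intermediate directories; the full digest is the file name.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # e.g. levels=2, width=2 gives directories digest[0:2] and digest[2:4]
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, digest)


def store_page(root, url, content):
    """Write the page bytes to their sharded location and return the path."""
    path = shard_path(root, url)
    # Create the intermediate directories on demand.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path
```

Calling `store_page("/some/root", url, page_bytes)` writes one file per page, and `shard_path` recomputes the same location later for reads. Raising `levels` or `width` trades deeper or wider trees for fewer files per directory; which setting suits a given file system is exactly the part I don't know.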