>> I have a directory of 10000+ text files and would like to search
>> for some strings in these files. When I tried using the "grep" command
>> with an asterisk, I got an error message, something to the effect of
>>
>>    "File argument list too long"
>>
>> What is the file argument limit for grep? I guess you need the grep
>> source for this. I did not find any information in the man page.
>>
>> Are there any other recommended tools to search through such a large
>> list of files?
>
> That has nothing to do with grep. It is a limit of
> the shell. One way around this is to use the find command.
>
> Remember that find recursively follows directories, so
> you may want to tell find not to recurse.
>
> Simple example:
>
>   tarnhelm.blu.org [11] find . -type f -exec grep "Subba Rao" {} \; -print
>
> or
>
>   tarnhelm.blu.org [12] find . -type f -exec grep -l "Subba Rao" {} \;
>
> Both examples search all regular files in the current
> directory and its subdirectories. In the first, grep prints the matching
> text but not the file name; when a match is found, find's trailing -print
> puts the file name on the following line. The second example uses grep's
> -l option, which prints only the file name.

The ultimate source of the limitation in question is the amount of space
reserved for argv[] (and don't forget envp[]) in the kernel's exec module.
It's a hardcoded value that is typically VERY large: 32 pages (128 KB) in
the 2.2.17 kernels, for example. (Hmmm, now that you've got me looking at
it, I might have found a bug: it appears that the total size of the
arguments and the total size of the environment variables are each being
compared individually against that limit, rather than in aggregate...)

Anyway, the workaround suggested above is correct, but allow me to suggest
a variation that is slightly more efficient:

  find . -type f -print | xargs -l100 grep -H "Subba Rao"

This approach uses find only to generate the list of files, which it simply
shoves into the pipeline. Meanwhile, xargs is told to batch up 100 file
names at a time from that pipeline and pass them all to grep on the command
line; grep has also been told (via -H) to mention each file's name when it
gets a hit. This drastically reduces the number of times grep needs to be
exec'd, so things should go a little faster.

You can experiment with that -l100 parameter, too: you could conceivably
keep bumping it up until you once again run into the argv[] limit that
started this whole discussion in the first place...
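A footnote on that argv[] limit: you don't need to go digging through the
kernel source to see it. On most systems getconf will report it directly.
A minimal check, assuming a POSIX getconf is available (the 131072 shown
is just the 128 KB figure from the 2.2.17 kernels mentioned above; your
number may differ):

  $ getconf ARG_MAX      # kernel's limit on exec argument/environment space
  131072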
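One caveat with that find | xargs pipeline, offered as a sketch rather than
gospel: xargs splits its input on whitespace, so a file name containing a
space gets mangled into two arguments. If your find and xargs are the GNU
versions, the -print0/-0 pair avoids that, and -n batches by argument count
rather than by input lines:

  find . -type f -print0 | xargs -0 -n 100 grep -H "Subba Rao"

This behaves like the -l100 version for well-behaved file names, but
survives the odd ones too.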