
BLU Discuss list archive



Limits of grep?



Michael O'Donnell <mod+blu at std.com> wrote:
> 
> 
> 
> >> I have a directory of 10000+ text files and would like to search
> >> for some strings in these files.  When I tried using the "grep" command
> >> with an asterisk, I got an error message to the effect of:
> >>
> >> "File argument list too long"
> >>
> >> What is the file argument limit for grep?  I guess you need the grep
> >> source for this.  I did not find any information in the man page.
> >>
> >> Are there any other recommended tools to search through such large
> >> list of files?
> >
> >
> >That has nothing to do with grep.  It is a limit of
> >the shell.  One way around this is to use the find command:
> >
> >Remember that find recursively follows directories, so
> >you may want to tell find not to recurse.
> >
> >Simple example:
> >
> >  tarnhelm.blu.org [11] find .  -type f -exec grep "Subba Rao" {} \; -print
> >
> >or
> >
> >  tarnhelm.blu.org [12] find .  -type f -exec grep -l "Subba Rao" {} \;
> >
> >These examples search all regular files in the current
> >directory and its subdirectories.  In the first form, grep
> >prints the matching text but not the file name; when a match
> >is found, find's -print then prints the file name on the
> >following line.  The second form uses grep's -l option,
> >which prints only the names of files that match.
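As an aside, POSIX find can batch file names itself: terminating -exec with '+' instead of '\;' passes as many names as fit to each grep invocation, much like xargs. A minimal sketch, using a hypothetical /tmp/grepdemo directory:

```shell
# Set up a small demo tree (hypothetical path, for illustration only):
mkdir -p /tmp/grepdemo
printf 'hello Subba Rao\n' > /tmp/grepdemo/a.txt
printf 'nothing here\n'    > /tmp/grepdemo/b.txt

# '-exec ... {} +' batches many file names per grep invocation,
# instead of one exec per file as '{} \;' does:
find /tmp/grepdemo -type f -exec grep -l "Subba Rao" {} +
```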
> 
> 
> The ultimate source of the limitation in question
> is the amount of space reserved for argv[] (and
> don't forget envp[]) in the kernel's exec module -
> it's a hardcoded value that is typically VERY large -
> 32 pages (128KB, with 4KB pages) in the 2.2.17 kernels, for example.
> 
>  (Hmmm, now that you've got me looking at it I might
>   have found a bug - it appears that the size of all
>   args and the size of all environment variables are
>   being individually compared to that limit value,
>   rather than in aggregate...)
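You don't need the kernel source to see the limit in question; getconf is POSIX and reports it at run time, though the exact value varies by kernel and C library:

```shell
# Print the maximum space (in bytes) available for the argument
# list plus environment that exec will accept; the value differs
# from system to system:
getconf ARG_MAX
```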
> 
> Anyway, the workaround suggested above is correct,
> but allow me to suggest a variation that is slightly
> more efficient:
> 
>    find . -type f -print | xargs -l100 grep -H "Subba Rao"
> 
> This approach uses find only to generate the list
> of files, which it simply shoves into the pipeline.
> Meanwhile xargs is told to batch up 100 filenames at
> a time from that pipeline and pass them all to grep on
> the command line; grep has also been told to (via -H)
> mention each file's name when it gets a hit.  This
> drastically reduces the number of times grep needs
> to be exec'd so things should go a little faster.
> You can experiment with that -l100 parameter, too -
> you could conceivably keep bumping it up until you
> once again run into the argv[] limit that started
> this whole discussion in the first place...
> 
> 
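One caveat with piping find into xargs this way: xargs splits its input on whitespace by default, so file names containing spaces break the batching. GNU (and BSD) find and xargs offer NUL-delimited lists as a fix; these flags are extensions, not POSIX. A sketch with a hypothetical /tmp/nuldemo directory:

```shell
# Demo file whose name contains a space (hypothetical path):
mkdir -p /tmp/nuldemo
printf 'Subba Rao\n' > '/tmp/nuldemo/a file.txt'

# -print0 NUL-terminates each name and -0 splits on NUL, so spaces
# and newlines in file names pass through the pipeline safely:
find /tmp/nuldemo -type f -print0 | xargs -0 grep -l "Subba Rao"
```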

Thanks for replying.  I tried the following and it works; it is
much faster than the plain 'find -exec' approach.

find <path> -print | xargs -n 500 grep <pattern>
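One small refinement: without -type f, find also emits directory names, which grep may then warn about ("Is a directory") or skip. A sketch of the same pipeline restricted to regular files, using a hypothetical /tmp/xargsdemo tree:

```shell
# Demo tree (hypothetical path, for illustration only):
mkdir -p /tmp/xargsdemo/sub
echo 'match me' > /tmp/xargsdemo/sub/file.txt

# -type f keeps directory names out of grep's argument list:
find /tmp/xargsdemo -type f -print | xargs -n 500 grep -l 'match me'
```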

Thanks to everyone who replied with a solution!

-- 

Subba Rao
subb3 at attglobal.net
http://pws.prserv.net/truemax/
-
Subscription/unsubscription/info requests: send e-mail with
"subscribe", "unsubscribe", or "info" on the first line of the
message body to discuss-request at blu.org (Subject line is ignored).




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
