On 0, Michael O'Donnell <mod+blu at std.com> wrote:
>
> >> I have a directory of 10000+ text files and would like to search
> >> for some strings in these files. When I tried using the "grep"
> >> command with an asterisk, I got an error message, something to the
> >> effect of:
> >>
> >>     "File argument list too long"
> >>
> >> What is the file argument limit for grep? I guess you need the grep
> >> source for this. I did not find any information in the man page.
> >>
> >> Are there any other recommended tools to search through such a large
> >> list of files?
> >
> > That has nothing to do with grep. It is a limit of
> > the shell. One way around this is to use the find command.
> >
> > Remember that find recursively follows directories, so
> > you may want to tell find not to recurse.
> >
> > Simple example:
> >
> >     tarnhelm.blu.org [11] find . -type f -exec grep "Subba Rao" {} \; -print
> >
> > or
> >
> >     tarnhelm.blu.org [12] find . -type f -exec grep -l "Subba Rao" {} \;
> >
> > The example will search all regular files in the current
> > directory and subdirectories. Grep will print the matching text,
> > but not the file name; if the text is found, the file
> > name is printed on the following line. The second example
> > uses the -l option of grep, which prints only the file name.
>
> The ultimate source of the limitation in question
> is the amount of space reserved for argv[] (and
> don't forget envp[]) in the kernel's exec module -
> it's a hardcoded value that is typically VERY large -
> 32 pages (128KB) in the 2.2.17 kernels, for example.
>
> (Hmmm, now that you've got me looking at it I might
> have found a bug - it appears that the size of all
> args and the size of all environment variables are
> being individually compared to that limit value,
> rather than in aggregate...)
>
> Anyway, the workaround suggested above is correct,
> but allow me to suggest a variation that is slightly
> more efficient:
>
>     find . -type f -print | xargs -l100 grep -H "Subba Rao"
>
> This approach uses find only to generate the list
> of files, which it simply shoves into the pipeline.
> Meanwhile xargs is told to batch up 100 filenames at
> a time from that pipeline and pass them all to grep on
> the command line; grep has also been told (via -H) to
> mention each file's name when it gets a hit. This
> drastically reduces the number of times grep needs
> to be exec'd, so things should go a little faster.
> You can experiment with that -l100 parameter, too -
> you could conceivably keep bumping it up until you
> once again run into the argv[] limit that started
> this whole discussion in the first place...

Thanks for replying. I tried the following solution and it worked, and it is
much faster than using the plain 'find' command:

    find <path> -print | xargs -n 500 grep <pattern>

Thanks to everyone who replied with a solution!

--
Subba Rao
subb3 at attglobal.net
http://pws.prserv.net/truemax/
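A note on the commands above: on most Linux systems the actual limit can be
checked with "getconf ARG_MAX", which reports the available argument space in
bytes. A further refinement of the xargs pipeline, assuming GNU find and xargs
(the -print0 and -0 options are GNU extensions), is to pass NUL-terminated
names so that file names containing spaces or newlines are handled safely:

    find . -type f -print0 | xargs -0 grep -H "Subba Rao"

Here find terminates each name with a NUL byte and xargs splits on NUL instead
of whitespace, so quoting problems disappear; without an explicit -n or -l
value, xargs simply packs as many names into each grep invocation as the
argv[] limit allows.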