>> I have a directory of 10000+ text files and would like to search
>> for some strings in these files. When I tried using the "grep" command
>> with an asterisk, I got an error message, something to the effect of
>>
>>    "File argument list too long"
>>
>> What is the file argument limit for grep? I guess you need the grep
>> source for this. I did not find any information in the man page.
>>
>> Are there any other recommended tools to search through such a large
>> list of files?
>
> That has nothing to do with grep. It is a limit of
> the shell. One way around this is to use the find command.
>
> Remember that find recursively follows directories, so
> you may want to tell find not to recurse.
>
> Simple example:
>
>   tarnhelm.blu.org [11] find . -type f -exec grep "Subba Rao" {} \; -print
>
> or
>
>   tarnhelm.blu.org [12] find . -type f -exec grep -l "Subba Rao" {} \;
>
> Both examples search all regular files in the current
> directory and its subdirectories. In the first, grep prints the matching
> text but not the file name; when a match is found, find's trailing -print
> puts the file name on the following line. The second example uses grep's
> -l option, which prints only the file name.

The ultimate source of the limitation in question is the amount of space
reserved for argv[] (and don't forget envp[]) in the kernel's exec module.
It's a hardcoded value that is typically VERY large: 32 pages (128 KB) in
the 2.2.17 kernels, for example. (Hmmm, now that you've got me looking at
it, I might have found a bug: it appears that the total size of the
arguments and the total size of the environment variables are each being
compared individually against that limit, rather than in aggregate...)

Anyway, the workaround suggested above is correct, but allow me to suggest
a variation that is slightly more efficient:

  find . -type f -print | xargs -l100 grep -H "Subba Rao"

This approach uses find only to generate the list of files, which it simply
shoves into the pipeline. Meanwhile, xargs is told to batch up 100 file
names at a time from that pipeline and pass them all to grep on the command
line; grep has also been told (via -H) to mention each file's name when it
gets a hit. This drastically reduces the number of times grep needs to be
exec'd, so things should go a little faster.

You can experiment with that -l100 parameter, too: you could conceivably
keep bumping it up until you once again run into the argv[] limit that
started this whole discussion in the first place...
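A footnote on that argv[] limit: you don't need to go digging through the
kernel source to see it. On most systems getconf will report it directly.
A minimal check, assuming a POSIX getconf is available (the 131072 shown
is just the 128 KB figure from the 2.2.17 kernels mentioned above; your
number may differ):

  $ getconf ARG_MAX      # kernel's limit on exec argument/environment space
  131072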
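One caveat with that find | xargs pipeline, offered as a sketch rather than
gospel: xargs splits its input on whitespace, so a file name containing a
space gets mangled into two arguments. If your find and xargs are the GNU
versions, the -print0/-0 pair avoids that, and -n batches by argument count
rather than by input lines:

  find . -type f -print0 | xargs -0 -n 100 grep -H "Subba Rao"

This behaves like the -l100 version for well-behaved file names, but
survives the odd ones too.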