On Tue, 2 Sep 2003 karina.popkova at verizon.net wrote:

> I am interested in finding words in a file
> like a dictionary (file.dict) that have a
> basic match to a pattern.
> I just read this, so I will try this.
>
>>% egrep "^\(fee\|fie\)" junk.txt
>
> Now, what I really want to do
> is to search for general words
> that have a pattern, but also to
> exclude a specific vowel or consonant.

This is what pipes & filters work best at!

The trick for such questions is usually to grep once for the broader
pattern, and then pipe that to another grep to narrow down the results
to just the ones you're looking for.

> I.E., to make as a part of the search pattern
> that you call an RE, to look for given words
> and make sure that the vowel "a" or the
> vowel "e" is not a part of the word or string?

Here's one way to do it:

    egrep \
      '(pattern|another pattern|a third pattern)' \
      /usr/share/dict/words \
      | grep -vi '[ae]'

This looks for "pattern", "another pattern", or "a third pattern" in the
word list file /usr/share/dict/words, then removes all lines in that
list that have either of the letters 'a' or 'e' (or 'A' or 'E', because
the -i flag I passed to the second grep makes the match case
Insensitive).

Note that, because the word 'pattern' has both an 'a' and an 'e', this
particular example will never match anything, but you get the idea :)

> How could you find all words in a file that do not
> have the letter: a, or e or i, and so on (???)

Here's one way to do it:

    grep -v '[aei]' /path/to/file

Note: this matches *lines* that have none of the bracketed characters.
If you actually want to match individual words, and it can't be assumed
that the file has one word per line, then this has to be accounted for.
Here's a way to handle that, using `fmt` to "flatten" the file:

    fmt -1 /path/to/file | grep -v '[aei]'

> Or do not have the letters a, and e, and i in the same word?

Building off the last example, you could do something like this:

    fmt -1 /path/to/file | grep '^[^aei]*$'

Note that in other examples, I set up a regular character class, as --

    [abcd]

where you match any of the characters in brackets. In this last example,
I instead decided to use a negated character class, as --

    [^abcd]

where you match any characters *except* the bracketed ones.

So, if you want to exclude single letters, these are roughly equivalent,
as long as the negated class is anchored to cover the whole line:

    grep '^[^abcd]*$' /path/to/file
    grep -v '[abcd]' /path/to/file

The former looks for lines made up entirely of non-[abcd] characters,
while the latter throws out every line that contains an [abcd]
character. Subtle difference. Watch out for the unanchored form,
grep '[^abcd]', which is not equivalent: it matches any line that has at
least one character outside the class, and that is nearly every line.

You may find that one version is more efficient than the other, and the
two may handle edge cases differently. My hunch is that the [^abcd]
variant will usually be faster & easier, but I haven't actually tested
this idea.

The situation where the -v exclusion match excels is when you want to
exclude not just an individual character, but whole words or phrases:

    egrep -v 'this|that|another thing' /path/to/file

There is no trivial equivalent to this in character classes.

On the other hand, character classes might let you "inline" reverse
matches in some cases:

    % grep '^j[^l]*y$' months
    january

As opposed to something like

    % grep '^j.*y$' months | grep -v 'l'
    january

Make sense?

Take a look at _Mastering Regular Expressions_ for more details. For
such a seemingly dry subject, it's a fascinating read... :)

-- 
Chris Devers    cdevers at pobox.com
http://devers.homeip.net:8080/

Malloc, malloc, n & v. trans.
 1 n. Canaanite deity controlling memory allocations.
 2 v. trans. C/C++ library. To request space on the heap.
    -- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995
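
A quick way to compare the negated character class and the -v forms discussed in this message is to run them side by side on a tiny word list. What follows is only a minimal sketch: the temporary file and its four words are made up for illustration and are not from the original question.

```shell
#!/bin/sh
# Hypothetical sample data: one word per line, invented for this sketch.
words=$(mktemp)
printf '%s\n' sky bird cab moon > "$words"

# Unanchored negated class: keeps any line that has at least one
# character outside [abcd], so "bird" survives because of its i and r.
unanchored=$(grep '[^abcd]' "$words")

# Anchored negated class: keeps only lines made up entirely of
# characters outside [abcd].
anchored=$(grep '^[^abcd]*$' "$words")

# Inverted match: drops every line that contains any of a, b, c, d.
inverted=$(grep -v '[abcd]' "$words")

printf 'unanchored:\n%s\n\n' "$unanchored"   # sky, bird, moon
printf 'anchored:\n%s\n\n' "$anchored"       # sky, moon
printf 'inverted:\n%s\n' "$inverted"         # sky, moon

rm -f "$words"
```

On this sample the anchored class and the -v form agree, while the unanchored class also lets "bird" through. For a file that does not have one word per line, the same greps could be fed from `fmt -1` as shown earlier in the message.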