agrep is a version of the grep utility that also matches approximate patterns.
Description
agrep searches the input files (standard input is the default) for records containing strings that match a pattern either exactly or approximately.
A record is by default a single line, but it can be defined differently using the -d option (see below). Normally, each record found is copied to the standard output. Approximate matching allows finding records that contain the pattern with several errors including substitutions, insertions, and deletions.
For example, “Massechusets” matches “Massachusetts” with two errors (one substitution and one insertion). Running agrep -2 Massechusets foo outputs all lines in the file foo containing any string with (at most) 2 errors from “Massechusets”.
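The error count described above can be illustrated with a short Python sketch. It is not agrep's implementation; it is a plain dynamic-programming version of approximate substring matching, and the names min_errors and matching_lines are hypothetical helpers introduced only for this illustration.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest number of edits between pattern and any substring of text."""
    # prev[i] is the cost of matching pattern[:i] against some substring that
    # ends at the previous text position. Before any text is read, matching
    # pattern[:i] costs i (every pattern character is missing).
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # exact match or substitution
                prev[i] + 1,              # extra character in the text
                cur[i - 1] + 1,           # pattern character missing from text
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def matching_lines(pattern: str, lines, max_errors: int):
    """Keep the lines that contain pattern within max_errors edits."""
    return [line for line in lines if min_errors(pattern, line) <= max_errors]


if __name__ == "__main__":
    sample = ["Boston, Massachusetts", "Austin, Texas"]
    print(matching_lines("Massechusets", sample, 2))
```

With a limit of 2, only the line containing “Massachusetts” is kept; with a limit of 1, nothing in this sample matches.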
agrep supports many kinds of queries including arbitrary wildcards, sets of patterns, and in general, all regular expressions. It supports most of the options supported by the grep family plus several more (but it is not 100% compatible with grep).
As with the rest of the grep family, the characters $, ^, *, [, ], |, (, ), !, and \ can cause unexpected results when included in the pattern, as these special characters are also meaningful to the shell. To avoid these problems, always enclose the entire pattern argument in single quotes, i.e., 'pattern'. Do not use double quotes (").
When agrep is applied to more than one input file, the name of the file is displayed at the beginning of each line which matches the pattern. (The file name is not displayed when processing a single file, but in that case if the user wants the file name to appear, they should use /dev/null as a second file in the list, and then the file name will be displayed).
Syntax
agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename… ]
Options
Patterns
agrep supports a large variety of patterns, including simple strings, strings with classes of characters, sets of strings, wildcards, and regular expressions.
Strings
A string is any sequence of characters, including the special symbols ^ for beginning of line and $ for end of line. The special characters listed above ($, ^, *, [, |, (, ), !, and \) should be preceded by \ if they are to be matched as regular characters. For example, \^abc\\ corresponds to the string “^abc\”, whereas ^abc corresponds to the string “abc” at the beginning of a line.
Character classes
A class of characters is a list of characters inside “[]” (in order); it corresponds to any single character from the list, where a dash represents the range between two characters. For example, [a-ho-z] is any character between a and h or between o and z. The symbol ^ inside [] complements the list, denoting the characters not to match. For example, [^i-n] denotes any character except characters i through n. The symbol ^ thus has two meanings, but this is consistent with egrep. The symbol . stands for any character except the newline character.
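The class notation can be tried out with Python's re module, which, for the simple cases shown here, is assumed to treat ranges, complemented classes, and the dot the same way as described above. This is only an illustration of the notation, not of agrep itself; chars_matching is a hypothetical helper.

```python
import re

in_ranges = re.compile(r"[a-ho-z]")   # a through h, or o through z
not_i_to_n = re.compile(r"[^i-n]")    # any character except i through n
any_char = re.compile(r".")           # any character except newline


def chars_matching(regex, candidates):
    """Return the characters of candidates that the one-character regex accepts."""
    return [c for c in candidates if regex.fullmatch(c)]


if __name__ == "__main__":
    print(chars_matching(in_ranges, "ahiknoz"))
    print(chars_matching(not_i_to_n, "hijno5"))
    print(chars_matching(any_char, "a \n"))
```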
Boolean operations
agrep supports an AND operation ‘;’ and an OR operation ‘,’, but not a combination of both. For example, fast;network searches for all records containing both “fast” and “network”.
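As a rough model of these two operators, the following Python sketch tests one record against a pattern that uses either ‘;’ or ‘,’. It handles exact matching of plain strings only and is not how agrep evaluates such patterns; record_matches is a hypothetical name.

```python
def record_matches(pattern: str, record: str) -> bool:
    """Exact-match model of the AND (;) and OR (,) operators on one record."""
    if ";" in pattern and "," in pattern:
        # Mirrors the restriction that the two operators cannot be combined.
        raise ValueError("AND and OR cannot be combined in one pattern")
    if ";" in pattern:
        # AND: every term must occur somewhere in the record.
        return all(term in record for term in pattern.split(";"))
    # OR: at least one term must occur (a pattern without ',' is a single term).
    return any(term in record for term in pattern.split(","))


if __name__ == "__main__":
    records = ["a fast network", "a fast car", "a slow network"]
    print([r for r in records if record_matches("fast;network", r)])
    print([r for r in records if record_matches("fast,network", r)])
```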
Wildcards
The symbol ‘#’ is used to denote a wildcard. # matches zero, or any number of, arbitrary characters. For example, ex#e matches “example”. The symbol # is equivalent to .* in egrep. In fact, .* works too, because it is a valid regular expression, but unless this is part of an actual regular expression, # works faster.
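The stated equivalence between # and .* can be sketched in Python by translating a wildcard pattern into a regular expression. The function wildcard_to_regex is a hypothetical helper for illustration; it treats every character other than # literally and says nothing about how agrep processes wildcards internally.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Replace each # with .* and escape everything else as literal text."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))


if __name__ == "__main__":
    regex = wildcard_to_regex("ex#e")
    for word in ["example", "exe", "exam"]:
        print(word, bool(re.search(regex, word)))
```

Because # may match zero characters, “exe” is accepted as well as “example”, while “exam” is not, since no “e” follows the “ex”.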
Combination of Exact and Approximate Matching
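Any pattern inside angle brackets <> must match the text exactly even if the match is with errors. For example, <mathemat>ics matches “mathematical” with one error (replacing the last “s” with an “a”), but mathe<matics> does not match “mathematical” no matter how many errors are allowed.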
Regular Expressions
The syntax of regular expressions in agrep is in general the same as that for egrep. The union operation ‘|’, Kleene closure ‘*’, and parentheses () are all supported. Currently ‘+’ is not supported. Regular expressions are currently limited to approximately 30 characters (excluding meta characters). Some options (-d, -w, -f, -t, -x, -D, -I, -S) do not currently work with regular expressions. The maximal number of errors for regular expressions that use ‘*’ or ‘|’ is 4.
Examples
agrep -2 -c ABCDEFG foo
Gives the number of lines in file foo that contain “ABCDEFG” within two errors.
agrep -1 -D2 -S2 'ABCD#YZ' foo
Outputs the lines containing “ABCD” followed within arbitrary distance by “YZ”, with up to one additional insertion (-D2 and -S2 make deletions and substitutions too “expensive”).
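The effect of raising the cost of some error types can be modeled with a weighted edit distance. The Python sketch below is an illustration under assumptions, not agrep's algorithm: the parameters ins_cost, del_cost, and sub_cost are hypothetical names standing in for the -I, -D, and -S costs, and an “insertion” is taken to mean an extra character in the text relative to the pattern, as in the “Massachusetts” example.

```python
def weighted_errors(pattern, text, ins_cost=1, del_cost=1, sub_cost=1):
    """Cheapest weighted edit cost between pattern and any substring of text."""
    # Matching pattern[:i] against nothing means i pattern characters are absent.
    prev = [i * del_cost for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any text position for free
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0 if p == ch else sub_cost),  # match/substitute
                prev[i] + ins_cost,      # text has an extra character
                cur[i - 1] + del_cost,   # pattern character absent from text
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


if __name__ == "__main__":
    # With a budget of 1 and deletions/substitutions costing 2,
    # only a single extra character in the text is affordable.
    for text in ["abXcd", "abd", "abXd"]:
        cost = weighted_errors("abcd", text, ins_cost=1, del_cost=2, sub_cost=2)
        print(text, cost, cost <= 1)
```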
agrep -5 -p abcdefghij /path/to/dictionary/words
Outputs the list of all words in the dictionary located at /path/to/dictionary/words containing at least 5 of the first 10 letters of the alphabet in order.
agrep -1 'abc[0-9](de|fg)*[x-z]' foo
Outputs the lines containing, within up to one error, the string that starts with “abc” followed by one digit, followed by zero or more repetitions of either “de” or “fg”, followed by either “x”, “y”, or “z”.
agrep -d '^From ' 'breakdown;internet' mbox
Outputs all mail messages (the pattern “^From ” separates mail messages in a mail file) that contain the keywords “breakdown” and “internet”.
agrep -d '$$' -1 '<word1> <word2>' foo
Finds all paragraphs that contain word1 followed by word2 with one error in place of the blank. In particular, if word1 is the last word in a line and word2 is the first word in the next line, then the space will be substituted by a newline symbol and it will match. Thus, this is a way to overcome separation by a newline. Note that -d ‘$$’ (or another delim which spans more than one line) is necessary, because otherwise agrep searches only one line at a time.
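The paragraph search described above can be approximated in Python to show why a multi-line record is needed. This sketch assumes paragraphs are separated by blank lines and models only the substitution case (the blank between the two words replaced by any one character, including a newline); paragraphs_with_pair is a hypothetical helper, not part of agrep.

```python
import re


def paragraphs_with_pair(text, word1, word2):
    """Return paragraphs where word1 and word2 are separated by one character."""
    # Records are paragraphs: runs of lines separated by blank lines.
    records = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Both words must match exactly; the character between them may be a
    # space, a newline, or any other single character.
    pair = re.compile(re.escape(word1) + r"[\s\S]" + re.escape(word2))
    return [p for p in records if pair.search(p)]


if __name__ == "__main__":
    sample = (
        "alpha word1\nword2 beta\n"
        "\n"
        "gamma word1 delta word2\n"
        "\n"
        "word1 word2 here\n"
    )
    for paragraph in paragraphs_with_pair(sample, "word1", "word2"):
        print(repr(paragraph))
```

Splitting the same text into single lines first would miss the first paragraph, because “word1” and “word2” would then fall into different records.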
Related commands
grep — Filter text which matches a regular expression.