The gawk command is the interpreter for GAWK, the GNU version of AWK, a pattern-matching and text-processing language.

Description

Working with text files often involves repetitive tasks. You might want to extract certain lines and discard the rest, or make changes wherever certain patterns appear while leaving the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++, or Java is time-consuming and inconvenient. Such jobs are often easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.

The GNU implementation of awk is called gawk; if you invoke it with the proper options or environment variables (see Options), it is fully compatible with the POSIX specification of the awk language and with the Unix version of awk maintained by Brian Kernighan.

Using awk (or gawk) allows you to:

  • Manage small, personal databases
  • Generate reports
  • Validate data
  • Produce indexes and perform other document preparation tasks
  • Experiment with algorithms you can adapt later to other computer languages

Also, gawk provides facilities that make it easy to:

  • Extract bits and pieces of data for processing
  • Sort data
  • Perform simple network communications

Syntax

gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...

dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

Option format

gawk options may be either traditional POSIX-style one letter options, or GNU-style long options. POSIX options start with a single “-”, while long options start with “--”. Long options are provided for GNU-specific features and POSIX-mandated features.

gawk-specific options are often used in long-option form. Arguments to long options are either joined with the option by an = sign, with no intervening spaces, or they may be provided in the next command line argument. Long options may be abbreviated, as long as the abbreviation remains unique.

Additionally, each long option has a corresponding short option, so that the option’s functionality may be used from within #! executable scripts.
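The long-option forms described above can be sketched with the --field-separator option and its short equivalent -F. This is a minimal illustration, not an exhaustive list: the sample input and the shell variable names are assumptions made for the example, and the block does nothing if gawk is not installed.

```shell
if command -v gawk >/dev/null 2>&1; then
    input='root:x:0'
    # Argument joined to the long option with "=".
    joined=$(printf '%s\n' "$input" | gawk --field-separator=: '{ print $1 }')
    # Argument supplied as the next command-line argument.
    separate=$(printf '%s\n' "$input" | gawk --field-separator : '{ print $1 }')
    # Abbreviated long option; the prefix must still be unique.
    abbreviated=$(printf '%s\n' "$input" | gawk --field-sep=: '{ print $1 }')
    # Corresponding short option.
    short=$(printf '%s\n' "$input" | gawk -F: '{ print $1 }')
    printf '%s\n' "$joined" "$separate" "$abbreviated" "$short"
fi
```

All four invocations should select the same first field.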

Options

In compatibility mode, options that gawk does not recognize are flagged as invalid, but are otherwise ignored. In normal operation, as long as program text is supplied, unknown options are passed on to the AWK program in the ARGV array for processing. This is particularly useful for running AWK programs via the “#!” executable interpreter mechanism.

AWK program execution

An AWK program consists of a sequence of pattern-action statements and optional function definitions.

@include "filename"
pattern { action statements }
function name(parameter list) { statements }
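As a small sketch of that structure, the program below combines one function definition with one pattern-action statement. The function name twice and the sample input are made up for the illustration; plain awk is used so the sketch runs with any POSIX awk, and gawk should accept the same program.

```shell
out=$(printf '1\n2\n3\n' | awk '
# Function definition: callable from any rule.
function twice(x) { return 2 * x }
# Pattern-action statement: runs only for records whose first field exceeds 1.
$1 > 1 { print twice($1) }
')
printf '%s\n' "$out"
```

Only the records 2 and 3 match the pattern, so two lines are printed.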

gawk first reads the program source from the program-file(s) if specified, from arguments to --source, or from the first non-option argument on the command line. The -f and --source options may be used multiple times on the command line. gawk reads the program text as if all the program files and command line source texts had been concatenated. This is useful for building libraries of AWK functions, without having to include them in each new AWK program that uses them. It also provides the ability to mix library functions with command line programs.
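The concatenation of several program files can be sketched as follows. The file names lib.awk and main.awk, the function max, and the sample numbers are assumptions chosen for the example; repeating -f is standard awk behavior, so plain awk is used here and gawk should behave the same way.

```shell
workdir=$(mktemp -d)

cat > "$workdir/lib.awk" <<'EOF'
# Library file: contains only a function definition.
function max(a, b) { return (a > b) ? a : b }
EOF

cat > "$workdir/main.awk" <<'EOF'
# Main file: calls max(), which is defined in lib.awk.
{ best = max(best, $1 + 0) }
END { print "largest: " best }
EOF

# Both files are read as if they were one program.
out=$(printf '4\n17\n9\n' | awk -f "$workdir/lib.awk" -f "$workdir/main.awk")
printf '%s\n' "$out"

rm -rf "$workdir"
```

Keeping reusable functions in their own file means the main program stays short and the library can be shared between programs.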

Also, lines beginning with @include can include other source files into your program, making library use even easier.

The environment variable AWKPATH specifies a search path to use when finding source files named with the -f option. If this variable does not exist, the default path is “.:/usr/local/share/awk”. The actual directory may vary, depending on how gawk was built and installed. If a file name given to the -f option contains a “/” character, no path search is performed.
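A minimal sketch of the search path in use, assuming a throwaway library directory: the file names greet.awk and main.awk and the function greet are hypothetical. AWKPATH, @include, and --source are gawk features, so the block does nothing if gawk is not installed.

```shell
if command -v gawk >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    mkdir "$workdir/lib"
    printf '%s\n' 'function greet(name) { return "hello, " name }' \
        > "$workdir/lib/greet.awk"
    printf '%s\n' '@include "greet.awk"' 'BEGIN { print greet("include") }' \
        > "$workdir/main.awk"

    # -f greet.awk has no "/" in it, so gawk looks for it along AWKPATH.
    via_f=$(AWKPATH="$workdir/lib" gawk -f greet.awk \
        --source 'BEGIN { print greet("world") }')

    # main.awk is named with a full path (no search); its @include line
    # is resolved through AWKPATH.
    via_include=$(AWKPATH="$workdir/lib" gawk -f "$workdir/main.awk")

    printf '%s\n' "$via_f" "$via_include"
    rm -rf "$workdir"
fi
```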

gawk executes AWK programs in the following order. First, all variable assignments specified via the -v option are performed. Next, gawk compiles the program into an internal form. Then, gawk executes the code in the BEGIN block(s) (if any), and then proceeds to read each file named in the ARGV array (up to ARGV[ARGC-1]). If there are no files named on the command line, gawk reads the standard input.
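The ordering can be observed with a short sketch: a variable set with -v is already available in the BEGIN block, and with no file operands the records come from standard input. The variable name label and the sample input are illustrative; plain awk is used and gawk should give the same result.

```shell
out=$(printf 'a b\nc d e\n' | awk -v label="fields" '
BEGIN { print "label is " label }   # -v assignment is visible here
      { n += NF }                   # records are read from standard input
END   { print label ": " n }
')
printf '%s\n' "$out"
```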

If a file name on the command line has the form var=val it is treated as a variable assignment. The variable var is assigned the value val. This happens after any BEGIN blocks are run. Command line variable assignment is most useful for dynamically assigning values to the variables AWK uses to control how input is broken into fields and records. It is also useful for controlling state if multiple passes are needed over a single data file.
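A sketch of command-line assignment, assuming two small sample files with different separators (the file names are made up): each FS=... operand takes effect when gawk reaches it in the argument list, which is after BEGIN has run and before the following file is read. Plain awk is used here; gawk should behave the same.

```shell
workdir=$(mktemp -d)
printf 'a:b:c\n' > "$workdir/colon.txt"
printf 'x,y\n'   > "$workdir/comma.txt"

# In BEGIN, FS still has its default value (a single space).
out=$(cd "$workdir" && awk '
BEGIN { print "in BEGIN, FS is [" FS "]" }
      { print FILENAME ": " NF " fields" }
' FS=: colon.txt FS=, comma.txt)
printf '%s\n' "$out"

rm -rf "$workdir"
```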

If the value of a particular element of ARGV is empty (""), gawk skips over it.
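One way this can be used, sketched under the assumption of two throwaway files, is to blank out an ARGV element in the BEGIN block so that the corresponding file is never opened. The file names are hypothetical; plain awk is used and gawk should behave the same.

```shell
workdir=$(mktemp -d)
printf 'keep me\n' > "$workdir/a.txt"
printf 'skip me\n' > "$workdir/b.txt"

# ARGV[0] is the command name, ARGV[1] is a.txt, ARGV[2] is b.txt.
out=$(cd "$workdir" && awk '
BEGIN { ARGV[2] = "" }              # empty element: b.txt is skipped
      { print FILENAME ": " $0 }
' a.txt b.txt)
printf '%s\n' "$out"

rm -rf "$workdir"
```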

For each input file, if a BEGINFILE rule exists, gawk executes the associated code before processing the file’s contents. Similarly, gawk executes the code associated with ENDFILE after processing the file.
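The per-file rules can be sketched as a record counter that resets for each input file. The file names and the counter n are assumptions for the example; BEGINFILE and ENDFILE are gawk extensions, so the block does nothing if gawk is not installed.

```shell
if command -v gawk >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    printf '1\n2\n' > "$workdir/one.txt"
    printf '3\n'    > "$workdir/two.txt"

    out=$(cd "$workdir" && gawk '
    BEGINFILE { n = 0 }                        # before each file
              { n++ }                          # for each record
    ENDFILE   { print FILENAME " records=" n } # after each file
    ' one.txt two.txt)
    printf '%s\n' "$out"

    rm -rf "$workdir"
fi
```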

For each record in the input, gawk tests to see if it matches any pattern in the AWK program. For each pattern that the record matches, the associated action is executed. The patterns are tested in the order they occur in the program.
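A short sketch of this matching behavior, with made-up sample data: a single record can trigger several rules, and the actions run in program order. The third rule has no pattern, so it matches every record. Plain awk is used; gawk should give the same output.

```shell
out=$(printf 'apple 3\nbanana 12\n' | awk '
/an/    { print "name contains an: " $1 }
$2 > 10 { print "large count: " $1 }
        { print "seen: " $1 }
')
printf '%s\n' "$out"
```

The first record matches only the last rule, while the second record matches all three.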

Finally, after all the input is exhausted, gawk executes the code in the END block(s) (if any).

Official gawk user’s guide

To learn more about the language, see the GNU gawk user’s guide.

Examples

gawk '{ num_fields = num_fields + NF } END { print num_fields }'

Prints the total number of fields in all input lines.

gawk 'length($0) > 80'

Prints every line longer than 80 characters. The sole rule has a relational expression as its pattern, and has no action (so the default action, printing the record, is used).

ls -l files | awk '{ x += $5 } ; END { print "total bytes: " x }'

Prints the total number of bytes used by files. The size is the fifth field of typical ls -l output.

Related commands

awk: Interpreter for the AWK text processing programming language.

sed: A utility for filtering and transforming text.