3. Invoking AWK programs

AWK (named after its creators Al Aho, Peter Weinberger and Brian Kernighan) is a very powerful text processing language. It features automatic splitting of each input line into fields, associative arrays (arrays indexed by strings), and built-in string-oriented functions.

Brian Kernighan said about AWK:

    It was originally for writing these one and two line programs. It really was. I think it's very seductive because it does so many things automatically. It handles strings and numbers smoothly. It is an interpreter and there's no baggage, no derived object files. People start to write a one and two line program that just grows and grows; some of them grow unbelievably large: tens of thousands of lines -- which is nonsense.

This article describes three ways to interface AWK programs with shell scripts and how to import shell variables into AWK programs. This text assumes a good understanding of AWK and shell scripting. If you want to learn how to program using AWK, you should read an AWK introduction, e.g. one of the documents in the bibliography.

If appropriate we will differentiate between oawk, nawk, and awk.

Calling AWK from a shell script

Have a look at the following script, which searches for a text string within files (like grep) using AWK:
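A sketch of what such a script may look like, assuming it matches the textsearch listing shown later in this article except for the AWK invocation (the later note says the search command was originally /SearchString/). It is wrapped in a shell function here, which is an assumption made only so that it can be tried in a running shell:

```shell
# textsearch (sketch): search text in files.
# Written as a function for experimenting; the article uses a script file.
textsearch() {
    if [ $# -gt 0 ]
    then
        SearchString="$1"
        shift
    else
        echo >&2 "usage: textsearch searchstring [file ...]"
        return 1
    fi
    # Inside single quotes the shell expands nothing, so AWK searches for
    # the literal text "SearchString", not for the contents of the variable.
    awk '/SearchString/' "$@"
}

# Only the line containing the literal word is printed, whatever we ask for.
printf 'int main(void)\nSearchString\n' | textsearch main
```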
This script may be called the following way:

$ textsearch main *.c

For now we'll ignore the fact that the script does not work correctly and describe how it should work. The script assigns the first command line parameter ("main" in the command above) to the script variable SearchString. At this time, however, it only searches for the constant string "SearchString".

But how do we get the contents of the shell script variable into the AWK program? There are three major ways to achieve this:

1. Shell script embedding
2. The -v option
3. Variable assignments on the command line
The second method has the disadvantage of not being portable to older versions of awk (and even different versions of nawk). The third method has some disadvantages that we will describe later. Therefore we will explain the first, preferred method in detail.

Shell script embedding

If we call AWK the following way:
The awk program is called with one or more arguments: the first argument is the complete AWK program, followed by the files specified on the command line. If we want to use the shell variable SearchString inside the AWK program, its contents have to become part of that first argument.
This could work the following way:
In this example the AWK program consists of three parts:
It is essential that all three parts are written together without any whitespace, because AWK takes only one program on the command line and will complain about any further program found.

What happens if we call this script as follows?

$ textsearch hello *.doc

Inside of the script the first argument "hello" will be assigned to the shell variable SearchString.
We now have exactly the solution for our problem: this is a way to import a shell variable into AWK.

There's still one problem left. Consider the following invocation of our script:

$ textsearch "our house" *.doc

Now the variable SearchString contains a blank. Because the variable is expanded without quotes, the shell splits its value at the blank, and our AWK program is split in two parts, resulting in AWK error messages.

The solution to this problem is simple: the shell variable should be enclosed in double quotes:
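A minimal sketch of the quoted form, assuming a POSIX shell and awk. The function name textsearch_embedded is hypothetical and only stands in for the body of the script. The AWK program is assembled from three adjacent pieces: a single-quoted part, the double-quoted shell variable, and a second single-quoted part.

```shell
# textsearch_embedded: hypothetical helper illustrating the quoting technique.
textsearch_embedded() {
    SearchString="$1"
    shift
    # '/' + "$SearchString" + '/' form ONE argument, because no whitespace
    # separates the three quoted pieces and the double quotes keep any
    # blanks inside the variable from splitting it.
    awk '/'"$SearchString"'/' "$@"
}

# A search string containing a blank now stays inside a single argument.
printf 'our house\nyour home\n' | textsearch_embedded "our house"
```

Note that the value becomes part of the AWK program text: it is read as a regular expression, and a "/" inside the search string would end the pattern early.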
Now you are able to write large AWK programs that may use shell script variables. The embedding of AWK programs in shell scripts is easy to use, portable, and allows the use of arbitrarily complex shell script commands for input pre- or postprocessing.

The following example uses the technique described above to transfer the name of a file into the AWK script. The script substitute substitutes arbitrary words in the input with other words specified in a file of word pairs:

oldword newword

Each oldword in the input is substituted with newword in the output.
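The following is only a sketch of how such a script could be written with the embedding technique, not the original listing. The function name substitute_sketch, the variable WordFile, and the word-by-word matching on whitespace-separated fields are all assumptions made for illustration.

```shell
# substitute_sketch: hypothetical illustration, not the original script.
# $1 names a file with "oldword newword" pairs; input comes from the
# remaining file arguments or from standard input.
substitute_sketch() {
    WordFile="$1"
    shift
    awk '
        BEGIN {
            # The file name is spliced into the program text by the shell.
            while ((getline line < "'"$WordFile"'") > 0) {
                if (split(line, pair, " ") >= 2)
                    map[pair[1]] = pair[2]
            }
        }
        {
            # Replace every field that exactly equals a known oldword.
            for (i = 1; i <= NF; i++)
                if ($i in map)
                    $i = map[$i]
            print
        }
    ' "$@"
}
```

In this sketch, assigning to a field makes AWK rebuild the line with single blanks between fields, and words with attached punctuation are not matched. It also uses getline, which is not available in oawk.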
Some comments on the script:
Using the
:
# textsearch - search text in files
if [ $# -gt 0 ]
then
    SearchString="$1"
    shift
else
    echo >&2 "usage: $0 searchstring [file ...]"
    exit 1
fi
awk -v Search="$SearchString" '$0 ~ Search' "$@"
The -v option assigns the contents of the shell script variable SearchString to the AWK variable Search. This variable is then used inside the AWK script to match a line.
Note that we changed the search command from "/SearchString/" to "$0 ~ Search", because AWK variables cannot be used inside the pattern matching operator /.../.
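A small sketch of the -v approach in isolation, assuming an awk that supports -v (nawk or any POSIX awk). The function name search_with_v is hypothetical. Unlike a command line assignment, the variable is already set when the BEGIN action runs.

```shell
# Sketch: hand a shell variable to AWK with -v and match it dynamically.
# search_with_v is a hypothetical name used only for this illustration.
search_with_v() {
    awk -v Search="$1" '
        BEGIN { print "searching for: " Search }  # already set in BEGIN
        $0 ~ Search                               # match against the variable
    '
}

printf 'int main(void)\nreturn 0;\n' | search_with_v "main"
```

The value is still treated as a regular expression by the ~ operator. AWK also processes backslash escape sequences in -v values, so a search string containing backslashes may need them doubled.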
AWK knows another way to assign values to AWK variables, like in the following example:
$ awk '{ print "var is", var }' var=TEST file1 file2
This statement assigns the value "TEST" to the AWK variable "var", and then reads the files "file1" and "file2". The assignment works, because AWK interprets each file name containing an equal sign ("=") as an assignment.
This example is very portable (even oawk understands this syntax), and easy to use. So why don't we use this syntax exclusively?
This syntax has two drawbacks. First, the variable assignments are interpreted by AWK at the moment the file would have been read; only then does the assignment take place. Since the BEGIN action is performed before the first file is read, the variable is not available in the BEGIN action.
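The timing can be observed with a short sketch, assuming a POSIX awk. The function name show_assignment_timing is hypothetical, and "-" is used as the file operand so that the input comes from standard input.

```shell
# Sketch: a command line assignment is not yet visible in BEGIN.
show_assignment_timing() {
    # var=TEST is processed only when awk reaches it in the argument
    # list, which happens after the BEGIN action has already run.
    awk 'BEGIN { print "in BEGIN: [" var "]" }
               { print "in main: [" var "]" }' var=TEST -
}

echo "one line" | show_assignment_timing
```

The BEGIN action prints an empty value between the brackets, while the main action sees TEST.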
The second problem is that the order of the variable assignments and the files is important. In the following example
$ awk '{ print "var is", var }' file1 var=TEST file2
the variable var is not yet defined while file1 is read, but it is defined while file2 is read. This may cause bugs that are hard to track down.
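A sketch that makes the ordering effect visible, assuming a POSIX awk and the mktemp command (not part of POSIX, but widely available). The function name show_assignment_order and the file contents are made up for the illustration.

```shell
# Sketch: the position of var=TEST among the file operands matters.
show_assignment_order() {
    dir=$(mktemp -d) || return 1
    echo "line from file1" > "$dir/file1"
    echo "line from file2" > "$dir/file2"
    # var=TEST sits between the two files, so it takes effect only
    # after file1 has been read completely.
    (cd "$dir" && awk '{ print FILENAME ": var is [" var "]" }' file1 var=TEST file2)
    rm -rf "$dir"
}

show_assignment_order
```

The line from file1 is printed with an empty value, the line from file2 with TEST.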
An equally portable way to achieve the same result is Shell script embedding, the preferred method.
One way to start AWK scripts is to invoke awk with the command line flag -f and the AWK script name, e.g.
$ awk -f scriptname.awk
This usage has some disadvantages:
Since there is a better way to invoke the AWK script, we will not explain this syntax further.
An interpreter line is the first line of an executable text (non-binary) file. If the first two characters of the file are "#!", the remainder of the line is taken to be the name of an interpreter (a binary executable file), optionally followed by an argument. This program is then started with the path of the script file appended to its arguments; the file is not passed on standard input.
This way any script may call its own interpreter, e.g.
#! /bin/awk -f
BEGIN { print "this script is read by AWK" }
This is a comfortable way to call AWK scripts because, in contrast to the "awk -f" solution, the user does not have to remember the whole path of the script (if the PATH environment variable is set correctly).
The programmer, however, still does not have a way to pre- or postprocess the input/output of the AWK script.
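The interpreter line mechanism can be tried out with the following sketch. It assumes that awk is found in PATH, that the current directory is writable and allows executing files, and that mktemp is available. The function name run_interpreter_demo and the file name template are hypothetical. The awk path is looked up at run time because it differs between systems.

```shell
# Sketch: create and run a self-interpreting AWK script.
run_interpreter_demo() {
    awk_path=$(command -v awk) || return 1     # assumption: awk is in PATH
    script=$(mktemp ./hello_awk.XXXXXX) || return 1
    # First line: "#!" followed by the interpreter path and its -f flag.
    printf '#!%s -f\nBEGIN { print "this script is read by AWK" }\n' \
        "$awk_path" > "$script"
    chmod +x "$script"
    "$script"        # the system runs awk with -f and the script path
    rm -f "$script"
}

run_interpreter_demo
```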
A good introduction to the features of AWK from the inventors of the language.
This FAQ helps getting AWK binaries, finding (web) tutorials, and answers some more advanced AWK questions.
This document describes some bugs and different behavior of some AWK implementations. It's somewhat dated (1989?), because most of the bugs are fixed in newer AWK implementations.
This book has many interesting stories and anecdotes about the evolution of the UNIX operating system. A good book only for UNIX enthusiasts.
These are good places to pose AWK related questions.
1 Thanks to Stefan Lagotzki <lago20@gmx.de> for suggesting this.