awk

Very early on I realized that awk was useful for one thing in particular: printing a specific column, i.e.:

printf "col1\tcol2 col3\nfoo\t\tbar\t baz\n" | awk '{ print $2 }'

This would output col2 and bar. Please do notice how uneven and funky the spacing is in the printf.

Awk handles this because, by default, it treats any run of one or more whitespace characters as a single field separator.
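To see that default in action, here is a small sketch that counts fields (NF) in the same funky input, first with the default separator and then with a single tab as the separator. The variable names (sample, default_nf, tab_nf) are just mine for illustration, and I'm assuming an awk that understands '\t' as a tab in -F, which the common ones do.

```shell
# Same uneven input as in the printf above.
sample=$(printf 'col1\tcol2 col3\nfoo\t\tbar\t baz\n')

# Default FS: any run of spaces/tabs counts as one separator.
default_nf=$(printf '%s\n' "$sample" | awk '{ print NF }')

# FS set to exactly one tab: every single tab separates, spaces do not.
tab_nf=$(printf '%s\n' "$sample" | awk -F '\t' '{ print NF }')

echo "default FS: $default_nf"
echo "tab FS:     $tab_nf"
```

With the default, both lines have three fields. With a tab as the separator, the first line has two fields ("col2 col3" stays together) and the second has four, one of them empty.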

And that's about where my understanding of awk stopped. The syntax seemed too magical and alien to ever grasp. How wrong I was. It is so simple it is embarrassing...

You see, awk's syntax is built around one kind of statement with two parts: an expression (usually called a pattern), followed by curly braces, inside which comes an action.

Either of these two (expression, action) can be omitted.

An omitted expression indicates to awk that it should be interpreted as TRUE, so the action is executed for each line awk traverses.

An omitted action will indicate to awk that it should PRINT the line currently being processed.
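A minimal sketch of both cases, using some made-up input (the input variable and its contents are purely hypothetical):

```shell
# Hypothetical sample input; any text would do.
input='alpha 1
beta 2
gamma 3'

# Expression only, action omitted: awk prints each matching line as-is.
only_pattern=$(printf '%s\n' "$input" | awk '/beta/')

# Action only, expression omitted: the action runs for every line.
only_action=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$only_pattern"
echo "$only_action"
```

The first command prints only the "beta 2" line, the second prints the first field of all three lines.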

And when I say line, I really mean a record. So what the crap is a record, and how does it differ from a line?

Awk has a couple of special variables, among which are FS (the field separator, defaulting to one or more whitespace characters, as seen above) and RS (the record separator, defaulting to the newline character).

So by default, one record == one line. But there is nothing stopping you from redefining EITHER of these two variables. If you set FS to a comma, well hey, you've just converted awk into a (very) rudimentary csv-parser.
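Something like this, assuming a tiny made-up CSV (the csv variable and its contents are hypothetical). It really is rudimentary: a quoted field that contains a comma would be split in the wrong place.

```shell
# Hypothetical CSV data for illustration.
csv='name,city
Ada,London
Linus,Helsinki'

# -F, sets FS to a comma, so $2 is the second comma-separated column.
second_column=$(printf '%s\n' "$csv" | awk -F, '{ print $2 }')

echo "$second_column"
```

This prints city, London and Helsinki, one per line.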

And what's to say that a record (the thing containing columns/fields) can't be whatever characters sit between two occurrences of the letter "o"?
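As a quick sketch of that idea, with made-up input that contains no newlines at all (the x1/y1 names are just placeholders I picked):

```shell
# One "line" of input; the letter o is what separates the records.
# No trailing newline on purpose, so the last record stays clean.
second_fields=$(printf 'x1 y1ox2 y2ox3 y3' | awk 'BEGIN { RS = "o" } { print $2 }')

echo "$second_fields"
```

Awk sees three records here ("x1 y1", "x2 y2" and "x3 y3") and prints the second field of each: y1, y2 and y3.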

awk '
    /foo/ { print $2 }
    { print $NF }
    /bar/
' someFile

What the above script would do is the following: