Mar 3, 2025 - 22:42
Awk your way: Parsing Logs

It is a common challenge for technical interviews to parse Quake 3 server logs and display:

  1. Players in a match
  2. Player score card, listing player names and kill count:
    1. Ignore <world> as a player
    2. If <world> kills a player, subtract 1 from that player's kill count
  3. (optional) Group outputs above by match
  4. (optional) Death cause report by match

Working with files is a common practice for any developer. Using awk for it is not so common, even though it is, IMHO, one of the best tools for the job:

  • The language is built for (1) text matching and (2) manipulation.
  • Working with small files is as easy as it is working with very large files.

Intending to spread knowledge of the tool to more people, let's solve the challenge with AWK and see how you can start using it effectively in your workflow today. I assume you know a programming language well, know your way around a (*nix) CLI, and that we are using GNU awk (gawk).

The beginning of a not-so-usual program

As is common with other Unix tools, it is better to break the program into smaller pieces; Awk programs bigger than ~150 lines become difficult to maintain.
Here are the different programs we are going to create:

  1. clean.awk will read the input files (the original log files) and output a cleaner version of their content, containing just the data we need to manipulate and use.
  2. scoreboard.awk will use the output from the previous program to produce the score boards for each game.

Let's create a walking skeleton to run and debug our progress while tackling the challenge:

$ mkdir /tmp/awk-quake
$ cd !$
$ curl --remote-name -L https://gist.githubusercontent.com/cloudwalk-tests/be1b636e58abff14088c8b5309f575d8/raw/df6ef4a9c0b326ce3760233ef24ae8bfa8e33940/qgames.log
$ tail qgames.log
 13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET
 13:55 Exit: Fraglimit hit.
 13:55 score: 20  ping: 8  client: 3 Oootsimo
 13:55 score: 19  ping: 14  client: 6 Zeh
 13:55 score: 17  ping: 1  client: 2 Isgalamido
 13:55 score: 13  ping: 0  client: 5 Assasinu Credi
 13:55 score: 10  ping: 8  client: 4 Dono da Bola
 13:55 score: 6  ping: 19  client: 7 Mal
 14:11 ShutdownGame:
 14:11 ------------------------------------------------------------
$ cat clean.awk
{ print }
$ watch gawk -f clean.awk qgames.log

Above we:

  1. Downloaded qgames.log
  2. Created clean.awk that prints everything passed to it
  3. Executed the program every couple of seconds (with watch) to see its result while we change it in another session (to stop watch, use CTRL-C)

Let's change clean.awk to filter just the lines useful to us, and help us debug what to do with them:

BEGIN {
    FS = " "
    RS = "\n"
}
/Init/ { print }
/kill/ { debug_fields() }

function debug_fields()
{
    for (i = 1; i <= NF; i++) {
        printf("%d: %s\n", i, $i)
    }
}

Don't despair yet; what we are doing is pretty simple:

  1. BEGIN is a special block that gets executed once, before any input is read:
    1. We use it to (re-)define some special variables (both values below are already awk's defaults, but stating them makes the program explicit):
      1. FS defines the field separator (space). It is used to break each line into fields.
      2. RS defines the record separator (newline). Everything up to that character is treated as one record (a line).
  2. /match/ { action } blocks execute a set of actions when a match (regex supported) is found:
    1. /Init/ { print } prints every line that has Init on it, without doing anything more.
    2. /kill/ { debug_fields() } executes the debug_fields() function for every line containing the string kill (which also matches killed).
    3. Every line that doesn't match the rules above is ignored.
  3. function debug_fields() prints all fields identified after breaking the line with FS:

    1. NF is a special variable containing the number of fields parsed for the current line.
    2. $n is field number n. Inside the loop, $i becomes $1, $2, and so on
      up to $NF, allowing us to retrieve the contents of every field on that
      line, displaying something like:

      1: 20:54
      2: Kill:
      3: 1022
      4: 2
      5: 22:
      6: <world>
      7: killed
      8: Isgalamido
      9: by
      10: MOD_TRIGGER_HURT

    3. The output above is useful for debugging the contents of the current line. Try changing the debug_fields() action to print $6 " killed " $8.

With small changes, we can use $6 (the killer) and $8 (the victim) to display who killed whom, which is pretty much everything we need.