Awk your way: Parsing Logs

It is a common challenge for technical interviews to parse Quake 3 server logs and display:
- Players in a match
- Player score card, listing player names and kill count:
  - Ignore <world> as a player
  - If <world> kills a player, subtract 1 from that player's kill count
- (optional) Group the outputs above by match
- (optional) Death cause report by match
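The kill-count rules above boil down to a few lines of awk. Below is a minimal sketch, not the program we build in this article. It assumes single-word player names (so $6 is the killer and $8 the victim) and lists only players who appear in kill lines. The helper name kill_scores and the sample lines are made up for illustration. It sticks to portable awk, so gawk runs it too.

```shell
# Hypothetical helper: per-player kill counts following the rules above.
# Assumes single-word names, so $6 is the killer and $8 the victim.
kill_scores() {
    awk '
    / killed / {
        killer = $6
        victim = $8
        if (killer == "<world>") {
            score[victim]--        # <world> is not a player: the victim loses a point
        } else {
            score[killer]++        # a regular kill adds a point to the killer
            if (!(victim in score))
                score[victim] = 0  # still list victims without kills
        }
    }
    END { for (p in score) print p, score[p] }
    ' | sort                       # for-in order is unspecified, so sort the result
}

# Usage with made-up lines in the log format:
printf '%s\n' \
    '20:54 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT' \
    '21:07 Kill: 2 3 7: Isgalamido killed Mocinha by MOD_ROCKET_SPLASH' \
    '21:10 Kill: 2 3 7: Isgalamido killed Mocinha by MOD_ROCKET_SPLASH' \
  | kill_scores
# Isgalamido 1
# Mocinha 0
```

Note that <world> never gets its own entry: it only ever indexes the victim's score.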
Working with files is routine for any developer. Using awk for it, not so much, even though it is IMHO one of the best tools for the job:
- The language is built for (1) text matching and (2) manipulation.
- Working with very large files is as easy as working with small ones.
To spread knowledge of the tool, let's solve the challenge with awk and see how you can start using it in your workflow today. I assume you know at least one programming language well, know your way around a *nix CLI, and are using GNU awk (gawk).
The beginning of a not-so-usual program
As is common with other Unix tools, it is better to break the program into smaller pieces: awk programs longer than ~150 lines are hard to maintain.
Here are the programs we are going to create:
- clean.awk will read the input files (the original logs) and output a cleaner version of their content, containing just the data we need to manipulate.
- scoreboard.awk will use the output of the previous program to produce the scoreboard for each game.
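Splitting the work in two is a plain Unix pipeline: the first stage filters and reshapes, the second aggregates. The sketch below imitates that shape with two inline stand-ins. The names clean_stage and scoreboard_stage, the "game"/"kill" intermediate format, the per-game kill totals and the trimmed sample lines are all assumptions for illustration, not the clean.awk and scoreboard.awk built in this article. Only portable awk is used, so gawk runs it too.

```shell
# Hypothetical stage 1: keep only match starts and kills, in a compact form.
clean_stage() {
    awk '
    /InitGame:/ { print "game" }          # a new match starts
    / Kill: /   { print "kill", $6, $8 }  # killer and victim (single-word names)
    '
}

# Hypothetical stage 2: aggregate the compact records per match.
scoreboard_stage() {
    awk '
    $1 == "game" { game++ }
    $1 == "kill" { kills[game]++ }
    END {
        for (g = 1; g <= game; g++)
            print "game_" g, "kills:", kills[g] + 0   # + 0 prints 0, not an empty string
    }
    '
}

# Usage with made-up, trimmed lines in the log format:
printf '%s\n' \
    ' 0:00 InitGame: (settings omitted)' \
    ' 0:05 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT' \
    ' 1:47 InitGame: (settings omitted)' \
    ' 1:50 Kill: 2 4 6: Isgalamido killed Zeh by MOD_ROCKET' \
  | clean_stage | scoreboard_stage
# game_1 kills: 1
# game_2 kills: 1
```

Because the stages only talk through text, each one can be run and debugged on its own, which is the point of keeping the programs small.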
Let's create a walking skeleton to run and debug our progress while tackling the challenge:
$ mkdir /tmp/awk-quake
$ cd !$
$ curl --remote-name -L https://gist.githubusercontent.com/cloudwalk-tests/be1b636e58abff14088c8b5309f575d8/raw/df6ef4a9c0b326ce3760233ef24ae8bfa8e33940/qgames.log
$ tail qgames.log
13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET
13:55 Exit: Fraglimit hit.
13:55 score: 20 ping: 8 client: 3 Oootsimo
13:55 score: 19 ping: 14 client: 6 Zeh
13:55 score: 17 ping: 1 client: 2 Isgalamido
13:55 score: 13 ping: 0 client: 5 Assasinu Credi
13:55 score: 10 ping: 8 client: 4 Dono da Bola
13:55 score: 6 ping: 19 client: 7 Mal
14:11 ShutdownGame:
14:11 ------------------------------------------------------------
$ cat clean.awk
{ print }
$ watch gawk -f clean.awk qgames.log
Above we:
- Downloaded qgames.log
- Created clean.awk, which prints everything passed to it
- Executed the program every couple of seconds (with watch) to see its result while we change it in another session (to stop watch, use CTRL-C)
Let's change clean.awk to keep only the lines useful to us and help us debug what to do with them:
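BEGIN {
    FS = " "    # field separator; a single space means "runs of blanks"
    RS = "\n"   # record separator: one record per line
}

/Init/ { print }
/kill/ { debug_fields() }

# i is listed as an extra parameter so it stays local to the function
function debug_fields(    i)
{
    for (i = 1; i <= NF; i++) {
        printf("%d: %s\n", i, $i)
    }
}
Don't despair yet: what we are doing is pretty simple.
- BEGIN is a special pattern whose action runs once, before any input is read. We use it to (re-)define some special variables (both values below are already awk's defaults; setting them just makes them explicit):
  - FS defines the field separator. A single space is special: each line is split on runs of spaces and tabs, and leading and trailing blanks are ignored. The resulting pieces are the fields $1, $2, and so on.
  - RS defines the record separator (newline): everything up to that character is treated as one record, which here means one line.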
- /match/ { action } rules run the action for every line where match, a regular expression, is found:
  - /Init/ { print } prints every line containing Init, unchanged.
  - /kill/ { debug_fields() } calls the debug_fields() function for every line containing kill. Matching is case-sensitive, so it is the lowercase killed in kill lines that matches here, not Kill:.
  - Every line that doesn't match the rules above is ignored.
- function debug_fields() prints every field obtained by splitting the line with FS:
  - NF is a special variable containing the number of fields in the current line.
  - $n is the nth field. Inside the loop, $i becomes $1, $2, $3 and so on, letting us retrieve the contents of every field on the line and display something like:
    1: 20:54
    2: Kill:
    3: 1022
    4: 2
    5: 22:
    6: <world>
    7: killed
    8: Isgalamido
    9: by
    10: MOD_TRIGGER_HURT
    This output shows which pieces of the current line we can work with. Try replacing the debug_fields() action with print $6 " killed " $8.
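To see what that suggested action prints without watching the whole log, here is a small sketch that runs it on two lines in the format shown above: the line behind the debug output and a kill line from the tail of qgames.log. The helper name who_killed_whom is made up; the awk program is exactly the suggested /kill/ rule.

```shell
# Hypothetical helper running the suggested action: killer in $6, victim in $8.
who_killed_whom() {
    awk '/kill/ { print $6 " killed " $8 }'
}

printf '%s\n' \
    '20:54 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT' \
    '13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET' \
  | who_killed_whom
# <world> killed Isgalamido
# Oootsimo killed Dono
```

The second line already shows where fixed field numbers fall short: the victim's name is cut at its first space.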
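With small changes, we can use $6 (killer) and $8 (victim) to display who killed whom, which is nearly everything we need. One caveat: names containing spaces span several fields. For a line like "Oootsimo killed Dono da Bola", $8 holds only Dono, and when the killer has a multi-word name, $8 is not the victim at all.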
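One way around multi-word names is to stop relying on fixed field numbers: strip the fixed prefix (time, Kill: and the three ids), take the cause from $NF, and split what is left on " killed ". The sketch below does that with the portable sub(), index() and substr() built-ins, so gawk runs it too. The helper name kill_records and the tab-separated output are assumptions for illustration, and it assumes no player name contains " killed ".

```shell
# Hypothetical helper: print killer, victim and cause, tab-separated,
# for every kill line, keeping multi-word player names intact.
kill_records() {
    awk '
    / Kill: / {
        line = $0
        cause = $NF                                   # e.g. MOD_ROCKET
        sub(/^ *[0-9]+:[0-9]+ Kill: [0-9]+ [0-9]+ [0-9]+: /, "", line)  # drop time and ids
        sub(/ by [^ ]+$/, "", line)                   # drop the trailing " by CAUSE"
        k = index(line, " killed ")
        if (k == 0) next                              # not a kill line we understand
        killer = substr(line, 1, k - 1)
        victim = substr(line, k + length(" killed "))
        print killer "\t" victim "\t" cause
    }
    '
}

printf '%s\n' \
    '13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET' \
    '20:54 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT' \
  | kill_records
# prints two tab-separated records:
# Oootsimo, Dono da Bola, MOD_ROCKET
# <world>, Isgalamido, MOD_TRIGGER_HURT
```

Tabs are used as separators because they keep the spaces inside names intact, so a later stage can read the fields back with awk -F '\t'. The third column is also what a death cause report would group on.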