# Tsonnet #15 - Teaching Tsonnet to remember: adding variables
Welcome to the Tsonnet series!

If you're just joining, you can check out how it all started in the first post of the series.

In the previous post, we added parsing error tracing. Now we are prepared to expand the syntax and add other constructs to augment the language.

It's been a while since the last post, but development didn't halt. Let's resume by adding one special feature: variables -- named `local` in Jsonnet. This will be a nice addition that paves the way for more interesting stuff.
## Let's get started with the AST
Naming is hard, and I felt that the AST `value` type wasn't expressing it properly. There were things being dealt with as values, like `BinOp`. They are not really values, but expressions. So I opted to represent everything as an `expr`, as we had at the beginning.
Since locals can be declared in multiple ways, we need to make the AST flexible enough to support:

- Single declaration:

  ```jsonnet
  // samples/variables/single.jsonnet
  local answer = 42;
  answer
  ```

- Multiple declarations, inline:

  ```jsonnet
  // samples/variables/inline_multiple.jsonnet
  local a = 1, b = 2, c = a + b;
  c
  ```

- Multiple declarations, multiline:

  ```jsonnet
  // samples/variables/multiline_multiple.jsonnet
  local a = 1;
  local b = 2;
  local c = a + b;
  c
  ```

- Scoped declarations:

  ```jsonnet
  // samples/variables/scoped.jsonnet
  local a = -1;
  local b = (local a = 42; a);
  b
  ```
Plus, we have been dealing with a single `expr` in our programs, and this new set of requirements asks for a sequence of `expr`. Besides that, we need a new `Local` variant to implement locals.

The updated `expr` type looks like this:
```ocaml
(* lib/ast.ml *)
type expr =
  | Null of position
  | Number of position * number
  | Bool of position * bool
  | String of position * string
  | Ident of position * string
  | Array of position * expr list
  | Object of position * (string * expr) list
  | BinOp of position * bin_op * expr * expr
  | UnaryOp of position * unary_op * expr
  | Local of position * (string * expr) list
  | Unit
  | Seq of expr list

let dummy_pos = {
  startpos = Lexing.dummy_pos;
  endpos = Lexing.dummy_pos;
}
```
`Local` will store a list of identifiers and their respective expressions. OCaml is expressive enough to allow us to have recursive types, which pair well with pattern matching.

The `dummy_pos` will come in handy later. It's a fake position for cases when this value doesn't matter.
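To make the new variants concrete, here's how the single-declaration sample could be represented as a value of this type (a hand-written sketch, with positions elided via `dummy_pos`):

```ocaml
(* A sketch of "local answer = 42; answer" as an expr value.
   The binding is stored as-is (unevaluated); positions use dummy_pos. *)
let example : expr =
  Seq [
    Local (dummy_pos, [ ("answer", Number (dummy_pos, Int 42)) ]);
    Ident (dummy_pos, "answer");
  ]
```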
## Lexing local(s)
The lexer change is trivial -- just 3 more tokens:
```diff
diff --git a/lib/lexer.mll b/lib/lexer.mll
index 280e0eb..49e43af 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -43,6 +43,9 @@ rule read =
 | '/' { DIVIDE }
 | '!' { NOT }
 | '~' { BITWISE_NOT }
+ | '=' { ASSIGN }
+ | ';' { SEMICOLON }
+ | "local" { LOCAL }
 | id { ID (Lexing.lexeme lexbuf) }
 | _ { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
 | eof { EOF }
```
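One detail worth calling out: in ocamllex, ties between rules matching the same length are broken by order, so the `"local"` rule must come before the generic `id` rule -- otherwise the keyword would lex as `ID "local"`. As a rough illustration (a hand-written list, not actual lexer output), the single-declaration sample produces a token stream like:

```ocaml
(* Illustrative token stream for "local answer = 42; answer". *)
let tokens =
  [ LOCAL; ID "answer"; ASSIGN; INT 42; SEMICOLON; ID "answer"; EOF ]
```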
## Oh My Parser!
The parser was the one with the most changes. Check it out in its entirety (I explain below):
```ocaml
(* lib/parser.mly *)
%{
[@@@coverage exclude_file]

open Ast

let with_pos startpos endpos = {
  startpos = startpos;
  endpos = endpos;
}
%}

%token <int> INT
%token <float> FLOAT
%token NULL
%token <bool> BOOL
%token <string> STRING
%token LEFT_SQR_BRACKET RIGHT_SQR_BRACKET
%token LEFT_PAREN RIGHT_PAREN
%token COMMA
%token LEFT_CURLY_BRACKET RIGHT_CURLY_BRACKET
%token COLON
%token PLUS MINUS MULTIPLY DIVIDE
%left PLUS MINUS
%left MULTIPLY DIVIDE
%token <string> ID
%token NOT BITWISE_NOT
%left NOT BITWISE_NOT
%token SEMICOLON
%token LOCAL
%token ASSIGN
%token EOF

%start <Ast.expr> prog

%%

prog:
  | e = expr; EOF { e }
  | e = expr_seq; EOF { e }
  ;

expr:
  | e = assignable_expr { e }
  | e = vars { e }
  ;

expr_seq:
  e = expr; SEMICOLON; rest = separated_list(SEMICOLON, expr) { Seq (e :: rest) };

assignable_expr:
  | e = scoped_expr { e }
  | e = literal { e }
  | e1 = assignable_expr; op = bin_op; e2 = assignable_expr { BinOp (with_pos $startpos $endpos, op, e1, e2) }
  | op = unary_op; e = assignable_expr; { UnaryOp (with_pos $startpos $endpos, op, e) }
  ;

scoped_expr:
  | LEFT_PAREN; e = expr; RIGHT_PAREN { e }
  | LEFT_PAREN; e = expr_seq; RIGHT_PAREN { e }
  ;

literal:
  | n = number { Number (with_pos $startpos $endpos, n) }
  | NULL { Null (with_pos $startpos $endpos) }
  | b = BOOL { Bool (with_pos $startpos $endpos, b) }
  | s = STRING { String (with_pos $startpos $endpos, s) }
  | id = ID { Ident (with_pos $startpos $endpos, id) }
  | LEFT_SQR_BRACKET; values = list_fields; RIGHT_SQR_BRACKET { Array (with_pos $startpos $endpos, values) }
  | LEFT_CURLY_BRACKET; attrs = obj_fields; RIGHT_CURLY_BRACKET { Object (with_pos $startpos $endpos, attrs) }
  ;

list_fields:
  vl = separated_list(COMMA, assignable_expr) { vl };

obj_field:
  | k = STRING; COLON; e = assignable_expr { (k, e) }
  | k = ID; COLON; e = assignable_expr { (k, e) }
  ;

obj_fields:
  obj = separated_list(COMMA, obj_field) { obj };

%inline number:
  | i = INT { Int i }
  | f = FLOAT { Float f }
  ;

%inline bin_op:
  | PLUS { Add }
  | MINUS { Subtract }
  | MULTIPLY { Multiply }
  | DIVIDE { Divide }
  ;

%inline unary_op:
  | PLUS { Plus }
  | MINUS { Minus }
  | NOT { Not }
  | BITWISE_NOT { BitwiseNot }
  ;

var:
  varname = ID; ASSIGN; e = assignable_expr { (varname, e) };

vars:
  LOCAL; vars = separated_nonempty_list(COMMA, var) { Local (with_pos $startpos $endpos, vars) };
```
The parser added the new tokens, but it got revamped in the process:

- The `prog` rule now accepts an `expr_seq`, since a program is composed of one or more sequential expressions.
- Rules such as `number`, `bin_op`, and `unary_op` will be inlined where they are used. This helps avoid shift/reduce ambiguities in certain cases.
- The `literal` rule is just nice to have, as it helps with readability.
- The new `scoped_expr` rule takes care of expressions or expression sequences wrapped in parentheses -- a requirement for handling local and object attribute assignments.
- Now we also need to differentiate `assignable_expr` from `expr`: variable declarations cannot appear on the right-hand side of an assignment unless they are scoped. The use of recursive rule calls makes the grammar quite expressive and easy to handle (see the sketch after this list).
- Last but not least, the `var` and `vars` rules take care of parsing locals.
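To illustrate the `assignable_expr` vs. `expr` split, here's roughly the AST this grammar builds for the scoped sample (a hand-written sketch, not actual parser output):

```ocaml
(* Sketch of "local a = -1; local b = (local a = 42; a); b".
   The parenthesized sequence reduces through scoped_expr, which is why it may
   appear on the right-hand side of an assignment; a bare local there cannot. *)
let scoped : expr =
  Seq [
    Local (dummy_pos, [ ("a", UnaryOp (dummy_pos, Minus, Number (dummy_pos, Int 1))) ]);
    Local (dummy_pos, [
      ("b", Seq [
        Local (dummy_pos, [ ("a", Number (dummy_pos, Int 42)) ]);
        Ident (dummy_pos, "a");
      ]);
    ]);
    Ident (dummy_pos, "b");
  ]
```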
## Evaluating the environment

The interpreter needs to be tackled in parts. It's a big one.
The entry point (the `run` function) needs to initialize an empty global environment that will hold the variables during interpretation:
```ocaml
let run (filename: string) : (string, string) result =
  let env = Env.Map.empty in
  parse filename >>= interpret env >>= fun (_env, expr) -> Json.expr_to_string expr
```
Each recursive call inherits its local `env`.

Here's the `Env` module implementation:
```ocaml
(* lib/env.ml *)
module Env = struct
  type t = string
  let compare = String.compare
end

module Map = Map.Make(Env)

let empty = Map.empty
let keys env = Map.fold (fun k _ acc -> k :: acc) env []
```
It implements a dictionary using `Map` -- carefully chosen for its immutability. Since we are going to pass this environment around, that's an important trait to have. If we updated the environment in place -- say, using `Hashtbl` -- we would need to undo changes as we leave a scope, which complicates the implementation and increases the likelihood of bugs A LOT. Immutability spares us all of that. Neat!
Fun fact: functional programming languages tend to implement immutable data structures as persistent data structures. Since values are (generally) immutable by default, it is safe to share references between versions. This comes with some nice properties, like thread safety and easier reasoning, to name a few.
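Here's a tiny sketch of what that buys us (using the `Env` module above; the assertion is just illustrative):

```ocaml
(* Env.Map.add returns a *new* map and leaves the original untouched, which is
   exactly what lets an inner scope shadow "a" without any undo logic. *)
let outer = Env.Map.add "a" (Number (dummy_pos, Int (-1))) Env.Map.empty
let inner = Env.Map.add "a" (Number (dummy_pos, Int 42)) outer
let () = assert (Env.Map.find "a" outer <> Env.Map.find "a" inner)
```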
Let's get back to the interpreter. Here's the meatier `interpret` function (split into parts for ease of reasoning):
```ocaml
(** [interpret env expr] interprets and reduces the intermediate AST [expr] into a result AST. *)
let rec interpret env expr =
  match expr with
  | Null _ | Bool _ | String _ | Number _ | Array _ | Object _ -> ok (env, expr)
  | Ident (pos, varname) ->
    (match Env.Map.find_opt varname env with
     | Some expr -> interpret env expr
     | None -> Error.trace ("Undefined variable: " ^ varname) pos >>= error)
```
When evaluating `Ident`, we look up the name in the local environment. If found, we retrieve and `interpret` it. If not found, we raise an undefined variable error.
```ocaml
  | BinOp (pos, op, e1, e2) ->
    (let* (_, e1') = interpret env e1 in
     let* (_, e2') = interpret env e2 in
     match op, e1', e2' with
     | Add, (String _ as v1), (_ as v2) | Add, (_ as v1), (String _ as v2) ->
       let* expr' = interpret_concat_op v1 v2 in
       ok (env, expr')
     | _, Number (pos, v1), Number (_, v2) ->
       ok (env, Number (pos, interpret_arith_op op v1 v2))
     | _ ->
       Error.trace "Invalid binary operation" pos >>= error)
  | UnaryOp (pos, op, expr) ->
    (let* (env', expr') = interpret env expr in
     Result.fold (interpret_unary_op op expr')
       ~ok:(fun expr' -> ok (env', expr'))
       ~error:(fun errmsg -> Error.trace errmsg pos >>= error))
```
Here's the new entry: the `Local` case. Since it holds a list of variables, we iterate over each entry and add its `expr` to the environment:
```ocaml
  | Local (_, vars) ->
    let acc_fun env (varname, expr) = Env.Map.add varname expr env in
    let env' = List.fold_left acc_fun env vars in
    ok (env', Unit)
```
The semantics here are important. Each item we iterate over accumulates into a new `env`, which is then passed to the next one, adding variables to the scope.

We need to be attentive here to one aspect of Jsonnet: it's lazy, so the `expr` is added to the environment unevaluated. It will be evaluated if and only if it's eventually needed to produce a result. This is an important characteristic that we need to comply with.

In the end, the accumulated environment is returned, and the program continues.
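A quick way to see the laziness (a hand-built sketch, not a test from the repo): a binding whose body would fail to resolve is harmless as long as it is never referenced.

```ocaml
(* "local boom = nope; 1" -- the Ident "nope" is undefined, but since "boom"
   is never looked up, the unevaluated binding never triggers the error. *)
let lazy_ok : expr =
  Seq [
    Local (dummy_pos, [ ("boom", Ident (dummy_pos, "nope")) ]);
    Number (dummy_pos, Int 1);
  ]
(* interpret Env.Map.empty lazy_ok should yield Ok (_, Number (_, Int 1)). *)
```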
We have two other new cases to match:

- `Unit` simply returns itself -- nothing to do.
- Sequences will be interpreted one by one till they end.
```ocaml
  | Unit -> ok (env, Unit)
  | Seq exprs ->
    (match exprs with
     | [] -> ok (env, Unit)
     | [expr] -> interpret env expr
     | (expr :: exprs) -> interpret env expr >>= fun (env', _) -> interpret env' (Seq exprs))
```
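Note how the updated `env'` is threaded into the rest of the sequence. A sketch of what that enables (the expected result is in the trailing comment):

```ocaml
(* "local a = 1; local b = 2; a + b" -- each Local extends the environment
   that the next expression in the sequence is interpreted under. *)
let threaded : expr =
  Seq [
    Local (dummy_pos, [ ("a", Number (dummy_pos, Int 1)) ]);
    Local (dummy_pos, [ ("b", Number (dummy_pos, Int 2)) ]);
    BinOp (dummy_pos, Add, Ident (dummy_pos, "a"), Ident (dummy_pos, "b"));
  ]
(* interpret Env.Map.empty threaded should yield Ok (_, Number (_, Int 3)). *)
```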
The function `interpret_arith_op` is omitted -- it simply does arithmetic operations as before.

In `interpret_concat_op`, we don't need to do any crazy `expr` tricks when concatenating strings anymore:
```ocaml
let interpret_concat_op (e1 : expr) (e2 : expr) : (expr, string) result =
  match e1, e2 with
  | String (_, s1), String (_, s2) ->
    ok (String (dummy_pos, s1^s2))
  | String (_, s1), val2 ->
    let* s2 = Json.expr_to_string val2 in ok (String (dummy_pos, s1^s2))
  | val1, String (_, s2) ->
    let* s1 = Json.expr_to_string val1 in ok (String (dummy_pos, s1^s2))
  | _ ->
    error "Invalid string concatenation operation"
```
The resulting string has no position in the source code, so `dummy_pos` it is. Another possibility could be having a type to represent an evaluated string, but we are going to be just fine with it as is -- it's simpler for now.
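For instance, a string plus a number takes the second branch (a sketch, assuming `Json.expr_to_string` renders `Number (_, Int 42)` as `42`):

```ocaml
(* Mixed operands route through Json.expr_to_string before concatenating. *)
let greeting =
  interpret_concat_op (String (dummy_pos, "n = ")) (Number (dummy_pos, Int 42))
(* greeting should be Ok (String (dummy_pos, "n = 42")). *)
```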
## We can't forget the tests
Here's a new cram test file diff with the new samples and their results:
```diff
diff --git a/samples/variables/inline_multiple.jsonnet b/samples/variables/inline_multiple.jsonnet
new file mode 100644
index 0000000..8ad1c34
--- /dev/null
+++ b/samples/variables/inline_multiple.jsonnet
@@ -0,0 +1,2 @@
+local a = 1, b = 2, c = a + b;
+c
diff --git a/samples/variables/multiline_multiple.jsonnet b/samples/variables/multiline_multiple.jsonnet
new file mode 100644
index 0000000..86b94a7
--- /dev/null
+++ b/samples/variables/multiline_multiple.jsonnet
@@ -0,0 +1,4 @@
+local a = 1;
+local b = 2;
+local c = a + b;
+c
diff --git a/samples/variables/scoped.jsonnet b/samples/variables/scoped.jsonnet
new file mode 100644
index 0000000..6408281
--- /dev/null
+++ b/samples/variables/scoped.jsonnet
@@ -0,0 +1,3 @@
+local a = -1;
+local b = (local a = 42; a);
+b
diff --git a/samples/variables/single.jsonnet b/samples/variables/single.jsonnet
new file mode 100644
index 0000000..ea998bb
--- /dev/null
+++ b/samples/variables/single.jsonnet
@@ -0,0 +1,2 @@
+local answer = 42;
+answer
diff --git a/test/cram/variables.t b/test/cram/variables.t
new file mode 100644
index 0000000..b9f54f1
--- /dev/null
+++ b/test/cram/variables.t
@@ -0,0 +1,11 @@
+  $ tsonnet ../../samples/variables/single.jsonnet
+  42
+
+  $ tsonnet ../../samples/variables/inline_multiple.jsonnet
+  3
+
+  $ tsonnet ../../samples/variables/multiline_multiple.jsonnet
+  3
+
+  $ tsonnet ../../samples/variables/scoped.jsonnet
+  42
```
We can't get away without testing mistyped locals, so here's a sample and its cram test:
```diff
diff --git a/samples/errors/unscoped_local.jsonnet b/samples/errors/unscoped_local.jsonnet
new file mode 100644
index 0000000..e27a568
--- /dev/null
+++ b/samples/errors/unscoped_local.jsonnet
@@ -0,0 +1,2 @@
+local a = local b = 1;
+a
diff --git a/test/cram/errors.t b/test/cram/errors.t
index d1be504..7b2e2ab 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -29,3 +29,11 @@
   3: 3+3+3+false+
      ^^^^^^^^^^^^
   [1]
+
+  $ tsonnet ../../samples/errors/unscoped_local.jsonnet
+  ../../samples/errors/unscoped_local.jsonnet:1:15 Invalid syntax
+
+  1: local a = local b = 1;
+     ^^^^^^^^^^^^^^^^
+  [1]
+
```
## Menhir explain and grammar ambiguities
Menhir has a great feature that can be enabled with the `--explain` flag when using its standalone CLI, rather than integrated with dune (like we've been using). It outputs debugging information, especially about ambiguities in the grammar, which I used extensively while adding locals. So much so that it gained its own place in the Makefile:
```diff
diff --git a/Makefile b/Makefile
index 2285130..302068b 100644
--- a/Makefile
+++ b/Makefile
@@ -17,3 +17,7 @@ clean:
 .PHONY: prepare-dev-setup
 prepare-dev-setup:
 	mise install
+
+.PHONY: explain
+explain:
+	menhir --explain lib/parser.mly
```
## Conclusion
Phew, what a big write-up! Bindings are stepping stones in any programming language, so they deserve careful attention to get them right!
The diff in its entirety can be seen here, if you want to follow the precise change.
I still want to write about late binding, but I'll leave it to another post.
Subscribe to Bit Maybe Wise and give your JSON some local variables to play with. We promise our posts are lazily evaluated but eagerly delivered!
Photo by Namroud Gorguis on Unsplash