Dev.to

Verifying two interpreter engines with one test suite


May 5, 2025 - 12:45


Crosscheck, my cross-engine testing framework for Memphis, had been sitting atop my writing to-do list for a while.

Instead of writing about it, I deleted it and replaced it. Memphis can now test itself across all engines.

What does this mean?

I can verify that the treewalk interpreter and the bytecode VM produce the same results (a return value or symbol table entries) for a given snippet of Python.
This functionality is available in unit tests, rather than only in integration tests. In Rust, this means in src/ rather than tests/.

Let’s look at a couple of examples.

Testing expressions

Any interpreter worth its table salt should be able to evaluate 2 + 2.

[An expression returns a value, whereas a statement modifies control flow or the symbol table. I couldn’t have defined this clearly before I began this project, so I hope this helps.]

#[test]
fn binary_expression() {
    let input = "2 + 2";
    assert_crosscheck_return!(input, MemphisValue::Integer(4));
}

A few building blocks were necessary to get here.

First, I needed to represent a Python value independent of the runtime engine. This isn’t exactly necessary for integers, but come lists or objects, my architecture needed each engine to be able to manage interior mutability differently.
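To make the first building block concrete, here is a minimal sketch of an engine-independent value. Only the names MemphisValue and MemphisValue::Integer come from this post; the List variant, TreewalkValue, VmValue, and VmHeap are hypothetical stand-ins I made up to show two engines owning list data differently while converting to the same plain value.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Engine-independent value: plain owned data, no shared pointers.
#[derive(Debug, Clone, PartialEq)]
pub enum MemphisValue {
    Integer(i64),
    List(Vec<MemphisValue>), // hypothetical variant, for illustration
}

/// Hypothetical treewalk value: lists are shared and mutated in place.
#[derive(Debug, Clone)]
pub enum TreewalkValue {
    Integer(i64),
    List(Rc<RefCell<Vec<TreewalkValue>>>),
}

/// Hypothetical VM value: lists live in a heap owned by the VM and are
/// referred to by index.
#[derive(Debug, Clone)]
pub enum VmValue {
    Integer(i64),
    List(usize),
}

/// Hypothetical VM heap holding the list storage.
pub struct VmHeap {
    pub lists: Vec<Vec<VmValue>>,
}

impl From<&TreewalkValue> for MemphisValue {
    fn from(value: &TreewalkValue) -> Self {
        match value {
            TreewalkValue::Integer(i) => MemphisValue::Integer(*i),
            // Borrow the shared list and copy its contents out.
            TreewalkValue::List(items) => {
                MemphisValue::List(items.borrow().iter().map(MemphisValue::from).collect())
            }
        }
    }
}

impl VmHeap {
    /// Resolve heap indices and copy the contents out.
    pub fn to_memphis(&self, value: &VmValue) -> MemphisValue {
        match value {
            VmValue::Integer(i) => MemphisValue::Integer(*i),
            VmValue::List(idx) => {
                MemphisValue::List(self.lists[*idx].iter().map(|v| self.to_memphis(v)).collect())
            }
        }
    }
}
```

Once both engines can produce the same plain type, comparing them is just assert_eq! on two MemphisValue instances.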

Second, I needed a simple initialization flow, which would hide the complexity of initializing two interpreter engines and running a snippet of code through both. I’d had a few builders over time, but they always did too much. I wanted to truly hide all this behind assert_crosscheck_return!. Speaking of which, let’s look inside.

macro_rules! assert_crosscheck_return {
    ($src:expr, $expected:expr) => {{
        let mut session =
            $crate::crosscheck::CrosscheckSession::new($crate::domain::Source::from_text($src));
        let (tw_val, vm_val) = session.eval();
        assert_eq!(
            tw_val, $expected,
            "Treewalk return value did not match expected"
        );
        assert_eq!(
            vm_val, $expected,
            "Bytecode VM return value did not match expected"
        );
    }};
}

Awesome! It calls something else. Sounds and reads like software.

It’s worth mentioning here why assert_crosscheck_return is a macro rather than a helper function. When an assertion fails inside a macro, the panic message reports the line in your test file where the macro was invoked, not a line inside the macro definition. That’s because Rust expands macros at the call site during compilation. Some of my macros call helper functions under the hood, but having the assertions at the macro level makes debugging way simpler.
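Here is a small sketch of that call-site behavior, separate from Memphis. All the names are made up for illustration. It uses line!() rather than a real panic, since the standard assertion macros take their reported location from the same place. The third helper shows #[track_caller], a standard attribute that lets a plain function report its caller’s location too; I’m not claiming Memphis uses it.

```rust
use std::panic::Location;

/// Expands at the call site, so `line!()` resolves to the line where the
/// macro was invoked.
macro_rules! line_via_macro {
    () => {
        line!()
    };
}

/// A plain function: `line!()` always resolves to the line inside this body.
fn line_via_function() -> u32 {
    line!()
}

/// A function can opt in to call-site locations with `#[track_caller]`.
#[track_caller]
fn line_via_track_caller() -> u32 {
    Location::caller().line()
}

/// Call each helper from two different lines and collect what they report.
fn call_site_demo() -> [(u32, u32); 3] {
    let m1 = line_via_macro!();
    let m2 = line_via_macro!();
    let f1 = line_via_function();
    let f2 = line_via_function();
    let t1 = line_via_track_caller();
    let t2 = line_via_track_caller();
    [(m1, m2), (f1, f2), (t1, t2)]
}
```

The macro and the #[track_caller] function report two different lines, while the plain function reports the same line both times. That last case is what a failing assertion buried in an ordinary helper would look like.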

We create a CrosscheckSession, which creates two MemphisContext objects. Since this is test-only, I don’t mind loudly failing with expect inside of eval.

pub struct CrosscheckSession {
    treewalk: MemphisContext,
    vm: MemphisContext,
}

impl CrosscheckSession {
    pub fn new(source: Source) -> Self {
        let treewalk = MemphisContext::new(Engine::Treewalk, source.clone());
        let vm = MemphisContext::new(Engine::BytecodeVm, source);
        Self { treewalk, vm }
    }

    /// Run both engines and return both values (treewalk first, then VM) so the caller
    /// can check each against the expected value. Useful for evaluating expressions or
    /// statements which only return a single value.
    pub fn eval(&mut self) -> (MemphisValue, MemphisValue) {
        let tw_val = self.treewalk.run().expect("Treewalk run failed.");
        let vm_val = self.vm.run().expect("VM run failed.");

        (tw_val, vm_val)
    }
}

Each MemphisContext manages the lexer/parser/interpreter lifetime for a single Engine.

pub struct MemphisContext {
    lexer: Lexer,
    interpreter: Box<dyn Interpreter>,
}

impl MemphisContext {
    pub fn new(engine: Engine, source: Source) -> Self {
        let lexer = Lexer::new(&source);
        let interpreter = init_interpreter(engine, source);

        Self { lexer, interpreter }
    }

    pub fn run(&mut self) -> Result<MemphisValue, MemphisError> {
        let MemphisContext {
            lexer, interpreter, ..
        } = self;

        let mut parser = Parser::new(lexer);
        interpreter.run(&mut parser)
    }
}
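If the Box<dyn Interpreter> plus init_interpreter shape is unfamiliar, here is a toy version that compiles on its own. Everything in it is a simplification I made up for illustration: the engines only understand integer addition, run takes the source string directly instead of a parser, and the return type is a bare i64. The real Memphis types are richer.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Engine {
    Treewalk,
    BytecodeVm,
}

/// Common interface for both toy engines (hypothetical signature).
pub trait Interpreter {
    fn run(&mut self, source: &str) -> Result<i64, String>;
}

/// Parse "a + b + c" into its integer operands.
fn parse_operands(source: &str) -> Result<Vec<i64>, String> {
    source
        .split('+')
        .map(|term| {
            term.trim()
                .parse::<i64>()
                .map_err(|e| format!("bad operand {:?}: {}", term, e))
        })
        .collect()
}

/// Toy "treewalk" engine: evaluates the operands directly.
struct ToyTreewalk;

impl Interpreter for ToyTreewalk {
    fn run(&mut self, source: &str) -> Result<i64, String> {
        Ok(parse_operands(source)?.iter().sum::<i64>())
    }
}

/// Toy "VM" engine: compiles to stack ops, then executes them.
enum Op {
    Push(i64),
    Add,
}

struct ToyVm {
    stack: Vec<i64>,
}

impl Interpreter for ToyVm {
    fn run(&mut self, source: &str) -> Result<i64, String> {
        let mut code = Vec::new();
        for (i, n) in parse_operands(source)?.into_iter().enumerate() {
            code.push(Op::Push(n));
            if i > 0 {
                code.push(Op::Add);
            }
        }
        self.stack.clear();
        for op in code {
            match op {
                Op::Push(n) => self.stack.push(n),
                Op::Add => {
                    let rhs = self.stack.pop().ok_or("stack underflow")?;
                    let lhs = self.stack.pop().ok_or("stack underflow")?;
                    self.stack.push(lhs + rhs);
                }
            }
        }
        self.stack.pop().ok_or_else(|| "empty stack".to_string())
    }
}

/// Pick the engine once; callers only ever see the trait object.
pub fn init_interpreter(engine: Engine) -> Box<dyn Interpreter> {
    match engine {
        Engine::Treewalk => Box::new(ToyTreewalk),
        Engine::BytecodeVm => Box::new(ToyVm { stack: Vec::new() }),
    }
}

/// Toy counterpart of MemphisContext: owns the source and one engine.
pub struct ToyContext {
    source: String,
    interpreter: Box<dyn Interpreter>,
}

impl ToyContext {
    pub fn new(engine: Engine, source: &str) -> Self {
        Self { source: source.to_string(), interpreter: init_interpreter(engine) }
    }

    pub fn run(&mut self) -> Result<i64, String> {
        self.interpreter.run(&self.source)
    }
}
```

The point of the shape is that nothing outside init_interpreter needs to know which engine it holds, so a session can own two contexts and treat them identically.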

Testing statements, classes, and more

Let’s look at a slightly more involved example, a test which defines a class with one attribute and a couple of methods.

#[test]
fn method_call() {
    let mut session = crosscheck_eval!(
        r#"
class Foo:
    def __init__(self, val):
        self.val = val

    def bar(self):
        return self.val

f = Foo(10)
b = f.bar()
"#
    );
    assert_crosscheck_eq!(session, "b", MemphisValue::Integer(10));
}

Our pair of new macros here lets us start a crosscheck session and then query it. Each session runs the provided code snippet through each engine, then stays alive so we can read from both symbol tables.

macro_rules! crosscheck_eval {
    ($src:expr) => {{
        $crate::crosscheck::CrosscheckSession::new($crate::domain::Source::from_text($src))
            .run()
            .expect("Crosscheck session failed")
    }};
}

macro_rules! assert_crosscheck_eq {
    ($session:expr, $name:expr, $expected:expr) => {{
        let actual = $session.read($name).expect("Symbol not found");
        assert_eq!(actual, $expected);
    }};
}

Let’s look at the two new CrosscheckSession methods, run and read. You’ll notice that inside the read method I break my own rule about keeping assertions at the macro level. I’ll put a ticket in the backlog to fix that later.

impl CrosscheckSession {
    /// Run both engines; discard the return value and return the session. Useful for 
    /// later reads from the symbol table with `CrosscheckSession::read`.
    pub fn run(mut self) -> Result<Self, MemphisError> {
        self.treewalk.run()?;
        self.vm.run()?;
        Ok(self)
    }

    /// Read a value from both engines; confirm they return the same value, then 
    /// return the value.
    pub fn read(&mut self, name: &str) -> Option<MemphisValue> {
        let a = self.treewalk.read(name)?;
        let b = self.vm.read(name)?;

        assert_eq!(a, b, "Engines returned different values");
        Some(a)
    }
}
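For what it’s worth, here is one way that backlog ticket could go. This is a sketch, not what Memphis does. The read method stops asserting and hands back both values, and the macro does all the checking. ToySession, read_both, and assert_both_eq! are hypothetical names, and the symbol tables are plain HashMaps standing in for real engines.

```rust
use std::collections::HashMap;

/// Stand-in for a session whose two engines have already run (hypothetical).
pub struct ToySession {
    treewalk: HashMap<String, i64>,
    vm: HashMap<String, i64>,
}

/// Build a symbol table from (name, value) pairs.
fn to_map(pairs: &[(&str, i64)]) -> HashMap<String, i64> {
    pairs.iter().map(|(name, value)| (name.to_string(), *value)).collect()
}

impl ToySession {
    pub fn new(treewalk: &[(&str, i64)], vm: &[(&str, i64)]) -> Self {
        Self { treewalk: to_map(treewalk), vm: to_map(vm) }
    }

    /// Return what each engine holds for `name`; no assertions in here.
    pub fn read_both(&self, name: &str) -> (Option<i64>, Option<i64>) {
        (self.treewalk.get(name).copied(), self.vm.get(name).copied())
    }
}

/// All assertions live in the macro, so a failure points at the test line.
macro_rules! assert_both_eq {
    ($session:expr, $name:expr, $expected:expr) => {{
        let (tw_val, vm_val) = $session.read_both($name);
        assert_eq!(tw_val, Some($expected), "Treewalk value for {:?} did not match", $name);
        assert_eq!(vm_val, Some($expected), "Bytecode VM value for {:?} did not match", $name);
    }};
}
```

A side effect of this shape is that a failure names the engine that diverged from the expected value, rather than only saying the two disagree.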

And that’s it! I have a few other macros and flows to validate an expected Python runtime error, but they follow the same ideas as what I’ve shown here.

Why did I delete the old `crosscheck`?

Because it sucked.

Okay, not really. It did its job for nearly a year! But it required a lot of boilerplate code. It also expected each test file to manually define which engines to verify (using TreewalkAdapter and BytecodeVmAdapter).

Have a look for yourself:

fn run_binary_expression_test<T: InterpreterTest>(mut interpreter: T) {
    let input = "2 + 2";
    match interpreter.evaluate(input) {
        Err(e) => panic!("Interpreter error: {:?}", e),
        Ok(result) => {
            assert_eq!(result, MemphisValue::Integer(4));
        }
    }
}

#[test]
fn test_treewalk_binary_expression() {
    run_binary_expression_test(TreewalkAdapter::new());
}

#[test]
fn test_bytecode_vm_binary_expression() {
    run_binary_expression_test(BytecodeVmAdapter::new());
}

Early-2024-me was pleased with this, but it is heavy.

I wrote the original as an integration test (where tests live in tests/ rather than src/) because I didn’t know what I was doing. Rust produces a separate binary for integration tests, which was unnecessary for verifying cross-engine behavior.

It also meant I was using memphis as a library — interesting on its own (a Python interpreter you can embed in other Rust code! cool!), but not what I needed here.
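To illustrate the difference, here is a minimal sketch of a unit test living next to the code in src/. The file path and function names are hypothetical. The test module can call a private helper directly, whereas a test in tests/ is compiled as a separate crate and only sees the public API.

```rust
// src/arith.rs (hypothetical module inside a library crate)

/// Private helper: not exported, so an integration test in tests/ could not
/// call it.
fn fold_add(operands: &[i64]) -> i64 {
    operands.iter().sum()
}

/// Public entry point: the only item here an integration test could reach.
pub fn eval_sum(source: &str) -> Option<i64> {
    let operands: Option<Vec<i64>> = source
        .split('+')
        .map(|term| term.trim().parse::<i64>().ok())
        .collect();
    operands.map(|ops| fold_add(&ops))
}

// Compiled only under `cargo test`; lives in the same crate as the code above.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn folds_with_private_helper() {
        assert_eq!(fold_add(&[2, 2]), 4);
    }

    #[test]
    fn evaluates_through_public_api() {
        assert_eq!(eval_sum("2 + 2"), Some(4));
    }
}
```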

The End

This work took roughly a year, though I took about 11 months off in the middle.

If the founder of the Single Responsibility Principle reads this, I hope they are proud of me. Their principle served as a useful North Star.

Along the way, I used MemphisContext to support both engines in the REPL. That’s also the reason the Lexer is kept alive but the Parser isn’t—to support streaming input. But that’s a story for another day.

I hope you are well and your code returns the same value every time!
