May 5, 2025 - 11:38
The Anatomy of Barriers

I've been working on migrating an Arm-based product to a different architecture. Throughout the process, I came across some lines of code with barriers. The challenge was that I couldn't fully understand or modify them properly without knowing exactly what those barriers were.

In this post, we'll take a closer look at the two major types of barriers you often encounter: compiler barriers and memory barriers.

Barriers?

A barrier (also known as a fence) is a mechanism for preventing memory operations from being reordered by compilers or CPUs. Modern processors and compilers often execute instructions out of order to improve performance, which can lead to unexpected behavior in concurrent or low-level programming. Barriers enforce strict ordering by ensuring that certain memory operations complete before others begin.

Compiler Barriers

A compiler barrier is an instruction or directive that tells the compiler:

Do not move memory operations across this point. Preserve the order of instructions as written.

It is important to understand:

  1. A compiler barrier only affects the compiler's optimization passes.
  2. It does not directly affect the CPU's execution or memory ordering. In other words, a compiler barrier controls the compiler, not the hardware.

Modern compilers are insanely aggressive. They assume that reordering memory accesses is fine as long as the program's single-threaded observable behavior (the as-if rule) doesn't change.

However, preserving the written order is a critical correctness requirement if you're writing:

  • Lock-free data structures
  • Spinlocks or mutexes
  • Hardware drivers
  • IPC code

Even a simple-looking optimization can break things:

*ptr = 1;
flag = 1;

If flag signals to another thread that *ptr is ready, but the compiler reorders these two stores, your system may crash or behave unpredictably: the other thread can observe flag set while *ptr has not yet been written.

You need a compiler barrier in that case to guarantee the order you wrote is the order that gets emitted in the machine code:

*ptr = 1;
asm volatile("" ::: "memory");
flag = 1;

The empty asm statement with a "memory" clobber tells the compiler that memory may be read or written in ways it cannot see, so it must not move memory accesses across that point.

Memory Barriers on Arm

When working with Arm processors, especially in multi-core or multi-threaded environments, memory consistency issues quickly become a real concern. Because Arm implements a weakly ordered memory model, the order in which memory operations appear to execute is not always the order you wrote in your code.

This can lead to subtle, hard-to-reproduce bugs unless you use memory barriers properly.

Modern CPUs like Arm prioritize performance. Hence, they allow memory accesses (loads and stores) to be reordered, delayed, and speculated.

For instance:

  • A store you wrote earlier might become visible to another core after a later store.
  • A load you wrote later might complete before an earlier store is visible.

This is usually harmless in single-threaded programs, but when multiple cores or devices are involved, it can break correctness.

Arm defines three main types of memory barrier instructions: dmb, dsb, and isb.

DMB (Data Memory Barrier)

  • Ensures that memory accesses before the dmb are globally observed before memory accesses after it.
  • Only affects memory accesses - instructions can still be fetched and decoded out of order.

Use cases:

  • Message passing: ensuring that data is visible before a flag is set.
  • Synchronizing shared variables across cores.
str r5, [r1]    ; write data
dmb             ; make sure the data is globally visible
str r0, [r2]    ; signal that data is ready

DSB (Data Synchronization Barrier)

  • Stronger than dmb.
  • Ensures that all memory accesses and side effects before the dsb are complete, and execution doesn't proceed until they are done.

Use cases:

  • Before entering low-power states (wfi and wfe).
  • Before sending interrupts via memory-mapped registers.
  • After cache or TLB maintenance operations.
str r5, [r1]    ; update a shared buffer
dsb             ; ensure the update is complete
wfi             ; wait for interrupt

ISB (Instruction Synchronization Barrier)

  • Flushes the CPU's pipeline.
  • Ensures that all instructions following the ISB are fetched anew.
  • Used after changing system control registers or modifying code at runtime.

Use cases:

  • After enabling or disabling MMU, caches, or other system registers.
mcr p15, 0, r0, c1, c0, 0  ; update system control register
isb                        ; ensure the update takes effect immediately

Let's summarize when to use each barrier:

Scenario                                                            Barrier
Ensure memory write ordering across cores                           dmb
Complete all previous memory transactions before continuing         dsb
Flush the instruction pipeline after system configuration changes   isb
Send an interrupt (through a mailbox) after writing data            dsb
Clean cache lines and invalidate TLBs                               dsb + isb

Conclusion

Compiler and memory barriers are essential tools for writing correct and reliable low-level code on a given architecture. They might seem like magic words at first, but once you understand their role (controlling the visibility and ordering of memory accesses) they become a logical part of your system design.

Understanding barriers is a rite of passage for serious system programmers. And once you get it right, your systems will be faster, safer, and far less mysterious.

Remember: your code doesn't always execute in the order you wrote it.