Lab 5 (AArch64) - 64-Bit Assembly Language Lab

Introduction
Hi everybody! In lab 5, we are moving forward from 6502 assembly language to modern processors such as x86 and aarch64. Although the 6502 offers a minimal instruction set, these modern processors have far more advanced capabilities and architecture. In this lab, we will be exploring their assembly languages.
This blog post is mainly about the AArch64 server. There is a part 2 to this lab where everything coded here will be coded again in x86.
All the experiments were run on remote x86 and aarch64 servers.
Setting Up
On our class servers, the code examples are available at the path /public/spo600-assembler-lab-examples.tgz.
We will extract this .tgz file using the tar command.
tar xvf /public/spo600-assembler-lab-examples.tgz
The code will be presented in this structure:
spo600
└── examples
    └── hello                     # "hello world" example programs
        ├── assembler
        │   ├── aarch64           # aarch64 gas assembly language version
        │   │   ├── hello.s
        │   │   └── Makefile
        │   ├── Makefile
        │   └── x86_64            # x86_64 assembly language versions
        │       ├── hello-gas.s   # ... gas syntax
        │       ├── hello-nasm.s  # ... nasm syntax
        │       └── Makefile
        └── c                     # Portable C versions
            ├── hello2.c          # ... using write()
            ├── hello3.c          # ... using syscall()
            ├── hello.c           # ... using printf()
            └── Makefile
AArch64 Assembly Program
First, we will look at the aarch64 server. Log in and go to the AArch64 assembly example directory:
cd ~/spo600/examples/hello/assembler/aarch64
There is a hello.s file. This is the source file of our code!
Next, you will see that there is a Makefile in this location. This means that we can run make to build our program.
Based on the dependencies defined in the Makefile, make determines which files have changed and need to be rebuilt, then executes only the commands needed to update them.
make
After this, you will see a binary file named hello in the directory. Run this resulting binary file.
./hello
It will print "Hello, world!" on your terminal.
Okay, now let's take a look at the code using the objdump command. I want to compare the disassembled output of the object file (hello.o) to the source file (hello.s).
objdump -d hello.o > hello_disassembled.txt
Disassembled Output (hello.o)
hello.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <_start>:
0: d2800020 mov x0, #0x1 // #1
4: 10000001 adr x1, 0 <_start>
8: d28001c2 mov x2, #0xe // #14
c: d2800808 mov x8, #0x40 // #64
10: d4000001 svc #0x0
14: d2800000 mov x0, #0x0 // #0
18: d2800ba8 mov x8, #0x5d // #93
1c: d4000001 svc #0x0
Source File (hello.s)
.text
.globl _start
_start:
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
mov x0, 0 /* status -> 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Hello, world!\n"
len= . - msg
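Looking at these two files, the disassembly is an accurate instruction-by-instruction translation of our source file into machine code. Symbolic constants have been replaced by their values: len became #0xe (14, the length of the message). The only visible difference is the msg label in the adr instruction, which appears as 0 <_start>. Because hello.o has not been linked yet, the final address of msg is unknown, so the assembler leaves the offset as zero and the linker fills it in later.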
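For orientation, the first system call in hello.s can be sketched in C. This is only a rough equivalent, assuming we go through the libc write() wrapper instead of issuing svc directly; the function name say_hello is made up for this illustration and is not part of the lab files.

```c
#include <unistd.h>   /* write() */

/* Same bytes as the msg label in hello.s */
static const char msg[] = "Hello, world!\n";

/* say_hello is a hypothetical helper, not part of the lab files.
 * It mirrors: mov x0, 1 / adr x1, msg / mov x2, len / mov x8, 64 / svc 0 */
long say_hello(void)
{
    /* sizeof counts the terminating NUL, so subtract 1 to match len = . - msg */
    return (long)write(1, msg, sizeof(msg) - 1);
}
```

On success the call returns 14, the same value objdump shows as #0xe in x2. The second svc in hello.s (syscall 93) corresponds to exiting with status 0.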
Modifying the AArch64 Assembly Program
Here is a basic loop in AArch64 assembler.
.text
.globl _start
min = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 6 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
mov x19, min
loop:
/* ... body of the loop ... do something useful here ... */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
The code loops 6 times (max is 6). It stores the loop index in register 19 (x19) to keep track of iterations. The body of the loop is empty right now.
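The control flow of this skeleton can be sketched in C as a do-while loop, since the comparison sits at the bottom of the loop. The function count_iterations and its runs counter are hypothetical additions for illustration; the assembly itself keeps no such count.

```c
/* count_iterations is a hypothetical helper that mirrors the loop skeleton. */
unsigned long count_iterations(unsigned long min, unsigned long max)
{
    unsigned long i = min;        /* mov x19, min */
    unsigned long runs = 0;       /* not in the assembly; counts body executions */

    do {
        runs++;                   /* body of the loop */
        i++;                      /* add x19, x19, 1 */
    } while (i != max);           /* cmp x19, max ; b.ne loop */

    return runs;
}
```

Calling count_iterations(0, 6) returns 6. Because the test comes after the body, the body always runs at least once, so min should be strictly less than max.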
☑️ Print Loop
Let's change this code so that it prints "Loop" on every iteration. We add a .data section and put the print in the body of the loop.
Change hello.s as shown below:
.text
.globl _start
min = 0 /* starting value for the loop index */
max = 6 /* loop exits when the index hits this number */
_start:
mov x19, min /* initialize loop counter */
loop:
/* Print "Loop" message */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop\n"
len= . - msg
Test this new code by rebuilding with make and running it:
make clean
make
./hello
Output:
[kzaw@aarch64-002 aarch64]$ ./hello
Loop
Loop
Loop
Loop
Loop
Loop
☑️ Print Loop and Index Number
Let's change the code again so that it prints Loop: # where '#' is the current index number.
To do that, we will need to convert our loop counter number to its ASCII character representation. In ASCII/ISO-8859-1/Unicode UTF-8, the digit characters are in the range 48-57 (0x30-0x39).
add x20, x19, #48 /* x20 = x19 + 48 (ASCII '0') */
This converts the loop counter in register 19 (x19) by adding 48 (the ASCII code for '0') and stores the new value in x20. For example:
- when x19 = 0, x20 becomes 48 (ASCII '0').
- when x19 = 1, x20 becomes 49 (ASCII '1').
- And so on...
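The same conversion, written as a small C sketch. The name digit_to_ascii is hypothetical, and the function is only meaningful for values 0 through 9.

```c
/* digit_to_ascii is a hypothetical name; valid only for values 0 through 9. */
unsigned char digit_to_ascii(unsigned long value)
{
    return (unsigned char)(value + 48);   /* add x20, x19, #48 */
}
```

For instance, digit_to_ascii(5) yields 53, which is the character '5'.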
Next, define our message.
msg: .ascii "Loop: #\n" /* # is a placeholder for the digit */
I added the code below to the body of the loop. It gets the address of msg and then adds an offset of 6 bytes to point to the position just after "Loop: " (which is 6 characters long).
The strb instruction writes a single byte from a register into memory.
w20 is the 32-bit view of x20. When used with strb, only the lowest 8 bits (1 byte) of that register are stored. Here x20 holds an ASCII digit, which fits in 1 byte, so the upper 24 bits are ignored and exactly one byte of the message is overwritten.
[x21] is the location to store the ASCII digit to. Altogether, the instruction writes the ASCII digit to the memory location pointed to by x21 (the '#' placeholder).
adr x21, msg /* Get address of message */
add x21, x21, #6 /* Position of the digit character (after "Loop: ") */
strb w20, [x21] /* Store the ASCII character at that position */
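In C terms, these three instructions amount to computing a pointer and storing one byte through it. The sketch below uses a hypothetical helper named patch_digit and assumes the buffer has the same layout as msg, with the placeholder at offset 6.

```c
/* patch_digit is a hypothetical helper mirroring adr/add/strb. */
void patch_digit(char *msg, unsigned long x20)
{
    char *x21 = msg + 6;            /* adr x21, msg ; add x21, x21, #6 */
    *x21 = (char)(x20 & 0xFF);      /* strb w20, [x21]: only the low byte is stored */
}
```

The mask makes the truncation explicit: whatever sits in the upper bits of the register never reaches memory, and the neighbouring bytes of the message are left untouched.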
✨ Modified Code:
.text
.globl _start
min = 0 /* starting value for the loop index */
max = 6 /* loop exits when the index hits this number */
_start:
mov x19, min /* initialize loop counter */
loop:
/* Convert loop counter to ASCII character */
add x20, x19, #48 /* x20 = x19 + 48 (ASCII '0') */
/* Store the ASCII digit in the message */
adr x21, msg /* Get address of message */
add x21, x21, #6 /* Position of the digit character (after "Loop: ") */
strb w20, [x21] /* Store the ASCII character at that position */
/* Print message with loop counter */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop: #\n" /* # is a placeholder for the digit */
len= . - msg
Output:
[kzaw@aarch64-002 aarch64]$ ./hello
Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
☑️ Loop From 00 - 32
The next requirement is to loop from 00 to 32, printing 2-digit decimal numbers. Besides changing the max symbol to 33, there are other important changes we must make.
We copy the loop counter to x22.
mov x22, x19 /* Copy loop counter to x22 */
Next, we need to change our code to handle two-digit conversion.
Before, we just added the ASCII offset to our counter and that was it. Now it is more complicated.
To calculate the tens digit, we divide by 10. The udiv instruction gives the quotient of the division, so x20 = x22 / 10.
/* Calculate tens digit: quotient of division by 10 */
mov x23, #10 /* Set divisor to 10 */
udiv x20, x22, x23 /* x20 = x22 / 10 (quotient = tens digit) */
Now that the tens digit is extracted, we can get the ones digit by computing the remainder. To do this, we use mul to multiply the quotient by 10. This value is then subtracted from the original value to get the remainder.
/* Calculate ones digit: remainder of division by 10 */
mul x24, x20, x23 /* x24 = quotient * 10 */
sub x21, x22, x24 /* x21 = original - (quotient * 10) = remainder */
After all of this, we can convert everything to ASCII.
/* Convert digits to ASCII */
add x20, x20, #48 /* Convert tens digit to ASCII */
add x21, x21, #48 /* Convert ones digit to ASCII */
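The quotient-and-remainder steps above can be checked with a short C sketch. The helper split_decimal is a hypothetical name, and it deliberately computes the remainder as n - q * 10 rather than with the % operator so that it follows the udiv, mul, sub sequence line by line.

```c
/* split_decimal is a hypothetical helper; meaningful for values 0 through 99. */
void split_decimal(unsigned long n, unsigned char *tens, unsigned char *ones)
{
    unsigned long q = n / 10;          /* udiv x20, x22, x23 */
    unsigned long r = n - q * 10;      /* mul x24, x20, x23 ; sub x21, x22, x24 */

    *tens = (unsigned char)(q + 48);   /* add x20, x20, #48 */
    *ones = (unsigned char)(r + 48);   /* add x21, x21, #48 */
}
```

For n = 32 this gives q = 3 and r = 32 - 30 = 2, so the two characters are '3' and '2'.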
Let me show you a table of registers, comparing the old and new so we can understand better.
Register | Original Purpose | New Purpose |
---|---|---|
x19 | Loop counter | Loop counter (unchanged) |
x20 | ASCII digit | Tens digit (after conversion to ASCII) |
x21 | Message pointer | Ones digit (after conversion to ASCII) |
x22 | – | Copy of loop counter for calculations |
x23 | – | Constant value 10 (divisor) |
x24 | – | Temporary (quotient * 10), then pointer into the message buffer |
The calculation is complete! Now change the logic that updates the message so it accommodates the two digits.
/* Store the ASCII digits in the message */
adr x24, msg /* Get address of message */
add x24, x24, #6 /* Position of the first digit (after "Loop: ") */
strb w20, [x24] /* Store the tens digit */
add x24, x24, #1 /* Move to the position of the second digit */
strb w21, [x24] /* Store the ones digit */
✨ Modified Code:
.text
.globl _start
min = 0 /* starting value for the loop index */
max = 33 /* loop exits when the index hits this number */
_start:
mov x19, min /* initialize loop counter */
loop:
/* Convert loop counter to two ASCII digits */
mov x22, x19 /* Copy loop counter to x22 */
/* Calculate tens digit: quotient of division by 10 */
mov x23, #10 /* Set divisor to 10 */
udiv x20, x22, x23 /* x20 = x22 / 10 (quotient = tens digit) */
/* Calculate ones digit: remainder of division by 10 */
mul x24, x20, x23 /* x24 = quotient * 10 */
sub x21, x22, x24 /* x21 = original - (quotient * 10) = remainder */
/* Convert digits to ASCII */
add x20, x20, #48 /* Convert tens digit to ASCII */
add x21, x21, #48 /* Convert ones digit to ASCII */
/* Store the ASCII digits in the message */
adr x24, msg /* Get address of message */
add x24, x24, #6 /* Position of the first digit (after "Loop: ") */
strb w20, [x24] /* Store the tens digit */
add x24, x24, #1 /* Move to the position of the second digit */
strb w21, [x24] /* Store the ones digit */
/* Print message with loop counter */
mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /* message location (memory address) */
mov x2, len /* message length (bytes) */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop: ##\n" /* ## are placeholders for the two digits */
len= . - msg
Output:
[kzaw@aarch64-002 aarch64]$ ./hello
Loop: 00
Loop: 01
...
Loop: 09
Loop: 10
...
Loop: 32
☑️ Loop Without Leading Zeros
The next change we're making is removing the leading zero for single-digit numbers.
To make this happen, we need to implement conditional logic that detects whether we're dealing with a single-digit or double-digit number, and formats the output accordingly.
We need to use different message formats depending on the number's value:
- For numbers 0-9: Use "Loop:  #" (notice two spaces after the colon)
- For numbers 10-32: Use "Loop: ##" (notice one space after the colon)
.data
msg1: .ascii "Loop:  #\n" /* Single-digit format (note: two spaces after colon) */
len1= . - msg1
msg2: .ascii "Loop: ##\n" /* Double-digit format (note: one space after colon) */
len2= . - msg2
The KEY change is this conditional check. We compare the tens digit with the ASCII value '0'. If it's not '0', then we know we have a two-digit number. We add a new double_digit label as the branch target for that case.
/* Determine if number is single or double digit */
cmp x20, #48 /* Compare tens digit to ASCII '0' */
b.ne double_digit /* If not '0', it's a double-digit number */
If it's not a double-digit number, we continue on to the single-digit case. Use msg1 and store the digit at offset 7 (after "Loop:" and the two spaces). After that, jump to the common print_msg routine.
/* Single-digit case (0-9) */
adr x24, msg1 /* Get address of single-digit message */
strb w21, [x24, #7] /* Store ones digit at offset 7, after the two spaces */
mov x1, x24 /* Set message address for print */
mov x2, len1 /* Set message length */
b print_msg /* Jump to print routine */
If it's a double-digit number, we jump to the double_digit label. Use msg2, set up the address and length for printing, and then fall through to the print logic.
double_digit:
/* Double-digit case (10-32) */
adr x24, msg2 /* Get address of double-digit message */
strb w20, [x24, #6] /* Store tens digit */
strb w21, [x24, #7] /* Store ones digit */
mov x1, x24 /* Set message address for print */
mov x2, len2 /* Set message length */
This is the printing logic below.
print_msg:
/* Print message with loop counter */
mov x0, 1 /* file descriptor: 1 is stdout */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
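The branch structure can be modelled in C as follows. This is a sketch under assumptions: format_line is a hypothetical helper, and the two templates are assumed to match msg1 (two spaces after the colon) and msg2 (one space) as described in the comments of the .data section.

```c
/* format_line is a hypothetical helper. The templates are assumed to match
 * msg1 (two spaces after the colon) and msg2 (one space) from the .data section. */
static char msg1[] = "Loop:  #\n";
static char msg2[] = "Loop: ##\n";

/* Patches the right template for n (0 through 99), stores its length in *len,
 * and returns a pointer to the bytes that write() would print. */
const char *format_line(unsigned long n, unsigned long *len)
{
    unsigned char tens = (unsigned char)(n / 10 + 48);
    unsigned char ones = (unsigned char)(n % 10 + 48);

    if (tens != 48) {                 /* cmp x20, #48 ; b.ne double_digit */
        msg2[6] = (char)tens;
        msg2[7] = (char)ones;
        *len = sizeof(msg2) - 1;
        return msg2;
    }
    msg1[7] = (char)ones;             /* single digit goes after both spaces */
    *len = sizeof(msg1) - 1;
    return msg1;
}
```

Both templates are 9 bytes long, so the digits line up in the same column whether the number has one digit or two.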
✨ Modified Code:
.text
.globl _start
min = 0 /* starting value for the loop index */
max = 33 /* loop exits when the index hits this number */
_start:
mov x19, min /* initialize loop counter */
loop:
/* Convert loop counter to two ASCII digits */
mov x22, x19 /* Copy loop counter to x22 */
/* Calculate tens digit: quotient of division by 10 */
mov x23, #10 /* Set divisor to 10 */
udiv x20, x22, x23 /* x20 = x22 / 10 (quotient = tens digit) */
/* Calculate ones digit: remainder of division by 10 */
mul x24, x20, x23 /* x24 = quotient * 10 */
sub x21, x22, x24 /* x21 = original - (quotient * 10) = remainder */
/* Convert digits to ASCII */
add x20, x20, #48 /* Convert tens digit to ASCII */
add x21, x21, #48 /* Convert ones digit to ASCII */
/* Determine if number is single or double digit */
cmp x20, #48 /* Compare tens digit to ASCII '0' */
b.ne double_digit /* If not '0', it's a double-digit number */
/* Single-digit case (0-9) */
adr x24, msg1 /* Get address of single-digit message */
strb w21, [x24, #7] /* Store ones digit at position after "Loop: " */
mov x1, x24 /* Set message address for print */
mov x2, len1 /* Set message length */
b print_msg /* Jump to print routine */
double_digit:
/* Double-digit case (10-32) */
adr x24, msg2 /* Get address of double-digit message */
strb w20, [x24, #6] /* Store tens digit */
strb w21, [x24, #7] /* Store ones digit */
mov x1, x24 /* Set message address for print */
mov x2, len2 /* Set message length */
print_msg:
/* Print message with loop counter */
mov x0, 1 /* file descriptor: 1 is stdout */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg1: .ascii "Loop:  #\n" /* Single-digit format (note: two spaces after colon) */
len1= . - msg1
msg2: .ascii "Loop: ##\n" /* Double-digit format (note: one space after colon) */
len2= . - msg2
Output:
Loop:  0
Loop:  1
Loop:  2
...
Loop:  9
Loop: 10
Loop: 11
...
Loop: 32
☑️ Loop with Hex Output (0 - 20)
Now, let's say we want to output in hex instead of decimal. Remove the single vs. double digit branch and make the following changes.
We now divide by 16 instead of 10. This makes the quotient the “high nibble” (first hexadecimal digit) and the remainder the “low nibble.”
mov x23, #16
For each nibble, check whether its value is less than 10. If it is, convert it to ASCII by adding 48. If it's not, add 55 so that 10 becomes 65 ('A'), 11 becomes 66 ('B'), etc.
Here's that logic for the high nibble. The same is applied to the low nibble.
cmp x20, #10 /* Check if high nibble is less than 10 */
blt high_digit_decimal
add x20, x20, #55 /* For hex A-F */
b high_digit_done
high_digit_decimal:
add x20, x20, #48 /* For digits 0-9 */
high_digit_done:
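A C sketch of the nibble conversion, useful for checking the offsets 48 and 55 before assembling. The names nibble_to_hex and byte_to_hex are hypothetical, and the sketch assumes inputs no larger than 255.

```c
/* nibble_to_hex is a hypothetical name; valid for values 0 through 15. */
unsigned char nibble_to_hex(unsigned long nibble)
{
    if (nibble < 10)
        return (unsigned char)(nibble + 48);   /* '0' through '9' */
    return (unsigned char)(nibble + 55);       /* 10 + 55 = 65 = 'A' */
}

/* byte_to_hex splits a value 0 through 255 into two ASCII hex digits. */
void byte_to_hex(unsigned long n, unsigned char *high, unsigned char *low)
{
    unsigned long q = n / 16;                  /* udiv with divisor 16 */
    *high = nibble_to_hex(q);
    *low  = nibble_to_hex(n - q * 16);         /* mul then sub gives the remainder */
}
```

For n = 26 the quotient is 1 and the remainder is 10, so the characters are '1' and 'A'.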
✨ Modified Code:
.text
.globl _start
min = 0 /* starting value for the loop index */
max = 33 /* loop exits when the index hits this number */
_start:
mov x19, min /* initialize loop counter */
loop:
/* Convert loop counter to two hexadecimal digits */
mov x22, x19 /* Copy loop counter to x22 */
/* Calculate high nibble: quotient of division by 16 */
mov x23, #16 /* Set divisor to 16 for hex */
udiv x20, x22, x23 /* x20 = x22 / 16 (high nibble) */
/* Calculate low nibble: remainder of division by 16 */
mul x24, x20, x23 /* x24 = high nibble * 16 */
sub x21, x22, x24 /* x21 = original - (high nibble * 16) = low nibble */
/* Convert high nibble to ASCII */
cmp x20, #10 /* Check if high nibble is less than 10 */
blt high_digit_decimal
/* For hex A-F: add 55 (10+55=65 -> 'A') */
add x20, x20, #55
b high_digit_done
high_digit_decimal:
add x20, x20, #48 /* For digits 0-9: add 48 ('0') */
high_digit_done:
/* Convert low nibble to ASCII */
cmp x21, #10 /* Check if low nibble is less than 10 */
blt low_digit_decimal
/* For hex A-F */
add x21, x21, #55
b low_digit_done
low_digit_decimal:
add x21, x21, #48 /* For digits 0-9 */
low_digit_done:
/* Store the ASCII characters in the message template */
adr x24, msg /* Get address of the message template */
strb w20, [x24, #6] /* Store high nibble at position after "Loop: " */
strb w21, [x24, #7] /* Store low nibble */
/* Print message */
mov x1, x24 /* Set message address for print */
mov x2, len /* Set message length */
mov x0, 1 /* file descriptor: 1 is stdout */
mov x8, 64 /* write is syscall #64 */
svc 0 /* invoke syscall */
add x19, x19, 1 /* increment the loop counter */
cmp x19, max /* see if we've hit the max */
b.ne loop /* if not, then continue the loop */
mov x0, 0 /* set exit status to 0 */
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall */
.data
msg: .ascii "Loop: ##\n" /* Message format for hexadecimal output */
len= . - msg
Output:
[kzaw@aarch64-002 aarch64]$ ./hello
Loop: 00
Loop: 01
Loop: 02
Loop: 03
Loop: 04
Loop: 05
Loop: 06
Loop: 07
Loop: 08
Loop: 09
Loop: 0A
Loop: 0B
Loop: 0C
Loop: 0D
Loop: 0E
Loop: 0F
Loop: 10
Loop: 11
Loop: 12
Loop: 13
Loop: 14
Loop: 15
Loop: 16
Loop: 17
Loop: 18
Loop: 19
Loop: 1A
Loop: 1B
Loop: 1C
Loop: 1D
Loop: 1E
Loop: 1F
Loop: 20
Final Thoughts
This lab helped me understand and write more code in modern assembly language. With AArch64, I got used to several concepts, from memory manipulation and loops to character encoding! This helped me solidify my understanding.
One major takeaway is how explicit everything is in assembly. Every character printed and every loop iteration must be carefully crafted. I became fully aware that at this level nothing happens behind the scenes. I had to manage my own memory locations and be cautious with my index counters. It is far more low level than C!
In Part 2 of this lab, I will implement the same steps in x86_64 assembly. While the logic may stay similar, the syntax and system call conventions will be completely different.
Thank you for your time. See you soon!