CMPUT 229 - Computer Organization and Architecture I
Lab 6: Stack Manipulation
Introduction
HELP!!! Your CMPUT 229 teaching staff has noticed that students are not properly using the register-saving calling conventions. Students are forgetting to store registers to the stack. So, teaching-staff team is tasking you with writing a compiler pass that checks whether a RISC-V function follows correct register-saving calling conventions.
Compilers are programs that analyze computer programs and generate computer code. They convert high-level languages (such as C) into assembly before creating the executable binary. Creating a compiler is a very complex tast that is beyond the scope of this course (CMPUT 415 covers compiler design). This lab consists in the implementation of a small part of a compiler.
Register Calling Conventions

Register calling conventions are a set of rules that dictate how registers should be saved when calling functions or procedures. The caller is the function that makes the call, the callee is the function that is called. Calling conventions ensure that a function can be replaced by a different implementation without needing to examine the register usage in either its caller functions or in the functions that it calls. Another scenario where calling conventions are important is when different functions of a program are written by different programmers. The calling convention is the interface that ensures that modularity encapsuling is possible in software development. If register calling conventions are violated, then wrong data flow relations may be created in the program leading to incorrect results in the computation. This could lead to serious bugs and vulnerabilities. For instance, a callee function may overwrite registers that the caller assumed would not be changed. The register saving/restoring calling convention for the RISC-V ISA (Instruction Set Architecture) are attached to this page.
Although this is the register saving/restoring calling convention used in this course, this is not the convention possible. For instance, it would be fully correct to have a convention where the values in the temporary registers (t0-t6
) must also be preserved by a callee function.
In principle, any convention works. The only requirement is that both the caller and callee respect the same convention.
Register-Saving and Register-Restoring Instructions

A calling convention states that the caller expects that the callee will not change the values of certain registers. If the callee does change the values of these registers, the caller may not work as expected. To prevent this, the callee must save the values of these registers to the stack before changing them. The callee must then restore the values of these registers before returning to the caller.
The standard RISC-V calling conventions state that the callee should preserve the values of sp
and s0-s11
. Alternative conventions may dictate that a different set of registers must be preserved by the callee. For instance, the callee may be required to preserve the values of all, or of none, of the registers that it uses.
Unless otherwise stated register-saving instructions refers a sequence of instructions at the start of the function. The following are register-saving instructions:
addi sp, sp, -12
sw s0, 0(sp)
sw s1, 4(sp)
sw ra, 8(sp)
Similarly, register-restoring instructions refers to a sequence of instructions at the end of the function. The following are register-restoring instructions:
lw s0, 0(sp)
lw s1, 4(sp)
lw ra, 8(sp)
addi sp, sp, 12
Functions as an Array of Instructions
In this lab, functions will be represented as an array of RISC-V instructions. Remember that every RISC-V instruction can be represented as a word in memory (addi sp, sp, -12 = 0xFF410113). Thus, a function can be represented as an array of words. The end of the array is signaled by the sentinel value of 0xFFFFFFFF.
A solution to this lab will parse through an array of RISC-V instructions until it reaches the sentinel value. The arrays of RISC-V instructions that the solution creates should also be terminated by a sentinel value.
RISC-V Instructions that Write

One of the advantages of RISC-V in comparison with other architectures is the simplified instruction set. There are only 6 types of instructions in RISC-V. As a result, there are only a few types of instructions that write to a register.
The image above is taken from the RISC-V Green Sheet. The following instruction types write to a register called rd
: R, I, U, UJ. S-type instructions store values to memory and do not write to registers. SB-type instructions are branches that do not have a register destination. The opcode
and funct3
fields can be used to distinguish between instruction formats and determine the storage behaviour of a particular instruction.
A Calling-Convention Bitmap
In RISC-V, a register calling convention can be represented by a 32-bit bitmap data structure. Each bit corresponds to one of the RISC-V registers. If the bit is 1, the register is a callee-saved register, otherwise it is not.
The table below shows the mapping between bits and register numbers. For example, if a register calling convention is represented by the value 0x00000020
, where only bit 5 is a 1, then the register t0
is the only callee-saved register in this convention.
Registers | zero |
ra |
sp |
gp |
tp |
t0 |
t1 |
t2 |
fp |
s1 |
a0 |
a1 |
a2 |
a3 |
a4 |
a5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Numbers | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Registers | a6 |
a7 |
s2 |
s3 |
s4 |
s5 |
s6 |
s7 |
s8 |
s9 |
s10 |
s11 |
t3 |
t4 |
t5 |
t6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Numbers | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
How Inserted Code Affects Accesses: Introduction
Implementing a compiler pass that inserts instructions into the binary of a program can lead to some complications. For example, inserting instructions into the program may disrupt the behaviour of jump instructions.
To understand what complications might arise, the following are important considerations:
- The target address of some RISC-V instructions is PC-Relative
- A jump instruction with an offset of 8 will jump 8 bytes ahead of the current program counter.
- Assembled Code does not have Labels
- Labels are artifacts that enable humans to more easily write and read assembly code. RARS transforms labels into address differences when converting the text of an assembly program into binary. Therefore, the binary representation of a program does not have labels. Thus, when additional instructions are inserted into the binary, they may corrupt the behaviour of the program.
Consider the following code snippet from a factorial function in RISC-V. The value on the left is the addresses of the instruction in memory:
0x0100 li t0, 1 # t0 will be the factorial result. 0x0104 loop: mul t0, a0, t0 # a0 is the input to the function. factorial(a0) = t0. 0x0108 addi a0, a0, -1 0x010C beqz a0, end 0x0110 jal x0, loop # jal x0, loop will resolve to jal x0, -12.
Now consider a simple compiler pass that uses s1
as a counter for how often the multiply instruction executes.
0x0100 li t0, 1 0x0104 li s1, 0 # s1 counts the number of times the loop is executed. 0x0108 loop: mul t0, a0, t0 0x010C addi s1, s1, 1 # increment the counter. 0x0110 addi a0, a0, -1 0x0114 beqz a0, end 0x011C jal x0, -12 # jump to where we think the loop is.
The insertion of the instructions above has broken the code. Now the jump-and-link instruction moves the program counter back 12 bytes to the addi s1, s1, 1
instruction, ignoring the multiply.
This occurs because when we write jal x0, loop
, we are using a pseudo instruction. The assembler will resolve this by inserting the offset to the loop label once the program is assembled.
The compiler pass only sees this jal x0, -12
instruction because it transforms the already assembled binary code.
Although the compiler pass inserted two instructions, the jump is only one instruction away from the correct loop location because the compiler pass inserted only one instruction: addi s1, s1, 1
between the source and destination of the jump. Using this insight, consider the possible jumps in this lab.
How Inserted Code Affects Accesses in This Lab

The image to the right demonstrates the possible accesses that a function foo
can make. The blue dot represents the register-saving instruction sequence inserted at the start of foo
. The yellow dot represent the register-retoring instruction sequence added at the end of the foo
.
- Jumps Within the Function A solution to this lab will insert register-saving instructions at the begin of a function and register-restoring instructions at the end of a function. Thus, any relative jump inside the function will remain correct.
- Calls to Other Functions Function
- Accesses to Data Section To access items in the data section of memory, a load address instruction such as
foo
may call another function. If the function is placed ahead of foo
in memory (like in the image), then the jump will cross over the newly-inserted register-restoring instructions. As a result, after the insertion of the stack-manipulation instructions, the jump will be incorrect. It will be short of it's destination by the number of bytes inserted at the end of the function. Similarly, any function placed before foo
(or recursive calls to foo
itself) jumps backwards over the register-saving instruction sequence of foo
.
More explicitly, if a
jal
instruction jumps outside the body of a function then correct it's offset accordingly.
la t0, Array
is typically used. However, just like jumps to labels (such as jal x0, loop
from the factorial example), loading the address of a label is also a pseudo instruction. In principle, the assembler works the same in both cases because it resolves the relative address of the destination label. However, given that the data section is far away from the text section, the assembler adds an additional instruction.
la t0, Array -> auipc t0, 0x????? addi t0, t0, 0x???The
auipc rd, imm
instruction stands for Add Upper Immediate to Program Counter. This instruction sets the destination register to the program counter plus some upper immediate. The use of this instruction allows for long relative accesses from the current program counter to areas like the data section. Then, the addi
addresses the lower immediate bits to find the exact intended address.
Criteria for a Load Address
In this lab, the following criteria is used to determine whether an access to the data section occurs:-
If there is an
auipc
instruction with a nonzero, positive immediate. -
If an
addi
instruction occurs immediately after theauipc
. -
If the
auipc
andaddi
instructions use the same register in all places (see the above example with t0).
addi
instruction accordingly.
Creating an Exit Node
A solution to this lab will insert register-restoring instruction sequences before a function returns. RISC-V functions may have many return statements. A naive solution might insert register-restoring instruction sequences before every return statement. However, this solution is not concise and complicates the task of adjusting relative jumps and accesses to the data segment.
Thus, a solution to this lab will insert a single register-restoring instruction sequence before a single return statement. An exit node defines the only point where the function returns. The solution will create this single return statement.
See the following example of a function that checks whether a number is positive, implemented with an exit node:

The exit node in this lab will restore registers, increment the stack pointer and then return (like the example above).
To ensure that there is a single exit node, a solution to this lab will redirect every return statement in the original function to a new exit node. To do this, replace all return statements (jalr x0, ra, 0
) with a relative jump to the exit node jal x0, imm
where PC
+ immediate equals the address of the first instruction in the exit node.
Even if a function has a single return statement, a solution to this lab should still redirect it to an exit node, ending with a new return statement.
Note: Return statements will always be in the form jalr x0, ra, 0
. Any other instruction is not considered a return.
Assignment
This lab implements a simple compiler pass. In a compiler, a pass may or may not run during compilation. The compiler pass in this lab will insert stack manipulation instructions to a function that does not follow register conventions. Write the following RISC-V functions in the file stackmanipulation.s
. ALL of the following functions are required:
stackManipulation This is the main function called from common.s. Convert the binary code of a single RISC-V function into it's equivalent stack-manipulated variation. Arguments: a0: address of the first element of an array of RISC-V instructions ending with a sentinel value (0xFFFFFFFF). a1: register calling conventions for the RISC-V function. Returns: a0: address of the first element of a stack-manipulated variation of the array of RISC-V instructions, ending with a sentinel value (0xFFFFFFFF).
stackManipulation will call all other functions. A valid stackManipulation function can either edit the RISC-V function in place or copy elements of the function to newly allocated memory space. Both options are valid, but it is recommended to use newly allocated space.
findWrites: Find all the register writes in a RISC-V function. Arguments: a0: address of the first element of an array of RISC-V instructions ending with a sentinel value (0xFFFFFFFF). Returns: a0: bit map of registers written to in the RISC-V function.
storeStackInstructions: Insert register-saving/restoring instructions to a specified memory location. Store the sentinel value (0xFFFFFFFF) at the end of the register-saving/restoring instructions. Arguments: a0: boolean value. If 0, store register-saving instructions. If 1, store register-restoring instructions. a1: address of the location to store register-saving/restoring instructions. a2: bit map indicating which registers to save to the stack. Returns: None.
When storing stack instructions, follow the order of the register numbers in RISC-V.
For example, sw ra ...
will appear before sw sp ...
.
Below is an example of memory after a call to storeStackInstructions
with a0 = 0
, a1 = 0x10010000
, a2 = 0x00000C22
:
Memory Address | Value |
---|---|
0x10010000 | 0xFF010113 # addi sp, sp, -16 |
0x10010004 | 0x00112023 # sw ra, 0(sp) |
0x10010008 | 0x00512223 # sw t0, 4(sp) |
0x1001000C | 0x00A12423 # sw a0, 8(sp) |
0x10010010 | 0x00B12623 # sw a1, 12(sp) |
0x10010014 | 0xFFFFFFFF # sentinel value |
Below is an example of memory after a call to storeStackInstructions
with a0 = 1
, a1 = 0x10010000
, a2 = 0x00000C22
:
Memory Address | Value |
---|---|
0x10010000 | 0x00012083 # lw ra, 0(sp) |
0x10010004 | 0x00412283 # lw t0, 4(sp) |
0x10010008 | 0x00812503 ># lw a0, 8(sp)
|
0x1001000C | 0x00C12583 # lw a1, 12(sp) |
0x10010000 | 0x01010113 # addi sp, sp, 16 |
0x10010004 | 0xFFFFFFFF # sentinel value |
Additional Notes
- In the case that no registers need to be saved, store only the sentinel value.
-
Do not store the return value at the end of the register-restoring instructions. The return value should be stored later by
stackManipulation
The above functions might be an adequate implementation of the compiler pass. However, they have some consequences for other areas of the code. If a function accesses the data section or calls other functions, the inserted instructions will corrupt these accesses or the function-call instructions. To address this, a solution to this lab also implements the following function:
fixAccesses Correct the accesses in a RISC-V function that has instructions inserted at the start and at the end. To correct accesses, adjust the immediates of jal and la instructions. Arguments: a0: address of the first element of an array of RISC-V instructions ending with a sentinel value (0xFFFFFFFF). a1: number of bytes inserted at the start of the function. a2: number of bytes inserted at the end of the function. Returns: None.
Additional Notes
- Since the solution adds a return statement at the end of the exit node, the number of bytes inserted at the end of the function should be 4 more bytes than the start.
- This function should work even if 0 bytes are inserted at the start of the function.
- This function should only fix jal instructions and la instructions as described in the "How Inserted Code Affects Accesses Section." A solution should ignore branches, jalr and any other accesses.
- Jumps to the first instruction of the function could either be recursive functions (store registers to stack) or loops (don't store registers to stack). Because of this complexity, do not adjust the immediate on jumps to the first instruction in the function. In other words, a solution should treat jumps to the first instruction in the function the same as jumps anywhere else in the body of the function.
Another consequence of the above functions is that they do not account for RISC-V functions with multiple return statements. Since the stack manipulation instructions would have to be inserted before every return statement, this is not concise. Instead, a solution to this lab will create an exit node. A solution to this lab should implement the following function:
redirectReturns: Redirect all return statements in a RISC-V function to jump to an exit node. Arguments: a0: address of the first element of an array of RISC-V instructiosn with one or more return statements, ending with a sentinel value (0xFFFFFFFF). a1: pointer to exit node. Returns: None.
A solution to this lab should be considerate of the order that functions are called in. If redirectReturns
is called after the real return statement in the exit node is inserted, then it will redirect the real return statement and the function will be incorrect.
Writing a Solution
Notice the directive .include "common.s"
in the solution template file provided. This directive will cause RARS to execute the code in the file common.s
before executing any code in the solution.
The code in common.s
reads in a file provided in the program arguments, initializes the arguments and calls stackManipulation
.
A solution must begin with the label stackManipulation:
.
In RISC-V assembly every function must have a return statement. The following are examples of valid return statements: ret
, jr ra
and jalr zero, 0(ra)
.
Some functions have been provided as helper functions. These appear at the bottom of the solution file. Some of the helper functions are empty to be optionally completed. The following helper functions are provided:
createLoad
: IncompletecreateStore
: IncompletegetAddiImm
: Takes anaddi
instruction and extracts the sign-extended immediatecreateAddi
: Given anrs1
,rd
and an immediate, creates anaddi
instruction (rd = rs1 + imm
)getJalImm
: Takes ajal
instruction and extracts the sign-extended immediatecreateJal
: Given ard
and an immediate, creates ajal
instruction (rd = PC + imm
)
Recommended Program Flow
The following is a recommmended flow for how to write stackManipulation
:
-
Call
findWrites
to get the register bitmap of registers written to in the function. -
Call
fixAccesses
to correct any jal or la instructions. By callingfixAccesses
this early, it is simpler to determine whether a jal instruction jumps outside the function body. -
Call
storeStackInstructions
to insert the register-saving instructions. - Copy the body of the function to space where the outputted instruction sequence will live.
-
Call
storeStackInstructions
to insert the register-restoring instructions. Remember to keep a pointer to the first instruction in the register-restoring instructions as this will be the start of the exit node. -
Call
redirectReturns
to redirect all return statements to the exit node. -
Insert the return statement at the end of the exit node, followed by the sentinel value (
0xFFFFFFFF
).
Testing your Lab
common.s
contains some unit tests to help test the solution. This testing works on sample inputs and checks whether the output from functions in stackmanipulation.s
matches the correct output. Check the common file for the test inputs and correct outputs. These tests do not run on the file inputted in the program arguments. Rather, the tests run on hardcoded test cases.
Additional sample inputs and outputs are provided in the Tests
folder. There are two program arguments required for this lab. Firstly, the path to the .binary
test file to run. Secondly, the register saving convention. Register conventions must contain 8 characters, where every letter is capitalized (do not put 0x before the register conventions). Separate the program arguments with a space. For example, Tests/add.binary 1234FFFF
will be valid. The common.s
file prints both the input RISC-V function and the outputted instruction sequence. It also print the result of unit testing. Passing the tests does NOT mean the solution will receive full marks. Test the solution further to ensure that it is correct.
Instruction Disassembly
This lab has an additional folder in the Code
section called Disassembly
. This folder contains files that performs instruction disassembly which is the process of translating hexadecimal instructions to their actual RISC-V instructions. This code is from an Open Source RISC-V Disassembler. Please contact a TA if you run into any issues using the code.
To utilize this disassembler, first compile the file print-instructions.c
using a C compiler (for example, from the terminal execute the command gcc print-instructions.c
or clang print-instructions.c
). Then create a file containing the hexadecimal instructions to disassemble (copy and paste your RARS output into a txt file). Then run the executable (./a.out) with the hexadecimal textfile as a program argument (./a.out RARS-instructions.txt
). See the README.md
in the disassembly folder for more information.
Creating Test Cases
The tests provided are not extensive, create your own corner-case tests to ensure that your code is correct.
To create test cases, write a RISC-V function to test with. Use the command rars "YOUR_RISCV_FILE.s" a dump .text Binary "YOUR_DESIRED_BINARY_FILE.binary"
. The binary file created by this command can be used as a program argument in RARS for stackManipulation.s
(program arguments = YOUR_DESIRED_BINARY_FILE.binary 1234ABCD
).
More complex test cases can be written that do not pass the entire .text section as input. This can be helpful to test only one function in a file, to test fixAccesses
. To isolate a specific function, use dummy statements. The dummy statement will not be a real RISC-V statement but indicates to the commmon file where the function starts and ends. Insert the statement "addi x0, x0, 229" immediately before the first line and immediately after the last line in your function. See test factorial.s
for an example of this.
Assumptions and Notes
- Functions will be at least 1 instruction and at most 128 instructions.
- The only jumps that will occur are described in the "How Inserted Code Affects Jumps: In This Lab" section.
- Immediates will never overflow from jump instructions. (Original Immediate + New Offset will always fit in the instruction immediate).
- All return instructions will be
jalr x0, ra, 0
- Within the binary code of the function passed to StackManipualtion, there is at most 5 return statements, 5 calls to other functions, and 5 accesses to the data section within a function.
- If no registers need to be saved, a solution will not insert any stack manipulation instructions. However, a solution should still create an exit node (containing just a return), redirect return statements to that exit node and fix accesses.
- Please save registers to the stack in order from x0-x31 (x0, ra, ..., t5, t6). The marking script will not mark your solution correct otherwise.
- A solution to this lab should save the zero register to the stack just like any other register. However, none of the register conventions will require x0 to be saved to the stack since this is unnecessary.
Resources
Marking Guide
Assignments too short to be adequately judged for code quality will be given a zero for that portion of the evaluation.
- 15% for
stackManipulation
overall correctness. - 10% for
findWrites
- 30% for
storeStackInstructions
- 15% for
fixAccesses
- 10% for
redirectReturns
- 20% for code cleanliness, readability, and comments
Submission
There is a single file to be submitted for this lab. stackmanipulation.s
should contain the code for your solution.
- Do not add a
main
label to this file. - Do not modify the line
.include "common.s"
. - Keep the file
stackmanipulation.s
in theCode
folder of the git repository. - Push your repository to GitHub before the deadline. Just committing will not upload your code. Check online to ensure your solution is submitted.