CMPUT 229 - Computer Organization and Architecture I
Lab 3: Fix Branch
Creator: Kristen Newbury
Converted to RISC-V by: Zhaoyu Li
Introduction
The goal of this lab is to fix branch instructions in a RISC-V program that has been modified. The modified program is created by taking an original program and inserting some instructions into it. It will be necessary to fix the branch instructions in the modified program so that they will branch to the same target labels that they did in the original program. In order to achieve this you will compare the original program to the modified program to find where the new instructions are located, then you will make appropriate adjustments to the branch instructions in the modified program.
Changes to Targets of Branches
The insertion of additional instructions in a program will change the address of the branch targets. For instance, consider the original program below:
RISC-V Assembly |
The code as loaded into the RARS simulator |
||||||||||||
foo: beq t0 t1 target addi t0 t0 1 target: addi t1 t1 1 |
|
After inserting three instructions, we obtain the modified program shown below:
The code as loaded into the RARS simulator |
|||||||||||||||||||||
|
In the original program, the initial address of the target is
0x0040002c, but the address of the target label is 0x00400038 in the
modified program. In other words, the branch instruction at the address
0x00400024
no longer branches to the same label as it
did in the original program. This is because branch instructions in
RISC-V specify their targets using a relative offset. Therefore, we
need to change this offset in the modified program such that the
branch instructions branch to the same target label as it did in the
original program.
RISC-V Branch Instructions
There are six different types of branch instructions in RISC-V:
beq |
Branch EQual |
bge |
Branch Greater than or Equal |
bgeu |
Branch Greater than or Equal Unsigned |
blt |
Branch Less Than |
bltu |
Branch Less Than Unsigned |
bne |
Branch Not Equal |
All branch instructions in RISC-V are encoded in the SB Type format:
31 | 25 | 24 | 20 | 19 | 15 | 14 | 12 | 11 | 7 | 6 | 0 |
imm[12|10:5]
|
rs2
|
rs1
|
funct3
|
imm[4:1|11]
|
opcode
|
In RISC-V, the relative offset of a branch instruction is encoded in bits 7-11 & 25-31 of the branch instruction. With only 12 bits (13 after shifting as the lowest bit will always be zero) to specify the offset, the range that a branch target can occur in is limited. The introduction of new instructions between the branch and its target may lead to a situation where the branch is no longer possible because the distance between the branch and its target cannot be expressed in only 12 bits.
To simplify this assignment, assume that this case will not occur for any of the programs that are provided as input to your assignment. In other words, the number of insertion made to the program will not be enough to create an impossible branch. Keep in mind that jump instruction may also be affected by the insertion of instructions. However, you DO NOT need to adjust the immediate value of jump instructions in this lab.
Fixing Branch Instructions
In order to determine if a branch instruction was affected by the insertion of instructions to the program, the solution needs to calculate the original targets of each branch instructions in the original program. Only insertions that take place between a branch and its target will affect the location of the proper target. In other words, branch instructions will only need to be fixed if instructions were inserted in between the branch and its target.
In order to calculate a branch's target the solution will need the value
contained in the PC
. During program execution the
PC
contains the memory address of the current
instruction being executed. In this lab the binary representation of the
program is located in the
user data segment of memory, therefore we can think of the value of the
address that was used to load the binary representation of the instruction
into a register as the value that the PC
normally contains
during program execution. In other words, the
PC
for an instruction is the address that is used to load
the instruction from memory.
To calculate the address of the target for branches, the following algorithm is recommended:
L
← immediate of the branch instructionA
←L << 1
S
← Sign extendA
to 32 bitsT
←S + PC
T
is the address of the branch target. Keep in mind that
this address will correspond to an address in the data segment, where the
program representation has been placed.
Finding Insertion Points
An "insertion point" is the address of the instruction (in the original program) before a block of added instructions. In order to find the insertion points the two programs must be compared. We first load one instruction from each program, then do a simple check to see if the two instructions are equal. If they are not equal then this is the first instruction in a possible block of added instructions. We also keep a counter and increment it for each new instruction in the block of new instructions. As soon as an instruction is found in the modified program that matches the last previously seen instruction in the original program then we have reached the end of the block of added instructions.

Terminologies
This section formally introduces the terminologies used in this lab.
The modified program is obtained by inserting instructions into the original program. Branch instructions in the modified program must be fixed so that they jump to the same label as they did in the original program. In this lab, the fixed program refers to the modified program with fixed branch instructions.
The instructions array of a program is an array of RISC-V instructions in binary representation, where the instructions in the array are the instructions in the program.
The insertion points array is an array of all the insertion points. The array must be sorted in ascending order by the value of the insertion points.
The insertions array is an array of natural numbers where the
i
'th element in the array is the number of instructions
inserted at the i
'th insertion point in the insertion
points array.
Note that each element is stored in 32 bits.
The branches array contains the addresses of all the branch instructions in the original program. Elements in the array must be sorted in ascending order by the addresses of the branch instructions.
The targets array contains the addresses of the target
instructions for all the branch instructions in the branches array. The
i
'th element in the array is the address of the branch
target for the i
'th branch instruction in the branches
array.
All arrays are terminated by the sentinel value
0xFFFFFFFF
.
Assignment
In this assignment you will write a RISC-V assembly program that fixes branch instructions in a modified RISC-V program. You must implement all the following functions.
fixBranch This function is the entry point of student's solution to this lab. It fixes branch instructions in the modified program such that they branch to the same target as they did in the original program. Writes the instructions of the fixed program to a separate array. Arguments: a0: Pointer to the instructions array of the original program. a1: Pointer to the instructions array of the modified program. a2: Pointer to an empty array that is the same size as the instructions array of the modified program. This function must write the instructions of the fixed program to this array and terminate it with the sentinel value 0xFFFFFFFF. Returns: None
findInsertions Finds all the insertion points and how many instructions were inserted after each insertion point Arguments: a0: Pointer to the instructions array of the original program. a1: Pointer to the instructions array of the modified program. Returns: a0: Pointer to the insertion points array. a1: Pointer to the insertions array.
findBranches Finds all branches and their respective targets in the original program Arguments: a0: Pointer to the instructions array of the original program. Returns: a0: Pointer to the branches array. a1: Pointer to the targets array.
Input Guarantees
-
The binary representation of the RISC-V programs are terminated by the
sentinel value
0xFFFFFFFF
. - There will be a maximum of 50 instructions inserted to the modified program
- There will be a maximum of 25 branches to fix.
- All instructions are valid RISC-V instructions.
- Neither branches nor jumps will ever be inserted to the original program to create the modified program.
-
Some pseudo-instructions in RARS correspond to more than one
instruction.
For example, the
la
pseudo instruction expands into a two separate instructions. The input will contain no such instructions. - The modified program will not contain inserted instructions before the first instruction of the original program.
Resources
We also provide two Python programs that will help students in generating their own test cases and checking whether their solution is working as expected.
binDecompiler.py is a decompiler for RISC-V assembly.
Required positional arguments:
-
Input file format
-
b
: binary file -
h
: hexadecimal instruction representation file input (each instruction as0x________
on a newline)
-
- Input file path
Optional arguments:
-
-o OUTPUT
. Basename of output file. If provided, binCompiler.py writes its output to two different files:OUTPUT.tsv
andOUTPUT.txt
.-
OUTPUT.tsv
. Each row in the file contains the following information regarding a RISC-V instruction in the input file:Address
,Code
, andSource
.Address
is the address of the instruction when loaded into memory.Code
is the hexadecimal representation of the RISC-V assembly source code (Source
). -
OUTPUT.txt
. Each line in the file is a hexadecimal representation of a RISC-V instruction in the input file.
binDecompiler.py
without the-o
flag,binDecompiler.py
will print the content ofOUTPUT.tsv
andOUTPUT.txt
to the terminal, separating them with an empty line. -
Usage:
python3 binDecompiler.py {b,h} inputFile [-o OUTPUT]
Example:
python3 binDecompiler.py b original.bin -o original_decomp
# Decompiles the binary file original.bin.
# Outputs the result to two files, original_decomp.tsv and original_decomp.txt
hexToBin.py converts hexadecimal numbers in textual format to binary format and writes the result to a file. Expects one hexadecimal number per line.
Optional arguments:
-
-i INPUT
. Path to the input file. If you runhexToBin.py
without the-i
flag, it will take input from standard input, one hexadecimal number per line. -
-o OUTPUT
. Path to the output file. If you runhexToBin.py
without the-o
flag, it will write the output toout.bin
Usage:
python3 hexToBin.py [-i INPUT] [-o OUTPUT]
Example:
python3 hexToBin.py -i lab_out.txt -o lab_out.bin
# Converts the hexadecimal numbers in lab_out.txt to binary format.
# Outputs the result to the file lab_out.bin
The description of the two Python programs above is also in the provided README.
Testing your Lab
Note: To execute RARS on Windows, or if you do not haverars
aliased to execute RARS as a Java executable, you need
to replace rars
with java -jar </path/to/rars>
, where </path/to/rars>
is the path to RARS
on your file system.
To test your lab, you must first generate your input files: the
binary files for the original and modified programs.
The process for generating test cases is complicated because you must
insert new instructions into the binary without modifying the
pre-existing branch instructions.
You can use any RISC-V .s
files to generate the original
program as long as it satisfies the constraints in the
Input Guarantees section (two sample
files are provided in the Tests
directory). To generate
the binary file, original.bin
, for the original program,
original.s
, you can run the following command:
rars a dump .text Binary original.bin original.s
To create the modified program and generate its binary file, you can follow the steps below:
-
Decompile the binary file for the original program using the provided
binDecompiler.py
Python script.python3 binDecompiler.py b original.bin -o original_decomp
This creates two files:original_decomp.tsv
andoriginal_decomp.txt
. -
To create the modified program, one option is to use a spreadsheet editor, such as Google Spreadsheets or Microsoft Excel, to edit the
.tsv
file directly. Some IDE's such as VS Code and JetBrains also supports opening.tsv
files in a tabular format. If none of the aforementioned tools are available to you, skip to the next step.Using spreadsheet editors, you can insert rows above or below an instruction to insert instructions before or after it, which can be done by placing hexadecimal representation of valid RISC-V instructions in the empty cells of the 'Code' column. Make sure your modified program satisfies the constraints outline in the Input Guarantees section.
-
Another option to create the modified program requires opening both
the
.tsv
and the.txt
files, preferably in two separate windows that are placed side by side. Now, you can use the information in the.tsv
file to assist you in inserting instructions into the.txt
file. The steps are similar to using a spreadsheet editor, except that you will be adding new lines. -
Generating the binary file for the modified program.
-
If you have followed step 2 to create the modified program,
simply copy all the instructions in the 'Code' column of the
.tsv
file and provide it as standard input tohexToBin.py
.python3 hexToBin.py -o modified.bin
-
If you have instead followed step 3, you can provide the file
name of the modified program as a commandline argument.
python3 hexToBin.py -i modified.txt -o modified.bin
-
If you have followed step 2 to create the modified program,
simply copy all the instructions in the 'Code' column of the
-
To test your lab, you should run
fixBranch.s
with RARS and must provide two binary files as program arguments. The first binary file is the original program and the second one is the modified program.rars fixBranch.s pa original.bin modified.bin nc > lab.out
This command runs your lab solution onoriginal.bin
andmodified.bin
. The providedcommon.s
prints the hexadecimal representation of all the instructions in the fixed program. By disabling the RARS copyright notice with thenc
flag and piping the output tolab.out
, you can decompilelab.out
usingbinDecompiler.py
without modifying the content oflab.out
.python3 binDecompiler.py h lab.out -o lab_out_decomp
Similar to step 1, two files,lab_out_decomp.tsv
andlab_out_decomp.txt
will be created.
Here is a video demonstrating the general flow of testing your lab. This video is created by screen recording slides 28-33 in the presentation for this lab.
Check My Lab
This lab is supported
in CheckMyLab.
To get started, navigate to the FixBranch lab in CheckMyLab found in
the dashboard.
From there, students can upload test cases in the My test cases table.
Test cases follow the format described below.
Additionally, students can upload their fixBranch.s
file in
the My solutions table, which will then be tested against all other
valid test cases.
Uploading a Test Case
To create a test case for CheckMyLab, first create the binary files for your original and modified assembly, then place the files in the same directory:
[NAME].bin [NAME]Modified.bin
where [NAME]
is the name of the test case.
Then, zip both files into a single archive:
zip [NAME].zip [NAME].bin [NAME]Modified.bin
The resulting [NAME].zip
file can be uploaded to CheckMyLab.
Marking Guide
Assignments too short to be adequately judged for code quality will be given a zero for that portion of the evaluation.
- 24% for correct insertion points and number of instructions added at each insertion point.
- 24% for correct addresses of branch instructions and respective branch targets.
- 32% for correctly fixing the offsets of the branch instructions.
- 20% for code cleanliness, readability, and comments.
Submission
There is a single file to be submitted for this lab. The file name is
fixBranch.s
and it should contain only the code for your
solution.
-
Do not add a
main
label tofixBranch.s
. - Do not modify the line
.include "common.s"
. - Keep the file
fixBranch.s
in theCode
folder of the git repository. - Push your repository to GitHub before the deadline. Just committing will not upload your code. Check online to ensure your solution is submitted.