CMPUT 229 - Computer Organization and Architecture I

RISC-V-to-ARM Binary Translation - Control

CMPUT 229 - Lab 6, RISC-V to ARM - Control

Prerequisite

This lab builds on top of the previous RISC-V to ARM ALU translation lab. Though a correct solution to the previous lab is not necessary to complete this one, a correct translateALU function can help facilitate testing. You are expected to use your own translateALU: a correct solution will not be provided. Additionally, a working RISCVtoARM_ALU function could provide basic scaffolding for the task of translating branch offsets.

Introduction

The task in this lab is to write a translator for branch instructions and unconditional register jump instructions from RISC-V into ARM. Combine this new translator with the translateALU function written in the previous lab allows for the translation of a Turing-complete subset of RISC-V instructions into ARM.

Information

The CPSR (Current Program Status Register)

The CPSR, or status register, controls all conditional execution on the ARM instruction set and various execution modes of the CPU. The CPSR controls the mode of the CPU (kernel/user), interrupts, and is extensible for future functionality. The portion of the CPSR relevant to this lab are the uppermost 4 bits, collectively known as the condition-code flags.

Any data-processing instruction can update the condition-code flags. To do so the Status bit of the instruction must be set to 1. Then, after the instruction is executed, the flags are set based on the result of the instruction as follows:

Conditions

Every instruction has a conditions field at the beginning. These bits determine what state one or more of the CPSR's condition code flags have to be in for the instruction to execute.

The CMP Instruction

CPSR condition flags can be set by instructions whose Status bit is set to one. See previous lab for position of flag bits in instructions. Once the flags change, they stay in the new state until altered again by another instruction. RISC-V doesn't support the system of conditional execution that ARM does, therefore the vast majority of these instructions cannot realistically exist as translations from RISC-V. For the translation, some of the instructions that set CPSR's condition code flags must be implemented because the ARM's system of branching depends on conditional execution of a branch instruction.

In this lab, only the CMP instruction sets condition code flags. CMP is a signed integer subtraction instruction whose result is used to update the CPSR's condition code flags but isn't stored anywhere. It is a data-processing register instruction with opcode 1010 whose Rd register should be encoded as 0 and whose status bit should be 1. The encoding format is as shown here:

The table summary of CMP is as follows:

InstructionOpcodeDescription
CMP1010subtraction of one register value from another (specifically Rn - Rm) and result is not written to a destination register

There are 16 possible states for a given instruction's 4 condition bits. Only three of these states are used in this lab. The three, explained in the context of the CMP instruction, are as follows:

CodeFlag CheckMeaning
1110no checkalways execute
0000Z setexecute if result of flag-setting instruction was zero
1010N equals Vexecute if Rn was greater than or equal to Rm in CMP instruction that last set these flags

ARM's Pre-Fetching and Program Counter

Unlike RISC-V, ARM prefetches two instructions ahead during execution, causing its PC to point to an address 2 word (8 bytes) ahead. For instance, when an instruction whose address is 0x00000000 is being executed, the PC holds the value 0x00000008. In ARM branch and jump instructions, the immediate is added to the PC. Therefore, the immediate value in such instructions is equal the address of the destination instruction and the address of the branch instruction minus 8.

Branch Format


Notes

Branch Exchange Format

BX, or Branch Exchange jumps to the address stored in the register specified in bits 3-0, i.e. places the contents of the specified register into the PC.

Notes

Branch Offset Calculation

While there are significantly fewer instructions that must be implemented in this lab compared to the ALU part. However, in this lab you have to correctly translate RISC-V branch offsets into ARM branch offsets accounting for the fact that each RISC-V branch gets translated into two ARM instructions.

For example: suppose that there's a branch at address 0x00000000 in RISC-V and that is should be translated to an ARM instruction at address 0x10000000. If the RISC-V branch's destination is an instruction at address 0x0000001C, then the same instruction translated into ARM would be at address 0x1000001C only if there were no branches translated in between them; if there was a single additional translated branch in between them the target instruction would be at 0x10000020; if all instructions between them were translated branches (that is, 6 translated branches), then the target instruction would be at 0x10000034. To emphasize, these discrepancies are due to a single RISC-V branch being translated into 2 ARM instructions.

Here's a visual of the two most extreme cases described above:

An algorithm is needed to compute the target address. One possibility, suggested below, is to go over the RISC-V code twice while using two tables to keep track of branch target addresses.

Suggested Branch-Target Computation Algorithm

This algorithm requires the pointers to ARM and RISC-V instructions from the previous lab and works with the following two tables:

RAT and BTT are index correlated to the RISC-V instructions stored in memory: the n-th entry RAT and n-th entry in BTT correspond to the n-th RISC-V instruction in memory (i.e. all of these are at 4*n + base address in memory).

The first pass should be over RISC-V instructions. On this pass, for each instruction, you should:

  1. Translate the RISC-V instruction into ARM instruction(s) and store the ARM instruction(s) in the memory space reserved for the ARM instruction. In the case of the branch instruction, use the immediate value zero for now.
  2. Store the address of the translated instruction into the corresponding row of RAT. In the case of branches, store the address of the CMP instruction instead of the address of the branch instruction.
  3. If the current instruction is a branch instruction:
    1. Calculate its RISC-V target address.
    2. Calculate the address of the RAT row corresponding to the branch target.
    3. Store this address in the BTT row corresponding to the branch instruction.
  4. If the instruction is not a branch, store 0 in the row corresponding row of the BTT.

In the second pass, for each row in the BTT:

  1. If the value in the BTT is zero, move to the next row.
  2. Otherwise:
    1. Retrieve the corresponding ARM CMP instruction from RAT.
    2. Add 4 to this address to obtain the address of the corresponding ARM branch instruction.
      ARM branch instruction address = CMP address + 0x4
    3. Use the value in the BTT row as an address to retrieve the address of the ARM branch target from RAT.
    4. Calculate the ARM branch offset as the difference between the ARM branch target address and the ARM branch instruction address minus eight (see ARM's Pre-Fetching and Program Counter):
      ARM Branch Offset = ARM branch target address - ARM branch instruction address - 0x8
    5. Shift the result right by two to obtain the value for the offset field in the ARM instruction:
      ARM Offset Field = ARM Branch Offset >> 2
    6. Load the binary representation of the ARM branch instruction. The address for this load is obtained by adding 4 to the value in the corresponding RAT row.
    7. Overwrite the value of the immediate field with the value computed above.
    8. Store the new binary representation to the same location from where it was loaded.

    Here is a GIF illustrating the above approach:

    If the GIF is too fast or too slow for your liking, feel free to download the PDF version that you can find in the Resources section.

    Though the above is the recommended approach, feel free to implement a different one, provided that it allows you to implement branch translation correctly.

    Assignment

    Your assignment is to correctly translate RISC-V's branches and unconditional jump instructions, while utilizing the translateALU function, imported from arm_alu.s, to translsate ALU instructions.

    RISC-V Instructions to Translate

    The following are all of the new RISC-V instructions that this lab solution must translate. Constraints are put on them to ensure simple transition to ARM. In the encoding, s specifies a source register, t a target register, d a destination register and i an immediate value.

    InstructionEncodingType
    JALR    d, imm(s) iiii iiii iiii ssss s000 dddd d110 0111 I
    BEQ     s, t, offset imm[12|10:5]t tttt ssss s000 imm[4:1|11]110 0011 SB
    BGE     s, t, offset imm[12|10:5]t tttt ssss s101 imm[4:1|11]110 0011 SB
    Notes

    The jalr instruction's immediate doesn't get shifted: the 12 bits are used as is.

    To encode branches' source and target registers into CMP instructions, source and target should be encoded as Rn and Rm respectively.

    Guarantees and Cautions

    • All inputs used for grading will be valid; ALU instructions will follow the guaranteed format from the previous lab, and the jalr instruction will be used exclusively to implement the unconditional jump.
    • You can use any functions from the specification of the previous lab in this lab (e.g. your translateRegister function could be helpful here). Markers will, however, use a different arm_alu.s file than your own, so those functions are expected to act preecisely according to the specification.
    • For all inputs we will use to test your solution, a correct ARM translation fits into the 2kb's space preallocated in the common.s file.
    • The solution file import arm_alu.s cause an error if compiled without any adjustments. To fix this, simply remove the common.s import from the arm_alu.s file tthat you are using.
    • Your solution will not be tested with a label at the very end of the program, so you do not need to consider this edgecase.
    • If you have print ecalls in your code for debugging purposes, make sure to remove them before submitting your solution because it may result in lost marks.

    Specification

    You are required to implement the following functions:

    • RISCVtoARM
      This function translates RISC-V code in memory at address found in a0 into ARM code and stores that ARM code into the memory address found in a1.
      • Arguments:
        • a0: pointer to memory containing a RISC-V function. The end of the RISC-V instructions is marked by the sentinel word 0xFFFFFFFF.
        • a1: a pointer to preallocated memory where you will have to write ARM instructions.
      • Return Values:
        • a0: number of bytes worth of instructions generated by RISCVtoARM.
    • translateControl
      This function translates a single RISC-V beq, bge or jalr instruction into either one or two ARM instructions.
      • Arguments:
        • a0: untranslated RISC-V instruction.
      • Return Values:
        • a0: first translated ARM instruction. This should either be a wholly tanslated BX instruction, or a CMP instruction.
        • a1: 0 or second translated ARM instruction. When non-zero, it should return a branch with 0 offset.
    • calculateRISCVBranchOffset
      This function performs simple computations to calculate the RISC-V branch offset. Negative values calculated by this function should be returned with proper sign extension.
      • Arguments:
        • a0: RISC-V instruction.
      • Return Values:
          a0: branch offset

    Testing

    To obtain testing data, you can write short RISC-V programs using the subset of instructions provided, and convert them into binary files using the following command

    rars "YOUR_RISCV_FILE" a dump .text Binary "YOUR_DESIRED_BINARY_FILE"

    The provided common.s file loads RISC-V binary from a file and generates out.bin file after calling the functions specified above. This commons.s file should be included in your arm.s file. The program, starting in arm.s, takes the name of the file containing the test to load as an argument. Thus, it can be run using rars arm.s pa RISCV_BINARY_FILE. The submitted solution must not contain the common.s attached. It also must not contain a main function

    This assignment provides the program ARMDisassembler.s that prints ARM instructions in a textual representation.

    The disassembler indicates when the status bit is set by adding an S after the instruction type (e.g. ADD S R0, R1, R2), and indicates when a non-shift data-processing register instruction has a shift by appending LL/LR/AR/RR alongside the shift amount at the very end of the instruction. Make sure to take all of this into account when analyzing the output.

    Otherwise, the disassembler is designed to print instructions that follow the specifications, producing question marks where no valid interpretation is possible. Though some are provided already, you are responsible for creating test cases to ensure compliance with the assignment specification. The program can be run using:

    rars ARMDisassembler.s pa out.bin

    To view the bytecode contents of the generated out.bin files in a terminal, use the following command:

    hexdump out.bin

    Here are some test cases you can use to test your program:

    RISC-V ProgramRISC-V BinaryARM BinaryARM Text Representation
    branchesAndJumps.s branchesAndJumps.bin branchesAndJumps.out branchesAndJumps.txt
    moreBranchesAndJumps.s moreBranchesAndJumps.bin moreBranchesAndJumps.out moreBranchesAndJumps.txt
    manyBranches.s manyBranches.bin manyBranches.out manyBranches.txt

    Check My Lab

    Link to CheckMyLab

    This lab is supported in CheckMyLab. To get started, navigate to the ARM-ALU lab in CheckMyLab found in the dashboard. From there, students can upload test cases in the My test cases table. Test cases are RISC-V binary files, generated as described in the Testing section. Additionally, students can upload their solution in the My solutions table, which will then be tested against all other valid test cases. To upload your solution, first zip your arm.s and arm_alu.s files into a single archive. For example:

      zip solution.zip arm.s arm_alu.s
    

    Then, upload the zip file as the solution in CheckMyLab.

    Resources

    More information about ARM instruction set encoding can be found here.

    A PDF version of the branch table algorithm illustration can be found here.

    Slides for this lab can be found here as a PDF and here as a PPTX.

    Marking Guide

    Assignments too short to be adequately judged for code quality will be given a zero. Register translation is vital for all instructions. Therefore it is difficult for a binary translator that does not do correct register translation to pass ANY of the grading test cases. Please, ensure proper register translation according to the table above.

    • 50% for correctly translating RISC-V branching instructions.
      • 10% for correctly translating the branch regardless of offset (i.e. only output of translateControl is considered)
      • 20% for proper offset calculation in the case of RISC-V branches with non-negative (>=0) offsets
      • 20% for proper offset calculation in the case of RISC-V branches with non-positive (<=0) offsets
    • 10% for correctly translating RISC-V jump instructions
    • 10% for correctly implementing the translateALU function
    • 10% for correctly implementing the calculateRISCVBranchOffset function
    • 20% for code cleanliness, readability, and comments

    Submission

    The file name should be arm.s and it should contain only the code for the functions specified above. Make sure to not include a main function in your solution. Do not remove .include "common.s" from the top of your solution. To submit, keep the arm.s file in the Code directory of your submission repo, where the latest commit (before the deadline) from the master branch will be marked. Your solution also MUST include the CMPUT 229 Student Submission License at the top of the file containing your solution and you must include your name in the appropriate place in the license text.