CMPUT 229 - Computer Organization and Architecture I

Lab 2: String Search

Creator: Nathan Ulmer

Introduction

Computers represent textual data as strings. Common string operations include print, concatenate, sort, and search. String search is the problem of finding occurrences of a query string contained in a search string. An implementation of string search can return a list of all the occurrences of the query string, or it can return the first match.

String Search

The solution for this lab implements a modified string search. Instead of returning the location of a match, it returns the sum of the indices of the all the matches. The index of the first character of the string is 0. For example, consider a search for young in the following string:

She was young the way an actual young person is young.
        ^                       ^               ^
        Match at 8              Match at 32     Match at 48

This search returns 88 because 8+32+48 = 88.

Strings and Characters

This section provides information about strings.

Each character of a string is stored as a number in memory. A commonly used mapping between characters and numbers is the American Standard Code for Information Interchange (ASCII). In ASCII characters are stored as single bytes, according to the mapping given in the following table.

A string is an array of characters. The null terminator character ('\0'), which corresponds to the number 0, denotes the end of a string and it must appear as the final character of every string. With this format a search can detect the end of a string by comparing each character with 0.

Below are memory representations of the strings "RISC-V" and "Assembly", starting at address 0x100. The last character in both strings is the null terminator.

Working with Strings

This section provides more information on RISC-V features that may be useful in completing this lab.

When working with strings, it is important to be aware of how characters are stored. Your function only gets a pointer to the start of the string. To get a particular character from the string, you will need to load that character. RISC-V provides the lb instruction for this purpose. To load into t0 the byte located at the address in t1, you can write lb t0, 0(t1). The 8-bit value at the address in t1 is sign-extended to 32-bits before being stored into t0. To load a byte without sign extension, RISC-V provides the lbu instruction.

Assignment

Write a RISC-V assembly function named stringSearch with the following specification:

stringSearch:
  This function returns the sum of the indices of instances of the query string in the search string.

  Arguments:
    a0: Pointer to the query string
    a1: Pointer to the search string

  Returns:
    a0: Sum of the indices

After searching the search string for all instances of the query string, be sure to return the sum of the indices in a0.

Writing a Solution

The assembler directive .include "common.s" at the start of the solution file causes the assembler to insert the code present in the common.s file at the start of the solution. The code in the common.s file reads the program arguments, and loads the corresponding file. The inputs in the loaded file are given as arguments to stringSearch.

Do not modify the stringSearch label in the solution file because this label is required by the common.s file to properly call your stringSearch function.

In RISC-V, every function must have a return statement. The following are examples of valid return statements: ret, jr ra, and jalr zero, 0(ra). This instruction must appear at the end of the function so that the function will then return to the program in the common.s to print the result.

You don't need to follow register conventions in you solution for this lab, but you should become familiar with the RISC-V register conventions for future labs. Return values must be stored in the argument registers. For this lab there is only a single return value, using register a0.

Testing your Lab

Test cases are plain text files with the following format:

[query string]
[search string]

Code in the common.sreads this file, loads the two strings into memory, and passes the pointers to the strings to your stringSearch function. You are encouraged to read the common.s code so that you can understand how it works.

To perform a test, enter the complete file path of the corresponding test input into the Program Argument box in RARS. The file path cannot have any spaces. Do not put quotations around the filename. Make sure that the given path is the full path to the file. For example /home/user/cmput229/test.txt is a valid path, while /home/user/cmput 229/this is a test.txt is not. Once you have entered the path to the test case, you can run the program. The common.s file takes care of reading the input file, calling the stringSearch function, and printing the result.

Be sure to create your own tests to ensure that your code is correct.

Check My Lab

Link to CheckMyLab

This lab is supported in CheckMyLab. To get started, navigate to the StringSearch lab in CheckMyLab found in the dashboard. From there, students can upload test cases in the My test cases table. Test cases are text files which follow the format described above. Additionally, students can upload their stringSearch.s file in the My solutions table, which will then be tested against all other valid test cases.

Resources

  • Slides used for the lab introduction (.pptx) (.pdf).
  • Marksheet used to mark the lab (.txt).

Notes and Assumptions

  • Every string contains only ASCII characters, which range from 0-127.
  • A string can contain any ASCII character.
  • Both input strings have a maximum length of 200, and a minimum length of 1.
  • Each character has a length of 1 byte. Be sure to use the correct instructions for the data type.
  • The search is case sensitive, thus the same letter in different cases are not equal. For example "a" is not equal to "A".

Marking Guide

Assignments too short to be adequately judged for code quality will be given a zero for that portion of the evaluation.

  • 40% for overall correctness.
  • 20% for correct handling of single character queries.
  • 20% for correct handling of queries with a single occurrence.
  • 20% for code cleanliness, readability, and comments.

Submission

There is a single file to be submitted for this lab. stringSearch.s should contain the code for your solution.

  • Do Not add a main label to this file.
  • Do Not modify the line .include "common.s".
  • Keep the file stringSearch.s in the Code folder of the git repository.
  • Push your repository to GitHub before the deadline. Just committing will not upload your code. Check online to ensure your solution is submitted.