CMPUT 229 - Computer Organization and Architecture I
Lab 2: String Search
Creator: Nathan Ulmer
Introduction
Computers represent textual data as strings. Common string operations include print, concatenate, sort, and search. String search is the problem of finding occurrences of a query string contained in a search string. An implementation of string search can return a list of all the occurrences of the query string, or it can return the first match.
String Search
The solution for this lab implements a modified string search.
Instead of returning the location of a match, it returns the sum of the
indices of the all the matches.
The index of the first character of the string is 0.
For example, consider a search for young
in the following
string:
She was young the way an actual young person is young. ^ ^ ^ Match at 8 Match at 32 Match at 48
This search returns 88 because 8+32+48 = 88.
Strings and Characters
This section provides information about strings.
Each character of a string is stored as a number in memory. A commonly used mapping between characters and numbers is the American Standard Code for Information Interchange (ASCII). In ASCII characters are stored as single bytes, according to the mapping given in the following table.
A string is an array of characters.
The null terminator character ('\0'
),
which corresponds to the number 0, denotes the end of a string
and it must appear as the final character of every string.
With this format a search can detect the end of a string by comparing each
character with 0.
Below are memory representations of the strings "RISC-V" and "Assembly",
starting at address 0x100
.
The last character in both strings is the null terminator.
Working with Strings
This section provides more information on RISC-V features that may be useful in completing this lab.
When working with strings, it is important to be aware of how characters are
stored.
Your function only gets a pointer to the start of the string.
To get a particular character from the string, you will need to load that
character.
RISC-V provides the lb
instruction for this purpose.
To load into t0
the byte located at the address
in t1
, you can write lb t0, 0(t1)
.
The 8-bit value at the address in t1
is sign-extended to 32-bits
before being stored into t0
.
To load a byte without sign extension, RISC-V provides the lbu
instruction.
Assignment
Write a RISC-V assembly function named stringSearch
with the
following specification:
stringSearch: This function returns the sum of the indices of instances of the query string in the search string. Arguments: a0: Pointer to the query string a1: Pointer to the search string Returns: a0: Sum of the indices
After searching the search string for all instances of the query string, be
sure to return the sum of the indices in a0
.
Writing a Solution
The assembler directive .include "common.s"
at the start
of the solution file causes the assembler to insert the code present in
the common.s
file at the start of the solution.
The code in the common.s
file reads the program arguments, and
loads the corresponding file.
The inputs in the loaded file are given as arguments
to stringSearch
.
Do not modify the stringSearch
label in the
solution file because this label is required by the common.s
file
to properly call your stringSearch
function.
In RISC-V, every function must have a return statement.
The following are examples of valid return
statements: ret
, jr ra
, and
jalr zero, 0(ra)
.
This instruction must appear at the end of the function so that the function will
then return to the program in the common.s
to print the result.
You don't need to follow register conventions in you solution for this lab,
but you should become familiar with the RISC-V register conventions for future
labs.
Return values must be stored in the argument registers.
For this lab there is only a single return value, using register a0
.
Testing your Lab
Test cases are plain text files with the following format:
[query string] [search string]
Code in the common.s
reads this file, loads the two
strings into memory, and passes the pointers to the strings to your
stringSearch
function. You are encouraged to read the
common.s code so that you can understand how it works.
To perform a test, enter the complete file path of the corresponding test
input into the Program Argument box in RARS. The file path cannot have
any spaces.
Do not put quotations around the filename. Make sure that the given path is
the full path to the file.
For example /home/user/cmput229/test.txt
is a valid path,
while /home/user/cmput 229/this is a test.txt
is not.
Once you have entered the path to the test case, you can run the program.
The common.s
file takes care of reading the input file, calling
the stringSearch
function, and printing the result.
Be sure to create your own tests to ensure that your code is correct.
Check My Lab
This lab is supported
in CheckMyLab.
To get started, navigate to the StringSearch lab in CheckMyLab found in
the dashboard.
From there, students can upload test cases in the My test cases table.
Test cases are text files which follow the format described above.
Additionally, students can upload their stringSearch.s
file in
the My solutions table, which will then be tested against all other
valid test cases.
Resources
Notes and Assumptions
- Every string contains only ASCII characters, which range from 0-127.
- A string can contain any ASCII character.
- Both input strings have a maximum length of 200, and a minimum length of 1.
- Each character has a length of 1 byte. Be sure to use the correct instructions for the data type.
- The search is case sensitive, thus the same letter in different cases are not equal. For example "a" is not equal to "A".
Marking Guide
Assignments too short to be adequately judged for code quality will be given a zero for that portion of the evaluation.
- 40% for overall correctness.
- 20% for correct handling of single character queries.
- 20% for correct handling of queries with a single occurrence.
- 20% for code cleanliness, readability, and comments.
Submission
There is a single file to be submitted for this lab.
stringSearch.s
should contain the code for your solution.
- Do Not add a
main
label to this file. - Do Not modify the line
.include "common.s"
. - Keep the file
stringSearch.s
in theCode
folder of the git repository. - Push your repository to GitHub before the deadline. Just committing will not upload your code. Check online to ensure your solution is submitted.