Talking to Memory: Loads, Stores, and Arrays
Beginner · RISC-V · Lesson 4 of 4
The branches and loops lesson gave your programs the power to repeat. But a loop that only churns registers soon runs out of things to do, because registers are few. The interesting work lives in memory, and you reach it with just two instructions: one to read, one to write.
Load and store
Registers and memory are separate worlds. The CPU computes only on registers; with memory, all it can do is move data in and out. It never does arithmetic directly on memory. Two instructions bridge the gap:
lw t1, 0(a0) # LOAD word: t1 = mem[a0 + 0]
sw t1, 0(a0) # STORE word: mem[a0 + 0] = t1
lw copies a word out of memory into a register; sw copies a register’s value
into memory. (You met both briefly in the stack lesson; this is the lesson
that owes you the full story.)
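For readers who think in C, the two instructions can be modeled roughly as below. This is only a sketch: the mem array, its size, and the helper names load_word, store_word, and increment_word are made up for illustration, and it assumes the little-endian byte order that RISC-V systems normally use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for main memory: a small array of bytes. */
enum { MEM_SIZE = 64 };
static uint8_t mem[MEM_SIZE];

/* Models `lw rd, offset(base)`: gather 4 bytes (little-endian) into one value. */
static uint32_t load_word(uint32_t base, int32_t offset) {
    uint32_t addr = base + (uint32_t)offset;
    /* This sketch simply refuses misaligned or out-of-range addresses. */
    assert(addr % 4 == 0 && addr <= MEM_SIZE - 4);
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Models `sw rs2, offset(base)`: scatter the register's 4 bytes into memory. */
static void store_word(uint32_t base, int32_t offset, uint32_t value) {
    uint32_t addr = base + (uint32_t)offset;
    assert(addr % 4 == 0 && addr <= MEM_SIZE - 4);
    mem[addr]     = (uint8_t)(value);
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}

/* "Arithmetic on memory" is really three steps: load, compute, store. */
static void increment_word(uint32_t addr) {
    uint32_t t1 = load_word(addr, 0);   /* lw   t1, 0(a0) */
    t1 = t1 + 1;                        /* addi t1, t1, 1 */
    store_word(addr, 0, t1);            /* sw   t1, 0(a0) */
}
```

Calling store_word(8, 0, 41) followed by increment_word(8) leaves 42 in the word at address 8. The value had to pass through a register-like variable to be changed, which is the point of the load/store split.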
Addresses and offsets
Memory is one giant numbered array of bytes, and an address is just an index
into it. The 0(a0) syntax is an address built from two parts:
lw t1, 8(a0)   # address = a0 + 8
#      │ └─ base register: holds a starting address
#      └──── offset: a constant added to it
The base register holds where you are; the constant offset reaches a fixed
distance from there. This is exactly how the stack lesson read 12(sp) — sp is
the base, 12 is the offset. The same pattern walks structs, arrays, and stack
frames alike.
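The address arithmetic is small enough to sketch in C. The function name effective_address and the struct point type are hypothetical. The one fact the sketch leans on is that the offset in RV32I loads and stores is a signed 12-bit constant, so it reaches from -2048 to +2047 bytes around the base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address used by `lw rd, imm(base)`: the base register plus the constant.
   The offset is a signed 12-bit immediate, so negative offsets are legal. */
static uint32_t effective_address(uint32_t base, int32_t imm) {
    assert(imm >= -2048 && imm <= 2047);  /* what fits in the instruction */
    return base + (uint32_t)imm;          /* wraps modulo 2^32 */
}

/* Hypothetical struct: with a0 holding its address, field z sits at 8(a0). */
struct point {
    int32_t x;  /* offset 0 */
    int32_t y;  /* offset 4 */
    int32_t z;  /* offset 8 */
};
```

With a base of 0x2000, an offset of 8 gives 0x2008 and an offset of -4 gives 0x1FFC. The struct shows why a constant offset is so handy: a field's distance from the start of its struct is known when the code is written, so only the base varies at run time.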
Bytes versus words
A word on RV32 is 4 bytes, so consecutive words sit 4 addresses apart:
mem[0], mem[4], mem[8], …. When you don’t need a whole word, narrower
loads and stores reach a single byte:
lb t1, 0(a0) # load byte, sign-extended (e.g. for signed chars)
lbu t1, 0(a0) # load byte, zero-extended (e.g. for raw bytes / ASCII)
sb t1, 0(a0) # store the low byte of t1
lb fills the upper 24 bits with copies of the byte’s sign bit, so -1 stays -1;
lbu fills them with zeros, which is what you want for text and unsigned data.
Picking the wrong one is a classic source of “why did my character turn
negative?” bugs.
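The difference between the two byte loads shows up as soon as a byte has its top bit set. The following C sketch models both; the function names are invented for illustration, and the mem parameter stands in for memory.

```c
#include <stdint.h>

/* Models `lb`: bit 7 of the byte is copied into the upper 24 bits. */
static int32_t load_byte_signed(const uint8_t *mem, uint32_t addr) {
    uint8_t b = mem[addr];
    return b >= 0x80 ? (int32_t)b - 256 : (int32_t)b;
}

/* Models `lbu`: the upper 24 bits are filled with zeros. */
static int32_t load_byte_unsigned(const uint8_t *mem, uint32_t addr) {
    return (int32_t)mem[addr];
}
```

For the byte 0xFF, the signed version yields -1 and the unsigned version yields 255. For an ASCII letter such as 0x41 ('A') both yield 65, which is why the bug only appears once data leaves the 7-bit range.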
Walking an array
Put load and loop together and you can sweep an entire array. Keep a pointer in a register, read through it, then advance the pointer by one element each time around — exactly the loop shape from the previous lesson, now with memory:
# on entry: a0 = address of the array, t2 = number of elements (at least 1)
        addi t0, zero, 0     # sum = 0
loop:   lw   t1, 0(a0)       # read the element a0 points at
        add  t0, t0, t1      # accumulate
        addi a0, a0, 4       # march the pointer to the next word
        addi t2, t2, -1      # count down the remaining elements
        bne  t2, zero, loop  # go again until none are left
Notice the offset never changes — it stays 0(a0). The pointer does the moving.
That’s the heart of array traversal: a fixed access pattern over a marching base
address.
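If it helps to see the same walk in a higher-level language, here is a rough C counterpart, with each line annotated with the instruction it corresponds to. The function name sum_words is hypothetical, and like the assembly it assumes the count is at least 1 on entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the assembly loop: a fixed access (*p) over a marching pointer.
   Assumes count > 0, because the test sits at the bottom of the loop. */
static int32_t sum_words(const int32_t *p, size_t count) {
    int32_t sum = 0;          /* addi t0, zero, 0 */
    do {
        sum += *p;            /* lw t1, 0(a0)  then  add t0, t0, t1 */
        p += 1;               /* addi a0, a0, 4  (one int32_t is 4 bytes) */
        count -= 1;           /* addi t2, t2, -1 */
    } while (count != 0);     /* bne t2, zero, loop */
    return sum;
}
```

Note that p += 1 in C already means "advance by one element", so the compiler supplies the factor of 4 that the assembly writes out by hand. One difference to keep in mind: add wraps around on overflow, whereas signed overflow in C is undefined.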
Step through it
Use the visualizer and watch two panels at once. The memory panel shows the
array [10, 32] parked at 0x2000. Press step ▶ and follow a0 in the
registers panel: each pass through the loop it climbs 0x2000 → 0x2004 →
0x2008, while the very same lw t1, 0(a0) pulls a different value each time. The sum
lands on 42 right as the counter hits zero and the branch falls through.
Try this
- The array holds 4-byte words. What if you wrote addi a0, a0, 1 instead of 4? (You’d advance one byte, landing in the middle of the first word and reading garbage. Element size and stride must match.)
- How would you find the largest element instead of the sum? (Keep a “best so far” register; replace add with a blt/bge compare-and-update. Same walk, different body.)
- Why must lw use a word-aligned address like 0x2004 and not 0x2003? (Most loads expect natural alignment; a misaligned word access traps or runs slowly.)
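For the second exercise, one possible shape of the answer is sketched below in C rather than assembly, so the register version is still yours to write. The name max_word is made up, and the sketch assumes the array has at least one element.

```c
#include <stddef.h>
#include <stdint.h>

/* Same walk as the sum loop, different body: keep a "best so far" value.
   Assumes count > 0 so that the first element can seed the result. */
static int32_t max_word(const int32_t *p, size_t count) {
    int32_t best = *p;            /* first element is the best so far */
    while (--count != 0) {
        p += 1;                   /* advance one word */
        if (*p > best) {          /* in assembly: a blt/bge around the update */
            best = *p;
        }
    }
    return best;
}
```

Seeding best from the first element, rather than from 0, keeps the result correct when every element is negative.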
Memory is where programs keep everything bigger than a handful of registers. With loads, stores, and a marching pointer, you can now touch all of it — and that’s the foundation the stack was quietly standing on the whole time.