Getting started with reversing AVR code

This post is a work in progress, so if you find it incomplete or hard to read, that's probably because it's not finished yet. I prefer to publish early rather than leave a post to rust in my drafts.

Registers

What is missing from this list is the stack pointer: it’s not a general purpose register, it’s exposed through the pair of I/O addresses 0x3e:0x3d (high and low byte respectively); see the section about the prologue of a function to see what this means.

Harvard architecture

Unlike the most familiar systems, this architecture keeps the data space separate from the code space: the first is called SRAM, the second Program memory.

For this reason there are separate instructions to load and store data in these spaces, lds/sts and lpm/spm respectively.

Another peculiarity is that some peripherals can be accessed through the data memory space.

Arithmetic

It’s important to understand how mathematics works with registers. The first thing to learn is that (on every architecture) arithmetic is modulo \(2^n\), where \(n\) is the number of bits of the register, so a result that doesn’t fit wraps around. The other thing is that negative numbers are implemented via two’s complement, but before explaining that I need to explain one’s complement.

One’s complement

It consists in flipping all the bits of a number. If you define the negative of a given number as its one’s complement, you have the nice property that the two numbers summed give all bits set to one, which in this representation is a zero.

The problem is that you have two zeros: all bits equal to zero and all bits equal to one.

Two’s complement

It’s an extension of one’s complement: to obtain the negative representation of a number you take its one’s complement and add one. This way a number and its negative sum to exactly zero (modulo the register size) and there is only one zero, but you get an asymmetry between the minimum and maximum numbers that can be represented: with 8 bits the range goes from -128 to 127.

Normally this is the way negative numbers are represented in the code. Remember that a value in a register is not signed or unsigned by itself: it depends on how it is used in the code.

Flags

As the registers have a fixed size, arithmetic operations can overflow. For example, think of the sum of two registers each containing the value 0xff: the result, 0x1fe, cannot fit into the 8-bit destination register. For this reason there is a special register named sreg containing a bit (called a flag) indicating when an overflow happened.

It’s not the only flag in this register; the full list is

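C carry flag (there is an unsigned overflow)
Z zero flag (the result is zero)
N negative flag (the most significant bit of the result is set)
V two’s complement overflow
S sign flag, \(N\oplus V\)
H half carry (carry out of bit 3)
T transfer bit (used by bst/bld)
I global interrupt flag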

Calling convention

The processor doesn’t have a notion of argument of its own: when you call a routine in your program, the caller has to agree on a convention with the callee in order to communicate. Usually it is set by the compiler (I’m not completely sure).

Using avr-gcc we have the following indications (source):

Instruction sets

Here is a summary of the instructions available on this architecture, with a short description of the operations they implement. For more information read the summary or the complete reference.

Arithmetic

add ra, rb adds two registers and stores the result in the first one
adc ra, rb adds two registers plus the carry flag and stores the result in the first one
adiw ra, K adds an immediate to a register pair (word)
inc ra increments a register by one
sub ra, rb subtracts the second register from the first and stores the result in the first one
sbc ra, rb subtracts the second register and the carry flag from the first and stores the result in the first one
sbiw ra, K subtracts an immediate from a register pair (word)
dec ra decrements a register by one
com ra takes the one’s complement of a register
neg ra takes the two’s complement of a register
eor ra, rb calculates the exclusive or of two registers and stores the result in the first one

Load and store

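ldi ra, K loads an immediate into the register
lds ra, K loads the register with the value stored at address K
ld ra, x loads the register with the value stored at the address contained in x
ld ra, x+ same as above, then increments x (post-increment)
ld ra, -x decrements x, then loads from the address it contains (pre-decrement)
ldd ra, y+q loads the register with the value stored at address y + q (displacement is available only with the y and z pointers)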

Branch

Examples

Below are some examples of common routines implemented in this language

Prologue

This is the start of a function, where it sets up its frame pointer and allocates space on the stack for local variables; it generally looks like the following

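push r28            ; save the caller's frame pointer (y is r29:r28)
push r29
in r28, 0x3d        ; low byte of the stack pointer
in r29, 0x3e        ; high byte of the stack pointer
subi r28, 0x10      ; y -= 16, low byte
sbci r29, 0x00      ; propagate the borrow to the high byte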

In this case the code saves the frame pointer of the caller and sets the frame pointer to the current position of the stack pointer. Then it moves the frame pointer down by 16 bytes to create space for the local variables. I think this is backward with respect to the normal use of stack and frame pointers in x86 code.

To access local variables you can simply use the load/store with displacement instructions with the y register (that is, the frame pointer)

ldd r24, y+1
ldd r25, y+2
eor r24, r25
mov r25, r1
std y+1, r24
std y+2, r25

strlen

    movw r30, r24
loop:
    ld r0, z+
    tst r0
    brne loop

    com r24
    com r25
    add r24, r30
    adc r25, r31
    ret

Here the tricky part is the four instructions before ret. When r0 contains the NUL byte, z points to the address of that byte plus one (remember the post-increment addressing), so the com (one’s complement) and add/adc instructions can be summarized as follows: the result is \(\sim s + z = (-s - 1) + (e + 1) = e - s\), where \(s\) is the start address of the string and \(e\) is the address of its NUL byte.

memcpy

    movw r30, r22
    movw r26, r24
    rjmp start
loop:
    ld r0, z+
    st x+, r0
start:
    subi r20, 0x01
    sbci r21, 0x00
    brcc loop
    ret

Sign extension

This section explains how I came to understand the meaning of this piece of code:

ldd r24, y+1
ldd r25, y+2
mov r0, r25
lsl r0
sbc r26, r26
sbc r27, r27

Initially it didn’t make any sense: it loads a short from the stack and then left-shifts a copy of the most significant byte; finally it subtracts two unrelated registers from themselves, using the bit shifted out as carry.

Practically, r26 and r27 are always zero if the most significant bit of r25 is zero; if it’s not zero, then r27:r26 are equal to 0xffff. It looks like sign extension to me.

In order to prove my point I decided to experiment, and to experiment I needed a test case where I cast a variable from short to 32 bits, like the following:

$ cat extended.c
#include<stdint.h>

int32_t miao(short value) {
    return (int32_t)value;
}
$ avr-gcc -c extended.c

Once compiled I can look at the assembly code generated and bingo

$ r2 -A -a avr extended.o
[0x08000034]> pdf
/ (fcn) entry0 44
|   entry0 ();
|           0x08000034      cf93           push r28                    ; [01] m-r-x section size 44 named .text
|           0x08000036      df93           push r29
|           0x08000038      00d0           rcall 0x800003a
|           0x0800003a      cdb7           in r28, 0x3d                ; '=' ; IO SPL: Stack lower bits SP0-SP7
|           0x0800003c      deb7           in r29, 0x3e                ; '>' ; IO SPH: Stack higher bits SP8-SP10
|           0x0800003e      9a83           std y+2, r25
|           0x08000040      8983           std y+1, r24
|           0x08000042      8981           ldd r24, y+1
|           0x08000044      9a81           ldd r25, y+2
|           0x08000046      092e           mov r0, r25
|           0x08000048      000c           lsl r0
|           0x0800004a      aa0b           sbc r26, r26
|           0x0800004c      bb0b           sbc r27, r27
|           0x0800004e      682f           mov r22, r24
|           0x08000050      792f           mov r23, r25
|           0x08000052      8a2f           mov r24, r26
|           0x08000054      9b2f           mov r25, r27
|           0x08000056      0f90           pop r0
|           0x08000058      0f90           pop r0
|           0x0800005a      df91           pop r29
|           0x0800005c      cf91           pop r28
\           0x0800005e      0895           ret

Do you find this post incomplete? That's probably because it's a work in progress. Let me know how you would like it to be completed.