This is a post in a serie regarding side channels, from the theoretical and
pratical point of view; the posts are
 introduction on the model of computing devices (to be finished)
 using the Chipwhisperer ([here](``link://slug/sidechannelsusingthechipwhisperer``))
 power analysis (this post)
 glitching (to be finished)
All the graphs in this post are generated using my own libray named [power_analys](https://github.com/gipi/poweranalysis/),
this is alphaquality software without documentation (for now); also a notebook
with all the code that generated the graphs will be available as soon as I take
the time to clean up it.
## Physical interface
As I said in the (tobecompleted) post about the model of a processing unit, the "calculations" inside a
device are done, roughly speaking, switching on and off groups of transistors to
perform the needed operations; since the transistors are physical entities, they
interact with the physical world and need **energy** to move electrons around,
so we can imagine that the power consumption would be in some way connected with
the internal state of the processor[^Nano].
[^Nano]: For an extensive explanation you can use the book "Nanometer CMOS ICs, From Basics to ASICs" pg 161
To measure the power consumption, tipically you have a shunt resistor between the
power line and the ``VCC`` pin of the target with an ``ADC`` measuring the voltage as indicated in the
drawing below
{{% pyplots %}}
import matplotlib
matplotlib.use('Agg')
import schemdraw
import schemdraw.elements as elm
from schemdraw.dsp.dsp import Adc
d = schemdraw.Drawing()
# define as the first thing the IC so we can reference it later
d += (target := elm.Ic(pins=[elm.IcPin('VCC', side='left')]).label('Target', loc='top'))
# move the "drawing point" at the left of it
d.move_from(target.VCC, dx=5, dy=0)
# add elements
d += (power := elm.Vdd().label('Power', loc='top'))
d += elm.Line().right().at(power.end)
d += (shunt := elm.Resistor().right().label('shunt').to(target.VCC))
# now stuff connected to the ADC
d += elm.Line().at(shunt.end).down()
d += Adc().label('ADC').down()
d.draw()
{{% /pyplots %}}
this works because thanks to Ohm's law, the voltage difference at the sides of
the resistor is proportional to the current drawn from the device.
In general it's possible that multiple pins are responsible for providing power
to the digital device and maybe each pin serves different subsystems
(peripherals, core, etc...), so in real settings the configuration can be more
complex.
In my experiments I'm going to use the ChipWhispererlite, it has the
[AD8331](https://www.analog.com/media/en/technicaldocumentation/datasheets/AD8331_8332_8334.pdf)
chip (a **variable gain amplifier**, ``VGA``) and the
[AD9215](https://www.analog.com/media/en/technicaldocumentation/datasheets/AD9215.pdf)
(an **ADC**). The target's clock and ``ADC`` are synchronized, only the latter
has a sample rate four times of the former (this ratio can be configured).
The configuration of the ``VGA`` is such that the input impedance is 50 ohm
(see table 7 at page 30 of the datasheet) and
``AC``coupled (so you don't have ``DC`` component, at least theoretically); this explains why, since
the input is connected to the low side of the shunt resistor on the board, you
have negative readings: the high side is **ideally** constant (at the power
supply voltage) so, without ``AC``coupling you would obtain something like
$$
V_{L} = V_H  I\times R_\hbox{shunt}
$$
but removing the constant voltage, what you see from the traces is
$$
\eqalign{
AC(V_{L}) &= AC(V_H) + AC(I\times R_\hbox{shunt}) \cr
&= AC(I\times R_\hbox{shunt})\cr
}
$$
The actual value that you'll find in the traces are also affected by the
**gain** set for the ``VGA``.
**Note:** the power consumption is not technically the values that you are
capturing but has a direct relation with them: the textbook definition is
$$
P = V\cdot I
$$
If the assumption is that the voltage is constant with **a lowamplitube varying
component** (let's call it \\(V_{AC}\\)) we can obtain the following approximation
$$
\eqalign{
P &= V\cdot I \cr
&= (V_{DC} + V_{AC})\cdot I \cr
&= V_{DC}\left(1 + {V_{AC}\over V_{DC}}\right)\cdot I\cr
&\sim V_{DC}\cdot I\cr
}
$$
that works fine as long as \\({V_{AC}\over V_{DC}}\\) is negligible. So at the
end the amplitude measured is really, up to a constant, the power consumption.
Since we are not going to do real physics, for now this is enough to continue.
## Maths&Statistics
To understand a little better what we are going to see in the following
we need to have an overview of what we mean by **correlation**: what follows is an introduction
of the statistical formalism needed to explain **correlation power
analysis**[^Optimal].
[^Optimal]: Optimal statistical power analysis; E. Brier, C. Clavier, F. Olivier ([link to the paper](https://eprint.iacr.org/2003/152.pdf))
First of all, we need a model to describe the relationship between power
consumption and the _digital world_: the simplest one with usefulness is the
model using the **Hamming weight** or **Hamming distance**.
The **Hamming weight** is a quantity defined over a number using its base 2 representation: if \\(d\\) is
a number such that
$$
d = \sum_{i=0}^{N  1} d_i 2^i\hbox{ with }d_i\in \\{0,1\\}
$$
we define the hamming weight as
$$
HW(d) = \sum_{i=0}^{N  1} d_i
$$
Although the Hamming weight is not linear with respect to its argument, if we
see \\(d\\) as a uniformly independent random variable we can safely assume for
the mean and variance for the Hamming weight
$$
\mu_H = {N\over 2}\qquad\sigma^2_H = {N\over4}
$$
What does it mean? simply that when we are going to capture traces, we are going
to feed as input random bytes (in statistical terms are random variables), since
the HW _inherits_ randomness from them, it's also treated as a random
variable, allowing to use statistical tools to analyze the results.
Another important quantity related to the Hamming weight is the **Hamming
distance** that measure the number of bit flip between two numbers (represented
in base 2); this quantity as a very simple definition [^HD]
$$
HD(S, D) = HW(S\oplus D)
$$
[^HD]: Hacker's delight pg95 and pg343
where \\(\oplus\\) is the ``xor`` (exclusive or) operation defined via this
truth table
 A  B  A \\(\oplus\\) B 

 0  0  0 
 0  1  1 
 1  0  1 
 1  1  0 
An interesting property is its simmetry
$$
A\oplus B = B \oplus A
$$
and interestingly, if we fix one operand and make the other a uniformly random
independent variable, the Hamming distance has the same mean and variance as the
Hamming weight. We will see later how this can be useful.
What we want to achieve is the "measure" of the correlation between our model's
parameters and the power consumption and to do that we need to use the
**Pearson's correlation factor**: its definition is given by
$$
\rho(A,B) = {Cov(A, B)\over\sigma_A\sigma_B}
$$
where \\(A\\) and \\(B\\) are random variables; the value of \\(\rho(A, B)\\) is
comprised between \\(1\\) and \\(1\\), when two variables are independent
\\(\rho\\) has a value of zero.
In the following experiments I'm going to use (implicitely) a model where the power consumption
\\(W\\) is related to the Hamming weight of some input \\(D\\) (as a random
variable) by the following relation
$$
W = aH(D) + b
$$
where \\(b\\) is a **noise term**, all the contributions for the power
consumption of the device not involved with the computation directly.
From this model is possible to obtain the following statistical relations:
$$
\eqalign{
\sigma^2_W &= a^2\sigma^2_H + \sigma^2_b\cr
\rho(W, H) &= {Cov(W, H)\over\sigma_W\sigma_H}\cr
&= {Cov(aH+b, H)\over\sqrt{a^2\sigma^2_H + \sigma^2_b}\sigma_H}\cr
&= {aCov(H, H)\over\sqrt{a^2\sigma^2_H + \sigma^2_b}\sigma_H}\cr
&= {a\sigma_H^2\over\sqrt{a^2\sigma^2_H + \sigma^2_b}\sigma_H}\cr
&= {a\sigma_H\over\sqrt{a^2\sigma^2_H + \sigma^2_b}}\cr
&= {a\sqrt{N/4}\over\sqrt{a^2{N/4} + \sigma^2_b}}\cr
&= \hbox{sign}(a){\sqrt{N}\over\sqrt{N + 4\sigma^2_b}}\cr
}
$$
so the absolute value of the correlation tends to the limit of 1 when the noise
term is negligible, where the sign is given only by the sign of the coefficient \\(a\\).
Another useful fact for later, related to the Hamming distance, is that if you
want to maximise the correlation of this with respect to a fixed "background",
this maximum is unique[^Optimal].
## Measuring different instructions
Before starting crunching numbers, a very simple overview of how different
instructions have different
"energyfootprint": let's try to show the power consumption of the target executing
 10 ``nop``s
 10 ``mul``s
 20 ``mul``s
 10 ``mul``s, 10 ``nop``s and 10 ``muls``
![](/images/sidechannels/comparisoninstructions.png)
the assembly code is the following (each column is a different capture)
```text
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
nop mul r0, r1 mul r0, r1 mul r0, r1
rjmp .2 rjmp .2 mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
mul r0, r1 nop
rjmp .2 mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
mul r0, r1
rjmp .2
```
the tail of the capture is the ``rjmp`` instruction that loops indefinitely.
The big spike at the start of the capture is probably the current generated by
the ``GPIO`` used to trigger the capture.
The first thing that you can notice is that ten ``nop``s are taking 40 cycles to
complete (i.e. 10 target's clock cycles) meanwhile ten ``mul``s needs double
that number. This matches with the actual number of cycles needed by these
instructions: ``nop`` executes in one cycle, meanwhile ``mul`` takes two cycles
and each cycle the ``ADC`` samples four times.
## Correlation
In the last section I showed that different instructions have different
"footprints", probably the easiest possible way to use that is by timing
analysis: for example if we had a piece of code that checks
for a password and exits at the first wrong character, we could capture the power traces for
every possible (first char) input and
we should see only one having a different pattern.
What follows is an example with the firmware provided by the Chipwhisperer
(``basicpasswdcheck``) where the ``h`` character is the first of the secret
password (I grouped the traces in sets of 16 to make clearer the "outcast")
![](/images/sidechannels/basicpasswdchecktiming.png)
![](/images/sidechannels/basicpasswdchecktiming1.png)
![](/images/sidechannels/basicpasswdchecktiming2.png)
![](/images/sidechannels/basicpasswdchecktiming3.png)
For now I'm not interested in performing specific attacks but to study the
relationship between instructions and power consumption and to do this I will
perform an "experiment" where I use as input the firmware code with the simplest
possible assembly instructions.
The first experiment is to check what we can infer from the relation between the
argument of the instruction ``ldi r16, `` and the power consumption: these
are the (interesting portion of) instructions (``sts`` is the instruction
triggering the capture of the trace)
```
6de: c0 93 05 06 sts 0x0605, r28
6e2: 00 00 nop
6e4: 00 00 nop
6e6: 00 00 nop
6e8: 00 00 nop
6ea: 00 00 nop
6ec: 00 00 nop
6ee: 00 00 nop
6f0: 00 00 nop
6f2: 00 00 nop
6f4: 00 00 nop
6f6: 00 e0 ldi r16,
6f8: 00 00 nop
6fa: 00 00 nop
6fc: 00 00 nop
6fe: 00 00 nop
700: 00 00 nop
702: 00 00 nop
704: 00 00 nop
706: 00 00 nop
708: 00 00 nop
70a: 00 00 nop
70c: ff cf rjmp .2
```
As you can see there is the instruction of interested sourrounded by
inputindependent code, this should make the relation standing out.
After capturing 100 traces for each hamming weight it's possible to plot the
average for each one together as in the following plot where I tried to align
the instructions with the actual clock cycle where they **should** happen
(I'm not sure of the alignment of the trace with the actual instructions, so the
instruction in picture are the first guess)
![](/images/sidechannels/pahamming.png)
It's possible to note "something going on" around the ``ldi`` instruction; if
we zoom in we can see a little better
![](/images/sidechannels/pahammingzoom.png)
At first seems disaligned by halfinstruction, take note of this for the
following.
Meanwhile it's possible to have a better view plotting a scatterplot of
mean value of a given sample for a specific Hamming weight versus the
corresponding hamming weight (the
vertical bars are the standard deviations of each respective set of traces):
![](/images/sidechannels/pahammingscatterplot.png)
The first interesting samples have a **negative** correlation but after a couple
of cycles this change to positive improving considerebly (note for example at
sample 50 the standard deviations are way far from each other, this means that
also taking noise into consideration the respective sets of traces are
"separated").
The final graph is the correlation I talked so much about
![](/images/sidechannels/ldicorr.png)
Just with one kind of instruction is not possible to deduce anything valuable
(althought it's interesting seeing that _something is there_), now I want to try
a different instruction: ``adc rx, ``.
This is the code under scrutiny
```
6de: c0 93 05 06 sts 0x0605, r28
6e2: 10 e0 ldi r17,
6e4: 00 e0 ldi r16, 0x00
6e6: 00 00 nop
6e8: 00 00 nop
6ea: 00 00 nop
6ec: 00 00 nop
6ee: 00 00 nop
6f0: 00 00 nop
6f2: 00 00 nop
6f4: 00 00 nop
6f6: 00 00 nop
6f8: 00 00 nop
6fa: 01 1f adc r16, r17
6fc: 00 00 nop
6fe: 00 00 nop
700: 00 00 nop
702: 00 00 nop
704: 00 00 nop
706: 00 00 nop
708: 00 00 nop
70a: 00 00 nop
70c: 00 00 nop
70e: 00 00 nop
710: ff cf rjmp .2
```
Plotting each averaged class we obtain something similar to before
![](/images/sidechannels/adchamming.png)
but notice how the instructions seem aligned perfectly!
![](/images/sidechannels/adchammingzoom.png)
in particular is interesting to note that there is not in this case a negative
correlation around the _start_ of the instruction
![](/images/sidechannels/adchammingscatterplot.png)
![](/images/sidechannels/adccorr.png)
What explanation is available for this behaviour? Well, a modern processing unit
employs some tricks to speedup computation and a prevalent one is the
**pipeling**, i.e., splitting the instruction life cycle in different stages and
when the unit is executing a given instruction is also doing other stuff with the
instruction that should be executed next; in particular for the XMega we have this quote from the
manual:
The AVR uses a Harvard architecture  with separate memories and buses for program and
data. Instructions in the program memory are executed with a single level pipeline. While one
instruction is being executed, the next instruction is prefetched from the
program memory. This concept enables instructions to be executed in every clock cycle
To clarify a little better here a diagram
![](/images/sidechannels/avrpipelinetiming.png)
Now if you take a look at the [AVR instruction set manual](http://ww1.microchip.com/downloads/en/devicedoc/atmel0856avrinstructionsetmanual.pdf)
you can see how the opcode for the ``ldi`` instruction is encoded:
![](/images/sidechannels/avrldi.png)
i.e. the value to put in the register is encoded directly in the instruction
that is read in the fetching stage during the previous instruction!
With this in mind for now on I assume that the alignment is correct (another
implicit assumption is that the ``sts`` instruction, that is twocycles long,
executes at the last cycle); while
I'm at it I want to doubleproof the _fetching stage_ assumption for ``ldi``
putting an infinite loop just before it :)
Observe what happens if we flash this code
```
6de: c0 93 05 06 sts 0x0605, r28 ; 0x800605 <__TEXT_REGION_LENGTH__+0x7de605>
6e2: 00 00 nop
6e4: 00 00 nop
6e6: 00 00 nop
6e8: 00 00 nop
6ea: 00 00 nop
6ec: 00 00 nop
6ee: 00 00 nop
6f0: 00 00 nop
6f2: 00 00 nop
6f4: 00 00 nop
6f6: ff cf rjmp .2 ; 0x6f6
6f8: 00 e0 ldi r16,
```
![](/images/sidechannels/ldihamming.png)
it's clearly visible a repeated pattern, look at this zoom
![](/images/sidechannels/ldihammingzoom.png)
and the repeation in the correlation
![](/images/sidechannels/ldihammingscatterplot.png)
![](/images/sidechannels/ldiloopovercorr.png)
Compare what you have seen with the same concept applied to the ``adc``
![](/images/sidechannels/adcloopoverhamming.png)
No correlation at all! I mean, this would have been obvious from the start but
it's always better to double check unexpected results :)
Now to improve on my experiment I decided to change strategy: until now I
flashed a different code every time I wanted to read a particular value for
``ldi`` but this involves overwriting the flash so I decided to write some code
to build a **jump table**; with this code a character is read from the serial,
converted to an integer and used as an index to jump to a specific block where
the ``ldi r16, `` is waiting to execute (I have a specific [post](link://slug/reversingavrcode) about
reversing ``AVR`` assembly if you want to understand what is happening).
![](/images/sidechannels/ldijumptablehamming.png)
and also in this case it's pretty amazing how the instructions align perfectly
(by the way, an [old AVR instruction set manual indicated contradicting values for
the number of cycles for the ``ldd`` instruction](https://www.reddit.com/r/avr/comments/s3p2yl/contradictory_information_about_cycles_for_the/) so I used this captures as a
evidence for the correct number).
There is only one thing that bothers me, there is apparently a "tail" after the
instruction that shouldn't be there, my educated guess is that the ``ADC``
frontend, being ``AC`` coupled causes this anomaly since the capacitor in front
of the measuring device has a change oscillating with the flow of current.
On order to test this I tried to capture the power consumption using my trusty
Siglent using some python code that I wrote, but this is argument of a future
post.
## A more realistic case
Now I want to try to apply to the simplest difficult problem, i.e., trying
to break a constant time password comparison; the code is the following
```c
#define PASSWORD "kebab"
#define SIZE_INPUT 32
int main(void)
{
char password[] = PASSWORD;
platform_init();
init_uart();
trigger_setup();
/* banner requesting the password */
putch('p');
putch('a');
putch('s');
putch('s');
putch('w');
putch('o');
putch('r');
putch('d');
putch(':');
putch(' ');
/* taking the password, waiting for a newline or for SIZE_INPUT characters */
char c;
char input[SIZE_INPUT];
unsigned int input_index = 0;
do {
c = getch();
input[input_index++] = c;
} while(c != '\n' && input_index < SIZE_INPUT);
trigger_high();
unsigned char result = 0;
for (uint8_t index = 0 ; index < strlen(PASSWORD) ; index++) {
result = input[index] ^ password[index];
}
if (result)
while(1);
putch('W');
putch('I');
putch('N');
while(1);
}
```
and this is the generated assembly (I advice always to check generated code
because the compiler sometimes does magic)
```
000006c6 :
6c6: cf 93 push r28
6c8: df 93 push r29
6ca: cd b7 in r28, 0x3d ; 61
6cc: de b7 in r29, 0x3e ; 62
6ce: a6 97 sbiw r28, 0x26 ; 38 < allocate 38 byte on the stack (32 for the input, 6 for the hardcoded password (that includes the null byte))
6d0: cd bf out 0x3d, r28 ; 61
6d2: de bf out 0x3e, r29 ; 62
6d4: 86 e0 ldi r24, 0x06 ; 6 < set the counter to six
6d6: e0 e1 ldi r30, 0x10 ; 16
6d8: f0 e2 ldi r31, 0x20 ; 32 < set the source Z to the password
6da: de 01 movw r26, r28 < set dest to X (on the stack)
6dc: 91 96 adiw r26, 0x21 ; 33 < with a starting offset of 33
6de: 01 90 ld r0, Z+ < loop starts here with r0 = *Z++
6e0: 0d 92 st X+, r0 < *(X++) = r0
6e2: 8a 95 dec r24 < decrement counter
6e4: e1 f7 brne .8 < jump if not zero 
6e6: 0e 94 4d 03 call 0x69a ; 0x69a
6ea: 0e 94 7c 02 call 0x4f8 ; 0x4f8
6ee: 81 e0 ldi r24, 0x01 ; 1
6f0: 80 93 01 06 sts 0x0601, r24 ; 0x800601 <__TEXT_REGION_LENGTH__+0x7de601>
6f4: 80 e7 ldi r24, 0x70 ; 112
6f6: 0e 94 b9 02 call 0x572 ; 0x572
6fa: 81 e6 ldi r24, 0x61 ; 97
6fc: 0e 94 b9 02 call 0x572 ; 0x572
700: 83 e7 ldi r24, 0x73 ; 115
702: 0e 94 b9 02 call 0x572 ; 0x572
706: 83 e7 ldi r24, 0x73 ; 115
708: 0e 94 b9 02 call 0x572 ; 0x572
70c: 87 e7 ldi r24, 0x77 ; 119
70e: 0e 94 b9 02 call 0x572 ; 0x572
712: 8f e6 ldi r24, 0x6F ; 111
714: 0e 94 b9 02 call 0x572 ; 0x572
718: 82 e7 ldi r24, 0x72 ; 114
71a: 0e 94 b9 02 call 0x572 ; 0x572
71e: 84 e6 ldi r24, 0x64 ; 100
720: 0e 94 b9 02 call 0x572 ; 0x572
724: 8a e3 ldi r24, 0x3A ; 58
726: 0e 94 b9 02 call 0x572 ; 0x572
72a: 80 e2 ldi r24, 0x20 ; 32
72c: 0e 94 b9 02 call 0x572 ; 0x572
730: ce 01 movw r24, r28
732: 01 96 adiw r24, 0x01 ; 1
734: 7c 01 movw r14, r24
736: 8e 01 movw r16, r28
738: 0f 5d subi r16, 0xDF ; 223 < r16 = fp  32
73a: 1f 4f sbci r17, 0xFF ; 255
73c: 6c 01 movw r12, r24
73e: 0e 94 b2 02 call 0x564 ; 0x564
742: d6 01 movw r26, r12
744: 8d 93 st X+, r24
746: 6d 01 movw r12, r26
748: 8a 30 cpi r24, 0x0A ; 10
74a: 19 f0 breq .+6 ; 0x752
74c: a0 17 cp r26, r16
74e: b1 07 cpc r27, r17
750: b1 f7 brne .20
752: 81 e0 ldi r24, 0x01 ; 1
754: 80 93 05 06 sts 0x0605, r24
758: f8 01 movw r30, r16 < Z points to password
75a: 9e 01 movw r18, r28
75c: 2a 5f subi r18, 0xFA ; 250
75e: 3f 4f sbci r19, 0xFF ; 255
760: 80 e0 ldi r24, 0x00 ; 0
762: d7 01 movw r26, r14 < loop starts here
764: 4d 91 ld r20, X+
766: 7d 01 movw r14, r26
768: 91 91 ld r25, Z+
76a: 94 27 eor r25, r20
76c: 89 2b or r24, r25
76e: a2 17 cp r26, r18
770: b3 07 cpc r27, r19
772: b9 f7 brne .18 ; 0x762
774: 81 11 cpse r24, r1
776: ff cf rjmp .2 ; 0x776
778: 87 e5 ldi r24, 0x57 ; 87
77a: 0e 94 b9 02 call 0x572 ; 0x572
77e: 89 e4 ldi r24, 0x49 ; 73
780: 0e 94 b9 02 call 0x572 ; 0x572
784: 8e e4 ldi r24, 0x4E ; 78
786: 0e 94 b9 02 call 0x572 ; 0x572
78a: ff cf rjmp .2 ; 0x78a
```
Let see a single capture
![](/images/sidechannels/constanttimetrace.png)
it's possible to deduce that there are five iterations happening before the
infinite loop (five is the number of characters in the password).
**Note:** this could seem boring but for me it's very important: thanks to power
analysis you have some sort of **oracle** that gives you information otherwise
unavailable! Imagine you have a device that presents some sort of password
prompt and you, via tinkering from the terminal can asses that accepts up to 32
characters; this means you have \\(\left(2^8\right)^{32} = 2^{256}\\) possible passwords,
a key space huge, without hope to be bruteforced.
**But** from power analysis you have now reduced the key space to \\(\left(2^8\right)^{5} = 2^{40}\\)
more feasible (an encryption algorithm with such "strength" would be considered broken).
Obviously a 32 characters password doesn't have this problem :)
Now if we study only the power consumption
in relation to the 0th byte of input it's possible to see "something
happening" only in the first iteration of the loop (actually there is something
going on also at the start of the second iteration, but "less").
![](/images/sidechannels/constanttime0thhamming.png)
here the zoom
![](/images/sidechannels/constanttime0thhammingzoom.png)
This is interesting: we have a way to see **when** our input is used in the
computation; in the graph above I overlayed the instructions but obviously in
general this is not possible, however this temporal information instead is available
via power analysis. This is akin to **taint analysis** done in security research
were the software is analyzed to find how the user input can "propagate" and
impact execution, take a look at [libdft](https://www.cs.columbia.edu/~vpk/research/libdft/)
for example. But this out of scope (for now :P).
Now I want to try something new: remember the stuff above about the Hamming distance
and the ``xor`` operation? in
this case I want to try to find for each sample the element which xor creates
the most correlated relation with respect to the input. And something interesting come out: obviously the
majority of samples have no correlation with nothing, in a couple of points
there is correlation with the input bytes but specific samples (corresponding
to the instruction ``ld r25, Z+``, i.e. the instruction loading the hardcoded
password into a register) correlate to the password!
Here the results where each entry is the best correlation between the ith input
byte (row) and the jth iteration of the instruction (column):

0 
1 
2 
3 
4 
0 
b'k':0.73 
b'\xeb':0.26 
b'\xd4':0.06 
b'\xfe':0.05 
b'\xd6':0.04 
1 
b'F':0.02 
b'e':0.76 
b'\xe5':0.25 
b'\xde':0.07 
b'\xda':0.06 
2 
b'\xbb':0.02 
b'z':0.01 
b'b':0.78 
b'\xe2':0.25 
b'\xbd':0.06 
3 
b'\xbf':0.11 
b'\x9f':0.10 
b'\xbf':0.06 
b'a':0.80 
b'\xe1':0.20 
4 
b']':0.02 
b'\xf7':0.01 
b'v':0.01 
b'\xdf':0.02 
b'b':0.81 
and the diagonal contains the password (the floating point number is the value
of the correlation). For completness here the table with the iterations along the
rows and the (eight) samples related to the instruction along the columns
 sample index  0  1  2  3  4  5  6  7 

 44 b'\x00' 0.56 b'+' 0.68 b'+' 0.73 b'k' 0.72 b'k' 0.71 b'+' 0.60  b'+' 0.61 b'+' 0.58 
 92 b'\x80' 0.20 b'e' 0.67 b'e' 0.76 b'e' 0.76 b'e' 0.74 b'e' 0.42 b'e' 0.40 b'e' 0.39 
 140 b'A' 0.18 b'b' 0.72 b'b' 0.78 b'b' 0.78 b'b' 0.76 b'b' 0.43 b'b' 0.41 b'b' 0.39 
 188 b'\x88' 0.13 b'a' 0.73 b'a' 0.80 b'a' 0.80 b'a' 0.78 b'a' 0.45 b'a' 0.44 b'a' 0.42 
 236 b'@' 0.21 b'b' 0.76 b'b' 0.82 b'b' 0.81 b'b' 0.80 b'b' 0.45 b'b' 0.43 b'b' 0.41 
Now with this discovery I would expect to see a symmetric relation with the
``ldd r20, X+`` instruction: if we have correlation between the input byte
(``ldd r20, X+``) and the password byte (``ldd r25, Z+``) in a given iteration I would expect to see a
similar correlation between the password byte and the following iteration input byte but this doesn't happen;
if you look at the table with the most correlated bytes you see that there are
interesting correlation with **two** previous iteration (sample 129, 177, 225)
 sample index  0  1  2  3  4  5  6  7 

 32 b'\xb1' 0.01 b'\x03' 0.82 b'\x03' 0.87 b'\x03' 0.86 b'\x03' 0.83 b'\x00' 0.90 b'\x00' 0.91 b'\x00' 0.90 
 80 b'm' 0.01 b'\xa1' 0.82 b'\xa1' 0.88 b'\xa1' 0.87 b'\xa1' 0.85 b'\x80' 0.64 b'\x80' 0.67 b'\x80' 0.66 
 128 b'\x15' 0.01 b'k' 0.78 b'k' 0.85 b'k' 0.83 b'k' 0.81 b'\x00' 0.44 b'\x80' 0.49 b'\x80' 0.49 
 176 b'\x9f' 0.09 b'e' 0.82 b'e' 0.88 b'e' 0.87 b'e' 0.85 b'\x00' 0.50 b'\x00' 0.55 b'\x80' 0.54 
 224 b'\xf7' 0.01 b'b' 0.79 b'b' 0.86 b'b' 0.84 b'b' 0.82 b'\x00' 0.52 b'\x80' 0.55 b'\x80' 0.55 
To summarize the relationship between power consumption and input bytes I'm
using the following diagram where in the entry \\(\left(i, j\right)\\) of the
"matrix" you have the correlation with \\(I[i]\otimes I[j]\\), with \\(I[\alpha]\\)
indicating the \\(\alpha\\)th input byte. In the diagonal \\(\left(a, a\right)\\)
instead I placed the direct correlation with the input byte \\(I[a]\\)
![](/images/sidechannels/constanttimecrosscorr.png)
It's possible to see a correlation between two adjacent ``ld``s from memory.
Improving the experiment a little, it's possible to manually craft some assembly
to unroll the loop in a way that the loadings of the password and the input bytes
are not more one after the other, but instead the password is loaded into some
registers at the start of the routine
(this time I used a different password: "armedilzo", so I don't have repeated
characters and is longer)
```
75a: 80 93 05 06 sts 0x0605, r24 ; 0x800605 <__TEXT_REGION_LENGTH__+0x7de605>
75e: 00 e0 ldi r16, 0x00 ; 0
760: d9 a0 ldd r13, Y+33 ; 0x21
762: fa a0 ldd r15, Y+34 ; 0x22
764: 1b a1 ldd r17, Y+35 ; 0x23
766: ac a1 ldd r26, Y+36 ; 0x24
768: ed a1 ldd r30, Y+37 ; 0x25
76a: 6e a1 ldd r22, Y+38 ; 0x26
76c: 4f a1 ldd r20, Y+39 ; 0x27
76e: 28 a5 ldd r18, Y+40 ; 0x28
770: 89 a5 ldd r24, Y+41 ; 0x29
772: e9 80 ldd r14, Y+1 ; 0x01
774: de 24 eor r13, r14
776: 0d 29 or r16, r13
778: ea 80 ldd r14, Y+2 ; 0x02
77a: fe 24 eor r15, r14
77c: 0f 29 or r16, r15
77e: eb 80 ldd r14, Y+3 ; 0x03
780: 1e 25 eor r17, r14
782: 01 2b or r16, r17
784: ec 80 ldd r14, Y+4 ; 0x04
786: ae 25 eor r26, r14
788: 0a 2b or r16, r26
78a: ed 80 ldd r14, Y+5 ; 0x05
78c: ee 25 eor r30, r14
78e: 0e 2b or r16, r30
790: ee 80 ldd r14, Y+6 ; 0x06
792: 6e 25 eor r22, r14
794: 06 2b or r16, r22
796: ef 80 ldd r14, Y+7 ; 0x07
798: 4e 25 eor r20, r14
79a: 04 2b or r16, r20
79c: e8 84 ldd r14, Y+8 ; 0x08
79e: 2e 25 eor r18, r14
7a0: 02 2b or r16, r18
7a2: e9 84 ldd r14, Y+9 ; 0x09
7a4: 8e 25 eor r24, r14
7a6: 08 2b or r16, r24
7a8: 01 11 cpse r16, r1
7aa: ff cf rjmp .2 ; 0x7aa
```
this time the correlation between two``ld``sapart is more apparent.
![](/images/sidechannels/constanttimeunrollmanualcrosscorr.png)
probably the branch instruction interferes with the ``ld``s in some obscure way.
This breaks my method of recovering the password but maybe something can be done
for this and if something interesting come up I will tell you in future.
## Conclusion
In this post I gave you an overview of what is possible to "see" via power analysis and
some of the tools available for it. Maybe I uncovered a new "primitive" to steal
secret in chips of the AVR family. Probably there will be a future post about
some "advanced" experiments, like capturing traces via an oscilloscope, align
them via fourier transform and more.
## Linkography
 [AVR instruction set manual](http://ww1.microchip.com/downloads/en/devicedoc/atmel0856avrinstructionsetmanual.pdf)
 [Statistics and Secret Leakage](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.3849&rep=rep1&type=pdf)
 [Power Analysis, What Is Now Possible...](https://link.springer.com/content/pdf/10.1007/3540444483_38.pdf), paper from ``ASIACRYPT2000``
 [Investigations of Power Analysis Attacks on Smartcards](https://www.usenix.org/legacy/events/smartcard99/full_papers/messerges/messerges.pdf)
 [Differential Power Analysis](https://www.paulkocher.com/doc/DifferentialPowerAnalysis.pdf), paper by Paul Kocher
 [Correlation Power Analysis with a Leakage Model](https://www.researchgate.net/publication/221291676_Correlation_Power_Analysis_with_a_Leakage_Model) paper introducing the CPA in 2004
 [Tamper Resistance  a Cautionary Note](https://www.cl.cam.ac.uk/~rja14/tamper.html)
 [Understanding Power Trace](https://forum.newae.com/t/understandingpowertrace/1905), post from NewAE forum explaining the meaning of the power traces
 [CHIP.FAIL – GLITCHING THE SILICON OF THE CONNECTED WORLD](https://sector.ca/sessions/chipfailglitchingthesiliconoftheconnectedworld/)
 [Fault diagnosis and tolerance in cryptography (FDTC) 2014](https://fdtc.deib.polimi.it/FDTC14/slides.html) with some slides:
 https://fdtc.deib.polimi.it/FDTC11/shared/FDTC2011session43.pdf
 https://fdtc.deib.polimi.it/FDTC11/shared/FDTC2011session21.pdf
 https://fdtc.deib.polimi.it/FDTC14/shared/FDTC2014session_1_1.pdf
 [POWER ANALYSIS BASED SOFTWARE REVERSE ENGINEERING ASSISTED BY FUZZING II](https://www.schutzwerk.com/en/43/posts/poweranalysis_2/)
 [Power Analysis For Cheapskates](https://media.blackhat.com/us13/US13OFlynnPowerAnalysisAttacksforCheapskatesWP.pdf)
 [Sidechannel Attacks Using the Chipwhisperer](https://www.robopenguins.com/chipwhisperer/)
 [Pearson's correlation](https://www.statlect.com/fundamentalsofprobability/linearcorrelation)