This post will be probably a too ambitious one, since I'm going to write about a
very theoretical aspect of computation, in particular the so called **abstract
interpretation**: for now let me just say that this field tries to give the
_best possible answer_ to unsolvable characterization of computer programs.
Another point of view that I want to address is the use of this kind of analysis
with reverse engineering, since a lot of aspects between these two kind of
analysis are shared.
This post will have pretty mathematical formalism but should be readable enough
also without a PhD.
## Languages
As a first thing to start thinking about code we need to define what is a
program: let me show you a grammar of a pseudo-C language:
$$
\begin{array} {rlll}
\mathsf{S} & ::= & \mathsf{x = A} \\\\
& | & ; \\\\
& | & \mathsf{{\bf if}\, (B)\, S} \\\\
& | & \mathsf{{\bf if}\, (B)\, S\\, {\bf else}\, S} \\\\
& | & \mathsf{{\bf while}\, (B)\, S} \\\\
& | & \mathsf{{\bf break} ; } \\\\
& | & \mathsf{\\{ Sl \\} } \\\\
\mathsf{Sl} & ::= &\mathsf{Sl}\\; \mathsf{S}\\;|\,\epsilon \\\\
\mathsf{P} & ::= &\mathsf{Sl} \\\\
\end{array}
$$
We'll use this grammar to define a **structural analysis** of the code, a
pseudo-recursive approach: to understand what I mean, let's start with the
simplest analysis possible, i.e., how many statements there are in the program;
obviously you would think that simply counting the "lines" should suffice, but
we can instead define the counting in a structural way, starting from the
terminals, i.e. the statements themselves
$$
\begin{array} {rlll}
\mathsf{S} & ::= & \mathsf{x = A} & \hbox{count}(\mathsf{S}) = 1 \\\\
\mathsf{Sl} & ::= & \mathsf{Sl}^\prime\\;\mathsf{S} & \hbox{count}(\mathsf{Sl})=\hbox{count}(\mathsf{Sl}^\prime) + \hbox{count}(\mathsf{S}) \\\\
\mathsf{S} & ::= & \mathsf{{\bf if}\\, (B)\\, S_t} & \hbox{count}(\mathsf{S}) = \hbox{count}(\mathsf{S_t}) + 1 \\\\
\mathsf{S} & ::= & \mathsf{{\bf if}\\, (B)\\, S_t\\, {\bf else}\\, S_f} & \hbox{count}(\mathsf{S}) = \hbox{count}(\mathsf{S_t}) + \hbox{count}(\mathsf{S_f}) + 2 \\\\
\mathsf{S} & ::= & \mathsf{{\bf while}\\, (B)\\, S_b} & \hbox{count}(\mathsf{S}) = \hbox{count}(\mathsf{S_b}) + 1 \\\\
\end{array}
$$
This works because since a program it's a **finite** set of statements, there are **finite** numbers of steps
to reach the terminating elements of the grammar.
Now, for something more interesting that will come in hand later, we can
associate at each structure a **control flow graph**: these diagrams show
the \\(\mathsf{\bf if}\\), \\(\mathsf{\bf if else}\\) and \\(\mathsf{\bf while}\\)
diagramatical structure

stateDiagram
head: if (B)
[*] --> head
head --> [*]
head --> S: true
S --> [*]

stateDiagram
head: if (B)
left: S
right: S
[*] --> head
head --> left: false
head --> right: true
right --> [*]
left --> [*]

stateDiagram
head: while (B)
body: S
[*] --> head
head --> body: true
body --> head
head --> [*]: false

Now, using structural analysis, it's possible to deduce that relationship between the
number of entering and exiting edges in a particular block
$$
\begin{array} {rlll}
\mathsf{S} & ::= & \mathsf{x = A} & \sharp\hbox{out}(\mathsf{S}) = 1 \\\\
\mathsf{Sl} & ::= & \mathsf{Sl}^\prime\\;\mathsf{S} & \sharp\hbox{in}(\mathsf{Sl})=\sharp\hbox{in}(\mathsf{Sl}^\prime),\;\sharp\hbox{out}(\mathsf{Sl}) = \sharp\hbox{out}(\mathsf{S}) \\\\
\mathsf{S} & ::= & \mathsf{{\bf if}\\, (B)\\, S_t} & \sharp\hbox{out}(\mathsf{S}) = \sharp\hbox{out}(\mathsf{S_t}) + 1 \\\\
\mathsf{S} & ::= & \mathsf{{\bf if}\\, (B)\\, S_t\\, {\bf else}\\, S_f} & \sharp\hbox{out}(\mathsf{S}) = \sharp\hbox{out}(\mathsf{S_t}) + \sharp\hbox{out}(\mathsf{S_f}) \\\\
\mathsf{S} & ::= & \mathsf{{\bf while}\\, (B)\\, S_b} & \sharp\hbox{out}(\mathsf{S}) = 1 \\\\
\end{array}
$$
and in particular it's obvious that the convergence node of an \\(\mathsf{\bf if}\\)
block might have a number of entering edges without an upper bound.
**Note:** you can build directly the flowchart from the CFG.
With this we can assume that the final CFG is composed of nodes that have **at
most** two exiting edges but an unlimited number of entering edges.
But, the important fact is that the control flow instructions are the ones with
the two exiting edges, where the ``while`` is identifiable by a returning edge
but the ``if`` is recognizable from the two different paths converging to the
same exit node. Obviously ``goto`` are not take into consideration, but in our
case it's not part of the language.
## Abstract interpretation and lattices
What is abstract interpretation and why we should care? to answer this you
should know that from theoretical computation studies it's known that it's
impossible to build a program that is able to extract information of another
program unless it's a trivial property.
This doesn't mean that we cannot know anything about a given program, simply
that we must accept the fact the we have to approximate something.
The following in this section is an overview of the formalism needed to actually
talk about what means "approximate". It's pretty much taken from two books: the
most formal book on the subject "Principles of abstract interpretation" by
Patrick Cousot (if you are not a math-person avoid it, it's practically a book
about proving theorems) and "Static program analysis" by Anders Møller and Michael I. Schwartzbach,
an openly available book that you can find [here](https://cs.au.dk/~amoeller/spa/spa.pdf).
The last one is more accessible (it's still pretty formal though).
Let's start with the simplest of the examples (it's the first example in any
book about this stuff :P) i.e. the study of the "sign": we know that a variable
representing an integer can assume three possible values with respect to the
sign, it can be positive, negative or be zero.
With repect to multiplication this "classification" behaves well
$$
\begin{array} {r|ccc}
* & + & 0 & - \\\\
\hline + & + & 0 & - \\\\
0 & 0 & 0 & 0 \\\\
- & - & 0 & + \\\\
\end{array}
$$
but not the addition
$$
\begin{array} {r|ccc}
+ & + & 0 & - \\\\
\hline + & + & + & ? \\\\
0 & + & 0 & - \\\\
- & ? & - & - \\\\
\end{array}
$$
since summing two integers with opposite signs can have **all** the prossible
outcomes, this means that we lose **any information possible**. Moreover this
means that every further calculation needs to take that into account (note how
in the particular case of multiplication, whatever value is multiplied by zero,
it becomes zero).
To formalize mathematically this concept of "anything" we introduce the symbol
\\(\top\\) (you can read it as "top") representing the set of all possible values
in this representation (we are talking about signs, so it's the set
\\(\\{+,0,-\\}\\)) that roughly speaking means "I don't know what value actually
is". Let's also introduce the symbol \\(\bot\\) (you can read it as "bottom") to
indicate values that are not numbers (think about pointers or unreachable
variables).
This allows us to formalize our reasoning defining an **abstract domain**
composed by five symbols \\(\\{+, 0, -, \top, \bot\\}\\) that we can arrange in
the following manner
$$
\begin{matrix}
&& \top & \\\\
&\huge\diagup & \huge| & \huge\diagdown \\\\
+ & & 0 & & -\\\\
&\huge\diagdown & \huge| & \huge\diagup \\\\
&&\bot
\end{matrix}
$$
where the elements are ordered vertically with regard of "approximation";
let's define an ordering via the operator \\(\sqsubseteq\\):
when \\(x\sqsubseteq y\\) we say that \\(y\\) is a **safe approximation** of
\\(x\\), or that \\(x\\) is **at least as precise** as \\(y\\). In this case we
have \\(+\sqsubseteq\top\\). A set with such operator is called **poset**
(Partial order set). Let me introduce this space:
**Posets:** a _poset_ \\(\langle\mathbb{P}\sqsubseteq\rangle\\) is a set
\\(\mathbb{P}\\) with a _partial order_ \\(\sqsubseteq\\) that is
- _Reflexive:_ \\(\forall x\in\mathbb{P}.x\sqsubseteq x\\)
- _Antisymmetric:_ \\(\forall x,y\in\mathbb{P}.\left((x\sqsubseteq y)\wedge(y\sqsubseteq x)\right)\Rightarrow(x = y)\\)
- _Transitive:_ \\(\forall x,y,z\in\mathbb{P}.\left((x\sqsubseteq y)\wedge(y\sqsubseteq z)\right)\Rightarrow(x \sqsubseteq z)\\)
**Note:** not every couple of elements are supposed to have a well defined
relation with respect to the operator \\(\sqsubseteq\\): if for a couple of
element \\(x,y\in\mathbb{P}\\) we have \\(x\sqsubseteq\\) or \\(y\sqsubset x\\)
we say that are _comparable_, otherwise they are _incomparable_. When in a poset
all the elements are comparable we have a poset that is _total_.
Let \\(\langle \mathbb{P}, \sqsubseteq\rangle\\) be a poset and \\(S\in
\wp(\mathbb{P})\\) be a subset.This subset \\(S\\) has
- upper bound \\(u\\) iff \\(u\in\mathbb{P}\\) and \\(\forall x\in
S.x\sqsubseteq u\\)
- a least upper bound (lub, indicated \\(\sqcup S\\)) iff \\(\sqcup S\\) is an
upper bound of \\(S\\) smaller than other upper bound of \\(S\\).
\\(\sqcup\\{x, y\\}\\) is denoted with the infix notation \\(x\sqcup y\\)
The intuition around posets is that they abstract **set theory**, where
\\(\sqsubseteq\\) is the analogous of \\(\subseteq\\) and \\(\sqcup\\),
\\(\sqcap\\) have as "dual" \\(\cup\\) and \\(\cap\\).
Finite posets \\(\langle \mathbb{P}, \sqsubseteq\rangle\\) can be represented by **Hassle
diagrams**, which is a set of points \\(\\{p_x | x\in P\\}\\) in the plan such
that
- if \\(x \sqsubset y\\) then \\(p_x\\) is strictly below \\(p_y\\) and
- \\(p_x\\) and \\(p_y\\) are linked by a segment when \\(x \lessdot y\\)
(\\(y\\) covers \\(x\\)) where \\(x\lessdot y \triangleq x\sqsubset y\wedge
\not\exists z\in P.x\sqsubset z\wedge z\sqsubset y\\).
**Note:** two unlinked elements are incomparable.
$$
\begin{array}{ccl}
x\sqcap x = x & x\sqcup x = x & \hbox{idempotency} \\\\
x\sqcap y = y \sqcap x & x\sqcup y = y \sqcup x & \hbox{commutativity} \\\\
x\sqcap (y\sqcap z) = (x\sqcap y)\sqcap z & x\sqcup (y\sqcup z) = (x\sqcup y)\sqcap z & \hbox{associativity} \\\\
(x\sqcap y)\sqcup x = x & (x\sqcup y)\sqcap x = x & \hbox{absorpsion} \\\\
\end{array}
$$
**soundness:** the results are always correct
**completeness:** all true facts are provable
Soundness (\\(\Longleftarrow\\)) states that if a statement is proved bythe proof method then that
statement is true. Completeness (\\(\Longrightarrow\\)) states that the proof method is always applicable
to prove a true statement.
If you think it's useless stuff, you know, vulnerabilities come from it
- [Step-by-Step Walkthrough of CVE-2022-32792 - WebKit B3ReduceStrength Out-of-Bounds Write](https://starlabs.sg/blog/2022/09-step-by-step-walkthrough-of-cve-2022-32792/)
$$
\require{AMScd}
\begin{CD}
A @>a>> B\\\\
@V b V V @VV c V\\\\
C @>>d> D
\end{CD}
$$
### Properties
Properties can be understood as the set of mathematical objects that have this
property, e.g
$$
2\mathbb{Z} = \left\\{x\in\mathbb{Z}\,\|\,\exists k\in\mathbb{Z}.x=2k\right\\}
$$
hence if \\(P\\) is a property then \\(x\in P\\) means "\\(x\\) has property
\\(P\\)".
Under this interpretation, **logical implication** is _subset inclusion_: as an
example
$$
\hbox{"to be greater than 42 implies to be positive"} \Longleftrightarrow
\\{x\in\mathbb{Z}\,\|\,x\gt42\\}\subseteq\\{x\in\mathbb{Z}\,\|\,x\geq0\\}
$$
In the expression \\(P\subseteq Q\\), \\(P\\) is said to be **stronger**/**more
precise** since its property is satisfied by **less elements**.
To refer to the abstract domain of sign, we can associate to each abstract sign
a set represented by the elements in \\(\mathbb{Z}\\) that have such property,
i.e.
$$
\lt \,\sim\,\\{x\in\mathbb{Z}\,\|\,x\lt0\\}
$$
Another example is the property to be an integer constant, that is espressed by
the set
$$
\left\\{\left\\{ n\right\\}\,\|\,n\in\mathbb{Z}\right\\}
$$
### Liveness and safeness properties
There are two properties that are fundamental:
- **liveness:** something "good" must happen during execution
- **safeness:** anything "bad" doesn't happen during execution
Let me borrow an example from the seminal paper that defined these concepts,
using a **producer**/**consumer**:
it consists of a producer process which puts messages into a buffer and a
consumer process which removes the messages. We assume that the buffer can
hold at most \\(b\\) messages, \\(b\qeq1\\). The processes must be
synchronized so the producer doesn't try to put a message into a buffer if
there is no room for it, and the consumer doesn't try to remove a message
that has not yet been produced
Let's implement these entities using a buffer \\(B\\) of dimension \\(b\\) using as
references respectively to the sent and received messages with the variables
\\(s\\) and \\(r\\): when a producer wants to send a message it puts it into
\\(B[s\mod b]\\) and the consumers takes it from \\(B[r \mod b]\\).
```
while (1) {
while ((s - r) % b == 0 && r != s)
;
send_message(s);
s = s + 1;
}
```
The following diagrams explain the cfg of producer and consumer:

stateDiagram-v2
main: while (1)
head: s - r mod b = 0 and r != s
message: deliver message
cursor: s = s + 1
[*] --> main
main --> head
head --> message: false
message --> cursor
cursor --> main
head --> head: true

stateDiagram-v2
main: while (1)
head: s - r mod b = 0 and r == s
message: receive message
cursor: r = r + 1
[*] --> main
main --> head
head --> message: false
message --> cursor
cursor --> main
head --> head: true

Now, we define \\(n = s - r\\) as the number of messages in the buffer, to prove
the desired safety property we need to prove that \\(0\leq n\leq b\\) holds for
the execution of the entire program. Here program is the combined execution of
these two processes.
An **assertion** is a logical function of program variables and tokens positions
and it's said to be **invariant** if it's always true during the execution of
the program.
An **interpretation** of a program is an assignment of an assertion to each arc
of the cfg. We say that the program is in a **legitimate state** if each token
is on an arc whose interpretation is true. The interpretation is said to be
**invariant** if the program always remains in a legitimate state throughout its
execution.
So, under these definitions, proving the program correctness can be done
verifying the following conditions:
1. the program starts in a legitimate state
2. during execution it remains in a legitimate state, i.e. if at a certain
moment it's in a legitimate state, any execution step will leave it in a
legitimate state.
### Turing my beloved
Assume \\(\hbox{termination}(P)\\) would always terminate and returns true iff
\\(P\\) always terminates on all input data; the following statement is a
contradiction
$$
P = \mathsf{\bf while}(\hbox{termination}(P));
$$
### Structural deductive stateless prefix trace semantics
This is from chapter 6 of PoAI;
- **Stateless:** does not register the values of variables
- **Structural:** by induction on the program syntax
- **Deductive:** the semantics of each program component is specified by a
deductive system of axioms and rules of inference, a program execution being
a proof in such a system.
\\(\def\trace{\mathbb{T}}\\)
A **trace** is a finite sequence of configurations (labels) separated by
actions. We let \\(\trace^+\\) be the set of finite traces, \\(\trace^\infty\\)
be the set of infinite traces and
\\(\trace^{+\infty}\buildrel\triangle\over=\trace^+\cup\trace^\infty\\)
\\(\def\action#1{\xrightarrow{\hspace{1cm}\displaystyle{#1}\hspace{1cm}}}\\)
\\(\def\l#1{\vphantom{x}^{#1}}\\)
\\(\def\pijoin{\buildrel\frown\over\cdot}\\)
$$
\pi_1\l{l_1}\pijoin\l{l_2}\pi_2=\cases{
\pi_1\l{l_1}\pi_2 \,\hbox{when}\,\l{l_1}=\l{l_2}\cr
\hbox{undefined when }\,\l{l_1}\not=\l{l_2}\cr
}
$$
$$
\action{\pi_1}\underbrace{\hbox{at}[\\![\mathsf{S}]\\!]\action{\pi_2}l}_{\displaystyle\in\mathcal{S}^\star[\\![\mathsf{S}]\\!]\left( \pi_1\hbox{at}[\\![\mathsf{S}]\\!] \right)}
$$
Because \\(\mathbb{T}\longrightarrow\wp(\mathbb{T^\star})\\) is isomorph to
\\(\wp(\mathbb{T}^+\times\mathbb{T}^\star)\\) then we can write
$$
\mathcal{S}^{\hat\star}[\\![\mathsf{S}]\\!] = \left\\{\langle\pi,\pi^\prime\rangle\,\|\,\pi\in\trace^+\wedge\pi^\prime\in\mathcal{S}^\star[\\![\mathsf{S}]\\!]\pi\right\\}
$$
so that \\(\mathcal{S}^{\star}[\\![\mathsf{S}]\\!]\\) is the right image of \\(\mathcal{S}^{\hat\star}[\\![\mathsf{S}]\\!]\\)
### Maximal trace semantics
A trace is **maximal** when it's finite and represents an execution that
terminates or it is infinite and represents an execution that doesn't terminate.
Maximal finite traces of statements are prefix executions that exit the
statement.
$$
\mathcal{S}^{\infty}[\\![\mathsf{S}]\\!]\left( \pi^l \right) \buildrel\triangle\over= \lim\left( \mathcal{S}^{\star}[\\![\mathsf{S}]\\!]\left( \pi^l \right) \right)
$$
$$
\mathcal{S}^{+}[\\![\mathsf{S}]\\!]\left(\pi_1\hbox{at}[ \\![\mathsf{S}]\\!]\right)
\buildrel\triangle\over= \left\\{ \pi^{\phantom{2}l}_2\in\mathcal{S}^{\star}\[\\![\mathsf{S}]\\!\](\pi_1\hbox{at}[\\![\mathsf{S}]\\!])\,\|\,
\vphantom{\pi}^l=\hbox{after}[\\![\mathsf{S}]\\!]\vee\left(\hbox{escape}[\\![S]\\!] \wedge\vphantom{\pi}^l=\hbox{break-to}[\\![\mathsf{S}]\\!]\right)
\right\\}
$$
$$
\eqalign{
\mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!]\left(\pi^l\right) & \buildrel\triangle\over= \mathcal{S}^{+}[\\![\mathsf{S}]\\!]\left(\pi^l\right) \cup \mathcal{S}^{\infty}[\\![\mathsf{S}]\\!]\left(\pi^l\right) \\\\
\mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!]\Pi & \buildrel\triangle\over= \bigcup\left\\{ \mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!]\left(\pi^l\right) \, \| \, \pi^l\in\Pi \right\\} \\\\
\mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!] & \buildrel\triangle\over= \mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!]\left( \mathbb{T}^+ \right) \\\\
\mathcal{S}^{+\infty}[\\![\mathsf{P}]\\!] & \buildrel\triangle\over= \mathcal{S}^{+\infty}[\\![\mathsf{S}]\\!]\left( \left\\{ \hbox{at}[\\![\mathsf{P}]\\!] \right\\}\right) \\\\
}
$$
## Subroutines
In our syntax is not included explicitely the execution of subroutine but this
should not be a problem since you can think of that as a sub-CFG implictily
defined by that instruction.
## CFG from instructions
In a more abstract sense some code is a sequence of instructions, in this
context I'm going to talk about **binary** code, i.e. code that is usually
represented and stored via bits and organized in a flat way.
To clarify: in general the following argument could apply also to source code
(in particular control flow and stuff) but it's a little more complex since
allow for nesting and it's explicit about the "intended" code, but here I'm more
interested in reconstructing it. Moreover, the compiler is doing in reverse what
in general a decompiler is trying to do.
The core point here is that, being the instructions laid out one after the
other, some of them must handle the control flow and these are the instructions
we are interested in studying, at least as appetizer: let's organize the
instruction based on their behaviour with respect to control flow
- **standard:** the large majority of instructions flow to the next instruction
- **jump:** unconditional jump
- **branch:** conditional jump if some condition holds, otherwise fallthrough
- **call:** invokes a function
- **return:** transfer back control to the caller
**Note:** we can create a taxonomy for classes of instructions, first with the
subdivision "standard"/"control flow", then the latter can be subdivided again
in "intra"/"extra"-procedural, where ``call``s and ``return``s create a
_context_ that is not present with ``jmp``s and ``branch``es.
If we abstract each instruction as a vertex and we set directed edges between
an instruction and its (possible) successors we obtain what is called a
**control flow graph** (CFG from now on), this is the next step after having
disassembled your code.
Take in mind that usually the analysis is done at function level, if you
encounter a ``call`` instruction, you treat it like a ``standard`` one, if you encounter
a ``return`` instruction you set the corresponding vertex as an end point.
Two particular vertices are the **entry point** and the **exit**, the first is
usually not explicitely indicated in binary code and it's a challenging problem
by itself to find it, the latter is usually represented by the ``return`` instruction.
Calling functions is not usually included in the CFG, that generates another
graph of its own, the **call graph**.
**Note:** obviously in real code you can have **signals** and **exceptions** or
**interrupts** that go against this reasoning.
**Note:** there are cases where a stream of bytes can have different
interpretation regarding the instructions in it, I'm looking at you ``x86``. In
particular, a jump can reach in the "middle" of another possible instruction
creating a different interpretation.
It's possible to transform a little the graph, grouping instructions in regions
having a single entry point and a single exit point: the so called **basic
blocks**.
The first instruction in them is named leader, the last instruction is the
tail. The leader is either the first instruction in a program, the target of a
jump, branch or call, or the fall-through instruction of a non-taken branch.
Instructions of type jump, branch, call and return cannot be followed by any
other instruction in a block.
So at the end, basic blocks are blocks of instructions connected by branches and
are a less grained representation of a CFG; loops are obvious in such representation.
## Intermediate language
But there is much more: in the last [post about reversing ``C++`` code with
``ghidra``](link://slug/reversing-c++-qt-applications-using-ghidra) I also
talked about the capabilities of the tools to "abstract away" the actual
instructions and via ``Pcode`` that is an intermediate language used by
``ghidra`` to have a common language for all the architecture it understands.
A lot of tools have an **intermediate language** (``IL``) or **intermediate
representation** (``IR``) used for their own purposes and here I'm talking about
tools that are not even reverse engineering tools: for example
- ``LLVM``
- ``Binary Ninja``
- ``radare2``
- ``IDA``
Whatever language you use to encode the binary instructions, exists a
representation the in invariant and that is used to extract core information
about the flow of execution.
### P-code
It uses definition in ``SLEIGH``, it's human readable; ``P-code`` is a Register
Transfer Language: each native instruction is translated in (possibly) multiple
p-code operations that takes parts of the processor state as input and output
variables (so called ``Varnode``s).
The p-code obtained from direct translation is referred to as **raw** but can be
further refined and exist additional operations that are not available
initially.
Take in mind that all p-code operations are associated with the address of the
original instruction they were translated from. For a single instruction, a
counter is used to enumerate the multiple p-code operations involved in the
translation. The address and this counter, as a pair, are referred with the name
of **sequence number**.
The core p-code defined are the following ([here the documentation](https://spinsel.dev/assets/2020-06-17-ghidra-brainfuck-processor-1/ghidra_docs/language_spec/html/pcoderef.html))
| Name | Arguments | Description |
|----------|-------------------|--------------------|
| ``COPY`` | ``input0 output`` | ``output = input`` |
| ``LOAD`` | ``input0 input1 output`` | ``output = *input`` or ``output = *[input0]input1`` |
| ``STORE`` | ``input0 input1 input2`` | ``*input1 = input2`` or ``*[input0] = input2`` |
| ``BRANCH`` | ``input0`` | ``goto input0`` |
| ``CBRANCH`` | ``input0 input1`` | ``if (input1) goto input0`` |
| ``BRANCHIND`` | ``input0`` | ``goto [input0]`` |
| ``CALL`` | ``input0 ...`` | ``call [input0]`` |
| ``CALLIND`` | ``input0 ...`` | ``call [input0]`` |
| ``RETURN`` | ``input0 ...`` | ``return [input0]`` |
```
PUSH RBP
(unique, 0xea00, 8) = COPY (register, 0x28, 8)
(register, 0x20, 8) = INT_SUB (register, 0x20, 8), (const, 0x8, 8)
STORE (const, 0x1b1, 8), (register, 0x20, 8), (unique, 0xea00, 8)
MOV RBP,RSP
(register, 0x28, 8) = COPY (register, 0x20, 8)
SUB RSP,0x20
(register, 0x200, 1) = INT_LESS (register, 0x20, 8), (const, 0x20, 8)
(register, 0x20b, 1) = INT_SBORROW (register, 0x20, 8), (const, 0x20, 8)
(register, 0x20, 8) = INT_SUB (register, 0x20, 8), (const, 0x20, 8)
(register, 0x207, 1) = INT_SLESS (register, 0x20, 8), (const, 0x0, 8)
(register, 0x206, 1) = INT_EQUAL (register, 0x20, 8), (const, 0x0, 8)
(unique, 0x12e80, 8) = INT_AND (register, 0x20, 8), (const, 0xff, 8)
(unique, 0x12f00, 1) = POPCOUNT (unique, 0x12e80, 8)
(unique, 0x12f80, 1) = INT_AND (unique, 0x12f00, 1), (const, 0x1, 1)
(register, 0x202, 1) = INT_EQUAL (unique, 0x12f80, 1), (const, 0x0, 1)
MOV dword ptr [RBP + local_1c],EDI
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xbf00, 4) = COPY (register, 0x38, 4)
STORE (const, 0x1b1, 4), (unique, 0x3100, 8), (unique, 0xbf00, 4)
MOV qword ptr [RBP + local_28],RSI
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xc000, 8) = COPY (register, 0x30, 8)
STORE (const, 0x1b1, 4), (unique, 0x3100, 8), (unique, 0xc000, 8)
MOV RAX,qword ptr [RBP + local_28]
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xc000, 8) = LOAD (const, 0x1b1, 4), (unique, 0x3100, 8)
(register, 0x0, 8) = COPY (unique, 0xc000, 8)
ADD RAX,0x8
(register, 0x200, 1) = INT_CARRY (register, 0x0, 8), (const, 0x8, 8)
(register, 0x20b, 1) = INT_SCARRY (register, 0x0, 8), (const, 0x8, 8)
(register, 0x0, 8) = INT_ADD (register, 0x0, 8), (const, 0x8, 8)
(register, 0x207, 1) = INT_SLESS (register, 0x0, 8), (const, 0x0, 8)
(register, 0x206, 1) = INT_EQUAL (register, 0x0, 8), (const, 0x0, 8)
(unique, 0x12e80, 8) = INT_AND (register, 0x0, 8), (const, 0xff, 8)
(unique, 0x12f00, 1) = POPCOUNT (unique, 0x12e80, 8)
(unique, 0x12f80, 1) = INT_AND (unique, 0x12f00, 1), (const, 0x1, 1)
(register, 0x202, 1) = INT_EQUAL (unique, 0x12f80, 1), (const, 0x0, 1)
MOV RAX,qword ptr [RAX]
(unique, 0xc000, 8) = LOAD (const, 0x1b1, 4), (register, 0x0, 8)
(register, 0x0, 8) = COPY (unique, 0xc000, 8)
MOV RDI,RAX
(register, 0x38, 8) = COPY (register, 0x0, 8)
CALL ::atoi
(register, 0x20, 8) = INT_SUB (register, 0x20, 8), (const, 0x8, 8)
STORE (const, 0x1b1, 8), (register, 0x20, 8), (const, 0x10117b, 8)
CALL (ram, 0x101050, 8)
MOV dword ptr [RBP + index],EAX
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xbf00, 4) = COPY (register, 0x0, 4)
STORE (const, 0x1b1, 4), (unique, 0x3100, 8), (unique, 0xbf00, 4)
MOV EAX,dword ptr [RBP + index]
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xbf00, 4) = LOAD (const, 0x1b1, 4), (unique, 0x3100, 8)
(register, 0x0, 4) = COPY (unique, 0xbf00, 4)
(register, 0x0, 8) = INT_ZEXT (register, 0x0, 4)
CMP EAX,0x2
(register, 0x200, 1) = INT_LESS (register, 0x0, 4), (const, 0x2, 4)
(register, 0x20b, 1) = INT_SBORROW (register, 0x0, 4), (const, 0x2, 4)
(unique, 0x29000, 4) = INT_SUB (register, 0x0, 4), (const, 0x2, 4)
(register, 0x207, 1) = INT_SLESS (unique, 0x29000, 4), (const, 0x0, 4)
(register, 0x206, 1) = INT_EQUAL (unique, 0x29000, 4), (const, 0x0, 4)
(unique, 0x12e80, 4) = INT_AND (unique, 0x29000, 4), (const, 0xff, 4)
(unique, 0x12f00, 1) = POPCOUNT (unique, 0x12e80, 4)
(unique, 0x12f80, 1) = INT_AND (unique, 0x12f00, 1), (const, 0x1, 1)
(register, 0x202, 1) = INT_EQUAL (unique, 0x12f80, 1), (const, 0x0, 1)
JBE LAB_0010119c
(unique, 0xca00, 1) = BOOL_OR (register, 0x200, 1), (register, 0x206, 1)
CBRANCH (ram, 0x10119c, 8), (unique, 0xca00, 1)
LEA RAX,[s_forbidden!_0010200d]
(register, 0x0, 8) = COPY (const, 0x10200d, 8)
MOV RDI=>s_forbidden!_0010200d,RAX
(register, 0x38, 8) = COPY (register, 0x0, 8)
CALL ::puts
(register, 0x20, 8) = INT_SUB (register, 0x20, 8), (const, 0x8, 8)
STORE (const, 0x1b1, 8), (register, 0x20, 8), (const, 0x101195, 8)
CALL (ram, 0x101030, 8)
MOV EAX,0x1
(register, 0x0, 8) = COPY (const, 0x1, 8)
JMP LAB_001011d0
BRANCH (ram, 0x1011d0, 8)
MOV EAX,dword ptr [RBP + index]
(unique, 0x3100, 8) = INT_ADD (register, 0x28, 8), (const, 0xfffffffffff
(unique, 0xbf00, 4) = LOAD (const, 0x1b1, 4), (unique, 0x3100, 8)
(register, 0x0, 4) = COPY (unique, 0xbf00, 4)
(register, 0x0, 8) = INT_ZEXT (register, 0x0, 4)
CDQE
(register, 0x0, 8) = INT_SEXT (register, 0x0, 4)
LEA RDX,[RAX*0x8]
(unique, 0x3680, 8) = INT_MULT (register, 0x0, 8), (const, 0x8, 8)
(register, 0x10, 8) = COPY (unique, 0x3680, 8)
LEA RAX,[messages]
(register, 0x0, 8) = COPY (const, 0x104030, 8)
MOV RAX=>messages,qword ptr [RDX + RAX*0x1]
(unique, 0x3300, 8) = INT_MULT (register, 0x0, 8), (const, 0x1, 8)
(unique, 0x3400, 8) = INT_ADD (register, 0x10, 8), (unique, 0x3300, 8)
(unique, 0xc000, 8) = LOAD (const, 0x1b1, 4), (unique, 0x3400, 8)
(register, 0x0, 8) = COPY (unique, 0xc000, 8)
MOV RSI,RAX
(register, 0x30, 8) = COPY (register, 0x0, 8)
LEA RAX,[s_message:_'%s'_00102018]
(register, 0x0, 8) = COPY (const, 0x102018, 8)
MOV RDI=>s_message:_'%s'_00102018,RAX
(register, 0x38, 8) = COPY (register, 0x0, 8)
MOV EAX,0x0
(register, 0x0, 8) = COPY (const, 0x0, 8)
6_c32105f2
CALL ::printf
(register, 0x20, 8) = INT_SUB (register, 0x20, 8), (const, 0x8, 8)
STORE (const, 0x1b1, 8), (register, 0x20, 8), (const, 0x1011cb, 8)
CALL (ram, 0x101040, 8)
MOV EAX,0x0
(register, 0x0, 8) = COPY (const, 0x0, 8)
LEAVE
(register, 0x20, 8) = COPY (register, 0x28, 8)
(register, 0x28, 8) = LOAD (const, 0x1b1, 8), (register, 0x20, 8)
(register, 0x20, 8) = INT_ADD (register, 0x20, 8), (const, 0x8, 8)
RET
(register, 0x288, 8) = LOAD (const, 0x1b1, 8), (register, 0x20, 8)
(register, 0x20, 8) = INT_ADD (register, 0x20, 8), (const, 0x8, 8)
RETURN (register, 0x288, 8)
```
## Ghidra's decompiler implementation
On Debian gradle is at version 4 but it requires version 7 :P
https://github.com/NationalSecurityAgency/ghidra/blob/master/DevGuide.md
export PATH=/opt/gradle-7.5.1/bin:$PATH
export JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
be sure to have openjdk17
These are my personal notes about the implementation of the decompiling process in ``ghidra``,
it's not authoritative and could be all a fever dream from my side, take it with
a grain of salt.
First of all, ``ghidra`` is built using two different languages, ``C++`` for the
core analysis and ``Java`` for firther manipulation and the application itself:
user interface, plugins etc...
The decompiler it's written in ``C++``, it's possible to compile it separately
using the following encantation
```
$ cd Ghidra/Features/Decompiler/src/decompile/cpp
$ make decomp_dbg
$ export SLEIGHHOME=~/git/ghidra/
$ ./decomp_dbg
[decomp]>
```
but ``ghidra`` itself calls it via the ``DecompInterface`` class that in reality
uses a client in ``C++`` located at ``Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc``
that registers the following commands
- ``registerProgram``
- ``deregisterProgram``
- ``flushNative``
- ``decompileAt``: decompile a specific function
- ``structureGraph``
- ``setAction``
- ``setOptions``
(if you want to know more details you can look at the source code that has a lot
of comments); you can find [here](https://github.com/NationalSecurityAgency/ghidra/blob/6842712129b8da45077bb8c5049e607d685f4dea/Ghidra/Features/Decompiler/src/main/java/ghidra/app/decompiler/DecompInterface.java#L748)
where it calls ``decompileAt``
The communication is done via ``XML`` with the actual decompiler process that is implemented
``Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc`` is the client
The actual binary is named ``decompile`` (at ``./Ghidra/Features/Decompiler/build/os/linux_x86_64/decompile``)
the command line decompiler is implemented in
``Ghidra/Features/Decompiler/src/decompile/cpp/consolemain.cc``
Obviously you need to [``registerProgram``](https://github.com/NationalSecurityAgency/ghidra/blob/6842712129b8da45077bb8c5049e607d685f4dea/Ghidra/Features/Decompiler/src/main/java/ghidra/app/decompiler/DecompileProcess.java#L414)
before
```java
protected void initializeProcess() throws IOException, DecompileException {
if (decompCallback == null) {
throw new IOException("Program not opened in decompiler");
}
if (decompProcess == null) {
decompProcess = DecompileProcessFactory.get();
}
else if (!decompProcess.isReady()) {
DecompileProcessFactory.release(decompProcess);
decompProcess = DecompileProcessFactory.get();
}
long uniqueBase = UniqueLayout.SLEIGH_BASE.getOffset(pcodelanguage);
XmlEncode xmlEncode = new XmlEncode();
pcodelanguage.encodeTranslator(xmlEncode, program.getAddressFactory(), uniqueBase);
String tspec = xmlEncode.toString();
xmlEncode.clear();
dtmanage.encodeCoreTypes(xmlEncode);
String coretypes = xmlEncode.toString();
SleighLanguageDescription sleighdescription =
(SleighLanguageDescription) pcodelanguage.getLanguageDescription();
ResourceFile pspecfile = sleighdescription.getSpecFile();
String pspecxml = fileToString(pspecfile);
xmlEncode.clear();
compilerSpec.encode(xmlEncode);
String cspecxml = xmlEncode.toString();
decompCallback.setNativeMessage(null);
decompProcess.registerProgram(decompCallback, pspecxml, cspecxml, tspec, coretypes,
program);
```
* @param pspecxml = string containing .pspec xml
* @param cspecxml = string containing .cspec xml
* @param tspecxml = XML string containing translator spec
* @param coretypesxml = XML description of core data-types
./Ghidra/Processors/x86/data/languages/x86.pspec
./Ghidra/Processors/x86/data/languages/x86.ldefs
./Ghidra/Processors/x86/data/languages/x86.slaspec
./Ghidra/Processors/x86/data/languages/x86.opinion
./Ghidra/Processors/x86/data/languages/x86.sla
/opt/ghidra-github/Ghidra/Processors/x86/data/languages/x86-64.pspec
cspec
tspec
coretypes
```
void RegisterProgram::rawAction(void)
ghidra = new ArchitectureGhidra(pspec,cspec,tspec,corespec,sin,sout);
ghidra->init(store);
```
![](/images/symbolic/AddrSpaceManager-hierarchy.png)
## Theorems
Now, the title of this section can be puzzling, why theorems? remember when I
told you that every basic block controlling the flow of the code has a logic
condition at end? this means that a continous path from vertex \\(A\\) to vertex
\\(B\\) is practically "building" a formula using boolean conditions.
Let me step back a little to explain some maths to you
- ``SAT``: Boolean Satisfability Problem, i.e. determining if there exists an
interpretation that satisfies a given boolean formula.
- ``SMT``: Satisfability Modulo Theory, an extension of ``SAT``
- ``BMC``: Bounded Model Checking
- [Introduction to SMT with Z3](https://www.iaik.tugraz.at/wp-content/uploads/2019/09/Introduction-to-SMT-with-Z3.pdf)
## Experiment with ghidra
Since I'm interested to practical applications but I want to be able to use this
mechanism directly while I'm reverse engineering code with ``ghidra``, I want a
way to "lift" the ``Pcode`` as logical expression and work on that.
It's not a difficult task, I mean, ``Pcode`` has not many instructions and it's
a tedious but doable task, for example [maat does that](https://github.com/trailofbits/maat/blob/master/src/ir/cpu.cpp#L471)
but it's ``C++`` but fortunately for me someone has done that in ``Java``, a tool named,
[GhiHorn](https://github.com/CERTCC/kaiju/tree/fbefa018eb93e4821aa9eadd7e34f073a2efde8f/src/main/java/kaiju/tools/ghihorn),
thanks to the magic of Jython I can use directly from the console since the
classes are defined as ``public``, take note that here however we are using the
``Java``'s APIs
([here](https://z3prover.github.io/api/html/namespacecom_1_1microsoft_1_1z3.html)
the documentation)
```
>>> from kaiju.tools.ghihorn.decompiler import GhiHornSimpleDecompiler
>>> from kaiju.tools.ghihorn import GhiHornifierBuilder
>>> from kaiju.tools.ghihorn.api import GhiHornApiDatabase
>>> apiDatabase = GhiHornApiDatabase(GhiHornSimpleDecompiler())
>>> hornifier = SimpleHornifier("kebab")
>>> hornifier.setDecompiler(GhiHornSimpleDecompiler())
>>> hornifier.setEntryPoint(toAddr(0x0101149))
>>> hornifier.setApiDatabase(apiDatabase)
>>> hornifier.hornify(currentProgram, ConsoleTaskMonitor())
```
```
/**
* Expressions that represent a p-code operation. They basically have the form:
*
* out = OP(in...)
*/
public class PcodeExpression implements HornExpression {
...
}
```
If I place the cursor on the ``uvar1 < 4`` block of this code
```c
int main(int argc,char **argv)
{
uint uVar1;
int jim;
uVar1 = atoi(argv[1]);
if (uVar1 < 4) {
puts("!");
}
return 0;
}
```
I can extract the logic expression from the ``Pcode``
```
>>> from kaiju.tools.ghihorn.hornifer.horn.expression import PcodeExpression
>>> currentLocation.getToken().getPcodeOp()
(unique, 0xcb80, 1) INT_LESS (register, 0x0, 4) , (const, 0x4, 4)
>>> pcode = currentLocation.getToken().getPcodeOp()
>>> expr = PcodeExpression(pcode)
>>> expr.getOperation()
uVar1@main!cast_argv < const=4
```
seems pretty ok, I want to investigate a little more, in particular I'm
interested to see if it's encoded as an **unsigned** operation
```
>>> type(expr)
>>> expr.getComponents()
array(kaiju.tools.ghihorn.hornifer.horn.expression.HornExpression, [uVar1@main!cast_argv, const=4])
>>> expr.getComponents()[0]
uVar1@main!cast_argv
>>> var = expr.getComponents()[0]
>>> var.getDataType()
kaiju.tools.ghihorn.z3.GhiHornBitVectorType@6385652a
>>> expr.getOperation().getClass()
```
and indeed, from the last line, it is!
Now I want to use it
```
>>> from kaiju.tools.ghihorn.z3 import GhiHornContext
>>> ctx = GhiHornContext()
>>> solver = ctx.mkSolver()
>>> type(solver)
>>> instance = expr.instantiate(ctx)
>>> instance
(= bVar148@main!cast_argv (bvult uVar1@main!cast_argv #x0000000000000004))
>>> type(instance)
>>> solver.add(instance)
>>> solver.check()
SATISFIABLE
>>> model = solver.getModel()
>>> model
(define-fun uVar1@main!cast_argv () (_ BitVec 64)
#x0000000000000000)
(define-fun bVar148@main!cast_argv () Bool
true)
```
Now, listen, I know that ``Java`` is shit so I'm teaching a neat trick to export
all, from ``Java``
```
>>> print solver.toString()
(declare-fun bVar145@main!array () Bool)
(declare-fun uVar1@main!array () (_ BitVec 64))
(assert (= bVar145@main!array (bvult uVar1@main!array #x0000000000000003)))
(model-add bVar145@main!array
()
Bool
(not (bvule #x0000000000000003 uVar1@main!array)))
```
you can import into ``python``
```
In [1]: import z3
In [2]: s = z3.Solver()
In [3]: s.from_string("(declare-fun bVar145@main!array () Bool)\n(declare-fun uVar1@main!array () (_ BitVec 64))\n(assert (= bVar145@main!array (bvult uVar1@main!array #x0000000000000003)))\n(model-add bVar145@main!array\n ()\n ...: Bool\n (not (bvule #x0000000000000003 uVar1@main!array)))\n")
In [4]: print(s)
[bVar145@main!array == ULT(uVar1@main!array, 3)]
```
## Ideas
- Maybe having a distribution of the input and looking for the element in the tail
of such distribution. Note here the "right" or "compliant" input is tricky:
in theory everything accepted is compliant.
- Try to build a finite state machine from the CFG
```c
struct {
size_t length;
char payload[];
} miao_t
void do_something(char* buffer, miao_t* m) {
size_t length = m->length - sizeof(m->head);
memcpy(buffer, m->payload, length);
}
```
here, with the assumption that the attacker controls the ``m`` variable in its
entirety,
the obvious vulnerability is that if ``m->length`` is less than the size in
bytes of the field ``head`` then an underflow happens.
**But** if the attacker controls the two arguments, i.e. the **two pointers**
now you have a read/write primitive :)
## Links
- [Abstract Interpretation spring 2005 course](http://web.mit.edu/afs/athena.mit.edu/course/16/16.399/www/) with slides
- [Awesome symbolic execution](https://github.com/ksluckow/awesome-symbolic-execution) github repo
- [Why Horn Formulas Matter in Computer Science: Initial Structures and Generic Examples](https://core.ac.uk/download/pdf/82190596.pdf)
- [Z3 Python tutorials](https://ericpony.github.io/z3py-tutorial/)
- [SAT/SMT by example](https://sat-smt.codes/SAT_SMT_by_example.pdf)
- [Dynamic Symbolic Execution for Software Analysis](https://gtmfs2020.github.io/media/gtmfs2020_slides_Cadar.pdf)
- [LLBMC — The Low-Level Bounded Model Checker](https://llbmc.org/)
- [Constrained Horn Clauses (CHC)](https://ece.uwaterloo.ca/~agurfink/ece750t29f18/assets/pdf/07_CHC_LIA.pdf) from Automated Program Verification (APV) Fall 2018
- [Identifying Loops In Almost Linear Time](https://dl.acm.org/doi/pdf/10.1145/316686.316687)
- [Practical Dynamic Reconstruction of Control Flow Graphs](https://homepages.dcc.ufmg.br/~fernando/publications/papers/RimsaSPE20.pdf)
- [A Survey of Symbolic Execution Techniques](https://arxiv.org/pdf/1610.00502.pdf)
- [Regular Property Guided Dynamic Symbolic Execution](https://zbchen.github.io/Papers_files/icse2015.pdf)
- [Symbolic Verification of Regular Properties](https://zbchen.github.io/Papers_files/icse2018-1.pdf)
- [ESP: Path-Sensitive Program Verification in Polynomial Time](https://web.eecs.umich.edu/~weimerw/590/reading/dls-pldi02.pdf)
- [Algorithms for Computing the Static Single Assignment Form](https://iss.oden.utexas.edu/Publications/Papers/JACM2003.pdf)
- [Efficiently Computing Static Single Assignment Form and the Control Dependence Graph](https://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf)
- [Static Program Analysis](https://cs.au.dk/~amoeller/spa/spa.pdf)
- static program analysis technique of [Abstract Interpretation](https://wiki.mozilla.org/Abstract_Interpretation) by Mozilla
- [Abstract Interpretation](https://pages.cs.wisc.edu/~horwitz/CS704-NOTES/10.ABSTRACT-INTERPRETATION.html)
- [Abstract Interpretation and Abstract Domains](https://www.dsi.unive.it/~avp/domains.pdf)
- [An abstract interpretation-based deobfuscation plugin for ghidra](https://www.msreverseengineering.com/blog/2019/4/17/an-abstract-interpretation-based-deobfuscation-plugin-for-ghidra)
- [RolfRolles/GhidraPAL](https://github.com/RolfRolles/GhidraPAL) Ghidra Program Analysis Library
- [Concrete interpretation of Chess](https://static1.squarespace.com/static/53a64cc2e4b0c63fc41a3320/t/5a94f02824a694ce120900d0/1519710249860/SBM-Concrete+Interpretation+of+Chess.pdf)
- [Abstract intepretation of Chess](https://static1.squarespace.com/static/53a64cc2e4b0c63fc41a3320/t/5a94f03853450a67004c973c/1519710266003/SBM-Abstract+Interpretation+of+Chess.pdf)
- Be a binary rockstar ([video](https://www.youtube.com/watch?v=jqyoDlGyao4))
- [Look out! Divergent representations are everywhere!](https://blog.trailofbits.com/2022/11/10/divergent-representations-variable-overflows-c-compiler/)
- [BINSEC](https://binsec.github.io/) binary code analysis for security
- [Proving the security of software-intensive embedded systems by abstract interpretation](https://hal.inria.fr/tel-03127921/document), PhD thesis by Marc Chevalier where a more realistic
semantics of C is defined.
- [Finding JIT Optimizer Bugs using SMT Solvers and Fuzzing](https://www.pypy.org/posts/2022/12/jit-bug-finding-smt-fuzzing.html) in ``PyPy``.
- [gtirb](https://github.com/GrammaTech/gtirb) Intermediate Representation for Binary analysis and transformation
- [ddisasm](https://github.com/GrammaTech/ddisasm) A fast and accurate disassembler
- [Ghidraton](https://www.mandiant.com/resources/blog/ghidrathon-snaking-ghidra-python-3-scripting), Snaking Ghidra with Python 3 Scripting
- [Hyperproperties](https://www.cs.cornell.edu/fbs/publications/Hyperproperties.pdf), paper by Michael R. Clarkson and Fred B. Schneider
- [IKOS](https://github.com/NASA-SW-VnV/ikos) Static analyzer for C/C++ based on the theory of Abstract Interpretation by NASA ([slide](https://ntrs.nasa.gov/api/citations/20170009851/downloads/20170009851.pdf))
- [Proving the correctness of multiprocess programs](https://lamport.azurewebsites.net/pubs/proving.pdf), papers where "liveness" and "safeness" were defined
- [“Static Analysis by Abstract Interpretation and Decision Procedures”](https://hal.archives-ouvertes.fr/tel-01102418/document) PhD thesis by Julien Henry and [PAGAI](https://pagai.gricad-pages.univ-grenoble-alpes.fr/), a tool which does Path Analysis for invariant Generation by Abstract Interpretation