radare2/libr/anal
Kenny MacDermid e04b82059a Fix generated ESIL for AVR flags. (#7852)
A typo of `__generic_sub_update_flags_rr` meant that the generated
code contained the immediate as a register, resulting in the flags
not being set correctly.

After switching this to `_rk` the format of the immediate values was
still incorrect because they were being output as hex without the `0x`
prefix. This was changed to output decimal instead, as that matches the
format of the result value when doing operations like `cpi`. For
example:

    cpi r26, 0x2e
    46,r26,-,r26,0x08,&,!,46,0x08,&...

The format for the `__generic_add_update_flags` was also fixed, but
as there is currently no `_rk` version it doesn't affect anything.
2017-07-03 01:46:47 +02:00
arch/gb Move lua53 plugins to radare2-extras, available via r2pm 2017-04-05 10:16:50 +02:00
d Correcting typos in type database 2017-04-15 20:03:23 +02:00
p Fix generated ESIL for AVR flags. (#7852) 2017-07-03 01:46:47 +02:00
anal_ex.c Remove configure-plugins dependency for the make meson 2017-05-26 02:43:53 +02:00
anal.c Remove configure-plugins dependency for the make meson 2017-05-26 02:43:53 +02:00
bb.c Fix #6990 - crash in r_anal_bb_offset_inst 2017-03-12 22:37:17 +01:00
cc.c Load types and cc info on asm.arch change 2016-08-16 11:59:34 +02:00
cond.c Fix #3286 - Use stdbool.h 2016-07-12 22:15:19 +02:00
cycles.c Update indentation in some more random files 2015-12-14 14:32:18 +01:00
data.c Fix #6428 - Honor scr.color in ad command 2017-04-16 11:41:27 +02:00
diff.c Fix undefined behaviour introduced after fix in regression 2017-04-18 17:22:32 +02:00
esil2reil.c fixes segfault in aetr 2017-05-02 17:43:37 +02:00
esil_stats.c Support for modifying the incoming value in operation RAnalEsilCallbacks::hook_reg_write(). (#5977) 2016-10-18 16:59:38 +02:00
esil_trace.c Bump sdb to fix hash collision issues 2017-01-14 22:02:33 +01:00
esil.c Fix last covs 2017-06-18 01:11:11 +02:00
fcn.c Fix function detection with NOP (#7691) 2017-06-07 01:53:59 +02:00
fcnstore.c Use ht_* from sdb in fcnstore.c 2017-02-24 23:42:17 +01:00
flirt.c Remove fcn->vars and reindent anal/fcn.c var.c and flirt.c 2017-03-09 23:46:02 +01:00
hint.c Improved MSVC support (WIP) 2017-05-09 14:25:57 +02:00
labels.c Fix minor issues by clang-analyzer (#7303) 2017-04-18 14:03:42 +02:00
Makefile Refix mingw32 build 2017-05-10 00:34:05 +02:00
meson.build support for static build (#7822) 2017-06-28 22:54:40 +02:00
meta.c Fix help for /A 2017-05-22 01:08:54 +02:00
op.c Honor MMX and SSE op.family for x86.cs 2017-05-22 00:56:24 +02:00
pin.c commented unused variable (libr/anal/pin.c) (#7714) 2017-06-11 02:13:19 +02:00
README
README.meta
ref.c Avoid null bytes in axt output 2016-02-09 11:38:55 -06:00
reflines.c Handle ^C in reflines 2017-03-16 22:57:55 +01:00
sign.c Remove trailing space 2017-06-03 14:43:31 +02:00
state.c Remove configure-plugins dependency for the make meson 2017-05-26 02:43:53 +02:00
switch.c Fix memleaks in av, lot of anal code cleanup and do not always allocate bb->diff 2016-08-22 18:32:18 +02:00
types.c TCC - Saving types across sessions and add Arch/OS specific defines 2017-06-01 16:57:00 +02:00
value.c Fix #6162 - Renames r_str_concat to r_str_append 2017-03-16 22:29:49 +01:00
var.c Improved MSVC support (WIP) 2017-05-09 14:25:57 +02:00
xrefs.c Implement axq 2017-06-17 02:48:25 +02:00

x86 opcodes: http://en.wikipedia.org/wiki/X86_instruction_listings

NOTE: Most of the information in this document does not match reality.
      Take it as random ideas, proposals, and so on.

Code analysis module
====================
* Opcodes that will be executed depending on cond?
  - for example: (x86, arm..) (0f94c2  setz dl)
* Direction of the stack? (inc/dec) required?
* Register value source type
  - This is static entropy level for a register at some point
  - Constant value
    mov eax, 33
    mov eax, [const] ; from ro memory
      static_entropy = 0;
  - Variable
    mov eax, [rwmem] ; from rw memory (variable)
      static_entropy = 1;
  - Modification
    add eax, ebx ; current value modified by another register
      static_entropy ++;

  * At any point of the program we can determine if a register
    has a static fixed value or the level of possible polymorphism

  -- store register values in execution traces
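
A minimal sketch of the static entropy rules listed above, in C. The
enum and function names are hypothetical and exist only for this
illustration; nothing here is part of the r_anal API.

```c
/* Hypothetical source kinds for a register write (illustrative names). */
enum src_kind {
	SRC_CONST,  /* immediate or read-only memory: mov eax, 33 */
	SRC_RWMEM,  /* read-write memory (a variable): mov eax, [rwmem] */
	SRC_MODIFY  /* arithmetic on the current value: add eax, ebx */
};

/* Return the static entropy of a register after one write to it. */
static int entropy_update(int current, enum src_kind kind) {
	switch (kind) {
	case SRC_CONST:  return 0;           /* static_entropy = 0 */
	case SRC_RWMEM:  return 1;           /* static_entropy = 1 */
	case SRC_MODIFY: return current + 1; /* static_entropy++ */
	}
	return current;
}
```

A register whose entropy is 0 at some point holds a fixed value there;
higher numbers mean more writes whose result depends on variable data.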

////////////////////////////////////////

Global picture

(anal) -> can keep track of results of different context (functions ...)
  |
  `---> we get a context.. so we work there with
       (anal context owns stack, regs, ...)
     - able to detect function arguments
     - we can configure the context one way or another
     - it is able to get info from global anal
     - fed with bytes

r_anal_get_bb(an, 0x804800);
r_anal_op_t * op = r_anal_get_op(an, 0x804800);
r_anal_get_fun(an, 0x804800);

----------------------------------------

// Must use r_alloc_pool for every type of structure (per function level)
// Must store all this info using r_db
// Only index when requested (temporary analyses stay temporary)
// Memory selectors are just modifiers .. how?
// How to handle self-modifying code?
  - if it's a conditional branch, the refs are the true and false targets
    - if not, and there is more than one branch, the refs are all the possible targets
  - if an address is accessed in read|write and exec mode we should warn!
  xrefs[] = {
    addr = 0x8048480
    type = R|W|X  - executable xrefs are control flow branches,
                  - read/write are for data
  }
  refs[] = {
    op   = eq,add,mul ??
    reg  = regidx
    addr = 0x8048580
    type = R|W|X
  }
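
The "warn when an address is accessed in read|write and exec mode" idea
can be sketched like this. The struct layout and the REF_* flag names
are assumptions made for the example, not existing definitions.

```c
#include <stdint.h>

/* Hypothetical access flags, mirroring the R|W|X type above. */
#define REF_R 1
#define REF_W 2
#define REF_X 4

struct xref {
	uint64_t addr;
	int type; /* any combination of REF_R, REF_W, REF_X */
};

/* Return 1 if addr is referenced both as data (read or write) and as
 * code (exec) anywhere in the list, 0 otherwise. */
static int xref_is_suspicious(const struct xref *xrefs, int n, uint64_t addr) {
	int seen = 0;
	for (int i = 0; i < n; i++) {
		if (xrefs[i].addr == addr) {
			seen |= xrefs[i].type;
		}
	}
	return (seen & REF_X) && (seen & (REF_R | REF_W));
}
```

An address that is both written and executed is a candidate for
self-modifying code, which is why it deserves a warning.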

// we need an api in r_buf to modify bits with endian and values..
struct bin {
  int offset;
  int size;
  int endian;
};

enum type {
  IMM,
  REG,
  MEM
};

struct r_anal_value_t {
  int op; // NOP, ADD, SEL, ...
  int type; // opcode, reg, imm, addr
  ut64 num; // idxofreg, immvalue, addrnum
  struct bin bin;
  int size;
  int nextop; // ADD, MUL, ...
  struct r_anal_value_t *next;
};

struct arg {
  int rw; // READ | WRITE direction
  int nv; // number of values
  struct r_anal_value_t *v;
};

mov eax, [0x8048+eax*4]

  mov -> args = { "eax", {0x8048 {+eax*4}} }
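
As an illustration, the memory operand 0x8048+eax*4 can be built as a
chain of values joined by nextop. This sketch uses a reduced struct
with hypothetical names, and it assumes the chain is evaluated right to
left (value OP rest-of-chain), which is one possible reading of the
design above.

```c
#include <stdint.h>
#include <stddef.h>

enum { V_IMM, V_REG };           /* value types (hypothetical names) */
enum { OP_NOP, OP_ADD, OP_MUL }; /* operators joining two values */

struct value {
	int type;           /* V_IMM or V_REG */
	uint64_t num;       /* immediate value or register index */
	int nextop;         /* operator joining this value to the next */
	struct value *next; /* NULL terminates the chain */
};

/* Evaluate the chain right to left: cur OP eval(next). */
static uint64_t value_eval(const struct value *v, const uint64_t *regs) {
	uint64_t cur = (v->type == V_REG) ? regs[v->num] : v->num;
	if (!v->next) {
		return cur;
	}
	uint64_t rest = value_eval(v->next, regs);
	switch (v->nextop) {
	case OP_ADD: return cur + rest;
	case OP_MUL: return cur * rest;
	default:     return cur;
	}
}
```

With eax at register index 0, the operand becomes the chain
{IMM 0x8048, ADD} -> {REG 0, MUL} -> {IMM 4}, which evaluates to
0x8048 + (eax * 4).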

struct r_anal_ref_t {
  int type; // READ, WRITE, EXEC
  struct r_anal_value_t value;
};

struct r_anal_op_t {
  ut64 addr;
  int frame;
  int type;
  int cond;
  int nestlevel;
  int length;
  int crc;
  struct r_anal_value_t rep;
  int nargs;
  struct arg *args; // nargs entries

  struct r_anal_op_t *next;
  int nrefs;
  struct r_anal_ref_t *refs; // nrefs entries
  int nxrefs;
  struct r_anal_ref_t *xrefs; // nxrefs entries
};

/* basic block */
struct r_anal_bb_t {
  ut64 addr;
  int type;
  int size;
  ut8 *bytes;
  struct r_anal_op_t *head; // opcode heading this basic block
  struct r_anal_ref_t *refs; // array of refs
  struct r_anal_ref_t *xrefs; // array of xrefs
};

/* function */
struct r_anal_fun_t {
  char *name;
  ut64 addr;
  int size;
  // XXX: use r_ranges instead of addr+size?
  struct r_anal_ref_t *refs; // array of refs
  struct r_anal_ref_t *xrefs; // array of xrefs
};

/* used to emulate */
struct r_anal_arch_t {
  struct r_reg_t reg;
  char **regs;
  int pc; // program counter
  int sp; // stack pointer
  int bp; // base pointer
  int gp; // global pointer
  int sr; // src
  int dr; // dst
};

const char *regs[] = { "eax", "ebx", "ecx", "...", NULL };

  if (opcode.xrefs[i].type & R_ANAL_XS_EXEC)

// compilation process defines a mapping between the binary representation
// of an opcode into an AST of structs describing the opcode itself or
// we can just serialize it into a evaluable string
// - evaluable strings are cheaper in memory consumption
// - strstr(es, "%eax") easy way to check if a register is used
// - the eval string should be converted into an AST at some point
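
A small sketch of the strstr() check mentioned above. The eval string
syntax used in the usage note is made up for the example, and
es_uses_reg is a hypothetical helper name.

```c
#include <string.h>

/* Return 1 if the evaluable string mentions the given register.
 * reg must include the '%' sigil, e.g. "%eax". */
static int es_uses_reg(const char *es, const char *reg) {
	return strstr(es, reg) != NULL;
}
```

For the string "%eax = %ebx + 4", a lookup of "%eax" matches and
"%ecx" does not. The sigil keeps "%ax" from matching inside "%eax",
but a plain substring test would still confuse names that share a
prefix (for instance "%r1" and "%r10"), so a real implementation would
need to check the character after the match.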

Analysis levels:
================
- opcode level
  - frame size
  - conditional (used by branches(jumps) and arm opcodes)
  - weight (importance) (if <0, it is a nop) trash detection
  - XXX file/line (DWARF info here?) I think not
  - lifetime of register value (detect if
  - nesting level (branch analysis)
  - sign
  - type
  -- operand level:
     - bitsize
     - mem | reg | imm
       - value
     - direction (read|write)
     - operand index
- basic block level
  - bytes + length + (checksum?)
  - type (head, tail, body, last)
  - xrefs (branches to here)
  - refs (must be an array)
    - true branch
    - false branch
    - destinations[] // for call eax and so
- function level
  - name
  - offset range (r_range here, functions do not need to be linear)
  - variables (use r_var) (( merge r_var here? ))
  - arguments ("")
  - xrefs
  - calls (outrefs)
  == graph simplification (serialize blocks with direct branches (jmp))
- program
  - comprises data + code trees
  - all references must be stored twice
  - r_range of functions, data and other shit

Context analysis:
=================
- Merge r_vm here -- multiarchitecture code emulation
- Allows to track register lifetime,
- Detect possible values for 'call eax' f.ex
- Identify fake conditional branches

TEH RIR
=======
The radare intermediate representation.
 - ascii representation of opcode level analysis

-- epilog/prolog bytez for extra function detection

Architecture language
=====================
Allows to describe an architecture (byte parsing, read/write)
 - opcode reassembling
 - automatic code analysis
 r_anal_opcode_set(op, R_OPTYPE_ADD);
 - opcode level analysis can be manually modified in runtime
   - basic blocks can change

Decompilation
=============
Use ALT .. in an inverse way OMG that's freaking

/////////////////////////////////////////////////////////////

opcode_analyze ()
  - parse bytes and fill a structure
  - opcode type and arguments
  - underlying vm code
opcode_modify ()
  - modify the bytes based on the structure changes
  - the structure should expose the bit level info to make this possible
   // this is //
   * modify reg, immediate or memory values

+--------------+
| AnalArchLang | **
+--------------+
if [arg0 == 0xff] {
	reg = { eax, ecx, edx, ebx, esp, ebp, esi, edi }
	jmp [0xe0+reg]
	jmp [0xe8+reg]

	reg = { eax, ecx, edx, ebx, esp, ebp, esi, edi }
	push [0xf0+reg]

	reg = { eax, ecx, edx, ebx, esp, ebp, esi, edi }
	call [0xd0+reg]
	call [0xd8+reg]
}

[0:7]=e8 {
  type = "call"
  addr = [8:39]
  len = 5
}
[0:7]>=50 && [0:7]<58 {
  type = "push"
  len = 1
}
[0:7]=c3 {
  type = "ret"
  len = 1
}
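
What the three byte-pattern rules above would do can be sketched by
hand in C. This assumes the x86 encoding in which push r32 occupies
0x50 to 0x57 and the call operand is the 32-bit little-endian value
that follows the 0xe8 opcode byte. The struct and function names are
hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

struct op_info {
	const char *type; /* "call", "push" or "ret" */
	int len;          /* instruction length in bytes */
	uint32_t addr;    /* raw call operand, 0 for the other types */
};

/* Return 1 and fill out if one of the three rules matches, else 0. */
static int match_op(const uint8_t *b, size_t n, struct op_info *out) {
	if (n < 1) {
		return 0;
	}
	out->addr = 0;
	if (b[0] == 0xe8 && n >= 5) {
		out->type = "call";
		out->len = 5;
		/* 32-bit little-endian operand after the opcode byte */
		out->addr = (uint32_t)b[1] | ((uint32_t)b[2] << 8) |
			((uint32_t)b[3] << 16) | ((uint32_t)b[4] << 24);
		return 1;
	}
	if (b[0] >= 0x50 && b[0] < 0x58) {
		out->type = "push";
		out->len = 1;
		return 1;
	}
	if (b[0] == 0xc3) {
		out->type = "ret";
		out->len = 1;
		return 1;
	}
	return 0;
}
```

The point of the architecture language is that a function like this
would be generated from the rules instead of being written by hand.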

BASIC OPS we need for the IR
============================ -- this is RISC! :D

Each opcode must support a size value. The encoding format is shown below.
We need some intermediate temporary registers


lispy assembly:
  (addi eax 3)
  (addi *(+ eax 8) 3)

  lea edi, [ecx*4-0x4]
  (set edi (- (* ecx 4) 4))
  (set edi (* ecx 4 - 4)) ; iterative format

     1 byte       1           N       N
  [ opcode ] [ type|size ] [ arg ] [ arg ]

type = [ op | reg | mem | imm ]  ; 2 bits is enough
size = 1, 2, 4, 8           ; byte level
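
One way to pack the type|size descriptor, as a sketch. The notes do
not fix a bit layout, so this assumes 2 bits of type and 2 bits of
log2(size) per argument; two such nibbles would then fit the single
descriptor byte for two arguments. All names are hypothetical.

```c
/* 2-bit argument type, in the order listed above. */
enum arg_type { T_OP = 0, T_REG = 1, T_MEM = 2, T_IMM = 3 };

/* Map a byte size of 1, 2, 4 or 8 to a 2-bit code; -1 otherwise. */
static int size_code(int size) {
	switch (size) {
	case 1: return 0;
	case 2: return 1;
	case 4: return 2;
	case 8: return 3;
	}
	return -1;
}

/* Pack one argument descriptor into a nibble:
 * bits 3..2 = type, bits 1..0 = size code. Returns -1 on a bad size. */
static int pack_arg(enum arg_type type, int size) {
	int sc = size_code(size);
	if (sc < 0) {
		return -1;
	}
	return ((int)type << 2) | sc;
}

/* Recover the type from a packed nibble. */
static enum arg_type unpack_type(int nibble) {
	return (enum arg_type)((nibble >> 2) & 3);
}

/* Recover the byte size (1, 2, 4 or 8) from a packed nibble. */
static int unpack_size(int nibble) {
	return 1 << (nibble & 3);
}
```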

ADD reg, reg
SUB reg, reg
JMP reg
JMP imm
JMP mem
SET reg, imm
STO mem, reg   ; store register value into memory
LOA reg, mem   ; load memory value into register
 ...
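
A toy interpreter for the data-moving ops listed above (ADD, SUB, SET,
STO, LOA), as a sketch only. The instruction struct, the register and
memory sizes, and all names are assumptions for the example; JMP is
left out and there is no bounds checking.

```c
#include <stdint.h>

enum ir_op { IR_ADD, IR_SUB, IR_SET, IR_STO, IR_LOA };

/* a: destination register index (memory index for STO)
 * b: source register index, immediate (SET) or memory index (LOA) */
struct ir_insn {
	enum ir_op op;
	int a;
	int64_t b;
};

struct ir_vm {
	int64_t regs[8];
	int64_t mem[16];
};

/* Execute one instruction on the given machine state. */
static void ir_step(struct ir_vm *vm, const struct ir_insn *i) {
	switch (i->op) {
	case IR_ADD: vm->regs[i->a] += vm->regs[i->b]; break; /* ADD reg, reg */
	case IR_SUB: vm->regs[i->a] -= vm->regs[i->b]; break; /* SUB reg, reg */
	case IR_SET: vm->regs[i->a] = i->b; break;            /* SET reg, imm */
	case IR_STO: vm->mem[i->a] = vm->regs[i->b]; break;   /* STO mem, reg */
	case IR_LOA: vm->regs[i->a] = vm->mem[i->b]; break;   /* LOA reg, mem */
	}
}
```

Running SET r0, 5; SET r1, 7; ADD r0, r1 through ir_step leaves 12 in
r0, which can then be stored with STO and read back with LOA.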