mirror of
https://github.com/xemu-project/xemu.git
synced 2025-02-03 18:53:12 +00:00
update
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4581 c046a42c-6fe2-441c-8c8c-71466251a162
parent
b314f2706b
commit
0a6b7b7813
119
tcg/README
@@ -16,14 +16,18 @@ from the host, although it is never the case for QEMU.
 
 A TCG "function" corresponds to a QEMU Translated Block (TB).
 
-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.
 
-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
+
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). They are defined before the
+functions defined. A TCG global can be a memory location (e.g. a QEMU
+CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
+or a memory location which is stored in a register outside QEMU TBs
+(not implemented yet).
 
 A TCG "basic block" corresponds to a list of instructions terminated
 by a branch instruction.
@@ -32,11 +36,11 @@ by a branch instruction.
 
 3.1) Introduction
 
-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.
 
 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
@@ -44,14 +48,12 @@ variable operands and always constant operands.
 The notable exception is the call instruction which has a variable
 number of outputs and inputs.
 
-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.
 
 add_i32 t0, t1, t2 (t0 <- t1 + t2)
 
-sub_i64 t2, t3, $4 (t2 <- t3 - 4)
-
 3.2) Assumptions
 
 * Basic blocks
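Reading aid for the hunk above: the textual form it describes (instruction name carrying the output type, then output operands, input operands, and constants prefixed with '$') can be sketched in C. This is only an illustration of the notation; ToyInsn, toy_emit and toy_format are hypothetical names invented here and are not part of TCG.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy instruction, not a TCG data structure. */
typedef struct {
    const char *name;      /* output type is part of the name, e.g. "add_i32" */
    const char *outs[2];   /* output variable operands */
    int nb_outs;
    const char *ins[2];    /* input variable operands */
    int nb_ins;
    long consts[2];        /* constant operands */
    int nb_consts;
} ToyInsn;

/* Append sep + prefix + text at offset len; return the new length. */
static size_t toy_emit(char *buf, size_t size, size_t len,
                       const char *sep, const char *prefix, const char *text)
{
    int n;

    if (len >= size) {
        return len;        /* no room left: keep what was already written */
    }
    n = snprintf(buf + len, size - len, "%s%s%s", sep, prefix, text);
    return n < 0 ? len : len + (size_t)n;
}

/* Format insn as: name out..., in..., $const... */
static size_t toy_format(const ToyInsn *insn, char *buf, size_t size)
{
    const char *sep = " ";  /* a space after the name, then ", " */
    char num[32];
    size_t len;
    int i;

    assert(insn != NULL && buf != NULL);
    len = toy_emit(buf, size, 0, "", "", insn->name);

    for (i = 0; i < insn->nb_outs; i++, sep = ", ") {
        len = toy_emit(buf, size, len, sep, "", insn->outs[i]);
    }
    for (i = 0; i < insn->nb_ins; i++, sep = ", ") {
        len = toy_emit(buf, size, len, sep, "", insn->ins[i]);
    }
    for (i = 0; i < insn->nb_consts; i++, sep = ", ") {
        snprintf(num, sizeof(num), "%ld", insn->consts[i]);
        len = toy_emit(buf, size, len, sep, "$", num);  /* '$' marks a constant */
    }
    return len;
}
```

With one output t0 and inputs t1 and t2, toy_format produces "add_i32 t0, t1, t2"; with output t2, input t3 and the constant 4 it produces "sub_i64 t2, t3, $4", matching the two example lines in the old text.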
@@ -62,9 +64,8 @@ sub_i64 t2, t3, $4 (t2 <- t3 - 4)
 - Basic blocks start after the end of a previous basic block, at a
 set_label instruction or after a legacy dyngen operation.
 
-After the end of a basic block, temporaries at destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.
 
 * Floating point types are not supported yet
 
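Reading aid for the hunk above: a toy C model of what happens to each kind of variable at the end of a basic block. It assumes that a preserved value held in a host register has to be written back to memory at that point; the commit states this for local temporaries, while treating globals the same way is an assumption of this sketch. ToyKind, ToyVar and toy_end_basic_block are hypothetical names, not TCG code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_TEMP, TOY_LOCAL_TEMP, TOY_GLOBAL } ToyKind;

typedef struct {
    ToyKind kind;
    bool in_reg;   /* value currently cached in a host register */
    bool valid;    /* value may still be read by later instructions */
} ToyVar;

/* Apply the end-of-basic-block rule to every variable and
   return the number of stores to memory this required. */
static size_t toy_end_basic_block(ToyVar *vars, size_t n)
{
    size_t stores = 0;
    size_t i;

    assert(vars != NULL || n == 0);
    for (i = 0; i < n; i++) {
        ToyVar *v = &vars[i];

        switch (v->kind) {
        case TOY_TEMP:
            /* Content is destroyed: nothing is saved. */
            v->valid = false;
            v->in_reg = false;
            break;
        case TOY_LOCAL_TEMP:
        case TOY_GLOBAL:
            /* Content is preserved: write it back if it lives in a register. */
            if (v->in_reg) {
                stores++;
                v->in_reg = false;
            }
            break;
        }
    }
    return stores;
}
```

In this model a plain temporary costs nothing at a block boundary, whereas every local temporary that is live in a register costs one store, which is the performance hit that the coding rules added by this commit warn about.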
@@ -100,7 +101,7 @@ optimizations:
 is suppressed.
 
 - A liveness analysis is done at the basic block level. The
-information is used to suppress moves from a dead temporary to
+information is used to suppress moves from a dead variable to
 another one. It is also used to remove instructions which compute
 dead results. The later is especially useful for condition code
 optimization in QEMU.
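Reading aid for the hunk above: a minimal sketch of a liveness analysis done on a single basic block, used to remove instructions that compute dead results. It is a generic backward pass over toy three-operand instructions, written under the assumption that the caller knows which variables are still needed after the block; it is not TCG's actual implementation, and ToyOp, TOY_NB_VARS and toy_liveness are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NB_VARS 8

/* Toy instruction: out <- in1 op in2. The operation does not matter here. */
typedef struct {
    int out, in1, in2;
    bool dead;             /* set when the result is never used */
} ToyOp;

/* Walk one basic block backwards. live_out[v] is true when variable v
   is still needed after the block. Returns the number of dead ops. */
static size_t toy_liveness(ToyOp *ops, size_t n, const bool *live_out)
{
    bool live[TOY_NB_VARS];
    size_t removed = 0;
    size_t i;
    int v;

    for (v = 0; v < TOY_NB_VARS; v++) {
        live[v] = live_out[v];
    }
    for (i = n; i-- > 0; ) {
        ToyOp *op = &ops[i];

        assert(op->out >= 0 && op->out < TOY_NB_VARS);
        assert(op->in1 >= 0 && op->in1 < TOY_NB_VARS);
        assert(op->in2 >= 0 && op->in2 < TOY_NB_VARS);

        if (!live[op->out]) {
            /* Nobody reads the result later: the op can be dropped,
               and its inputs are not made live by it. */
            op->dead = true;
            removed++;
            continue;
        }
        live[op->out] = false;   /* the value is defined here ... */
        live[op->in1] = true;    /* ... and needs its inputs before */
        live[op->in2] = true;
    }
    return removed;
}
```

The condition code case mentioned in the text has the same shape: when a flag variable is written twice in a block with no read in between, the first write is found dead and only the last one is kept.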
@@ -113,47 +114,6 @@ optimizations:
 
 only the last instruction is kept.
 
-- A macro system is supported (may get closer to function inlining
-some day). It is useful if the liveness analysis is likely to prove
-that some results of a computation are indeed not useful. With the
-macro system, the user can provide several alternative
-implementations which are used depending on the used results. It is
-especially useful for condition code optimization in QEMU.
-
-Here is an example:
-
-macro_2 t0, t1, $1
-mov_i32 t0, $0x1234
-
-The macro identified by the ID "$1" normally returns the values t0
-and t1. Suppose its implementation is:
-
-macro_start
-brcond_i32 t2, $0, $TCG_COND_EQ, $1
-mov_i32 t0, $2
-br $2
-set_label $1
-mov_i32 t0, $3
-set_label $2
-add_i32 t1, t3, t4
-macro_end
-
-If t0 is not used after the macro, the user can provide a simpler
-implementation:
-
-macro_start
-add_i32 t1, t2, t4
-macro_end
-
-TCG automatically chooses the right implementation depending on
-which macro outputs are used after it.
-
-Note that if TCG did more expensive optimizations, macros would be
-less useful. In the previous example a macro is useful because the
-liveness analysis is done on each basic block separately. Hence TCG
-cannot remove the code computing 't0' even if it is not used after
-the first macro implementation.
-
 3.4) Instruction Reference
 
 ********* Function call
@@ -241,6 +201,10 @@ t0=t1|t2
 
 t0=t1^t2
 
+* not_i32/i64 t0, t1
+
+t0=~t1
+
 ********* Shifts
 
 * shl_i32/i64 t0, t1, t2
@@ -428,3 +392,34 @@ to apply more optimizations because more registers will be free for
 the generated code.
 
 The exception model is the same as the dyngen one.
+
+6) Recommended coding rules for best performance
+
+- Use globals to represent the parts of the QEMU CPU state which are
+often modified, e.g. the integer registers and the condition
+codes. TCG will be able to use host registers to store them.
+
+- Avoid globals stored in fixed registers. They must be used only to
+store the pointer to the CPU state and possibly to store a pointer
+to a register window. The other uses are to ensure backward
+compatibility with dyngen during the porting a new target to TCG.
+
+- Use temporaries. Use local temporaries only when really needed,
+e.g. when you need to use a value after a jump. Local temporaries
+introduce a performance hit in the current TCG implementation: their
+content is saved to memory at end of each basic block.
+
+- Free temporaries and local temporaries when they are no longer used
+(tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+should free it after it is used. Freeing temporaries does not yield
+a better generated code, but it reduces the memory usage of TCG and
+the speed of the translation.
+
+- Don't hesitate to use helpers for complicated or seldom used target
+intructions. There is little performance advantage in using TCG to
+implement target instructions taking more than about twenty TCG
+instructions.
+
+- Use the 'discard' instruction if you know that TCG won't be able to
+prove that a given global is "dead" at a given program point. The
+x86 target uses it to improve the condition codes optimisation.
31
tcg/TODO
@@ -1,32 +1,15 @@
-- test macro system
+- Add new instructions such as: andnot, ror, rol, setcond, clz, ctz,
+popcnt.
 
-- test conditional jumps
+- See if it is worth exporting mul2, mulu2, div2, divu2.
 
-- test mul, div, ext8s, ext16s, bswap
+- Support of globals saved in fixed registers between TBs.
 
-- generate a global TB prologue and epilogue to save/restore registers
-to/from the CPU state and to reserve a stack frame to optimize
-helper calls. Modify cpu-exec.c so that it does not use global
-register variables (except maybe for 'env').
-
-- fully convert the x86 target. The minimal amount of work includes:
-- add cc_src, cc_dst and cc_op as globals
-- disable its eflags optimization (the liveness analysis should
-suffice)
-- move complicated operations to helpers (in particular FPU, SSE, MMX).
-
-- optimize the x86 target:
-- move some or all the registers as globals
-- use the TB prologue and epilogue to have QEMU target registers in
-pre assigned host registers.
-
 Ideas:
 
 - Move the slow part of the qemu_ld/st ops after the end of the TB.
 
-- Experiment: change instruction storage to simplify macro handling
-and to handle dynamic allocation and see if the translation speed is
-OK.
-
-- change exception syntax to get closer to QOP system (exception
+- Change exception syntax to get closer to QOP system (exception
 parameters given with a specific instruction).
+
+- Add float and vector support.