Since 7ecd02a06f, we are prepared to re-start code generation
with a smaller TB if a relocation is out of range. We no longer
need to leave a nop in the stream Just In Case.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The offset even checks were folded into the range check incorrectly.
By offsetting by 1, and not decrementing the width, we silently
allowed out of range branches.
Assert that the offset is always even instead. Move tcg_out_goto
down into the CONFIG_SOFTMMU block so that it is not unused.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use tcg_tbrel_diff when we need a displacement to a label,
and with a NULL argument when we need the normalizing addend.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The maximum TB code gen size is UINT16_MAX, which the current
code does not support. Use our utility function to optimally
add an arbitrary constant.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use tcg_tbrel_diff when we need a displacement to a label,
and with a NULL argument when we need the normalizing addend.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
A typo generated a branch-and-link insn instead of plain branch.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This produces a small pc-relative displacement within the
generated code to the TB structure that preceeds it.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Plumb the value through to alloc_code_gen_buffer. This is not
supported by any os or tcg backend, so for now enabling it will
result in an error.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There is nothing within the translators that ought to be
changing the TranslationBlock data, so make it const.
This does not actually use the read-only copy of the
data structure that exists within the rx region.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Pass both rx and rw addresses to tb_target_set_jmp_target.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We must change all targets at once, since all must match
the declaration in tcg.c.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Simplify the arguments to always use s->code_ptr instead of
take it as an argument. That makes it easy to ensure that
the value_ptr is always the rx version.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We must change all targets at once, since all must match
the declaration in tcg.c.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change TCGLabel.u.value_ptr to const, and initialize it with
tcg_splitwx_to_rx. Propagate const through tcg/host/ only
as far as needed to avoid errors from the value_ptr change.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add two helper functions, using a global variable to hold
the displacement. The displacement is currently always 0,
so no change in behaviour.
Begin using the functions in tcg common code only.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This value is constant across all thread-local copies of TCGContext,
so we might as well move it out of thread-local storage.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This value is constant across all thread-local copies of TCGContext,
so we might as well move it out of thread-local storage.
Use the correct function pointer type, and name the variable
tcg_qemu_tb_exec, which means that we are able to remove the
macro that does the casting.
Replace HAVE_TCG_QEMU_TB_EXEC with CONFIG_TCG_INTERPRETER,
as this is somewhat clearer in intent.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We are shortly going to have a split rw/rx jit buffer. Depending
on the host, we need to flush the dcache at the rw data pointer and
flush the icache at the rx code pointer.
For now, the two passed pointers are identical, so there is no
effective change in behaviour.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is currently a no-op within tci/tcg-target.h, but
is about to be moved to a more generic location.
Reviewed-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Enable this on i386 to restrict the set of input registers
for an 8-bit store, as required by the architecture. This
removes the last use of scratch registers for user-only mode.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Always true when movbe is available, otherwise leave
this to generic code.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Out-of-range shifts have undefined results, but must not trap.
Mask off immediate shift counts to solve this problem.
This bug can be reproduced by running the following guest instructions:
xor %ecx,%ecx
sar %cl,%eax
cmovne %edi,%eax
After optimization, the tcg opcodes of the sar are
movi_i32 tmp3,$0xffffffffffffffff pref=all
sar_i32 tmp3,eax,tmp3 dead: 2 pref=all
mov_i32 cc_dst,eax sync: 0 dead: 1 pref=0xffc0300
mov_i32 cc_src,tmp3 sync: 0 dead: 0 1 pref=all
movi_i32 cc_op,$0x31 sync: 0 dead: 0 pref=all
The sar_i32 opcode is a shift by -1, which unmasked generates
0x200808d618: fffa5b9b illegal
Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Message-Id: <20201216081206.9628-1-yuzihao@ict.ac.cn>
[rth: Reworded the patch description.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In f47db80cc0, we handled odd-sized tail clearing for
the case of hosts that have vector operations, but did
not handle the case of hosts that do not have vector ops.
This was ok until e2e7168a21, which changed the encoding
of simd_desc such that the odd sizes are impossible.
Add memset as a tcg helper, and use that for all out-of-line
byte stores to vectors. This includes, but is not limited to,
the tail clearing operation in question.
Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/bugs/1907817
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This has been a tcg-specific function, but is also in use
by hardware accelerators via physmem.c. This can cause
link errors when tcg is disabled.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201214140314.18544-3-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
LLVM/Clang, supports runtime checks for forward-edge Control-Flow
Integrity (CFI).
CFI on indirect function calls (cfi-icall) ensures that, in indirect
function calls, the function called is of the right signature for the
pointer type defined at compile time.
For this check to work, the code must always respect the function
signature when using function pointer, the function must be defined
at compile time, and be compiled with link-time optimization.
This rules out, for example, shared libraries that are dynamically loaded
(given that functions are not known at compile time), and code that is
dynamically generated at run-time.
This patch:
1) Introduces the CONFIG_CFI flag to support cfi in QEMU
2) Introduces a decorator to allow the definition of "sensitive"
functions, where a non-instrumented function may be called at runtime
through a pointer. The decorator will take care of disabling cfi-icall
checks on such functions, when cfi is enabled.
3) Marks functions currently in QEMU that exhibit such behavior,
in particular:
- The function in TCG that calls pre-compiled TBs
- The function in TCI that interprets instructions
- Functions in the plugin infrastructures that jump to callbacks
- Functions in util that directly call a signal handler
Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org
Message-Id: <20201204230615.2392-3-dbuono@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To be able to compile this file with -Werror=implicit-fallthrough,
we need to add some fallthrough annotations to the case statements
that might fall through. Unfortunately, the typical "/* fallthrough */"
comments do not work here as expected since some case labels are
wrapped in macros and the compiler fails to match the comments in
this case. But using __attribute__((fallthrough)) seems to work fine,
so let's use that instead (by introducing a new QEMU_FALLTHROUGH
macro in our compiler.h header file).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201211152426.350966-11-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This reverts commit cd0372c515.
The patch is incorrect in that it retains copies between globals and
non-local temps, and non-local temps still die at the end of the BB.
Failing test case for hppa:
.globl _start
_start:
cmpiclr,= 0x24,%r19,%r0
cmpiclr,<> 0x2f,%r19,%r19
---- 00010057 0001005b
movi_i32 tmp0,$0x24
sub_i32 tmp1,tmp0,r19
mov_i32 tmp2,tmp0
mov_i32 tmp3,r19
movi_i32 tmp1,$0x0
---- 0001005b 0001005f
brcond_i32 tmp2,tmp3,eq,$L1
movi_i32 tmp0,$0x2f
sub_i32 tmp1,tmp0,r19
mov_i32 tmp2,tmp0
mov_i32 tmp3,r19
movi_i32 tmp1,$0x0
mov_i32 r19,tmp1
setcond_i32 psw_n,tmp2,tmp3,ne
set_label $L1
In this case, both copies of "mov_i32 tmp3,r19" are removed. The
second because opt thought it was redundant. The first is removed
later by liveness because tmp3 is known to be dead. This leaves
the setcond_i32 with an uninitialized input.
Revert the entire patch for 5.2, and a proper optimization across
the branch may be considered for the next development cycle.
Reported-by: qemu@igor2.repo.hu
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since 6e6c4efed9, there has been a more appropriate range check
done later at the end of tcg_gen_code. There, a failing range
check results in a returned error code, which causes the TB to
be restarted at half the size.
Reported-by: Sai Pavan Boddu <saipava@xilinx.com>
Tested-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can easily propagate temp values through the entire extended
basic block (in this case, the set of blocks connected by fallthru),
simply by not discarding the register state at the branch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can easily register allocate the entire extended basic block
(in this case, the set of blocks connected by fallthru), simply
by not discarding the register state at the branch.
This does not help blocks starting with a label, as they are
reached via a taken branch, and that would require saving the
complete register state at the branch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The cmp_vec opcode is mandatory; this symbol is unused.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When the two arguments are identical, this can be reduced to
dup_vec or to mov_vec from a tcg_constant_vec.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The definition of INDEX_op_dupi_vec is that it operates on
units of tcg_target_ulong -- in this case 32 bits. It does
not work to use this for a uint64_t value that happens to be
small enough to fit in tcg_target_ulong.
Fixes: d2fd745fe8
Fixes: db432672dc
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The previous change wrongly stated that 32-bit avx2 should have
used VPBROADCASTW. But that's a 16-bit broadcast and we want a
32-bit broadcast.
Fixes: 7b60ef3264
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These are easier to set and test when they have their own fields.
Reduce the size of alias_index and sort_index to 4 bits, which is
sufficient for TCG_MAX_OP_ARGS. This leaves only the bits indicating
constants within the ct field.
Move all initialization to allocation time, rather than init
individual fields in process_op_defs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This wasn't actually used for anything, really. All variable
operands must accept registers, and which are indicated by the
set in TCGArgConstraint.regs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This uses an existing hole in the TCGArgConstraint structure
and will be convenient for keeping the data in one place.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With larger vector sizes, it turns out oprsz == maxsz, and we only
need to represent mismatch for oprsz <= 32. We do, however, need
to represent larger oprsz and do so without reducing SIMD_DATA_BITS.
Reduce the size of the oprsz field and increase the maxsz field.
Steal the oprsz value of 24 to indicate equality with maxsz.
Tested-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of creating GStrings and passing them into log_disas,
just print the annotations directly in tb_gen_code.
Fix the annotations for the slow paths of the TB, after the
part implementing the final guest instruction.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>