As pmovmskb is used by strlen et al, this is the third
highest overhead sse operation at %0.8.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[Reorganize to generate code for any vector size. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The more complicated operations here are insertions and extractions.
Otherwise, there are just more entries than usual because the PS/PD/SS/SD
variations are encoded in the opcode rater than in the prefixes.
These three-byte opcodes also include AVX new instructions, whose
implementation in the helpers was originally done by Paul Brook
<paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Three-byte opcodes from the 0F3Ah area all have an immediate byte which
is usually unsigned. Clarify in the helper code that it is unsigned;
the new decoder treats immediates as signed by default, and seeing
an intN_t in the prototype might give the wrong impression that one
can use decode->immediate directly.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The more complicated ones here are d6-d7, e6-e7, f7. The others
are trivial.
For LDDQU, using gen_load_sse directly might corrupt the register if
the second part of the load fails. Therefore, add a custom X86_TYPE_WM
value; like X86_TYPE_W it does call gen_load(), but it also rejects a
value of 11 in the ModRM field like X86_TYPE_M.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This includes shifts by immediate, which use bits 3-5 of the ModRM byte
as an opcode extension. With the exception of 128-bit shifts, they are
implemented using gvec.
This also covers VZEROALL and VZEROUPPER, which use the same opcode
as EMMS. If we were wanting to optimize out gen_clear_ymmh then this
would be one of the starting points. The implementation of the VZEROALL
and VZEROUPPER helpers is by Paul Brook.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are a mixed batch, including the first two horizontal
(66 and F2 only) operations, more moves, and SSE4a extract/insert.
Because SSE4a is pretty rare, I chose to leave the helper as they are,
but it is possible to unify them by loading index and length from the
source XMM register and generating deposit or extract TCG ops.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are mostly floating-point SSE operations. The odd ones out
are MOVMSK and CVTxx2yy, the others are straightforward.
Unary operations are a bit special in AVX because they have 2 operands
for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS.
They are handled using X86_OP_GROUP3 for compactness.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are more simple integer instructions present in both MMX and SSE/AVX,
with no holes that were later occupied by newer instructions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are both MMX and SSE/AVX instructions, except for vmovdqu. In both
cases the inputs and output is in s->ptr{0,1,2}, so the only difference
between MMX, SSE, and AVX is which helper to call.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new implementation of SSE will cover AVX from the get go, because
all the work for the helper functions is already done. We just need to
build them.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new implementation of SSE will cover AVX from the get go, so include
the 24 extra comparison operators that are only available with the VEX
prefix.
Based on a patch by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compared to Paul's implementation, the new decoder will use a different approach
to implement AVX's merging of dst with src1 on scalar operations. Adjust the
old SSE decoder to be compatible with new-style helpers.
The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compared to Paul's implementation, the new decoder will use a different approach
to implement AVX's merging of dst with src1 on scalar operations. Adjust the
helpers to provide this functionality.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add to the helpers all the operands that are needed to implement AVX.
Extracted from a patch by Paul Brook <paul@nowt.org>.
Message-Id: <20220424220204.2493824-26-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adjust all #ifdefs to match the ones in ops_sse.h.
Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-23-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because these are the only VEX instructions that QEMU supports, the
new decoder is entered on the first byte of a valid VEX prefix, and VEX
decoding only needs to be done in decode-new.c.inc.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many SSE and AVX instructions are only valid with specific prefixes
(none, 66, F3, F2). Introduce a direct way to encode this in the
decoding table to avoid using decode groups too much.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new hflag bit to determine whether AVX instructions are allowed
Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-4-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TCG will shortly implement VAES instructions, so add the relevant feature
word to the DisasContext.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add generic code generation that takes care of preparing operands
around calls to decode.e.gen in a table-driven manner, so that ALU
operations need not take care of that.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new decoder is based on three principles:
- use mostly table-driven decoding, using tables derived as much as possible
from the Intel manual. Centralizing the decode the operands makes it
more homogeneous, for example all immediates are signed. All modrm
handling is in one function, and can be shared between SSE and ALU
instructions (including XMM<->GPR instructions). The SSE/AVX decoder
will also not have duplicated code between the 0F, 0F38 and 0F3A tables.
- keep the code as "non-branchy" as possible. Generally, the code for
the new decoder is more verbose, but the control flow is simpler.
Conditionals are not nested and have small bodies. All instruction
groups are resolved even before operands are decoded, and code
generation is separated as much as possible within small functions
that only handle one instruction each.
- keep address generation and (for ALU operands) memory loads and writeback
as much in common code as possible. All ALU operations for example
are implemented as T0=f(T0,T1). For non-ALU instructions,
read-modify-write memory operations are rare, but registers do not
have TCGv equivalents: therefore, the common logic sets up pointer
temporaries with the operands, while load and writeback are handled
by gvec or by helpers.
These principles make future code review and extensibility simpler, at
the cost of having a relatively large amount of code in the form of this
patch. Even EVEX should not be _too_ hard to implement (it's just a crazy
large amount of possibilities).
This patch introduces the main decoder flow, and integrates the old
decoder with the new one. The old decoder takes care of parsing
prefixes and then optionally drops to the new one. The changes to the
old decoder are minimal and allow it to be replaced incrementally with
the new one.
There is a debugging mechanism through a "LIMIT" environment variable.
In user-mode emulation, the variable is the number of instructions
decoded by the new decoder before permanently switching to the old one.
In system emulation, the variable is the highest opcode that is decoded
by the new decoder (this is less friendly, but it's the best that can
be done without requiring deterministic execution).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
REX.W can be used even in 32-bit mode by AVX instructions, where it is retroactively
renamed to VEX.W. Make the field available even in 32-bit mode but keep the REX_W()
macro as it was; this way, that the handling of dflag does not use it by mistake and
the AVX code more clearly points at the special VEX behavior of the bit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ldq takes a pointer to the first byte to load the 64-bit word in;
ldo takes a pointer to the first byte of the ZMMReg. Make them
consistent, which will be useful in the new SSE decoder's
load/writeback routines.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will be used for emission and endian adjustments of gvec operations.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220822223722.1697758-2-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than recurse directly on mmu_translate, go through the
same softmmu lookup that we did for the page table walk.
This centralizes all knowledge of MMU_NESTED_IDX, with respect
to setup of TranslationParams, to get_physical_address.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-10-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use probe_access_full in order to resolve to a host address,
which then lets us use a host cmpxchg to update the pte.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/279
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-9-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We don't need one variable set per translation level,
which requires copying into pte/pte_addr for huge pages.
Standardize on pte/pte_addr for all levels.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-8-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use MMU_NESTED_IDX for each memory access, rather than
just a single translation to physical. Adjust svm_save_seg
and svm_load_seg to pass in mmu_idx.
This removes the last use of get_hphys so remove it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-7-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These new mmu indexes will be helpful for improving
paging and code throughout the target.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-6-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace with PTE_HPHYS for the page table walk, and a direct call
to mmu_translate for the final stage2 translation. Hoist the check
for HF2_NPT_MASK out to get_physical_address, which avoids the
recursive call when stage2 is disabled.
We can now return all the way out to x86_cpu_tlb_fill before raising
an exception, which means probe works.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-5-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create TranslateParams for inputs, TranslateResults for successful
outputs, and TranslateFault for error outputs; return true on success.
Move stage1 error paths from handle_mmu_fault to x86_cpu_tlb_fill;
reorg the rest of handle_mmu_fault into get_physical_address.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-4-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a boolean to control the call to get_hphys instead
of passing a null function pointer.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-3-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace int is_write1 and magic numbers with the proper
MMUAccessType access_type and enumerators.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-2-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Restore pc_save while undoing any state change that may have
happened while decoding the instruction. Leave a TODO about
removing all of that when the table-based decoder is complete.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221016222303.288551-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The semantic difference between the deprecated device_legacy_reset()
function and the newer device_cold_reset() function is that the new
function resets both the device itself and any qbuses it owns,
whereas the legacy function resets just the device itself and nothing
else.
The x86_cpu_after_reset() function uses device_legacy_reset() to reset
the APIC; this is an APICCommonState and does not have any qbuses, so
for this purpose the two functions behave identically and we can stop
using the deprecated one.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221013171926.1447899-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
assertion failure:
hw/hyperv/hyperv.c:131: synic_reset: Assertion `QLIST_EMPTY(&synic->sint_routes)' failed.
This happens both on normal guest reboot or when using "system_reset" HMP
command.
The failing assertion was introduced by commit 64ddecc88b ("hyperv: SControl is optional to enable SynIc")
to catch dangling SINT routes on SynIC reset.
The root cause of this problem is that the SynIC itself is reset before
devices using SINT routes have chance to clean up these routes.
Since there seems to be no existing mechanism to force reset callbacks (or
methods) to be executed in specific order let's use a similar method that
is already used to reset another interrupt controller (APIC) after devices
have been reset - by invoking the SynIC reset from the machine reset
handler via a new x86_cpu_after_reset() function co-located with
the existing x86_cpu_reset() in target/i386/cpu.c.
Opportunistically move the APIC reset handler there, too.
Fixes: 64ddecc88b ("hyperv: SControl is optional to enable SynIc") # exposed the bug
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cb57cee2e29b20d06f81dce054cbcea8b5d497e8.1664552976.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add support for saving/restoring extended save states when signals
are delivered. This allows using AVX, MPX or PKRU registers in
signal handlers.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MSR_CORE_THREAD_COUNT MSR describes CPU package topology, such as number
of threads and cores for a given package. This is information that QEMU has
readily available and can provide through the new user space MSR deflection
interface.
This patch propagates the existing hvf logic from patch 027ac0cb51
("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to KVM.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-4-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM has grown support to deflect arbitrary MSRs to user space since
Linux 5.10. For now we don't expect to make a lot of use of this
feature, so let's expose it the easiest way possible: With up to 16
individually maskable MSRs.
This patch adds a kvm_filter_msr() function that other code can call
to install a hook on KVM MSR reads or writes.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-3-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel CPUs starting with Haswell-E implement a new MSR called
MSR_CORE_THREAD_COUNT which exposes the number of threads and cores
inside of a package.
This MSR is used by XNU to populate internal data structures and not
implementing it prevents virtual machines with more than 1 vCPU from
booting if the emulated CPU generation is at least Haswell-E.
This patch propagates the existing hvf logic from patch 027ac0cb51
("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to TCG.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-2-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-27-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expand this function at each of its callers.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-26-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a tcg global temp for this, and use it instead of explicit stores.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-25-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-24-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These functions have only one caller, and the logic is more
obvious this way.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-23-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These functions are always passed aflag, so we might as well
read it from DisasContext directly. While we're at it, use
a common subroutine for these two functions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221001140935.465607-22-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>