xemu/target/arm/vfp.decode

# AArch32 VFP instruction descriptions (conditional insns)
#
#  Copyright (c) 2019 Linaro, Ltd
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, see <http://www.gnu.org/licenses/>.

#
# This file is processed by scripts/decodetree.py
#
# Encodings for the conditional VFP instructions are here:
# generally anything matching A32
#  cccc 11.. .... .... .... 101. .... ....
# and T32
#  1110 110. .... .... .... 101. .... ....
#  1110 1110 .... .... .... 101. .... ....
# (but those patterns might also cover some Neon instructions,
# which do not live in this file.)

# VFP registers have an odd encoding with a four-bit field
# and a one-bit field which are assembled in different orders
# depending on whether the register is double or single precision.
# Each individual instruction function must do the checks for
# "double register selected but CPU does not have double support"
# and "double register number has bit 4 set but CPU does not
# support D16-D31" (which should UNDEF).
%vm_dp  5:1 0:4
%vm_sp  0:4 5:1
%vn_dp  7:1 16:4
%vn_sp  16:4 7:1
%vd_dp  22:1 12:4
%vd_sp  12:4 22:1
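#
# Illustrative worked example (the bit values are chosen arbitrarily):
# decodetree concatenates the subfields of a %field most-significant
# first, so with insn bit 5 (M) = 1 and bits [3:0] (Vm) = 0b0010:
#   %vm_dp = M:Vm = (1 << 4) | 0b0010 = 18, i.e. D18
#   %vm_sp = Vm:M = (0b0010 << 1) | 1 = 5,  i.e. S5
# The vn and vd fields are assembled the same way from their own bits.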

%vmov_idx_b     21:1 5:2
%vmov_idx_h     21:1 6:1

# VMOV scalar to general-purpose register; note that this does
# include some Neon cases.
VMOV_to_gp   ---- 1110 u:1 1.        1 .... rt:4 1011 ... 1 0000 \
             vn=%vn_dp size=0 index=%vmov_idx_b
VMOV_to_gp   ---- 1110 u:1 0.        1 .... rt:4 1011 ..1 1 0000 \
             vn=%vn_dp size=1 index=%vmov_idx_h
VMOV_to_gp   ---- 1110 0   0 index:1 1 .... rt:4 1011 .00 1 0000 \
             vn=%vn_dp size=2 u=0

VMOV_from_gp ---- 1110 0 1.        0 .... rt:4 1011 ... 1 0000 \
             vn=%vn_dp size=0 index=%vmov_idx_b
VMOV_from_gp ---- 1110 0 0.        0 .... rt:4 1011 ..1 1 0000 \
             vn=%vn_dp size=1 index=%vmov_idx_h
VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
             vn=%vn_dp size=2

VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
             vn=%vn_dp

VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
             vn=%vn_sp

VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
             vm=%vm_sp
VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
             vm=%vm_dp

# Note that the half-precision variants of VLDR and VSTR are
# not part of this decodetree at all because they have bits [9:8] == 0b01
VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp
VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp

# We split the load/store multiple up into two patterns to avoid
# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
# grouping:
#   P=0 U=0 W=0 is 64-bit VMOV
#   P=1 W=0 is VLDR/VSTR
#   P=U W=1 is UNDEF
# leaving P=0 U=1 W=x and P=1 U=0 W=1 for load/store multiple.
# These include FSTM/FLDM.
VLDM_VSTM_sp ---- 1100 1 . w:1 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp p=0 u=1
VLDM_VSTM_dp ---- 1100 1 . w:1 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp p=0 u=1

VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp p=1 u=0 w=1
VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp p=1 u=0 w=1

# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp
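#
# Summary of the operation-selecting bits [23,21:20,6] as fixed by the
# patterns above (a reading aid only, derived from those patterns):
#   0 00 0  VMLA
#   0 00 1  VMLS
#   0 01 0  VNMLS
#   0 01 1  VNMLA
#   0 10 0  VMUL
#   0 10 1  VNMUL
#   0 11 0  VADD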