We came across an llvm bug when compiling some testcases: 64-bit
immediates were silently truncated to 32 bits and then packed into the
BPF_JMP | BPF_K encoding, so comparisons were done against the wrong value.
The bug appears to have been introduced by r308080 (llvm 5.0). The Select_Ri
pattern is supposed to be lowered into J*_Ri, but the latter only supports
32-bit immediate encoding, so Select_Ri should have the same immediate
predicate check as J*_Ri.
The bug is fixed by
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315889 91177308-0d34-0410-b5e6-96231b3b80d8
in llvm 6.0.
This patch is largely the same as the fix in llvm 6.0 except
one minor adjustment for the test case.
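A minimal standalone sketch of the failure mode (plain C++, illustrative
only, not the BPF backend code): once a 64-bit immediate is squeezed into
a 32-bit immediate field, a comparison that should be false can become true.

  #include <cstdint>
  #include <iostream>

  int main() {
    // A 64-bit immediate that does not fit in 32 bits.
    const int64_t imm = 0x100000001LL;
    const int64_t reg = 1;

    // Comparison against the full 64-bit immediate.
    bool correct = (reg == imm);                            // false

    // What silent truncation to a 32-bit immediate field produces.
    auto truncated = static_cast<int32_t>(imm);             // 1
    bool buggy = (reg == static_cast<int64_t>(truncated));  // true

    std::cout << correct << " vs " << buggy << '\n';        // prints "0 vs 1"
  }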
Reported-by: John Fastabend <john.fastabend@gmail.com>
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@319633 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r316035 | tnorthover | 2017-10-17 14:43:52 -0700 (Tue, 17 Oct 2017) | 6 lines
AArch64: account for possible frame index operand in compares.
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@319231 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r318788 | mcrosier | 2017-11-21 10:08:34 -0800 (Tue, 21 Nov 2017) | 16 lines
[AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as *having* side effects.
This partially reverts r298851. The underlying issue is that we don't
currently model the dependency between mrs (read system register) and
msr (write system register) instructions.
Something like the below should never be reordered:
msr TPIDR_EL0, x0 ;; set thread pointer
mrs x8, TPIDR_EL0 ;; read thread pointer
but was being reordered after r298851. The functional part of the patch
that wasn't reverted needed to remain in place in order to not break
r299462.
PR35317
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@318854 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r317204 | sdardis | 2017-11-02 05:47:22 -0700 (Thu, 02 Nov 2017) | 15 lines
[mips] Use register scavenging with MSA.
MSA stores and loads to the stack are more likely to require an
emergency GPR spill slot due to the smaller offsets available
with those instructions.
Handle this by overestimating the size of the stack: determine the
largest offset presuming that all callee-saved registers are spilled,
and account for incoming arguments, when determining whether an
emergency spill slot is required.
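A rough sketch of the overestimation described above (illustrative names
and parameters, not the actual MIPS frame lowering code): assume every
callee-saved register is spilled, add the incoming argument area, and
request an emergency slot when the worst-case offset exceeds what MSA
stack loads/stores can reach.

  #include <cstdint>

  bool needsEmergencySpillSlot(uint64_t EstimatedFrameObjectsSize,
                               uint64_t CalleeSavedAreaSize,
                               uint64_t IncomingArgAreaSize,
                               uint64_t MaxMSAOffsetReach) {
    // Worst case: all callee-saved registers spilled plus incoming args.
    const uint64_t WorstCaseOffset =
        EstimatedFrameObjectsSize + CalleeSavedAreaSize + IncomingArgAreaSize;
    return WorstCaseOffset > MaxMSAOffsetReach;
  }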
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39056
------------------------------------------------------------------------
------------------------------------------------------------------------
r318172 | sdardis | 2017-11-14 11:11:45 -0800 (Tue, 14 Nov 2017) | 5 lines
[mips] Simplify test for 5.0.1 (NFC)
Simplify testing that an emergency spill slot is used when MSA
is used so that it can be included in the 5.0.1 release.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@318191 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314798 | sdardis | 2017-10-03 06:45:49 -0700 (Tue, 03 Oct 2017) | 9 lines
[mips] Enable spilling and reloading of the dsp register set.
The dsp register class is an alias of the gpr register class, so
we have to define instructions for spilling and reloading.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D38038
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@318183 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r315485 | spatel | 2017-10-11 11:24:21 -0700 (Wed, 11 Oct 2017) | 7 lines
[x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.
Differential Revision: https://reviews.llvm.org/D38771
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@316607 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314891 | dylanmckay | 2017-10-04 22:51:28 +1300 (Wed, 04 Oct 2017) | 8 lines
[AVR] Insert JMP for long branches
Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.
Patch by Thomas Backman.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@315833 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314890 | dylanmckay | 2017-10-04 22:51:21 +1300 (Wed, 04 Oct 2017) | 16 lines
[AVR] Fix displacement overflow for LDDW/STDW
In some cases, the code generator attempts to generate instructions such as:
lddw r24, Y+63
which expands to:
ldd r24, Y+63
ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary
This commit limits the first offset to 62, and thus the second to 63.
It also updates some asserts in AVRExpandPseudoInsts.cpp, including for
INW and OUTW, which appear to be unused.
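A plain C++ sketch of the displacement limit (not the actual asserts in
AVRExpandPseudoInsts.cpp): the word access expands into two byte accesses
at d and d+1, and both must fit the 6-bit ldd/std displacement field.

  // The ldd/std displacement field holds 0..63, so a word access whose
  // second half lands at d+1 must keep the first displacement at <= 62.
  constexpr unsigned MaxByteDisplacement = 63;

  bool isValidWordDisplacement(unsigned d) {
    return d <= MaxByteDisplacement - 1;   // i.e. d <= 62, so d+1 <= 63
  }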
Patch by Thomas Backman.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@315832 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r313366 | ctopper | 2017-09-15 10:09:03 -0700 (Fri, 15 Sep 2017) | 9 lines
[X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors
When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split.
This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues.
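A conceptual sketch of the split (plain integers instead of SelectionDAG
nodes; the helper name is made up): the 64-lane constant mask becomes two
32-lane halves that can each be lowered as a legal 32-bit constant and
then concatenated.

  #include <cstdint>
  #include <utility>

  // Returns {lanes 0..31, lanes 32..63} of a v64i1 constant mask.
  std::pair<uint32_t, uint32_t> splitMask64(uint64_t Mask) {
    return {static_cast<uint32_t>(Mask),
            static_cast<uint32_t>(Mask >> 32)};
  }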
Fixes PR34605.
Differential Revision: https://reviews.llvm.org/D37858
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@315198 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314891 | dylanmckay | 2017-10-04 22:51:28 +1300 (Wed, 04 Oct 2017) | 8 lines
[AVR] Insert JMP for long branches
Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.
Patch by Thomas Backman.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314892 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
[AArch64] Fix bug in store of vector 0 DAGCombine.
Summary:
Avoid using XZR/WZR directly as operands to split stores of zero
vectors. Doing so can lead to the XZR/WZR being used by an instruction
that doesn't allow it (e.g. add).
Fixes bug 34674.
Reviewers: t.p.northover, efriedma, MatzeB
Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38146
PR34695.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314796 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r312348 | matze | 2017-09-01 11:36:26 -0700 (Fri, 01 Sep 2017) | 39 lines
LiveIntervalAnalysis: Fix alias regunit reserved definition
A register in CodeGen can be marked as reserved: In that case we
consider the register always live and do not use (or rather ignore)
kill/dead/undef operand flags.
LiveIntervalAnalysis however tracks liveness per register unit (not per
register). We already needed adjustments for this in r292871 to deal
with super/sub registers. However, I did not look at aliased registers
there. Looking at ARM:
FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV
(regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit
(FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers.
This shared register unit was previously considered non-reserved;
however, given that uses of the reserved FPSCR potentially violate
some rules (like uses without defs), we should make FPSCR~FPSCR_NZCV
reserved too and stop tracking liveness for it.
This patch:
- Defines a register unit as reserved when: At least for one root
register, the root register and all its super registers are reserved.
- Adjust LiveIntervals::computeRegUnitRange() for new reserved
definition.
- Add MachineRegisterInfo::isReservedRegUnit() to have a canonical way
of testing.
- Stop computing LiveRanges for reserved register units in HMEditor even
with UpdateFlags enabled.
- Skip verification of uses of reserved reg units in the machine
verifier (this usually didn't happen because there would be no cached
liverange but there is no guarantee for that and I would run into this
case before the HMEditor tweak, so may as well fix the verifier too).
Note that this should only affect ARM's FPSCR/FPSCR_NZCV registers today;
aliased registers are rarely used, and the only other cases are Hexagon's
P0-P3/P3_0 and C8/USR pairs, which do not mix reserved and non-reserved
registers in an alias.
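A hedged sketch of the new reservation rule (abstract callbacks, not the
actual MachineRegisterInfo::isReservedRegUnit implementation): a unit
counts as reserved when at least one of its root registers, together with
all of that root's super-registers, is reserved.

  #include <functional>
  #include <vector>

  bool isReservedRegUnitSketch(
      const std::vector<unsigned> &Roots,
      const std::function<std::vector<unsigned>(unsigned)> &SuperRegsOf,
      const std::function<bool(unsigned)> &IsReserved) {
    for (unsigned Root : Roots) {
      if (!IsReserved(Root))
        continue;
      bool AllSupersReserved = true;
      for (unsigned Super : SuperRegsOf(Root))
        if (!IsReserved(Super)) {
          AllSupersReserved = false;
          break;
        }
      if (AllSupersReserved)
        return true;   // this root and all its supers are reserved
    }
    return false;
  }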
Differential Revision: https://reviews.llvm.org/D37356
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314565 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314252 | gberry | 2017-09-26 14:40:46 -0700 (Tue, 26 Sep 2017) | 12 lines
[AArch64][Falkor] Fix bug in falkor prefetcher fix pass.
Summary:
In rare cases, loads that don't get prefetched that were marked as
strided loads could cause a crash if they occurred in a loop with other
colliding loads.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38261
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314555 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314251 | gberry | 2017-09-26 14:40:41 -0700 (Tue, 26 Sep 2017) | 16 lines
[AArch64][Falkor] Fix correctness bug in falkor prefetcher fix pass and correct some opcode tag computations.
Summary:
This addresses a correctness bug for LD[1234]*_POST opcodes that have
the prefetcher fix applied to them: the base register was not being
written back from the temp after being incremented, so it would appear
to never be incremented.
Also, fix some opcode tag computations based on some updated HW details
to get better tag avoidance and thus better prefetcher performance.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38256
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314554 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311921 | joerg | 2017-08-28 22:20:47 +0200 (Mon, 28 Aug 2017) | 16 lines
Fix ARMv4 support
ARMv4 doesn't support the "BX" instruction, which has been introduced
with ARMv4t. Adjust the call lowering and tail call implementation
accordingly.
Further changes are necessary to ensure that the presence of the v4t feature
is correctly set. Most importantly, the "generic" CPU for thumb-*
triples should include ARMv4t, since thumb mode without thumb support
would naturally be pointless.
Add a couple of asserts to ensure thumb instructions are not emitted
without CPU support.
Differential Revision: https://reviews.llvm.org/D37030
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314417 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r314183 | dylanmckay | 2017-09-26 15:07:54 +1300 (Tue, 26 Sep 2017) | 3 lines
[AVR] Fix the build after setting alignment to 1 in r314179
Changing all types to be byte-aligned broke a small number of tests.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314382 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r312905 | dylanmckay | 2017-09-11 22:32:51 +1200 (Mon, 11 Sep 2017) | 10 lines
[AVR] Enable the '__do_copy_data' function
Also enables '__do_clear_bss'.
These functions are automatically called by the CRT if they are
declared.
We need these to be called; otherwise RAM will start completely
uninitialised, even though RAM variables need to be copied from progmem
to RAM.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314356 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r312337 | nha | 2017-09-01 09:56:32 -0700 (Fri, 01 Sep 2017) | 12 lines
AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314327 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r312337 | nha | 2017-09-01 09:56:32 -0700 (Fri, 01 Sep 2017) | 12 lines
AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@314324 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311623 | hans | 2017-08-23 18:08:27 -0700 (Wed, 23 Aug 2017) | 11 lines
[DAG] Fix Node Replacement in PromoteIntBinOp
When one operand is a user of another in a promoted binary operation,
we may replace and delete the returned value before returning,
triggering an assertion. Reorder node replacements to prevent this.
Fixes PR34137.
Landing on behalf of Nirav.
Differential Revision: https://reviews.llvm.org/D36581
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311670 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
This caused PR34080, which seems to have been fixed by r310792, but that change
introduced severe performance regressions.
Reverting to unblock the 5.0.0 release while these issues are worked out on trunk.
Also reverting a few tests that were added later and depended on the new scheduling:
LLVM :: CodeGen/X86/f16c-schedule.ll
LLVM :: CodeGen/X86/lea32-schedule.ll
LLVM :: CodeGen/X86/lea64-schedule.ll
LLVM :: CodeGen/X86/popcnt-schedule.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311600 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311572 | ctopper | 2017-08-23 09:41:02 -0700 (Wed, 23 Aug 2017) | 9 lines
[AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors
There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.
On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.
This fixes PR34139.
Differential Revision: https://reviews.llvm.org/D36992
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311593 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311263 | ctopper | 2017-08-19 15:02:02 -0700 (Sat, 19 Aug 2017) | 1 line
[AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps from an extract subvector operation.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311478 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311429 | ctopper | 2017-08-21 22:40:17 -0700 (Mon, 21 Aug 2017) | 9 lines
[X86] Prevent several calls to ISD::isConstantSplatVector from returning a narrower APInt than the original scalar type
ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.
This patch just adds a flag to prevent the APInt from changing width.
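A plain-integer sketch of the misinterpretation (ordinary integers rather
than APInt; the helper is made up): the same splat constant reads very
differently depending on the bit width it is reported at, so callers must
not ignore the width.

  #include <cstdint>

  // Sign-extend Val from the given bit width (Bits assumed to be 1..63),
  // the way a width-unaware caller might read a splat constant.
  int64_t interpretAtWidth(uint64_t Val, unsigned Bits) {
    const uint64_t SignBit = 1ULL << (Bits - 1);
    return static_cast<int64_t>((Val ^ SignBit) - SignBit);
  }

  // interpretAtWidth(0xFF, 8)  == -1   (looks like an all-ones mask)
  // interpretAtWidth(0xFF, 32) == 255  (the splat value actually intended)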
Fixes PR34271.
Differential Revision: https://reviews.llvm.org/D36996
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311462 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311061 | compnerd | 2017-08-16 19:42:24 -0700 (Wed, 16 Aug 2017) | 10 lines
ARM: mark CPSR as clobbered for Windows VLAs
When lowering a VLA, we emit a __chkstk call. However, this call can
internally clobber CPSR. We did not mark this register as an ImpDef,
which could potentially allow a comparison to be hoisted above the call
to `__chkstk`. In such a case, the CPSR could be clobbered, and the
check invalidated. When the support was initially added, it seemed that
the call would take care of preventing CPSR from being clobbered, but
this is not the case. Mark the register as clobbered to fix a possible
state corruption.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311461 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311071 | eladcohen | 2017-08-17 01:06:36 -0700 (Thu, 17 Aug 2017) | 13 lines
[SelectionDAG] Teach the vector-types operand scalarizer about SETCC
When v1i1 is legal (e.g. AVX512) the legalizer can reach
a case where a v1i1 SETCC with an illegal vector type operand
wasn't scalarized (since v1i1 is legal) but its operands do
have to be scalarized. This used to assert because SETCC was
missing from the vector operand scalarizer.
This patch attempts to teach the legalizer to handle these cases
by scalarizing the operands, converting the node into a scalar
SETCC node.
Differential revision: https://reviews.llvm.org/D36651
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311409 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r311258 | mstorsjo | 2017-08-19 12:47:48 -0700 (Sat, 19 Aug 2017) | 9 lines
[ARM] Check the right order for halves of VZIP/VUZP if both parts are used
This is the exact same fix as in SVN r247254. In that commit, the fix was
applied only for isVTRNMask and isVTRN_v_undef_Mask, but the same issue
is present for VZIP/VUZP as well.
This fixes PR33921.
Differential Revision: https://reviews.llvm.org/D36899
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311369 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310498 | guyblank | 2017-08-09 10:21:01 -0700 (Wed, 09 Aug 2017) | 9 lines
[X86][AVX512] Choose correct registers in vpbroadcastb/w
Fixes the vpbroadcastb/w instructions which use GPRs as source operands so that they use the correct registers.
The full GPR should be used, not the subregister, as happened before the patch.
Fixes pr33795.
Differential Revision: https://reviews.llvm.org/D36479
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311110 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310979 | qcolombet | 2017-08-15 17:17:05 -0700 (Tue, 15 Aug 2017) | 38 lines
[VirtRegRewriter] Properly model the register liveness on undef subreg definition
Undef subreg definition means that the content of the super register
doesn't matter at this point. While that's true for virtual registers,
this may not hold when replacing them with actual physical registers.
Indeed, some part of the physical register may be coalesced with the
related virtual register and thus, the values for those parts matter and
must be live.
The fix consists of checking whether or not subregs of the physical register
being assigned to an undef subreg definition are live through that def, and
inserting an implicit use if they are. Doing so will keep them alive until
that point, as they should be.
E.g., let vreg14 be assigned to R0_R1; then
%vreg14:gsub_0<def,read-undef> = COPY %R0 ; <-- R1 is still live here
%vreg14:gsub_1<def> = COPY %R1
Before this change, the rewriter would change the code into:
%R0<def> = KILL %R0, %R0_R1<imp-def> ; <-- this tells R1 is redefined
%R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use> ; this value of this R1
; is believed to come
; from the previous
; instruction
Because of this invalid liveness, later passes could make wrong choices and in
particular clobber a live register, as happened with the register scavenger in
llvm.org/PR34107.
Now we would generate:
%R0<def> = KILL %R0, %R0_R1<imp-def>, %R0_R1<imp-use> ; This tells R1 needs to
; reach this point
%R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use>
The bug has been here forever; it got exposed recently because the register
scavenger got smarter.
Fixes llvm.org/PR34107
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@311108 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310784 | ctopper | 2017-08-12 13:19:44 -0700 (Sat, 12 Aug 2017) | 16 lines
[X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction
Summary:
Previously we were creating the flag result with MVT::Other, which is interpreted as a Chain node. If we used a memory form of the instruction, we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result.
Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types.
We should probably consider this for 5.0 as well.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36645
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@310899 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310552 | eladcohen | 2017-08-10 00:44:23 -0700 (Thu, 10 Aug 2017) | 19 lines
[SelectionDAG] When scalarizing vselect, don't assert on a legal cond operand.
When scalarizing the result of a vselect, the legalizer currently expects
to already have scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not the case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '<N x type> vselect <N x i1> <N x type> <N x type>' where <N x type> is
illegal, hitting an assertion during scalarization.
The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - We do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant dag nodes which will be
combined in a separate soon to come patch.
This fixes pr33349.
Differential revision: https://reviews.llvm.org/D36511
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@310635 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310534 | matze | 2017-08-09 15:22:05 -0700 (Wed, 09 Aug 2017) | 20 lines
ARM: Fix CMP_SWAP expansion
Clean up after my misguided attempt in r304267 to "fix" CMP_SWAP
returning an uninitialized status value.
- I was always using tMOVi8 to zero the status register, which cannot
encode higher register numbers, so llvm would silently miscompile.
- Nobody was ever looking at that status value outside the expansion.
ARMDAGToDAGISel::SelectCMP_SWAP(), the only place creating CMP_SWAP
instructions, was not mapping anything to it. (The cmpxchg status value
from llvm IR is lowered to a manual comparison after the CMP_SWAP)
So this:
- Renames the register from "status" to "temp" to make it obvious that
it isn't used outside the expansion.
- Remove the zeroing of the status/temp register.
- Keep the live-in list improvements from r304267
Fixes http://llvm.org/PR34056
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@310628 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r310190 | ctopper | 2017-08-05 16:34:44 -0700 (Sat, 05 Aug 2017) | 18 lines
[X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled
Summary:
On older processors this instruction encoding is treated as a NOP.
MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2.
This change also seems to be consistent with gcc behavior.
Fixes PR34079
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36361
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@310293 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309930 | sdardis | 2017-08-03 02:38:46 -0700 (Thu, 03 Aug 2017) | 19 lines
[SelectionDAG] Resolve PR33978.
rL306209 taught SelectionDAG how to add the dereferenceable flag when
expanding memcpy and memmove. The fix however contained a nit where
the offset + size was constructed as an APInt of PointerSize rather
than PointerSizeInBits.
This led to isDereferenceableAndAlignedPointer() getting truncated values, or
values which would be sign-extended within that function, leading to
incorrect results.
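A small sketch of the bug class (plain arithmetic, not the SelectionDAG
code): building the offset + size value with the pointer size in bytes
instead of bits leaves far too few bits, so the sum wraps.

  #include <cstdint>

  // Compute Offset + Size modulo 2^Bits.
  uint64_t offsetPlusSize(uint64_t Offset, uint64_t Size, unsigned Bits) {
    const uint64_t Mask = Bits >= 64 ? ~0ULL : ((1ULL << Bits) - 1);
    return (Offset + Size) & Mask;
  }

  // offsetPlusSize(300, 40, 8)  == 84   (PointerSize: wrapped, wrong)
  // offsetPlusSize(300, 40, 64) == 340  (PointerSizeInBits: correct)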
Thanks to Alex Crichton for reporting the issue!
This resolves PR33978.
Reviewers: inouehrs
Differential Revision: https://reviews.llvm.org/D36236
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309956 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309744 | mstorsjo | 2017-08-01 14:13:54 -0700 (Tue, 01 Aug 2017) | 29 lines
[AArch64] Rewrite stack frame handling for win64 vararg functions
The previous attempt, which made do with a single offset in
computeCalleeSaveRegisterPairs, wasn't quite enough: it only worked
as long as CombineSPBump == true (since the
offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).
Instead include the size for the fixed stack area used for win64
varargs in calculations in emitPrologue/emitEpilogue. The stack
consists mainly of three parts:
- AFI->getLocalStackSize()
- AFI->getCalleeSavedStackSize()
- FixedObject
Most of the places in the code which previously used the CSStackSize
now use PrologueSaveSize instead, which is the sum of the latter
two, while some cases which need exactly the middle one use
AFI->getCalleeSavedStackSize() explicitly instead of a local variable.
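Illustrative bookkeeping only (field and method names are made up, not the
actual AArch64FrameLowering code): most offsets are now derived from the
sum of the callee-save area and the fixed win64 vararg area.

  struct Win64FrameSketch {
    unsigned LocalStackSize;        // AFI->getLocalStackSize()
    unsigned CalleeSavedStackSize;  // AFI->getCalleeSavedStackSize()
    unsigned FixedObject;           // fixed area for win64 varargs

    unsigned prologueSaveSize() const {
      return CalleeSavedStackSize + FixedObject;
    }
    unsigned totalStackSize() const {
      return LocalStackSize + prologueSaveSize();
    }
  };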
In addition to moving the offsetting into emitPrologue/emitEpilogue
(which fixes functions with CombineSPBump == false), also set the
frame pointer to point to the right location, where the frame pointer
and link register actually are stored. In addition to the prologue/epilogue,
this also requires changes to resolveFrameIndexReference.
Add tests for a function that keeps a frame pointer and another one
that uses a VLA.
Differential Revision: https://reviews.llvm.org/D35919
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309843 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309561 | sdardis | 2017-07-31 07:06:58 -0700 (Mon, 31 Jul 2017) | 14 lines
[SelectionDAG][mips] Fix PR33883
PR33883 shows that calls to intrinsic functions should not have their vector
arguments or returns subject to ABI changes required by the target.
This resolves PR33883.
Thanks to Alex Crichton for reporting the issue!
Reviewers: zoran.jovanovic, atanasyan
Differential Revision: https://reviews.llvm.org/D35765
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309767 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309495 | fhahn | 2017-07-29 13:35:28 -0700 (Sat, 29 Jul 2017) | 30 lines
[AArch64] Tie source and destination operands for AESMC/AESIMC.
Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
AESE Vn, _
AESMC Vn, Vn
and
AESD Vn, _
AESIMC Vn, Vn
The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constrain the source and destination registers to be the same.
A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.
I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.
Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga
Reviewed By: t.p.northover, evandro
Subscribers: aemerson, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35299
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309765 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309323 | ab | 2017-07-27 14:27:25 -0700 (Thu, 27 Jul 2017) | 12 lines
[AArch64] Fix legality info passed to demanded bits for TBI opt.
The (seldom-used) TBI-aware optimization had a typo lying dormant since
it was first introduced, in r252573: when asking for demanded bits, it
told TLI that it was running after legalize, whereas the opposite was
true.
This is an important piece of information, that the demanded bits
analysis uses to make assumptions about the node. r301019 added such an
assumption, which was broken by the TBI combine.
Instead, pass the correct flags to TLO.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309586 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309343 | rnk | 2017-07-27 17:58:35 -0700 (Thu, 27 Jul 2017) | 16 lines
[X86] Fix latent bug in sibcall eligibility logic
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.
Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.
Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.
This was http://crbug.com/749826
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309577 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309422 | rnk | 2017-07-28 12:48:40 -0700 (Fri, 28 Jul 2017) | 25 lines
Fix conditional tail call branch folding when both edges are the same
The conditional tail call logic did the wrong thing when both
destinations of a conditional branch were the same:
BB#1: derived from LLVM BB %entry
Live Ins: %EFLAGS
Predecessors according to CFG: BB#0
JE_1 <BB#5>, %EFLAGS<imp-use,kill>
JMP_1 <BB#5>
BB#5: derived from LLVM BB %sw.epilog
Predecessors according to CFG: BB#1
TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ...
We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5
successor. Then BB#5 would be deleted as it had no predecessors, leaving
a dangling "JMP_1 <BB#5>" reference behind to cause assertions later.
This patch checks that both conditional branch destinations are
different before doing the transform. The standard branch folding logic
is able to remove both the JMP_1 and the JE_1, and for my test case we
end up forming a better conditional tail call later.
Fixes PR33980
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309574 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r309302 | rksimon | 2017-07-27 11:15:54 -0700 (Thu, 27 Jul 2017) | 3 lines
[SelectionDAG] Improve DAGTypeLegalizer::convertMask assertion (PR33960)
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309348 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r308808 | arsenm | 2017-07-21 16:56:13 -0700 (Fri, 21 Jul 2017) | 6 lines
RA: Remove assert on empty live intervals
This is possible if there is an undef use when
splitting the vreg during spilling.
Fixes bug 33620.
------------------------------------------------------------------------
------------------------------------------------------------------------
r308813 | arsenm | 2017-07-21 17:24:01 -0700 (Fri, 21 Jul 2017) | 6 lines
RA: Remove another assert on empty intervals
This case is similar to the one fixed in r308808,
except when rematerializing.
Fixes bug 33884.
------------------------------------------------------------------------
------------------------------------------------------------------------
r308906 | arsenm | 2017-07-24 11:07:55 -0700 (Mon, 24 Jul 2017) | 6 lines
RA: Replace asserts related to empty live intervals
These don't exactly assert the same thing anymore, and
allow empty live intervals with non-empty uses.
Removed in r308808 and r308813.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_50@309171 91177308-0d34-0410-b5e6-96231b3b80d8