This fixes a case of bad index calculation when merging mismatching
vector types. This changes the existing code to just use the existing
extract_{subvector|element} and a bitcast (instead of bitcast first and
then newly created extract_xxx) so we don't need to adjust any indices
in the first place.
rdar://44584718
Differential Revision: https://reviews.llvm.org/D52681
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343493 91177308-0d34-0410-b5e6-96231b3b80d8
Generate tests expectations using update_llc_test_checks and reduce
number of "check prefixes" used in the tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343485 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
While looking at PR35606, I found out that the scheduling info is incorrect.
One can check that it's really a P5+P6 and not a 2*P56 with:
echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=-
(vandps executes on P5 only)
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52541
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343447 91177308-0d34-0410-b5e6-96231b3b80d8
When MachineCopyPropagation eliminates a dead 'copy', its associated debug information becomes invalid. as the recorded register has been removed. It causes the debugger to display wrong variable value.
Differential Revision: https://reviews.llvm.org/D52614
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343445 91177308-0d34-0410-b5e6-96231b3b80d8
We can only copy between a k-register and a GR32/GR64 register.
This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure.
This probably isn't the best fix, and we should probably figure out how to handle this correctly.
Fixes PR38803.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343443 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The lowering of PHI nodes used to detect if all inputs originated
from IMPLICIT_DEF's. If so the PHI node was replaced by an
IMPLICIT_DEF. Now we also consider undef uses when checking the
inputs. So if all inputs are implicitly defined or undef we
lower the PHI to an IMPLICIT_DEF. This makes
PHIElimination::LowerPHINode more consistent as it checks
both implicit and undef properties at later stages.
Reviewers: MatzeB, tstellar
Reviewed By: MatzeB
Subscribers: jvesely, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D52558
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343417 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When PR16508 was solved (in rL185363) a regression test was
added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll.
I discovered that the test case no longer reproduced the
scenario from PR16508. This problem could have been amended
by adding an extra RUN line with "-O1" (or possibly "-O0"),
but instead I added a mir-reproducer
test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir
to get a reproducer that is less sensitive to changes in
earlier passes (including O-level).
While being at it I also corrected a code comment in
PHIElimination::EliminatePHINodes that has been incorrect
since the related bugfix from rL185363.
Reviewers: MatzeB, hfinkel
Reviewed By: MatzeB
Subscribers: nemanjai, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52553
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343416 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction.
The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL its one port 0/6 uop and one port 1/5 uop. Despite what Agner's tables say. I know one of the uops is a regular shift uop so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution wise than the shift+and which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into register is an additional 0/1/5/6 uop.
For now I've limited this transform to AMD CPUs which have a single uop BEXTR. If may also might make sense if we can fold a load or if the and immediate is larger than 32-bits and can't be encoded as a sign extended 32-bit value or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at it doesn't look load folding or large immediates were occurring so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb
Reviewed By: RKSimon
Subscribers: llvm-commits, andreadb, RKSimon
Differential Revision: https://reviews.llvm.org/D52570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343399 91177308-0d34-0410-b5e6-96231b3b80d8
By removing demanded target shuffles that simplify to zero/undef/identity before simplifying its inputs we improve chances of further simplification, as only the immediate parent user of the combined is added back to the work list - this still doesn't help us if its passed through other ops though (bitcasts....).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343390 91177308-0d34-0410-b5e6-96231b3b80d8
Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343384 91177308-0d34-0410-b5e6-96231b3b80d8
The shift amount might have peeked through a extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343373 91177308-0d34-0410-b5e6-96231b3b80d8
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.
This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343361 91177308-0d34-0410-b5e6-96231b3b80d8
The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't used, but that is overly restrictive. A future patch will improve this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343355 91177308-0d34-0410-b5e6-96231b3b80d8
- Add fix so that all code paths that create DWARFContext
with an ObjectFile initialise the target architecture in the context
- Add an assert that the Arch is known in the Dwarf CallFrameString method
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343317 91177308-0d34-0410-b5e6-96231b3b80d8
Lower integer arguments larger then 32 bits for MIPS32.
setMostSignificantFirst is used in order for G_UNMERGE_VALUES and
G_MERGE_VALUES to always hold registers in same order, regardless of
endianness.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D52409
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343315 91177308-0d34-0410-b5e6-96231b3b80d8
On GreenDragon `CodeGen/X86/cpus.ll` is timing out on the bot with Asan
and UBSan enabled. With the same configuration on my machine, the test
passes but takes more than 3 minutes to do so. I could increase the
timeout, but I believe it makes more sense to split up the test because
it allows for more parallelism.
Differential revision: https://reviews.llvm.org/D52603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343313 91177308-0d34-0410-b5e6-96231b3b80d8
The NoMovt feature prevents the use of MOVW/MOVT
instructions on Cortex-M23 for performance reasons.
These instructions are required for execute only code
so NoMovt should be disabled when that option is enabled.
Differential Revision: https://reviews.llvm.org/D52551
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343302 91177308-0d34-0410-b5e6-96231b3b80d8
Had we emitted this IR earlier, InstCombine would have removed icmp so I'm going to assume using the i1 directly would be considered canonical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343244 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: It is currently broken and for Sparc there is not much benefit
in using a builtin version compared to a library version. Both versions
needs to store the same four values in setjmp and flush the register
windows in longjmp. If the need for a builtin setjmp/longjmp arises there
is an improved implementation available at https://reviews.llvm.org/D50969.
Reviewers: jyknight, joerg, venkatra
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51487
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343210 91177308-0d34-0410-b5e6-96231b3b80d8
It was the case when calling MO::dump(), but MI::dump() was still
depending on hasComplexRegisterTies().
The MIR output is not affected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343107 91177308-0d34-0410-b5e6-96231b3b80d8
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.
> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
> state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
> i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
> (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
> https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343103 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET.
Other optimizers, such as block placement, may generate such code and hence
I do this at the very end of the optimization in pre-emit peephole pass.
A conditional branch based on a constant is eliminated or converted into unconditional branch.
Also CRSET/CRUNSET is eliminated if the condition code register is not used
by instruction other than the branch to be optimized.
Differential Revision: https://reviews.llvm.org/D52345
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343100 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS.
As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero.
Differential Revision: https://reviews.llvm.org/D52171
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343093 91177308-0d34-0410-b5e6-96231b3b80d8