This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337606 91177308-0d34-0410-b5e6-96231b3b80d8
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.
Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.
Differential Revision: https://reviews.llvm.org/D49280
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337590 91177308-0d34-0410-b5e6-96231b3b80d8
We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen.
Noticed while working on D49562.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337578 91177308-0d34-0410-b5e6-96231b3b80d8
Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337568 91177308-0d34-0410-b5e6-96231b3b80d8
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.
This fixes PR37826.
Reviewers: spatel, davide, efriedma, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D49388
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337560 91177308-0d34-0410-b5e6-96231b3b80d8
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.
In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.
Review: Ulrich Weigand
https://reviews.llvm.org/D47008
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337538 91177308-0d34-0410-b5e6-96231b3b80d8
We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types.
This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types.
Differential Revision: https://reviews.llvm.org/D49556
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337500 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly a preparation work for adding a limited support for
select instructions. It proved to be difficult to do due to size and
irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I
believe.
It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:
1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).
2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.
3. Handing a case where the first pointer index is not an add, but
something else (a function parameter for instance).
I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.
Reviewed By: rampitec
Differential-Revision: https://reviews.llvm.org/D49342
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337489 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the latency/throughput of LEA instructions in the BtVer2
scheduling model.
On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of
1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA
with a "Scale" operand is also slow, and it has the same latency profile as the
3-operands LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e.
RThrouhgput of 2.0).
This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this
(file X86GenInstrInfo.inc):
```
static bool isThreeOperandsLEA(const MachineInstr &MI) {
return (
(
MI.getOpcode() == X86::LEA32r
|| MI.getOpcode() == X86::LEA64r
|| MI.getOpcode() == X86::LEA64_32r
|| MI.getOpcode() == X86::LEA16r
)
&& MI.getOperand(1).isReg()
&& MI.getOperand(1).getReg() != 0
&& MI.getOperand(3).isReg()
&& MI.getOperand(3).getReg() != 0
&& (
(
MI.getOperand(4).isImm()
&& MI.getOperand(4).getImm() != 0
)
|| (MI.getOperand(4).isGlobal())
)
);
}
```
A similar method is generated in the X86_MC namespace, and included into
X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).
Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks if either the
instruction is a three-operands LEA, or it is an LEA with a Scale value
different than 1.
A variant scheduling class uses that new predicate to correctly select the
appropriate latency profile.
Differential Revision: https://reviews.llvm.org/D49436
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337469 91177308-0d34-0410-b5e6-96231b3b80d8
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.
Recommitted without breaking ELF this time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337450 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The use of exception handling instructions should only be enabled with
`-mattr=+exception-handling` option.
Reviewers: jgravelle-google
Subscribers: dschuff, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D49391
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337425 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on PR38197, this canonicalizes MOVS*(N0, OP(N0, N1)) --> MOVS*(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0])))
This returns the scalar-fp codegen lost by rL336971.
Additionally it handles the OP(N1, N0)) case for commutable (FADD/FMUL) ops.
Differential Revision: https://reviews.llvm.org/D49474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337419 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If unfolding an SUnit results in both load or the operation using it which
already exist in the DAG, abort the unfold if they are already scheduled.
If not, make sure we don't add duplicate dependencies.
This fixes PR37916.
Reviewers: davide, eli.friedman, fhahn, bogner
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48666
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337409 91177308-0d34-0410-b5e6-96231b3b80d8
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337385 91177308-0d34-0410-b5e6-96231b3b80d8
While working on PR38197, I noticed that we don't make use of FADD/FMUL being able to commute the inputs to support the addps+movss -> addss style combine
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337375 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This is how it appears to be handled in GCC and it prevents a
"Unknown mismatch" error in the SelectionDAGBuilder.
Reviewers: venkatra, jyknight, jrtc27
Reviewed By: jyknight, jrtc27
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D49218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337370 91177308-0d34-0410-b5e6-96231b3b80d8
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337357 91177308-0d34-0410-b5e6-96231b3b80d8
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.
I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.
We already support domain switching for UNPCKLPD and MOVLHPS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337348 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores. This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.
This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU. After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.
Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.
Still to do:
* Vector operations
* SPE intrinsics
As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.
Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337347 91177308-0d34-0410-b5e6-96231b3b80d8
The presence of these symbols in the symbol table can cause symbol type
mismatch errors (or undefined symbol errors on emulated TLS targets)
and they can't be ICF'd anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337338 91177308-0d34-0410-b5e6-96231b3b80d8