in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185956 91177308-0d34-0410-b5e6-96231b3b80d8
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.
This was another bug found by llvm-stress.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185949 91177308-0d34-0410-b5e6-96231b3b80d8
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and a very rare case. I left a TODO for
that case.
Fixes PR16551.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185755 91177308-0d34-0410-b5e6-96231b3b80d8
DAGCombiner was counting all uses of a load node when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
rdar://problem/13896307
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185419 91177308-0d34-0410-b5e6-96231b3b80d8
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add 'and' to clear off them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184575 91177308-0d34-0410-b5e6-96231b3b80d8
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182703 91177308-0d34-0410-b5e6-96231b3b80d8
Fold (xor (and x, y), y) -> (and (not x), y)
This removes an opportunity for a constant to appear twice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181395 91177308-0d34-0410-b5e6-96231b3b80d8
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.
rdar://13402653
PR15866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180871 91177308-0d34-0410-b5e6-96231b3b80d8
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180597 91177308-0d34-0410-b5e6-96231b3b80d8
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180129 91177308-0d34-0410-b5e6-96231b3b80d8
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.
The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
ldr.w r9, [sp]
vmov d17, r3, r9
vmov d16, r1, r2
vst1.8 {d16, d17}, [r0]
bx lr
Differential Revision: http://llvm-reviews.chandlerc.com/D647
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179106 91177308-0d34-0410-b5e6-96231b3b80d8
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.
copy(char *a, char *b, int n) {
do {
int t0 = a[0];
int t1 = a[1];
b[0] = t0;
b[1] = t1;
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178546 91177308-0d34-0410-b5e6-96231b3b80d8
We would also like to merge sequences that involve a variable index like in the
example below.
int index = *idx++
int i0 = c[index+0];
int i1 = c[index+1];
b[0] = i0;
b[1] = i1;
By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.
The dag for the code above will look something like:
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i8 load %index))))
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (add (i8 load %index)
(i8 1))))
vs
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178483 91177308-0d34-0410-b5e6-96231b3b80d8
- Handle the case where the result of 'insert_subvect' is bitcasted
before 'extract_subvec'. This removes the redundant insertf128/extractf128
pair on unaligned 256-bit vector load/store on vectors of non 64-bit integer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177945 91177308-0d34-0410-b5e6-96231b3b80d8
For instance, following transformation will be disabled:
x + x + x => 3.0f * x;
The problem of these transformations is that it introduces a FP constant, which
following Instruction-Selection pass cannot handle.
Reviewed by Nadav, thanks a lot!
rdar://13445387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177933 91177308-0d34-0410-b5e6-96231b3b80d8
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.
v2:
- Expand more BR_CC value types for NVPTX
v3:
- Expand correct BR_CC value types for Hexagon, Mips, and XCore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176694 91177308-0d34-0410-b5e6-96231b3b80d8
A legal BUILD_VECTOR goes in and gets constant folded into another legal
BUILD_VECTOR so we don't lose any legality here. The problematic PPC
optimization that made this check necessary was fixed recently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175759 91177308-0d34-0410-b5e6-96231b3b80d8
(2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y)))
can be folded into a (2xi32) (buildvector i32 a, i32 b).
Such a DAG would cause uneccessary vdup instructions followed by vmovn
instructions.
We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64. For example, in
the vectorized version of the code below.
double A[N];
double B[N];
void test_double_compare_to_double() {
int i;
for(i=0;i<N;i++)
A[i] = (double)(A[i] < B[i]);
}
radar://13191881
Fixes bug 15283.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175670 91177308-0d34-0410-b5e6-96231b3b80d8
DAGCombiner::ReduceLoadWidth was converting (trunc i32 (shl i64 v, 32))
into (shl i32 v, 32) into undef. To prevent this, check the shift count
against the final result size.
Patch by: Kevin Schoedel
Reviewed by: Nadav Rotem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174972 91177308-0d34-0410-b5e6-96231b3b80d8
Sorry for the lack of a test case. I tried writing one for i386 as i know selects are illegal on this target, but they are actually considered legal by isel and expanded later.
I can't see any targets to trigger this, but checking for the legality of a node before forming it is general goodness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174934 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, even when a pre-increment load or store was generated,
we often needed to keep a copy of the original base register for use
with other offsets. If all of these offsets are constants (including
the offset which was combined into the addressing mode), then this is
clearly unnecessary. This change adjusts these other offsets to use the
new incremented address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174746 91177308-0d34-0410-b5e6-96231b3b80d8
base point of a load, and the overall alignment of the load. This caused infinite loops in DAG combine with the
original application of this patch.
ORIGINAL COMMIT LOG:
When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment. However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.
This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174431 91177308-0d34-0410-b5e6-96231b3b80d8
It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174374 91177308-0d34-0410-b5e6-96231b3b80d8
it would replace the load with one with the higher alignment. However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.
This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174343 91177308-0d34-0410-b5e6-96231b3b80d8
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172968 91177308-0d34-0410-b5e6-96231b3b80d8