Commit Graph

1907 Commits

Author SHA1 Message Date
QingShan Zhang
38051ae89a [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store
For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse.

unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
  for (unsigned long long i = 0; i < n; i++) {
    unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
    unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
    unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
    unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
    result *= x1 * x2 * x3 * x4;
  }
  return result;
}

Patch by jedilyn(Kewen Lin).

Differential Revision: https://reviews.llvm.org/D48813 
--This line, and  those below, will be ignored--

M    lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
A    test/CodeGen/PowerPC/preincprep-i64-check.ll


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336074 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 05:46:09 +00:00
Sanjay Patel
3d6697fe50 [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335761 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 18:16:40 +00:00
Sanjay Patel
af1e6f153f [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z

Differential Revision: https://reviews.llvm.org/D48466


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335433 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-24 14:37:30 +00:00
Sanjay Patel
31d9a1568a [PowerPC] add more tests for bit hacking opportunities with setcc; NFC
Missed cases where the input and output are the same size in rL335390.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335395 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 22:06:33 +00:00
Sanjay Patel
5fd79cfd40 [PowerPC] add tests for bit hacking opportunities with setcc; NFC
We likely gave up on folding some select-of-constants patterns in 
IR with rL331486, and we need to recover those in the DAG.

The tests without select are based on our current DAGCombiner 
optimizations for select-of-constants.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335390 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 21:16:29 +00:00
Mikael Holmen
b16b4ba59a [DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.

This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.

Patch by: JesperAntonsson

Reviewers: aprantl, rnk, echristo, javed.absar

Reviewed By: aprantl

Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D48319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335214 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 10:03:34 +00:00
Alina Sbirlea
8d9ac7ff94 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335183 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 22:01:04 +00:00
Stanislav Mekhanoshin
3e703da257 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Differential Revision: https://reviews.llvm.org/D48223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335167 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 20:24:20 +00:00
Strahinja Petrovic
96f8f2ad39 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating address of label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335043 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 13:07:40 +00:00
QingShan Zhang
21cf43199f If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating,
and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction.

Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335024 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 06:54:51 +00:00
Stanislav Mekhanoshin
2a6f354a56 Tests for dag combine select (binop) -> select. NFC.
Tests will be updated with https://reviews.llvm.org/D48223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 21:49:07 +00:00
Michael Berg
f4fa78a051 Utilize new SDNode flag functionality to expand current support for fma
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai

Reviewed By: rampitec, nhaehnle

Subscribers: tpr, nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334876 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 00:03:06 +00:00
Michael Berg
71bbcb1f4a propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.

Reviewers: spatel, arsenm, hfinkel, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334242 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 22:49:09 +00:00
Hiroshi Inoue
61db82a706 [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector
BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).

uint64_t func(uint64_t a, uint64_t b) {
	return (a & 0xFFFFFFFF) | (b << 32) ;
}

To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group.

Differential Revision: https://reviews.llvm.org/D47867



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334195 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 13:21:14 +00:00
Michael Berg
1de1655ed5 guard fsqrt with fmf sub flags
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.


Reviewers: spatel, hfinkel, arsenm

Reviewed By: spatel

Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai

Differential Revision: https://reviews.llvm.org/D47749

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334113 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 18:47:55 +00:00
Michael Berg
e23e5cc5b1 guard fneg with fmf sub flags
Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.

Reviewers: spatel, hfinkel

Reviewed By: spatel

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D47389

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334037 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 18:49:47 +00:00
Michael Berg
39e47efe75 NFC: adding baseline fneg case for fmf
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334035 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 18:12:25 +00:00
Hiroshi Inoue
8b13d9fafa [PowerPC] reduce rotate in BitPermutationSelector
BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers.
Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.

For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation.
This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.

This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first.

Differential Revision: https://reviews.llvm.org/D47765



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334011 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 11:58:01 +00:00
Lei Huang
babbfe4278 [PowerPC] Fix the incorrect iterator inside peephole
Instruction selection can insert nodes into the underlying list after the root
node so iterating will thereby miss it. We should NOT assume that, the root node
is the last element in the DAG nodelist.

Patch by: steven.zhang (Qing Shan Zhang)

Differential Revision: https://reviews.llvm.org/D47437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333415 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 13:38:56 +00:00
Lei Huang
1753453070 [Power9]Legalize and emit code for HW/Byte vector extract and convert to QP
Implemente patterns to extract HWord and Byte vector elements and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46774

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333377 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-28 16:43:29 +00:00
Lei Huang
b0bd404974 [PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also have
the same issue.

Patch by Qing Shan Zhang (steven.zhang).

Differential Revision: https://reviews.llvm.org/D47178

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333150 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-24 03:20:28 +00:00
Lei Huang
879a0d8726 [Power9]Legalize and emit code for W vector extract and convert to QP
Implemente patterns to extract [Un]signed Word vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-23 19:31:54 +00:00
Lei Huang
e99b1c16bb [Power9]Legalize and emit code for DW vector extract and convert to QP
Implemente patterns to extract [Un]signed DWord vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46333

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333112 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-23 18:36:51 +00:00
Sanjay Patel
fa475c558d [PowerPC] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332549 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 22:48:48 +00:00
Sanjay Patel
f46201df37 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332358 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-15 14:16:24 +00:00
Sanjay Patel
e6979f263b [PowerPC] add more tests for FMF propagation; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332295 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 21:17:49 +00:00
Shiva Chen
a8a13bc662 [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 02:40:45 +00:00
Lei Huang
38dc71ca1e [Power9]Legalize and emit code for truncate and convert QP to HW and Byte
Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.

Differential Revision: https://reviews.llvm.org/D46194

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331797 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 18:52:06 +00:00
Lei Huang
80ff171fdc [Power9]Legalize and emit code for truncate and convert Quad-Precision to Word
Legalize and emit code for:

  * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word
  * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word

Differential Revision: https://reviews.llvm.org/D45635

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331790 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 18:34:00 +00:00
Lei Huang
147e51b204 [Power9]Legalize and emit code for truncate and convert QP to DW
Legalize and emit code for:

  * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword
  * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword

Differential Revision: https://reviews.llvm.org/D45553

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331787 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 18:23:31 +00:00
Lei Huang
4ad2dcac81 [PowerPC] Unify handling for conversion of FP_TO_INT feeding a store
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"

This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"

Differential Revision: https://reviews.llvm.org/D46102

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331778 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 17:36:40 +00:00
Michael Berg
eb53cf7275 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331547 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-04 18:48:20 +00:00
Sanjay Patel
da9cac629c [PowerPC] add more FMF debug output; NFC
We can't see all of the problems currently unless
we look at debug output when the global 'unsafe' is
on. It's a mess. This is another attempt to make
sure that D45710 is not making changes unintentionally.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331476 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-03 18:49:35 +00:00
Sanjay Patel
00c1a2374f [PowerPC] add tests for FMF propagation; NFC
I'm choosing PPC out of convenience because it does
all of the transforms of interest in these tests by
default. There are multiple FMF problems shown in the 
current checks. D45710 is proposing to fix part of 
that.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331471 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-03 17:41:37 +00:00
Nemanja Ivanovic
7577f35f34 [PowerPC] Implement isMaskAndCmp0FoldingBeneficial
Sinking the and closer to a compare against zero is beneficial on PPC as it
allows us to emit record-form instructions. In the future, we may expand this
to a larger set of operations that feed compares against zero since PPC has
lots of record-form instructions.

Differential revision: https://reviews.llvm.org/D46060


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331416 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-02 23:55:23 +00:00
Nemanja Ivanovic
47a8f72fea [PowerPC] No CTR loop if the candidate exiting block is in a different loop
The CTR loops pass will insert the decrementing branch instruction in an exiting
block for the loop being transformed. However if that block is part of another
loop as well (whether a nested loop or with irreducible CFG), it is not valid
to use that exiting block. In fact, if the loop hass irreducible CFG, we don't
bother analyzing it and we just bail on the transformation. In practice, this
doesn't lead to a noticeable reduction in the number of loops transformed by
this pass.

Fixes https://bugs.llvm.org/show_bug.cgi?id=37229

Differential Revision: https://reviews.llvm.org/D46162


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331410 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-02 22:56:04 +00:00
Francis Visoiu Mistrih
bea0f24e1c [MIR] Add support for debug metadata for fixed stack objects
Debug var, expr and loc were only supported for non-fixed stack objects.

This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:

* debug-info-variable
* debug-info-expression
* debug-info-location

Differential Revision: https://reviews.llvm.org/D46032

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330859 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-25 18:58:06 +00:00
Hiroshi Inoue
c7e248c329 [PowerPC] fix incorrect vectorization of abs() on POWER9
Vectorized loops with abs() returns incorrect results on POWER9. This patch fixes it.
For example the following code returns negative result if input values are negative though it sums up the absolute value of the inputs.

int vpx_satd_c(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

This problem causes test failures for libvpx.
For vector absolute and vector absolute difference on POWER9, LLVM generates VABSDUW (Vector Absolute Difference Unsigned Word) instruction or variants.
Since these instructions are for unsigned integers, we need adjustment for signed integers.
For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).

Differential Revision: https://reviews.llvm.org/D45522



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330497 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-21 09:32:17 +00:00
Sanjay Patel
f02c6abfc7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330437 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-20 15:07:55 +00:00
Lei Huang
3c8a2d808d [NFC] test case clean up
1. remove redundant tests
2. update XForm_tests to generated expected code gen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330290 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 20:22:26 +00:00
Lei Huang
bd37d91039 [Power9]Legalize and emit code for converting Unsigned HWord/Char to Quad-Precision
Legalize and emit code for converting unsigned HWord/Char to QP:

xscvsdqp
xscvudqp

Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.

Differential Revision: https://reviews.llvm.org/D45494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330278 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 17:41:46 +00:00
Lei Huang
ad6944ff33 [Power9]Legalize and emit code for converting (Un)Signed Word to Quad-Precision
Legalize and emit code for converting (Un)Signed Word to quad-precision via:

xscvsdqp
xscvudqp

Differential Revision: https://reviews.llvm.org/D45389

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330273 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 16:34:22 +00:00
Nemanja Ivanovic
dc9a6a7dba [PowerPC] Mark the BDNZ intrinsic as NoDuplicate
Duplicating this intrinsic is not generally valid because it has the side-effect
of decrementing the CTR. Any passes that duplicate it would need to be taught to
keep the regions formed completely disjoint.
This patch should be NFC for typical uses as CTRLoops runs after the remaining
loop passes. It only affects situations where the loop passes are scheduled on
the IR after the codegen passes (as is the case with some JIT pipelines).

Fixes https://bugs.llvm.org/show_bug.cgi?id=37050


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330186 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 13:07:01 +00:00
Sanjay Patel
8a7e3a43d6 [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses
This is a transform that I limited in instcombine in rL329821 because it was 
creating more instructions in IR when the cast has multiple uses.

But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.

Differential Revision: https://reviews.llvm.org/D45598


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330098 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-15 16:43:48 +00:00
Sanjay Patel
a42d068725 [PowerPC] add fsub-fneg test; NFC
This is a test for a transform that was suggested in the post-commit
mailing list thread for rL329821. The target in question is not in
trunk, so PPC gets to stand in for it because it's the only in-tree
target that sets 'isFPExtFree()' to 'true'.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329963 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-12 22:14:23 +00:00
Lei Huang
d018c5d747 [Power9]Legalize and emit code for converting (Un)Signed DWord to Quad-Precision
Legalize and emit code for:

  * xscvsdqp
  * xscvudqp

Differential Revision: https://reviews.llvm.org/D45230

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329931 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-12 18:00:14 +00:00
Sanjay Patel
b7ab0ed219 revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329920 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-12 15:27:01 +00:00
Nemanja Ivanovic
4cd717c513 [PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).

Patch by Josh Stone.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329852 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-11 21:25:44 +00:00
Sam Parker
3757f28f1d [DAGCombine] Improve ReduceLoad for SRL
Recommitting r329283, third time lucky...

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329551 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 08:16:11 +00:00
Tim Northover
f6621a4e4b Reapply ARM: Do not spill CSR to stack on entry to noreturn functions
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.

A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch mostly by myeisha (pmb).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329494 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-07 10:57:03 +00:00