Commit Graph

1943 Commits

Author SHA1 Message Date
Nemanja Ivanovic
4df6ca02ff [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
Make ISD::VSELECT legal as long as Altivec instructions are available; otherwise
its default behavior is to expand.
Use xxsel to match vselect if VSX is enabled, otherwise use vsel.

To avoid writing many patterns in the .td file, promote (for vectors this is a
bitcast) all other types to v4i32 and only pattern-match vselect of v4i32 to
vsel or xxsel.

Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339779 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-15 15:30:36 +00:00
Nemanja Ivanovic
7c82b970fb [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087

Differential Revision: https://reviews.llvm.org/D49080


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339769 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-15 12:58:13 +00:00
Sanjay Patel
3d464de968 [SelectionDAG] try harder to convert funnel shift to rotate
Similar to rL337966 - if the DAGCombiner's rotate matching was 
working as expected, I don't think we'd see any test diffs here.

AArch64 only rotates right, and PPC only rotates left;
x86 has both, so no diffs there.

Differential Revision: https://reviews.llvm.org/D50091


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339359 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-09 17:26:22 +00:00
Zaara Syeda
15de990f32 [PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Previously, scalar_to_vector (load X) was
lowered to a ld->mv instruction sequence; this patch allows it to instead
utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64
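
A hypothetical IR sketch of the load-then-insert shape these instructions can
now cover directly:

define <2 x double> @load_to_vec(double* %p) {
  %x = load double, double* %p
  ; scalar_to_vector of a loaded f64; expected to map to lxsd/lxsdx
  %v = insertelement <2 x double> undef, double %x, i32 0
  ret <2 x double> %v
}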

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339260 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 15:20:43 +00:00
Nemanja Ivanovic
121bd57dbe [PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.
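
A rough IR sketch (assumed shape, not taken from the patch) of the affected
pattern: the f64 elements are converted and packed, and must not be rounded to
f32 first.

define <2 x i32> @cvt(double %a, double %b) {
  %ia = fptosi double %a to i32
  %ib = fptosi double %b to i32
  %v0 = insertelement <2 x i32> undef, i32 %ia, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %ib, i32 1
  ret <2 x i32> %v1
}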

Fixes https://bugs.llvm.org/show_bug.cgi?id=38342

Differential Revision: https://reviews.llvm.org/D50121


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338658 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 00:03:22 +00:00
Sanjay Patel
8fe02fad97 [SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r

...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.

Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).
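
A sketch of the problematic input, assuming only that a funnel-shift intrinsic
is used on a non-power-of-2 type; here the negated shift amount cannot stand in
for (bitwidth - amount), so the select-guarded expansion is required.

declare i24 @llvm.fshr.i24(i24, i24, i24)

define i24 @fshr24(i24 %a, i24 %b, i24 %c) {
  %r = call i24 @llvm.fshr.i24(i24 %a, i24 %b, i24 %c)
  ret i24 %r
}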



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338592 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 17:17:08 +00:00
Sanjay Patel
648b708c20 [DAGCombiner] transform sub-of-shifted-signbit to add
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH
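
A rough IR sketch of the pattern, assuming the fold rewrites y - (x >>u 31) as
y + (x >>s 31):

define i32 @signbit_sub(i32 %x, i32 %y) {
  %sh = lshr i32 %x, 31     ; 0 or 1: the sign bit of %x
  %r = sub i32 %y, %sh      ; becomes %y + ashr(%x, 31), an add of 0 or -1
  ret i32 %r
}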

This is another step towards improving select-of-constants codegen (see D48970).

x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.

Differential Revision: https://reviews.llvm.org/D49924


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338317 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 22:21:37 +00:00
Sanjay Patel
c9159350ae [AArch64, PowerPC, x86] add more signbit math tests; NFC
The tests with a constant sub operand were added with rL338143,
but the potential transform doesn't have that requirement, so
adding more tests with variable operands.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338150 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 18:31:21 +00:00
Sanjay Patel
fac43a3123 [AArch64, PowerPC, x86] add more signbit math tests; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338143 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 18:12:29 +00:00
Sanjay Patel
125191a9e0 [DAGCombiner] fold 'not' with signbit math
This is a follow-up suggested in D48970. 

Alive proofs:
https://rise4fun.com/Alive/sII

We can eliminate an instruction in the usual select-of-constants 
to bit hack transform by adjusting the add/sub with constant.
This is always a win. 

There are more transforms that are likely wins, but they may need 
target hooks in case some targets do not benefit. 

This is another step towards making up for canonicalizing to 
select-of-constants in rL331486.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338132 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 16:42:55 +00:00
Sanjay Patel
67980f703b [PowerPC] add more tests for signbit math; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338130 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 16:22:18 +00:00
Sanjay Patel
2b264247a9 [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337966 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-25 21:38:30 +00:00
Sanjay Patel
d73322deb3 [AArch, PowerPC] add more tests for legal rotate ops; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337964 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-25 21:25:50 +00:00
Stefan Pintilie
f9fb677cc2 [Power9] Code Cleanup - Remove needsAggressiveScheduling()
As we already return true from needsAggressiveScheduling() for the most recent
hardware, it would be cleaner to just return true for all PowerPC hardware.

Differential Revision: https://reviews.llvm.org/D48663

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337488 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-19 19:34:18 +00:00
Justin Hibbits
c486a43e86 Introduce codegen for the Signal Processing Engine
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores.  This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.

This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU.  After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.

Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.

Still to do:
* Vector operations
* SPE intrinsics

As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.

Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337347 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-18 04:25:10 +00:00
Sanjay Patel
bb12f48c1c [Intrinsics] define funnel shift IR intrinsics + DAG builder support
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html

We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, 
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation 
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" 
operation which may also be a single hardware instruction.
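
A small IR sketch of the new intrinsics (illustrative function only):

declare i32 @llvm.fshl.i32(i32, i32, i32)
declare i32 @llvm.fshr.i32(i32, i32, i32)

define i32 @funnel_and_rotate(i32 %a, i32 %b, i32 %c) {
  %f = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)  ; funnel shift left
  %r = call i32 @llvm.fshr.i32(i32 %f, i32 %f, i32 %c)  ; same value twice = rotate right
  %x = xor i32 %f, %r
  ret i32 %x
}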

Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working 
on that (much larger patch), but then I concluded that we don't need it. At least as a first 
step, we have all of the backend support necessary to match these ops...because it was required. 
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are 
likely all that we'll ever need.

There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly 
necessary (as the test results here prove). That can be an efficiency improvement as a small 
follow-up patch.

So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. 

Differential Revision: https://reviews.llvm.org/D49242


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337221 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 22:59:31 +00:00
Sanjay Patel
08378e6ecf [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337130 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-15 16:27:07 +00:00
Nemanja Ivanovic
674b14846a [PowerPC] Materialize more constants with CR-field set in late peephole
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.

However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.

If there are no further uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).

Differential revision: https://reviews.llvm.org/D42109


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337008 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-13 15:21:03 +00:00
Stefan Pintilie
a120e0c37c [PowerPC] [NFC] Update __float128 tests
Add the two options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to
the __float128 tests. Then modify the tests as required.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336940 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-12 20:18:57 +00:00
Joel E. Denny
ce57a4ff8c [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests
See https://reviews.llvm.org/D47106 for details.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47171

This commit drops that patch's changes to:

  llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
  llvm/test/CodeGen/NVPTX/param-load-store.ll

For some reason, the DOS line endings there prevent me from committing
via the monorepo.  A follow-up commit (not via the monorepo) will
finish the patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336843 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:25:49 +00:00
Stefan Pintilie
9e4aa1f63d [Power9] Add remaining __float128 builtin support for FMA round to odd
Implement this as it is done in GCC:

__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d);         // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d);        // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d);       // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d);      // generates xsnmsubqpo

Differential Revision: https://reviews.llvm.org/D48218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336754 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 01:42:22 +00:00
Stefan Pintilie
7ee7123814 [Power9] Add __float128 builtins for Rounding Operations
Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil
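
A brief IR sketch (hypothetical function) of the fp128 rounding operations this
enables; the exact instruction selection is not shown here.

declare fp128 @llvm.trunc.f128(fp128)
declare fp128 @llvm.floor.f128(fp128)

define fp128 @do_round(fp128 %a) {
  %t = call fp128 @llvm.trunc.f128(fp128 %a)
  %f = call fp128 @llvm.floor.f128(fp128 %t)
  ret fp128 %f
}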

Differential Revision: https://reviews.llvm.org/D48415

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336601 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 20:38:40 +00:00
Stefan Pintilie
8e48754afe [Power9] [LLVM] Add __float128 support for trunc to double round to odd
Add support for this builtin:
double __builtin_truncf128_round_to_odd (__float128)

Differential Revision: https://reviews.llvm.org/D48483

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336595 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 20:09:22 +00:00
Stefan Pintilie
3ea14f7d95 [Power9] Add __float128 builtins for Round To Odd
GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336578 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 18:50:06 +00:00
Stefan Pintilie
0b08bea28b [Power9] Add __float128 support for compare operations
Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336548 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 13:36:14 +00:00
Stefan Pintilie
0b9754cd9a [Power9] Add __float128 library call for frem
Power 9 does not have a hardware instruction for frem but we can call fmodf128.
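
A minimal IR sketch of the operation in question:

define fp128 @qrem(fp128 %a, fp128 %b) {
  %r = frem fp128 %a, %b   ; no hardware frem, so this becomes a call to fmodf128
  ret fp128 %r
}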

Differential Revision: https://reviews.llvm.org/D48552

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336406 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-06 02:47:02 +00:00
Lei Huang
37aa5135e5 [Power9] Add lib calls for float128 operations with no equivalent PPC instructions
Map the following instructions to the proper float128 lib calls:
  pow[i], exp[2], log[2|10], sin, cos, fmin, fmax

Differential Revision: https://reviews.llvm.org/D48544

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336361 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 15:21:37 +00:00
Sanjay Patel
b06fd49497 [AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336348 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 13:16:46 +00:00
Lei Huang
bd76e146be [Power9] Optimize codegen for conversions of int to float128
Optimize code sequences for integer conversion to fp128 when the integer is a result of:
  * float->int
  * float->long
  * double->int
  * double->long
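
An illustrative IR shape for one of the listed cases (double->long followed by
the conversion to fp128):

define fp128 @conv(double %d) {
  %i = fptosi double %d to i64   ; double->long
  %q = sitofp i64 %i to fp128    ; the int-to-fp128 conversion being optimized
  ret fp128 %q
}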

Differential Revision: https://reviews.llvm.org/D48429

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336316 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 07:46:01 +00:00
Lei Huang
e5da5ca56a [Power9][NFC] add back-end tests for passing homogeneous fp128 aggregates by value
Tests to verify that we are passing fp128 via VSX registers as per ABI.
These are related to clang commit rL336308.

Differential Revision: https://reviews.llvm.org/D48310

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336314 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 06:51:38 +00:00
Lei Huang
d50e092736 [Power9] Add tests for passing float128 in VSX reg for non-homogeneous aggregates
Add missing testcase for rL336310

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336313 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 06:29:28 +00:00
Lei Huang
1ae65deb24 [Power9] Legalize and emit code for quad-precision convert from single-precision
Legalize and emit code for the quad-precision floating-point conversion of a
single-precision value to quad-precision.
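
A minimal IR sketch of the conversion:

define fp128 @extend(float %f) {
  %q = fpext float %f to fp128   ; single-precision to quad-precision
  ret fp128 %q
}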

Differential Revision: https://reviews.llvm.org/D47569

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336307 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 04:18:37 +00:00
Lei Huang
b42f206bec [Power9] Implement float128 parameter passing and return values
This patch enables parameter passing and return by value for float128 types.
Passing aggregates/unions which contain float128 members will be submitted in
subsequent patches.

Differential Revision: https://reviews.llvm.org/D47552

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336306 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 04:10:15 +00:00
Lei Huang
4fcda06d85 [Power9] Legalize and emit code for round & convert quad-precision values
Legalize and emit code for round & convert float128 to double precision and
single precision.

Differential Revision: https://reviews.llvm.org/D46997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336299 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-04 21:59:16 +00:00
Stefan Pintilie
1631efd1b8 [PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler
We want to run the Machine Scheduler instead of the List Scheduler after RA.
Checked with a performance run on a Power 9 machine with SPEC 2006: while some
benchmarks improved and others degraded, the geomean was slightly improved with
the Machine Scheduler.

Differential Revision: https://reviews.llvm.org/D45265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336295 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-04 18:54:25 +00:00
Gabor Buella
0aae914817 NFC - Various typo fixes in tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336268 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-04 13:28:39 +00:00
QingShan Zhang
38051ae89a [PowerPC] Don't make it a pre-inc candidate if the displacement isn't a multiple of 4 for i64 pre-inc load/store
For the case below, pre-inc prep thinks the bucket is a good candidate for pre-increment, but the 64-bit integer load/store update (pre-inc) instructions on Power require the displacement field to be DS-form (a multiple of 4). Since that constraint cannot be satisfied, we have to do some fix-ups later. The original load/stores could have been well-formed, so this makes things worse.

unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
  for (unsigned long long i = 0; i < n; i++) {
    unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
    unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
    unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
    unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
    result *= x1 * x2 * x3 * x4;
  }
  return result;
}

Patch by jedilyn(Kewen Lin).

Differential Revision: https://reviews.llvm.org/D48813 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336074 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 05:46:09 +00:00
Sanjay Patel
3d6697fe50 [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335761 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 18:16:40 +00:00
Sanjay Patel
af1e6f153f [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z

Differential Revision: https://reviews.llvm.org/D48466


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335433 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-24 14:37:30 +00:00
Sanjay Patel
31d9a1568a [PowerPC] add more tests for bit hacking opportunities with setcc; NFC
Missed cases where the input and output are the same size in rL335390.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335395 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 22:06:33 +00:00
Sanjay Patel
5fd79cfd40 [PowerPC] add tests for bit hacking opportunities with setcc; NFC
We likely gave up on folding some select-of-constants patterns in 
IR with rL331486, and we need to recover those in the DAG.

The tests without select are based on our current DAGCombiner 
optimizations for select-of-constants.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335390 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 21:16:29 +00:00
Mikael Holmen
b16b4ba59a [DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.

This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.

Patch by: JesperAntonsson

Reviewers: aprantl, rnk, echristo, javed.absar

Reviewed By: aprantl

Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D48319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335214 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 10:03:34 +00:00
Alina Sbirlea
8d9ac7ff94 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added a temporary array to avoid iterator invalidation.
  ii. The eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks.
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335183 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 22:01:04 +00:00
Stanislav Mekhanoshin
3e703da257 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if the select was the first operand.
However, for non-commutative operations the constant may come before the
select.
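
A minimal IR sketch of the newly handled ordering (constant first, select
second, non-commutative op):

define i32 @fold_const_first(i1 %cc) {
  %s = select i1 %cc, i32 5, i32 9
  %r = sub i32 16, %s        ; can now fold to: select %cc, i32 11, i32 7
  ret i32 %r
}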

Differential Revision: https://reviews.llvm.org/D48223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335167 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 20:24:20 +00:00
Strahinja Petrovic
96f8f2ad39 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating the address of a label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335043 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 13:07:40 +00:00
QingShan Zhang
21cf43199f If the arch is P9, we select the DFLOADf32/DFLOADf64 pseudo instructions when loading a float,
and expand them post-RA based on register pressure. However, we missed doing the add-imm peephole for these pseudo instructions.

Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335024 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 06:54:51 +00:00
Stanislav Mekhanoshin
2a6f354a56 Tests for dag combine select (binop) -> select. NFC.
Tests will be updated with https://reviews.llvm.org/D48223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 21:49:07 +00:00
Michael Berg
f4fa78a051 Utilize new SDNode flag functionality to expand current support for fma
Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai

Reviewed By: rampitec, nhaehnle

Subscribers: tpr, nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334876 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 00:03:06 +00:00
Michael Berg
71bbcb1f4a propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.

Reviewers: spatel, arsenm, hfinkel, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334242 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 22:49:09 +00:00
Hiroshi Inoue
61db82a706 [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector
BitPermutationSelector sets the Repl32 flag for bit groups which can potentially benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies the lower 32 bits into the upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing the 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it could be compiled into a single rldimi instruction if the Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).

uint64_t func(uint64_t a, uint64_t b) {
	return (a & 0xFFFFFFFF) | (b << 32) ;
}

To avoid this problem, this patch checks the potential benefit of the Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from the Repl32 flag on this group.

Differential Revision: https://reviews.llvm.org/D47867



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334195 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 13:21:14 +00:00