Commit Graph

Stefan Pintilie
575a95094e [Power9] Add missing instructions: extswsli, popcntb
Added the following P9 instructions: extswsli, extswsli., popcntb

Differential Revision: https://reviews.llvm.org/D37342

llvm-svn: 313147
2017-09-13 14:05:27 +00:00
Lei Huang
ffe49cfc41 Update branch coalescing to be a PowerPC specific pass
Implement this pass as a PowerPC-specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce.

Differential Revision: https://reviews.llvm.org/D32776

llvm-svn: 313061
2017-09-12 18:39:11 +00:00
Kyle Butt
6ab29ccb20 PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16-byte or greater alignment to select lxv/stxv for stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.
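
As an illustrative sketch (mine, not code from the patch), the new requirement
amounts to accepting a stack slot only when both its alignment and the offset
within it keep the final displacement a multiple of 16:

    /* Hypothetical check: pick lxv/stxv for a stack access only when the
       slot is 16-byte (or more) aligned and the offset is a multiple of
       16, so base + slot + offset remains 0 % 16 after frame lowering. */
    int can_select_lxv_stxv(unsigned slot_align, long offset) {
        return slot_align % 16 == 0 && offset % 16 == 0;
    }

For example, a slot that is only 8-byte aligned can produce final displacements
such as 8 or 24, which lxv/stxv cannot encode.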

llvm-svn: 312843
2017-09-09 00:37:56 +00:00
Dean Michael Berris
ff78401ab1 [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests, which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, because the check
  treated any return instruction that was also a call instruction as a
  tail exit instruction.
- None of the tests caught the regression, either because the tests did not
  exist or because they had been disabled/removed due to continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

llvm-svn: 312772
2017-09-08 01:47:56 +00:00
Hal Finkel
40848324a8 [PowerPC] Don't use xscvdpspn on the P7
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

llvm-svn: 312612
2017-09-06 03:08:26 +00:00
Tony Jiang
d0cf25e7a3 [PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it more general.
Commit on behalf of Graham Yiu (gyiu@ca.ibm.com)

llvm-svn: 312547
2017-09-05 18:08:02 +00:00
Hiroshi Inoue
c0119575f5 [PowerPC] eliminate redundant compare instruction
On PPC, if multiple conditional branches depend on the same comparison, we can execute all of them based on the result of a single compare. For example,

if (a == 0) { ... }
else if (a < 0) { ... }

can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.

This patch identifies a code sequence consisting of two such compare and conditional-branch pairs and merges the compares if possible.
To maximize the opportunity, we canonicalize the code sequence before merging the compares.
For the above example, the input for this pass looks like:

cmplwi r3, 0
beq    0, .LBB0_3
cmpwi  r3, -1
bgt    0, .LBB0_4

So, before merging two compares, we canonicalize it as

cmpwi  r3, 0       ; cmplwi and cmpwi yield the same result for beq
beq    0, .LBB0_3
cmpwi  r3, 0       ; greater than -1 means greater than or equal to 0
bge    0, .LBB0_4

The generated code should be

cmpwi  r3, 0
beq    0, .LBB0_3
bge    0, .LBB0_4

Differential Revision: https://reviews.llvm.org/D37211

llvm-svn: 312514
2017-09-05 04:15:17 +00:00
Eric Christopher
63c50e9047 Temporarily revert "Update branch coalescing to be a PowerPC specific pass"
From comments and code review it wasn't intended to be enabled by default yet.

This reverts commit r311588.

llvm-svn: 312214
2017-08-31 05:56:16 +00:00
Stefan Pintilie
76f5d80f5b [Power9] Add new instructions for floating point status and control registers.
Added the following P9 instructions: mffsce, mffscdrn, mffscdrni, mffscrn,
  mffscrni, mffsl

Differential Revision: https://reviews.llvm.org/D37167

llvm-svn: 311903
2017-08-28 18:46:01 +00:00
NAKAMURA Takumi
b40db7c573 Untabify.
llvm-svn: 311875
2017-08-28 06:47:47 +00:00
Sanjay Patel
7708490317 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming
into the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
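
As a scalar illustration of step 2 (my example; the real combine works on
vector DAG nodes), a select between two constants that differ by one reduces
to an extend plus an add, so no constant for the 'true' value needs to be
materialized:

    /* before: a select between two materialized constants */
    int select_form(_Bool c) { return c ? 43 : 42; }

    /* after the fold: zero-extend the i1 condition and add the base */
    int math_form(_Bool c) { return (int)c + 42; }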

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Lei Huang
b2307c41ea Update branch coalescing to be a PowerPC specific pass
Implement this pass as a PowerPC-specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Differential Revision: https://reviews.llvm.org/D32776

llvm-svn: 311588
2017-08-23 19:25:04 +00:00
Hiroshi Inoue
fc231e5848 [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
- recommitting after fixing a test failure on MacOS

On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But LLVM currently generates three or four instructions for this purpose (and also clobbers a GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but without this patch LLVM generates

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4
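
A rough C sketch of why two instructions suffice (an illustration, not the
patch's code): ori applies the low 16 bits of the immediate and oris the high
16 bits, so together they cover any 32-bit mask:

    /* OR with a 32-bit immediate decomposed into two 16-bit ORs,
       mirroring ori (low half) followed by oris (high half). */
    unsigned long long or_imm32(unsigned long long x, unsigned int imm) {
        x |= imm & 0xFFFFu;        /* ori:  bits 0..15  */
        x |= imm & 0xFFFF0000u;    /* oris: bits 16..31 */
        return x;
    }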

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311538
2017-08-23 08:55:18 +00:00
Hiroshi Inoue
40ee17ae6e Revert rL311526: [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
This reverts commit rL311526 due to failures on some buildbots.

llvm-svn: 311530
2017-08-23 06:38:05 +00:00
Hiroshi Inoue
e1703b792d [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But LLVM currently generates three or four instructions for this purpose (and also clobbers a GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but without this patch LLVM generates

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311526
2017-08-23 05:15:15 +00:00
Sean Fertile
a00a3d6bcb [PPC] Refine checks for emitting TOC restore nop and tail-call eligibility.
For the medium and large code models we only need to check if a call crosses
dso-boundaries when considering tail-call eligibility.

Differential Revision: https://reviews.llvm.org/D34245

llvm-svn: 311353
2017-08-21 17:35:32 +00:00
Stefan Pintilie
ea90a707df [PowerPC] Check if the pre-increment PHI Node already exists
Preparations to use the pre-increment are sometimes done in the target
independent pass Loop Strength Reduction. We try to detect them in the PowerPC
specific pass so that they are not done twice and so that we do not add PHIs
that are not required.

Differential Revision: https://reviews.llvm.org/D36736

llvm-svn: 311332
2017-08-21 13:36:18 +00:00
Lei Huang
c0f9188920 [PowerPC] Add codegen for VSX word extract convert to FP
Add codegen for VSX word extract conversion from signed/unsigned to single/double
precision.

For UINT_TO_FP:
Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239.
Here we will add the missing integer extraction and conversion to double. This
utilizes the new P9 instruction xxextractuw to extract an integer element
when the result will be converted to double, thereby saving 2 direct moves
(VSR <-> GPR).
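
A source-level illustration of the pattern (an assumed example using the
GCC/Clang vector extension, not taken from the patch):

    typedef unsigned int v4u32 __attribute__((vector_size(16)));

    /* Convert one unsigned word of a vector to double; with the patch the
       element is extracted with xxextractuw instead of being shuttled
       through a GPR with two direct moves. */
    double word_to_double(v4u32 v) { return (double)v[2]; }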

For SINT_TO_FP:
We will implement the following sequence which will also reduce the number of
instructions by saving 2 direct moves.

v4i32->f32:
        xxspltw
        xvcvsxwsp
        xscvspdpn

v4i32->f64:
        xxspltw
        xvcvsxwdp

Differential Revision: https://reviews.llvm.org/D35859

llvm-svn: 310866
2017-08-14 18:09:29 +00:00
Chandler Carruth
e390910302 [PowerPC] Revert r310346 (and followups r310356 & r310424) which
introduce a miscompile bug.

There appears to be a bug where the generated code to extract the sign
bit doesn't work correctly for 32-bit inputs. I've replied to the
original commit pointing out the problem. I think I see by inspection
(and reading the manual for PPC) how to fix this, but I can't be 100%
confident and I also don't know what the best way to test this is.
Currently it seems nearly impossible to get the backend to hit this code
path, but the patch author is likely in a better position to craft such
test cases than I am, and based on where the bug is it should be easily
done.

Original commit message for r310346:
"""
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE

Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it
adds the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048
"""

llvm-svn: 310809
2017-08-14 03:41:00 +00:00
Krzysztof Parzyszek
3786e693af Add "Restored" flag to CalleeSavedInfo
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.

Differential Revision: https://reviews.llvm.org/D36160

llvm-svn: 310619
2017-08-10 16:17:32 +00:00
Nemanja Ivanovic
27b5e26d4e My commit r310346 introduced some valid warnings. This cleans them up.
llvm-svn: 310424
2017-08-08 22:17:31 +00:00
Nemanja Ivanovic
95f3a7ac95 [PowerPC] Don't crash on larger splats achieved through 1-byte splats
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.

Differential Revision: https://reviews.llvm.org/D35650

llvm-svn: 310358
2017-08-08 13:52:45 +00:00
Nemanja Ivanovic
4fc5c41e66 Appease compilers that have the -Wcovered-switch-default switch.
llvm-svn: 310356
2017-08-08 12:41:56 +00:00
Nemanja Ivanovic
e6ddca8e6d [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE
Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds
the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048

llvm-svn: 310346
2017-08-08 11:20:44 +00:00
Rafael Espindola
2b2c232b43 Fix the ppc jit tests.
llvm-svn: 309921
2017-08-03 04:52:45 +00:00
Rafael Espindola
f2011a3ae7 Delete Default and JITDefault code models
IMHO it is an antipattern to have an enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if it has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Stefan Pintilie
12190a0ab7 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.
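
A hedged C sketch of the kind of loop these instructions can vectorize (my
example, not from the commit):

    /* Byte-wise absolute difference, the pattern VABSDUB targets;
       VABSDUH and VABSDUW are the halfword and word variants. */
    void absdiff_u8(unsigned char *out, const unsigned char *a,
                    const unsigned char *b, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = a[i] > b[i] ? (unsigned char)(a[i] - b[i])
                                 : (unsigned char)(b[i] - a[i]);
    }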

Differential Revision: https://reviews.llvm.org/D34684

llvm-svn: 309876
2017-08-02 20:07:21 +00:00
Hiroshi Inoue
0930805caf [PowerPC] Change method names; NFC
Changed method names based on the discussion in https://reviews.llvm.org/D34986:
getInt64 -> selectI64Imm,
getInt64Count -> selectI64ImmInstrCount.

llvm-svn: 309541
2017-07-31 06:27:09 +00:00
Hiroshi Inoue
4a193c9d63 [PowerPC] enable optimizeCompareInstr for branch with static branch hint
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.

Differential Revision: https://reviews.llvm.org/D35801

llvm-svn: 309255
2017-07-27 08:14:48 +00:00
Peter Collingbourne
b84653db71 Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This was a use-after-free waiting to happen.

llvm-svn: 309159
2017-07-26 19:15:29 +00:00
Stefan Pintilie
37e1fd9bb4 [NFC] test commit.
Added a comment to explain how to add a PPCISD node.

llvm-svn: 309114
2017-07-26 13:44:59 +00:00
Eric Christopher
f36585bcba Update the comments on default subtargets based on feedback.
llvm-svn: 309041
2017-07-25 22:21:08 +00:00
Nemanja Ivanovic
a1f6d5e3d0 [PowerPC] Pretty-print CR bits the way the binutils disassembler does
This patch just adds printing of CR bit registers in a more human-readable
form akin to that used by the GNU binutils.

Differential Revision: https://reviews.llvm.org/D31494

llvm-svn: 309001
2017-07-25 18:26:35 +00:00
Nemanja Ivanovic
dcf97018cb [PowerPC] - Recommit r304907 now that the issue has been fixed
This is just a recommit since the issue that the commit exposed is now
resolved.

llvm-svn: 308995
2017-07-25 17:54:51 +00:00
Guozhi Wei
0252b5a66a [PPC] Add Defs = [CARRY] to MIR SRADI_32
MIR SRADI uses the instruction template XSForm_1rc, which declares Defs = [CARRY]. But MIR SRADI_32 uses the instruction template XSForm_1, which doesn't declare such an implicit definition. With patch D33720, this causes wrong code generation for perl.

This patch adds the implicit definition.

Differential Revision: https://reviews.llvm.org/D35699

llvm-svn: 308780
2017-07-21 21:06:08 +00:00
Jonas Paulsson
c38a4eb7d4 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Hiroshi Inoue
309f36a350 fix formatting issue; NFC
llvm-svn: 308305
2017-07-18 13:31:40 +00:00
Eric Christopher
45b56e99bf Add a set of comments explaining why getSubtargetImpl() is deleted on these targets.
llvm-svn: 307999
2017-07-14 04:33:43 +00:00
Nemanja Ivanovic
9190dc4f73 [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encodes a quad-word
displacement, a displacement that is not a multiple of 16 is meaningless and
ends up being encoded incorrectly.
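
A minimal sketch of the constraint (illustrative only; the real check lives in
the PPC instruction selection code):

    /* DQ-form displacements are encoded as a signed quad-word count, so
       only byte offsets that are multiples of 16 are representable. */
    int is_valid_dq_form_disp(long disp) {
        return disp % 16 == 0;
    }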

Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Differential Revision: https://reviews.llvm.org/D35007

llvm-svn: 307934
2017-07-13 18:17:10 +00:00
Rafael Espindola
d77365c081 Fully fix the movw/movt addend.
The issue is not whether the value is pcrel; it is whether we have a
relocation or not.

If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.

llvm-svn: 307730
2017-07-11 23:18:25 +00:00
Tony Jiang
2a8d3d1229 [PPC] Fix two bugs in frame lowering.
1. The program storage region of the red zone available to compilers is 288
 bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
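
A worked illustration of the second point (the values are mine, not from the
patch): for x = -244 and n = 16, x & ~(n-1) gives -256 (rounding the offset
down, so enough space is reserved), while (x + (n-1)) & ~(n-1) gives -240
(rounding up, reserving too little):

    /* Align a possibly negative byte offset down to a multiple of n,
       where n is a power of two.  align_down(-244, 16) == -256. */
    long align_down(long x, long n) {
        return x & ~(n - 1);
    }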

Differential Revision: https://reviews.llvm.org/D34337

llvm-svn: 307672
2017-07-11 16:42:20 +00:00
Hiroshi Inoue
5a33df74aa fix formatting; NFC
llvm-svn: 307662
2017-07-11 15:41:31 +00:00
Hiroshi Inoue
6fa83f356e [PowerPC] fix latency for simple integer instructions in POWER9 scheduler
In the POWER9 instruction scheduler, the SchedWriteRes entries for simple integer instructions are misconfigured to use those of the (costly) DFU instructions.
This results in surprisingly long instruction latency estimates and causes misbehavior in some optimizers, such as if-conversion.

Differential Revision: https://reviews.llvm.org/D34869

llvm-svn: 307624
2017-07-11 05:37:16 +00:00
Hiroshi Inoue
4693e1825c [PowerPC] avoid redundant analysis while lowering an immediate; NFC
This patch reduces compilation time by avoiding redundant analysis while selecting instructions to create an immediate.
If the instruction count required to create the input number without rotate is 2, we do not need further analysis to find a shorter instruction sequence with rotate; rotate + load constant cannot be done in one instruction (i.e. getInt64CountDirect never returns 0).
This patch should not change functionality.

Differential Revision: https://reviews.llvm.org/D34986

llvm-svn: 307623
2017-07-11 05:28:26 +00:00
Tony Jiang
2bbff7b22b [PPC CodeGen] Expand the bitreverse.i64 intrinsic.
Differential Revision: https://reviews.llvm.org/D34908
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

llvm-svn: 307563
2017-07-10 18:11:23 +00:00
Lei Huang
99263ca636 [PowerPC] Reduce register pressure by not materializing a constant just for use as an index register for X-Form loads/stores.
For this example:
float test (int *arr) {
    return arr[2];
}

We currently generate the following code:
  li r4, 8
  lxsiwax f0, r3, r4
  xscvsxdsp f1, f0

With this patch, we will now generate:
  addi r3, r3, 8
  lxsiwax f0, 0, r3
  xscvsxdsp f1, f0

Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027

llvm-svn: 307553
2017-07-10 16:44:45 +00:00
Hiroshi Inoue
f127dd99e3 fix typos in comments and error messages; NFC
llvm-svn: 307533
2017-07-10 12:44:25 +00:00
Hiroshi Inoue
0df97f9412 fix formatting; NFC
llvm-svn: 307523
2017-07-10 06:32:52 +00:00
Lei Huang
eab61d9acf [PowerPC] NFC : Common up definitions of isIntS16Immediate and update parameter to int16_t
llvm-svn: 307442
2017-07-07 21:12:35 +00:00
Tony Jiang
e91b1f0847 [PPC CodeGen] Expand the bitreverse.i32 intrinsic.
Differential Revision: https://reviews.llvm.org/D33572
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

llvm-svn: 307413
2017-07-07 16:41:55 +00:00