1171 Commits

Author SHA1 Message Date
Simon Pilgrim
07898901df [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299201 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 11:24:16 +00:00
Eric Christopher
193628a590 Kill some trailing whitespace to make some new changes a bit easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298637 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-23 19:41:10 +00:00
Nirav Dave
11fdc7845a Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298179 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-18 00:44:07 +00:00
Reid Kleckner
1c19be8a98 Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.

In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D31052

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298010 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-16 22:59:15 +00:00
Hiroshi Inoue
a2d20c4bca Test commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297959 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-16 16:30:06 +00:00
Tim Shen
2aec349a25 Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297256 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-08 02:41:35 +00:00
Tim Shen
107e5edeba Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297124 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 07:40:10 +00:00
Nemanja Ivanovic
521df0adae [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297001 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 07:32:13 +00:00
Guozhi Wei
e225b39cd4 [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.

Differential Revision: https://reviews.llvm.org/D30362



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296811 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 21:07:59 +00:00
Nemanja Ivanovic
6981f9a951 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296771 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 17:38:59 +00:00
Tony Jiang
758e067345 [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292154 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-16 20:12:26 +00:00
Tony Jiang
748e859f36 Revert "[PowerPC] Expand ISEL instruction into if-then-else sequence."
This reverts commit 1d0e0374438ca6e153844c683826ba9b82486bb1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292131 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-16 15:01:07 +00:00
Tony Jiang
541103a1c6 [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292128 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-16 14:43:12 +00:00
Eugene Zelenko
4a9353b234 [PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291872 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-13 00:58:58 +00:00
Hal Finkel
68c84942ec [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291003 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 21:05:13 +00:00
Chandler Carruth
e87e7067ed Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.

I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 07:31:20 +00:00
Hal Finkel
e573748031 [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. This will yield subtle bugs if
interposition is attempted. As a result, regardless of whether we're in PIC
mode, we don't assume that we need to add the nop to support the possibility of
ELF interposition. However, the necessary check is in place (i.e. calling
GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for
which interposition is allowed at the IR level, we'll add the nop as necessary.
In the mean time, we'll generate more tail calls and fewer nops when compiling
position-independent code.

Differential Revision: https://reviews.llvm.org/D27231

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289638 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 07:24:50 +00:00
Guozhi Wei
9def2dd316 [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX.
This patch fixes pr31144.

Differential Revision: https://reviews.llvm.org/D27287



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289473 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-12 22:09:02 +00:00
Nemanja Ivanovic
502534bc2a [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D26023

This patch adds support for converting a vector of loads into a single load if
the loads are consecutive (in either direction).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288219 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 23:57:54 +00:00
Nemanja Ivanovic
17f24fa4e1 [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D25980

This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288218 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 23:36:03 +00:00
Nemanja Ivanovic
0b07a5ae00 Revert https://reviews.llvm.org/rL287679
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288214 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 23:00:33 +00:00
Nemanja Ivanovic
1ec5b2fb75 [PowerPC] Improvements for BUILD_VECTOR Vol. 1
This patch corresponds to review:
https://reviews.llvm.org/D25912

This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288152 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 16:11:34 +00:00
Nemanja Ivanovic
aa687a6ca9 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287679 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-22 19:02:07 +00:00
Ehsan Amiri
072e86da0c [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287329 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 10:41:44 +00:00
Joerg Sonnenberger
8833323011 Always use relative jump table encodings on PowerPC64.
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287059 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-16 00:37:30 +00:00
Tony Jiang
6ad6c513ee [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286967 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 14:25:56 +00:00
Evandro Menezes
3f647d62d1 [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286523 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-10 23:31:06 +00:00
Ehsan Amiri
300e976507 [PPC] Generate positive FP zero using xor insn instead of loading from constant area
https://reviews.llvm.org/D23614

Currently we load +0.0 from constant area. That can change to be generated using
XOR instruction.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-24 17:31:09 +00:00
Sanjay Patel
928f047b68 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284746 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 16:55:45 +00:00
Sanjay Patel
bbcb21daf0 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284513 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 18:36:49 +00:00
Sanjay Patel
5800d6e9a7 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284495 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 17:05:05 +00:00
Sanjay Patel
b60ab5d110 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283252 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 20:46:43 +00:00
Nemanja Ivanovic
d0e875cdad [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283190 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 06:59:23 +00:00
Hal Finkel
4c305bebf0 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283060 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-02 02:10:20 +00:00
Nemanja Ivanovic
7a5ffa3882 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282478 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-27 08:42:12 +00:00
Nemanja Ivanovic
a04f9019ef [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282246 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-23 13:25:31 +00:00
Nemanja Ivanovic
f2f9e2bcc5 [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282182 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 19:06:38 +00:00
Nemanja Ivanovic
a941fe247e [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282143 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 09:52:19 +00:00
Sanjay Patel
a7c48ccd3f getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281493 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 16:05:51 +00:00
Sanjay Patel
04e0167eac getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 15:43:44 +00:00
Sanjay Patel
f4559b5e2c getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281489 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 15:21:00 +00:00
Nemanja Ivanovic
7328eb7558 Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 14:19:09 +00:00
Justin Lebar
c71d5b41ef [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281151 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-11 01:38:58 +00:00
Hal Finkel
77579f5cd8 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280350 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-01 10:28:47 +00:00
Hal Finkel
557221a0e1 [PowerPC] Add support for -mlongcall
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.

Fixes PR19098.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280040 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-30 00:59:23 +00:00
Hal Finkel
e060ffb4b2 [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).

Fixes PR22923.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280022 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-29 22:25:36 +00:00
Hal Finkel
afa0d1049b [PowerPC] Implement lowering for atomicrmw min/max/umin/umax
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279933 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-28 16:17:58 +00:00
Justin Bogner
6673ea81f6 Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 05:10:15 +00:00
Chuang-Yu Cheng
7177ff558c [ppc64] Don't apply sibling call optimization if callee has any byval arg
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278900 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 03:17:44 +00:00
Pierre Gousseau
349838b560 [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.

Differential Revision: https://reviews.llvm.org/D23445

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278799 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-16 13:53:53 +00:00