Commit Graph

17600 Commits

Author SHA1 Message Date
Matt Arsenault
4fd45ebabd AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277260 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-30 01:40:36 +00:00
Weiming Zhao
7420263227 DAG: avoid duplicated truncating for sign extended operand
Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncaton if the bits to be tested are no less than origianl
bits.

Reviewers: eli.friedman

Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D22933

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277252 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 23:33:48 +00:00
Tim Northover
d6e3a6564c GlobalISel: translate "unreachable" (into nothing)
Easiest instruction ever!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277225 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 22:41:55 +00:00
Tim Northover
0f15518dae GlobalISel: support translation of intrinsic calls.
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277224 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 22:32:36 +00:00
Michael Kuperstein
53c51fa032 [X86] Match PSADBW in straight-line code
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.

This adds support for straight-line patterns, like those generated by the
SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D22889


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277219 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 21:45:51 +00:00
Michael Kuperstein
9d2d4392e4 [Hexagon] Fix test that uses -debug-only to require asserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277218 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 21:44:33 +00:00
Simon Pilgrim
6b00b5fe86 [X86][AVX] Fix VBROADCASTF128 selection bug (PR28770)
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277214 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 21:05:10 +00:00
Tim Northover
9c9955b41f CodeGen: add new "intrinsic" MachineOperand kind.
This will be used during GlobalISel, where we need a more robust and readable
way to write tests than a simple immediate ID.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277209 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 20:32:59 +00:00
Eli Bendersky
e66318d700 Add a REQUIRES: assert on a Lanai test that uses a -debug-only flag
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277204 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 19:35:22 +00:00
Simon Pilgrim
ee3cf6f987 Fixed line endings
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277199 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 18:58:57 +00:00
Andrew Kaylor
2f75f99d2f Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277189 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 18:23:18 +00:00
Kyle Butt
9f1f15e084 Codegen: MachineBlockPlacement Improve probability layout.
The following pattern was being layed out poorly:

              A
             / \
            B   C
           / \ / \
          D   E   ? (Doesn't matter)

Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)

The current algorithm gives:
A,B,C,E (D goes on worklist)

It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:

A,B,E, (C and D are in a worklist)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277187 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 18:09:28 +00:00
Kyle Butt
02e59638f8 Tests: Add branch weights to non-layout tests.
Add branch weights to a few tests that aren't testing layout to make them less
sensitive to changes in the layout algorithm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277186 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 18:09:25 +00:00
Tim Northover
57c3cc8560 GlobalISel: add generic conditional branch.
Just the basic equivalent to DAG's condbr for now, we'll get to things like
br_cc when we start doing more legalization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277184 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 17:58:00 +00:00
Krzysztof Parzyszek
83fd8feb00 [Hexagon] Testcase for not merging stores into a misaligned store
The DAG combiner will try to merge consecutive stores into a bigger
store, unless the resulting store is not fast. Misaligned vector stores
are allowed on Hexagon, but are not fast. Add a testcase to make sure
this type of merging does not occur.

Patch by Pranav Bhandarkar.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277182 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 17:55:37 +00:00
Krzysztof Parzyszek
227b764c52 Revert r277178, the actual change had already been applied
Will submit another patch with the testcase only.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277180 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 17:50:47 +00:00
Krzysztof Parzyszek
36b1b46f1c [Hexagon] Misaligned loads and stores are not fast
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.

Patch by Pranav Bhandarkar.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277178 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 17:45:16 +00:00
Ahmed Bougacha
a4174a215c [AArch64][GlobalISel] Select G_XOR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277173 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:56:25 +00:00
Ahmed Bougacha
d8a8826830 [GlobalISel] Add G_XOR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277172 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:56:20 +00:00
Ahmed Bougacha
8d4e8d2a52 [AArch64][GlobalISel] Select G_LOAD/G_STORE.
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.

This currently fails to select extloads because we have yet to
agree on a representation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277171 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:56:16 +00:00
Brendon Cahoon
c1359c9fbb MachinePipeliner pass that implements Swing Modulo Scheduling
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.

This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.

This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.

Differential Review: http://reviews.llvm.org/D16829


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277169 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:44:44 +00:00
Krzysztof Parzyszek
a6ad276d07 [Hexagon] Custom lower VECTOR_SHUFFLE and EXTRACT_SUBVECTOR for HVX
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.

E.g.
  %42 = shufflevector <32 x i16> %37, <32 x i16> %41,
                      <32 x i32> <i32 1, i32 3, ..., i32 63>
  is %42.h = vpacko(%41.w, %37.w)

Patch by Pranav Bhandarkar.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277168 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:44:27 +00:00
Krzysztof Parzyszek
1f44345f2c [Hexagon] Improve balancing of address calculation
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.

Patch by Tobias Edler von Koch.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277151 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 15:15:35 +00:00
David L Kreitzer
af3f28bc66 Avoid unnecessary 32-bit to 64-bit zero extensions following
32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly
zero extends.

Differential Revision: https://reviews.llvm.org/D22941


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277148 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 15:09:54 +00:00
Daniel Sanders
e2a16fdce2 Re-commit: [mips][fastisel] Handle 0-4 arguments without SelectionDAG.
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.

This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.

The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22680


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277136 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 12:27:28 +00:00
Simon Pilgrim
e6abaac391 [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.

Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.

We avoid performing this on AVX512 as it should have better alternative truncation instructions.

Differential Revision: https://reviews.llvm.org/D22814

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277132 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 10:23:10 +00:00
Prakhar Bahuguna
322f342e82 [Thumb] Emit Thumb move in both Thumb modes for struct_byval predicates
Summary:
The MOV/MOVT instructions being chosen for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.

Reviewers: rengolin, t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: https://reviews.llvm.org/D22865

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277128 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 09:16:46 +00:00
Craig Topper
f7938da3bf [AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277120 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 06:06:04 +00:00
Craig Topper
9e64e8e98b [AVX512] Add AVX512 run lines to some tests for scalar fma/add/sub/mul/div and regenerate. Follow up commits will bring AVX512 code up to the same quality as AVX/SSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277118 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 06:05:58 +00:00
Craig Topper
c6c1814d38 [AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX.
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277098 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 02:49:08 +00:00
Changpeng Fang
539fec5dc2 AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277073 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 23:01:45 +00:00
Krzysztof Parzyszek
1c394fcb96 Fix build breaks after r277028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277031 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 20:25:21 +00:00
Krzysztof Parzyszek
5559171657 [Hexagon] Implement MI-level constant propagation
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277028 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 20:01:59 +00:00
Krzysztof Parzyszek
18fcd7fd21 [Hexagon] Insert CFI instructions before throwing calls
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277020 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 19:13:46 +00:00
Ahmed Bougacha
e2c6755e74 [AArch64][GlobalISel] Select G_BR.
This is the first unsized instruction we support; move down the
'sized' check to binops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277007 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 17:15:15 +00:00
Ahmed Bougacha
e4fd36eb91 [MIRParser] Accept unsized generic instructions.
Since r276158, we require generic instructions to have a sized type.
G_BR doesn't; relax the restriction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277006 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 17:15:12 +00:00
Ahmed Bougacha
70d652907a [AArch64][GlobalISel] Select GPR G_SUB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277003 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:58:35 +00:00
Ahmed Bougacha
8686febe9e [AArch64][GlobalISel] Select GPR G_AND.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277002 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:58:31 +00:00
Ahmed Bougacha
e27b94c59f [GlobalISel] Remove types on selected insts instead of using LLT().
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.

Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277001 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:58:27 +00:00
Ahmed Bougacha
d59b26e1ea [AArch64][GlobalISel] Remove 'alignment' from MIR tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277000 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:58:21 +00:00
Wei Ding
ee8c4ca1e1 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276998 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:42:13 +00:00
Daniel Sanders
d1a7ed8d38 Revert r276982 and r276984: [mips][fastisel] Handle 0-4 arguments without SelectionDAG
It seems that the stack offset in callabi.ll varies between machines. I'll look
into it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276989 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 15:37:42 +00:00
Craig Topper
1314c16889 [X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own.
Differential Revision: https://reviews.llvm.org/D22799

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276987 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 15:28:56 +00:00
Daniel Sanders
5604769b28 [mips][fastisel] Handle 0-4 arguments without SelectionDAG.
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.

This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276982 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 14:55:28 +00:00
Nicolai Haehnle
b18ca96c79 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276972 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 11:39:24 +00:00
Krzysztof Parzyszek
ca740c1356 [Hexagon] Find speculative loop preheader in hardware loop generation
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-27 21:20:54 +00:00
Krzysztof Parzyszek
6d5ee09dd7 [Hexagon] Do not optimize volatile stack spill slots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-27 20:50:42 +00:00
Andrew Kaylor
67e13d65ae Revert EH-specific checks in BranchFolding that were causing blow ups in compile time.
Differential Revision: https://reviews.llvm.org/D22839



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276898 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-27 17:55:33 +00:00
Tim Northover
331274dbc8 GlobalISel: support zero-sized allocas
All allocas must be at least 1 byte at the MachineIR level so we allocate just
one byte.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276897 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-27 17:47:54 +00:00
Simon Pilgrim
30428b19cd [X86][SSE] Updated test so that both are applying the post-multiply
This is to ensure that there are no diffs other than due to buildvector/legalization

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-27 15:30:20 +00:00