Commit Graph

17468 Commits

Author SHA1 Message Date
Tom Stellard
a093ef43dd Merging r288433:
------------------------------------------------------------------------
r288433 | oranevskyy | 2016-12-01 14:58:35 -0800 (Thu, 01 Dec 2016) | 24 lines

[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0

Summary:
This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion.

Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality.

Example of the broken C++ code:
```
std::atomic<long long> at(2);

long long ll = 1;
std::atomic_compare_exchange_strong(&at, &ll, 3);
```
Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3.

The patch replaces SBC with CMPEQ.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, asl

Differential Revision: https://reviews.llvm.org/D27315

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288847 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-06 20:09:33 +00:00
Tom Stellard
b2d6212d1a Merging r288418:
------------------------------------------------------------------------
r288418 | tnorthover | 2016-12-01 13:31:59 -0800 (Thu, 01 Dec 2016) | 13 lines

AArch64: fix 128-bit cmpxchg at -O0 (again, again).

This time the issue is fortunately just a simple mistake rather than a horrible
design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for
comparing two 64-bit values, but they don't.

The fix is slightly clunkier in AArch64 because we can't use conditional
execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to
an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a
CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway.

Thanks to Anton Korobeynikov for pointing out the issue.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288846 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-06 20:09:32 +00:00
Tom Stellard
13422af5bb Merging r277755:
------------------------------------------------------------------------
r277755 | tnorthover | 2016-08-04 12:32:28 -0700 (Thu, 04 Aug 2016) | 5 lines

AArch64: don't assume all i128s are BUILD_PAIRs

It leads to a crash when they're not. I'm *sure* I've made this mistake before,
at least once.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288845 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-06 20:09:30 +00:00
Tom Stellard
de903c28a7 Revert "Merging r278268:"
This reverts commit r288454.  This was committed accidently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288456 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-02 02:06:41 +00:00
Tom Stellard
e77b40ba40 Merging r278268:
------------------------------------------------------------------------
r278268 | nhaehnle | 2016-08-10 11:51:14 -0700 (Wed, 10 Aug 2016) | 28 lines

LiveIntervalAnalysis: fix a crash in repairOldRegInRange

Summary:
See the new test case for one that was (non-deterministically) crashing
on trunk and deterministically hit the assertion that I added in D23302.
Basically, the machine function contains a sequence

     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
     %vreg14:sub1<def> = COPY %vreg14:sub0

and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
instructions into one before calling repairIntervalsInRange.

Now repairIntervalsInRange wants to repair %vreg14, in particular, and
ends up trying to repair %vreg14:sub1 as well, but that only becomes
active _after_ the range that is to be repaired, hence the crash due
to LR.find(...) == LR.begin() at the start of repairOldRegInRange.

I believe that just skipping those subrange is fine, but again, not too
familiar with that code.

Reviewers: MatzeB, kparzysz, tstellarAMD

Subscribers: llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D23303

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288454 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-02 02:04:59 +00:00
Tom Stellard
c95a7375b1 Merging r287339:
------------------------------------------------------------------------
r287339 | nhaehnle | 2016-11-18 03:55:52 -0800 (Fri, 18 Nov 2016) | 20 lines

AMDGPU: Fix legalization of MUBUF instructions in shaders

Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288106 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 03:41:29 +00:00
Tom Stellard
25e2616626 Merging r280589:
------------------------------------------------------------------------
r280589 | nhaehnle | 2016-09-03 05:26:32 -0700 (Sat, 03 Sep 2016) | 19 lines

AMDGPU: Fix an interaction between WQM and polygon stippling

Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288105 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 03:41:28 +00:00
Tom Stellard
efccb7dcb2 Merging r277504:
------------------------------------------------------------------------
r277504 | nhaehnle | 2016-08-02 12:31:14 -0700 (Tue, 02 Aug 2016) | 21 lines

AMDGPU: Stay in WQM for non-intrinsic stores

Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288104 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 03:41:26 +00:00
Tom Stellard
25c1eb0bc3 Merging r277500:
------------------------------------------------------------------------
r277500 | nhaehnle | 2016-08-02 12:17:37 -0700 (Tue, 02 Aug 2016) | 18 lines

AMDGPU: Track physical registers in SIWholeQuadMode

Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise
not be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288103 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 03:41:24 +00:00
Pawel Bylica
fadb1c1c61 Merging r286998:
------------------------------------------------------------------------
r286998 | chfast | 2016-11-15 19:29:24 +0100 (wto, 15 lis 2016) | 12 lines

Integer legalization: fix MUL expansion

Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288086 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-28 22:45:01 +00:00
Pawel Bylica
9b2f9c4eed Merging r281403:
------------------------------------------------------------------------
r281403 | chfast | 2016-09-13 23:55:41 +0200 (wto, 13 wrz 2016) | 9 lines

[CodeGen] Fix invalid shift in mul expansion

Summary: When expanding mul in type legalization make sure the type for
shift amount can actually fit the value.
This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@288085 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-28 22:41:59 +00:00
Guy Blank
56938b6ec4 merge r276347
[X86] Do not use AND8ri8 in AVX512 pattern

This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@287855 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-24 06:52:24 +00:00
Tom Stellard
11d3b2906a Merging r282182:
------------------------------------------------------------------------
r282182 | nemanja.i.ibm | 2016-09-22 12:06:38 -0700 (Thu, 22 Sep 2016) | 6 lines

[PowerPC] Sign extend sub-word values for atomic comparisons

Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@287811 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 21:17:33 +00:00
Tom Stellard
b0558963d0 Merging r279933:
------------------------------------------------------------------------
r279933 | hfinkel | 2016-08-28 09:17:58 -0700 (Sun, 28 Aug 2016) | 4 lines

[PowerPC] Implement lowering for atomicrmw min/max/umin/umax

Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@287810 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 21:17:31 +00:00
Tom Stellard
1a4e1d82fe Merging r281479:
------------------------------------------------------------------------
r281479 | nemanja.i.ibm | 2016-09-14 07:19:09 -0700 (Wed, 14 Sep 2016) | 9 lines

Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)

This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@287809 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 21:03:25 +00:00
Simon Pilgrim
ef867d47bf [3.9.1] Merging r283070 - [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

As discussed on PR30596

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@286251 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 17:01:05 +00:00
Simon Pilgrim
d5d7df5232 [3.9.1] Merging r282613 - [X86][AVX] Add test showing that VBROADCAST loads don't correctly respect dependencies
As discussed in PR30596, this is a preliminary test update before we can merge r283070

Note: This required the test to be regenerated after the merge as 3.9.1 doesn't have trunk's latest lea -> mov simplifications

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@286248 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 16:45:26 +00:00
Simon Pilgrim
46079999e9 [3.9.1] Merging r280837 [X86] Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@282753 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-29 19:16:52 +00:00
Matthias Braun
2eb3d6dc8d Cherry pick r281957 (see http://llvm.org/PR30463)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@282615 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 18:17:12 +00:00
Hans Wennborg
e87f84d870 Merging r278559:
------------------------------------------------------------------------
r278559 | efriedma | 2016-08-12 13:28:02 -0700 (Fri, 12 Aug 2016) | 7 lines

[AArch64LoadStoreOpt] Handle offsets correctly for post-indexed paired loads.

Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction.

Differential revision: https://reviews.llvm.org/D23368


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@279123 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-18 19:43:50 +00:00
Hans Wennborg
3fa944fd5e Merging r278562:
------------------------------------------------------------------------
r278562 | efriedma | 2016-08-12 13:39:51 -0700 (Fri, 12 Aug 2016) | 7 lines

[AArch64LoadStoreOptimizer] Check aliasing correctly when creating paired loads/stores.

The existing code accidentally skipped the aliasing check in edge cases.

Differential revision: https://reviews.llvm.org/D23372


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@279107 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-18 18:14:41 +00:00
Hans Wennborg
4126f23e17 Merging r278900:
------------------------------------------------------------------------
r278900 | cycheng | 2016-08-16 20:17:44 -0700 (Tue, 16 Aug 2016) | 12 lines

[ppc64] Don't apply sibling call optimization if callee has any byval arg

This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278990 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 22:03:07 +00:00
Hans Wennborg
5b9cd0413c Merging r278841:
------------------------------------------------------------------------
r278841 | haicheng | 2016-08-16 13:06:25 -0700 (Tue, 16 Aug 2016) | 3 lines

[BranchFolding] Change a test case of r278575.

Rename the operands to make the test less brittle.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278874 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 00:15:15 +00:00
Hans Wennborg
56dbbe30a2 Merging r278575 (with changes to the test):
------------------------------------------------------------------------
r278575 | haicheng | 2016-08-12 16:13:38 -0700 (Fri, 12 Aug 2016) | 6 lines

Reapply [BranchFolding] Restrict tail merging loop blocks after MBP

Fixed a bug in the test case.

To fix PR28104, this patch restricts tail merging to blocks that belong to the
same loop after MBP.
------------------------------------------------------------------------

I had to adjust the test as it wasn't passing on the branch, presumably
due to different machine block placement.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278827 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-16 17:51:12 +00:00
Hans Wennborg
23fab3af81 Merging r278370:
------------------------------------------------------------------------
r278370 | mkuper | 2016-08-11 10:38:33 -0700 (Thu, 11 Aug 2016) | 7 lines

Make TwoAddressInstructionPass::rescheduleMIBelowKill subreg-aware

This fixes PR28824.

Differential Revision: https://reviews.llvm.org/D23220


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278422 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-11 21:39:47 +00:00
Hans Wennborg
3248ce68a1 Merging r276051 and r276823:
------------------------------------------------------------------------
r276051 | arsenm | 2016-07-19 16:16:53 -0700 (Tue, 19 Jul 2016) | 8 lines

AMDGPU: Change fdiv lowering based on !fpmath metadata

If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.
------------------------------------------------------------------------

------------------------------------------------------------------------
r276823 | arsenm | 2016-07-26 16:25:44 -0700 (Tue, 26 Jul 2016) | 4 lines

AMDGPU: Use rcp for fdiv 1, x with fpmath metadata

Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278243 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-10 16:45:40 +00:00
Hans Wennborg
a3332eebf6 Merging r278002:
------------------------------------------------------------------------
r278002 | sbaranga | 2016-08-08 06:13:57 -0700 (Mon, 08 Aug 2016) | 18 lines

[AArch64] PR28877: Don't assume we're running after legalization when creating vcvtfp2fxs

Summary:
The DAG combine transformation that was generating the
aarch64_neon_vcvtfp2fxs node was assuming that all
inputs where legal and wasn't accounting that the input
could be a v4f64 if we're trying to do the transformation
before legalization. We now bail out in this case.

All illegal types besides v4f64 were already rejected.

Fixes https://llvm.org/bugs/show_bug.cgi?id=28877.

Reviewers: jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23261
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278239 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-10 16:27:26 +00:00
Hans Wennborg
51644ff06b Merging r278086:
------------------------------------------------------------------------
r278086 | matze | 2016-08-08 18:47:26 -0700 (Mon, 08 Aug 2016) | 6 lines

X86InstrInfo: Update liveness in classifyLea()

We need to update liveness information when we create COPYs in
classifyLea().

This fixes http://llvm.org/28301
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@278128 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-09 15:48:01 +00:00
Hans Wennborg
7710002274 Merging r277504:
------------------------------------------------------------------------
r277504 | nha | 2016-08-02 12:31:14 -0700 (Tue, 02 Aug 2016) | 20 lines

AMDGPU: Stay in WQM for non-intrinsic stores

Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277620 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-03 18:13:01 +00:00
Hans Wennborg
d404fac34a Merging r277500:
------------------------------------------------------------------------
r277500 | nha | 2016-08-02 12:17:37 -0700 (Tue, 02 Aug 2016) | 17 lines

AMDGPU: Track physical registers in SIWholeQuadMode

Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277619 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-03 18:09:48 +00:00
Hans Wennborg
f89a462dfb Merging r277371:
------------------------------------------------------------------------
r277371 | mkuper | 2016-08-01 12:39:49 -0700 (Mon, 01 Aug 2016) | 9 lines

[DAGCombine] Make sext(setcc) combine respect getBooleanContents

We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.

This fixes PR28504.

------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277509 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02 19:54:53 +00:00
Hans Wennborg
b55453a38e Merging r276648:
------------------------------------------------------------------------
r276648 | delena | 2016-07-25 09:51:00 -0700 (Mon, 25 Jul 2016) | 6 lines

AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors.
It failed with assertion before this patch.

Differential Revision: https://reviews.llvm.org/D22735


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277508 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02 19:41:53 +00:00
Renato Golin
358d8cce26 Merging r276701 and r277439
The saturation instructions appeared in v6T2 / DSP extensions, but they
were being accepted / generated on any, with the new introduction of the
saturation detection in the back-end. This commit restricts the usage to
v6T2 / DSP-enable only cores.

Fixes PR28607.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277440 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02 10:26:08 +00:00
Hans Wennborg
86c801f95b Merging r276435:
------------------------------------------------------------------------
r276435 | arsenm | 2016-07-22 10:01:21 -0700 (Fri, 22 Jul 2016) | 4 lines

AMDGPU: Fix i1 fp_to_int

R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277082 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 23:29:33 +00:00
Hans Wennborg
4444fc423b Merging r276119:
------------------------------------------------------------------------
r276119 | yaxunl | 2016-07-20 07:38:06 -0700 (Wed, 20 Jul 2016) | 3 lines

AMDGPU: Fix bug causing crash due to invalid opencl version metadata.

Differential Revision: https://reviews.llvm.org/D22526
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277079 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 23:18:47 +00:00
Hans Wennborg
7653e62294 Merging r275869:
------------------------------------------------------------------------
r275869 | arsenm | 2016-07-18 11:34:53 -0700 (Mon, 18 Jul 2016) | 7 lines

AMDGPU: Remove dead check in AMDGPUPromoteAlloca

This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@277077 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 23:12:00 +00:00
Hans Wennborg
7230d55ab6 Merging r275981 and r276740:
------------------------------------------------------------------------
r275981 | rksimon | 2016-07-19 08:07:43 -0700 (Tue, 19 Jul 2016) | 13 lines

[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106
------------------------------------------------------------------------

------------------------------------------------------------------------
r276740 | rksimon | 2016-07-26 03:41:28 -0700 (Tue, 26 Jul 2016) | 5 lines

[X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics

Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.

This was only unearthed when rL276102 started using the intrinsic again.....
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@276990 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 15:38:57 +00:00
Hans Wennborg
8830176842 Merging r275928 and r276438:
------------------------------------------------------------------------
r275928 | arsenm | 2016-07-18 16:09:51 -0700 (Mon, 18 Jul 2016) | 1 line

AMDGPU: Fix test name and broken CHECK-LABEL
------------------------------------------------------------------------

------------------------------------------------------------------------
r276438 | arsenm | 2016-07-22 10:01:33 -0700 (Fri, 22 Jul 2016) | 6 lines

AMDGPU: Fix groupstaticsize for large LDS

The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@276664 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-25 17:42:22 +00:00
Daniel Sanders
d1b48cb318 Merging r275967:
------------------------------------------------------------------------
r275967 | dsanders | 2016-07-19 11:49:03 +0100 (Tue, 19 Jul 2016) | 16 lines

[mips] Correct label prefixes for N32 and N64.

Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).

This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22412


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@276561 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-24 11:39:45 +00:00
Tim Northover
00d9b1b37f Merging r275866:
------------------------------------------------------------------------
r275866 | tnorthover | 2016-07-18 11:28:52 -0700 (Mon, 18 Jul 2016) | 6 lines

CodeGenPrep: use correct function to determine Global's alignment.

Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@275918 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:36:33 +00:00
Hans Wennborg
559cd2e3e7 Merging r275870:
------------------------------------------------------------------------
r275870 | arsenm | 2016-07-18 11:34:59 -0700 (Mon, 18 Jul 2016) | 1 line

AMDGPU/R600: Replace barrier intrinsics
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_39@275896 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 20:03:22 +00:00
Simon Pilgrim
c91180f272 [X86][AVX] Add target shuffle decode support for VBROADCAST
Currently we only decode broadcasts from a vector of the same size.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275823 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 17:32:59 +00:00
Krzysztof Parzyszek
4cb51c5c01 [Hexagon] Handle returning small structures by value
This is compliant with the official ABI, but allows experimentation with
calling conventions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275822 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 17:30:41 +00:00
Chih-Hung Hsieh
f94271deae [X86] Accept SELECT op code for x86-64 fp128 type
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.

Differential Revision: http://reviews.llvm.org/D21758


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275818 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 17:20:09 +00:00
Simon Pilgrim
00a0a786d0 [X86][AVX2] Added tests that demonstrate duplicate broadcasts
We don't yet decode broadcasts as a target shuffle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275808 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 16:17:34 +00:00
Krzysztof Parzyszek
a7c00b136c [Hexagon] Enable .cur formation in MISched for Hexagon V60
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.

Patch by Ikhlas Ajbar.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275804 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 16:05:27 +00:00
Nemanja Ivanovic
fe4ad6d3ea [PowerPC] Remove redundant direct moves when extracting integers and converting to FP
This patch corresponds to review:
https://reviews.llvm.org/D21354

We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275796 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 15:30:00 +00:00
Krzysztof Parzyszek
56af121d06 [Hexagon] Use timing class info as tie-breaker in machine scheduler
Patch by Sirish Pande.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275794 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 15:17:10 +00:00
Krzysztof Parzyszek
98b655feba [Hexagon] HexagonMachineScheduler should account for resources
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.

This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.

Patch by Brendon Cahoon.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275793 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 14:52:13 +00:00
Krzysztof Parzyszek
9547556e81 [Hexagon] Fix zero latency instructions with multiple predecessors
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.

Patch by Brendon Cahoon.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275790 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 14:23:10 +00:00