Commit Graph

16048 Commits

Author SHA1 Message Date
Alexei Starovoitov
2462cfbd35 BPF: emit an error message for unsupported signed division operation
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263842 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 22:02:47 +00:00
Chad Rosier
b0c326b0a1 [AArch64] Enable more load clustering in the MI Scheduler.
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing.  I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.

Differential Revision: http://reviews.llvm.org/D18048

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263819 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 19:21:02 +00:00
Nicolai Haehnle
a99f51df81 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263792 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 16:24:40 +00:00
Nicolai Haehnle
e9f50a4929 AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263791 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 16:24:31 +00:00
Nicolai Haehnle
71acdb08db AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263790 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 16:24:20 +00:00
Sam Kolton
5460c31240 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263789 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 15:35:51 +00:00
Tim Shen
0ead5321f0 [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263753 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 22:27:58 +00:00
Tim Northover
dfee902f30 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263741 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 20:10:28 +00:00
Petar Jovanovic
cd74bb6415 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263727 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 17:11:33 +00:00
Derek Schuff
3d15212029 [WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.

Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.

Differential Revision: http://reviews.llvm.org/D18234

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263725 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 17:00:29 +00:00
Changpeng Fang
6405fe8e88 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263720 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 16:43:50 +00:00
Simon Pilgrim
96ac27b20b [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (means we miss some domain corrections)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263717 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 15:59:36 +00:00
Saleem Abdulrasool
a944db98cd ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263714 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-17 14:10:49 +00:00
Nicolai Haehnle
d22ce50fea AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263658 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 20:14:33 +00:00
Simon Pilgrim
1372924ef8 [X86] Reduced alignment of widened vector load/stores to better match PR26953 cases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263649 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 18:32:44 +00:00
Simon Pilgrim
10e0424cf0 [X86] Regenerated + extended widened vector conversion tests
- Ensure we test X86 + X64
- sitopfp / uitofp requires testing for SSE2 and SSE42 as well (part of the fix for PR26953)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263640 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 15:33:43 +00:00
Igor Breger
81434f47ae AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263624 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 08:48:26 +00:00
Simon Pilgrim
a235a96895 [X86] Regenerated widen load tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263608 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 00:41:21 +00:00
Simon Pilgrim
e30740e25e [X86][SSE41] Additional tests for extracting zeroable shuffle elements
We can currently only match zeroable vector elements of the same size as the shuffle type - these tests demonstrate the problem and a solution will be shortly added in an updated D14261

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263606 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-16 00:13:36 +00:00
Quentin Colombet
c0ed3e9a27 [MIR] Add a test case for the diagnostic of a wrongly typed generic instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263573 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-15 18:31:29 +00:00
Quentin Colombet
02dabc9fee [AArch64] Move GlobalISel test cases into a GlobalISel subdirectory
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263572 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-15 18:30:00 +00:00
Changpeng Fang
53914149bf AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263563 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-15 17:28:44 +00:00
Lang Hames
875b7560fb [MachO] Add MachO alt-entry directive support.
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be layed out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:

safe_foo:
  // check preconditions for foo
.alt_entry fast_foo
fast_foo:
  // body of foo, can assume preconditions.

The .alt_entry flag is also implicitly set on assembly aliases of the form:

a = b + C

where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263521 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-15 01:43:05 +00:00
Eric Christopher
a27966ebf6 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263512 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 23:59:57 +00:00
Chad Rosier
4005bceb12 [AArch64] Break the dependency between FP and SP when possible.
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.

http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263458 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 18:17:41 +00:00
Tom Stellard
f53246799f AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263447 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 17:05:56 +00:00
Sanjay Patel
d26914da05 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.

Differential Revision: http://reviews.llvm.org/D18094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263446 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 16:54:43 +00:00
Daniel Sanders
eacb2ec057 [mips] MIPS32R6 compact branch support
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.

Patch by Simon Dardis.

Reviewers: vkalintiris, dsanders

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16353


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263444 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 16:24:05 +00:00
Marek Olsak
01d3696081 AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263441 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 15:57:14 +00:00
Ulrich Weigand
55474c1f67 [SystemZ] Avoid LER on z13 due to partial register dependencies
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).

Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE.  However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.

This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263431 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 13:50:03 +00:00
Zlatko Buljan
a4bfc57321 [mips] Fix an issue with long double when function roundl is defined
Differential Revision: http://reviews.llvm.org/D17760


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263428 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 12:50:23 +00:00
Igor Breger
aecc6a2077 AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263417 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 10:26:39 +00:00
Simon Pilgrim
38258959b8 [X86][XOP] Added target shuffle combine tests for XOP's VPPERM 2-op shuffle
Actual combing support will be added in a future patch

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263402 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 00:18:26 +00:00
Simon Pilgrim
75aa6fae3a [X86][SSE] Added truncated vector arithmetic tests.
For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands - no code to support this yet (scalar is done with SimplifyDemandedBits but adding vector support could be a lot of work) but these tests represent the current codegen status.

Example bugs: PR14666, PR22703

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263384 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-13 19:08:01 +00:00
Simon Pilgrim
b419ca4ee0 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263383 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-13 18:35:59 +00:00
Nemanja Ivanovic
6f20310e9e Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263338 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-12 10:23:07 +00:00
Quentin Colombet
f02ed558bd [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer. 

Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.

This fixes PR26883.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263325 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-12 02:25:27 +00:00
Simon Pilgrim
ddeaf238c9 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263303 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 22:18:05 +00:00
Ahmed Bougacha
cdae7a5e30 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263301 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 22:02:58 +00:00
Chad Rosier
9c9879a621 Update test case to appease bots after 263255.
I'll follow up with Matt to confirm this is the correct fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263268 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 17:33:36 +00:00
Quentin Colombet
36053724e3 [IRTranslator] Translate unconditional branches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263265 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 17:28:03 +00:00
Simon Pilgrim
91f04ade7e [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263239 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 14:39:10 +00:00
Vasileios Kalintiris
aaa219c75c [mips] MIPSR6 Instruction itineraries
Summary: Defines instruction itineraries for common MIPSR6 instructions.

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17198

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263229 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 13:05:06 +00:00
Nikolay Haustov
63cffd62c1 [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263212 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 09:27:25 +00:00
Matt Arsenault
e4e707f153 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263201 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 07:42:49 +00:00
Tim Northover
32048586b0 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263169 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 23:02:21 +00:00
Simon Pilgrim
86875b1fdf [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263159 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 20:40:26 +00:00
Artur Pilipenko
980df33d17 Support arbitrary addrspace pointers in masked load/store intrinsics
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263158 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 20:39:22 +00:00
Balaram Makam
21374d486c Fix testicase to turn buildbot green. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263154 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 19:07:50 +00:00
Nicolai Haehnle
f0eb7094d4 AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263140 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 18:43:50 +00:00