Commit Graph

30580 Commits

Author SHA1 Message Date
Chandler Carruth
a6cc351c5b [x86] Teach the new vector shuffle lowering to use 'punpcklwd' and
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.

While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.

When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.

Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217752 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 09:02:37 +00:00
Chandler Carruth
e610c324e1 [x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.

I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.

Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:43:33 +00:00
Chandler Carruth
33957173a7 [x86] Teach the vector combiner that picks a canonical shuffle from to
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.

A bunch of the cases where we actually form 'movddup' don't actually
show up in the test results because something even later than DAG
legalization maps them back to 'unpcklpd'. If this shows back up as
a performance problem, I'll probably chase it down, but it is at least
an encoded size loss. =/

To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.

There is one test which is "regressed" by this: extractelement-load.
There, the test case where the optimization it is testing *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217738 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 22:41:37 +00:00
James Molloy
4fb9bbe72f [A57FPLoadBalancing] Modify r217689 - actually we do need to check defs
... Just make sure we check uses first so we see the kill first. It
turns out ignoring defs gives some pretty nasty runtime failures.
I'm certain this is the fix but I'm still reducing a testcase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217735 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 18:24:26 +00:00
Juergen Ributzka
5bf1f01c15 [FastISel][AArch64] Add support for non-native types for logical ops.
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.

Fixes rdar://problem/18330589.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-13 23:46:28 +00:00
Matt Arsenault
75d7f73678 Fix typo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217730 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-13 19:58:27 +00:00
Chad Rosier
995738064a [AArch64] Don't enable the post-RA MI scheduler at OptNone.
Hopefully, this will appease the bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217712 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 22:17:28 +00:00
Yaron Keren
0f39f35425 The MCAssembler.h include isn't used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 20:29:17 +00:00
Chad Rosier
4fb3a966d0 [AArch64] Enable post-RA MI scheduler.
Phabricator Revision: http://reviews.llvm.org/D5278
Patch by Sanjin Sijaric!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 17:40:39 +00:00
James Molloy
ca332457ba [A57FPLoadBalancing] Remove support for vector types
Vector MUL/MLAs have tied operands, which gives us extra constraints
that we currently can't handle. Instead of silently doing the wrong
thing, remove support to be readded later properly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217690 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 16:55:32 +00:00
James Molloy
b4fbcbc288 [A57FPLoadBalancing] Ignore <def>s when checking if a chain may be killed.
Defs are seen before uses, so a def without the kill flag doesn't necessarily
mean that the register is not killed on that instruction. It may be killed
in a later use operand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217689 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 16:55:26 +00:00
James Molloy
9204ab0456 [A57LoadBalancing] unique_ptr-ify.
Thanks to David Blakie for the in-depth review!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217682 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 14:35:17 +00:00
Zoran Jovanovic
614d8681e0 [mips][microMIPS] Implement JRADDIUSP instruction
Differential Revision: http://reviews.llvm.org/D5046


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217681 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 14:29:54 +00:00
Bill Schmidt
183704cb08 Address comments on r217622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 14:26:36 +00:00
Zoran Jovanovic
7fd9d5636a [mips][microMIPS] Implement BGEZALS and BLTZALS instructions
Differential Revision: http://reviews.llvm.org/D5004


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217678 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 13:51:58 +00:00
Zoran Jovanovic
cf6da9bed3 [mips][microMIPS] Implement JALS and JALRS instructions.
Differential Revision: http://reviews.llvm.org/D5003


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 13:43:41 +00:00
Zoran Jovanovic
75449bc4d7 [mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions
Differential Revision: http://reviews.llvm.org/D5211


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217675 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 13:33:33 +00:00
James Molloy
94c25519a2 [ARM] Teach the cost model that cross-class copies are costly.
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217674 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 13:29:40 +00:00
Patrik Hagglund
4a93e8dd02 Fix gcc -Wpedantic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 12:32:08 +00:00
Craig Topper
275458e7da Remove a temporary variable and just construct a unique_ptr directly using make_unique.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217655 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 05:17:20 +00:00
Matt Arsenault
86ffcddf42 R600/SI: Fix off by 1 error in used register count
The register numbers start at 0, so if only 1 register
was used, this was reported as 0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217636 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 22:51:37 +00:00
Bill Schmidt
24ebd0edd6 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 20:10:03 +00:00
Brad Smith
da9bce2e13 Provide an implementation of getNoopForMachoTarget for SPARC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 17:40:51 +00:00
Adam Nemet
49f31255be [AVX512] Fix miscompile for unpack
r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
between the low and the high 256 bits of src1 into the low part of the
destination and another unpack of the low and high 256 bits of src2 into the
high part of the destination.

I don't think that's how unpack works.  AVX512 unpack simply has more 128-bit
lanes but other than it works the same way as AVX.  So in each 128-bit lane,
we're always interleaving certain parts of both operands rather different
parts of one of the operands.

E.g. for this:
__v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
__v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16,
	    			       	     24, 17, 25, 20, 28, 21, 29);

we generated punpcklps (notice how the elements of a and b are not interleaved
in the shuffle).  In turn, c was set to this:

  0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29

Obviously this should have just returned the mask vector of the shuffle
vector.

I mostly reverted this change and made sure the original AVX code worked
for 512-bit vectors as well.

Also updated the tests because they matched the logic from the code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 16:51:10 +00:00
Benjamin Kramer
db414b01a3 Move constant-sized bitvector to the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:58:39 +00:00
Aaron Watry
036303c4bf R600: Add cmpxchg instruction for evergreen
Refactored the R600_LDS_1A2D class a bit to get it to actually work.

It seemed to be previously unused and broken.

We also have to disable the conversion to the noret variant for now in
R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops.

Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to
work for more than 1A1D variants of LDS operations. It's being left as a
future TODO for now.

Signed-off-by: Aaron Watry <awatry at gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217596 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:54 +00:00
Aaron Watry
c71f7e46bd R600: Add LDS_WRXCHG[_RET] instructions for Evergreen.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217594 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:49 +00:00
Aaron Watry
770b630480 R600: Add LDS_MIN_[U]INT[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217593 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:47 +00:00
Aaron Watry
bbfa901ddf R600: Add LDS_XOR[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217592 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:46 +00:00
Aaron Watry
0e16268cd7 R600: Add LDS_OR[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217591 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:44 +00:00
Aaron Watry
9efd1ebcab R600: Add LDS_AND[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217590 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:43 +00:00
Aaron Watry
8f476d22b3 R600: Add LDS_MAX_[U]INT[_RET] instructions for Evergreen
This was only present for SI before.

Cayman may still be missing, but I am unable to test that currently.

v2: Don't create atomicrmw max tests in separate file

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217589 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:41 +00:00
Matt Arsenault
5ee5d45e7e R600/SI: Fix losing chain when fixing reg class of loads.
The lost chain resulting in earlier side effecting nodes
being deleted.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 23:26:19 +00:00
Matt Arsenault
c8256c4dcb R600/SI: Report offset in correct units for st64 DS instructions
Need to convert the 64 element offset into bytes, not just the element
size like the normal case instructions.

Noticed by inspection. This can't be hit now because
st64 instructions aren't emitted during instruction selection,
and the post-RA scheduler isn't enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217560 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 23:26:16 +00:00
Matt Arsenault
257e85e7c2 R600: Custom lower frem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 21:44:27 +00:00
Rafael Espindola
3b670550ad Add doInitialization/doFinalization to DataLayoutPass.
With this a DataLayoutPass can be reused for multiple modules.

Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.

Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217548 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 21:27:43 +00:00
Gerolf Hoflehner
e7648ae68d [AArch64] Revert r216141 for cyclone
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 20:31:57 +00:00
Sanjay Patel
87c977a52b Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison
2c925e0e8f [AArch64] Address Chad's post commit review comments for r217504 (PBQP experimental support)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217518 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:03:25 +00:00
Arnaud A. de Grandmaison
ac67a5fd18 [AArch64] Pacify lld buildbot complaining about an unused static function in release build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217505 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 14:24:02 +00:00
Arnaud A. de Grandmaison
438669ca81 [AArch64] Add experimental PBQP support
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.

By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 14:06:10 +00:00
Asiri Rathnayake
3babc141b2 [AArch 64] Use a constant pool load for weak symbol references when
using static relocation model and small code model.

Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.

Patch from: Keith Walker

Reviewers: Renato Golin, Tim Northover

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217503 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 13:54:38 +00:00
Sid Manning
96597a70dc Add missing HWEncoding to base register class.
This change gives tblgen the information needed to fill in the
HexagonRegEncodingTable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217500 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 13:09:25 +00:00
Tim Northover
01dbae1163 ARM: don't size-reduce STMs using the LR register.
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.

Patch by Sergey Dmitrouk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217498 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 12:53:28 +00:00
Daniel Sanders
cfae729840 [mips] Remove inverted predicates from MipsSubtarget that were only used by MipsCallingConv.td
Summary: No functional change

Reviewers: echristo, vmedic

Reviewed By: echristo, vmedic

Subscribers: echristo, llvm-commits

Differential Revision: http://reviews.llvm.org/D5266

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217494 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 12:02:27 +00:00
Daniel Sanders
9a86695445 [mips] Return an ArrayRef from MipsCC::intArgRegs() and remove MipsCC::numIntArgRegs()
Summary: No functional change.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217485 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 10:37:03 +00:00
Yuri Gorshenin
ca31084292 [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.
Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217482 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 09:45:49 +00:00
Job Noorman
77f923cfc1 Drop the W postfix on the 16-bit registers.
This ensures the inline assembly register constraints are properly recognised in
TargetLowering::getRegForInlineAsmConstraint.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217479 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 06:58:14 +00:00
Kai Nacke
5672e68951 [MIPS] Add aliases for sync instruction used by Octeon CPU
This commit adds aliases for the sync instruction (synciobdma,
syncs, syncw, syncws) which are used by the Octeon CPU.

Reviewed by D. Sanders

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217477 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 06:10:24 +00:00
Craig Topper
c4e394a333 Use cast to MVT instead of EVT on a couple calls to getSizeInBits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 04:51:36 +00:00