40508 Commits

Author SHA1 Message Date
Sanjay Patel
eee7c89f77 [InstCombine] move/fix tests for adjusted min/max
I think the former 'test50' had a typo making it functionally equivalent
to the former 'test49'; changed the predicate to provide more coverage.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285706 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 16:39:30 +00:00
Tom Stellard
2b70009a94 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285704 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 16:31:48 +00:00
Sjoerd Meijer
d6c57ac268 This is a 1 character fix for an ARM build attribute test (r284571): the
purpose of the test was to have 2 different function attribute sets, but due
to a typo there was only one both with number #0.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285701 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 15:59:37 +00:00
Sanjay Patel
caa8396841 [InstCombine] fix tests for adjusted min/max
1. Delete identical tests
2. Rename tests to reflect actual functionality
3. Add comments
4. Add unsigned variants
5. Add vector variants with FIXME comments
6. Rename test file


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285699 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 15:48:30 +00:00
Simon Pilgrim
cf8d4833fd [InstCombine] Folding of shifts by the sum of positive values
This patch introduces the combine:

(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive

If both A and C2 are know to be positive then we can safely split into 2 shifts, permitting the folding of the Inner shift.

Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs as positive).

Differential Revision: https://reviews.llvm.org/D26000

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285696 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 15:40:30 +00:00
Sanjay Patel
5d99995928 [InstCombine] auto-generate better checks
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285693 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 14:38:30 +00:00
Chris Dewhurst
447ffef48b [Sparc][LEON] Test for FixFDIVSQRT erratum fix.
Note: Test is per differential review, but the other changed code in the review was for an optimisation that din't quite work. Nevertheless, the test is valid for the unoptimised version of the fix.

Differential Review: https://reviews.llvm.org/D24658

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285692 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 14:23:37 +00:00
James Molloy
9b12d6a515 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285690 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 13:37:41 +00:00
Valery Pykhtin
a66e032afa [AMDGPU] Expand vector mulhu/mulhs
Differential revision: https://reviews.llvm.org/D26077

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285684 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 10:26:48 +00:00
Nemanja Ivanovic
790687f4e0 [PowerPC] Implement vector shift builtins - llvm portion
This patch corresponds to review https://reviews.llvm.org/D26095.
Committing on behalf of Tony Jiang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285681 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 09:42:32 +00:00
Sanjay Patel
b983cb0423 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285656 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 23:28:45 +00:00
Saleem Abdulrasool
f185e91e2b CodeGen: further loosen -O0 CG for WoA division
Generate the slowest possible codepath for noopt CodeGen.  Even trying to be
clever with the negated jump can cause out-of-range jumps.  Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible.  This re-enables the
previous optimization as well as hopefully gives us working code in all cases.

Addresses PR30356!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285649 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 22:12:37 +00:00
Teresa Johnson
485ef16bd4 [ThinLTO] Disable importing and other cross-module optis at -O0
Summary:
There is no point to importing at -O0, since we won't inline. We should
also disable other cross-module optimizations.

(Plan to backport this fix to the 3.9 branch to fix PR30774)

Reviewers: pcc

Subscribers: johanengelen, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285648 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 22:12:21 +00:00
Justin Lebar
8ddde8c45f [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285642 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 21:51:42 +00:00
Kevin Enderby
e9885e072b More additional error checks for invalid Mach-O files when
the offsets and sizes of an element of the file overlaps with
another element in the Mach-O file.

This shows the approach to this testing for three elements
and contains for tests for their overlap.  Checking for all the
remain elements will be added next.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285632 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 20:29:48 +00:00
Nemanja Ivanovic
7e057dcd4e [PPC] add absolute difference altivec instructions and matching intrinsics
This patch corresponds to review https://reviews.llvm.org/D26072.
Committing on behalf of Sean Fertile.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285627 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 19:47:52 +00:00
Victor Leschuk
b33954931c DebugInfo: make DW_TAG_atomic_type valid
DW_TAG_atomic_type was already included in Dwarf.defs and emitted correctly,
however Verifier didn't recognize it as valid.
Thus we introduce the following changes:

  * Make DW_TAG_atomic_type valid tag for IR and DWARF (enabled only with -gdwarf-5)
  * Add it to related docs
  * Add DebugInfo tests

Differential Revision: https://reviews.llvm.org/D26144



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285624 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 19:09:38 +00:00
Kuba Brecka
588bf7bf31 [asan] Move instrumented null-terminated strings to a special section, LLVM part
On Darwin, simple C null-terminated constant strings normally end up in the __TEXT,__cstring section of the resulting Mach-O binary. When instrumented with ASan, these strings are transformed in a way that they cannot be in __cstring (the linker unifies the content of this section and strips extra NUL bytes, which would break instrumentation), and are put into a generic __const section. This breaks some of the tools that we have: Some tools need to scan all C null-terminated strings in Mach-O binaries, and scanning all the contents of __const has a large performance penalty. This patch instead introduces a special section, __asan_cstring which will now hold the instrumented null-terminated strings.

Differential Revision: https://reviews.llvm.org/D25026



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285619 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 18:51:58 +00:00
Nirav Dave
30e34b0f1a [MC] Make llvm-mc fail cleanly on invalid output asm variant.
Fixes PR28488.

Reviewers: rnk, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285616 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 18:36:31 +00:00
Tim Northover
d9f01d8662 GlobalISel: allow truncating pointer casts on AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285615 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 18:31:09 +00:00
Tim Northover
fa8311d1c9 GlobalISel: translate stack protector intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285614 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 18:30:59 +00:00
Artem Tamazov
86d93952ed [AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions.
Fixes Bug 30808.
Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced.
Old gfx6/7-specific def renamed to smrd_offset_8 for clarity.
Lit tests updated.

Differential Revision: https://reviews.llvm.org/D26085

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285590 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek
abc5ce4bbc [Hexagon] Don't expand mux instructions with both sources identical
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285588 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 15:45:09 +00:00
George Rimar
8043131791 Recommit r285285 - [Object/ELF] - Fixed behavior when SectionHeaderTable->sh_size is too large.
with fix: edited invalid-section-index2.elf input to pass the new check and 
fail on the same place it was intended to fail.

Original commit message:
Elf.h already has code checking that section table does not go past end of file.
Problem is that this check may not work on values greater than UINT64_MAX / Header->e_shentsize
because of calculation overflow.

Parch fixes the issue.

Differential revision: https://reviews.llvm.org/D25432


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285586 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 15:33:00 +00:00
Ulrich Weigand
c1487e1e69 [SystemZ] Rework processor feature definitions and add -mcpu=archX support
This patch implements two changes:

- Move processor feature definition into a new file SystemZFeatures.td,
  and provide explicit lists of supported and unsupported features for
  each level of the z/Architecture.  This allows specifying unsupported
  features in the scheduler definition files for each processor.

- Add optional aliases for the -mcpu processor names according to the
  level of the z/Architecture, for compatibility with other compilers
  on the platform.  The supported aliases are:
    -mcpu=arch8  equals  -mcpu=z10
    -mcpu=arch9  equals  -mcpu=z196
    -mcpu=arch10 equals  -mcpu=zEC12
    -mcpu=arch11 equals  -mcpu=z13



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285577 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 14:33:29 +00:00
Ulrich Weigand
19e305e8e7 [SystemZ] Correctly diagnose missing features in AsmParser
Currently, when using an instruction that is not supported on the
currently selected architecture, the LLVM assembler is likely to
diagnose an "invalid operand" instead of a "missing feature".

This is because many operands require a custom parser in order to
be processed correctly, and if an instruction is not available
according to the current feature set, the generated parser code
will also not detect the associated custom operand parsers.

Fixed by temporarily enabling all features while parsing operands.
The missing features will then be correctly detected when actually
parsing the instruction itself.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285575 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 14:25:05 +00:00
Ulrich Weigand
b12a0a51d5 [SystemZ] Fix encoding of MVCK and .insn ss
LLVM currently treats the first operand of MVCK as if it were a
regular base+index+displacement address.  However, it is in fact
a base+displacement combined with a length register field.

While the two might look syntactically similar, there are two
semantic differences:
- %r0 is a valid length register, even though it cannot be used
  as an index register.
- In an expression with just a single register like 0(%rX), the
  register is treated as base with normal addresses, while it is
  treated as the length register (with an empty base) for MVCK.

Fixed by adding a new operand parser class BDRAddr and reworking
the assembler parser to distinguish between address + length
register operands and regular addresses.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285574 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 14:21:36 +00:00
Dorit Nuzman
3aa311854a Second attempt at r285517.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285568 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 13:17:31 +00:00
Alexey Bataev
7af02fcbc9 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285564 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 12:10:53 +00:00
Manuel Klimek
8b9bc3d4c1 Add triple to test so it does not fail on windows.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285560 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 11:40:14 +00:00
Manuel Klimek
ef99bbdeeb Delete .s file that did not test anything, and check in test that works.
In D26098, Davide Italiano submitted a .s file instead of the .ll file
that was the last stage of the review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285559 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 11:18:39 +00:00
Craig Topper
ddcef0e5a4 [AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285546 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 05:55:57 +00:00
Sanjay Patel
d43c4b8bfc [DAG] x | x --> x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285522 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 18:19:35 +00:00
Sanjay Patel
73a78bf8d3 [DAG] x & x --> x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285521 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 18:13:30 +00:00
Sanjay Patel
6c05e2a692 [x86] add tests for basic logic op folds
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285520 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 18:04:19 +00:00
Dorit Nuzman
6d3c9bdc8f Revert r285517 due to build failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285518 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 14:34:57 +00:00
Dorit Nuzman
b10d927158 [LoopVectorize] Make interleaved-accesses analysis less conservative about
possible pointer-wrap-around concerns, in some cases.

Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.

Differential Revision: https://reviews.llvm.org/D25276



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285517 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 12:23:26 +00:00
Teresa Johnson
39970a7476 [ThinLTO] Correctly resolve linkonce when importing aliasee
Summary:
When we have an aliasee that is linkonce, while we can't convert
the non-prevailing copies to available_externally, we still need to
convert the prevailing copy to weak. If a reference to the aliasee
is exported, not converting a copy to weak will result in undefined
references when the linkonce is removed in its original module.

Add a new test and update existing tests.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26076

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285512 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-30 05:15:23 +00:00
Sanjay Patel
14ebdf2999 [ValueTracking] recognize more variants of smin/smax
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202. 
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.

The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.

Differential Revision: https://reviews.llvm.org/D26091


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285499 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 16:21:19 +00:00
Sanjay Patel
c69fe17c4b [x86] add tests for smin/smax matchSelPattern (D26091)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285498 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 16:02:57 +00:00
Sanjay Patel
9978e17243 [InstCombine] re-use bitcasted compare operands in selects (PR28001)
These mixed bitcast patterns show up with SSE/AVX intrinsics because we bitcast function parameters to <2 x i64>.

The bitcasts obfuscate the expected min/max forms as shown in PR28001:
https://llvm.org/bugs/show_bug.cgi?id=28001#c6

Differential Revision: https://reviews.llvm.org/D25943


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285495 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 15:22:04 +00:00
Simon Pilgrim
6722cc3c8a [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285494 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 11:29:39 +00:00
Elena Demikhovsky
2fd63302fe Fixed FMA + FNEG combine.
Masked form of FMA should be omitted in this optimization.

Differential Revision: https://reviews.llvm.org/D25984



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285492 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 08:44:46 +00:00
Matt Arsenault
ac5efca3f0 AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 04:05:06 +00:00
Rui Ueyama
d5b5d46b95 Do not print out Flags field twice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285481 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 23:57:37 +00:00
Davide Italiano
c5763946b3 [DAGCombiner] Fix a crash visiting AND nodes.
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.

Differential Revision:  https://reviews.llvm.org/D26098

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285480 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 23:55:32 +00:00
Tom Stellard
b15bbca10c AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 23:53:48 +00:00
Justin Lebar
27d02ea698 Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:56:07 +00:00
Matt Arsenault
d6028cdcc7 AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285463 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:55:15 +00:00
Justin Lebar
30c499dfda [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285461 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:44:00 +00:00