Commit Graph

129504 Commits

Author SHA1 Message Date
Sanjoy Das
de765686f8 Introduce a @llvm.experimental.guard intrinsic
Summary:
As discussed on llvm-dev[1].

This change adds the basic boilerplate code around having this intrinsic
in LLVM:

 - Changes in Intrinsics.td, and the IR Verifier
 - A lowering pass to lower @llvm.experimental.guard to normal
   control flow
 - Inliner support

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html

Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264976 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 00:18:46 +00:00
Hans Wennborg
d3f401b2e9 Add some more triples after r264966
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264972 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 23:55:22 +00:00
Hans Wennborg
b667d698ac [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264966 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 23:38:01 +00:00
Matthias Braun
137f4d9ca1 CodeGen: Factor out code for tail call result compatibility check; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264959 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:46:04 +00:00
Matthias Braun
8c3b4a5a2f Avoid unnecessary #include; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264958 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:45:58 +00:00
Paul Robinson
cc01455dbf Update copyright year to 2016.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264954 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:41:06 +00:00
Matt Arsenault
e8cd894c28 AMDGPU: Add frexp_exp intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264944 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:28:52 +00:00
Matt Arsenault
f965621c5f AMDGPU: Constant folding for frexp_mant
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264943 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:28:26 +00:00
Teresa Johnson
d47ed92bc6 Use existing PrintEscapedString in AssemblyWriter
r264884 introduced a helper to escape the backslashes in the source file
path, but I since discovered an existing mechanism to escape strings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264936 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:17:28 +00:00
Peter Collingbourne
af289e0441 Cloning: Reduce complexity of debug info cloning and fix correctness issue.
Commit r260791 contained an error in that it would introduce a cross-module
reference in the old module. It also introduced O(N^2) complexity in the
module cloner by requiring the entire module to be visited for each function.
Fix both of these problems by avoiding use of the CloneDebugInfoMetadata
function (which is only designed to do intra-module cloning) and cloning
function-attached metadata in the same way that we clone all other metadata.

Differential Revision: http://reviews.llvm.org/D18583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264935 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:05:13 +00:00
Sanjay Patel
c3260cad54 fix typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264933 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:38:20 +00:00
Aaron Ballman
59a288b05f Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264929 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:30:00 +00:00
Matt Arsenault
af539c40a3 LegalizeDAG: Don't replace vector store with integer if not legal
For the same reason as the corresponding load change.

Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:15:18 +00:00
Matt Arsenault
532e88a080 LegalizeDAG: Don't replace vector load with integer unless legal
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.

Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264927 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:15:10 +00:00
David Majnemer
82ef54d4b0 [IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.

This fixes PR27133.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264926 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:12:06 +00:00
Justin Lebar
5fa9211ba4 Add #include <functional> to PassManagerBuilder, now that it uses std::function. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264923 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:52:40 +00:00
Simon Pilgrim
8f62cbea76 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264922 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:52:24 +00:00
Justin Lebar
6feb1718ed [NVPTX] Make NVVMReflect a function pass.
Summary:
Currently it's a module pass.  Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.

Reviewers: rnk

Subscribers: tra, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18615

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:40:11 +00:00
Justin Lebar
3daeed4a53 [PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer.
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style.  (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)

No functional change, and the new API is backwards-compatible.

Reviewers: chandlerc

Subscribers: joker.eph, cfe-commits

Differential Revision: http://reviews.llvm.org/D18613

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264918 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:39:29 +00:00
Justin Bogner
18af5051ba test: Remove a test for a transform that hasn't existed in 5 years.
The TailDup transform was removed in r138841 in 2011, along with most
of the tests for it. This test, however, was missed. Probably because
it had already been XFAIL'd for 3 years at that point (since r52243!)
and continued to fail when the opt flag for -tailduplicate stopped
being valid.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:36:07 +00:00
Hal Finkel
f5526fc013 Add a copy constructor to StringMap
There is code under review that requires StringMap to have a copy constructor,
and this makes StringMap more consistent with our other containers (like
DenseMap) that have copy constructors.

Differential Revision: http://reviews.llvm.org/D18506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264906 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 19:54:56 +00:00
Hal Finkel
dfdada0adb [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264904 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 19:37:08 +00:00
Rong Xu
37444017a2 [PGO] PGOFuncName in LTO optimizations
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.

Also add a new map that maps PGOFuncName's MD5 value to the function definition.

Differential Revision: http://reviews.llvm.org/D17895





git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:37:52 +00:00
Reid Kleckner
564787f18e [cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version
Credit to Aaron Ballman for thinking of this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264886 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:19:39 +00:00
Teresa Johnson
2dec5fc2c0 Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This restores commit 264869, with a fix for windows bots to properly
escape '\' in the path when serializing out. Added test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264884 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:15:08 +00:00
Chad Rosier
21db7b79c9 [AArch64] Fix warnings pointed out by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:08:51 +00:00
Reid Kleckner
db91435d44 [cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t
What we are really trying to do here is to figure out if we are using
the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
define a version macro that we can check directly. Instead I wrote a
check to see if char16_t works.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264881 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 17:30:26 +00:00
Reid Kleckner
f3ba4d1f7a [cmake] Allow EH usage with clang-cl
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264880 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 17:28:21 +00:00
Rong Xu
34be7e62f2 [PGO] Use ArrayRef in annotateValueSite()
Using ArrayRef in annotateValueSite's parameter instead of using an array
and it's size.

Differential Revision: http://reviews.llvm.org/D18568


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264879 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:56:31 +00:00
Tom Stellard
7725fd8c02 AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264877 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:13 +00:00
Tom Stellard
d3adac51fc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264876 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:09 +00:00
Jonas Paulsson
5254fb9fa5 [SystemZ] Add nop and nopr InstAliases.
For compatability with GAS, nop and nopr are recognized as alises for
bc and bcr, respectively. A mask of 0 turns these instructions
effectively into no-operations.

Reviewed by Ulrich Weigand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264875 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:11:58 +00:00
Nirav Dave
e0d3b8510d Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264872 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 15:41:12 +00:00
Teresa Johnson
e1eb43b8cf Revert "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This reverts commit r264869. I am seeing Windows bot failures due to the
"\" in the path being mishandled at some point (seems to be interpreted
wrongly at some point and llvm-as | llvm-dis is yielding some junk
characters). Need to investigate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264871 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 15:16:04 +00:00
Simon Pilgrim
5a4fec24e3 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264870 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 14:14:00 +00:00
Teresa Johnson
a6e6ae2836 [ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly
Summary:
This change serializes out and in the SourceFileName to LLVM assembly
so that it is preserved through "llvm-dis | llvm-as". This is
necessary to ensure that the global identifiers created for local values
in the module summary index are the same even if the bitcode is
streamed out and read back from LLVM assembly.

Serializing the summary itself to LLVM assembly is in progress.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18588

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264869 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 14:00:02 +00:00
Simon Pilgrim
b85e39b653 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264867 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 13:55:00 +00:00
Benjamin Kramer
4a03fa3aa8 [NVPTX] Avoid temporary std::string and make single-use function local to the cpp file.
No functionality change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264861 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 12:31:51 +00:00
Marianne Mailhot-Sarrasin
d9529c5cae gold-plugin: Fixed typo in an error message.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264860 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 12:20:53 +00:00
Simon Pilgrim
c079cbfd8b [X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264857 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 11:43:26 +00:00
James Molloy
cd309008e2 [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.

However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018.

This fixes PR17018.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264852 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 10:11:43 +00:00
Chandler Carruth
e56cb5ff31 [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264845 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 08:41:59 +00:00
Craig Topper
40e9bee60f [CodeGen] Mark EVT:getExtendedSizeInBits() as LLVM_READONLY.
I think I had tried this a long time back and some bots failed. Hoping that was with an older gcc and maybe now it will work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264840 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 05:26:43 +00:00
Jingyue Wu
8c5f0de0ab [docs] Add gpucc publication and tutorial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264839 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 05:05:40 +00:00
Duncan P. N. Exon Smith
540e0f57f5 IR: Constify LLVMContext::discardValueNames, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264823 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 04:32:29 +00:00
Duncan P. N. Exon Smith
dfdbebcd1d BitcodeReader: Fix weird whitespace, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264822 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 04:21:52 +00:00
George Burgess IV
ff01852cd7 [MemorySSA] Make the visitor more careful with calls.
Prior to this patch, the MemorySSA caching visitor would cache all
calls that it visited. When paired with phi optimization, this can be
problematic. Consider:

define void @foo() {
  ; 1 = MemoryDef(liveOnEntry)
  call void @clobberFunction()
  br i1 undef, label %if.end, label %if.then

if.then:
  ; MemoryUse(??)
  call void @readOnlyFunction()
  ; 2 = MemoryDef(1)
  call void @clobberFunction()
  br label %if.end

if.end:
  ; 3 = MemoryPhi(...)
  ; MemoryUse(?)
  call void @readOnlyFunction()
  ret void
}

When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to
cache them later. We ultimately end up not being able to optimize
passed the Phi, so we set MemoryUse(?) to point to the Phi. We then
cache the clobbering call for def 1 to be the Phi.

This commit changes this behavior so that we wipe out any calls
added to VisistedCalls while visiting the defs of a phi we couldn't
optimize.

Aside: With this patch, we now can bootstrap clang/LLVM without a
single MemorySSA verifier failure. Woohoo. :)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264820 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 03:12:08 +00:00
Chandler Carruth
1f7adda3e4 [x86] Extract a helper function to compute the full addressing mode from
an x86 MachineInstr's operands. This will be super useful to fix some
bad atomics code in my next commit.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264819 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 03:10:24 +00:00
Xinliang David Li
6be6ccad2b [PGO] Handle invoke inst in IR based icall instrumentation
Differential Revision: http://reviews.llvm.org/D18580


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264818 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 02:16:07 +00:00
George Burgess IV
2e682ed868 [MemorySSA] Change how the walker views/walks visited phis.
This patch teaches the caching MemorySSA walker a few things:

1. Not to walk Phis we've walked before. It seems that we tried to do
   this before, but it didn't work so well in cases like:

define void @foo() {
  %1 = alloca i8
  %2 = alloca i8
  br label %begin

begin:
  ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2})
  ; 1 = MemoryDef(3)
  store i8 0, i8* %2
  br label %end

end:
  ; MemoryUse(?)
  load i8, i8* %1
  ; 2 = MemoryDef(1)
  store i8 0, i8* %2
  br label %begin
}

Because we wouldn't put Phis in Q.Visited until we tried to visit them.
So, when trying to optimize MemoryUse(?):
  - We would visit 3 above
    - ...Which would make us put {%0,liveOnEntry} in Q.Visited
    - ...Which would make us visit {%0,liveOnEntry}
    - ...Which would make us put {%end,2} in Q.Visited
    - ...Which would make us visit {%end,2}
      - ...Which would make us visit 3
        - ...Which would realize we've already visited everything in 3
        - ...Which would make us conservatively return 3.

In the added test-case, (@looped_visitedonlyonce) this behavior would
cause us to give incorrect results. Specifically, we'd visit 4 twice
in the same query, but on the second visit, we'd skip while.cond because
it had been visited, visit if.then/if.then2, and cache "1" as the
clobbering def on the way back.

2. If we try to walk the defs of a {Phi,MemLoc} and see it has been
   visited before, just hand back the Phi we're trying to optimize.

I promise this isn't as terrible as it seems. :)

We now insert {Phi,MemLoc} pairs just before walking the Phi's upward
defs. So, we check the cache for the {Phi,MemLoc} pair before checking
if we've already walked the Phi.

The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache
entry if we've already fully walked it, because we cache as we go.

So, if the {Phi,MemLoc} pair isn't in cache, either:
 (a) we must be in the process of visiting it (in which case, we can't
     give a better answer in a cache-as-we-go DFS walker)

 (b) we visited it, but didn't cache it on the way back (...which seems
     to require `ModifyingAccess` to not dominate `StartingAccess`,
     so I'm 99% sure that would be an error. If it's not an error, I
     haven't been able to get it to happen locally, so I suspect it's
     rare.)

- - - - -

As a consequence of this change, we no longer skip upward defs of phis,
so we can kill the `VisitedOnlyOne` check. This gives us better accuracy
than we had before, at the cost of potentially doing a bit more work
when we have a loop.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264814 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 00:26:26 +00:00