43042 Commits

Author SHA1 Message Date
Simon Pilgrim
cbdd3bd87f [X86] Regenerate scalar stack reload test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295195 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 16:48:45 +00:00
Simon Pilgrim
64bd85693f [X86] Regenerate i64 ext-load on 32-bit target tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295177 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 14:06:17 +00:00
Simon Pilgrim
a0d03b22c4 [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs
Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295169 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 11:46:15 +00:00
Sagar Thakur
22d520c2ac [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295164 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 10:48:11 +00:00
Daniel Jasper
96670d1a33 Revert r295110 and r295144.
This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295162 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 09:56:08 +00:00
Craig Topper
53bbf700f8 [X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.

As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.

I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.

This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused.

Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0.

Reviewers: delena, RKSimon, zvi

Reviewed By: zvi

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D28747

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295155 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 06:58:47 +00:00
Craig Topper
fc3e843620 [AVX-512] Add PACKSS/PACKUS instructions to load folding tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295154 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 06:51:39 +00:00
Peter Collingbourne
fe08334b47 SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge.
Differential Revision: https://reviews.llvm.org/D29976

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295145 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 03:01:11 +00:00
Stanislav Mekhanoshin
237ec36765 [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroups
This patch corrects the maximum workgroups per CU if we have big
workgroups (more than 128). This calculation contributes to the
occupancy calculation in respect to LDS size.

Differential Revision: https://reviews.llvm.org/D29974

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295134 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 01:03:59 +00:00
Dimitry Andric
193dc0bf84 Disable wrapping llvm-xray YAML output
Summary:
The YAML output produced by llvm-xray is supposed to be wrapped at the
arbitrary default of 70 columns set by `yaml:Output`.  Unfortunately,
the wrapping is rather unpredictable, and can easily go past the set
number of columns, depending on the execution environment.

To make the YAML output environment-independent, disable wrapping
instead.

Reviewers: dberris

Reviewed By: dberris

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D29962

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295116 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 22:49:49 +00:00
Easwaran Raman
e9b25f27d9 Fix a bug in caller's BFI update code after inlining.
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.

Differential Revision: https://reviews.llvm.org/D29696

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295115 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 22:49:28 +00:00
Peter Collingbourne
f2e64c5579 WholeProgramDevirt: Change internal vcall data structures to match summary.
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).

This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
  types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).

Differential Revision: https://reviews.llvm.org/D29744

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295110 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 22:12:23 +00:00
Taewook Oh
38fa7051aa [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of

```
  1 typedef struct _node_t {
  2   struct _node_t *next;
  3 } node_t;
  4
  5 extern node_t *root;
  6
  7 int foo() {
  8   node_t *node, *tmp;
  9   int ret = 0;
 10
 11   node = tmp = root->next;
 12   while (node != root) {
 13     while (node) {
 14       tmp = node;
 15       node = node->next;
 16       ret++;
 17     }
 18   }
 19
 20   return ret;
 21 }
```

, below is the basicblock corresponding to line 12 after Reassociate expressions pass:

```
while.cond:                                       ; preds = %while.cond2, %entry
  %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
  %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
  tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
  tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
  %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
  br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```

As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is

```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```

, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.

Reviewers: dblaikie, samsonov, eugenis

Reviewed By: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29867

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295106 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 21:10:40 +00:00
Reid Kleckner
f3470b10da [BranchFolding] Tail common all identical unreachable blocks
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.

Reviewers: hans, iteratee, haicheng

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295105 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 21:02:24 +00:00
Tim Northover
07fb294d72 GlobalISel: deal with new G_PTR_MASK instruction on AArch64.
It's just an AND-immediate instruction for us, surprisingly simple to select.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295104 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 20:56:29 +00:00
Tim Northover
173c7d4750 GlobalISel: introduce G_PTR_MASK to simplify alloca handling.
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 20:56:18 +00:00
Vedant Kumar
dca651ff6c Re-apply "[profiling] Remove dead profile name vars after emitting name data"
This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.

I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.

Original commit message:

The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.

Fix the issue by throwing away the name constants as soon as we're done with
them.

Differential Revision: https://reviews.llvm.org/D29921

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295099 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 20:03:48 +00:00
Wolfgang Pieb
5c49cf1beb Reapply r294532, reverted in r294787.
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with 
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.

This fixes PR31935.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295093 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 19:08:45 +00:00
Vedant Kumar
a3d1bca20e Revert "[profiling] Remove dead profile name vars after emitting name data"
This reverts commit r295084. There is a test failure on:

http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295092 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 19:08:39 +00:00
Vedant Kumar
21ea8f286f [profiling] Remove dead profile name vars after emitting name data
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.

Fix the issue by throwing away the name constants as soon as we're done
with them.

Differential Revision: https://reviews.llvm.org/D29921

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295084 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 18:48:48 +00:00
Taewook Oh
17063d9322 Do not apply redundant LastCallToStaticBonus
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.

If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.

Reviewers: chandlerc, eraman

Reviewed By: eraman

Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29169

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295075 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 17:30:05 +00:00
Brian Cain
24ee76184b Correct a typo, s/hosting/hoisting/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295066 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 16:41:10 +00:00
Matthew Simpson
06f1a4b52b Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.

Differential Revision: https://reviews.llvm.org/D29847

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295063 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 16:28:32 +00:00
Simon Pilgrim
2fce16a04e [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs
Add support for specifying an UNPCK input as UNDEF


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295061 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 16:22:04 +00:00
Igor Laevsky
0ef69ab99f [SCEV] Cache results during GetMinTrailingZeros query
Differential Revision: https://reviews.llvm.org/D29759



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295060 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 15:53:12 +00:00
Simon Pilgrim
fbe71f8404 [X86][SSE] Add shuffle combine tests showing missed opportunities to use UNPCK
Not correctly using UNDEF or ZERO inputs to combine to UNPCK shuffles

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295059 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 15:49:37 +00:00
Simon Pilgrim
5090868cab [X86][SSE] Regenerate intrinsic upgrade tests
Remove excess semicolons

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295058 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 15:29:50 +00:00
Alexey Bataev
2cc74041bd [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put
back into a vector

Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because though `%x0` and `%x1` have 2 users each other, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.

Differential Revision: https://reviews.llvm.org/D29900

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295056 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 15:20:48 +00:00
Alexander Timofeev
23db8abf86 Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 14:29:05 +00:00
Alexey Bataev
a4e9174165 [SLP] Additional tests for extractelement cost fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295050 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 12:52:05 +00:00
Simon Pilgrim
10283cd1fa [X86][SSE] Test case showing missed PSHUFB target shuffle constant fold opportunity.
It also shows an unnecessary pshufb/broadcast being used - the original pshufb mask only requested the lowest byte.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295046 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 11:20:11 +00:00
Karl-Johan Karlsson
ad0908f33b Revert "[LoopVectorize] Added address space check when analysing interleaved accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295042 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson
26a008fb2f [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295038 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 08:14:06 +00:00
Craig Topper
6383c00b0a [AVX-512] Add PAVGB/PAVGW to load folding tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295035 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 06:54:57 +00:00
Mikael Holmen
828d0d9643 [LSR] Pointers with different address spaces are considered incompatible.
Summary:
Function isCompatibleIVType is already used as a guard before the call to

 SE.getMinusSCEV(OperExpr, PrevExpr);

in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.

Reviewers: qcolombet, eli.friedman

Reviewed By: qcolombet

Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295033 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 06:37:42 +00:00
Peter Collingbourne
38e3958af5 ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295021 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 03:42:38 +00:00
Philip Reames
33ee99ac22 [LICM] Make store promotion work in the face of unordered atomics
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.

Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.

It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
  load i128 ...
  if (can lower i128 atomic)
    store atomic i128 ...
  else
    store i128
}

It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. 

It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.

Differential Revision: https://reviews.llvm.org/D15592



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295015 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 01:38:31 +00:00
Andrew Kaylor
a46c278634 [X86] Add MXCSR register
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.

Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.

Differential Revision: https://reviews.llvm.org/D29903



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295004 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 23:38:52 +00:00
Sanjay Patel
a771f08794 [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.

The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.

The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html

Differential Revision: https://reviews.llvm.org/D27855


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 23:10:51 +00:00
Amaury Sechet
1cf4c63780 Revert autogenerated check result for test/CodeGen/X86/atomic-minmax-i6432.ll as they don't regenerate cleanly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294996 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 23:00:23 +00:00
Tim Northover
13a4b7e61f GlobalISel: represent atomic loads & stores via the MachineMemOperand.
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294993 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 22:14:16 +00:00
Tim Northover
1a3fe2bfe6 MIR: parse & print the atomic parts of a MachineMemOperand.
We're going to need them very soon for GlobalISel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294992 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 22:14:08 +00:00
Arnold Schwaighofer
db4a46b079 swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294982 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 19:58:28 +00:00
Peter Collingbourne
a5035323ac IR: Type ID summary extensions for WPD; thread summary into WPD pass.
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.

Differential Revision: https://reviews.llvm.org/D29782

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294981 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 19:26:18 +00:00
Alexey Bataev
aa5c0a0385 [SLP] Test for extractelement cost fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294980 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 19:08:19 +00:00
Taewook Oh
9fdcd96d07 Make MachineBasicBlock::updateTerminator to update DebugLoc as well
Summary:
Currently MachineBasicBlock::updateTerminator simply drops DebugLoc for newly created branch instructions, which may cause incorrect stepping and/or imprecise sample profile data. Below is an example:

```
  1 extern int bar(int x);
  2
  3 int foo(int *begin, int *end) {
  4   int *i;
  5   int ret = 0;
  6   for (
  7       i = begin ;
  8       i != end ;
  9       i++)
 10   {
 11       ret += bar(*i);
 12   }
 13   return ret;
 14 }
```

Below is a bitcode of 'foo' at the end of LLVM-IR level optimizations with -O3:

```
define i32 @foo(i32* readonly %begin, i32* readnone %end) !dbg !4 {
entry:
  %cmp6 = icmp eq i32* %begin, %end, !dbg !9
  br i1 %cmp6, label %for.end, label %for.body.preheader, !dbg !12

for.body.preheader:                               ; preds = %entry
  br label %for.body, !dbg !13

for.body:                                         ; preds = %for.body.preheader, %for.body
  %ret.08 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
  %i.07 = phi i32* [ %incdec.ptr, %for.body ], [ %begin, %for.body.preheader ]
  %0 = load i32, i32* %i.07, align 4, !dbg !13, !tbaa !15
  %call = tail call i32 @bar(i32 %0), !dbg !19
  %add = add nsw i32 %call, %ret.08, !dbg !20
  %incdec.ptr = getelementptr inbounds i32, i32* %i.07, i64 1, !dbg !21
  %cmp = icmp eq i32* %incdec.ptr, %end, !dbg !9
  br i1 %cmp, label %for.end.loopexit, label %for.body, !dbg !12, !llvm.loop !22

for.end.loopexit:                                 ; preds = %for.body
  br label %for.end, !dbg !24

for.end:                                          ; preds = %for.end.loopexit, %entry
  %ret.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]
  ret i32 %ret.0.lcssa, !dbg !24
}
```

where

```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```

. As you can see, the terminator of 'entry' block, which is a loop control branch, has a DebugLoc of line 6, column 3. Howerver, after the execution of 'MachineBlock::updateTerminator' function, which is triggered by MachineSinking pass, the DebugLoc info is dropped as below (see there's no debug-location for JNE_1):

```
  bb.0.entry:
    successors: %bb.4(0x30000000), %bb.1.for.body.preheader(0x50000000)
    liveins: %rdi, %rsi

    %6 = COPY %rsi
    %5 = COPY %rdi
    %8 = SUB64rr %5, %6, implicit-def %eflags, debug-location !9
    JNE_1 %bb.1.for.body.preheader, implicit %eflags
```

This patch addresses this issue and make newly created branch instructions to keep debug-location info.

Reviewers: aprantl, MatzeB, craig.topper, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D29596

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294976 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 18:15:31 +00:00
Matthew Simpson
d1fc5442e7 Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294973 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 18:02:35 +00:00
Quentin Colombet
58a124bd02 [FastISel] Add a diagnostic to warm on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294970 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 17:38:59 +00:00
James Molloy
eef9539b00 [ARM] Fix crash caused by r294945
I'd missed a creator of FCMP nodes - duplicateCmp().

Kindly and promptly reported by Gabor Ballabas, due to his CSiBE test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294968 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 17:18:00 +00:00
Matthew Simpson
f32a3fd4af [LV] Extend trunc optimization to all IVs with constant integer steps
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.

Differential Revision: https://reviews.llvm.org/D29847

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294967 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 16:48:00 +00:00