22103 Commits

Author SHA1 Message Date
Teresa Johnson
0640903d30 Handle link of NoDebug CU with a CU that has debug emission enabled
Summary:
This is an issue both with regular and Thin LTO. When we link together
a DICompileUnit that is marked NoDebug (e.g. when compiling with -g0
but applying an AutoFDO profile, which requires location tracking
in the compiler) and a DICompileUnit with debug emission enabled,
we can have failures during dwarf debug generation. Specifically,
when we have inlined from the NoDebug compile unit into the debug
compile unit, we can fail during construction of the abstract and
inlined scope DIEs. This is because the SPMap does not include NoDebug
CUs (they are skipped in the debug_compile_units_iterator).

This patch fixes the failures by skipping locations from NoDebug CUs
when extracting lexical scopes.

Reviewers: dblaikie, aprantl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29765

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295384 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-17 00:21:19 +00:00
Benjamin Kramer
878598debf [MachinePipeliner] Remove redundant destructor. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295372 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 20:26:51 +00:00
David Blaikie
b42dbbf6b5 Refactor DebugHandlerBase a bit to common non-debug-having-function filtering
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295354 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 18:48:33 +00:00
Artur Pilipenko
e97e1b5918 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit -r295314 with PowerPC and AMDGPU tests updated.

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295336 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 17:07:27 +00:00
Artur Pilipenko
914d7a67a3 Revert -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some of AMDGPU and PowerPC tests to fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295316 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 13:04:46 +00:00
Artur Pilipenko
40edfac454 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295314 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 12:53:26 +00:00
Diana Picus
cb363f5625 [ARM] GlobalISel: Lower double precision FP args
For the hard float calling convention, we just use the D registers.

For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
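
As a rough illustration of the soft-fp case above, the sketch below splits a
64-bit double into two 32-bit words and orders them by target endianness. The
function names and the "first register holds the word at the lower memory
address" convention are assumptions made for this example; it is not the
GlobalISel code.

```python
import struct

def split_double(value, big_endian):
    """Split a double into the two 32-bit words passed in a register pair."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    lo = bits & 0xFFFFFFFF
    hi = bits >> 32
    # Assumed convention: the first register mirrors the lower memory address,
    # which is the high word on big-endian and the low word on little-endian.
    return (hi, lo) if big_endian else (lo, hi)

def join_double(first, second, big_endian):
    """Rebuild the double from the register pair (inverse of split_double)."""
    hi, lo = (first, second) if big_endian else (second, first)
    return struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))[0]
```

Swapping the two words without consulting the endianness would silently
produce a different floating-point value, which is why the ordering has to be
handled explicitly.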

For pure soft float targets, we still bail out because we don't support the
libcalls yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295295 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 07:53:07 +00:00
Hans Wennborg
b6ae6ad928 [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295262 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 00:04:05 +00:00
Tim Northover
5562e17d88 GlobalISel: legalize va_arg on AArch64.
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295255 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 23:22:50 +00:00
Tim Northover
432026394b GlobalISel: support translating va_arg
Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295254 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 23:22:33 +00:00
Matt Arsenault
ad35e2220e Fix typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 22:19:06 +00:00
Matt Arsenault
34ff574128 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295242 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 22:02:42 +00:00
Kyle Butt
a466b368fe Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a group of 2 or more
predecessor blocks that all have the same successors.
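
The definition above can be sketched as a grouping of blocks by their
successor set. This is only an illustration on a toy CFG stored as a
dictionary; `find_trellises` is a hypothetical helper, and the requirement of
at least two successors is an assumption of this sketch, not something taken
from the patch.

```python
from collections import defaultdict

def find_trellises(successors):
    """Map each shared successor set to the predecessor blocks that have it."""
    groups = defaultdict(list)
    for block, succs in successors.items():
        if len(succs) >= 2:  # assumption: single-successor blocks are ignored
            groups[frozenset(succs)].append(block)
    # Keep only groups of two or more predecessors with identical successors.
    return {s: sorted(b) for s, b in groups.items() if len(b) >= 2}

# The A/B -> C/D trellis drawn above.
cfg = {"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": []}
trellises = find_trellises(cfg)
```

For the toy CFG, the only group found is {A, B} with the shared successors
{C, D}.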

Because of this, we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

Here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295223 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 19:49:14 +00:00
Xinliang David Li
210c690520 include function name in dot filename
Differential Revision: http://reviews.llvm.org/D29975


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295220 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 19:21:04 +00:00
Michael Kuperstein
2b723d3caf [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295213 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 18:37:26 +00:00
Sagar Thakur
22d520c2ac [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295164 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 10:48:11 +00:00
Craig Topper
dc5ec056d8 [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in an extract and creates a properly aligned start index for the extract.

This range finding is unnecessary; we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index, we can't use an extract.
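
A minimal sketch of the single-loop idea, assuming two equally sized inputs,
mask indices that count across both inputs, and -1 for an undef element. The
name `extract_starts` and this encoding are assumptions for illustration, not
the SelectionDAGBuilder code.

```python
def extract_starts(mask, src_len):
    """Return the aligned extract start per input, or None if no extract fits."""
    mask_len = len(mask)
    starts = [None, None]  # None means the input is not referenced
    for idx in mask:
        if idx < 0:  # undef element, places no constraint
            continue
        which, local = divmod(idx, src_len)
        start = local - local % mask_len  # round down to a mask-sized window
        if starts[which] is None:
            starts[which] = start
        elif starts[which] != start:
            return None  # elements of one input fall in different windows
    return starts
```

With 8-element inputs and a 4-element mask, the mask [4, 5, 6, 7] gives a
start of 4 for the first input, while [2, 3, 4, 5] spans two windows and is
rejected.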

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295152 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 05:57:16 +00:00
Reid Kleckner
f3470b10da [BranchFolding] Tail common all identical unreachable blocks
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
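
The merging described above can be pictured with a toy model in which blocks
are lists of instruction strings and a block with no successors stands in for
one ending in unreachable. Both the model and the helper name are assumptions
of this sketch; the real pass works on machine basic blocks.

```python
def merge_identical_unreachable(blocks, successors):
    """Map each successor-less block to the first block with identical code."""
    canonical = {}    # instruction tuple -> first block seen with that code
    replacement = {}  # block -> block that can be used in its place
    for name, insts in blocks.items():
        if successors.get(name):
            continue  # only blocks without successors are candidates
        replacement[name] = canonical.setdefault(tuple(insts), name)
    return replacement

blocks = {
    "bb1": ["call abort"],
    "bb2": ["call abort"],
    "bb3": ["call throw"],
    "bb4": ["jmp bb1"],
}
merged = merge_identical_unreachable(blocks, {"bb4": ["bb1"]})
```

Here bb2 collapses into bb1, while bb3 stays separate because its code
differs.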

Reviewers: hans, iteratee, haicheng

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295105 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 21:02:24 +00:00
Tim Northover
173c7d4750 GlobalISel: introduce G_PTR_MASK to simplify alloca handling.
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.
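
The effect of such a mask can be shown on plain integers. The sketch below
assumes the immediate is the number of low bits to clear; that encoding and
the function name are assumptions made here for illustration.

```python
def ptr_mask(address, num_low_bits):
    """Clear the lowest num_low_bits bits of an address (align down)."""
    return address & ~((1 << num_low_bits) - 1)
```

For example, clearing four low bits of 0x1007 yields 0x1000, which is how a
stack pointer would be rounded down to a 16-byte boundary for an over-aligned
alloca.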

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 20:56:18 +00:00
Eric Christopher
71648d0a82 Reformat slightly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295096 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 19:43:50 +00:00
Wolfgang Pieb
5c49cf1beb Reapply r294532, reverted in r294787.
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with 
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.

This fixes PR31935.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295093 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 19:08:45 +00:00
Aditya Nandakumar
3ca564103a [Tablegen] Instrumenting table gen DAGGenISelDAG
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing the include source locations of the patterns.
Enabled through the CMake variable LLVM_ENABLE_DAGISEL_COV.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 18:32:41 +00:00
Adam Nemet
117458356f Add new pass LazyMachineBlockFrequencyInfo
And use it in MachineOptimizationRemarkEmitter.  A test will follow on top of
Justin's changes to enable MachineORE in AsmPrinter.

The approach is similar to the IR-level pass.  It's a bit simpler because BPI
is immutable at the Machine level so we don't need to make that lazy.

Because of this, a new function mapping is introduced (BPIPassTrait::getBPI).
This function extracts BPI from the pass.  In case of the lazy pass, this is
when the calculation of the BFI occurs.  For Machine-level, this is the
identity function.

Differential Revision: https://reviews.llvm.org/D29836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295072 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 17:21:09 +00:00
Artyom Skrobov
90b785146d Removing a redundant assignment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295055 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 14:44:01 +00:00
Eugene Zelenko
7211c537b1 [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
Same changes in files affected by reduced MC headers dependencies.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295009 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 00:33:36 +00:00
Tim Northover
13a4b7e61f GlobalISel: represent atomic loads & stores via the MachineMemOperand.
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294993 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 22:14:16 +00:00
Tim Northover
1a3fe2bfe6 MIR: parse & print the atomic parts of a MachineMemOperand.
We're going to need them very soon for GlobalISel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294992 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 22:14:08 +00:00
Taewook Oh
47946adf77 Address post-commit comments for https://reviews.llvm.org/D29596. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294985 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 21:12:27 +00:00
Arnold Schwaighofer
db4a46b079 swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294982 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 19:58:28 +00:00
Taewook Oh
9fdcd96d07 Make MachineBasicBlock::updateTerminator to update DebugLoc as well
Summary:
Currently MachineBasicBlock::updateTerminator simply drops DebugLoc for newly created branch instructions, which may cause incorrect stepping and/or imprecise sample profile data. Below is an example:

```
  1 extern int bar(int x);
  2
  3 int foo(int *begin, int *end) {
  4   int *i;
  5   int ret = 0;
  6   for (
  7       i = begin ;
  8       i != end ;
  9       i++)
 10   {
 11       ret += bar(*i);
 12   }
 13   return ret;
 14 }
```

Below is a bitcode of 'foo' at the end of LLVM-IR level optimizations with -O3:

```
define i32 @foo(i32* readonly %begin, i32* readnone %end) !dbg !4 {
entry:
  %cmp6 = icmp eq i32* %begin, %end, !dbg !9
  br i1 %cmp6, label %for.end, label %for.body.preheader, !dbg !12

for.body.preheader:                               ; preds = %entry
  br label %for.body, !dbg !13

for.body:                                         ; preds = %for.body.preheader, %for.body
  %ret.08 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
  %i.07 = phi i32* [ %incdec.ptr, %for.body ], [ %begin, %for.body.preheader ]
  %0 = load i32, i32* %i.07, align 4, !dbg !13, !tbaa !15
  %call = tail call i32 @bar(i32 %0), !dbg !19
  %add = add nsw i32 %call, %ret.08, !dbg !20
  %incdec.ptr = getelementptr inbounds i32, i32* %i.07, i64 1, !dbg !21
  %cmp = icmp eq i32* %incdec.ptr, %end, !dbg !9
  br i1 %cmp, label %for.end.loopexit, label %for.body, !dbg !12, !llvm.loop !22

for.end.loopexit:                                 ; preds = %for.body
  br label %for.end, !dbg !24

for.end:                                          ; preds = %for.end.loopexit, %entry
  %ret.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]
  ret i32 %ret.0.lcssa, !dbg !24
}
```

where

```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```

As you can see, the terminator of the 'entry' block, which is a loop control branch, has a DebugLoc of line 6, column 3. However, after the execution of the 'MachineBasicBlock::updateTerminator' function, which is triggered by the MachineSinking pass, the DebugLoc info is dropped as shown below (note that there is no debug-location for JNE_1):

```
  bb.0.entry:
    successors: %bb.4(0x30000000), %bb.1.for.body.preheader(0x50000000)
    liveins: %rdi, %rsi

    %6 = COPY %rsi
    %5 = COPY %rdi
    %8 = SUB64rr %5, %6, implicit-def %eflags, debug-location !9
    JNE_1 %bb.1.for.body.preheader, implicit %eflags
```

This patch addresses this issue and makes newly created branch instructions keep their debug-location info.

Reviewers: aprantl, MatzeB, craig.topper, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D29596

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294976 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 18:15:31 +00:00
Quentin Colombet
58a124bd02 [FastISel] Add a diagnostic to warn on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294970 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 17:38:59 +00:00
Sanne Wouda
083bd6368a [Assembler] Improve diagnostics for inline assembly.
Summary:
Keep a vector of LocInfos around; one for each call to EmitInlineAsm.
Since each call to EmitInlineAsm creates a new buffer in the inline asm
SourceMgr, we can use the buffer number to map to the right LocInfo.
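
The bookkeeping can be sketched as a list indexed by buffer number. The class
below is hypothetical, and the 1-based buffer numbering is an assumption of
this example rather than a statement about the patch.

```python
class InlineAsmDiagnostics:
    """Keep one location record per emitted inline asm buffer."""

    def __init__(self):
        self.loc_infos = []

    def add_inline_asm(self, loc_info):
        """Record the location info and return the new buffer number."""
        self.loc_infos.append(loc_info)
        return len(self.loc_infos)  # assumed 1-based buffer number

    def loc_info_for(self, buffer_number):
        """Look up the location info that belongs to a buffer number."""
        return self.loc_infos[buffer_number - 1]
```

A diagnostic raised while parsing the second buffer can then be attributed to
the second inline asm statement instead of whichever one was emitted last.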

Reviewers: rengolin, grosbach, rnk, echristo

Reviewed By: rnk

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29769


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294947 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 13:58:00 +00:00
Andrew V. Tischenko
ed1646e691 Compile time decreasing in the case we're dealing with Machine Combiner.
Before this patch, compile time was about 21s (see below). After this patch,
it is less than 2s (see below).

  Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz

    DAGCombiner - trunk
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.685s

    DAGCombiner + Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.655s

    MachineCombiner w/o Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m21.614s

    MachineCombiner + Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.593s

The test spill_fdiv.ll  is attached to D29627
D29627 should be closed.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294936 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 09:43:37 +00:00
Craig Topper
5b7ece9f05 [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of an undef vector can just use the original input to the extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294932 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 04:53:33 +00:00
Craig Topper
220d93f415 [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294930 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-12 23:49:49 +00:00
Sanjay Patel
771a901cd4 [TargetLowering] fix SETCC SETLT folding with FP types
The bug was introduced with:
https://reviews.llvm.org/rL294863

...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.
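
The -0.0 problem can be reproduced outside the compiler. The sketch below
compares a "less than zero" test with a raw sign-bit test on IEEE-754 doubles;
the helper names are made up for this example.

```python
import struct

def sign_bit(x):
    """Return the IEEE-754 sign bit of a double as 0 or 1."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 63

def is_less_than_zero(x):
    """The comparison that a sign-bit test would be replacing."""
    return x < 0.0
```

For -0.0 the sign bit is set, yet the comparison is false, so replacing
`x < 0.0` with a sign-bit check is only safe when signed zeros (and NaNs) can
be ignored.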


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294924 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-12 23:07:52 +00:00
Craig Topper
a1a9b947dc [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294875 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-11 22:57:09 +00:00
Sanjay Patel
57101fca7a [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits
I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.
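
To see why only the sign bit is demanded, here is a small model of a byte-wise
variable blend. It assumes 8-bit elements and that a set sign bit selects the
element from the second source; `blendv` here is a toy function, not an
intrinsic.

```python
def blendv(mask, a, b, bits=8):
    """Pick b[i] where the sign bit of mask[i] is set, otherwise a[i]."""
    sign = 1 << (bits - 1)
    return [y if m & sign else x for m, x, y in zip(mask, a, b)]

mask = [0x80, 0x7F, 0xFF, 0x00]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
full = blendv(mask, a, b)
sign_only = blendv([m & 0x80 for m in mask], a, b)
```

Both calls give the same result, so any computation that only affects the
lower mask bits can be simplified away.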

The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210

Differential Revision: https://reviews.llvm.org/D29687


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294863 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-11 18:01:55 +00:00
Nico Weber
4a827800c6 Revert r294532, it caused PR31935
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 21:57:30 +00:00
Tim Shen
2c44e216a8 [XRay] Implement powerpc64le xray.
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.

Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.

Reviewers: dberris, echristo, iteratee, kbarton, hfinkel

Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D29742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294781 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 21:03:24 +00:00
Tim Northover
10fde8b13b GlobalISel: drop lifetime intrinsics during translation.
We don't use them yet and they just cause problems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294770 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 19:10:38 +00:00
Simon Pilgrim
ada0a4f5b0 [DAGCombine] Allow vector constant folding of any value type before type legalization
The patch comes in 2 parts:

1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.

2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.

Fix for PR30760

Differential Revision: https://reviews.llvm.org/D29568

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294749 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 14:37:25 +00:00
Craig Topper
5b47011340 [SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.

I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.

Having said that though, I think we should probably fix legalize vector ops to log what it's doing.

Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc

Reviewed By: RKSimon

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D29554

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294711 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 05:05:57 +00:00
Eric Fiselier
a61fc423f3 [CMake] Fix pthread handling for out-of-tree builds
LLVM defines `PTHREAD_LIB` which is used by AddLLVM.cmake and various projects
to correctly link the threading library when needed. Unfortunately
`PTHREAD_LIB` is defined by LLVM's `config-ix.cmake` file which isn't installed
and therefore can't be used when configuring out-of-tree builds. This causes
such builds to fail since `pthread` isn't being correctly linked.

This patch attempts to fix that problem by renaming and exporting
`LLVM_PTHREAD_LIB` as part of `LLVMConfig.cmake`. I renamed `PTHREAD_LIB`
because it seemed likely to cause collisions with downstream users of
`LLVMConfig.cmake`.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294690 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 01:59:20 +00:00
Geoff Berry
e22db94a8f [SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.
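
The first fix can be illustrated with a tiny pattern matcher. Instructions are
modelled as (opcode, lhs, rhs) tuples and the all-ones constant as the integer
-1; this representation and the name `match_not` are assumptions of the
sketch.

```python
ALL_ONES = -1

def match_not(inst):
    """Return the negated operand if inst is 'xor x, -1' in either order."""
    opcode, lhs, rhs = inst
    if opcode != "xor":
        return None
    if rhs == ALL_ONES:
        return lhs  # canonical form: constant on the right
    if lhs == ALL_ONES:
        return rhs  # non-canonical form: constant on the left
    return None
```

Checking both operand positions avoids treating the constant itself as the
condition when the xor has not been canonicalized.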

Reviewers: bogner, MatzeB, qcolombet

Subscribers: mcrosier, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D29680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294605 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 18:28:17 +00:00
David Bozier
32e76df475 Revert: "[Stack Protection] Add diagnostic information for why stack protection was applied to a function"
This reverts revision r294590 as it broke some buildbots.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294593 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 15:40:14 +00:00
David Bozier
ed71bd2a51 [Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled.

Patch by: James Henderson

Differential Revision: https://reviews.llvm.org/D29023



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294590 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 15:08:40 +00:00
Artur Pilipenko
66a342c211 [DAGCombiner] Support non-zero offset in load combine
Enable folding patterns which load the value from non-zero offset:

  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))
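
The equivalence behind this fold can be checked on a byte buffer. The sketch
below assumes a little-endian 32-bit load; the function names are
illustrative.

```python
import struct

def combine_bytes(a, offset):
    """Assemble a 32-bit value from four byte loads, shifts and ORs."""
    return (a[offset]
            | (a[offset + 1] << 8)
            | (a[offset + 2] << 16)
            | (a[offset + 3] << 24))

def load_i32_le(a, offset):
    """Perform one little-endian 32-bit load at the given offset."""
    return struct.unpack_from("<I", a, offset)[0]

data = bytes(range(0x10, 0x20))
```

With `data` as above, both functions return 0x17161514 at offset 4, matching
the pattern in the commit message.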

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29394


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294582 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 12:06:01 +00:00
Wolfgang Pieb
2012b38a6b Reapply r294356 ("Keep track of spilled variables in LiveDebugValues").
Was reverted with r294447 due to undefined behavior with negative offsets
in DBG_VALUE instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294532 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 23:46:59 +00:00
Tim Northover
03324d2ec1 GlobalISel: legalize G_FPOW to a libcall on AArch64.
There's no instruction to implement it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294531 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 23:23:39 +00:00