Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296873 91177308-0d34-0410-b5e6-96231b3b80d8
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.
Reviewers: mkuper
Differential Revision: https://reviews.llvm.org/D30159
Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296863 91177308-0d34-0410-b5e6-96231b3b80d8
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8
This is a cleanup/rewrite of the parseSysAlias function. It was not using the
tablegen instruction descriptions, but was “manually” matching the mnemonics
and recreating the operands whereas all this information is already in
tablegen; all this code has been replaced with calls to lookupXYZByName
tablegen calls.
Differential Revision: https://reviews.llvm.org/D30491
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296857 91177308-0d34-0410-b5e6-96231b3b80d8
A call should never modify the stack pointer, but some backends are
not so sure about this and never list SP in the regmask. For the
purposes of LiveDebugValues we assume a call never clobbers SP. We
already have a similar workaround in DbgValueHistoryCalculator (which
we hopefully can retire soon).
This fixes the availabilty of local ASANified variables on AArch64.
rdar://problem/27757381
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296847 91177308-0d34-0410-b5e6-96231b3b80d8
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:
1) The post-dominators marked 50% are actually taken 56% (This shrinks with
longer chains)
2) The chains are statically correlated. Branch probabilities have a very
U-shaped distribution.
[http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
If the branches in a chain are likely to be from the same side of the
distribution as their predecessor, but are independent at runtime, this
transformation is profitable. (Because the cost of being wrong is a small
fixed cost, unlike the standard triangle layout where the cost of being
wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
branch was taken positively influences whether the next branch will be
taken
We believe that 2 and 3 are common enough to justify the small margin in 1.
The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296845 91177308-0d34-0410-b5e6-96231b3b80d8
Such edges may otherwise result in infinite recursion if a pointer to a vtable
is reachable from the vtable itself. This can happen in practice if a TU
defines the ABI types used to implement RTTI, and is itself compiled with RTTI.
Fixes PR32121.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296839 91177308-0d34-0410-b5e6-96231b3b80d8
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.
With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:
1) Both operands are known to be short enough. Then, the long division can be
simply replaced with a short one without CFG modification.
2) If a division is unsigned and the dividend is known to be short then the
long division is not needed at all. Because if the divisor is too big for
short division then the quotient is obviously zero (and the remainder is
equal to the dividend). Actually, the division is not needed when
(divisor > dividend).
Differential Revision: https://reviews.llvm.org/D29897
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296832 91177308-0d34-0410-b5e6-96231b3b80d8
The most important goal of the patch is to break large insertFastDiv function
into separate pieces, so that later a different fast insertion logic can be
implemented using some of these pieces.
Differential Revision: https://reviews.llvm.org/D29896
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296828 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below
```
extern int bar();
extern int baz();
int foo(int x, int y) {
if (x != y)
return bar();
else
return baz();
}
```
, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:
```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
%cmp = icmp ne i32 %x, %y, !dbg !14
br i1 %cmp, label %if.then, label %if.else, !dbg !16
if.then: ; preds = %entry
%call = tail call i32 (...) @bar() #3, !dbg !17
br label %return, !dbg !18
if.else: ; preds = %entry
%call1 = tail call i32 (...) @baz() #3, !dbg !19
br label %return, !dbg !20
return: ; preds = %if.else, %if.then
%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
ret i32 %retval.0, !dbg !21
}
!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)
```
As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.
Reviewers: atrick, bogner, andreadb, craig.topper, aprantl
Reviewed By: andreadb
Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29813
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296825 91177308-0d34-0410-b5e6-96231b3b80d8
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296818 91177308-0d34-0410-b5e6-96231b3b80d8
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.
This patch fixes some cases where we would move stores to the wrong
insert point.
Re-commit with a fix to increment NumMove in the right place.
Differential Revision: https://reviews.llvm.org/D30124
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296815 91177308-0d34-0410-b5e6-96231b3b80d8
and also "clang-format GenericDomTreeConstruction.h, since the current
formatting makes it look like their is a bug in the loop indentation, and there
is not"
This reverts commit r296535.
There are still some open design questions which I would like to discuss. I
revert this for Daniel (who gave the OK), as he is on vacation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296812 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes pr32063.
Current code in PPCTargetLowering::PerformDAGCombine can transform
bswap
store
into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,
1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().
2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.
Differential Revision: https://reviews.llvm.org/D30362
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296811 91177308-0d34-0410-b5e6-96231b3b80d8
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle non-zero cases such as:
BB#0:
cmp x0, #1
b.eq .LBB0_1
.LBB0_1:
orr x0, xzr, #0x1 ; <-- redundant copy; x0 known to hold #1.
Differential Revision: https://reviews.llvm.org/D29344
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296809 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for struct return values to the MSP430
target backend. It also reverses the order of argument and return
registers in the calling convention to bring it into closer
alignment with the published EABI from TI.
Patch by Andrew Wygle (awygle).
Differential Revision: https://reviews.llvm.org/D29069
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296807 91177308-0d34-0410-b5e6-96231b3b80d8
Make opcode selection code for the load instruction a bit easier
to read and maintain.
This patch also catches number of f16 load/store variants that were
not handled before.
Differential Revision: https://reviews.llvm.org/D30513
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296785 91177308-0d34-0410-b5e6-96231b3b80d8
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296782 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
"Allow externally dlopen-ed libraries to be registered as permanent libraries.
This is also useful in cases when llvm is in a shared library. First we dlopen
the llvm shared library and then we register it as a permanent library in order
to keep the JIT and other services working.
Patch reviewed by Vedant Kumar (D29955)!"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296774 91177308-0d34-0410-b5e6-96231b3b80d8
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29881
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296771 91177308-0d34-0410-b5e6-96231b3b80d8
This bug was introduced with:
https://reviews.llvm.org/rL296699
There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.
The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296768 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r289696, which caused TSan perf regression, which has
since been addressed in separate changes (see PR for details).
See PR31382.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296759 91177308-0d34-0410-b5e6-96231b3b80d8
The CallingConv.td rules allocate 8 bytes for these kinds of arguments
on AAPCS targets, but we were only recording the smaller amount. The
difference is theoretical on AArch64 because we don't actually store
more than the smaller amount, but it's still much better to have these
two components in agreement.
Based on Diana Picus's ARM equivalent patch (where it matters a lot
more).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296754 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When InstCombine is optimizing certain select-cmp-br patterns
it replaces the result of the select in uses outside of the
basic block containing the select. This is only legal if the
path from the select to the outside use is disjoint from all
other paths out from the originating basic block.
The problem found was that InstCombiner::replacedSelectWithOperand
did not consider the case when both edges out from the br pointed
to the same label. In that case the paths aren't disjoint and the
transformation is illegal. This patch avoids the faulty rewrites
by verifying that there is a single flow to the successor where
we want to replace uses.
Reviewers: llvm-commits, spatel, majnemer
Differential Revision: https://reviews.llvm.org/D30455
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296752 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.
Differential Revision: https://reviews.llvm.org/D29466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296750 91177308-0d34-0410-b5e6-96231b3b80d8