------------------------------------------------------------------------
r247435 | david.majnemer | 2015-09-11 13:34:34 -0400 (Fri, 11 Sep 2015) | 8 lines
[X86] Make sure startproc/endproc are paired
We used different conditions to determine if we should emit startproc vs
endproc. Use the same condition to ensure that they will always be
paired.
This fixes PR24374.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@253742 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243462 | Matthew.Arsenault | 2015-07-28 14:47:00 -0400 (Tue, 28 Jul 2015) | 5 lines
AMDGPU: Don't try to use LDS/vector for private if pointer value stored
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@253228 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243461 | Matthew.Arsenault | 2015-07-28 14:29:14 -0400 (Tue, 28 Jul 2015) | 5 lines
AMDGPU: Fix crash if called function is a bitcast
getCalledFunction() is null, so this would crash. Replace
crash with an error on unsupported call.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@253227 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r251582 | hfinkel | 2015-10-28 19:43:00 -0400 (Wed, 28 Oct 2015) | 11 lines
[PowerPC] Recurse through constants when looking for TLS globals
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.
Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@253120 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r245862 | wschmidt | 2015-08-24 15:27:27 -0400 (Mon, 24 Aug 2015) | 8 lines
[PPC64LE] Fix PR24546 - Swap optimization and debug values
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252850 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r247083 | echristo | 2015-09-08 18:14:58 -0400 (Tue, 08 Sep 2015) | 3 lines
Fix the PPC CTR Loop pass to look for calls to the intrinsics that
read CTR and count them as reading the CTR.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252511 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r250085 | Andrea_DiBiagio | 2015-10-12 15:22:30 -0400 (Mon, 12 Oct 2015) | 60 lines
[x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
The initial selection dag for the code above is:
v8i1 = setcc t5, t7, seteq:ch
t5: v8i23 = truncate t2
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
%vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
Differential Revision: http://reviews.llvm.org/D13660
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252484 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r250324 | wschmidt | 2015-10-14 16:45:00 -0400 (Wed, 14 Oct 2015) | 10 lines
[PowerPC] Fix invalid lxvdsx optimization (PR25157)
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252483 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r246937 | hfinkel | 2015-09-06 00:17:30 -0400 (Sun, 06 Sep 2015) | 13 lines
[PowerPC] Don't commute trivial rlwimi instructions
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Fixes PR24719.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252481 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r246900 | hfinkel | 2015-09-04 20:02:59 -0400 (Fri, 04 Sep 2015) | 14 lines
[PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Fixes PR24704.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252480 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r246675 | hfinkel | 2015-09-02 12:52:37 -0400 (Wed, 02 Sep 2015) | 9 lines
[PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252479 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r246400 | hfinkel | 2015-08-30 18:12:50 -0400 (Sun, 30 Aug 2015) | 20 lines
[PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.
The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.
Fixes PR24552.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252478 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r245907 | hfinkel | 2015-08-24 19:48:28 -0400 (Mon, 24 Aug 2015) | 6 lines
[PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252476 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r251622 | vkalintiris | 2015-10-29 10:17:16 +0000 (Thu, 29 Oct 2015) | 17 lines
[mips] Check the register class before replacing materializations of zero with $zero in microMIPS.
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:
[d]addiu, $dst, $zero, 0
with the $zero register, without checking for membership in the register
class of the target machine operand.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13984
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252158 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r245741 | hfinkel | 2015-08-21 17:34:24 -0400 (Fri, 21 Aug 2015) | 8 lines
[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@252132 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r249718 | ast | 2015-10-08 11:52:40 -0700 (Thu, 08 Oct 2015) | 16 lines
[bpf] Do not expand UNDEF SDNode during insn selection lowering
o Before this patch, BPF backend will expand UNDEF node
to i64 constant 0.
o For second pass of dag combiner, legalizer will run through
each to-be-processed dag node.
o If any new SDNode is generated and has an undef operand,
dag combiner will put undef node, newly-generated constant-0 node,
and any node which uses these nodes in the working list.
o During this process, it is possible undef operand is
generated again, and this will form an infinite loop
for dag combiner pass2.
o This patch allows UNDEF to be a legal type.
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@251177 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r249371 | ast | 2015-10-05 21:00:53 -0700 (Mon, 05 Oct 2015) | 25 lines
[bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
int pid;
char name[16];
};
extern void test1(char *);
int test() {
struct key_t key = {};
test1(key.name);
return 0;
}
For key.name, the llc/bpf may generate the below code:
R1 = R10 // R10 is the frame pointer
R1 += -24 // framepointer adjustment
R1 |= 4 // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.
This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
R1 = R10
R1 += -20
Patch by Yonghong Song <yhs@plumgrid.com>
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@251175 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r247128 | dsanders | 2015-09-09 10:53:20 +0100 (Wed, 09 Sep 2015) | 31 lines
Fix vector splitting for extract_vector_elt and vector elements of <8-bits.
Summary:
One of the vector splitting paths for extract_vector_elt tries to lower:
define i1 @via_stack_bug(i8 signext %idx) {
%1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx
ret i1 %1
}
to:
define i1 @via_stack_bug(i8 signext %idx) {
%base = alloca <2 x i1>
store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base
%2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx
%3 = load i1, i1* %2
ret i1 %3
}
However, the elements of <2 x i1> are not byte-addressible. The result of this
is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies
to '%base + %idx * 0', and then simply '%base' causing all values of %idx to
extract element zero.
This commit fixes this by promoting the vector elements of <8-bits to i8 before
splitting the vector.
This fixes a number of test failures in pocl.
Reviewers: pekka.jaaskelainen
Subscribers: pekka.jaaskelainen, llvm-commits
Differential Revision: http://reviews.llvm.org/D12591
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@247539 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r245535 | hfinkel | 2015-08-19 20:02:02 -0700 (Wed, 19 Aug 2015) | 6 lines
[PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).
Fixes PR24225.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@245574 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r245530 | hfinkel | 2015-08-19 18:18:20 -0700 (Wed, 19 Aug 2015) | 5 lines
[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@245573 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r229099 in branch 37 only, because it caused PR24292.
I'll continue investigating and will fix on trunk, but being an optimization
change, we can let the rest of the release go without this one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@245568 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r244889 | uweigand | 2015-08-13 06:37:06 -0700 (Thu, 13 Aug 2015) | 22 lines
[SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
{ <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.
This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.
However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.
This patch fixes the crash by implementing CanLowerReturn. As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@244909 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243984 | vkalintiris | 2015-08-04 07:26:35 -0700 (Tue, 04 Aug 2015) | 11 lines
Revert r229675 - [mips] Avoid redundant sign extension of the result of binary bitwise instructions.
It introduced two regressions on 64-bit big-endian targets running under N32
(MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4, and
MultiSource/Applications/kimwitu++/kc) The issue is that on 64-bit targets
comparisons such as BEQ compare the whole GPR64 but incorrectly tell the
instruction selector that they operate on GPR32's. This leads to the
elimination of i32->i64 extensions that are actually required by
comparisons to work correctly.
There's currently a patch under review that fixes this problem.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@244096 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243057 | spatel | 2015-07-23 15:56:53 -0700 (Thu, 23 Jul 2015) | 16 lines
fix crash in machine trace metrics due to processing dbg_value instructions (PR24199)
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
Differential Revision: http://reviews.llvm.org/D11423
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243662 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243638 | vkalintiris | 2015-07-30 05:39:33 -0700 (Thu, 30 Jul 2015) | 12 lines
[mips][FastISel] Remove hidden mips-fast-isel option.
Summary:
This hidden option would disable code generation through FastISel by
default. It was removed from the available options and from the
Fast-ISel tests that required it in order to run the tests.
Reviewers: dsanders
Subscribers: qcolombet, llvm-commits
Differential Revision: http://reviews.llvm.org/D11610
------------------------------------------------------------------------
------------------------------------------------------------------------
r243640 | vkalintiris | 2015-07-30 06:13:09 -0700 (Thu, 30 Jul 2015) | 5 lines
[mips] Fix out-of-date debug information in test file.
Update the debug info in the check-lines because the change in r243638
introduced a constant initialization before the prologue's end as part
of a register spill.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243650 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243636 | vkalintiris | 2015-07-30 04:51:44 -0700 (Thu, 30 Jul 2015) | 34 lines
[mips][FastISel] Apply only zero-extension to constants prior to their materialization.
Summary:
Previously, we would sign-extend non-boolean negative constants and
zero-extend otherwise. This was problematic for PHI instructions with
negative values that had a type with bitwidth less than that of the
register used for materialization.
More specifically, ComputePHILiveOutRegInfo() assumes the constants
present in a PHI node are zero extended in their container and
afterwards deduces the known bits.
For example, previously we would materialize an i16 -4 with the
following instruction:
addiu $r, $zero, -4
The register would end-up with the 32-bit 2's complement representation
of -4. However, ComputePHILiveOutRegInfo() would generate a constant
with the upper 16-bits set to zero. The SelectionDAG builder would use
that information to generate an AssertZero node that would remove any
subsequent trunc & zero_extend nodes.
In theory, we should modify ComputePHILiveOutRegInfo() to consult
target-specific hooks about the way they prefer to materialize the
given constants. However, git-blame reports that this specific code
has not been touched since 2011 and it seems to be working well for every
target so far.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11592
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243648 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243485 | vkalintiris | 2015-07-28 14:43:31 -0700 (Tue, 28 Jul 2015) | 12 lines
[mips][FastISel] Fix call lowering by bailing out on "fastcc" calls.
Summary:
Currently, we support only the MIPS O32 ABI calling convention for call
lowering. With this change we avoid using the O32 calling convetion for
lowering calls marked as using the fast calling convention.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11515
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243647 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243519 | wschmidt | 2015-07-29 07:31:57 -0700 (Wed, 29 Jul 2015) | 14 lines
[PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register. The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary. This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
Patch and test case by Tyler Kenney, cleaned up a bit by me.
This is a simple bug fix that would be good to incorporate into 3.7.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243528 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243500 | spatel | 2015-07-28 16:28:22 -0700 (Tue, 28 Jul 2015) | 16 lines
ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141)
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.
After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.
The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.
Differential Revision: http://reviews.llvm.org/D11345
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243524 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243361 | spatel | 2015-07-27 17:48:32 -0700 (Mon, 27 Jul 2015) | 17 lines
fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243435 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r243294 | mareko | 2015-07-27 11:16:08 -0700 (Mon, 27 Jul 2015) | 9 lines
AMDGPU: don't match vgpr loads for constant loads
Author: Dave Airlie <airlied@redhat.com>
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
this should be applicable for llvm 3.7 as well.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@243317 91177308-0d34-0410-b5e6-96231b3b80d8
(r242742 is the interesting patch here, but I picked the others too to get a clean
merge since there's been some back-and-forth on this file.)
------------------------------------------------------------------------
r242733 | matze | 2015-07-20 16:17:14 -0700 (Mon, 20 Jul 2015) | 3 lines
Revert "ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920"
This reverts commit r241951. It caused http://llvm.org/PR24190
------------------------------------------------------------------------
------------------------------------------------------------------------
r242734 | matze | 2015-07-20 16:17:16 -0700 (Mon, 20 Jul 2015) | 3 lines
Revert "ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code"
This reverts commit r241928. This caused http://llvm.org/PR24190
------------------------------------------------------------------------
------------------------------------------------------------------------
r242735 | matze | 2015-07-20 16:17:20 -0700 (Mon, 20 Jul 2015) | 3 lines
Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2"
This reverts commit r241926. This caused http://llvm.org/PR24190
------------------------------------------------------------------------
------------------------------------------------------------------------
r242742 | matze | 2015-07-20 17:18:59 -0700 (Mon, 20 Jul 2015) | 7 lines
ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Re-apply r241926 with an additional check that r13 and r15 are not used
for LDRD/STRD. See http://llvm.org/PR24190. This also already includes
the fix from r241951.
Differential Revision: http://reviews.llvm.org/D10623
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@242907 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r242673 | tstellar | 2015-07-20 07:28:41 -0700 (Mon, 20 Jul 2015) | 11 lines
AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops
Summary:
The MUBUF addr64 bit has been removed on VI, so we must use FLAT
instructions when the pointer is stored in VGPRs.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11067
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@242685 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r242442 | wschmidt | 2015-07-16 14:14:07 -0700 (Thu, 16 Jul 2015) | 14 lines
[PowerPC] v4i32 is a VSRCRegClass
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64. This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.
The fix is one line. The rest of the patch is fixing up some test
cases whose code generation has changed as a result.
This seems like it would be a good candidate for backport to 3.7.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@242447 91177308-0d34-0410-b5e6-96231b3b80d8
------------------------------------------------------------------------
r242239 | hfinkel | 2015-07-14 15:53:11 -0700 (Tue, 14 Jul 2015) | 4 lines
[PowerPC] Support symbolic targets in patchpoints
Follow-up r235483, with the corresponding support in PPC. We use a regular call
for symbolic targets (because they're much cheaper than indirect calls).
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_37@242325 91177308-0d34-0410-b5e6-96231b3b80d8
We used to take the address specified as the direct target of the patchpoint
and did no TOC-pointer handling. This, however, as not all that useful,
because MCJIT tends to create a lot of modules, and they have their own TOC
sections. Thus, to call from the generated code to other generated code, you
really need to switch TOC pointers. Make this work as expected, and under
ELFv1, tread the address as the function descriptor address so that the correct
TOC pointer can be loaded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242217 91177308-0d34-0410-b5e6-96231b3b80d8
PowerPC uses itineraries to describe processor pipelines (and dispatch-group
restrictions for P7/P8 cores). Unfortunately, the target-independent
implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that
looks for the largest cycle count in the pipeline for any given instruction.
This, however, yields the wrong answer for the PPC itineraries, because we
don't encode the full pipeline. Because the functional units are fully
pipelined, we only model the initial stages (there are no relevant hazards in
the later stages to model), and so the technique employed by getStageLatency
does not really work. Instead, we should take the maximum output operand
latency, and that's what PPCInstrInfo::getInstrLatency now does.
This caused some test-case churn, including two unfortunate side effects.
First, the new arrangement of copies we get from function parameters now
sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the
test cases), and we have one significant test-suite regression:
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
56.4185% +/- 18.9398%
In this benchmark we have a loop with a vectorized FP divide, and it with the
new scheduling both divides end up in the same dispatch group (which in this
case seems to cause a problem, although why is not exactly clear). The grouping
structure is hard to predict from the bottom of the loop, and there may not be
much we can do to fix this.
Very few other test-suite performance effects were really significant, but
almost all weakly favor this change. However, in light of the issues
highlighted above, I've left the old behavior available via a
command-line flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242188 91177308-0d34-0410-b5e6-96231b3b80d8
Convert logical operations on general-purpose registers to the correspon-
ding operations on predicate registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242186 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Before this change, personality directives were not emitted
if there was no invoke left in the function (of course until
recently this also meant that we couldn't know what
the personality actually was). This patch forces personality directives
to still be emitted, unless it is known to be a noop in the absence of
invokes, or the user explicitly specified `nounwind` (and not
`uwtable`) on the function.
Reviewers: majnemer, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D10884
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242185 91177308-0d34-0410-b5e6-96231b3b80d8
This can be done only with moves which theoretically
will optimize better later.
Although this transform increases the instruction count,
it should be code size / cycle count neutral in the worst
VALU case. It also seems to slightly improve a couple
of testcases due to other DAG combines this exposes.
This is probably slightly worse for the SALU case, so
it might be better to handle this during moveToVALU,
although then you lose some simplifications like
the load width reducing in the simple testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242177 91177308-0d34-0410-b5e6-96231b3b80d8
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.
Also remove the assert that offset1 > offset0.
There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242174 91177308-0d34-0410-b5e6-96231b3b80d8