This is the first of several steps to incorporate information from the new
TargetTransformInfo infrastructure into BBVectorize. Two things are done here:
1. Target information is used to determine if it is profitable to fuse two
instructions. This means that the cost of the vector operation must not
be more expensive than the cost of the two original operations. Pairs that
are not profitable are no longer considered (because current cost information
is incomplete, for intrinsics for example, equal-cost pairs are still
considered).
2. The 'cost savings' computed for the profitability check are also used to
rank the DAGs that represent the potential vectorization plans. Specifically,
for nodes of non-trivial depth, the cost savings is used as the node
weight.
The next step will be to incorporate the shuffle costs into the DAG weighting;
this will give the edges of the DAG weights as well. Once that is done, when
target information is available, we should be able to dispense with the
depth heuristic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166716 91177308-0d34-0410-b5e6-96231b3b80d8
The isValueEqualityComparison() guard at the top of SimplifySwitch()
only applies to some of the possible transformations.
The newer transformations work just fine on large switches, and the
check on predecessor count is nonsensical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166710 91177308-0d34-0410-b5e6-96231b3b80d8
structs having size 3, 5, 6, or 7. Such a struct must be passed and received
as right-justified within its register or memory slot. The problem is only
present for structs that are passed in registers.
Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register. This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot. Essentially I had accidentally
accounted for the right-adjustment twice.
In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.
The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this. I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.
I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions. I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166680 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds initial PPC64 TOC MC object creation using the small mcmodel
(a single 64K TOC) adding the some TOC relocations (R_PPC64_TOC,
R_PPC64_TOC16, and R_PPC64_TOC16DS).
The addition of 'undefinedExplicitRelSym' hook on 'MCELFObjectTargetWriter'
is meant to avoid the creation of an unreferenced ".TOC." symbol (used in
the .odp creation) as well to set the R_PPC64_TOC relocation target as the
temporary ".TOC." symbol. On PPC64 ABI, the R_PPC64_TOC relocation should
not point to any symbol.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166677 91177308-0d34-0410-b5e6-96231b3b80d8
smaller integer loads and stores.
The high-level motivation is that the frontend sometimes generates
a single whole-alloca integer load or store during ABI lowering of
splittable allocas. We need to be able to break this apart in order to
see the underlying elements and properly promote them to SSA values. The
hope is that this fixes some performance regressions on x86-32 with the
new SROA pass.
Unfortunately, this causes quite a bit of churn in the test cases, and
bloats some IR that comes out. When we see an alloca that consists soley
of bits and bytes being extracted and re-inserted, we now do some
splitting first, before building widened integer "bucket of bits"
representations. These are always well folded by instcombine however, so
this shouldn't actually result in missed opportunities.
If this splitting of all-integer allocas does cause problems (perhaps
due to smaller SSA values going into the RA), we could potentially go to
some extreme measures to only do this integer splitting trick when there
are non-integer component accesses of an alloca, but discovering this is
quite expensive: it adds yet another complete walk of the recursive use
tree of the alloca.
Either way, I will be watching build bots and LNT bots to see what
fallout there is here. If anyone gets x86-32 numbers before & after this
change, I would be very interested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166662 91177308-0d34-0410-b5e6-96231b3b80d8
into a sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.
rdar://12559385
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166613 91177308-0d34-0410-b5e6-96231b3b80d8
When the trip count is -1, getSmallConstantTripMultiple could return zero,
and this would cause runtime loop unrolling to assert. Instead of returning
zero, one is now returned (consistent with the existing overflow cases).
Fixes PR14167.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166612 91177308-0d34-0410-b5e6-96231b3b80d8
- If more than 1 elemennts are defined and target supports the vectorized
conversion, use the vectorized one instead to reduce the strength on
conversion operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166546 91177308-0d34-0410-b5e6-96231b3b80d8
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
v2f32 is added to improve the efficiency of the code generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166545 91177308-0d34-0410-b5e6-96231b3b80d8
the difference from "int x" (which should go in registers and
"struct y {int x;}" (which should not).
Clang will be updated in the next patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166536 91177308-0d34-0410-b5e6-96231b3b80d8
- Check index being extracted to be constant 0 before simplfiying.
Otherwise, retain the original sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166504 91177308-0d34-0410-b5e6-96231b3b80d8
loads. It's not really profitable and may result in GVN going into an infinite
loop when it hits constructs like this:
%x = gep %some.type %x, ...
Found via an LTO build of LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166490 91177308-0d34-0410-b5e6-96231b3b80d8
%V = mul i64 %N, 4
%t = getelementptr i8* bitcast (i32* %arr to i8*), i32 %V
into
%t1 = getelementptr i32* %arr, i32 %N
%t = bitcast i32* %t1 to i8*
incorporating the multiplication into the getelementptr.
This happens all the time in dragonegg, for example for
int foo(int *A, int N) {
return A[N];
}
because gcc turns this into byte pointer arithmetic before it hits the plugin:
D.1590_2 = (long unsigned int) N_1(D);
D.1591_3 = D.1590_2 * 4;
D.1592_5 = A_4(D) + D.1591_3;
D.1589_6 = *D.1592_5;
return D.1589_6;
The D.1592_5 line is a POINTER_PLUS_EXPR, which is turned into a getelementptr
on a bitcast of A_4 to i8*, so this becomes exactly the kind of IR that the
transform fires on.
An analogous transform (with no testcases!) already existed for bitcasts of
arrays, so I rewrote it to share code with this one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166474 91177308-0d34-0410-b5e6-96231b3b80d8
The CFG of the machine function needs to know that the targets of the indirect
branch are successors to the indirect branch.
<rdar://problem/12529625>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166448 91177308-0d34-0410-b5e6-96231b3b80d8
Per the October 12, 2012 Proposal for annotated disassembly output sent out by
Jim Grosbach this set of changes implements this for X86 and arm. The llvm-mc
tool now has a -mdis option to produced the marked up disassembly and a couple
of small example test cases have been added.
rdar://11764962
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166445 91177308-0d34-0410-b5e6-96231b3b80d8
Unreachable blocks can have invalid instructions. For example,
jump threading can produce self-referential instructions in
unreachable blocks. Also, we should not be spending time
optimizing unreachable code. Fixes PR14133.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166423 91177308-0d34-0410-b5e6-96231b3b80d8
very small but very important bugfix:
bool shouldExplore(Use *U) {
Value *V = U->get();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
[...]
should have read:
bool shouldExplore(Use *U) {
Value *V = U->getUser();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
Fixes PR14143!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166407 91177308-0d34-0410-b5e6-96231b3b80d8
This is important for vectors of pointers because only DataLayout,
not the underlying vector type, knows how to calculate the size
of the pointers in the vector. Fixes PR14138.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166401 91177308-0d34-0410-b5e6-96231b3b80d8
It passes all tests, produces better results than the old code but uses the
wrong pass, LoopDependenceAnalysis, which is old and unmaintained. "Why is it
still in tree?", you might ask. The answer is obviously: "To confuse developers."
Just swapping in the new dependency pass sends the pass manager into an infinte
loop, I'll try to figure out why tomorrow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166399 91177308-0d34-0410-b5e6-96231b3b80d8
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481. I'm not entirely
sure that all cases are handled that the old checks handled but LDA will
certainly become smarter in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166390 91177308-0d34-0410-b5e6-96231b3b80d8