This patch disables the handling of selects in optimization
extensing scope of optimizeMemoryInst.
The optimization itself is disable by default.
The idea here is just to switch optimiztion level step by step.
Specifically, first optimization will be enabled only for Phi nodes,
then select instructions will be added.
In case someone will complain about perfromance it will be easier to
detect what part of optimizations is responsible for that.
Differential Revision: https://reviews.llvm.org/D36073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317555 91177308-0d34-0410-b5e6-96231b3b80d8
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage
This reverts commit eea333c33fa73ad225ef28607795984829f65688.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317213 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.
See the discussion in PR33325 for more details.
Reviewers: spatel
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D39456
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317211 91177308-0d34-0410-b5e6-96231b3b80d8
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316967 91177308-0d34-0410-b5e6-96231b3b80d8
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316905 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us optimize away selects that perform the same address computation in
two different ways and is also the first step towards being able to handle
selects between two different, but compatible, address computations.
Differential Revision: https://reviews.llvm.org/D38242
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314794 91177308-0d34-0410-b5e6-96231b3b80d8
This only affects very small memcmp on x86 for now, but it
will become more important if we allow vector-sized load and
compares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309711 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in the code comment, transforming this in the other direction might require
a separate transform here in CGP given the block-at-a-time DAG constraint.
Besides that theoretical motivation, there are 2 practical motivations for the
subtract-of-cmps form:
1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still).
There is discussion about canonicalizing IR to the select form
( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ),
so we probably need to add DAG transforms for those patterns anyway, but this improves the
memcmp output without waiting for that step.
2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
if we choose to enable that.
Differential Revision: https://reviews.llvm.org/D34904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309597 91177308-0d34-0410-b5e6-96231b3b80d8
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308986 91177308-0d34-0410-b5e6-96231b3b80d8
Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.
The added test shows an example of enabling optimization inside a loop.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308419 91177308-0d34-0410-b5e6-96231b3b80d8
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308322 91177308-0d34-0410-b5e6-96231b3b80d8
This was auto-generated using an older version of the script,
and that version does not work with phis, so if we enable
expansion it will go bad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307267 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8
I don't think there's any visible difference from having the wrong layout
for the 32-bit case at this point, but that could change in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305931 91177308-0d34-0410-b5e6-96231b3b80d8
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305802 91177308-0d34-0410-b5e6-96231b3b80d8
No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ),
so I want these for extra coverage if we enable memcmp expansion for x86.
As shown, nothing is expanded for x86 in CGP yet.
Also fundamentally, we're doing an IR transform, so we should have IR tests
for just that part. If something goes wrong, we need to know if the bug is
in CGP or later lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305011 91177308-0d34-0410-b5e6-96231b3b80d8
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296416 91177308-0d34-0410-b5e6-96231b3b80d8
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.
Differential Revision: https://reviews.llvm.org/D30114
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296167 91177308-0d34-0410-b5e6-96231b3b80d8
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296149 91177308-0d34-0410-b5e6-96231b3b80d8
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289988 91177308-0d34-0410-b5e6-96231b3b80d8
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289951 91177308-0d34-0410-b5e6-96231b3b80d8
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288052 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287553 91177308-0d34-0410-b5e6-96231b3b80d8
The sink cast machinery is supposed to sink casts as close to their user
as possible. However, an EH pad is the first instruction in it's basic
block. Don't sink if the user is an EH pad.
This fixes PR27536.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267767 91177308-0d34-0410-b5e6-96231b3b80d8
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266446 91177308-0d34-0410-b5e6-96231b3b80d8
Sinking comparisons in CGP can undo the job of hoisting them done
earlier by LICM, and soft-FP makes this an expensive mistake.
A common pattern that produces floating point comparisons uniform
over a loop is an explicit check for division by zero. If the divisor
is hoisted out of the loop, the comparison can also be, but hoisting
the function that unwinds is never legal, since it may cause side
effects in the loop body prior to the unwinding to not be executed.
Differential Revision: http://reviews.llvm.org/D18744
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265264 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store.
In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path.
This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had.
Differential Revision: http://reviews.llvm.org/D17652
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263074 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Both the hardware and LLVM have changed since 2012.
Now, load-based heuristic don't show big differences any more on OoO cores.
There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5).
Reviewers: spatel, zansari
Differential Revision: http://reviews.llvm.org/D16836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261809 91177308-0d34-0410-b5e6-96231b3b80d8
I originally reapplied this in 257550, but had to revert again due to bot
breakage. The only change in this version is to allow either the TypeSize
or the TypeAllocSize of the variable to be the one represented in debug info
(hopefully in the future we can figure out how to encode the difference).
Additionally, several bot failures following r257550, were due to
optimizer bugs now fixed in r257787 and r257795.
r257550 commit message was:
```
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
``
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
``
```
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257850 91177308-0d34-0410-b5e6-96231b3b80d8
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257550 91177308-0d34-0410-b5e6-96231b3b80d8
Looks like there's a case where clang generates debug info that triggers
the new verifier check. Reverting while investigating.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257107 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D14276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257105 91177308-0d34-0410-b5e6-96231b3b80d8
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253511 91177308-0d34-0410-b5e6-96231b3b80d8
This is a redo of r251849 except the tests have been split into arch-specific folders
to hopefully make the bots happy.
This is a follow-up from the discussion in D12965. The block-at-a-time limitation of
SelectionDAG also came up in D13297.
Without the InstCombine change from D12965, I don't expect this patch to make any
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.
I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473
Before:
BB#0:
mr 4, 3
extsh. 3, 4
ble 0, .LBB0_5
BB#1:
cmpwi 3, 99
bgt 0, .LBB0_9
BB#2:
rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend
li 3, 0
cmplwi 4, 1
beqlr 0
BB#3:
cmplwi 4, 10
bne 0, .LBB0_12
BB#4:
li 3, 1
blr
.LBB0_5:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 65436
beq 0, .LBB0_13
BB#6:
cmplwi 3, 65526
beq 0, .LBB0_15
BB#7:
cmplwi 3, 65535
bne 0, .LBB0_12
BB#8:
li 3, 4
blr
.LBB0_9:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 100
beq 0, .LBB0_14
...
After:
BB#0:
rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons
cmpwi 4, 999
ble 0, .LBB0_5
BB#1:
lis 3, 0
ori 3, 3, 65525
cmpw 4, 3
bgt 0, .LBB0_9
BB#2:
cmplwi 4, 1000
beq 0, .LBB0_14
BB#3:
cmplwi 4, 65436
bne 0, .LBB0_13
BB#4:
li 3, 6
blr
.LBB0_5:
li 3, 0
cmplwi 4, 1
beqlr 0
BB#6:
cmplwi 4, 10
beq 0, .LBB0_12
BB#7:
cmplwi 4, 100
bne 0, .LBB0_13
BB#8:
li 3, 2
blr
.LBB0_9:
cmplwi 4, 65526
beq 0, .LBB0_15
BB#10:
cmplwi 4, 65535
bne 0, .LBB0_13
...
Differential Revision: http://reviews.llvm.org/D13532
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251857 91177308-0d34-0410-b5e6-96231b3b80d8
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems:
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.
Fixed those conditions in sinkSelectOperand() and added test cases.
Original commit message:
This is a follow-up to the discussion in D12882.
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250743 91177308-0d34-0410-b5e6-96231b3b80d8