This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
ControlLoops-dbl - 13% speedup
ControlLoops-flt - 15% speedup
Reductions-dbl - 7.5% speedup
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205348 91177308-0d34-0410-b5e6-96231b3b80d8
The PowerPC A2 core greatly benefits from aggressive concatenation unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190549 91177308-0d34-0410-b5e6-96231b3b80d8
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186268 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes rdar:14036816, PR16130.
There is an opportunity to compute precise trip counts for 'or'
expressions and multi-exit loops.
rdar:14038809: Optimize trip count computation for multi-exit loops.
To do this we need to record the fact that ExitLimit assumes NSW. When
it does not we can safely assume that the loop trip count is the
minimum ExitLimt across all subexpressions and loop exits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183060 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.
Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.
Many tests depend on grepping "-stats" output. Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.
Differential Revision: http://llvm-reviews.chandlerc.com/D486
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176733 91177308-0d34-0410-b5e6-96231b3b80d8
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170704 91177308-0d34-0410-b5e6-96231b3b80d8
When the trip count is -1, getSmallConstantTripMultiple could return zero,
and this would cause runtime loop unrolling to assert. Instead of returning
zero, one is now returned (consistent with the existing overflow cases).
Fixes PR14167.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166612 91177308-0d34-0410-b5e6-96231b3b80d8
Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154386 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Brendon Cahoon!
This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146245 91177308-0d34-0410-b5e6-96231b3b80d8
The loop tree's inclusive block lists are painful and expensive to
update. (I have no idea why they're inclusive). The design was
supposed to handle this case but the implementation missed it and my
unit tests weren't thorough enough.
Fixes PR11335: loop unroll update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144970 91177308-0d34-0410-b5e6-96231b3b80d8
SCEV unrolling can unroll loops with arbitrary induction variables. It
is a prerequisite for -disable-iv-rewrite performance. It is also
easily handles loops of arbitrary structure including multiple exits
and is generally more robust.
This is under a temporary option to avoid affecting default
behavior for the next couple of weeks. It is needed so that I can
checkin unit tests for updateUnloop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137384 91177308-0d34-0410-b5e6-96231b3b80d8
These are not individual bug fixes. I had to rewrite a good chunk of
the unroller to make it sane. I think it was getting lucky on trivial
completely unrolled loops with no early exits. I included some fairly
simple unit tests for partial unrolling. I didn't do much stress
testing, so it may not be perfect, but should be usable now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137190 91177308-0d34-0410-b5e6-96231b3b80d8
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113439 91177308-0d34-0410-b5e6-96231b3b80d8
input filename so that opt doesn't print the input filename in the
output so that grep lines in the tests don't unintentionally match
strings in the input filename.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@81537 91177308-0d34-0410-b5e6-96231b3b80d8
partially unroll a loop when fully unrolling would not fit under the threshold.
Patch by Mikael Lepistö.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54160 91177308-0d34-0410-b5e6-96231b3b80d8
in the presence of out-of-loop users of in-loop values and the trip
count is not a known multiple of the unroll count, and to be a bit
simpler overall. This fixes PR2253.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52645 91177308-0d34-0410-b5e6-96231b3b80d8