This is slightly more interesting than the previous batch of changes.
Specifically:
1. We refactor getSpillWeight to take a MachineBlockFrequencyInfo (MBFI)
object. This enables us to completely encapsulate the actual manner we
use the MachineBlockFrequencyInfo to get our spill weights. This yields
cleaner code since one does not need to fetch the actual block frequency
before getting the spill weight if all one wants it the spill weight. It
also gives us access to entry frequency which we need for our
computation.
2. Instead of having getSpillWeight take a MachineBasicBlock (as one
might think) to look up the block frequency via the MBFI object, we
instead take in a MachineInstr object. The reason for this is that the
method is supposed to return the spill weight for an instruction
according to the comments around the function.
llvm-svn: 197296
were falling into the cases for 24-bit branch kinds which are not 24-bit
branches. The routine is to return false for fixups are expected to always
be resolvable at assembly time. Which these three fixups are as they have
limited displacement and are for local references within a function.
rdar://15586725
llvm-svn: 197282
This commit does not complete the type units feature - there are issues
around fission support (skeletal type units, pubtypes/pubnames) and
hashing of some types including those containing references to types in
other type units.
Originally committed as r197073 and reverted in r197079.
Recommitted as r197197 to reproduce the failure and reverted as r197199
Turns out there was unstable ordering in the type unit dumping code.
Fixed by using MapVector in DWARFContext to store the debug_types
comdat sections.
Recommitted as r197210 with a fix to dumping and reverted as r197211
because I was a bit gun shy and thought I saw a failure that turned out
to be unrelated.
So here we go - once more with feeling! \o/
llvm-svn: 197275
This reverts commit r197254.
This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.
Conflicts:
include/llvm/CodeGen/StackMaps.h
lib/CodeGen/StackMaps.cpp
test/CodeGen/X86/stackmap-liveness.ll
llvm-svn: 197260
The cpp backend is not a reasonable fallback for a missing target. It is a
very special backend, so it is reasonable to use it only if explicitly
requested.
While at it, simplify the interface a bit.
llvm-svn: 197241
This originally came about after noticing that InstCombine turns
some of the TMHH (icmp (and...), ...) tests into plain comparisons.
Since there is no instruction to compare with a 64-bit immediate,
TMHH is generally better than an ordered comparison for the cases
that it can handle.
llvm-svn: 197238
This patch makes more use of LPGFR and LNGFR. It builds on top of
the LTGFR selection from r197234. Most of the tests are motivated
by what InstCombine would produce.
llvm-svn: 197236
...in an attempt to rein back the increasingly complex selection code.
A knock-on effect is that ICmpType is exposed from the outset, which
slightly simplifies adjustSubwordCmp.
The code is no piece of art even after this change, but at least it should
be slightly better. No behavioral change intended.
llvm-svn: 197235
InstCombine turns (sext (trunc)) into (ashr (shl)), then converts any
comparison of the ashr against zero into a comparison of the shl against zero.
This makes sense in itself, but we want to undo it for z, since the sign-
extension instruction has a CC-setting form.
I've included tests for both the original and InstCombined variants,
but the former already worked. The patch fixes the latter.
llvm-svn: 197234
While it's safe for the X86-specific shift nodes, dag combining will
kill generic nodes. Insert an AND to make it safe, isel will nuke it
as x86's shift instructions have an implicit AND.
Fixes PR16108, which contains a contraption to hit this case in between
constant folders.
llvm-svn: 197228
branch instructions for mips and micromips instruction sets thus avoiding
the situation of generating branches to undesired locations if offsets
cannot be encoded.
This patch also checks if a fixup cannot be applied and returns a fatal error
if that's the case.
llvm-svn: 197223
through an invoke instruction.
The original patch for this was written by Mark Seaborn, but I've
reworked his test case into the existing returns_twice test case and
implemented the fix by the prior refactoring to actually run the cost
analysis over invoke instructions, and then here fixing our detection of
the returns_twice attribute to work for both calls and invokes. We never
noticed because we never saw an invoke. =[
llvm-svn: 197216
handles terminator instructions.
The inline cost analysis inheritted some pretty rough handling of
terminator insts from the original cost analysis, and then made it much,
much worse by factoring all of the important analyses into a separate
instruction visitor. That instruction visitor never visited the
terminator.
This works fine for things like conditional branches, but for many other
things we simply computed The Wrong Value. First example are
unconditional branches, which should be free but were counted as full
cost. This is most significant for conditional branches where the
condition simplifies and folds during inlining. We paid a 1 instruction
tax on every branch in a straight line specialized path. =[
Oh, we also claimed that the unreachable instruction had cost.
But it gets worse. Let's consider invoke. We never applied the call
penalty. We never accounted for the cost of the arguments. Nope. Worse
still, we didn't handle the *correctness* constraints of not inlining
recursive invokes, or exception throwing returns_twice functions. Oops.
See PR18206. Sadly, PR18206 requires yet another fix, but this
refactoring is at least a huge step in that direction.
llvm-svn: 197215
This commit does not complete the type units feature - there are issues
around fission support (skeletal type units, pubtypes/pubnames) and
hashing of some types including those containing references to types in
other type units.
Originally committed as r197073 and reverted in r197079.
Recommitted as r197197 to reproduce the failure and reverted as r197199
Turns out there was unstable ordering in the type unit dumping code.
Fixed by using MapVector in DWARFContext to store the debug_types
comdat sections.
llvm-svn: 197210
Since gcc 4.6 the compiler uses ___chkstk_ms which has the same semantics as the
MS CRT function __chkstk. This simplifies the prologue generation a bit.
Reviewed by Rafael Espíndola.
llvm-svn: 197205
This option tells llvm-cov to print out branch probabilities when
a basic block contains multiple branches. It also prints out some
function summary info including the number of times the function enters,
the percent of time it returns, and how many blocks were executed.
Also updated tests.
llvm-svn: 197198
This commit does not complete the type units feature - there are issues
around fission support (skeletal type units, pubtypes/pubnames) and
hashing of some types including those containing references to types in
other type units.
Originally committed as r197073 and reverted in r197079.
This commit originally got jumbled up with another build-breaking commit
and I can't find the failures I thought this caused anymore.
Recommitting to hopefully get some clean buildbot results to work from.
I have a sneaking suspicion there's unstable output in the comdat group
output of MCStreamer...
llvm-svn: 197197
GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage
constant users to be visited. The pointers in this array need to be weak
handles because when we delete a constant array, we may also be holding a
pointer to one of its elements (or an element of one of its elements if we're
dealing with an array of arrays) in the worklist.
Fixes PR17347.
llvm-svn: 197178
The barrier pass is a temporary hack, and should go away soon. Nevertheless, if
we don't initialize it, then opt will not understand -barrier, and this will
break bugpoint (because when it dumps the passes from the default pass manager
-barrier will be there).
llvm-svn: 197177
- Copy patterns with float/double types are enough.
- Fix typos in test case names that were using v1fx.
- There is no ACLE intrinsic that uses v1f32 type. And there is no conflict of
neon and non-neon ovelapped operations with this type, so there is no need to
support operations with this type.
- Remove v1f32 from FPR32 register and disallow v1f32 as a legal type for
operations.
Patch by Ana Pazos!
llvm-svn: 197159
a vector packed single/double fp operation followed by a vector insert.
The effect is that the backend coverts the packed fp instruction
followed by a vectro insert into a SSE or AVX scalar fp instruction.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
__m128 C = A + B;
return (__m128) {c[0], a[1], a[2], a[3]};
}
previously we generated:
addps %xmm0, %xmm1
movss %xmm1, %xmm0
we now generate:
addss %xmm1, %xmm0
llvm-svn: 197145
The old AddFixedStringToRegEx() it was based on got away with this for the
longest time, but the problem became easy to spot after the cleanup in r197096.
Also add a quick unit test to cover regex escaping.
llvm-svn: 197121
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.
significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):
speedups:
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
-0.55171% +/- 0.333168%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
-17.5576% +/- 14.598%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
-29.5708% +/- 7.09058%
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
-34.9471% +/- 11.4391%
SingleSource/Benchmarks/BenchmarkGame/puzzle
-25.1347% +/- 11.0104%
SingleSource/Benchmarks/Misc/flops-8
-17.7297% +/- 9.79061%
SingleSource/Benchmarks/Shootout-C++/ary3
-35.5018% +/- 23.9458%
SingleSource/Regression/C/uint64_to_float
-56.3165% +/- 25.4234%
SingleSource/UnitTests/Vectorizer/gcc-loops
-18.5309% +/- 6.8496%
regressions:
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
18.351% +/- 12.156%
SingleSource/Benchmarks/Shootout-C++/methcall
27.3086% +/- 14.4733%
llvm-svn: 197099
The assertion was checking that the virtual register VReg used to represent the
physical register PReg uses the same register class as the one passed to
MachineFunction::addLiveIn.
This is over-constraining because it is sufficient to check that the register
class of VReg (VRegRC) is a subclass of the register class of PReg (PRegRC) and
that VRegRC contains PReg.
Indeed, if VReg gets constrained because of some operation constraints
between two calls of MachineFunction::addLiveIn, the original assertion
cannot match.
This fixes <rdar://problem/15633429>.
llvm-svn: 197097
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.
Fixes PR18003.
llvm-svn: 197089
This adds two additional functions to the hazard recognizer interface. These
are optional (in the sense that the default implementations preserve the
current behavior), and used by the post-RA scheduler. Upcoming commits will use
this functionality in order to improve dispatch-group formation on the POWER7
and related cores. Dispatch groups are an odd construct: sometimes we need to
insert nops to force a new one to start (for performance reasons), and some
instructions need to appear in certain positions within a group, but the groups
are not fundamentally cycle based (they can contain instructions with data
dependencies with non-trivial latencies).
Motivation:
unsigned PreEmitNoops(SUnit *) - Used to force the post-RA scheduler to insert
nops to force a new dispatch group to begin. We already have a NoopHazard, and
this is also still needed. However, NoopHazard only causes a nop to be inserted
if there are no other available instructions, and so is not always sufficient.
The number of nops to insert depends on state that only the hazard recognizer
has, so a general callback is necessary.
bool ShouldPreferAnother(SUnit *) - Used to avoid scheduling instructions that
would start a new dispatch group when others are available that could be part
of the current dispatch group. In this case, we don't want to issue nops,
because the non-preferred instruction will implicitly start a new dispatch
group regardless.
Although the motivation for these functions is driven by the PowerPC backend,
they are completely general.
llvm-svn: 197084
The linkers on these systems don't have anything special to do with these
symbols. Since the intent is for them to be absent from the final object,
just treat them as private.
llvm-svn: 197080
This reverts commit r197073.
The test seems to be failing on some buildbots for unknown reasons.
Reverting until I can figure that out. If anyone's got a reproduction
(.s and .o together would be great) - I'd really appreciate it.
llvm-svn: 197079
This commit does not complete the type units feature - there are issues
around fission support (skeletal type units, pubtypes/pubnames) and
hashing of some types including those containing references to types in
other type units.
llvm-svn: 197073
point reciprocal exponent, and floating-point reciprocal square root estimate
LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector
equivalents.
llvm-svn: 197066
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.
llvm-svn: 197046
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).
llvm-svn: 197041
DAGCombiner could fold (truncate (load)) -> smaller load if the original
load was the width of the truncation result or wider. This patch extends
it to handle cases where the original load was narrower (and so the
extension type stays the same).
llvm-svn: 197030
This hook reverses the order of assignment for local live ranges. This
will generally allocate shorter local live ranges first. For targets with
many registers, this could reduce regalloc compile time by a large
factor. It should still achieve optimal coloring; however, it can change
register eviction decisions. It is disabled by default for two reasons:
(1) Top-down allocation is simpler and easier to debug for targets that
don't benefit from reversing the order.
(2) Bottom-up allocation could result in poor evicition decisions on some
targets affecting the performance of compiled code.
llvm-svn: 197001
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.
ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.
XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.
llvm-svn: 196986
It was failing because ASan was adding all of the following to one
function:
- dynamic alloca
- stack realignment
- inline asm
This patch avoids making the static alloca dynamic when coverage is
used.
ASan should probably not be inserting empty inline asm blobs to inhibit
duplicate tail elimination.
llvm-svn: 196973
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.
This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.
llvm-svn: 196934