Fix a bug triggered in IfConverterTriangle when CvtBB has multiple predecessors
by getting the weights before removing a successor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200958 91177308-0d34-0410-b5e6-96231b3b80d8
This solves a problem where a def machine operand has no uses but has
not been marked dead. In this case, the initial RP analysis was being
extra precise and determining from LiveIntervals the the register was
actually dead. This caused us to omit the register from the RP
tracker's block live out. That's all good, but the per-instruction
summary still accounted for it as a valid def. This could cause an
assertion in the tracker later when we underflow pressure.
This is from a bug report on an out-of-tree target. It is not
reproducible on well-behaved targets. I'm just making an obvious fix
without unit test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200941 91177308-0d34-0410-b5e6-96231b3b80d8
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200929 91177308-0d34-0410-b5e6-96231b3b80d8
The aim in this patch is to reduce work that VirtRegRewriter needs to do when
telling MachineRegisterInfo which physregs are in use. Up until now
VirtRegRewriter::rewrite has been doing rewriting and populating def info and
then proceeding to set whether a physreg is used based this info for every
physreg that the target provides. This can be expensive when a target has an
unusually high number of supported physregs, and is a noticeable chunk of
compile time for small programs on such targets.
So to reduce compile time, this patch simply adds the use of a SparseSet to the
rewrite function that is used to flag each physreg that is encountered in a
MachineFunction. Afterward, rather than iterating over the set of all physregs
for a given target to set the physregs used in MachineRegisterInfo, the new way
is to iterate over the set of physregs that were actually encountered and set
in the SparseSet. This improves compile time because the existing rewrite
function was iterating over all MachineOperands already, and because the
iterations afterward to setPhysRegUsed is reduced by use of the SparseSet data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200919 91177308-0d34-0410-b5e6-96231b3b80d8
programs on targets with large register files. The root of the compile time
overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
std::vector uses the faster __platform_bzero to zero out primitive buffers when
assign is called, while SmallVector uses an iterator.
The fix for this was simply to replace the SmallVector with a dynamically
allocated buffer and to initialize or reinitialize the buffer based on the
total registers that the target architecture requires. The changes support
cases where a pass manager may be reused for different targets, and note that
the PhysRegEntries is allocated using calloc mainly for good for, and also to
quite tools like Valgrind (see comments for more info on this).
There is an rdar to track the fact that SmallVector doesn't have platform
specific speedup optimizations inside of it for things like this, and I'll
create a bugzilla entry at some point soon as well.
TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
char>::assign(N, 0) with a call to calloc for N bytes which is much faster
because SmallVector's assign uses iterators.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200917 91177308-0d34-0410-b5e6-96231b3b80d8
large register files. The omission of Queries.clear() is perfectly safe because
LiveIntervalUnion::Query doesn't contain any data that needs freeing and
because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
holding Queries every time it is run, so there's no need to zero out the
queries either. Not having to do this for very large numbers of physregs
is a noticeable constant cost reduction in compilation of small programs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200913 91177308-0d34-0410-b5e6-96231b3b80d8
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200900 91177308-0d34-0410-b5e6-96231b3b80d8
find a register.
The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
the neighbors of the current allocated variable.
E.g.,
vA can use {R1, R2 }
vB can use { R2, R3}
vC can use {R1 }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.
vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.
Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.
<rdar://problem/15947839>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200883 91177308-0d34-0410-b5e6-96231b3b80d8
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200731 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200601 91177308-0d34-0410-b5e6-96231b3b80d8
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200596 91177308-0d34-0410-b5e6-96231b3b80d8
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block. Also add an
assertion in x86 fastisel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200593 91177308-0d34-0410-b5e6-96231b3b80d8
algorithm. Sink the 'A' + Attribute hash into each form so we don't
have to check valid forms before deciding whether or not we're going
to hash which will let the default be to return without doing anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200571 91177308-0d34-0410-b5e6-96231b3b80d8
This ensures DWARF consumers don't confuse these references for
definitions. I'd argue it might be nice to improve debuggers so we don't
need this, but it's just one field in an abbreviation anyway - so it
doesn't seem worth the fight.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200569 91177308-0d34-0410-b5e6-96231b3b80d8
when the input is a concat_vectors and the insert replaces one of the
concat halves:
Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)
This can be seen with the following IR:
define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
%1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)
The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.
Using AVX, without the patch this generates two vinsertf128 instructions:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0
With the patch this is optimized into:
vinsertf128 $1, %xmm1, %ymm2, %ymm0
Patch by Robert Lougher.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200506 91177308-0d34-0410-b5e6-96231b3b80d8
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".
Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200502 91177308-0d34-0410-b5e6-96231b3b80d8
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200431 91177308-0d34-0410-b5e6-96231b3b80d8
This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)
An existing testing case test/CodeGen/Thumb2/v8_IT_5.ll is updated,
since we now correctly update the edge weights, the cold block
is placed at the end of the function and we jump to the cold block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200428 91177308-0d34-0410-b5e6-96231b3b80d8
are relative to in the compile unit. Currently let's just use 0...
Thanks to Greg Clayton for the catch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200425 91177308-0d34-0410-b5e6-96231b3b80d8
module since there's no range guarantee that we could make given
output order. This also fixes up the testcases that have multiple
CUs to have the correct range offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200422 91177308-0d34-0410-b5e6-96231b3b80d8
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200388 91177308-0d34-0410-b5e6-96231b3b80d8
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.
This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.
Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200313 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200271 91177308-0d34-0410-b5e6-96231b3b80d8
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200269 91177308-0d34-0410-b5e6-96231b3b80d8
Also update the comment, since it actually produces a
select (setcc) instead of select_cc.
It was checking and using the setcc result type for the
type of the sext, instead of the type of the compared items.
In my problem case, the sext was to i32 and was used as the setcc type,
but the expected type was i64.
No test since I haven't been able to hit the problem with
this on any in-tree targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200249 91177308-0d34-0410-b5e6-96231b3b80d8